GUIDE · CREATIVE TESTING STACKS · Independent ranking · UPDATED MAY 11, 2026

Creative Testing Stacks for Meta and TikTok in 2026: DIY vs Smartly vs Framework

Every performance marketing team running Meta and TikTok at scale hits this question sooner or later: how should creative testing actually work? Spreadsheets break once you're past a few dozen ads a month. Meta's built-in tools, like Advantage+ and Automated Rules, help with delivery, but they don't make creative-level calls for you. So teams usually end up choosing between three paths: a DIY stack made from point tools, a heavyweight enterprise platform (Smartly is the obvious example), or a dedicated creative testing system like Framework.

Each route fixes one set of problems and creates another. DIY gives you flexibility, then asks you to manage the operational mess and keep decision-making tight. Smartly brings enterprise-grade automation, with enterprise pricing, and creative still sits outside the platform. Framework combines statistical testing with creative production, but it's sold as an agency engagement rather than self-serve software. This guide walks through all three.

Disclosure: NewForm operates Framework. We wrote this comparison for the build-vs-buy decision, not to dunk on the alternatives. A DIY stack is the right answer for plenty of teams. Smartly is the right answer for others. The honest version is in the table.

Alec Velikanov
CTO & Co-Founder, NewForm
Last reviewed May 11, 2026

Quick comparison

# | Agency | Best for | HQ | Pricing
1 | Framework (by NewForm) | Paid social teams already spending $250K+/month that want statistical creative testing and production under one roof, instead of duct-taping vendors and tools together. | New York, NY (NOMAD) | $$$
2 | Smartly | Enterprise paid social teams spending $1M+/month, with in-house creative and ops ready to run self-serve creative automation themselves. | Helsinki, Finland | $$$$
3 | DIY Stack: Ad Launcher + Rules + Insights | Hands-on performance teams with engineering or ops bandwidth that want full control, full flexibility, and are fine paying the ops tax of stitching the tools and decisions together themselves. | Self-assembled (multi-vendor) | $ – $$

How we ranked

We scored each stack on seven things that actually change outcomes: (1) total monthly cost including tooling and labor, (2) creative volume the setup can sustain, (3) statistical rigor of scale-or-kill decisions, (4) speed from creative concept to live test, (5) how tightly creative production connects to media buying, (6) team headcount required to run it, and (7) lock-in versus flexibility.

We've used or evaluated each option in the field. The DIY stack numbers below assume one ad launcher (Madgicx, Birch, or similar), Meta's native rules, and one creative insights tool (Motion or Atria). The Smartly numbers assume a standard enterprise contract with creative automation and advanced testing. The Framework numbers reflect the actual NewForm engagement structure.

Disclosure: NewForm wrote this guide, and Framework is one of the three options compared. We included the strongest fair case for the alternatives, so if you're deciding whether to build or buy, you have the inputs you need.

#1 · New York, NY (NOMAD) · FOUNDED 2023 · 50+ EMPLOYEES · PUBLISHER · TRANSPARENT RANKING

1. Framework (by NewForm)

Best for: Paid social teams already spending $250K+/month that want statistical creative testing and production under one roof, instead of duct-taping vendors and tools together.

Framework is NewForm's proprietary statistical testing engine. It runs structured experiments across hundreds of creative variants per brand each month on Meta and TikTok, then makes scale-or-kill calls at 95% confidence. Losers get auto-killed before they chew through budget. The engine catches roughly 2x more winners than manual buying and finds winners 29% faster than unstructured testing.
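
To make "scale-or-kill at 95% confidence" concrete, here is a minimal Python sketch of one common way to make that kind of call: a two-sided, two-proportion z-test of a variant against a control ad. This is an illustration under stated assumptions, not Framework's actual engine. The function name `scale_or_kill` and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def scale_or_kill(control_conv, control_n, variant_conv, variant_n, confidence=0.95):
    """Compare a variant's conversion rate to a control's.

    Returns "scale", "kill", or "keep testing". Hypothetical helper,
    not any vendor's API.
    """
    p_control = control_conv / control_n
    p_variant = variant_conv / variant_n

    # Pooled rate under the null hypothesis that both ads convert equally.
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    if se == 0:
        return "keep testing"  # no conversions at all yet, nothing to decide

    z = (p_variant - p_control) / se
    # Two-sided critical value; about 1.96 at 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    if z >= z_crit:
        return "scale"
    if z <= -z_crit:
        return "kill"
    return "keep testing"


# Hypothetical numbers: conversions and impressions-to-click samples per ad.
print(scale_or_kill(100, 10_000, 150, 10_000))
print(scale_or_kill(100, 10_000, 60, 10_000))
print(scale_or_kill(100, 10_000, 110, 10_000))
```

A system testing hundreds of variants a month would also need to account for multiple comparisons and for checking results repeatedly before a test ends. This sketch ignores both.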

You don't buy Framework as self-serve software. You get it through an agency engagement: the testing platform, creative production, and the team running both. That's the real difference versus Smartly (platform-only, with creative and decision discipline on you) or DIY (you assemble and maintain every piece). If you're spending $250K+/month on paid social and want testing run as a system instead of another set of software licenses, Framework is the cleanest pick. If you want self-serve software and already have the creative team and operating discipline, Smartly or a DIY stack makes more sense.

Why choose Framework (by NewForm)

  • Statistical testing and creative production sit inside the same engagement, so you're not stitching tools together or managing SLA gaps between vendors.
  • Every concept gets a defensible scale-or-kill call at 95% confidence, based on statistics instead of gut feel or simple rules.
  • The creative team produces the next round of variants, which closes the insight loop that DIY stacks and SaaS platforms leave to the buyer.
Services
Statistical testing engine (95% confidence scale-or-kill) · Auto-kill of underperforming creative · AI vision + language analysis for creative insights · Winner auto-graduation to scale · Embedded creative production · Cross-platform: Meta, TikTok, Google
Notable clients
ElevenLabs · Binance · Acorns · 18 Birdies · Dub · Flo · Kajabi · Bolt
Pricing
$$$ — agency engagement, typically $25K–$80K/mo (includes platform + creative production)
Result signal
Catches ~2x more winners than manual buying · 29% faster winner identification vs. unstructured testing · catches the 40% of winners Meta's algorithm misses
#2 · Helsinki, Finland · New York · Singapore · Global · FOUNDED 2013 · 700+ EMPLOYEES

2. Smartly

Best for: Enterprise paid social teams spending $1M+/month, with in-house creative and ops ready to run self-serve creative automation themselves.

Smartly is the default enterprise pick for ad ops automation. Founded in 2013 in Helsinki, now global with offices in New York and Singapore, it serves enterprise advertisers like eBay, TechStyle, Walmart, Uber, L'Oréal, and Disney/ESPN. The job: dynamic creative automation, asset versioning at scale, and multi-channel publishing on Meta, TikTok, Pinterest, and Snap.

Smartly is strongest when ad ops is the bottleneck. If you're running thousands of ad variants per month across multiple channels and markets, it cuts a lot of manual workflow. The gaps, from a testing perspective, are clear: creative production still happens outside the platform, statistical-confidence scale-or-kill calls still depend on the buyer's discipline instead of the software, and the pricing rules out almost anything below enterprise scale. For teams spending $1M+/month with strong in-house creative and ops, Smartly is the most established option. For mid-market teams, it's hard to justify against a DIY stack or Framework's bundled model.

Why choose Smartly

  • Automates ad ops for very large advertisers dealing with complex creative versioning.
  • Self-serve software lets in-house teams run the testing program without relying on an agency partner.
  • Publishes to Meta, TikTok, Pinterest, and Snap from one platform, cutting ops drag for advertisers using all four.
Services
Dynamic creative automation · Asset versioning at scale · Performance creative insights · Ad operations automation · Multi-channel publishing (Meta, TikTok, Pinterest, Snap)
Notable clients
eBay · TechStyle · Walmart · Uber · L'Oréal · Disney/ESPN
Pricing
$$$$ — enterprise SaaS, $5K–$50K+/mo typical, annual contracts
Result signal
One of the largest enterprise advertising platforms; serves brands spending hundreds of millions annually on paid social
#3 · Self-assembled (multi-vendor)

3. DIY Stack: Ad Launcher + Rules + Insights

Best for: Hands-on performance teams with engineering or ops bandwidth that want full control, full flexibility, and are fine paying the ops tax of stitching the tools and decisions together themselves.

Most performance teams start with the DIY stack before real scale shows up. Madgicx or Birch launch ads and cover basic automated rules. Meta's native Automated Rules can pause on simple thresholds. Motion or Atria tag creative and do the after-the-fact readout. Testing protocol lives in Notion or a Google Sheet. The team runs all of it.

The upside is flexibility and low tooling cost. The cost is ops drag. You own every integration, every call, every slip from the testing protocol. At scale, teams usually either hire a dedicated growth analytics person to keep it running, or the testing discipline slowly falls apart once people get busy. It fits teams with real internal capacity that want flexibility above everything else. It breaks down when the goal was creative strategy, not babysitting platform operations.

Why choose DIY Stack: Ad Launcher + Rules + Insights

  • You control every component, swap vendors when better tools ship, and avoid lock-in to any single platform.
  • This has the lowest direct tooling cost of the three options. Most components run $500–$2,000/month each.
  • Teams with strong internal ops can keep the IP in-house, including testing protocol, naming conventions, and dashboards, instead of parking it inside a vendor’s software.
Services
Ad launcher: Madgicx, Birch, or similar — automates ad set creation and basic rules · Meta Automated Rules — pauses ads on simple thresholds · Creative insights: Motion or Atria — tagging, reporting, post-hoc analysis · Internal protocol: scale-or-kill rules live in Notion, Google Docs, or team memory · Optional: BI tool (Looker, Hex) for custom dashboards
Notable clients
Used widely by emerging DTC, consumer-app, and SaaS teams
Pricing
$ – $$ — total tooling typically $1.5K–$5K/mo, plus internal labor
Result signal
Lowest tooling cost; highest operational and discipline burden

Frequently asked

What buyers ask about performance creative agencies.

What is a creative testing stack, and why would my team need one?
A creative testing stack is the tools and process your performance team uses to decide which ads get more budget and which ones get cut on Meta, TikTok, and YouTube. Most teams start in spreadsheets with loose rules. That works until roughly 30–50 ads a month, then the calls get messy and hard to defend. A real stack gives you structured experiments instead of random launches, statistical discipline on scale-or-kill decisions, plus a feedback loop that gets the learning back into creative production.
Why not let Meta Advantage+ and TikTok Smart+ pick the winners?
Advantage+ and Smart+ automate media buying. They don't replace creative testing. They move budget around the creative you've already shipped, but they don't run structured experiments to learn which concepts win. They also don't solve statistical confidence on small early samples or give your team the creative-level insight needed to brief the next production round. Meta's own research has shown the algorithm misses the best-performing creative roughly 40% of the time. That's the gap dedicated testing tools are built to close.
What should a serious creative testing setup cost in 2026?
Tooling usually falls into three bands. A DIY stack typically runs $1,500–$5,000/month across the pieces: ad launcher, creative insights tool, and your existing analytics. Smartly costs $5,000–$50,000+/month depending on spend tier, with annual contracts. Framework comes in as an agency engagement at $25,000–$80,000/month and includes creative production, statistical testing, and the platform. The honest math includes labor. DIY saves on software, but it usually needs one or two dedicated FTEs to run properly.
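
The "honest math" for DIY can be sketched in a few lines of Python. The tooling range and the one-to-two FTE estimate come from the answer above. The loaded monthly cost per FTE (`LOADED_FTE_USD`) is a hypothetical placeholder, so swap in your own figure before drawing conclusions.

```python
def monthly_total(tooling_usd, ftes, loaded_fte_usd):
    """Total monthly cost of running a stack: software plus dedicated labor."""
    return tooling_usd + ftes * loaded_fte_usd


# Hypothetical assumption: fully loaded cost of one dedicated FTE per month.
LOADED_FTE_USD = 12_000

# DIY tooling range and FTE range as stated in this guide.
diy_low = monthly_total(1_500, ftes=1, loaded_fte_usd=LOADED_FTE_USD)
diy_high = monthly_total(5_000, ftes=2, loaded_fte_usd=LOADED_FTE_USD)

print(f"DIY, tooling only:      $1,500 to $5,000 per month")
print(f"DIY, tooling plus labor: ${diy_low:,} to ${diy_high:,} per month")
```

Under that assumed labor cost, the top of the DIY range overlaps the low end of the Framework engagement range quoted above. The comparison shifts with whatever your team actually pays for that headcount.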
How much paid social spend do I need before creative testing tooling is worth it?
The rough line is $100,000 to $250,000/month in paid social spend. Somewhere in that range, dedicated creative testing tooling stops being optional. Below it, sample sizes are often too thin to learn much from structured testing, and manual workflow is usually faster. Once you're above $500,000/month, bad creative calls cost more than any tool on this list. Then the question is which stack to use, not whether you need one.
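
The thin-sample problem at lower spend can be checked with a standard sample-size formula for comparing two conversion rates. The Python sketch below estimates how many clicks each variant needs, then how many variants a monthly test budget can fund. Every input here (baseline conversion rate, detectable lift, cost per click, share of spend reserved for testing) is a hypothetical assumption, not a benchmark from this guide.

```python
from math import ceil, floor
from statistics import NormalDist


def clicks_per_variant(baseline_cvr, relative_lift, confidence=0.95, power=0.80):
    """Approximate clicks needed per variant to detect a relative lift in CVR."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def variants_supported(monthly_spend, test_share, cpc, baseline_cvr, relative_lift):
    """How many variants a month's test budget can take to a full read."""
    test_budget = monthly_spend * test_share
    cost_per_variant = clicks_per_variant(baseline_cvr, relative_lift) * cpc
    return floor(test_budget / cost_per_variant)


# Hypothetical inputs: 2% baseline CVR, 30% detectable lift, $1.50 CPC,
# 20% of monthly spend reserved for testing.
for spend in (100_000, 250_000, 500_000):
    n = variants_supported(spend, 0.20, 1.50, 0.02, 0.30)
    print(f"${spend:,}/month -> about {n} fully powered variant tests")
```

The point of the sketch is the shape of the curve, not the exact counts. Smaller lifts or lower conversion rates push the required sample up quickly, which is why structured testing struggles below a certain spend level.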
What's the difference between an ad launcher like Madgicx or Birch and a creative testing platform?
An ad launcher handles the mechanics of getting creative live: ad set creation, naming conventions, budget allocation, audience setup. Some also pause ads with simple rules based on CPA or CTR. That's useful plumbing. What they don't do is run structured experiments with statistical confidence, tell you which concepts are actually winning, explain why a format worked, or push that learning back into production. You need launchers. They're just not enough for serious creative testing.
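
For contrast, a simple threshold rule of the kind described above looks roughly like this in Python. It is a hypothetical sketch of the logic only, not the API of Madgicx, Birch, or Meta's Automated Rules. The function name `should_pause` and all thresholds are made up for illustration.

```python
def should_pause(spend_usd, conversions, cpa_cap_usd, min_spend_usd):
    """Pause an ad once it has spent enough and its CPA is above the cap.

    Hypothetical rule logic. Note that it never asks whether the sample
    is large enough for the CPA estimate to be trustworthy.
    """
    if spend_usd < min_spend_usd:
        return False  # not enough spend yet to apply the rule
    if conversions == 0:
        return True  # spent past the minimum with nothing to show for it
    return spend_usd / conversions > cpa_cap_usd


# Hypothetical ads: (spend, conversions) checked against a $50 CPA cap
# after at least $150 of spend.
print(should_pause(200, 3, cpa_cap_usd=50, min_spend_usd=150))
print(should_pause(200, 5, cpa_cap_usd=50, min_spend_usd=150))
print(should_pause(100, 0, cpa_cap_usd=50, min_spend_usd=150))
```

With three versus five conversions on the same spend, one extra sale or two flips the decision. That sensitivity on small samples is the gap between a threshold rule and a statistical test.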
Do creative insights tools like Motion and Atria replace a testing platform?
No. Motion and Atria help you analyze creative performance after the fact through tags, patterns, and reports. They don't decide which ads to scale, kill, or launch next. They're diagnostic tools, not decision tools. The strongest DIY stacks pair an insights tool for analysis with an ad launcher for execution, then add a clear written testing protocol for decisions. The weak spot is obvious if you've run this setup: the protocol lives in a Notion doc, not the software, so drift creeps in.
Ready to build your creative intelligence layer?

NewForm · 2026