Minimum Detectable Effect (MDE) Explained for A/B Testing
What is MDE, how to choose the right one, and why it's the most important parameter in your A/B test setup. Practical guide with examples and trade-off analysis.
The minimum detectable effect (MDE) is the smallest improvement your A/B test is designed to reliably detect. Get it wrong and you'll either run tests for months or miss real wins. This guide explains how to choose the right MDE for your situation.
What Is MDE?
MDE answers the question: "How small of a change do I care about detecting?"
It's expressed as a relative percentage change from your baseline. If your conversion rate is 4% and your MDE is 10%, you're designing your test to detect a shift from 4.0% to 4.4% (or from 4.0% to 3.6% on the downside).
The key word is designed. A test with a 10% MDE can detect a 30% improvement — it just also has the statistical power to catch changes as small as 10%. A test with a 30% MDE would miss that same 10% improvement entirely.
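To make that arithmetic concrete, here's a minimal sketch in Python (the function name is mine, not from any A/B testing library):

```python
def detectable_range(baseline: float, relative_mde: float) -> tuple[float, float]:
    """Return the (downside, upside) conversion rates a test with this
    relative MDE is designed to distinguish from the baseline."""
    delta = baseline * relative_mde
    return baseline - delta, baseline + delta

# A 4% baseline with a 10% relative MDE targets shifts to 3.6% or 4.4%.
low, high = detectable_range(0.04, 0.10)
print(f"{low:.3f} to {high:.3f}")  # 0.036 to 0.044
```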
Why MDE Matters
MDE is the single biggest lever on your test duration:
| MDE | Sample Size (per variation) | Duration at 1k/day |
|---|---|---|
| 5% | ~63,000 | 126 days |
| 10% | ~16,000 | 32 days |
| 15% | ~7,200 | 15 days |
| 20% | ~4,100 | 9 days |
| 30% | ~1,900 | 4 days* |
*Based on a 10% baseline, 95% significance, 80% power, with the 1k/day traffic split evenly between the two variations. The 4-day test would still need to run 7+ days for business-cycle coverage.
Halving your MDE roughly quadruples your required sample size. This is because sample size scales with the inverse square of the effect size.
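That inverse-square relationship falls out of the standard two-proportion sample size formula. Here's a rough sketch; different calculators use slightly different variants of this formula, so treat the outputs as approximate rather than matching the table above exactly:

```python
from math import sqrt

def sample_size_per_variation(baseline: float, relative_mde: float,
                              z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Per-variation sample size for a two-sided two-proportion z-test
    (defaults: 95% significance, 80% power)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return round(numerator / (p2 - p1) ** 2)

# Halving the MDE from 10% to 5% roughly quadruples the sample size:
n_10 = sample_size_per_variation(0.10, 0.10)
n_05 = sample_size_per_variation(0.10, 0.05)
print(n_10, n_05, round(n_05 / n_10, 1))
```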
Relative vs. Absolute MDE
This is the most common source of confusion:
- Relative MDE (10%): A 10% relative improvement on a 5% baseline = detecting a shift from 5.0% to 5.5%
- Absolute MDE (0.5pp): A 0.5 percentage point improvement = detecting a shift from 5.0% to 5.5%
These describe the same thing, but different tools use different conventions. Always check which one your tool expects. Most modern A/B test calculators use relative MDE.
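If your tool expects the other convention, the conversion is one line each way (helper names are mine, for illustration):

```python
def relative_to_absolute(baseline: float, relative_mde: float) -> float:
    """Convert a relative MDE to an absolute shift in proportion units."""
    return baseline * relative_mde

def absolute_to_relative(baseline: float, absolute_mde: float) -> float:
    """Convert an absolute MDE (in proportion units) to a relative one."""
    return absolute_mde / baseline

# A 10% relative MDE on a 5% baseline is 0.5 percentage points, and back:
print(f"{relative_to_absolute(0.05, 0.10):.4f}")  # 0.0050, i.e. 0.5pp
print(f"{absolute_to_relative(0.05, 0.005):.2f}")  # 0.10, i.e. 10% relative
```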
How to Choose Your MDE
The business value approach
Calculate the revenue impact of different effect sizes and find the threshold where the improvement pays for the cost of running the test.
Example: You're testing a checkout page.
- 10,000 daily visitors, 3% conversion rate, $50 AOV
- Current daily revenue: $15,000
| Relative MDE | New Rate | Extra Revenue/Month | Worth Detecting? |
|---|---|---|---|
| 5% | 3.15% | $22,500 | Yes, significant revenue |
| 10% | 3.30% | $45,000 | Yes, definitely |
| 20% | 3.60% | $90,000 | Yes, big win |
| 50% | 4.50% | $225,000 | Unlikely but transformative |
If a 5% improvement would add $22,500/month ($270k/year), it's worth the longer test. If you're a smaller business where a 5% lift means $50/month, you probably only care about 20%+ effects.
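The revenue math behind this approach can be sketched in a few lines (assuming a 30-day month; the function name is mine):

```python
def extra_monthly_revenue(daily_visitors: int, baseline_cr: float,
                          aov: float, relative_lift: float,
                          days_per_month: int = 30) -> float:
    """Extra revenue per month if the test wins by the given relative lift."""
    extra_conversions_per_day = daily_visitors * baseline_cr * relative_lift
    return extra_conversions_per_day * aov * days_per_month

# Checkout example: 10,000 visitors/day, 3% conversion rate, $50 AOV.
for lift in (0.05, 0.10, 0.20, 0.50):
    monthly = extra_monthly_revenue(10_000, 0.03, 50, lift)
    print(f"{lift:.0%} lift -> ${monthly:,.0f}/month")
```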
The practical approach
Match your MDE to your experimentation capacity:
- High traffic (50k+/day): Use 5-10% MDE. You can afford sensitivity.
- Medium traffic (5k-50k/day): Use 10-20% MDE. The sweet spot.
- Low traffic (under 5k/day): Use 20-30% MDE. Focus on big swings.
The historical approach
Look at your past A/B test results:
- If most winning tests show 15-25% lifts, a 10% MDE captures them comfortably
- If most winning tests show 5-10% lifts, you need a 5% MDE to catch them
- If you've never measured past effect sizes, start with 15% MDE and adjust
The MDE Trade-Off
Choosing an MDE is a trade-off between sensitivity and speed:
Small MDE (5-10%)
- Catches subtle improvements
- Requires large sample sizes (long tests)
- Best for high-traffic sites and high-stakes pages
- Risk: tests take so long that the business context changes
Large MDE (20-30%)
- Only catches big improvements
- Requires small sample sizes (fast tests)
- Best for low-traffic sites and early-stage experiments
- Risk: you miss real but modest improvements, leading to the false conclusion that "nothing works"
Medium MDE (10-20%)
- Balanced sensitivity and speed
- Where most teams should operate
- Tests typically run 2-4 weeks
Common Mistakes
Setting MDE too low for your traffic
If a 5% MDE means a 6-month test, the test is impractical. The business context will change, code will be refactored, and the results won't be actionable. Better to use a 15% MDE and run 3 tests in that time.
Confusing MDE with expected effect
MDE is not your prediction of the outcome. It's the minimum you want to be able to detect. Your actual effect might be larger (great — the test will detect it even faster) or smaller (the test won't detect it, and that's by design — you decided effects that small aren't worth shipping).
Using the same MDE for all tests
Different tests warrant different sensitivity levels:
- Pricing page: Low MDE (5-10%). Revenue impact per visitor is high.
- Blog post layout: High MDE (20-30%). Revenue impact per visitor is low.
- Checkout flow: Low MDE (10-15%). Every percentage point matters.
- Homepage hero: Medium MDE (15-20%). Important but often hard to move.
Ignoring MDE when interpreting results
If your test ends with a measured effect of +3% but your MDE was 15%, the test was not designed to detect a 3% effect. The result is inconclusive, not negative. You can't conclude the change doesn't work; you can reasonably rule out effects of 15% or larger, but anything smaller is simply below the test's resolution.
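One way to see why: compute the power the test actually had against a 3% effect. The sketch below uses the normal approximation with illustrative numbers of my choosing (a 10% baseline and roughly the per-variation sample size a 15% MDE calls for), so treat the outputs as approximate:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(baseline: float, true_relative_effect: float,
          n_per_variation: int, z_alpha: float = 1.96) -> float:
    """Approximate power of a two-sided two-proportion z-test against a
    given true effect (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + true_relative_effect)
    se = sqrt(p1 * (1 - p1) / n_per_variation + p2 * (1 - p2) / n_per_variation)
    z = abs(p2 - p1) / se
    return norm_cdf(z - z_alpha) + norm_cdf(-z - z_alpha)

# A test sized for a 15% MDE at a 10% baseline (~7,000 per variation)
# has roughly 80% power against a 15% effect, but very little against 3%:
print(round(power(0.10, 0.15, 7000), 2))  # ≈ 0.82
print(round(power(0.10, 0.03, 7000), 2))  # ≈ 0.09
```

In other words, a test like this would miss a real 3% effect about nine times out of ten, which is exactly why the result is inconclusive rather than negative.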
Calculate Your MDE
Use the AB Test Plan calculator to see how different MDE choices affect your test duration and projected impact. Plug in your baseline rate and daily traffic, then adjust the MDE slider to find the right balance for your situation.