A/B Test Plan Template (Free, With Examples)
A good A/B test plan answers five questions before you launch: what you believe, how you’ll measure it, how much traffic you need, how long it runs, and what result would make you ship. Copy the template below, fill in the blanks, and you have a documented, reviewable experiment.
A/B TEST PLAN ================================= 1. EXPERIMENT NAME <short, descriptive name> 2. HYPOTHESIS (If / Then / Because) IF we <change> THEN <metric> will <increase/decrease> BECAUSE <user insight / reasoning> 3. PRIMARY METRIC <the one metric that decides the test> 4. GUARDRAIL METRICS <metrics that must NOT get worse, e.g. revenue, refunds, bounce> 5. TARGET AUDIENCE / SEGMENT <who sees this test> 6. VARIATIONS Control: <current experience> Variant: <proposed change> 7. SAMPLE SIZE & DURATION Baseline conversion rate: __% Minimum detectable effect (MDE): __% Significance: 95% Power: 80% Required visitors per variation: ____ Estimated duration: __ days (min 1-2 business cycles) 8. SUCCESS CRITERIA Ship the variant if <primary metric> improves by >= <MDE> at 95% significance with no guardrail regression. 9. RISKS & ROLLBACK <what could go wrong, how you'll roll back> 10. RESULT & DECISION (post-test) Outcome: <won / lost / inconclusive> Decision: <ship / iterate / kill> Learning: <what you now know>
What every A/B test plan needs
Skip any of these and the test gets hard to trust or impossible to act on:
| Section | Why it matters |
|---|---|
| Hypothesis | Forces a falsifiable prediction, not a vague “let’s try this.” |
| Primary metric | One metric decides the test — prevents cherry-picking afterward. |
| Guardrail metrics | Catches a “win” that quietly hurts revenue or retention. |
| Sample size & duration | Decided up front so you don’t stop early and fool yourself. |
| Success criteria | The ship/kill rule, agreed before you see results. |
Example: a filled-in plan
Here is the template applied to a checkout test:
- Name: Add trust badges to checkout
- Hypothesis: If we add payment-security trust badges near the pay button, then checkout completion will increase, because exit-survey data shows payment-safety concerns.
- Primary metric: Checkout completion rate
- Guardrails: Average order value, refund rate
- Baseline / MDE: 3% completion, detect a 10% relative lift
- Sample size: ~51,000 visitors per variation (use the sample size calculator)
- Success criteria: Ship if completion improves ≥10% at 95% significance with no guardrail regression.
How to fill each section well
Write the hypothesis in If/Then/Because form — the structure forces you to name a real user insight. Full walkthrough: how to write an A/B test hypothesis. For sample size and duration, don’t guess; the math depends on your baseline rate and the minimum detectable effect you care about, and tests should run at least one to two full business cycles. Prioritize which planned test to run first with ICE scoring.
Validate the plan before you run it
A plan tells you the test is well-designed — it doesn’t tell you the idea is good. Before committing weeks of traffic, run the variant through an AI prediction: synthetic personas evaluate control vs. variant and return a Run, Iterate, or Kill verdict in about a minute. Run the live test on the ideas that pass.
Build your plan automatically
AB Test Plan generates the hypothesis, metrics, and sample size for you — then predicts the outcome.
Start Free