Statistics · 5 min read

How to Calculate Sample Size for an A/B Test

Learn the exact formula and inputs needed to calculate the right sample size for your A/B test. Covers baseline rate, MDE, significance, power, and common mistakes.

By AB Test Plan

Getting your sample size right is the single most important statistical decision in any A/B test. Too small and you'll miss real effects. Too large and you're wasting time and traffic. This guide breaks down exactly how to calculate it.

The Sample Size Formula

The sample size per variation for a two-proportion z-test is:

n = (Zα/2 + Zβ)² × (p₁(1-p₁) + p₂(1-p₂)) / (p₂ - p₁)²

Where:

  • p₁ = baseline conversion rate (your current rate)
  • p₂ = expected conversion rate after the change
  • Zα/2 = z-score for your significance level (1.96 for 95%)
  • Zβ = z-score for your statistical power (0.84 for 80%)

Don't worry about memorizing this — tools like AB Test Plan calculate it automatically. But understanding the inputs matters.
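If you'd rather compute it directly, here is a minimal Python sketch of the formula above using only the standard library. The function name, signature, and defaults are ours for illustration, not any particular tool's API:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(p1, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variation for a two-proportion z-test."""
    p2 = p1 * (1 + relative_mde)                   # expected rate after the change
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 when power = 0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Example: 5% baseline, 10% relative MDE (5.0% -> 5.5%)
print(sample_size_per_variation(0.05, 0.10))       # ~31,000 per variation
```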

The Four Inputs You Need

1. Baseline Conversion Rate

This is your current conversion rate — the metric you're trying to improve. Pull this from your analytics for the last 2-4 weeks.

Common baselines by industry:

  • E-commerce purchase rate: 1-4%
  • SaaS free trial signup: 2-8%
  • Email opt-in: 5-15%
  • Landing page CTA click: 10-30%

A higher baseline rate means you need fewer visitors to detect a given relative change: shifts in a 20% conversion rate are easier to detect than shifts in a 1% rate.

2. Minimum Detectable Effect (MDE)

The MDE is the smallest improvement you want your test to reliably detect. It's expressed as a relative percentage change.

For example: if your baseline is 5% and your MDE is 10%, you're looking to detect a shift from 5.0% to 5.5% (an absolute lift of 0.5 percentage points).

How to choose your MDE:

  • 5-10% MDE: You need to detect small changes. Requires large sample sizes.
  • 10-20% MDE: The sweet spot for most teams. Balances sensitivity with practical test duration.
  • 20-50% MDE: You're only interested in big wins. Faster tests, but you'll miss smaller improvements.

The smaller your MDE, the larger your required sample size, and the relationship is quadratic: halving your MDE roughly quadruples the required traffic, so a 5% MDE needs on the order of 16x more visitors than a 20% MDE.
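You can see the inverse-square scaling by sweeping the MDE with the sample_size_per_variation sketch from earlier (assuming a 3% baseline):

```python
# Reusing sample_size_per_variation() from the earlier sketch, 3% baseline:
for mde in (0.05, 0.10, 0.20):
    n = sample_size_per_variation(0.03, mde)
    print(f"{mde:.0%} MDE -> {n:,} visitors per variation")
# Each halving of the MDE roughly quadruples the required sample.
```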

3. Statistical Significance Level (α)

This controls your false positive rate — the probability of declaring a winner when there's actually no difference. The industry standard is 95% significance (α = 0.05).

| Significance Level | False Positive Risk | Use Case |
| --- | --- | --- |
| 90% | 10% | Exploratory tests, low-stakes changes |
| 95% | 5% | Industry standard for most A/B tests |
| 99% | 1% | High-stakes changes (pricing, checkout) |

Higher significance = larger sample size needed. Going from 95% to 99% increases your required sample by roughly 50%.

4. Statistical Power (1 - β)

Power is the probability that your test correctly detects a real effect when one exists. The standard is 80% power (β = 0.20).

  • 80% power: Industry standard. 20% chance of missing a real effect.
  • 90% power: More conservative. Increases sample size by about 30%.
  • 95% power: Very conservative. Rarely used in CRO — the extra traffic cost usually isn't worth it.
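Both of these knobs enter the formula only through the (Zα/2 + Zβ)² term, so the multipliers above are easy to verify. A quick check, again using only the standard library:

```python
from statistics import NormalDist

def z_term(alpha, power):
    """The (Z_alpha/2 + Z_beta)^2 factor from the sample size formula."""
    inv = NormalDist().inv_cdf
    return (inv(1 - alpha / 2) + inv(power)) ** 2

base = z_term(0.05, 0.80)            # 95% significance, 80% power
print(z_term(0.01, 0.80) / base)     # 99% significance: ~1.49x the traffic
print(z_term(0.05, 0.90) / base)     # 90% power: ~1.34x the traffic
```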

Worked Example

Let's say you run an e-commerce site:

  • Baseline conversion rate: 3%
  • MDE: 15% relative (detecting a shift from 3.0% to 3.45%)
  • Significance: 95%
  • Power: 80%

Result: ~21,500 visitors per variation

With a control and one variant, you need ~43,000 total visitors. At 2,000 daily visitors, that's about 22 days of testing (roughly three weeks).
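As a sanity check, you can plug these inputs into the sample_size_per_variation sketch from earlier. The raw formula lands slightly above the figure here; dedicated calculators differ by roughly 10% depending on how they approximate the variance term:

```python
# Reusing sample_size_per_variation() from the earlier sketch:
print(sample_size_per_variation(p1=0.03, relative_mde=0.15))
# ~24,200 from the raw formula; calculators that approximate the
# variance term differently can land somewhat lower (e.g. ~21,500).
```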

Common Mistakes

1. Using absolute instead of relative MDE

A "10% MDE" on a 3% baseline means detecting a change from 3.0% to 3.3%, NOT from 3% to 13%. This confusion leads to massively undersized tests.

2. Peeking at results early

Every time you check results before reaching your sample size, you inflate your false positive rate. If you check daily on a 14-day test, your actual significance may drop from 95% to ~75%. Commit to your sample size upfront.
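You can watch the inflation happen with a small Monte Carlo sketch: simulate A/A tests (no real difference), peek daily with a two-proportion z-test, and count how often any peek looks significant. The setup below is illustrative, not a model of any specific tool:

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)     # 1.96 for two-sided 95%

def peeking_false_positive_rate(rate=0.05, daily_n=200, days=14, trials=1000):
    """Share of A/A tests (no real difference) that look significant
    on at least one of `days` daily peeks."""
    hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        for _ in range(days):
            conv_a += sum(random.random() < rate for _ in range(daily_n))
            conv_b += sum(random.random() < rate for _ in range(daily_n))
            n += daily_n                 # visitors per arm so far
            pa, pb = conv_a / n, conv_b / n
            pooled = (conv_a + conv_b) / (2 * n)
            se = sqrt(pooled * (1 - pooled) * 2 / n)
            if se > 0 and abs(pa - pb) / se > Z_CRIT:
                hits += 1                # a peek "won" under the null
                break
    return hits / trials

print(peeking_false_positive_rate())     # well above 0.05, often ~0.2
```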

3. Ending the test when it first hits significance

Statistical significance can fluctuate. A test might show p < 0.05 on day 3, lose it on day 5, and regain it on day 10. Always run to your pre-calculated sample size.

4. Ignoring business cycles

If your traffic or conversion rate varies by day of week, your test should run for complete weeks. A Monday-to-Friday test will produce different results than a full-week test for most consumer businesses.
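One mechanical way to handle this is to round your planned duration up to whole weeks. A tiny sketch using the worked example's numbers:

```python
import math

total_needed = 43_000     # both variations, from the worked example
daily_visitors = 2_000

days = math.ceil(total_needed / daily_visitors)   # 22 days
full_weeks = math.ceil(days / 7)                  # round up, never down
print(f"Run for {full_weeks * 7} days ({full_weeks} full weeks)")
```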

5. Not accounting for multiple variants

If you're testing 3 variants against a control, you need more traffic than a simple A/B test. Apply a Bonferroni correction or increase your sample size proportionally.
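A rough sketch of the Bonferroni route, reusing sample_size_per_variation from earlier: divide α by the number of comparisons against the control and recompute.

```python
k = 3                        # three variants, each compared to the control
alpha_adj = 0.05 / k         # Bonferroni-adjusted alpha, ~0.0167

n_single = sample_size_per_variation(0.03, 0.15)
n_multi = sample_size_per_variation(0.03, 0.15, alpha=alpha_adj)
print(n_single, n_multi)     # the corrected test needs ~33% more per arm
```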

Quick Reference Table

| Baseline Rate | MDE 10% | MDE 15% | MDE 20% |
| --- | --- | --- | --- |
| 1% | 147,900 | 66,100 | 37,400 |
| 3% | 48,100 | 21,500 | 12,200 |
| 5% | 28,100 | 12,600 | 7,100 |
| 10% | 13,100 | 5,900 | 3,400 |
| 20% | 5,800 | 2,600 | 1,500 |

Per variation, 95% significance, 80% power

When You Don't Have Enough Traffic

If your calculated test duration exceeds 4-6 weeks, consider:

  1. Increase your MDE — focus on bigger swings only
  2. Test higher-funnel metrics — clicks have higher volume than purchases
  3. Combine similar pages — pool traffic across product pages
  4. Use sequential testing methods — these allow valid early stopping
  5. Run fewer variants — each additional variant dilutes your traffic

Next Step

Use the AB Test Plan sample size calculator to plug in your numbers and get an instant answer — including projected test duration and revenue impact.

Tags: sample size · statistical significance · power analysis · A/B testing math

Ready to plan your next A/B test?

Use AI to generate experiment ideas, build hypotheses, and calculate sample sizes.

Start Planning — Free