Statistics · 5 min read

How to Calculate Sample Size for an A/B Test

Learn the exact formula and inputs needed to calculate the right sample size for your A/B test. Covers baseline rate, MDE, significance, power, and common mistakes.

By AB Test Plan

Getting your sample size right is the single most important statistical decision in any A/B test. Too small and you'll miss real effects. Too large and you're wasting time and traffic. This guide breaks down exactly how to calculate it.

The Sample Size Formula

The sample size per variation for a two-proportion z-test is:

n = (Zα/2 + Zβ)² × (p₁(1-p₁) + p₂(1-p₂)) / (p₂ - p₁)²

Where:

  • p₁ = baseline conversion rate (your current rate)
  • p₂ = expected conversion rate after the change
  • Zα/2 = z-score for your significance level (1.96 for 95%)
  • Zβ = z-score for your statistical power (0.84 for 80%)

Don't worry about memorizing this — tools like AB Test Plan calculate it automatically. But understanding the inputs matters.
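If you'd rather compute it directly, here is a minimal Python sketch of the formula above using only the standard library. The function name, signature, and defaults are ours for illustration, not any particular tool's API:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(p1, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variation for a two-proportion z-test."""
    p2 = p1 * (1 + relative_mde)                   # expected rate after the change
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 when power = 0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Example: 5% baseline, 10% relative MDE (5.0% -> 5.5%)
print(sample_size_per_variation(0.05, 0.10))       # ~31,000 per variation
```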

The Four Inputs You Need

1. Baseline Conversion Rate

This is your current conversion rate — the metric you're trying to improve. Pull this from your analytics for the last 2-4 weeks.

Common baselines by industry:

  • E-commerce purchase rate: 1-4%
  • SaaS free trial signup: 2-8%
  • Email opt-in: 5-15%
  • Landing page CTA click: 10-30%

A higher baseline rate means you need fewer visitors to detect a given relative change: shifts in a 20% conversion rate are easier to detect than shifts in a 1% rate.

2. Minimum Detectable Effect (MDE)

The MDE is the smallest improvement you want your test to reliably detect. It's expressed as a relative percentage change.

For example: if your baseline is 5% and your MDE is 10%, you're looking to detect a shift from 5.0% to 5.5% (an absolute lift of 0.5 percentage points).

How to choose your MDE:

  • 5-10% MDE: You need to detect small changes. Requires large sample sizes.
  • 10-20% MDE: The sweet spot for most teams. Balances sensitivity with practical test duration.
  • 20-50% MDE: You're only interested in big wins. Faster tests, but you'll miss smaller improvements.

The smaller your MDE, the larger your required sample size, and the relationship is quadratic: halving your MDE roughly quadruples the required traffic, so a 5% MDE needs on the order of 16x more visitors than a 20% MDE.
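You can see the inverse-square scaling by sweeping the MDE with the sample_size_per_variation sketch from earlier (assuming a 3% baseline):

```python
# Reusing sample_size_per_variation() from the earlier sketch, 3% baseline:
for mde in (0.05, 0.10, 0.20):
    n = sample_size_per_variation(0.03, mde)
    print(f"{mde:.0%} MDE -> {n:,} visitors per variation")
# Each halving of the MDE roughly quadruples the required sample.
```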

3. Statistical Significance Level (α)

This controls your false positive rate — the probability of declaring a winner when there's actually no difference. The industry standard is 95% significance (α = 0.05).

| Significance Level | False Positive Risk | Use Case |
| --- | --- | --- |
| 90% | 10% | Exploratory tests, low-stakes changes |
| 95% | 5% | Industry standard for most A/B tests |
| 99% | 1% | High-stakes changes (pricing, checkout) |

Higher significance = larger sample size needed. Going from 95% to 99% increases your required sample by roughly 50%.

4. Statistical Power (1 - β)

Power is the probability that your test correctly detects a real effect when one exists. The standard is 80% power (β = 0.20).

  • 80% power: Industry standard. 20% chance of missing a real effect.
  • 90% power: More conservative. Increases sample size by about 30%.
  • 95% power: Very conservative. Rarely used in CRO — the extra traffic cost usually isn't worth it.
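Both of these knobs enter the formula only through the (Zα/2 + Zβ)² term, so the multipliers above are easy to verify. A quick check, again using only the standard library:

```python
from statistics import NormalDist

def z_term(alpha, power):
    """The (Z_alpha/2 + Z_beta)^2 factor from the sample size formula."""
    inv = NormalDist().inv_cdf
    return (inv(1 - alpha / 2) + inv(power)) ** 2

base = z_term(0.05, 0.80)            # 95% significance, 80% power
print(z_term(0.01, 0.80) / base)     # 99% significance: ~1.49x the traffic
print(z_term(0.05, 0.90) / base)     # 90% power: ~1.34x the traffic
```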

Worked Example

Let's say you run an e-commerce site:

  • Baseline conversion rate: 3%
  • MDE: 15% relative (detecting a shift from 3.0% to 3.45%)
  • Significance: 95%
  • Power: 80%

Result: ~21,500 visitors per variation

With a control and one variant, you need ~43,000 total visitors. At 2,000 daily visitors, that's about 22 days of testing (roughly three weeks).
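As a sanity check, you can plug these inputs into the sample_size_per_variation sketch from earlier. The raw formula lands slightly above the figure here; dedicated calculators differ by roughly 10% depending on how they approximate the variance term:

```python
# Reusing sample_size_per_variation() from the earlier sketch:
print(sample_size_per_variation(p1=0.03, relative_mde=0.15))
# ~24,200 from the raw formula; calculators that approximate the
# variance term differently can land somewhat lower (e.g. ~21,500).
```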

Common Mistakes

1. Using absolute instead of relative MDE

A "10% MDE" on a 3% baseline means detecting a change from 3.0% to 3.3%, NOT from 3% to 13%. This confusion leads to massively undersized tests.

2. Peeking at results early

Every time you check results before reaching your sample size, you inflate your false positive rate. If you check daily on a 14-day test, your actual significance may drop from 95% to ~75%. Commit to your sample size upfront.
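You can watch the inflation happen with a small Monte Carlo sketch: simulate A/A tests (no real difference), peek daily with a two-proportion z-test, and count how often any peek looks significant. The setup below is illustrative, not a model of any specific tool:

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)     # 1.96 for two-sided 95%

def peeking_false_positive_rate(rate=0.05, daily_n=200, days=14, trials=1000):
    """Share of A/A tests (no real difference) that look significant
    on at least one of `days` daily peeks."""
    hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        for _ in range(days):
            conv_a += sum(random.random() < rate for _ in range(daily_n))
            conv_b += sum(random.random() < rate for _ in range(daily_n))
            n += daily_n                 # visitors per arm so far
            pa, pb = conv_a / n, conv_b / n
            pooled = (conv_a + conv_b) / (2 * n)
            se = sqrt(pooled * (1 - pooled) * 2 / n)
            if se > 0 and abs(pa - pb) / se > Z_CRIT:
                hits += 1                # a peek "won" under the null
                break
    return hits / trials

print(peeking_false_positive_rate())     # well above 0.05, often ~0.2
```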

3. Ending the test when it first hits significance

Statistical significance can fluctuate. A test might show p < 0.05 on day 3, lose it on day 5, and regain it on day 10. Always run to your pre-calculated sample size.

4. Ignoring business cycles

If your traffic or conversion rate varies by day of week, your test should run for complete weeks. A Monday-to-Friday test will produce different results than a full-week test for most consumer businesses.
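One mechanical way to handle this is to round your planned duration up to whole weeks. A tiny sketch using the worked example's numbers:

```python
import math

total_needed = 43_000     # both variations, from the worked example
daily_visitors = 2_000

days = math.ceil(total_needed / daily_visitors)   # 22 days
full_weeks = math.ceil(days / 7)                  # round up, never down
print(f"Run for {full_weeks * 7} days ({full_weeks} full weeks)")
```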

5. Not accounting for multiple variants

If you're testing 3 variants against a control, you need more traffic than a simple A/B test. Apply a Bonferroni correction or increase your sample size proportionally.
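A rough sketch of the Bonferroni route, reusing sample_size_per_variation from earlier: divide α by the number of comparisons against the control and recompute.

```python
k = 3                        # three variants, each compared to the control
alpha_adj = 0.05 / k         # Bonferroni-adjusted alpha, ~0.0167

n_single = sample_size_per_variation(0.03, 0.15)
n_multi = sample_size_per_variation(0.03, 0.15, alpha=alpha_adj)
print(n_single, n_multi)     # the corrected test needs ~33% more per arm
```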

Quick Reference Table

| Baseline Rate | MDE 10% | MDE 15% | MDE 20% |
| --- | --- | --- | --- |
| 1% | 147,900 | 66,100 | 37,400 |
| 3% | 48,100 | 21,500 | 12,200 |
| 5% | 28,100 | 12,600 | 7,100 |
| 10% | 13,100 | 5,900 | 3,400 |
| 20% | 5,800 | 2,600 | 1,500 |

Per variation, 95% significance, 80% power

When You Don't Have Enough Traffic

If your calculated test duration exceeds 4-6 weeks, consider:

  1. Increase your MDE — focus on bigger swings only
  2. Test higher-funnel metrics — clicks have higher volume than purchases
  3. Combine similar pages — pool traffic across product pages
  4. Use sequential testing methods — these allow valid early stopping
  5. Run fewer variants — each additional variant dilutes your traffic

Next Step

Use the AB Test Plan sample size calculator to plug in your numbers and get an instant answer — including projected test duration and revenue impact.

Tags: sample size · statistical significance · power analysis · A/B testing math

Ready to plan your next A/B test?

Use AI to generate experiment ideas, build hypotheses, and calculate sample sizes.

Start Planning — Free