ICE Scoring Framework: How to Prioritize A/B Test Ideas
Learn how to use the ICE scoring framework (Impact, Confidence, Ease) to prioritize your A/B test backlog. Includes scoring examples and common pitfalls.
You have 20 experiment ideas and bandwidth to run 3 this quarter. How do you decide which ones? The ICE scoring framework gives you a simple, repeatable way to rank your A/B test ideas by expected value.
What Is ICE Scoring?
ICE stands for Impact, Confidence, and Ease. You score each experiment idea on these three dimensions from 1 to 10, then average or multiply them to get a total score.
- Impact: How much will this move your target metric if it works?
- Confidence: How certain are you that this will produce a measurable result?
- Ease: How simple is this to implement and launch?
The framework was popularized by Sean Ellis (GrowthHackers) and is now standard practice on growth teams everywhere, from Reforge alumni to Y Combinator startups.
How to Score Each Dimension
Impact (1-10)
Impact measures the potential magnitude of the change on your primary metric (conversion rate, revenue, signups, etc.).
| Score | Meaning | Example |
|---|---|---|
| 1-3 | Minor improvement | Changing button color |
| 4-6 | Moderate improvement | Rewriting hero copy |
| 7-8 | Significant improvement | Adding social proof to checkout |
| 9-10 | Transformative | Redesigning the entire pricing page |
Tip: Anchor your scores to past experiments. If your best-ever test lifted conversions by 30%, that's your 10. Scale everything else relative to that.
Confidence (1-10)
Confidence reflects how sure you are that this experiment will produce the predicted result. This is NOT about whether it will "win" — it's about whether you'll see a measurable effect at all.
| Score | Meaning | Evidence |
|---|---|---|
| 1-3 | Gut feeling, no data | "I think users want this" |
| 4-6 | Some supporting evidence | Competitor analysis, user feedback |
| 7-8 | Strong evidence | Heatmaps, user research, analogous past tests |
| 9-10 | Near-certain | Failed past test you're fixing, broken UX element |
Tip: Ask yourself "what evidence do I have?" If the answer is "none," your confidence should be 3 or lower regardless of how excited you are about the idea.
Ease (1-10)
Ease captures how quickly and cheaply you can ship this experiment. Consider engineering time, design effort, QA complexity, and any cross-team dependencies.
| Score | Meaning | Timeline |
|---|---|---|
| 1-3 | Major effort, multiple teams | 2+ weeks of dev work |
| 4-6 | Moderate effort, single team | 3-5 days |
| 7-8 | Light effort | 1-2 days, mostly copy/config |
| 9-10 | Trivial | A few hours, no code changes |
Tip: Be honest about ease. Teams consistently underestimate implementation time. If you're unsure, score it lower.
Calculating the ICE Score
The simplest method is to average the three scores:
ICE = (Impact + Confidence + Ease) / 3
Some teams multiply instead:
ICE = Impact × Confidence × Ease
Multiplication penalizes low scores more harshly — an experiment with Impact=9, Confidence=2, Ease=8 gets 144/1000 instead of 6.3/10. This is useful when you want to aggressively filter out low-confidence ideas.
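As an illustration (not part of the framework itself), here's a minimal Python sketch of the two calculation methods, using the low-confidence example above:

```python
def ice_average(impact, confidence, ease):
    """Average method: the total stays on a familiar 1-10 scale."""
    return (impact + confidence + ease) / 3

def ice_product(impact, confidence, ease):
    """Multiplicative method: one low score drags the total down hard."""
    return impact * confidence * ease

# The low-confidence idea from the text: Impact=9, Confidence=2, Ease=8
print(round(ice_average(9, 2, 8), 1))  # 6.3 out of 10
print(ice_product(9, 2, 8))            # 144 out of 1000
```

Under averaging, the idea looks middling (6.3); under multiplication, its Confidence of 2 sinks it to 144 out of a possible 1,000, pushing it well down the backlog.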
Scored Example: E-Commerce Checkout
Imagine you're optimizing a checkout page with a 2.8% conversion rate. Here's how you might score five ideas:
| Experiment | Impact | Confidence | Ease | ICE (avg) |
|---|---|---|---|---|
| Add trust badges near payment form | 7 | 7 | 9 | 7.7 |
| Simplify to single-page checkout | 9 | 6 | 3 | 6.0 |
| Add "30-day money back" guarantee | 6 | 8 | 9 | 7.7 |
| Show "X people bought today" counter | 5 | 5 | 8 | 6.0 |
| Offer guest checkout option | 8 | 9 | 5 | 7.3 |
In this example, "trust badges" and "money-back guarantee" tie at the top. Both are high-confidence and easy to ship. The single-page checkout has the highest potential impact but scores lower because it's hard to build — you'd tackle that after running the quicker wins.
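In practice you'd keep these scores in a spreadsheet, but as a sketch, the ranking in the table above can be reproduced in a few lines of Python:

```python
# (experiment, impact, confidence, ease) — scores from the example table
ideas = [
    ("Add trust badges near payment form",   7, 7, 9),
    ("Simplify to single-page checkout",     9, 6, 3),
    ('Add "30-day money back" guarantee',    6, 8, 9),
    ('Show "X people bought today" counter', 5, 5, 8),
    ("Offer guest checkout option",          8, 9, 5),
]

# Average the three dimensions, then sort highest-first
ranked = sorted(
    ((name, round((i + c + e) / 3, 1)) for name, i, c, e in ideas),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{score:4}  {name}")
```

Running this prints the backlog in priority order, with the two 7.7-scoring ideas at the top.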
Common ICE Scoring Mistakes
1. Inflating confidence without evidence
The most common mistake. Teams score everything 7+ on confidence because they're enthusiastic. Force yourself to justify every confidence score above 5 with specific evidence.
2. Scoring in isolation
ICE scores are relative, not absolute. Score all your ideas in a single session so the scale stays consistent. An "8 Impact" should mean the same thing across your entire backlog.
3. Ignoring ease entirely
Some teams focus only on impact and confidence, then get surprised when high-scoring experiments take weeks to ship. Ease is what keeps your experimentation velocity high.
4. Never re-scoring
Your scores should change as you learn. After running an experiment in a related area, update the confidence scores of similar ideas. After a reorg, update ease scores for ideas that now require different teams.
5. Using ICE as the only input
ICE is a prioritization heuristic, not a decision algorithm. It should inform your roadmap alongside strategic priorities, resource constraints, and sequential dependencies between tests.
ICE vs. Other Frameworks
| Framework | Best For | Weakness |
|---|---|---|
| ICE | Quick scoring of large backlogs | Subjective, no weighting |
| RICE | Teams that want reach/volume factored in | More complex, requires traffic data |
| PIE | CRO-specific prioritization | Less widely adopted |
| MoSCoW | Feature prioritization | Binary, no nuance |
ICE wins on simplicity. You can score 30 ideas in 20 minutes. RICE is better when you have precise traffic data and need to factor in reach (how many users see the change).
How to Run an ICE Scoring Session
1. List all experiment ideas in a spreadsheet or tool
2. Score Impact first across all ideas (calibrates your scale)
3. Score Confidence second (forces you to check evidence)
4. Score Ease last (most objective dimension)
5. Sort by total ICE score, descending
6. Sanity check the top 5 — does this ranking feel right?
7. Pick your next 2-3 experiments from the top of the list
Automate It
AB Test Plan generates experiment ideas pre-scored with ICE using AI. Describe your product and goals, and get a ranked backlog in seconds — complete with the behavioral framework each idea leverages.