ICE Scoring Framework: How to Prioritize A/B Test Ideas
Learn how to use the ICE scoring framework (Impact, Confidence, Ease) to prioritize your A/B test backlog. Includes scoring examples and common pitfalls.
You have 20 experiment ideas and bandwidth to run 3 this quarter. How do you decide which ones? The ICE scoring framework gives you a simple, repeatable way to rank your A/B test ideas by expected value.
What Is ICE Scoring?
ICE stands for Impact, Confidence, and Ease. You score each experiment idea on these three dimensions from 1 to 10, then average or multiply them to get a total score.
- Impact: How much will this move your target metric if it works?
- Confidence: How certain are you that this will produce a measurable result?
- Ease: How simple is this to implement and launch?
The framework was popularized by Sean Ellis (GrowthHackers) and is now standard practice on growth teams everywhere, from Reforge alumni to Y Combinator startups.
How to Score Each Dimension
Impact (1-10)
Impact measures the potential magnitude of the change on your primary metric (conversion rate, revenue, signups, etc.).
| Score | Meaning | Example |
|---|---|---|
| 1-3 | Minor improvement | Changing button color |
| 4-6 | Moderate improvement | Rewriting hero copy |
| 7-8 | Significant improvement | Adding social proof to checkout |
| 9-10 | Transformative | Redesigning the entire pricing page |
Tip: Anchor your scores to past experiments. If your best-ever test lifted conversions by 30%, that's your 10. Scale everything else relative to that.
Confidence (1-10)
Confidence reflects how sure you are that this experiment will produce the predicted result. This is NOT about whether it will "win" — it's about whether you'll see a measurable effect at all.
| Score | Meaning | Evidence |
|---|---|---|
| 1-3 | Gut feeling, no data | "I think users want this" |
| 4-6 | Some supporting evidence | Competitor analysis, user feedback |
| 7-8 | Strong evidence | Heatmaps, user research, analogous past tests |
| 9-10 | Near-certain | Failed past test you're fixing, broken UX element |
Tip: Ask yourself "what evidence do I have?" If the answer is "none," your confidence should be 3 or lower regardless of how excited you are about the idea.
Ease (1-10)
Ease captures how quickly and cheaply you can ship this experiment. Consider engineering time, design effort, QA complexity, and any cross-team dependencies.
| Score | Meaning | Timeline |
|---|---|---|
| 1-3 | Major effort, multiple teams | 2+ weeks of dev work |
| 4-6 | Moderate effort, single team | 3-5 days |
| 7-8 | Light effort | 1-2 days, mostly copy/config |
| 9-10 | Trivial | A few hours, no code changes |
Tip: Be honest about ease. Teams consistently underestimate implementation time. If you're unsure, score it lower.
Calculating the ICE Score
The simplest method is to average the three scores:
ICE = (Impact + Confidence + Ease) / 3
Some teams multiply instead:
ICE = Impact × Confidence × Ease
Multiplication penalizes low scores more harshly — an experiment with Impact=9, Confidence=2, Ease=8 gets 144/1000 instead of 6.3/10. This is useful when you want to aggressively filter out low-confidence ideas.
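As an illustration (not part of the framework itself), here's a minimal Python sketch of the two calculation methods, using the low-confidence example above:

```python
def ice_average(impact, confidence, ease):
    """Average method: the total stays on a familiar 1-10 scale."""
    return (impact + confidence + ease) / 3

def ice_product(impact, confidence, ease):
    """Multiplicative method: one low score drags the total down hard."""
    return impact * confidence * ease

# The low-confidence idea from the text: Impact=9, Confidence=2, Ease=8
print(round(ice_average(9, 2, 8), 1))  # 6.3 out of 10
print(ice_product(9, 2, 8))            # 144 out of 1000
```

Under averaging, the idea looks middling (6.3); under multiplication, its Confidence of 2 sinks it to 144 out of a possible 1,000, pushing it well down the backlog.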
Scored Example: E-Commerce Checkout
Imagine you're optimizing a checkout page with a 2.8% conversion rate. Here's how you might score five ideas:
| Experiment | Impact | Confidence | Ease | ICE (avg) |
|---|---|---|---|---|
| Add trust badges near payment form | 7 | 7 | 9 | 7.7 |
| Simplify to single-page checkout | 9 | 6 | 3 | 6.0 |
| Add "30-day money back" guarantee | 6 | 8 | 9 | 7.7 |
| Show "X people bought today" counter | 5 | 5 | 8 | 6.0 |
| Offer guest checkout option | 8 | 9 | 5 | 7.3 |
In this example, "trust badges" and "money-back guarantee" tie at the top. Both are high-confidence and easy to ship. The single-page checkout has the highest potential impact but scores lower because it's hard to build — you'd tackle that after running the quicker wins.
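In practice you'd keep these scores in a spreadsheet, but as a sketch, the ranking in the table above can be reproduced in a few lines of Python:

```python
# (experiment, impact, confidence, ease) — scores from the example table
ideas = [
    ("Add trust badges near payment form",   7, 7, 9),
    ("Simplify to single-page checkout",     9, 6, 3),
    ('Add "30-day money back" guarantee',    6, 8, 9),
    ('Show "X people bought today" counter', 5, 5, 8),
    ("Offer guest checkout option",          8, 9, 5),
]

# Average the three dimensions, then sort highest-first
ranked = sorted(
    ((name, round((i + c + e) / 3, 1)) for name, i, c, e in ideas),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{score:4}  {name}")
```

Running this prints the backlog in priority order, with the two 7.7-scoring ideas at the top.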
Common ICE Scoring Mistakes
1. Inflating confidence without evidence
The most common mistake. Teams score everything 7+ on confidence because they're enthusiastic. Force yourself to justify every confidence score above 5 with specific evidence.
2. Scoring in isolation
ICE scores are relative, not absolute. Score all your ideas in a single session so the scale stays consistent. An "8 Impact" should mean the same thing across your entire backlog.
3. Ignoring ease entirely
Some teams focus only on impact and confidence, then get surprised when high-scoring experiments take weeks to ship. Ease is what keeps your experimentation velocity high.
4. Never re-scoring
Your scores should change as you learn. After running an experiment in a related area, update the confidence scores of similar ideas. After a reorg, update ease scores for ideas that now require different teams.
5. Using ICE as the only input
ICE is a prioritization heuristic, not a decision algorithm. It should inform your roadmap alongside strategic priorities, resource constraints, and sequential dependencies between tests.
ICE vs. Other Frameworks
| Framework | Best For | Weakness |
|---|---|---|
| ICE | Quick scoring of large backlogs | Subjective, no weighting |
| RICE | Teams that want reach/volume factored in | More complex, requires traffic data |
| PIE | CRO-specific prioritization | Less widely adopted |
| MoSCoW | Feature prioritization | Binary, no nuance |
ICE wins on simplicity. You can score 30 ideas in 20 minutes. RICE is better when you have precise traffic data and need to factor in reach (how many users see the change).
How to Run an ICE Scoring Session
1. List all experiment ideas in a spreadsheet or tool
2. Score Impact first across all ideas (calibrates your scale)
3. Score Confidence second (forces you to check evidence)
4. Score Ease last (most objective dimension)
5. Sort by total ICE score, descending
6. Sanity check the top 5 — does this ranking feel right?
7. Pick your next 2-3 experiments from the top of the list
Automate It
AB Test Plan generates experiment ideas pre-scored with ICE using AI. Describe your product and goals, and get a ranked backlog in seconds — complete with the behavioral framework each idea leverages.