A/B testing is the gold standard for understanding causation, not just correlation. Done right, it’s the engine behind compounding growth at the world’s best product companies.

What Is an A/B Test?

An A/B test (also called a split test or controlled experiment) is a randomised experiment where you expose two groups of users to different versions of an experience — a control (A) and a variant (B) — and measure which performs better on a predefined metric.
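One common way to get a stable random split is to hash each user ID into a bucket, so the same user always sees the same version. The sketch below is a minimal illustration under that assumption; the function name `assign_variant`, the experiment key `pricing-page-v2`, and the 50/50 split are all hypothetical choices, not something prescribed here.

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically map a user to 'A' or 'B' with a roughly 50/50 split."""
    # Salting with the experiment name keeps assignments independent across tests.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in 0..99
    return "A" if bucket < 50 else "B"

# Hypothetical user IDs, used only to check that the split is close to even.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "pricing-page-v2")] += 1
print(counts)
```

Because the assignment depends only on the user ID and the experiment key, a returning user lands in the same group on every visit without any stored state.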

The Five Steps of a Good Experiment

1. Define Your Hypothesis

A good hypothesis has three parts: observation, proposed change, and expected outcome.

Example: “Because our pricing page has too much text, simplifying it to three bullet points per tier will increase free-to-paid conversion by 15%.”

2. Choose Your Primary Metric

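Pick one metric that your test is designed to move. Every additional primary metric is another chance for random noise to cross the significance threshold, so having multiple primary metrics increases your false positive rate. Secondary metrics are fine to monitor, but shouldn’t drive the decision.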
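To see how quickly the false positive rate grows, here is a small sketch that computes the chance of at least one false positive across several metrics. It assumes the metrics are statistically independent and each is tested at a 5% significance level; real metrics are usually correlated, so treat the numbers as a rough upper guide rather than an exact figure.

```python
def family_wise_error_rate(alpha: float, num_metrics: int) -> float:
    """Chance of at least one false positive across independent metrics."""
    return 1 - (1 - alpha) ** num_metrics

for k in (1, 3, 5, 10):
    print(f"{k} metrics: {family_wise_error_rate(0.05, k):.1%}")
# 1 metrics: 5.0%
# 3 metrics: 14.3%
# 5 metrics: 22.6%
# 10 metrics: 40.1%
```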

3. Calculate Sample Size

Before running your test, calculate how many users you need per variant to detect your minimum detectable effect (MDE) at your desired statistical power (typically 80%) and significance level (typically 5%, which corresponds to 95% confidence).

Stopping a test and calling a winner before it reaches the planned sample size is the single most common mistake teams make.
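As a rough illustration, the sketch below uses the standard normal-approximation formula for comparing two conversion rates. The 4% baseline conversion rate is a hypothetical input; the 15% relative lift echoes the example hypothesis above. Dedicated calculators may apply slightly different formulas, so expect small differences.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant for a two-sided test of two proportions."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # rate we hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Hypothetical 4% baseline, 15% relative MDE: roughly 18,000 users per variant.
print(sample_size_per_variant(0.04, 0.15))
# Halving the MDE roughly quadruples the required sample.
print(sample_size_per_variant(0.04, 0.075))
```

The second call shows why the MDE matters so much: the required sample grows with the inverse square of the effect you want to detect.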

4. Run the Test

Expose users to variants randomly. Do not act on results until you’ve hit your sample size. Checking early and stopping as soon as the difference looks significant is called “peeking”, and it inflates false positive rates significantly.
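A small simulation can make the cost of peeking concrete. The sketch below runs many A/A tests, where both groups share the same true conversion rate, so every "significant" result is a false positive. It compares checking at ten interim looks against checking once at the end. All parameters (10% conversion rate, 1,000 users per arm, ten looks) are hypothetical, and the exact rates will vary with the seed.

```python
import math
import random

def z_stat(conv_a: int, conv_b: int, n: int) -> float:
    """Two-proportion z statistic for two groups of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / (n * se)

def simulate(num_tests=1000, n_per_arm=1000, looks=10, rate=0.10, seed=7):
    """Return (false positive rate with peeking, false positive rate at final look)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_fp = final_fp = 0
    for _ in range(num_tests):
        conv_a = conv_b = 0
        flagged_early = False
        for look in range(1, looks + 1):
            # Both arms draw from the same rate, so there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = abs(z_stat(conv_a, conv_b, look * step)) > 1.96
            if significant:
                flagged_early = True  # a peeker would have stopped here
        peeking_fp += flagged_early
        final_fp += significant       # outcome at the last look only
    return peeking_fp / num_tests, final_fp / num_tests

peeking_rate, final_rate = simulate()
print(f"Stop at first significant look: {peeking_rate:.1%} false positives")
print(f"Single look at the end:         {final_rate:.1%} false positives")
```

The single final look should land near the nominal 5%, while stopping at the first significant interim look produces a noticeably higher rate.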

5. Interpret Results Correctly

Statistical significance does not mean practical significance. A result can be statistically significant (p < 0.05) but so small it doesn’t matter for your business. Always look at the effect size alongside the p-value.
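The sketch below shows one way to report effect size next to the p-value, using a two-proportion z-test and a confidence interval on the difference. The counts are made up to illustrate the point: with a million users per variant, a lift from 10.0% to 10.1% clears p < 0.05 even though the relative lift is only about 1%.

```python
import math
from statistics import NormalDist

def compare_conversion(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for two proportions, with lift and confidence interval."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error for the test (computed as if there were no difference).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))
    # Unpooled standard error for the interval around the observed difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se
    return {
        "absolute_lift": diff,
        "relative_lift": diff / p_a,
        "p_value": p_value,
        "ci": (diff - margin, diff + margin),
    }

# Hypothetical counts: 100,000 vs 101,000 conversions out of 1,000,000 users each.
result = compare_conversion(100_000, 1_000_000, 101_000, 1_000_000)
print(f"p-value:       {result['p_value']:.4f}")
print(f"relative lift: {result['relative_lift']:.2%}")
print(f"95% CI (abs):  {result['ci'][0]:.4%} to {result['ci'][1]:.4%}")
```

Whether a lift of that size is worth shipping depends on the cost of the change, which is a business judgment the p-value cannot make.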

Common Mistakes to Avoid

Bayesian vs. Frequentist Testing

Most teams use frequentist testing (p-values, confidence intervals). Bayesian testing is increasingly popular because it lets you incorporate prior knowledge and gives more intuitive output (“there’s a 94% probability that B beats A”).
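To show where a statement like that comes from, here is a minimal sketch of one common Bayesian approach: a Beta-Binomial model with a uniform Beta(1, 1) prior, estimated by Monte Carlo sampling. The prior, the number of draws, and the conversion counts are all assumptions for illustration; a team with real prior knowledge would encode it in the prior parameters.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=0):
    """Estimate P(rate_B > rate_A) from Beta posteriors by sampling."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: 200/5,000 conversions for A, 235/5,000 for B.
probability = prob_b_beats_a(200, 5000, 235, 5000)
print(f"Probability that B beats A: {probability:.1%}")
```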

For most product teams, the choice of method matters less than the discipline of running tests correctly.