Experiment Analysis Plan

A pre-registered document specifying the hypothesis, primary and secondary metrics, statistical methods, sample size, analysis timeline, and decision criteria for an experiment, written before the experiment launches to prevent post-hoc rationalization and p-hacking.

An experiment analysis plan (EAP) is the pre-commitment device that ensures experiment rigor by documenting all analysis decisions before any data is observed. By specifying the hypothesis, metrics, statistical methods, and decision criteria in advance, the EAP prevents the garden of forking paths problem where analysts make data-dependent choices that inflate false positive rates. For growth teams, EAPs are the single most important process improvement for experiment quality because they eliminate the most common source of invalid results: post-hoc analysis decisions that are unconsciously influenced by the desire to find a significant result.

A comprehensive EAP includes the following components:

1. Hypothesis: a clear, falsifiable statement of what change is expected and why.
2. Primary metric: the single metric that determines the ship/no-ship decision, with its baseline value and expected variance.
3. Secondary metrics: additional metrics that provide context and understanding but do not determine the decision.
4. Guardrail metrics: metrics that must not degrade, with specified thresholds.
5. Randomization design: the randomization unit, allocation ratio, and any stratification.
6. Sample size and duration: based on power analysis for the primary metric at the specified minimum detectable effect (MDE).
7. Statistical method: the test to be used (z-test, t-test, sequential, Bayesian), the significance level, and whether the test is one-sided or two-sided.
8. Multiple comparison strategy: how corrections will be applied for secondary metrics and subgroup analyses.
9. Pre-specified subgroups: any planned segment analyses with their rationale.
10. Decision criteria: what results would lead to shipping, iterating, or abandoning.
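These components map naturally onto a structured record. A minimal Python sketch, where all field names and defaults are illustrative rather than any standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentAnalysisPlan:
    """Minimal EAP record; field names and defaults are illustrative."""
    hypothesis: str                 # falsifiable statement of the expected change
    primary_metric: str             # the single ship/no-ship metric
    baseline: float                 # primary metric's current value
    mde: float                      # minimum detectable effect (relative lift)
    secondary_metrics: list = field(default_factory=list)
    guardrails: dict = field(default_factory=dict)  # metric -> max tolerated drop
    randomization_unit: str = "user"
    allocation: tuple = (0.5, 0.5)  # control/treatment split
    alpha: float = 0.05             # significance level
    power: float = 0.80
    two_sided: bool = True
    correction: str = "Benjamini-Hochberg"  # multiple-comparison strategy
    subgroups: list = field(default_factory=list)   # pre-specified segments only
    decision_criteria: str = ""     # ship / iterate / abandon rules

# Example: a brief plan for a low-risk UI experiment.
plan = ExperimentAnalysisPlan(
    hypothesis="A shorter signup form increases activation",
    primary_metric="activation_rate",
    baseline=0.20,
    mde=0.05,
    guardrails={"page_load_error_rate": 0.02},
)
```

Making the record a fixed-schema object, rather than free-form prose, is one way to keep the plan from being "so vague that it does not actually constrain the analysis": every required field must be filled in before launch.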

Every experiment should have an EAP, with the depth of detail proportional to the experiment's stakes. For low-risk UI experiments, a brief template-based EAP with primary metric, sample size, and significance level may suffice. For high-stakes experiments like pricing changes or major feature launches, a detailed EAP reviewed by the experiment review board is essential. Common pitfalls include writing the EAP after seeing preliminary results (defeating its purpose), specifying the EAP so vaguely that it does not actually constrain the analysis, including so many pre-specified subgroups that the analysis becomes a multiple testing nightmare, and not following the EAP when the results are surprising. The EAP should be stored in the experiment documentation system with a timestamp proving it was created before the experiment launched.
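For a template-based EAP, the sample-size entry follows directly from a standard power calculation. A sketch using only the Python standard library, based on the usual normal-approximation formula for a two-proportion test (not any particular platform's implementation):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline, mde_rel, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-proportion z-test
    (normal approximation). baseline is the control conversion rate;
    mde_rel is the relative lift the test should reliably detect."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 20% baseline conversion, 5% relative MDE, defaults alpha=0.05, power=0.80
n = sample_size_per_arm(0.20, 0.05)
```

With these inputs the formula gives on the order of 25,000 users per arm, the kind of figure from which the plan's sample size and duration entry would be derived given expected traffic.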

Advanced EAP practices include registering EAPs in a version-controlled system that provides cryptographic timestamps, incorporating decision trees that specify actions for different outcome scenarios (e.g., if primary metric is significant but guardrail is degraded, escalate to review board), specifying sensitivity analyses that will be conducted to assess robustness (e.g., analyzing with and without outlier removal), and defining a process for EAP amendments if the experiment encounters unexpected issues (like an SRM) that require analysis modifications. Some organizations use templated EAPs that auto-populate from the experimentation platform's configuration, reducing manual effort while ensuring completeness. The clinical trials community has decades of experience with pre-registration through ClinicalTrials.gov, and digital experimentation teams can learn from their practices, including the distinction between protocol amendments (legitimate changes documented transparently) and protocol deviations (unplanned departures that must be reported).
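The timestamping idea can be approximated even without a dedicated registry: hash a canonical serialization of the plan and store the digest somewhere append-only, such as a version-control commit. A sketch, assuming the plan is a JSON-serializable dict:

```python
import hashlib
import json
from datetime import datetime, timezone

def register_eap(plan: dict) -> dict:
    """Return a tamper-evident registration record for an EAP.
    Any change to the plan changes the digest, so a digest recorded
    before launch proves what was pre-specified. The timestamp must
    come from a trusted, append-only system (e.g. a version-control
    commit), not from the analyst's own machine."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":")).encode()
    return {
        "sha256": hashlib.sha256(canonical).hexdigest(),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }

record = register_eap({"primary_metric": "activation_rate", "alpha": 0.05})
```

An amendment under this scheme simply produces a new record: both digests are retained, which mirrors the clinical-trials distinction between transparent protocol amendments and unplanned deviations.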

Related Terms

Experiment Review Board

A cross-functional governance body that reviews experiment designs before launch and results before ship decisions, ensuring statistical rigor, alignment with organizational metrics, and prevention of common methodological errors.

Experiment Documentation

The systematic recording of experiment hypotheses, designs, configurations, results, and learnings in a structured, searchable format that preserves institutional knowledge and enables evidence-based decision-making across the organization.

Growth Experimentation Framework

A structured organizational process for systematically generating, prioritizing, running, and learning from experiments across the entire user lifecycle, designed to maximize the rate of validated learning and compound the impact of product improvements.

Multivariate Testing

An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.

Split Testing

The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.

Holdout Testing

An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.