Intention-to-Treat

An analysis principle that evaluates experiment results based on the original random assignment of users to treatment groups, regardless of whether they actually received or engaged with the treatment, preserving the validity of randomization.

Intention-to-treat (ITT) analysis is the gold-standard approach for analyzing randomized experiments. It compares outcomes based on which group users were assigned to, not which treatment they actually received. If a user was assigned to the new onboarding flow but never logged in during the experiment period, they are still counted in the treatment group. If a user assigned to control accidentally received the treatment due to a bug, they remain in the control group. For growth teams, ITT analysis preserves the causal validity of the experiment by maintaining the randomized group composition. Any departure from analyzing by assignment, such as excluding users who did not engage or switching users between groups based on their actual experience, can introduce selection bias that invalidates the experiment.
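The analyze-by-assignment rule can be sketched in a few lines. This is a minimal illustration, not any particular tool's API; the field names (`assigned`, `received`, `converted`) are hypothetical:

```python
# Sketch of an ITT comparison. Every user is analyzed by the arm they were
# ASSIGNED to, regardless of what they actually received or engaged with.

def itt_effect(records):
    """Difference in mean outcome between assigned-treatment and assigned-control."""
    treat = [r["converted"] for r in records if r["assigned"] == "treatment"]
    control = [r["converted"] for r in records if r["assigned"] == "control"]
    return sum(treat) / len(treat) - sum(control) / len(control)

records = [
    # Assigned to treatment but never logged in -> still counted as treatment.
    {"assigned": "treatment", "received": False, "converted": 0},
    {"assigned": "treatment", "received": True,  "converted": 1},
    {"assigned": "treatment", "received": True,  "converted": 1},
    {"assigned": "treatment", "received": True,  "converted": 0},
    # Assigned to control but exposed by a bug -> stays in control.
    {"assigned": "control",   "received": True,  "converted": 1},
    {"assigned": "control",   "received": False, "converted": 0},
    {"assigned": "control",   "received": False, "converted": 0},
    {"assigned": "control",   "received": False, "converted": 0},
]

print(itt_effect(records))  # 0.5 - 0.25 = 0.25
```

Note that the `received` field is carried along but deliberately never used in the comparison; that is the whole point of analyzing by assignment.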

The ITT estimator is simply the difference in average outcomes between the assigned treatment group and the assigned control group: tau_ITT = E[Y | Assigned to Treatment] - E[Y | Assigned to Control]. This is an unbiased estimate of the effect of being assigned to treatment, which in the presence of non-compliance (users not receiving their assigned treatment) will generally be a diluted version of the effect of actually receiving the treatment. The relationship between ITT and the per-protocol effect is formalized in the complier average causal effect (CACE or LATE), estimated using instrumental variables: CACE = ITT / compliance_rate. If 80% of users assigned to treatment actually received it, the CACE is the ITT divided by 0.8, which represents the undiluted effect for users who complied with their assignment. This adjustment is valid under the exclusion restriction (assignment affects outcomes only through treatment receipt) and monotonicity (no one does the opposite of their assignment) assumptions.
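The Wald (instrumental-variables) adjustment described above is a one-line calculation. A hedged sketch, with illustrative function and argument names; the version shown assumes one-sided non-compliance (no control user receives the treatment), which is the common case in feature experiments:

```python
# CACE via the Wald / instrumental-variables estimator: the ITT divided by
# the first-stage compliance rate (difference in treatment-receipt rates
# between the assigned arms).

def cace(itt, treat_receipt_rate, control_receipt_rate=0.0):
    """Complier average causal effect under the exclusion restriction
    and monotonicity assumptions stated above."""
    compliance_rate = treat_receipt_rate - control_receipt_rate
    return itt / compliance_rate

# A 2-percentage-point ITT lift with 80% compliance implies a roughly
# 2.5-point effect among compliers:
print(cace(itt=0.02, treat_receipt_rate=0.8))  # ~0.025
```

Because the compliance rate appears in the denominator, the CACE is always at least as large in magnitude as the ITT, and its confidence interval widens as compliance falls.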

ITT should be the primary analysis for every randomized experiment. It provides a conservative, unbiased estimate of the policy-relevant question: what happens when you roll out this change? Because not all users will engage with any feature, the ITT reflects the realistic impact on the entire eligible population, not just the subset that engages. Common pitfalls include inappropriately excluding non-compliers to inflate the apparent effect size, conditioning on post-randomization variables (like feature usage) which destroys randomization, and confusing ITT with per-protocol estimates in reporting. Teams should report the ITT as the primary result and, if desired, the CACE as a secondary analysis with clear caveats about the additional assumptions required.
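The pitfall of excluding non-engagers can be made concrete with a small constructed example (all counts and rates below are made up for illustration). Here the treatment truly does nothing, yet dropping non-engaged treatment users fabricates a lift, because engagement correlates with baseline conversion propensity:

```python
# Both arms are built identically: 100 "active" users (60 convert) and
# 100 "casual" users (20 convert). Only active users engage with the
# feature, and the feature has NO effect on anyone.

def make_arm(active_converters, active_total, casual_converters, casual_total):
    users = [{"engaged": True, "converted": 1}] * active_converters
    users += [{"engaged": True, "converted": 0}] * (active_total - active_converters)
    users += [{"engaged": False, "converted": 1}] * casual_converters
    users += [{"engaged": False, "converted": 0}] * (casual_total - casual_converters)
    return users

treatment = make_arm(60, 100, 20, 100)
control = make_arm(60, 100, 20, 100)

def mean_conv(users):
    return sum(u["converted"] for u in users) / len(users)

itt = mean_conv(treatment) - mean_conv(control)
per_protocol = mean_conv([u for u in treatment if u["engaged"]]) - mean_conv(control)

print(itt)           # 0.0  -- correct: the treatment does nothing
print(per_protocol)  # ~0.2 -- spurious lift from conditioning on engagement
```

The "per-protocol" number compares high-propensity engagers against the full control group, so the apparent 20-point lift is pure selection bias, exactly the conditioning-on-post-randomization-variables mistake described above.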

Advanced considerations include principal stratification, which generalizes the CACE framework by partitioning users into latent compliance types (always-takers, never-takers, compliers, and defiers) and analyzing treatment effects within those strata. For experiments where the treatment is an algorithm change that affects different users to different degrees, the ITT can be augmented with a triggered analysis that restricts to users who were actually exposed to the differing algorithm behavior, while keeping ITT as the primary analysis. In multi-sided marketplace experiments, ITT becomes more complex because one user's assignment can affect other users' outcomes through interference. Some organizations adopt modified ITT approaches that handle common practical issues, such as excluding users who were randomized but never activated the app, though any such modification should be pre-specified and justified.
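A triggered analysis stays valid only if the exposure filter is defined identically for both arms. One way to sketch this (the `would_trigger` flag is a hypothetical counterfactual label recording, for every user in both arms, whether the old and new algorithms would have behaved differently for them):

```python
# Triggered analysis sketch: restrict BOTH arms to users whose experience
# would differ between algorithms. Because would_trigger is computed the
# same way regardless of assignment, the restricted subsets remain
# comparable and the comparison is still randomized.

def triggered_effect(records):
    treat = [r["outcome"] for r in records
             if r["arm"] == "treatment" and r["would_trigger"]]
    ctrl = [r["outcome"] for r in records
            if r["arm"] == "control" and r["would_trigger"]]
    return sum(treat) / len(treat) - sum(ctrl) / len(ctrl)

records = [
    {"arm": "treatment", "would_trigger": True,  "outcome": 1},
    {"arm": "treatment", "would_trigger": True,  "outcome": 1},
    {"arm": "treatment", "would_trigger": True,  "outcome": 0},
    {"arm": "treatment", "would_trigger": False, "outcome": 0},
    {"arm": "control",   "would_trigger": True,  "outcome": 1},
    {"arm": "control",   "would_trigger": True,  "outcome": 0},
    {"arm": "control",   "would_trigger": True,  "outcome": 0},
    {"arm": "control",   "would_trigger": False, "outcome": 0},
]

print(triggered_effect(records))  # ~0.33: lift among triggered users only
```

Filtering the treatment arm on actual exposure while leaving control unfiltered would reintroduce the selection bias that ITT exists to prevent, which is why the flag must be counterfactual and arm-agnostic.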

Related Terms

Per-Protocol Analysis

An analysis approach that evaluates experiment results based on which treatment users actually received rather than their original random assignment, providing an estimate of the treatment effect among compliant users but potentially introducing selection bias.

Triggered Analysis

An analysis technique that restricts experiment evaluation to users who actually encountered or were exposed to the experimental change, reducing noise from unaffected users while maintaining the validity of the randomization through careful implementation.

Split Testing

The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.

Multivariate Testing

An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing, which typically varies a single element at a time.

Holdout Testing

An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.

Power Analysis

A statistical calculation performed before an experiment to determine the minimum sample size required to detect a meaningful effect with a specified probability, balancing the risk of false negatives against practical constraints like traffic and experiment duration.