Synthetic Control
A causal inference method that constructs a weighted combination of untreated units to create an artificial control group that closely matches the treated unit's pre-treatment characteristics and trajectory, enabling credible treatment effect estimation when only one or a few units are treated.
The synthetic control method (SCM) addresses a common challenge in quasi-experimental analysis: estimating the impact of a treatment applied to a single unit (a city, a country, a market) when no single untreated unit serves as a good comparison. Instead of relying on one imperfect comparison, SCM constructs a synthetic version of the treated unit by taking a weighted average of multiple untreated units, with weights chosen to minimize the difference between the synthetic and treated unit in pre-treatment outcomes and covariates. For growth and advertising teams, synthetic control is the gold standard for evaluating geo-level interventions like market launches, regional campaigns, or city-level feature rollouts where randomization is infeasible and no single comparison market adequately matches the treated market.
The SCM algorithm works as follows: let Y_1 be the outcome time series for the treated unit and Y_0 be the matrix of outcomes for J untreated donor units over T_0 pre-treatment periods. The method finds weights W = (w_1, ..., w_J) that minimize ||Y_1_pre - Y_0_pre * W||^2 subject to w_j >= 0 for all j and sum_j(w_j) = 1. The non-negativity and sum-to-one constraints make the synthetic control a convex combination of donors, preventing extrapolation beyond the support of the donor pool. The treatment effect at each post-treatment time point is estimated as the gap between the treated unit's actual outcome and the synthetic control's predicted outcome. Statistical inference typically uses permutation-based placebo methods: apply the same SCM procedure to each donor unit as if it were treated, generating a distribution of placebo effects; if the treated unit's effect is extreme relative to that distribution, the effect is considered significant. Meta's GeoLift package and Google's CausalImpact (which builds its counterfactual from a Bayesian structural time-series model rather than donor weighting) implement related counterfactual-prediction approaches for marketing and advertising applications.
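The constrained weight optimization and gap estimate described above can be sketched with scipy's SLSQP solver. This is a minimal illustration, not a production implementation: the function name `fit_synthetic_weights` and the toy data are hypothetical, and real analyses would also match on covariates and validate the fit.

```python
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_weights(y1_pre, y0_pre):
    """Find convex-combination weights W minimizing ||y1_pre - y0_pre @ W||^2.

    y1_pre: (T0,) pre-treatment outcomes for the treated unit.
    y0_pre: (T0, J) pre-treatment outcomes for J donor units.
    """
    J = y0_pre.shape[1]
    objective = lambda w: np.sum((y1_pre - y0_pre @ w) ** 2)
    result = minimize(
        objective,
        x0=np.full(J, 1.0 / J),                  # start from equal weights
        bounds=[(0.0, 1.0)] * J,                 # w_j >= 0
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to 1
        method="SLSQP",
    )
    return result.x

# Toy example: the treated unit is exactly 0.5*donor0 + 0.5*donor1.
rng = np.random.default_rng(0)
T0, T1, J = 20, 10, 4
y0_pre = rng.normal(size=(T0, J)).cumsum(axis=0)     # donor random walks
true_w = np.array([0.5, 0.5, 0.0, 0.0])
y1_pre = y0_pre @ true_w

w = fit_synthetic_weights(y1_pre, y0_pre)

# Post-treatment: simulate a true lift of +5 and estimate the per-period gap.
y0_post = y0_pre[-1] + rng.normal(size=(T1, J)).cumsum(axis=0)
y1_post = y0_post @ true_w + 5.0        # actual treated outcomes
gap = y1_post - y0_post @ w             # estimated treatment effect per period
```

The estimated weights recover the true convex combination, and the mean of `gap` over the post-treatment period recovers the simulated lift.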
Synthetic control should be used when a treatment is applied to one or a few units at the aggregate level, there is a pool of similar untreated units available as donors, and sufficient pre-treatment data exists to assess the quality of the synthetic match. The key quality diagnostic is the pre-treatment fit: if the synthetic control closely tracks the treated unit's outcomes in the pre-treatment period, it is reasonable to expect it would continue to do so in the absence of treatment. Common pitfalls include poor pre-treatment fit (which undermines the counterfactual), too few donor units (which limits the ability to construct a good match), donor units that are indirectly affected by the treatment (spillover), and overfitting to pre-treatment noise by using too many predictors. Teams should also be cautious about using SCM for short pre-treatment periods or highly volatile time series where the pre-treatment fit may be misleadingly good.
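The pre-treatment fit diagnostic and the placebo-based inference can be combined in the post/pre RMSPE (root mean squared prediction error) ratio popularized by Abadie and coauthors: a unit whose post-treatment error is large relative to its pre-treatment error is a candidate for a real effect. A rough sketch on simulated data, with all names and numbers illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def scm_weights(y_target, y_donors):
    """Convex donor weights minimizing squared pre-treatment error (sketch)."""
    J = y_donors.shape[1]
    res = minimize(lambda w: np.sum((y_target - y_donors @ w) ** 2),
                   x0=np.full(J, 1.0 / J), bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   method="SLSQP")
    return res.x

def rmspe_ratio(y_pre, y_post, donors_pre, donors_post):
    """Post/pre RMSPE ratio: large values suggest a real post-treatment effect."""
    w = scm_weights(y_pre, donors_pre)
    pre = np.sqrt(np.mean((y_pre - donors_pre @ w) ** 2))
    post = np.sqrt(np.mean((y_post - donors_post @ w) ** 2))
    return post / max(pre, 1e-12)

rng = np.random.default_rng(1)
T0, T1, J = 30, 10, 8
pre = rng.normal(size=(T0, J)).cumsum(axis=0)             # donor random walks
post = pre[-1] + rng.normal(size=(T1, J)).cumsum(axis=0)  # continued post-period
w_true = np.zeros(J); w_true[:3] = 1.0 / 3
y_pre = pre @ w_true
y_post = post @ w_true + 4.0        # treated unit receives a +4 lift

treated_ratio = rmspe_ratio(y_pre, y_post, pre, post)

# Placebo distribution: treat each donor as if it were treated,
# fitting it with the remaining donors.
placebos = []
for j in range(J):
    others = [k for k in range(J) if k != j]
    placebos.append(rmspe_ratio(pre[:, j], post[:, j],
                                pre[:, others], post[:, others]))

# Permutation p-value: rank of the treated ratio among all J+1 ratios.
p_value = (1 + sum(r >= treated_ratio for r in placebos)) / (1 + J)
```

Note that with few donors the smallest attainable p-value is 1/(J+1), which is one reason a thin donor pool limits what SCM can conclude.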
Advanced extensions of synthetic control include the augmented synthetic control method (ASCM) by Ben-Michael, Feller, and Rothstein, which combines SCM with an outcome model to correct for imperfect pre-treatment fit. Penalized synthetic control uses ridge or elastic net regularization to improve stability when there are many donor units. For multiple treated units, the synthetic difference-in-differences (SDID) method by Arkhangelsky and coauthors combines the strengths of DiD and SCM, applying synthetic control-style weighting along both the unit and time dimensions. Bayesian synthetic control methods provide posterior distributions over the treatment effect and naturally quantify uncertainty. In the advertising industry, tools like Meta's GeoLift and Google's matched-markets tooling use synthetic control principles to design and analyze geo-experiments, automatically selecting treatment and control markets and computing power analyses for geographic experiments.
Related Terms
Difference-in-Differences
A quasi-experimental statistical method that estimates a treatment effect by comparing the change in outcomes over time between a group that receives a treatment and a group that does not, removing biases from time-invariant differences between groups and common time trends.
Pre-Post Analysis
A quasi-experimental method that compares metrics before and after a treatment is applied to the same group, using the pre-treatment period as a baseline to estimate the treatment effect when a randomized control group is not available.
Cluster Randomization
An experimental design that randomly assigns groups (clusters) of users rather than individual users to treatment conditions, used when individual randomization is not feasible or when interference between users within the same cluster would violate independence assumptions.
Multivariate Testing
An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.
Split Testing
The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.
Holdout Testing
An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.