Bayesian Optimization
A sequential decision-making framework that uses a probabilistic model of the objective function to efficiently search for the optimal configuration of parameters, balancing exploration of uncertain regions with exploitation of promising areas.
Bayesian optimization is a sophisticated approach to finding optimal parameter settings when each evaluation is expensive, whether in compute time, user traffic, or opportunity cost. Instead of testing a grid of configurations or using gradient-based methods, Bayesian optimization builds a surrogate model (typically a Gaussian process) of the objective function from observed evaluations, then uses this model to decide which configuration to try next. The model provides both a predicted value and an uncertainty estimate for each untested configuration, enabling intelligent exploration. For growth teams, Bayesian optimization is valuable for tuning continuous parameters like notification frequency, recommendation algorithm weights, pricing levels, and ad bidding strategies, where the parameter space is too large for exhaustive A/B testing and the objective function is smooth but unknown.
The Bayesian optimization loop operates in four steps: (1) Fit a surrogate model (Gaussian process, random forest, or neural network) to all observed configuration-outcome pairs. (2) Compute an acquisition function that balances exploration and exploitation, such as Expected Improvement (EI), Upper Confidence Bound (UCB), or Thompson Sampling. EI calculates the expected improvement over the current best observation: EI(x) = E[max(f(x) - f(x_best), 0)], which is analytically tractable under a Gaussian process model. (3) Select the configuration that maximizes the acquisition function as the next point to evaluate. (4) Evaluate the objective at the selected configuration (by running an experiment or computing a metric) and add the result to the observation set. This loop repeats until a budget is exhausted or convergence is reached. Implementations include Ax (from Meta), BoTorch, Optuna, and Spearmint.
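The four-step loop above can be sketched end to end with a toy one-dimensional objective. This is a minimal illustration, not any library's implementation: the RBF kernel, length scale, candidate grid, and quadratic objective are all illustrative choices standing in for an expensive experiment.

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, jitter=1e-6):
    # Step 1: fit the GP surrogate; return posterior mean/std at candidates.
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_cand)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = 1.0 - np.sum((K_s.T @ K_inv) * K_s.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, y_best):
    # Step 2: EI(x) = E[max(f(x) - f(x_best), 0)], closed form under a GP.
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):
    # Stand-in for an expensive evaluation (e.g. an experiment readout);
    # the true optimum sits at x = 0.7.
    return -(x - 0.7) ** 2

rng = np.random.default_rng(0)
x_obs = rng.uniform(0, 1, 3)          # a few initial random configurations
y_obs = objective(x_obs)
x_cand = np.linspace(0, 1, 200)       # candidate grid to score

for _ in range(15):                   # evaluation budget
    mu, sigma = gp_posterior(x_obs, y_obs, x_cand)
    ei = expected_improvement(mu, sigma, y_obs.max())
    x_next = x_cand[np.argmax(ei)]    # step 3: maximize the acquisition
    x_obs = np.append(x_obs, x_next)  # step 4: evaluate and record
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[y_obs.argmax()]        # best configuration found so far
```

In practice a library such as Ax, BoTorch, or Optuna handles the surrogate fitting and acquisition maximization; the sketch only shows the loop's mechanics, including the small jitter term needed to keep the kernel matrix invertible.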
Bayesian optimization should be used when the parameter space is continuous and moderately dimensional (typically up to 20 dimensions), each evaluation is expensive (requiring experiment traffic or significant compute), the objective function is expected to be smooth, and a relatively small number of evaluations must suffice. It is particularly well-suited for tuning ML model hyperparameters, optimizing ad campaign parameters, and finding optimal pricing or notification settings. Common pitfalls include using Bayesian optimization for high-dimensional discrete parameter spaces where it offers little advantage over random search, not accounting for the noisy nature of experiment outcomes (Gaussian processes need noise variance estimation), overfitting the surrogate model in early iterations when few observations are available, and treating the optimization result as definitive without validating the optimal configuration in a proper confirmatory experiment.
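The noise pitfall can be made concrete with scikit-learn's Gaussian process regressor (assuming scikit-learn is available): adding a WhiteKernel term lets the model estimate observation noise from the data instead of interpolating noisy experiment readouts exactly. The toy metric and noise level below are invented for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (30, 1))
# Noisy metric readouts: smooth signal plus per-observation noise.
y = -(X[:, 0] - 0.6) ** 2 + rng.normal(0, 0.05, 30)

# RBF models the smooth signal; WhiteKernel absorbs the observation noise,
# with its level fitted by maximum likelihood alongside the length scale.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mu, sigma = gp.predict(np.array([[0.6]]), return_std=True)
# sigma stays strictly positive even near observed points, reflecting the
# learned noise; without WhiteKernel the GP would treat each noisy readout
# as exact and report near-zero uncertainty there.
```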
Advanced Bayesian optimization techniques include multi-task and multi-fidelity optimization, which leverage related tasks and cheap approximations (low-traffic experiments, simulated environments) to guide the search before running expensive full-scale evaluations. Constrained Bayesian optimization handles settings where configurations must satisfy constraints (e.g., latency must remain below a threshold while maximizing click-through rate). Batch Bayesian optimization selects multiple configurations to evaluate in parallel, which is natural for online experimentation where multiple variants can be tested simultaneously. For growth teams, the integration of Bayesian optimization with experimentation platforms enables continuous parameter tuning: rather than running discrete A/B tests for each parameter value, the system continuously proposes and evaluates configurations, converging on optimal settings much faster than manual iteration.
Related Terms
Adaptive Experiment
An experiment design that modifies its parameters during execution based on accumulating data, including adjusting traffic allocation between variants, dropping underperforming arms, or modifying the sample size, while maintaining statistical validity through appropriate corrections.
Contextual Bandit Experiment
An adaptive experiment that uses user context (features like demographics, behavior history, and session attributes) to personalize which treatment variant each user receives, learning a policy that maps user characteristics to optimal treatments in real time.
Epsilon-Greedy
A simple exploration-exploitation algorithm used in multi-armed bandit experiments that exploits the current best-performing variant with probability (1-epsilon) and explores by randomly selecting any variant with probability epsilon, where epsilon is typically a small value like 0.1.
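The rule fits in a few lines; the variant scores in the usage line are placeholders.

```python
import random

def epsilon_greedy_pick(estimated_means, epsilon=0.1):
    # Explore: with probability epsilon, choose a variant uniformly at random.
    if random.random() < epsilon:
        return random.randrange(len(estimated_means))
    # Exploit: otherwise choose the variant with the best observed mean.
    return max(range(len(estimated_means)), key=lambda i: estimated_means[i])

choice = epsilon_greedy_pick([0.12, 0.31, 0.25], epsilon=0.1)
```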
Multivariate Testing
An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.
Split Testing
The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.
Holdout Testing
An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.