SLO (Service Level Objective)

SLOs are internal reliability targets for a service. While SLAs are external commitments with financial consequences, SLOs are the targets engineering teams use to balance reliability work against feature development. The error budget is the unreliability the SLO permits: an SLO of 99.95% uptime leaves a 0.05% error budget (about 21.6 minutes in a 30-day month) to spend on deployments, experiments, and migrations. Setting the SLO tighter than a 99.9% SLA adds a further 0.05% of margin, so the internal target is missed before the external commitment is breached.
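To make the percentages concrete, the sketch below converts an availability target into allowed downtime. The function name `allowed_downtime_minutes` and the 30-day window are assumptions chosen for illustration, not part of any standard tooling.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability target permits.

    Assumes a fixed window of `window_days` days (hypothetical default: 30).
    """
    if not 0.0 < target_percent < 100.0:
        raise ValueError("target_percent must be strictly between 0 and 100")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


# Internal SLO and external SLA from the example above.
slo_minutes = allowed_downtime_minutes(99.95)  # roughly 21.6 minutes
sla_minutes = allowed_downtime_minutes(99.9)   # roughly 43.2 minutes

# Margin between missing the internal target and breaching the SLA.
margin_minutes = sla_minutes - slo_minutes
print(f"SLO budget: {slo_minutes:.1f} min, SLA limit: {sla_minutes:.1f} min, "
      f"margin: {margin_minutes:.1f} min")
```

Under these assumptions, the team can spend about 21.6 minutes of downtime per month before missing the SLO, and roughly twice that before the SLA is at risk.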

The error budget turns reliability into a concrete decision rule. If your SLO is 99.95% monthly uptime and downtime so far this month is only 0.01% (one fifth of the 0.05% budget), the team has room for risky deployments and experiments. If the error budget is nearly exhausted, the team should focus on reliability improvements rather than new features.
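A minimal sketch of this decision rule follows. The names `BudgetStatus` and `assess_error_budget`, and the 80% threshold at which the team shifts to reliability work, are hypothetical; real error budget policies set their own thresholds and actions.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    budget_minutes: float     # total downtime the SLO allows in the window
    consumed_fraction: float  # share of the budget already spent (0.0 to 1.0+)
    recommendation: str


def assess_error_budget(
    slo_percent: float,
    downtime_minutes: float,
    window_days: int = 30,
    reliability_threshold: float = 0.8,  # assumed policy value, not a standard
) -> BudgetStatus:
    """Compare downtime so far against the error budget implied by the SLO."""
    if not 0.0 < slo_percent < 100.0:
        raise ValueError("slo_percent must be strictly between 0 and 100")
    budget = window_days * 24 * 60 * (100.0 - slo_percent) / 100.0
    consumed = downtime_minutes / budget
    if consumed >= reliability_threshold:
        recommendation = "focus on reliability"
    else:
        recommendation = "room for risky deployments"
    return BudgetStatus(budget, consumed, recommendation)


# 4.32 minutes is 0.01% of a 30-day month: about one fifth of the budget.
healthy = assess_error_budget(99.95, downtime_minutes=4.32)
# 20 minutes of downtime leaves the budget nearly exhausted.
strained = assess_error_budget(99.95, downtime_minutes=20.0)
print(healthy.recommendation, "|", strained.recommendation)
```

The threshold is a design choice: a lower value makes the team slow down earlier, at the cost of shipping fewer changes per window.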

Google's SRE practices popularized SLOs as a framework for making objective decisions about reliability investment. For AI products, SLOs should cover both traditional metrics (availability, latency) and AI-specific metrics (model accuracy, response quality). An SLO like "95% of AI responses rated helpful by users" creates accountability for model quality alongside infrastructure reliability.
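A quality SLO of this kind can be checked the same way as an availability SLO: measure the indicator, then compare it with the target. The sketch below assumes each response carries a boolean "helpful" rating from a user; the function names and the sample data are illustrative only.

```python
def helpful_ratio(ratings: list[bool]) -> float:
    """Fraction of rated responses that users marked as helpful."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


def meets_quality_slo(ratings: list[bool], target: float = 0.95) -> bool:
    """True when the helpful ratio is at or above the SLO target."""
    return helpful_ratio(ratings) >= target


# Made-up sample: 96 helpful ratings out of 100.
good_week = [True] * 96 + [False] * 4
# Made-up sample: 90 helpful ratings out of 100.
bad_week = [True] * 90 + [False] * 10

print(meets_quality_slo(good_week), meets_quality_slo(bad_week))
```

In practice only a subset of responses is rated, so the measured ratio describes rated responses rather than all traffic.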

Related Terms

A/B Testing

Feature Flag

MLOps

Model Serving

Semantic Search

CI/CD (Continuous Integration / Continuous Deployment)