SLA (Service Level Agreement)
A formal contract between a service provider and customer that defines specific performance guarantees such as uptime percentage and response times, with financial penalties for violations.
SLAs are legally binding commitments that specify the minimum acceptable level of service. A typical SLA might guarantee 99.9% uptime (allowing approximately 8.7 hours of downtime per year), response times under 500ms for 95% of requests, and support response within 4 hours for critical issues. Violations usually trigger service credits or financial penalties.
SLAs should be set slightly below your actual capability to provide a safety margin. If your system achieves 99.95% uptime, an SLA of 99.9% gives room for unexpected incidents without breaching the agreement. The gap between your SLA and your actual performance is your error budget.
For AI-powered products, SLAs require careful consideration. LLM API dependencies introduce latency variability and availability risks outside your control. Teams should architect fallback paths, caching layers, and degraded-mode experiences that maintain SLA compliance even when upstream AI providers experience issues. Your SLA to customers should never be more aggressive than the weakest link in your dependency chain.
Related Terms
A/B Testing
A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
Feature Flag
A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
MLOps
The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.
Model Serving
The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
CI/CD (Continuous Integration / Continuous Deployment)
An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.