Observability
The ability to understand a system's internal state from its external outputs, achieved through the three pillars of metrics, logs, and traces working together to enable effective debugging and monitoring.
Observability goes beyond traditional monitoring. While monitoring tells you when something is broken (alerting on known failure modes), observability lets you investigate why something is broken and discover unknown failure modes. The three pillars are metrics (numerical time-series data like request rates and error counts), logs (timestamped event records, ideally structured), and traces (the path of a single request through distributed services).
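As a rough illustration of how the three pillars differ, the sketch below emits a metric, a log record, and a trace span for the same request, using only the Python standard library. The in-memory sinks and the handle_request function are hypothetical stand-ins for a real telemetry backend, not any vendor's API.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory sinks standing in for a real telemetry backend.
metrics = Counter()  # metric name -> running count
logs = []            # structured log records, serialized as JSON
spans = []           # finished trace spans


def handle_request(path, fail=False):
    """Simulate one request and emit all three signal types for it."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    status = 500 if fail else 200

    # Metric: an aggregate number, cheap to store and to alert on.
    metrics["requests_total"] += 1
    if fail:
        metrics["errors_total"] += 1

    # Log: a discrete event record carrying the details of this request.
    logs.append(json.dumps({"trace_id": trace_id, "path": path, "status": status}))

    # Trace: a timed span; the shared trace_id links it to the log record.
    spans.append({
        "trace_id": trace_id,
        "name": f"GET {path}",
        "duration_s": time.perf_counter() - start,
    })
    return trace_id


ok_id = handle_request("/checkout")
bad_id = handle_request("/checkout", fail=True)
```

The shared trace_id is the important design choice here: it is what allows a log line and a span to be matched up later.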
Modern observability platforms like Datadog, Grafana Cloud, and Honeycomb correlate data across all three pillars. When a latency spike appears in metrics, you can drill down to the specific traces that were slow, then examine the logs from those requests to identify the root cause, all within a unified interface.
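The drill-down described above can be sketched as a join on a shared trace ID. This is a minimal illustration over made-up sample records, not how any particular platform implements correlation; the field names, the 500 ms threshold, and both helper functions are assumptions.

```python
# Made-up sample telemetry; a real platform would query its own stores.
spans = [
    {"trace_id": "t1", "duration_ms": 40},
    {"trace_id": "t2", "duration_ms": 950},
    {"trace_id": "t3", "duration_ms": 35},
]
logs = [
    {"trace_id": "t1", "message": "cache hit"},
    {"trace_id": "t2", "message": "cache miss"},
    {"trace_id": "t2", "message": "db query timed out, retrying"},
    {"trace_id": "t3", "message": "cache hit"},
]


def slow_trace_ids(spans, threshold_ms):
    """Step 1: from a latency spike, find the traces that were slow."""
    return [s["trace_id"] for s in spans if s["duration_ms"] > threshold_ms]


def logs_for(trace_ids, logs):
    """Step 2: pull only the log records belonging to those traces."""
    wanted = set(trace_ids)
    return [record for record in logs if record["trace_id"] in wanted]


slow = slow_trace_ids(spans, threshold_ms=500)
suspect_logs = logs_for(slow, logs)
for record in suspect_logs:
    print(record["trace_id"], record["message"])
```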
For AI systems, observability requires additional dimensions: model performance metrics (accuracy, hallucination rates), prompt/completion logging, token usage tracking, embedding quality metrics, and data drift detection. LLM-specific tools like LangSmith and Helicone provide these capabilities, complementing traditional infrastructure observability.
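Two of these dimensions, token usage tracking and data drift detection, can be sketched in a few lines. The example below is an assumption-laden illustration: LLMCall, total_tokens, and drift_score are hypothetical names, token counts are assumed to be reported by the model provider, and the drift measure (shift in the mean, in units of the baseline standard deviation) is only one simple heuristic among many. It does not reflect how LangSmith or Helicone work internally.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class LLMCall:
    """One logged prompt/completion pair with its token counts."""
    prompt: str
    completion: str
    prompt_tokens: int      # assumed to come from the provider's response
    completion_tokens: int  # assumed to come from the provider's response


def total_tokens(calls):
    """Token usage tracking: sum prompt and completion tokens over all calls."""
    return sum(c.prompt_tokens + c.completion_tokens for c in calls)


def drift_score(baseline, recent):
    """Shift of the recent mean from the baseline mean, in baseline std devs.

    Requires at least two baseline values with non-zero spread.
    """
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)


calls = [
    LLMCall("Summarize this ticket", "The user reports...", 10, 5),
    LLMCall("Translate to French", "Bonjour...", 20, 10),
]
usage = total_tokens(calls)

# Illustrative feature values, e.g. input lengths seen at training vs. serving.
score = drift_score(baseline=[1, 2, 3, 4, 5], recent=[6, 7, 8])
```

A score near zero suggests the recent inputs resemble the baseline; what threshold counts as drift depends on the application.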
Related Terms
A/B Testing
A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
Feature Flag
A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
MLOps
The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.
Model Serving
The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
CI/CD (Continuous Integration / Continuous Deployment)
An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.