AI Engineering Glossary

The tools and practices for building, deploying, and operating AI systems in production — MLOps, model serving, feature flags, and experimentation.

A/B Testing

A controlled experiment comparing two or more variants to determine which performs better on a defined metric, using statistical methods to ensure reliable results.
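
As an illustration, a minimal sketch of the two building blocks: deterministic variant assignment and a two-proportion z-test for the result (function names and the 1.96 threshold for 95% confidence are illustrative, not a library API):

```python
import hashlib
import math

def assign_variant(user_id: str, variants=("control", "treatment")) -> str:
    """Deterministically bucket a user so they see the same variant on every visit."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z-statistic for the difference between two conversion rates.
    |z| > 1.96 corresponds to significance at the 95% level."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

Hashing the user ID (rather than assigning randomly per request) keeps each user's experience consistent for the duration of the experiment.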

API Gateway

A server that acts as a single entry point for client requests, handling cross-cutting concerns like authentication, rate limiting, request routing, and protocol translation before forwarding to backend services.

API Versioning

The practice of managing changes to an API over time by maintaining multiple versions simultaneously, ensuring existing clients are not broken when the API evolves.

Back-Pressure

A flow control mechanism where a system signals upstream components to slow down when it cannot process incoming data fast enough, preventing resource exhaustion and maintaining system stability.
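
The simplest form of back-pressure is a bounded buffer: when it fills, the producer blocks until the consumer catches up. A toy sketch using Python's standard library (the sleep simulates a slow consumer):

```python
import queue
import threading
import time

# A bounded queue applies back-pressure implicitly: put() blocks
# whenever the queue already holds `maxsize` unprocessed items.
buf = queue.Queue(maxsize=2)
processed = []

def consumer():
    while True:
        item = buf.get()
        if item is None:  # sentinel: shut down
            break
        time.sleep(0.01)  # simulate slow processing
        processed.append(item)

t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    buf.put(i)  # blocks once 2 items are waiting, slowing the producer
buf.put(None)
t.join()
```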

Blue-Green Deployment

A release strategy that runs two identical production environments, switching traffic from the current version (blue) to the new version (green) once it passes validation, enabling instant rollback.

Canary Release

A deployment strategy that gradually rolls out changes to a small subset of users before expanding to the full population, allowing teams to detect issues early with minimal blast radius.

CAP Theorem

A fundamental distributed systems principle stating that a system can guarantee at most two of three properties simultaneously: Consistency, Availability, and Partition tolerance.

Chaos Engineering

The discipline of experimenting on a system by intentionally injecting failures to uncover weaknesses and build confidence that the system can withstand turbulent real-world conditions.

CI/CD (Continuous Integration / Continuous Deployment)

An automated software practice where code changes are continuously integrated into a shared repository, tested, and deployed to production, reducing manual intervention and accelerating delivery cycles.

Circuit Breaker

A resilience pattern that monitors for failures in downstream service calls and temporarily stops making requests when a failure threshold is exceeded, preventing cascade failures across the system.
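
A minimal sketch of the pattern, with illustrative defaults (real implementations add half-open trial budgets, per-endpoint state, and metrics):

```python
import time

class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; after
    `reset_after` seconds, allow one trial call (half-open state)."""
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: permit one trial request
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit
        return result
```

Failing fast while the circuit is open spares the struggling downstream service from retry storms and returns errors to callers immediately instead of tying up threads on timeouts.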

Code Review

The systematic practice of having team members examine each other's code changes before merging, in order to catch bugs, enforce standards, share knowledge, and improve overall code quality.

Cold Start

The initial latency spike that occurs when a serverless function, container, or service instance is invoked after a period of inactivity and must initialize its runtime environment before processing the request.

Continuous Delivery

A software engineering practice where code changes are automatically built, tested, and prepared for production release, ensuring the codebase is always in a deployable state with releases requiring only a manual approval step.

CQRS (Command Query Responsibility Segregation)

An architectural pattern that separates read and write operations into different models, allowing each to be independently optimized, scaled, and evolved for its specific access patterns.
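
A toy sketch of the separation. Here the write side records events and the read side is a projection shaped for queries; pairing CQRS with an event log like this is a common combination, though not required by the pattern:

```python
# Write side: commands are validated and recorded as events.
events = []

def handle_command(cmd: dict) -> None:
    """Write model: validate a command and record the resulting event."""
    if cmd["amount"] == 0:
        raise ValueError("no-op command rejected")
    events.append({"account": cmd["account"], "amount": cmd["amount"]})

def account_balances() -> dict:
    """Read model: a query-optimized projection built from the events,
    which could be rebuilt, cached, or scaled independently of writes."""
    balances = {}
    for e in events:
        balances[e["account"]] = balances.get(e["account"], 0) + e["amount"]
    return balances
```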

Database Sharding

A horizontal scaling technique that partitions data across multiple database instances based on a shard key, distributing load and enabling datasets to grow beyond single-server limits.
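
The core of hash-based sharding fits in a few lines. A sketch (a stable hash is used deliberately, since Python's built-in `hash()` is randomized per process; real systems often use consistent hashing to ease resharding):

```python
import hashlib

def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key (e.g. a user ID) to a shard index via a hash
    that is stable across processes and restarts."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return int(digest, 16) % num_shards
```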

Distributed Systems

Systems composed of multiple networked computers that coordinate to achieve a common goal, appearing to end users as a single coherent system despite operating across many nodes.

Distributed Tracing

A method of tracking requests as they flow through multiple services in a distributed system, recording timing and metadata at each step to enable end-to-end performance analysis and debugging.

Docker

A platform for packaging applications and their dependencies into lightweight, portable containers that run consistently across any environment, from development to production.

Edge Functions

Serverless functions that execute at CDN edge locations geographically close to the user, reducing latency by processing requests near the point of origin rather than in a central data center.

Event-Driven Architecture

A software design pattern where services communicate by producing and consuming events asynchronously, enabling loose coupling, real-time reactivity, and scalable data processing.

Feature Flag

A software mechanism that enables or disables features at runtime without deploying new code, used for gradual rollouts, A/B testing, and targeting specific user segments.
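
A minimal sketch of a percentage rollout. Hashing the flag name together with the user ID gives each user a stable bucket per flag, so a 50% rollout shows the same users the feature on every request (names are illustrative):

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministic percentage rollout: a user lands in the same
    bucket for a given flag on every evaluation."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct
```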

GitOps

An operational framework where Git repositories serve as the single source of truth for infrastructure and application configuration, with automated agents ensuring the live environment matches the declared state.

Graceful Degradation

A system design approach where functionality is progressively reduced rather than failing completely when components malfunction, preserving core features even under adverse conditions.

GraphQL

A query language and runtime for APIs that lets clients request exactly the data they need in a single request, reducing over-fetching and under-fetching compared to REST.

gRPC

A high-performance, open-source RPC framework that uses Protocol Buffers for serialization and HTTP/2 for transport, enabling efficient service-to-service communication with strong typing.

Health Check

An endpoint or probe that reports whether a service instance is functioning correctly, used by load balancers, orchestrators, and monitoring systems to detect and route around unhealthy instances.

Idempotency

A property of an operation where performing it multiple times produces the same result as performing it once, critical for building reliable systems that handle retries and network failures gracefully.
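
A common way to make a non-idempotent operation (like a payment) safe to retry is an idempotency key: the server stores the result of the first call and replays it for duplicates. A minimal in-memory sketch:

```python
_processed: dict[str, dict] = {}

def charge(idempotency_key: str, amount: int) -> dict:
    """Retrying with the same key returns the original result
    instead of performing the side effect twice."""
    if idempotency_key in _processed:
        return _processed[idempotency_key]
    result = {"charged": amount}  # the side effect happens exactly once
    _processed[idempotency_key] = result
    return result
```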

Immutable Infrastructure

An approach where infrastructure components are never modified after deployment; instead, updates are made by replacing entire instances with new ones built from a common base image.

Infrastructure as Code (IaC)

The practice of managing and provisioning computing infrastructure through machine-readable configuration files rather than manual processes, enabling version control, repeatability, and automation.

Kubernetes

An open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications across clusters of machines.

Latency Percentiles

Statistical measures of response time distribution, where p50 represents the median latency and p99 represents the value below which 99% of requests complete, revealing the tail performance that averages hide.
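
A nearest-rank percentile is straightforward to compute from a sample of latencies, as this sketch shows:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value below which `pct` percent
    of the samples fall."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

On a sample of 100 latencies from 1 ms to 100 ms, p50 is 50 ms while p99 is 99 ms, which is why dashboards track both rather than a single average.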

Load Balancer

A network component that distributes incoming traffic across multiple backend servers to maximize throughput, minimize response time, and ensure no single server is overwhelmed.

Message Queue

An asynchronous communication mechanism where producers send messages to a queue and consumers process them independently, decoupling system components and absorbing traffic spikes.

Microservices

An architectural pattern that structures an application as a collection of loosely coupled, independently deployable services, each responsible for a specific business capability.

MLOps

The set of practices combining machine learning, DevOps, and data engineering to reliably deploy, monitor, and maintain ML models in production.

Model Serving

The infrastructure and systems that host trained ML models and handle inference requests in production, optimizing for latency, throughput, and cost.

Monorepo

A version control strategy where multiple projects, libraries, and services are stored in a single repository, enabling atomic cross-project changes, shared tooling, and simplified dependency management.

Observability

The ability to understand a system's internal state from its external outputs, achieved through the three pillars of metrics, logs, and traces working together to enable effective debugging and monitoring.

Pair Programming

A development technique where two programmers work together at one workstation, with one writing code (driver) and the other reviewing in real time (navigator), continuously collaborating on design and implementation.

Pub/Sub (Publish-Subscribe)

A messaging pattern where publishers broadcast messages to topics without knowledge of subscribers, and subscribers receive messages from topics they have registered interest in, enabling loose coupling.
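
An in-memory sketch of the pattern (real brokers such as Kafka or Google Pub/Sub add persistence, delivery guarantees, and consumer groups):

```python
from collections import defaultdict

class Broker:
    """Publishers send to topics; subscribers register callbacks per
    topic. Neither side knows about the other."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subs[topic]:
            callback(message)
```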

Rate Limiting

A technique for controlling the number of requests a client can make to an API within a given time window, protecting services from abuse and overload while ensuring fair resource allocation.
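
One widely used algorithm is the token bucket, which permits short bursts while enforcing an average rate. A minimal sketch (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    """Allow `rate` requests per second on average, with bursts of
    up to `capacity` requests."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject with e.g. HTTP 429
```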

Rolling Deployment

A deployment strategy that gradually replaces instances of the previous application version with the new version, maintaining availability by ensuring some instances are always running throughout the process.

Saga Pattern

A design pattern for managing distributed transactions across multiple services by breaking them into a sequence of local transactions with compensating actions for rollback on failure.
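
The orchestration variant of the pattern reduces to a loop over (action, compensation) pairs, as this sketch shows (step names are illustrative):

```python
def run_saga(steps):
    """Each step is an (action, compensation) pair. If any action
    fails, run the compensations of completed steps in reverse order."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()  # semantic rollback of an already-committed step
        return False
    return True
```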

Semantic Search

Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
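
At its core this is a nearest-neighbor lookup over embedding vectors, usually by cosine similarity. A toy sketch with hand-made two-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, docs):
    """Rank documents by embedding similarity to the query,
    rather than by keyword overlap."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
```

In production the vectors come from an embedding model and the scan is replaced by an approximate nearest-neighbor index, but the ranking principle is the same.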

Serverless

A cloud computing model where the provider dynamically manages server allocation and scaling, charging only for actual compute time used rather than provisioned capacity.

Service Mesh

A dedicated infrastructure layer that handles service-to-service communication in a microservices architecture, providing traffic management, security, and observability without changing application code.

SLA (Service Level Agreement)

A formal contract between a service provider and customer that defines specific performance guarantees such as uptime percentage and response times, often with financial or service-credit penalties for violations.

SLI (Service Level Indicator)

A quantitative metric that measures the level of service being provided, serving as the concrete measurement against which SLOs are evaluated, such as request latency or error rate.

SLO (Service Level Objective)

An internal reliability target for a service that defines the desired level of performance, typically more stringent than the external SLA, used to guide engineering priorities and error budget decisions.

Technical Debt

The accumulated cost of shortcuts, workarounds, and suboptimal design decisions in a codebase that make future development slower and riskier, analogous to financial debt that accrues interest over time.

Test-Driven Development (TDD)

A software development approach where tests are written before the implementation code, following a red-green-refactor cycle that ensures every feature has automated test coverage from the start.

Trunk-Based Development

A source control branching model where developers integrate small, frequent changes directly into the main branch, avoiding long-lived feature branches and reducing merge conflicts.

Twelve-Factor App

A methodology of twelve best practices for building modern, scalable, and maintainable software-as-a-service applications that are portable across execution environments and suitable for cloud deployment.

WebSocket

A communication protocol providing full-duplex, persistent connections between client and server over a single TCP connection, enabling real-time bidirectional data exchange.
