Key-Value Store
A simple, high-performance database that stores data as key-value pairs, optimized for fast lookups by key with minimal overhead, commonly used for caching, session storage, and feature serving.
Key-value stores offer the simplest data model: a unique key maps to a value. The core operations are get, put, and delete by key. This simplicity enables extreme performance: many key-value stores achieve sub-millisecond read latency and handle millions of operations per second. Redis, Amazon DynamoDB, and Memcached are widely used key-value stores.
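The core interface can be sketched in a few lines. This is an illustrative in-memory model (the class name `KeyValueStore` is invented here), not the API of any particular store; production systems add networking, persistence, replication, and expiry on top of the same three operations.

```python
class KeyValueStore:
    """Minimal in-memory sketch of the key-value model:
    a unique key maps to a value; lookups happen only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrites any existing value for the key.
        self._data[key] = value

    def get(self, key, default=None):
        # O(1) lookup by exact key; no search by value is possible.
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)
```

Notice what is absent: there is no way to ask "which keys have value X?" or "give me all keys between A and B"; that limitation is exactly the trade-off discussed next.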
The trade-off is limited query flexibility. You cannot search by value, perform range queries across arbitrary attributes, or join data. The application must know the exact key to retrieve data. This makes key-value stores ideal for well-known access patterns: caching (key is the cache key), session storage (key is the session ID), configuration (key is the setting name), and rate limiting (key is the client ID).
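The rate-limiting pattern shows how far a key-by-client-ID design can go. Below is a hedged sketch of a fixed-window limiter; the class name and the dict standing in for the store are illustrative, and a real deployment would typically use an atomic increment with expiry (e.g. Redis `INCR` plus `EXPIRE`) instead of read-modify-write.

```python
import time

class RateLimiter:
    """Fixed-window rate limiter sketch: the key is the client ID,
    the value is a (window_start, request_count) pair."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._counters = {}  # stands in for the key-value store

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window_start, count = self._counters.get(client_id, (now, 0))
        if now - window_start >= self.window:
            # The window has elapsed; start counting afresh.
            window_start, count = now, 0
        if count >= self.limit:
            return False  # over the limit within this window
        self._counters[client_id] = (window_start, count + 1)
        return True
```

The same shape covers caching (key is the cache key), sessions (key is the session ID), and configuration (key is the setting name): in every case the application constructs the exact key up front.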
For AI serving infrastructure, key-value stores are essential for low-latency feature lookup. When a model needs a user's features for real-time inference, the feature store's online layer (typically Redis or DynamoDB) serves precomputed features by user ID in under a millisecond. This speed is critical for keeping model inference within latency budgets when feature retrieval is on the critical path.
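The online feature lookup reduces to a single get by entity ID. The sketch below uses a plain dict in place of Redis or DynamoDB, and the names (`ONLINE_STORE`, `get_features`, the feature names) are invented for illustration; the point is that retrieval is one keyed read, not a query.

```python
# Stand-in for the online feature store; a batch pipeline would
# precompute these rows and write them keyed by entity ID.
ONLINE_STORE = {
    "user:1001": {"avg_order_value": 54.2, "days_since_last_login": 3},
    "user:1002": {"avg_order_value": 12.9, "days_since_last_login": 41},
}

def get_features(user_id, feature_names):
    """Fetch precomputed features for one user with a single keyed read."""
    row = ONLINE_STORE.get(f"user:{user_id}", {})
    # Missing features come back as None so the model code can
    # decide how to impute them.
    return [row.get(name) for name in feature_names]
```

At inference time the returned vector feeds straight into the model, so the whole feature-retrieval step costs one key-value round trip on the critical path.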
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
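The definition above translates directly to code: the dot product of the two vectors divided by the product of their norms.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|).
    Returns 1.0 for identical directions, -1.0 for opposite, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```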
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one-at-a-time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.