Feature Store
A centralized repository for storing, managing, and serving machine learning features, ensuring consistency between the features used during model training and those served during real-time inference.
Feature stores solve the training-serving skew problem. When a model is trained on features computed in a batch pipeline using Python and Spark but served with features computed by a real-time API using different code, subtle differences in the computation logic can degrade model performance. A feature store provides a single source of truth for feature definitions and values, used by both the training and serving paths.
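The "single source of truth" idea can be sketched in a few lines of plain Python: one shared transformation function is called by both the batch training path and the online serving path, so the logic cannot drift apart. All names here are illustrative, not part of any particular feature store's API.

```python
from datetime import datetime

def days_since_last_login(as_of, last_login):
    """Shared feature transformation used by BOTH paths (illustrative)."""
    return (as_of - last_login).days

# Training path: compute the feature over historical rows in batch.
rows = [
    {"user_id": 1, "last_login": datetime(2024, 1, 1)},
    {"user_id": 2, "last_login": datetime(2024, 1, 10)},
]
as_of = datetime(2024, 1, 15)
training_features = {
    r["user_id"]: days_since_last_login(as_of, r["last_login"]) for r in rows
}

# Serving path: compute the SAME function for a single live request.
online_feature = days_since_last_login(as_of, datetime(2024, 1, 10))

# Identical logic on both paths means no training-serving skew.
assert training_features[2] == online_feature
```

In a real feature store the transformation is registered once and the platform materializes it into both the offline and online stores; the point is that neither path re-implements the logic by hand.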
Key capabilities include feature registration (defining features with metadata and transformation logic), offline storage (historical feature values for training), online storage (low-latency feature access for real-time serving), point-in-time correctness (ensuring training data reflects only information available at prediction time), and feature monitoring (tracking drift and quality).
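Point-in-time correctness is the least intuitive of these capabilities, so here is a minimal sketch of it (assuming pandas; column names are illustrative). For each training label, we join the most recent feature value available *before* the label's timestamp, never a later one, which prevents data leakage:

```python
import pandas as pd

# Historical feature values for one user, as they became available.
features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "event_ts": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-09"]),
    "purchases_7d": [0, 3, 8],
})

# Training labels with the time each prediction would have been made.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_ts": pd.to_datetime(["2024-01-04", "2024-01-08"]),
    "churned": [0, 1],
})

# merge_asof picks the latest feature row at or before each label time,
# so the 2024-01-08 label sees purchases_7d=3, not the leaked value 8.
training_set = pd.merge_asof(
    labels.sort_values("event_ts"),
    features.sort_values("event_ts"),
    on="event_ts",
    by="user_id",
    direction="backward",
)
print(training_set["purchases_7d"].tolist())  # [0, 3]
```

Feature stores perform this as-of join automatically when generating training sets from the offline store.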
Tools like Feast (open source), Tecton, and Hopsworks provide these capabilities. For growth teams building multiple ML models (churn prediction, recommendations, lead scoring), a feature store enables feature reuse across models, reduces duplicate computation, and ensures that all models see consistent, high-quality features.
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
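The definition above maps directly to code: the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch in plain Python:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

Because the magnitudes are divided out, only direction matters, which is why cosine similarity is preferred over Euclidean distance when comparing embeddings of different scales.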
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
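One common such technique is principal component analysis (PCA), which projects centered data onto the directions of largest variance. A minimal sketch using NumPy's SVD (the data here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 dimensions

# Center the data, then take its SVD; rows of Vt are the principal
# directions, ordered by the variance they explain.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2
X_reduced = X_centered @ Vt[:k].T      # project onto the top-2 components

print(X_reduced.shape)  # (100, 2)
```

Reducing to two or three dimensions like this is the standard first step for visualizing high-dimensional embeddings.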
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
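The pattern is simply a scheduled pass over all pending rows, scored in chunks. A minimal sketch, where ToyModel and the function names are illustrative stand-ins for any batch-capable model:

```python
def predict_batch(model, rows, batch_size=1000):
    """Score rows in fixed-size chunks, trading latency for throughput."""
    predictions = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i : i + batch_size]
        predictions.extend(model.predict(chunk))
    return predictions

class ToyModel:
    """Illustrative model that doubles each input."""
    def predict(self, chunk):
        return [x * 2 for x in chunk]

print(predict_batch(ToyModel(), list(range(5)), batch_size=2))  # [0, 2, 4, 6, 8]
```

In production this loop typically runs on a scheduler (e.g. nightly), reading inputs from and writing predictions back to a warehouse table.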
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.