Feature Store
A centralized repository for storing, managing, and serving machine learning features, ensuring consistency between the features used during model training and those served during real-time inference.
Feature stores solve the training-serving skew problem. When a model is trained on features computed in a batch pipeline using Python and Spark but served with features computed by a real-time API using different code, subtle differences in the computation logic can degrade model performance. A feature store provides a single source of truth for feature definitions and values, used by both the training and serving paths.
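The "single source of truth" idea can be sketched in a few lines of plain Python: one shared transformation function is called by both the batch training path and the online serving path, so the logic cannot drift apart. All names here are illustrative, not part of any particular feature store's API.

```python
from datetime import datetime

def days_since_last_login(as_of, last_login):
    """Shared feature transformation used by BOTH paths (illustrative)."""
    return (as_of - last_login).days

# Training path: compute the feature over historical rows in batch.
rows = [
    {"user_id": 1, "last_login": datetime(2024, 1, 1)},
    {"user_id": 2, "last_login": datetime(2024, 1, 10)},
]
as_of = datetime(2024, 1, 15)
training_features = {
    r["user_id"]: days_since_last_login(as_of, r["last_login"]) for r in rows
}

# Serving path: compute the SAME function for a single live request.
online_feature = days_since_last_login(as_of, datetime(2024, 1, 10))

# Identical logic on both paths means no training-serving skew.
assert training_features[2] == online_feature
```

In a real feature store the transformation is registered once and the platform materializes it into both the offline and online stores; the point is that neither path re-implements the logic by hand.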
Key capabilities include feature registration (defining features with metadata and transformation logic), offline storage (historical feature values for training), online storage (low-latency feature access for real-time serving), point-in-time correctness (ensuring training data reflects only information available at prediction time), and feature monitoring (tracking drift and quality).
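Point-in-time correctness is the least intuitive of these capabilities, so here is a minimal sketch of it (assuming pandas; column names are illustrative). For each training label, we join the most recent feature value available *before* the label's timestamp, never a later one, which prevents data leakage:

```python
import pandas as pd

# Historical feature values for one user, as they became available.
features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "event_ts": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-09"]),
    "purchases_7d": [0, 3, 8],
})

# Training labels with the time each prediction would have been made.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_ts": pd.to_datetime(["2024-01-04", "2024-01-08"]),
    "churned": [0, 1],
})

# merge_asof picks the latest feature row at or before each label time,
# so the 2024-01-08 label sees purchases_7d=3, not the leaked value 8.
training_set = pd.merge_asof(
    labels.sort_values("event_ts"),
    features.sort_values("event_ts"),
    on="event_ts",
    by="user_id",
    direction="backward",
)
print(training_set["purchases_7d"].tolist())  # [0, 3]
```

Feature stores perform this as-of join automatically when generating training sets from the offline store.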
Tools like Feast (open source), Tecton, and Hopsworks provide these capabilities. For growth teams building multiple ML models (churn prediction, recommendations, lead scoring), a feature store enables feature reuse across models, reduces duplicate computation, and ensures that all models see consistent, high-quality features.
Related Terms
Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
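The definition above maps directly to code: the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch in plain Python:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

Because the magnitudes are divided out, only direction matters, which is why cosine similarity is preferred over Euclidean distance when comparing embeddings of different scales.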
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
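One common such technique is principal component analysis (PCA), which projects centered data onto the directions of largest variance. A minimal sketch using NumPy's SVD (the data here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 dimensions

# Center the data, then take its SVD; rows of Vt are the principal
# directions, ordered by the variance they explain.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2
X_reduced = X_centered @ Vt[:k].T      # project onto the top-2 components

print(X_reduced.shape)  # (100, 2)
```

Reducing to two or three dimensions like this is the standard first step for visualizing high-dimensional embeddings.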
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one at a time on demand, optimizing for throughput and cost over latency.
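The pattern is simply a scheduled pass over all pending rows, scored in chunks. A minimal sketch, where ToyModel and the function names are illustrative stand-ins for any batch-capable model:

```python
def predict_batch(model, rows, batch_size=1000):
    """Score rows in fixed-size chunks, trading latency for throughput."""
    predictions = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i : i + batch_size]
        predictions.extend(model.predict(chunk))
    return predictions

class ToyModel:
    """Illustrative model that doubles each input."""
    def predict(self, chunk):
        return [x * 2 for x in chunk]

print(predict_batch(ToyModel(), list(range(5)), batch_size=2))  # [0, 2, 4, 6, 8]
```

In production this loop typically runs on a scheduler (e.g. nightly), reading inputs from and writing predictions back to a warehouse table.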
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Data Pipeline
An automated sequence of data processing steps that moves data from source systems through transformations to destination systems, enabling reliable and repeatable data flows across an organization.
ETL (Extract, Transform, Load)
A data integration pattern that extracts data from source systems, transforms it into a structured format suitable for analysis, and loads it into a target data warehouse or database.