Cross-Validation
A model evaluation technique that splits data into multiple folds, training and testing on different subsets in rotation, providing a more reliable estimate of model performance than a single train-test split.
Cross-validation addresses the problem that a single train-test split can give misleading results depending on which examples happen to end up in each set. In k-fold cross-validation, the data is divided into k equal folds. The model is trained k times, each time using k-1 folds for training and the remaining fold for validation. The final performance estimate is the average across all k folds.
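The rotation described above can be sketched in plain Python. This is a minimal, library-free illustration of k-fold index generation (in practice a library such as scikit-learn's `KFold` would typically be used); the function name and seed handling are choices made for this sketch:

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Each sample lands in exactly one validation fold; the model is
    trained k times on the remaining k-1 folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, deterministically
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs the remainder when n_samples % k != 0.
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        val_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        yield train_idx, val_idx

# 5-fold split of 10 samples: 5 train/validation pairs in rotation
folds = list(kfold_indices(10, 5))
```

The final performance estimate would then be the average of the metric computed on each `val_idx` in turn.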
The most common choice is 5-fold or 10-fold cross-validation, which balances computational cost against estimate reliability. Stratified cross-validation ensures each fold preserves the same class distribution as the full dataset, which matters for imbalanced classification tasks. Time-series cross-validation uses forward-chaining splits, in which each validation fold comes strictly after its training data in time, preventing information from the future leaking into training.
For production ML, cross-validation is essential during model development and hyperparameter tuning. The spread of per-fold scores gives a sense of how much expected performance varies, flags unstable models (high variance across folds suggests sensitivity to the particular training data), and prevents the over-optimistic evaluation that a lucky single split can produce. For final production evaluation, a held-out test set that was never used in any development decision provides the least biased performance estimate.
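Aggregating per-fold scores into a mean and spread is straightforward; a sketch, where the instability threshold is an arbitrary illustrative value rather than a standard cutoff:

```python
import statistics

def summarize_fold_scores(scores, std_threshold=0.05):
    """Summarize per-fold metric scores from cross-validation.

    A large standard deviation across folds suggests the model is
    sensitive to which examples it was trained on.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return {"mean": mean, "std": std, "unstable": std > std_threshold}

# Tight scores across folds: a stable model
summarize_fold_scores([0.80, 0.82, 0.81, 0.79, 0.81])
```

A model whose fold scores swing widely (say 0.70 to 0.90) would be flagged as unstable even if its mean looked acceptable.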
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.