Cross-Validation
A model evaluation technique that splits data into multiple folds, training and testing on different subsets in rotation, providing a more reliable estimate of model performance than a single train-test split.
Cross-validation addresses the problem that a single train-test split can give misleading results depending on which examples happen to end up in each set. In k-fold cross-validation, the data is divided into k equal folds. The model is trained k times, each time using k-1 folds for training and the remaining fold for validation. The final performance estimate is the average across all k folds.
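The rotation described above can be sketched in plain Python. This is a minimal, library-free illustration of k-fold index generation (in practice a library such as scikit-learn's `KFold` would typically be used); the function name and seed handling are choices made for this sketch:

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Each sample lands in exactly one validation fold; the model is
    trained k times on the remaining k-1 folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, deterministically
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs the remainder when n_samples % k != 0.
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        val_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        yield train_idx, val_idx

# 5-fold split of 10 samples: 5 train/validation pairs in rotation
folds = list(kfold_indices(10, 5))
```

The final performance estimate would then be the average of the metric computed on each `val_idx` in turn.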
The most common choice is 5-fold or 10-fold cross-validation, which balances computational cost against estimate reliability. Stratified cross-validation ensures each fold preserves the same class distribution as the full dataset, which matters for imbalanced classification tasks. Time-series cross-validation uses forward-chaining splits, in which each validation fold comes strictly after its training data in time, preventing information from the future leaking into training.
For production ML, cross-validation is essential during model development and hyperparameter tuning. The spread of per-fold scores gives a sense of how much expected performance varies, flags unstable models (high variance across folds suggests sensitivity to the particular training data), and prevents the over-optimistic evaluation that a lucky single split can produce. For final production evaluation, a held-out test set that was never used in any development decision provides the least biased performance estimate.
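Aggregating per-fold scores into a mean and spread is straightforward; a sketch, where the instability threshold is an arbitrary illustrative value rather than a standard cutoff:

```python
import statistics

def summarize_fold_scores(scores, std_threshold=0.05):
    """Summarize per-fold metric scores from cross-validation.

    A large standard deviation across folds suggests the model is
    sensitive to which examples it was trained on.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return {"mean": mean, "std": std, "unstable": std > std_threshold}

# Tight scores across folds: a stable model
summarize_fold_scores([0.80, 0.82, 0.81, 0.79, 0.81])
```

A model whose fold scores swing widely (say 0.70 to 0.90) would be flagged as unstable even if its mean looked acceptable.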
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.