Loss Function
A mathematical function that quantifies the difference between a model's predictions and the actual target values, providing the signal that guides the optimization process during training.
The loss function defines what "good" means for your model. It converts the abstract goal of "make accurate predictions" into a concrete number that gradient descent can minimize. Different tasks require different loss functions: cross-entropy for classification, mean squared error for regression, contrastive loss for embeddings, and custom losses for specific business objectives.
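To make the "concrete number" idea tangible, here is a minimal pure-Python sketch of two of the losses named above, mean squared error and binary cross-entropy (the function names and signatures are illustrative, not from any particular library):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between prediction and target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy: penalizes confident, wrong probability estimates."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Either function collapses "how good are these predictions?" into a single scalar: `mse` for regression targets, `binary_cross_entropy` for predicted probabilities, which is exactly the quantity gradient descent then drives down.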
Choosing the right loss function is a critical design decision because it directly determines what the model optimizes for. Cross-entropy loss encourages the model to output calibrated probabilities. Focal loss emphasizes hard examples, useful when easy examples dominate. Weighted losses let you penalize certain types of errors more heavily, reflecting their business impact. If misclassifying a churning customer costs 10x more than misclassifying a retained one, your loss function should reflect that.
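A weighted loss like the one described can be sketched by adding a class weight to binary cross-entropy. Assuming the churn scenario above, a `pos_weight` of 10.0 mirrors the 10x cost asymmetry; the parameter name and the exact ratio are illustrative assumptions, not a fixed convention:

```python
import math

def weighted_bce(y_true, p_pred, pos_weight=10.0, eps=1e-12):
    """Binary cross-entropy where missing a positive (e.g. a churning
    customer) costs pos_weight times more than a false alarm.
    pos_weight=10.0 is a hypothetical value mirroring the 10x example."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

With the default weight, confidently missing a churner (`y=1`, `p=0.1`) incurs ten times the loss of an equally confident false alarm (`y=0`, `p=0.9`), so gradient descent is pushed toward catching churners even at the cost of extra false positives.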
For production ML systems, the loss function used during training often differs from the business metric you ultimately care about. A recommendation model trained with cross-entropy loss is evaluated on click-through rate. A churn model trained with log loss is evaluated on business value saved. Understanding the gap between the training loss and the business metric helps you design evaluation frameworks that accurately predict production impact.
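The training-loss/business-metric gap can be illustrated by scoring the same churn predictions two ways. The `value_saved` metric below, along with its `save_value` and `outreach_cost` parameters, is a hypothetical example, not a standard formula:

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """The training objective: average negative log-likelihood."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def value_saved(y_true, p_pred, threshold=0.5,
                save_value=100.0, outreach_cost=10.0):
    """A hypothetical business metric: every customer flagged above the
    threshold triggers an outreach costing outreach_cost; each flagged
    customer who really was churning saves save_value. All dollar
    figures are illustrative assumptions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        if p >= threshold:
            total -= outreach_cost
            if y == 1:
                total += save_value
    return total
```

Log loss rewards well-calibrated probabilities on every example, while `value_saved` only cares what happens around the decision threshold, so two models can trade ranks between the two metrics. Offline evaluation should report the business-facing metric, not just the training loss.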
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.