Backpropagation
The algorithm that efficiently computes gradients of the loss function with respect to every weight in a neural network by propagating error signals backward from the output layer to the input layer.
Backpropagation is the engine that makes training deep neural networks computationally feasible. After a forward pass computes the model's prediction and the loss measures how wrong it is, backpropagation traces back through the network, computing how much each weight contributed to the error using the chain rule of calculus. These gradients then guide weight updates via gradient descent.
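The whole cycle can be seen on the smallest possible example. Below is a minimal sketch, with illustrative names and values, that trains a single-neuron model y = w·x + b on one data point: a forward pass computes the prediction and squared-error loss, the backward pass derives each weight's gradient by hand via the chain rule, and gradient descent applies the updates.

```python
import numpy as np

# Minimal sketch: fit y = w * x + b to one target value, computing
# gradients by hand with the chain rule. All names/values illustrative.
rng = np.random.default_rng(0)
x, target = 2.0, 10.0
w, b = rng.normal(), rng.normal()
lr = 0.05  # learning rate for gradient descent

for step in range(200):
    # Forward pass: prediction, then loss measuring how wrong it is
    y = w * x + b
    loss = (y - target) ** 2

    # Backward pass: chain rule from the loss back to each parameter
    dloss_dy = 2 * (y - target)   # dL/dy
    dw = dloss_dy * x             # dL/dw = dL/dy * dy/dw
    db = dloss_dy * 1.0           # dL/db = dL/dy * dy/db

    # Gradient descent update guided by the gradients
    w -= lr * dw
    b -= lr * db

print(round(loss, 6))  # prints 0.0 — the loss has converged to ~zero
```

Real frameworks automate the backward pass, but the arithmetic they perform is exactly this chain-rule bookkeeping, repeated per layer.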
The algorithm works by applying the chain rule layer by layer, from output to input. Each layer computes the derivative of its outputs with respect to its inputs (the local gradient) and multiplies it by the gradient flowing in from the layer above. This recursive multiplication efficiently decomposes the end-to-end gradient computation into simple per-layer operations that can be parallelized on GPUs.
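This per-layer pattern can be sketched as a pair of methods on each layer: forward() caches what the backward pass will need, and backward() multiplies the upstream gradient by the local gradient and passes the result down. The class and variable names below are assumptions for illustration, not any particular framework's API.

```python
import numpy as np

class ReLU:
    def forward(self, x):
        self.mask = x > 0            # remember which units were active
        return x * self.mask

    def backward(self, grad_out):
        # Local gradient is 1 where the input was positive, else 0
        return grad_out * self.mask

class Linear:
    def __init__(self, w):
        self.w = w                   # weight matrix, shape (out, in)

    def forward(self, x):
        self.x = x                   # cache input for the backward pass
        return self.w @ x

    def backward(self, grad_out):
        # Gradient w.r.t. the weights (used for the update) ...
        self.grad_w = np.outer(grad_out, self.x)
        # ... and w.r.t. the input, handed to the layer below
        return self.w.T @ grad_out

rng = np.random.default_rng(1)
layers = [Linear(rng.normal(size=(3, 4))), ReLU(),
          Linear(rng.normal(size=(2, 3)))]

x = rng.normal(size=4)
out = x
for layer in layers:                 # forward: input to output
    out = layer.forward(out)

grad = np.ones(2)                    # stand-in for dLoss/dOutput
for layer in reversed(layers):       # backward: chain rule, output to input
    grad = layer.backward(grad)
```

After the backward loop, each Linear layer holds its grad_w ready for a gradient-descent update, and grad is the gradient with respect to the original input.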
The practical challenges of backpropagation include vanishing gradients (gradients shrink to near-zero in deep networks, preventing early layers from learning) and exploding gradients (gradients grow uncontrollably, destabilizing training). Solutions like residual connections (skip connections), careful weight initialization, gradient clipping, and normalization layers (batch norm, layer norm) have made backpropagation reliable even in networks with hundreds of layers. These are now standard components of modern architectures.
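Of these remedies, gradient clipping is the simplest to show in code. Below is a sketch of clipping by global norm (the variant mirrored by utilities such as PyTorch's torch.nn.utils.clip_grad_norm_): if the combined norm of all gradients exceeds a threshold, every gradient is scaled down so the total norm equals that threshold. The function name and values are illustrative.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Global norm: treat all gradient arrays as one flattened vector
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        # Scale uniformly so the clipped global norm equals max_norm,
        # preserving the direction of the update
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm = 13
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
```

Because the scaling is uniform across all parameters, clipping bounds the step size without changing the direction of the update, which is why it stabilizes training when gradients spike.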
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.