Activation Function
A nonlinear mathematical function applied to each neuron's output in a neural network, enabling the network to learn complex, nonlinear patterns that a purely linear model could not represent.
Without activation functions, a neural network with any number of layers would be equivalent to a single linear transformation, unable to learn curved decision boundaries or complex relationships. The activation function introduces nonlinearity, giving the network the mathematical flexibility to approximate any function.
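The collapse of stacked linear layers into a single linear transformation can be checked numerically. A minimal sketch with NumPy, using arbitrary random weight matrices (the shapes and seed here are illustrative, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation in between.
W1 = rng.standard_normal((4, 3))   # first layer: maps 3 features -> 4
W2 = rng.standard_normal((2, 4))   # second layer: maps 4 features -> 2
x = rng.standard_normal(3)

# Applying both layers in sequence...
deep = W2 @ (W1 @ x)

# ...gives exactly the same result as one linear layer with weights W2 @ W1.
shallow = (W2 @ W1) @ x

assert np.allclose(deep, shallow)
```

Inserting a nonlinearity such as ReLU between the two matrix multiplications breaks this equivalence, which is precisely what lets deeper networks represent functions a single linear layer cannot.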
The most common activation functions include ReLU (Rectified Linear Unit: max(0, x)), which is simple, fast, and works well in practice despite being non-differentiable at zero. GELU (Gaussian Error Linear Unit) is used in modern transformers and provides smoother gradients. Sigmoid (1 / (1 + e^-x)) squashes values between 0 and 1, making it useful for probability outputs. Softmax generalizes sigmoid to multi-class settings, outputting a probability distribution over classes.
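The four functions above are each a few lines of NumPy. This sketch uses the tanh approximation of GELU, one common formulation in transformer implementations, and the standard max-shift trick for a numerically stable softmax:

```python
import numpy as np

def relu(x):
    """max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def gelu(x):
    """Tanh approximation of GELU (smooth near zero, unlike ReLU)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    """Squashes any real input into (0, 1); useful as a probability output."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Probability distribution over the last axis.

    Subtracting the max before exponentiating avoids overflow without
    changing the result (softmax is shift-invariant).
    """
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)
```

For example, `softmax(np.array([1.0, 2.0, 3.0]))` returns three positive values that sum to 1, with the largest weight on the last entry.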
For practitioners, the choice of activation function affects training dynamics and model performance. ReLU can suffer from "dying neurons" where neurons get stuck outputting zero. Leaky ReLU and ELU address this by allowing small negative outputs. In transformers, GELU and SwiGLU have become standard because they provide better gradient flow and training stability. For output layers, the activation function is determined by your task: sigmoid for binary classification, softmax for multi-class, and no activation (linear) for regression.
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.