
Recurrent Neural Network (RNN)

A neural network architecture designed for sequential data that maintains a hidden state updated at each time step, allowing it to process variable-length sequences like text, time series, and audio.

RNNs process sequences one element at a time, maintaining a hidden state that serves as a memory of what has been seen so far. At each step, the network combines the current input with the previous hidden state to produce an output and an updated hidden state. This recurrence allows RNNs to handle variable-length sequences and capture temporal dependencies.
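The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's API; the weight names (`W_xh`, `W_hh`, `b_h`) and the tanh nonlinearity follow the common textbook formulation h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h).

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrence step: combine the current input with the previous
    # hidden state to produce the updated hidden state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_xh = rng.normal(size=(hidden_size, input_size)) * 0.1  # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

# Process a variable-length sequence one element at a time, carrying the
# hidden state forward as a memory of everything seen so far.
sequence = [rng.normal(size=input_size) for _ in range(5)]
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

print(h.shape)  # the hidden state has the same fixed size at every step
```

Note that the same weights are reused at every step, which is what lets the network handle sequences of any length.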

The basic RNN suffers from vanishing (and sometimes exploding) gradients during backpropagation through time, making it difficult to learn long-range dependencies. LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) architectures mitigate this with gating mechanisms that control what information to remember and forget. These gated variants were the dominant sequence-modeling approach before transformers.
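To make the gating idea concrete, here is a sketch of one GRU step in NumPy. The weight names are illustrative, and it follows the convention (used, e.g., in PyTorch's `nn.GRU` documentation) where the update gate z controls how much of the old state is kept; some papers swap the roles of z and 1 − z.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate: keep old state?
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate: use old state in candidate?
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # candidate hidden state
    return z * h_prev + (1.0 - z) * h_tilde          # gated blend of old and new

rng = np.random.default_rng(1)
d_in, d_h = 4, 3

def weights(rows, cols):
    return rng.normal(size=(rows, cols)) * 0.1

Wz, Wr, Wh = weights(d_h, d_in), weights(d_h, d_in), weights(d_h, d_in)
Uz, Ur, Uh = weights(d_h, d_h), weights(d_h, d_h), weights(d_h, d_h)

h = np.zeros(d_h)
for x_t in [rng.normal(size=d_in) for _ in range(5)]:
    h = gru_step(x_t, h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h.shape)
```

Because the blend is additive (z · h_prev passes through largely unchanged when z is near 1), gradients can flow across many steps without repeatedly shrinking, which is the intuition behind why gating eases the vanishing-gradient problem.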

Transformers have largely replaced RNNs for language tasks because their parallel processing enables faster training on GPUs and their attention mechanism handles long-range dependencies more effectively. However, RNNs and their variants still have niches: real-time streaming applications where processing must happen sequentially, edge devices with limited memory (an RNN's hidden state has a fixed size, so inference memory stays constant regardless of sequence length), and certain time-series forecasting tasks. Understanding RNNs also provides context for why transformers were such a breakthrough.
