
Recurrent Neural Network (RNN)

A neural network architecture designed for sequential data that maintains a hidden state updated at each time step, allowing it to process variable-length sequences like text, time series, and audio.

RNNs process sequences one element at a time, maintaining a hidden state that serves as a memory of what has been seen so far. At each step, the network combines the current input with the previous hidden state to produce an output and an updated hidden state. This recurrence allows RNNs to handle variable-length sequences and capture temporal dependencies.
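The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's API; the weight names (`W_xh`, `W_hh`, `b_h`) and the tanh nonlinearity follow the common textbook formulation h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h).

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One recurrence step: combine the current input with the previous
    # hidden state to produce the updated hidden state.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_xh = rng.normal(size=(hidden_size, input_size)) * 0.1  # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

# Process a variable-length sequence one element at a time, carrying the
# hidden state forward as a memory of everything seen so far.
sequence = [rng.normal(size=input_size) for _ in range(5)]
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

print(h.shape)  # the hidden state has the same fixed size at every step
```

Note that the same weights are reused at every step, which is what lets the network handle sequences of any length.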

The basic RNN suffers from vanishing (and sometimes exploding) gradients during backpropagation through time, making it difficult to learn long-range dependencies. LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) architectures mitigate this with gating mechanisms that control what information to remember and forget. These gated variants were the dominant sequence-modeling approach before transformers.
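To make the gating idea concrete, here is a sketch of one GRU step in NumPy. The weight names are illustrative, and it follows the convention (used, e.g., in PyTorch's `nn.GRU` documentation) where the update gate z controls how much of the old state is kept; some papers swap the roles of z and 1 − z.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate: keep old state?
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate: use old state in candidate?
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # candidate hidden state
    return z * h_prev + (1.0 - z) * h_tilde          # gated blend of old and new

rng = np.random.default_rng(1)
d_in, d_h = 4, 3

def weights(rows, cols):
    return rng.normal(size=(rows, cols)) * 0.1

Wz, Wr, Wh = weights(d_h, d_in), weights(d_h, d_in), weights(d_h, d_in)
Uz, Ur, Uh = weights(d_h, d_h), weights(d_h, d_h), weights(d_h, d_h)

h = np.zeros(d_h)
for x_t in [rng.normal(size=d_in) for _ in range(5)]:
    h = gru_step(x_t, h, Wz, Uz, Wr, Ur, Wh, Uh)
print(h.shape)
```

Because the blend is additive (z · h_prev passes through largely unchanged when z is near 1), gradients can flow across many steps without repeatedly shrinking, which is the intuition behind why gating eases the vanishing-gradient problem.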

Transformers have largely replaced RNNs for language tasks because their parallel processing enables faster training on GPUs and their attention mechanism handles long-range dependencies more effectively. However, RNNs and their variants still have niches: real-time streaming applications where processing must happen sequentially, edge devices with limited memory (an RNN's hidden state has a fixed size, so inference memory stays constant regardless of sequence length), and certain time-series forecasting tasks. Understanding RNNs also provides context for why transformers were such a breakthrough.
