Top-p Sampling (Nucleus Sampling)
A decoding strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p, dynamically adjusting the candidate pool based on the model's confidence.
Top-p sampling provides more nuanced control over randomness than temperature alone. Instead of considering all tokens or a fixed number of top tokens, it dynamically selects the smallest set of tokens whose probabilities sum to at least p. When the model is confident (say, one token holds 95% of the probability), a top-p of 0.9 keeps only that single token. When the model is uncertain (many tokens with similar probabilities), it keeps far more options.
This adaptive behavior is the key advantage over top-k sampling, which always considers the same number of candidates regardless of the probability distribution. Top-p naturally narrows the pool when the model is confident and widens it when many options are viable, producing more contextually appropriate randomness.
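The adaptive nucleus described above can be sketched in a few lines. This is an illustrative implementation, not any provider's actual decoding code; the function names (`top_p_filter`, `sample_top_p`) are invented for this example:

```python
import random

def top_p_filter(probs, p):
    """Return the smallest set of token indices whose cumulative
    probability reaches at least p (the 'nucleus')."""
    # Rank token indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return nucleus

def sample_top_p(probs, p, rng=random):
    """Sample a token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    total = sum(probs[i] for i in nucleus)
    weights = [probs[i] / total for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Confident model: one dominant token -> nucleus of size 1.
confident = [0.95, 0.02, 0.02, 0.01]
print(len(top_p_filter(confident, 0.9)))   # -> 1

# Uncertain model: flat distribution -> the nucleus widens to all 4 tokens.
uncertain = [0.25, 0.25, 0.25, 0.25]
print(len(top_p_filter(uncertain, 0.9)))   # -> 4
```

Note how the pool size varies with the distribution, whereas a top-k filter would return exactly k candidates in both cases.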
In practice, top-p is often used alongside or instead of temperature. A common production configuration is temperature 0.7 with top-p 0.9, which provides moderate creativity while filtering out very unlikely tokens. For structured output tasks like JSON generation, top-p 0.1-0.3 helps ensure valid syntax. For open-ended generation, top-p 0.9-0.95 balances variety with coherence. Most API providers support both parameters, and experimentation is the best way to find optimal settings for your specific task.
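As a concrete illustration of the production configuration above, here is a hypothetical request body in the style of a chat-completion API; the model name is a placeholder and the exact field names vary by provider:

```python
# Sketch of a request payload combining temperature and top-p,
# following common provider conventions (field names may differ).
request = {
    "model": "example-model",  # placeholder, not a real model name
    "messages": [{"role": "user", "content": "Write a haiku about rain."}],
    "temperature": 0.7,  # moderate creativity
    "top_p": 0.9,        # drop the low-probability tail
}
print(request["temperature"], request["top_p"])
```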
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.