Temperature
A parameter that controls the randomness of LLM outputs by scaling the probability distribution over possible next tokens, where lower values produce more deterministic responses and higher values increase creativity.
Temperature is the most commonly adjusted inference parameter. At temperature 0, the model always picks the highest-probability token, producing deterministic but potentially repetitive output. At temperature 1.0, tokens are sampled according to the model's learned probabilities, unmodified. Above 1.0, the distribution flattens, making unlikely tokens more probable and producing more creative but less reliable output.
Mathematically, temperature divides the logits (raw model outputs) before the softmax function. Lower temperature sharpens the probability distribution, concentrating mass on the top tokens. Higher temperature flattens it, spreading probability more evenly. This single parameter has an outsized impact on output quality, and the right setting varies by use case.
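The logits-divided-by-temperature mechanism can be sketched in a few lines of plain Python. The logit values below are made up for illustration; real models produce one logit per vocabulary token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature 0 means greedy decoding: take argmax instead")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens
logits = [4.0, 2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.5)  # sharpened: mass concentrates on the top token
base = softmax_with_temperature(logits, 1.0)  # the model's learned distribution
hot  = softmax_with_temperature(logits, 2.0)  # flattened: mass spreads to unlikely tokens
```

Comparing the three distributions shows the effect directly: the top token's probability shrinks as temperature rises, while the tail tokens gain mass.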
The practical guideline for production systems: use temperature 0-0.3 for factual tasks like classification, extraction, and Q&A where consistency matters. Use 0.5-0.8 for balanced tasks like summarization and content generation where you want some variation but not hallucination. Use 0.8-1.2 for creative tasks like brainstorming and fiction where diversity is valued. Always test your specific use case, as the optimal temperature depends on the model, prompt, and quality criteria.
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.