Hallucination
When an LLM generates plausible-sounding but factually incorrect or fabricated information that has no basis in its training data or provided context.
Hallucination is the most dangerous failure mode of LLMs in production. The model doesn't "know" it's making something up — it's generating the most probable next token given its context, and sometimes the most probable sequence is factually wrong. This is especially problematic because hallucinated content often sounds confident and well-written.
Common hallucination triggers include questions about specific facts (dates, numbers, names), topics with limited training data, requests that push beyond the model's knowledge cutoff, and prompts that implicitly encourage the model to guess rather than admit uncertainty. In growth applications, hallucinations can erode user trust in seconds — imagine a support bot confidently giving wrong pricing information.
Mitigation strategies include RAG (grounding responses in real data), explicit instructions to say "I don't know," output validation against known facts, temperature reduction for factual tasks, and citation requirements that force the model to reference its sources. The most robust approach is defense in depth: multiple layers of validation between the LLM's output and what the user sees.
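One of those validation layers can be sketched in a few lines. Everything here is illustrative: the pricing table, function name, and fallback message are hypothetical, and a production system would check many more fact types than prices.

```python
import re

# Ground-truth facts the model's output must agree with (hypothetical data)
KNOWN_PRICES = {"starter": "$29/mo", "pro": "$99/mo"}

def validate_pricing(answer: str) -> str:
    """Reject any price mention that doesn't match the known pricing table."""
    for price in re.findall(r"\$\d+(?:\.\d{2})?/mo", answer):
        if price not in KNOWN_PRICES.values():
            # Fail closed: admitting uncertainty beats shipping a wrong fact
            return "I'm not sure about current pricing. Please check the pricing page."
    return answer

print(validate_pricing("The Pro plan costs $99/mo."))  # passes through unchanged
print(validate_pricing("The Pro plan costs $49/mo."))  # caught and replaced
```

The key design choice is failing closed: when a claimed fact can't be verified, the layer substitutes an honest "I'm not sure" rather than forwarding the model's confident-sounding answer.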
Related Terms
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
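That retrieve-then-inject loop can be shown in miniature. The documents, retriever, and prompt template below are all illustrative; a real pipeline would retrieve with embeddings rather than keyword overlap.

```python
# Toy document store (hypothetical content)
DOCS = [
    "The Pro plan costs $99/mo and includes priority support.",
    "Refunds are available within 30 days of purchase.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy keyword-overlap retriever; real systems use embedding similarity."""
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Inject the retrieved documents into the prompt as grounding context
    context = "\n".join(retrieve(query, DOCS))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("What does the Pro plan cost?"))
```

Note how the template pairs grounding with an explicit "I don't know" instruction, combining two of the mitigation strategies above.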
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
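The "similarity" in similarity search is usually cosine similarity between embedding vectors. A minimal sketch, using made-up 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: semantically close texts get nearby vectors
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```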
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
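Conceptually, a vector database answers "which stored vectors are closest to this query vector?" The brute-force version below shows the idea; actual vector databases reach sub-millisecond latency with approximate indexes (e.g. HNSW) rather than scanning every vector. The index contents here are made up.

```python
import heapq

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2):
    """Brute-force nearest neighbors by dot product. Real vector databases
    use approximate indexes to stay fast at millions of vectors."""
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), doc_id)
        for doc_id, vec in index.items()
    )
    return heapq.nlargest(k, scored)

index = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [0.0, 1.0]}
print(top_k([1.0, 0.1], index))  # doc_a and doc_b score highest
```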
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Further Reading
5 Common RAG Pipeline Mistakes (And How to Fix Them)
Retrieval-Augmented Generation is powerful, but these common pitfalls can tank your accuracy. Here's what to watch for.
Prompt Engineering in 2026: What Actually Works
Forget the 'act as an expert' templates. After shipping dozens of LLM features in production, here are the prompt engineering techniques that actually improve outputs, reduce costs, and scale reliably.