Embeddings

Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.

Embeddings convert human-readable content into arrays of numbers (vectors) that machines can compare mathematically. Two pieces of text about similar topics will have vectors that are close together in this space, even if they use completely different words.
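"Close together" is usually measured with cosine similarity: the dot product of two vectors divided by the product of their magnitudes. A minimal sketch, using tiny made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes.
    # Ranges from -1 (opposite) to 1 (identical direction).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real model output.
dog = [0.8, 0.3, 0.1, 0.0]
puppy = [0.7, 0.4, 0.2, 0.1]
stock_market = [0.0, 0.1, 0.9, 0.8]

print(cosine_similarity(dog, puppy))         # high: related concepts
print(cosine_similarity(dog, stock_market))  # low: unrelated concepts
```

The same comparison works regardless of which words the underlying texts used, which is what makes embeddings useful for matching by meaning.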

Modern embedding models like OpenAI's text-embedding-3 or Cohere's embed-v4 produce vectors with 256 to 3,072 dimensions. More dimensions capture more nuance but cost more to store and search. The choice of embedding model dramatically impacts downstream quality — a model trained on academic papers will embed technical content differently than one trained on conversational text.
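The storage side of that trade-off is easy to estimate: cost grows linearly with dimension count. A rough sketch, assuming vectors are stored as float32 (4 bytes per component), which is a common default:

```python
def storage_bytes(num_vectors, dimensions, bytes_per_component=4):
    # Assumes float32 storage; quantized indexes can use far less.
    return num_vectors * dimensions * bytes_per_component

# One million documents at the low and high ends of the dimension range.
small = storage_bytes(1_000_000, 256)    # ~1 GB
large = storage_bytes(1_000_000, 3_072)  # ~12.3 GB
print(small, large)
```

Search cost scales similarly, since distance computations touch every component, so teams often benchmark a smaller dimension before paying for the largest.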

Embeddings power many AI growth features: semantic search (find documents by meaning, not keywords), recommendation systems (suggest content similar to what a user liked), clustering (group users by behavioral patterns), and anomaly detection (spot unusual patterns). They're the foundation of RAG pipelines, where document embeddings enable fast retrieval of relevant context for LLM prompts.
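The retrieval step in a RAG pipeline can be sketched as a nearest-neighbor search over precomputed document vectors. This toy example uses hypothetical 3-dimensional vectors and a brute-force scan; production systems use a real embedding model and an approximate-nearest-neighbor index:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed document embeddings (in practice, model output).
doc_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.2, 0.8, 0.1],
    "api rate limits": [0.0, 0.1, 0.9],
}

def retrieve(query_vector, k=2):
    # Rank every document by similarity to the query and return the top k,
    # which would then be pasted into the LLM prompt as context.
    ranked = sorted(doc_vectors.items(),
                    key=lambda kv: cosine(query_vector, kv[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Assumed embedding of a query like "how do I get my money back?"
query = [0.85, 0.2, 0.05]
print(retrieve(query, k=1))  # ['refund policy']
```

Note that the query never shares a keyword with "refund policy"; the match happens entirely in vector space.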
