Cosine Similarity
A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.
Cosine similarity is the standard metric for comparing embeddings. It measures how similar two vectors' directions are, regardless of their magnitudes. As a rough guide (exact ranges vary by embedding model), two document embeddings with a cosine similarity around 0.95 are almost certainly about the same topic; around 0.5, they share some themes; around 0.1, they are largely unrelated.
The formula is straightforward: cos(A, B) = (A · B) / (|A| |B|). The dot product measures directional alignment; dividing by the vectors' magnitudes normalizes for length. This normalization matters because it makes the score insensitive to document length: a 100-word summary and a 10,000-word article on the same topic can still have high cosine similarity, because their embeddings point in similar directions even if their magnitudes differ.
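The formula above can be sketched in a few lines of NumPy (a minimal illustration, not tied to any particular embedding library):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: (A · B) / (|A| |B|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Direction matters, magnitude does not: scaling a vector
# leaves the score (essentially) unchanged.
a = np.array([1.0, 2.0, 3.0])
cosine_similarity(a, 10 * a)  # ≈ 1.0 (identical direction)
cosine_similarity(a, -a)      # ≈ -1.0 (opposite direction)
```

Note the magnitude invariance in action: `a` and `10 * a` score as identical, which is exactly why a short summary and a long article can match closely.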
In practice, cosine similarity powers the retrieval step of RAG pipelines, recommendation similarity scores, duplicate detection, and clustering quality metrics. Most vector databases use cosine similarity (or its cousin, dot product on normalized vectors) as the default distance metric. When tuning thresholds — e.g., "show related articles with similarity above X" — typical production values range from 0.7 (loose, more results) to 0.9 (strict, fewer but more relevant results).
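A minimal sketch of threshold-based retrieval along these lines, assuming embeddings are stored as rows of a NumPy array (the function name and corpus layout are illustrative, not any particular vector database's API). It also shows the equivalence mentioned above: after normalizing to unit length, cosine similarity reduces to a plain dot product.

```python
import numpy as np

def top_related(query_vec: np.ndarray, corpus: np.ndarray,
                threshold: float = 0.7) -> list[tuple[int, float]]:
    """Return (row_index, score) pairs for corpus rows whose cosine
    similarity to query_vec meets the threshold, highest first."""
    # Normalize once; cosine similarity is then just a dot product.
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = corpus_norm @ query_norm
    hits = [(i, float(s)) for i, s in enumerate(scores) if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy corpus of three 2-D "embeddings"; real embeddings have hundreds
# or thousands of dimensions, but the math is identical.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
top_related(np.array([1.0, 0.0]), corpus, threshold=0.7)
```

Raising `threshold` toward 0.9 trims the result list to only near-duplicates of the query; lowering it toward 0.7 admits looser matches, mirroring the tuning trade-off described above.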
Related Terms
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords, typically powered by embedding-based similarity comparison.
Dimensionality Reduction
Techniques that reduce the number of dimensions in high-dimensional data while preserving meaningful structure, used for visualization, compression, and noise removal.
Batch Inference
Processing multiple ML predictions as a group at scheduled intervals rather than one-at-a-time on demand, optimizing for throughput and cost over latency.
Real-Time Inference
Generating ML predictions on-demand as requests arrive, typically with latency requirements under 200ms for user-facing features.
Further Reading
Embedding Models Benchmarked: OpenAI vs Cohere vs Open-Source
Tested 12 embedding models on real production workloads. Here's what actually performs for RAG, semantic search, and clustering—with cost breakdowns and migration guides.
The State of Embedding Models in 2026
A comprehensive comparison of embedding models for semantic search, RAG, and similarity tasks.