Cosine Similarity

A measure of similarity between two vectors based on the cosine of the angle between them, ranging from -1 (opposite) to 1 (identical), commonly used to compare embeddings.

Cosine similarity is the standard metric for comparing embeddings. It measures how similar two vectors' directions are, regardless of their magnitudes. As a rough guide (exact values depend on the embedding model), two document embeddings with a cosine similarity of 0.95 are likely about the same topic; around 0.5, they may share some themes; near 0.1, they're largely unrelated.

The formula is straightforward: cos(A, B) = (A dot B) / (|A| * |B|). The dot product measures directional alignment; dividing by magnitudes normalizes for vector length. This normalization is important because it means cosine similarity measures semantic similarity regardless of document length — a 100-word summary and a 10,000-word article on the same topic will have high cosine similarity.
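The formula can be sketched in a few lines of Python with NumPy. The `cosine_similarity` helper below is illustrative, not from any particular library; the example demonstrates the magnitude-invariance described above by comparing a vector against a scaled copy of itself.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(A, B) = (A dot B) / (|A| * |B|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Scaling a vector changes its magnitude but not its direction,
# so cosine similarity is unaffected.
a = np.array([1.0, 2.0, 3.0])
b = 10 * a  # same direction, 10x the length
print(cosine_similarity(a, b))  # 1.0, up to floating-point error
```

This is why a short summary and a long article on the same topic can score near 1.0: only direction matters.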

In practice, cosine similarity powers the retrieval step of RAG pipelines, recommendation similarity scores, duplicate detection, and clustering quality metrics. Most vector databases use cosine similarity (or equivalently, dot product on normalized vectors, which yields identical scores) as the default distance metric. When tuning thresholds — e.g., "show related articles with similarity above X" — typical production values range from 0.7 (loose, more results) to 0.9 (strict, fewer but more relevant results).
