When you have millions of vectors, checking every one for each search is too slow. ANN (Approximate Nearest Neighbor) and HNSW let you get almost the same results much faster.
Why do we need this?
You have millions of vectors (e.g. one per document). For each search you need "the 5 closest to my query." Brute force = compare the query to every single vector. With 5 million vectors that's 5 million comparisons per search, which is too slow. ANN (Approximate Nearest Neighbor) means we accept a tiny loss in accuracy to get answers in a fraction of the time.
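To make the O(n) cost concrete, here is a minimal brute-force nearest-neighbor sketch with NumPy. The dataset is a toy one (5 vectors instead of 5 million); the function and variable names are illustrative, not from any particular library:

```python
import numpy as np

# Toy dataset: 5 document vectors of dimension 4 (a real index holds millions).
docs = np.array([
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.1, 0.3, 0.5],
    [0.0, 1.0, 0.1, 0.2],
    [0.9, 0.0, 0.4, 0.6],
    [0.2, 0.8, 0.0, 0.3],
])

def brute_force_knn(query, vectors, k):
    # O(n): compute the distance from the query to EVERY vector...
    dists = np.linalg.norm(vectors - query, axis=1)
    # ...then keep the indices of the k smallest distances.
    return np.argsort(dists)[:k]

query = np.array([0.0, 1.0, 0.0, 0.2])
print(brute_force_knn(query, docs, k=2))
```

With 5 vectors this is instant; the point is that the distance computation scales linearly with the dataset, which is exactly what ANN avoids.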
What is HNSW in plain English?
HNSW (Hierarchical Navigable Small World) builds a layered graph over your vectors. Think of it like a map:
- Top layer (highways): Very few nodes, each connected to others that are "far" apart. You can jump quickly across the whole set.
- Middle layer (roads): More nodes, medium jumps.
- Bottom layer (streets): Every vector is here. This is the full dataset.
Search: Start at the top (highways). Move to the node closest to your query. Go down one layer (roads), again move to the closest. Repeat until you're on the bottom layer (streets). Then look at a small neighborhood and return the closest vectors. You only touch a small number of nodes instead of every vector, so speed is roughly O(log n) instead of O(n).
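The search described above can be sketched as a greedy descent over a hand-built toy graph. Vectors here are plain 1-D numbers so distances are easy to read, and the three layers are wired by hand; real HNSW works on high-dimensional vectors and builds the layers automatically during insertion:

```python
# Toy layered graph: each layer maps node -> neighbors.
# Nodes are 1-D "vectors" (just numbers) for readability.
layers = [
    # layer 2 (highways): few nodes, long-range links
    {0: [50], 50: [0, 95], 95: [50]},
    # layer 1 (roads): more nodes, medium links
    {0: [25, 50], 25: [0, 50], 50: [25, 75], 75: [50, 95], 95: [75]},
    # layer 0 (streets): every vector, short links
    {i: [max(i - 5, 0), min(i + 5, 95)] for i in range(0, 100, 5)},
]

def greedy_search(query, entry=0):
    node = entry
    for graph in layers:          # walk from the top layer down to layer 0
        improved = True
        while improved:           # greedy step: hop to any closer neighbor
            improved = False
            for nb in graph[node]:
                if abs(nb - query) < abs(node - query):
                    node, improved = nb, True
    return node

print(greedy_search(42))  # finds the stored vector closest to 42
```

For query 42 the walk touches only a handful of nodes (0 → 50 on the highway layer, then down to the streets) instead of all 20 stored vectors, which is the whole idea.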
[Figure: the layered HNSW graph, showing layers and nodes]
Result: you only visit a short path through the graph instead of comparing the query to every vector.
Complexity
| Method | Speed |
|---|---|
| Brute force | O(n): check every vector |
| ANN (HNSW) | O(log n): follow a path through layers |
95–99% accuracy, 100×–1000× faster. Used in Pinecone, Weaviate, Milvus, Qdrant.
Still confused by weight, sampling, or labeling?
Key terms (simple definitions)
Weight
A number (often between 0 and 1) that says how much to "count" something. In search: a document with weight 0.8 is treated as slightly less important than one with weight 1.0. Use weights to boost recent docs, trusted sources, or downweight old or low-quality content.
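A minimal sketch of weighting in action, assuming hypothetical similarity scores from a search and a per-document weight table (the document names and numbers are made up):

```python
# Hypothetical results: (doc, similarity score), best score first.
results = [("doc_a", 0.92), ("doc_b", 0.90), ("doc_c", 0.88)]

# doc_a is old, so we downweight it; the others keep full weight.
weights = {"doc_a": 0.7, "doc_b": 1.0, "doc_c": 1.0}

# Multiply each score by its weight, then re-sort best-first.
reranked = sorted(
    ((doc, score * weights[doc]) for doc, score in results),
    key=lambda pair: pair[1],
    reverse=True,
)
print(reranked)
```

After weighting, doc_a's score drops to 0.92 × 0.7 = 0.644, so the fresher doc_b moves to the top even though doc_a had the higher raw similarity.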
Sampling
When the model picks the next word, it has a probability for each possible word. Sampling = randomly choosing one word according to those probabilities (instead of always picking the single most likely word). So the same prompt can give different answers. Temperature controls how random that choice is: low = more focused, high = more random/creative. Try the Temperature simulator to see it.
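Here is a small self-contained sketch of temperature sampling over made-up word scores (the words and numbers are illustrative, not from a real model):

```python
import math
import random

def sample_next_word(logits, temperature=1.0):
    # Softmax with temperature: low T sharpens the distribution (more
    # focused), high T flattens it (more random). T must be > 0.
    scaled = [v / temperature for v in logits.values()]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample ONE word according to those probabilities (not always the top one).
    return random.choices(list(logits), weights=probs, k=1)[0]

# Hypothetical scores the model assigned to three candidate next words.
logits = {"cat": 2.0, "dog": 1.0, "banana": -1.0}
random.seed(0)  # fixed seed so the run is reproducible
samples = [sample_next_word(logits, temperature=0.7) for _ in range(1000)]
print(samples.count("cat"), samples.count("dog"), samples.count("banana"))
```

Run it twice with different seeds and the individual picks change, but "cat" stays the most frequent: that is sampling in a nutshell. Raising the temperature evens the counts out; lowering it makes "cat" dominate even more.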
Labeling
A label is a short name or tag you give to a piece of data so you can recognize it in results. Example: "Doc A", "FAQ 3", "Product X manual". Labels don't change the math, they just help you (or your app) show "this result came from document X" instead of a raw ID.
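A tiny sketch tying labels to vectors, reusing the document's own example names; the index layout here is illustrative (real vector databases store this metadata for you):

```python
import numpy as np

# Each stored vector sits alongside a human-readable label.
labels = ["Doc A", "FAQ 3", "Product X manual"]
vectors = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])

def search(query, k=1):
    dists = np.linalg.norm(vectors - query, axis=1)
    order = np.argsort(dists)[:k]
    # Return labels, not raw row indices, so results are readable.
    return [labels[i] for i in order]

print(search(np.array([0.75, 0.25])))
```

The distance math runs purely on the vectors; the label lookup at the end is cosmetic, which is exactly the point.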