When you have millions of vectors, checking every one for each search is too slow. ANN (Approximate Nearest Neighbor) and HNSW let you get almost the same results much faster.
Why do we need this?
You have millions of vectors (e.g. one per document). For each search you need "the 5 closest to my query." Brute force = compare the query to every single vector. With 5 million vectors that's 5 million comparisons per search, which is too slow. ANN (Approximate Nearest Neighbor) means we accept a tiny loss in accuracy to get answers in a fraction of the time.
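To make the O(n) cost concrete, here is a minimal brute-force nearest-neighbor sketch with NumPy. The dataset is a toy one (5 vectors instead of 5 million); the function and variable names are illustrative, not from any particular library:

```python
import numpy as np

# Toy dataset: 5 document vectors of dimension 4 (a real index holds millions).
docs = np.array([
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.1, 0.3, 0.5],
    [0.0, 1.0, 0.1, 0.2],
    [0.9, 0.0, 0.4, 0.6],
    [0.2, 0.8, 0.0, 0.3],
])

def brute_force_knn(query, vectors, k):
    # O(n): compute the distance from the query to EVERY vector...
    dists = np.linalg.norm(vectors - query, axis=1)
    # ...then keep the indices of the k smallest distances.
    return np.argsort(dists)[:k]

query = np.array([0.0, 1.0, 0.0, 0.2])
print(brute_force_knn(query, docs, k=2))
```

With 5 vectors this is instant; the point is that the distance computation scales linearly with the dataset, which is exactly what ANN avoids.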
What is HNSW in plain English?
HNSW (Hierarchical Navigable Small World) builds a layered graph over your vectors. Think of it like a map:
- Top layer (highways): Very few nodes, each connected to others that are "far" apart. You can jump quickly across the whole set.
- Middle layer (roads): More nodes, medium jumps.
- Bottom layer (streets): Every vector is here. This is the full dataset.
Search: Start at the top (highways). Move to the node closest to your query. Go down one layer (roads), again move to the closest. Repeat until you're on the bottom layer (streets). Then look at a small neighborhood and return the closest vectors. You only touch a small number of nodes instead of every vector, so speed is roughly O(log n) instead of O(n).
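The search described above can be sketched as a greedy descent over a hand-built toy graph. Vectors here are plain 1-D numbers so distances are easy to read, and the three layers are wired by hand; real HNSW works on high-dimensional vectors and builds the layers automatically during insertion:

```python
# Toy layered graph: each layer maps node -> neighbors.
# Nodes are 1-D "vectors" (just numbers) for readability.
layers = [
    # layer 2 (highways): few nodes, long-range links
    {0: [50], 50: [0, 95], 95: [50]},
    # layer 1 (roads): more nodes, medium links
    {0: [25, 50], 25: [0, 50], 50: [25, 75], 75: [50, 95], 95: [75]},
    # layer 0 (streets): every vector, short links
    {i: [max(i - 5, 0), min(i + 5, 95)] for i in range(0, 100, 5)},
]

def greedy_search(query, entry=0):
    node = entry
    for graph in layers:          # walk from the top layer down to layer 0
        improved = True
        while improved:           # greedy step: hop to any closer neighbor
            improved = False
            for nb in graph[node]:
                if abs(nb - query) < abs(node - query):
                    node, improved = nb, True
    return node

print(greedy_search(42))  # finds the stored vector closest to 42
```

For query 42 the walk touches only a handful of nodes (0 → 50 on the highway layer, then down to the streets) instead of all 20 stored vectors, which is the whole idea.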
[Figure: the layered HNSW graph, showing layers and nodes]
Result: you only visit a short path through the graph instead of comparing the query to every vector.
Complexity
| Method | Speed |
|---|---|
| Brute force | O(n): check every vector |
| ANN (HNSW) | O(log n): follow a path through layers |
95–99% accuracy, 100×–1000× faster. Used in Pinecone, Weaviate, Milvus, Qdrant.
Still confused by weight, sampling, or labeling?
Key terms (simple definitions)
Weight
A number (often between 0 and 1) that says how much to "count" something. In search: a document with weight 0.8 is treated as slightly less important than one with weight 1.0. Use weights to boost recent docs, trusted sources, or downweight old or low-quality content.
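A minimal sketch of weighting in action, assuming hypothetical similarity scores from a search and a per-document weight table (the document names and numbers are made up):

```python
# Hypothetical results: (doc, similarity score), best score first.
results = [("doc_a", 0.92), ("doc_b", 0.90), ("doc_c", 0.88)]

# doc_a is old, so we downweight it; the others keep full weight.
weights = {"doc_a": 0.7, "doc_b": 1.0, "doc_c": 1.0}

# Multiply each score by its weight, then re-sort best-first.
reranked = sorted(
    ((doc, score * weights[doc]) for doc, score in results),
    key=lambda pair: pair[1],
    reverse=True,
)
print(reranked)
```

After weighting, doc_a's score drops to 0.92 × 0.7 = 0.644, so the fresher doc_b moves to the top even though doc_a had the higher raw similarity.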
Sampling
When the model picks the next word, it has a probability for each possible word. Sampling = randomly choosing one word according to those probabilities (instead of always picking the single most likely word). So the same prompt can give different answers. Temperature controls how random that choice is: low = more focused, high = more random/creative. Try the Temperature simulator to see it.
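Here is a small self-contained sketch of temperature sampling over made-up word scores (the words and numbers are illustrative, not from a real model):

```python
import math
import random

def sample_next_word(logits, temperature=1.0):
    # Softmax with temperature: low T sharpens the distribution (more
    # focused), high T flattens it (more random). T must be > 0.
    scaled = [v / temperature for v in logits.values()]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample ONE word according to those probabilities (not always the top one).
    return random.choices(list(logits), weights=probs, k=1)[0]

# Hypothetical scores the model assigned to three candidate next words.
logits = {"cat": 2.0, "dog": 1.0, "banana": -1.0}
random.seed(0)  # fixed seed so the run is reproducible
samples = [sample_next_word(logits, temperature=0.7) for _ in range(1000)]
print(samples.count("cat"), samples.count("dog"), samples.count("banana"))
```

Run it twice with different seeds and the individual picks change, but "cat" stays the most frequent: that is sampling in a nutshell. Raising the temperature evens the counts out; lowering it makes "cat" dominate even more.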
Labeling
A label is a short name or tag you give to a piece of data so you can recognize it in results. Example: "Doc A", "FAQ 3", "Product X manual". Labels don't change the math, they just help you (or your app) show "this result came from document X" instead of a raw ID.
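A tiny sketch tying labels to vectors, reusing the document's own example names; the index layout here is illustrative (real vector databases store this metadata for you):

```python
import numpy as np

# Each stored vector sits alongside a human-readable label.
labels = ["Doc A", "FAQ 3", "Product X manual"]
vectors = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])

def search(query, k=1):
    dists = np.linalg.norm(vectors - query, axis=1)
    order = np.argsort(dists)[:k]
    # Return labels, not raw row indices, so results are readable.
    return [labels[i] for i in order]

print(search(np.array([0.75, 0.25])))
```

The distance math runs purely on the vectors; the label lookup at the end is cosmetic, which is exactly the point.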