🔄 Cleaning → Embedding → Graph
From raw text to normalized text, tokens, embeddings, and a 2D graph of meaning.
From text to embedding space
1. Raw text (THE CAFE!!)
2. Clean (the cafe)
3. Tokenize (the | cafe)
4. Embed ([0.2, -0.8, …])
5. Graph (→ point in space)
2D embedding space (similar = close). Axes = two of many dimensions.
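The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`clean`, `tokenize`, `toy_embed`, `to_point`) are hypothetical, the cleaning rule assumes plain ASCII text, and `toy_embed` is a hash-based stand-in for a trained embedding model, so its numbers carry no meaning and similar words will not land close together.

```python
import hashlib
import re


def clean(text):
    """Step 2: lowercase, drop punctuation, collapse whitespace (ASCII only)."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return " ".join(text.split())


def tokenize(text):
    """Step 3: split on whitespace. Real tokenizers often use subword units."""
    return text.split()


def toy_embed(token, dims=8):
    """Step 4 (placeholder): map a token to a fixed-length vector in [-1, 1].

    Assumption: a real system would call a trained model here. This hash
    only makes the output deterministic; it does not encode meaning.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 127.5 - 1.0 for byte in digest[:dims]]


def to_point(vector):
    """Step 5: keep two of the many dimensions to get a plottable 2D point."""
    return (vector[0], vector[1])


raw = "THE CAFE!!"                                # 1. Raw text
cleaned = clean(raw)                              # 2. Clean
tokens = tokenize(cleaned)                        # 3. Tokenize
vectors = [toy_embed(t) for t in tokens]          # 4. Embed
points = [to_point(v) for v in vectors]           # 5. Graph

print(cleaned)
print(tokens)
print(points)
```

Swapping `toy_embed` for a call to a real embedding model is the only change needed to make the plotted points meaningful; the other steps keep the same shape.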
cat, kitten, and dog cluster together; pizza and food form a separate cluster; run and walk sit near each other. Real embeddings use hundreds of dimensions (we show 2).
Real embedding spaces have many axes (e.g. 768 or 1536). Here, x, y, and z stand in for them as a concept.
Each word = one point (x, y, z, …). Similar words sit close in this space.
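"Close" can be made concrete by measuring the distance between points. The sketch below uses hand-picked (x, y, z) coordinates chosen purely for illustration; they are assumptions, not output from any real model, and the helper `nearest` is a hypothetical name.

```python
import math

# Hand-picked toy coordinates (assumption: not taken from a real model).
points = {
    "cat":    (0.90, 0.80, 0.10),
    "kitten": (0.85, 0.90, 0.15),
    "dog":    (0.80, 0.70, 0.20),
    "pizza":  (-0.70, 0.10, 0.90),
    "run":    (0.10, -0.80, -0.50),
    "walk":   (0.20, -0.70, -0.60),
}


def distance(a, b):
    """Euclidean distance between two words' points (Python 3.8+)."""
    return math.dist(points[a], points[b])


def nearest(word):
    """Return the other word whose point lies closest to `word`."""
    others = (w for w in points if w != word)
    return min(others, key=lambda w: distance(word, w))


print(nearest("cat"))                  # kitten
print(nearest("run"))                  # walk
print(round(distance("cat", "dog"), 3))
print(round(distance("cat", "pizza"), 3))
```

With these coordinates, cat and dog sit far closer to each other than either does to pizza. Real systems often compare vectors with cosine similarity instead of Euclidean distance, but the idea is the same: a smaller gap means a more similar meaning.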