
🔄 Cleaning → Embedding → Graph

From raw text to normalized text, tokens, embeddings, and a 2D graph of meaning.

From text to embedding space

1. Raw text (THE CAFE!!)
2. Clean (the cafe)
3. Tokenize (the | cafe)
4. Embed ([0.2, -0.8, …])
5. Graph (→ point in space)
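
The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the tiny TOY_EMBEDDINGS table are hypothetical, and the vector values are made up. A real system would get its vectors from a trained embedding model.

```python
import re

# Hypothetical toy lookup table (made-up values, 2 dimensions only).
# Real embedding models produce learned vectors with many more dimensions.
TOY_EMBEDDINGS = {
    "the": [0.1, 0.0],
    "cafe": [0.2, -0.8],
}

def clean(raw: str) -> str:
    """Step 2: lowercase, drop punctuation, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", raw.lower())
    return " ".join(no_punct.split())

def tokenize(text: str) -> list:
    """Step 3: split cleaned text into whitespace-separated tokens."""
    return text.split()

def embed(tokens: list) -> list:
    """Step 4: look up one vector per token (zeros for unknown tokens)."""
    return [TOY_EMBEDDINGS.get(tok, [0.0, 0.0]) for tok in tokens]

def to_points(tokens: list, vectors: list) -> list:
    """Step 5: pair each token with (x, y) so it can be plotted."""
    return [(tok, vec[0], vec[1]) for tok, vec in zip(tokens, vectors)]

raw = "THE CAFE!!"                  # Step 1: raw text
cleaned = clean(raw)
tokens = tokenize(cleaned)
points = to_points(tokens, embed(tokens))
print(cleaned, tokens, points)
```

Splitting on whitespace is the simplest possible tokenizer; production models usually use subword tokenizers instead.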

2D embedding space (similar = close). Axes = two of many dimensions.

[Figure: 2D scatter plot. x axis: dimension 1; y axis: dimension 2. Points: cat, kitten, dog, pizza, food, run, walk.]

cat, kitten, and dog cluster together; pizza and food form a separate cluster; run and walk sit near each other. Real embeddings use hundreds of dimensions (we show 2).
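
To make "similar = close" concrete, here is a minimal sketch that measures straight-line distance between points in a 2D plot. The coordinates in TOY_POINTS are invented for illustration and do not come from any real model; they are only arranged to mirror the clusters described above.

```python
import math

# Hypothetical 2D coordinates, chosen by hand to mimic the plot.
TOY_POINTS = {
    "cat": (1.0, 4.0),
    "kitten": (1.3, 4.2),
    "dog": (1.6, 3.6),
    "pizza": (4.0, 1.0),
    "food": (4.3, 1.4),
    "run": (4.0, 4.0),
    "walk": (4.3, 4.3),
}

def distance(a: str, b: str) -> float:
    """Euclidean distance between two words' points (smaller = more similar)."""
    return math.dist(TOY_POINTS[a], TOY_POINTS[b])

# Words in the same cluster are closer than words in different clusters.
print("cat-kitten:", round(distance("cat", "kitten"), 2))
print("cat-pizza: ", round(distance("cat", "pizza"), 2))
```

With these toy values, cat and kitten are far closer to each other than cat and pizza, which is what the clusters in the plot express visually.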

A real embedding space has many axes (e.g. 768 or 1536). The x, y, and z axes shown here are only a conceptual stand-in.

[Figure: 3D axes labeled x, y, and z, with the origin and one point labeled "word".]

Each word = one point (x, y, z, …). Similar words sit close in this space.
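
The same idea carries over to any number of dimensions. The sketch below uses cosine similarity, one common way to compare embedding vectors, to find the nearest neighbor of a word. The 4-dimensional vectors in TOY_VECTORS are made up for illustration, and the helper names are hypothetical.

```python
import math

# Hypothetical 4-dimensional vectors (made-up values).
# Real embeddings have many more dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "kitten": [0.85, 0.9, 0.15, 0.05],
    "pizza": [0.1, 0.0, 0.9, 0.8],
    "food": [0.0, 0.1, 0.8, 0.9],
}

def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between u and v (near 1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word: str) -> str:
    """Return the other word whose vector is most similar to `word`."""
    others = [w for w in TOY_VECTORS if w != word]
    return max(others, key=lambda w: cosine_similarity(TOY_VECTORS[word], TOY_VECTORS[w]))

print(nearest("cat"))    # kitten
print(nearest("pizza"))  # food
```

Nothing in the functions depends on the vector length, so the same code would work unchanged on vectors with hundreds of dimensions.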