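Tokens are the atoms of language for AI models: the basic units a model reads and writes. They are NOT always whole words; a token can be a full word, a piece of a word, or a punctuation mark.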
Example 1 — Simple text
Text: I love pizza
Tokens
I | love | pizza
3 tokens.
Example 2 — Rare word
Text: I love pizza margherita
Tokens
I | love | pizza | margherita split into several sub-word pieces (the exact split depends on the tokenizer)
Why split? "pizza" is common, so it gets its own token; "margherita" is rare, so the tokenizer builds it from smaller, more common pieces.
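One way to see why a rare word gets split is a greedy longest-match tokenizer. This is a minimal sketch, not how any particular model tokenizes: real tokenizers (for example BPE-based ones) learn their vocabulary from data. The vocabulary `VOCAB` below is hypothetical, chosen so that "pizza" is a single entry while "margherita" is only covered by the made-up pieces "mar", "gher" and "ita".

```python
# Toy greedy longest-match subword tokenizer (a simplified stand-in for
# learned schemes such as BPE). VOCAB is hypothetical, for illustration only.
VOCAB = {"I", "love", "pizza", "mar", "gher", "ita"}


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    tokens: list[str] = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest vocabulary entry that matches at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # Nothing in the vocabulary matches: fall back to one character.
                tokens.append(word[i])
                i += 1
    return tokens


print(tokenize("I love pizza"))             # ['I', 'love', 'pizza']
print(tokenize("I love pizza margherita"))  # ['I', 'love', 'pizza', 'mar', 'gher', 'ita']
```

Common words match a whole vocabulary entry in one step; a rare word has no entry of its own, so it is assembled from whatever shorter pieces the vocabulary does contain.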
Example 3 — Technical words cost more
Text: Initialize PostgreSQL database
Tokens
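Initialize | PostgreSQL | database, where a rarer technical term such as "PostgreSQL" is split into several sub-word pieces (the exact split depends on the tokenizer)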
Technical words = more tokens = more cost, since AI APIs typically bill per token.
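The cost point is simple arithmetic: cost = number of tokens x price per token. A minimal sketch, assuming a hypothetical vocabulary in which "PostgreSQL" has no single entry (so it falls apart into the made-up pieces "Post", "gre", "SQL") and a hypothetical price of $2.50 per million tokens; real splits and prices vary by model and tokenizer.

```python
# Hypothetical vocabulary: everyday words are single entries, while the
# technical term "PostgreSQL" is only covered by smaller pieces.
VOCAB = {"I", "love", "pizza", "Initialize", "database", "Post", "gre", "SQL"}
PRICE_PER_MILLION_TOKENS = 2.50  # hypothetical price in USD


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of each word against VOCAB."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Longest vocabulary match starting at i, else a single character.
            j = next((j for j in range(len(word), i, -1) if word[i:j] in VOCAB), i + 1)
            tokens.append(word[i:j])
            i = j
    return tokens


def estimate_cost(text: str) -> float:
    """Cost in USD = number of tokens x price per token."""
    return len(tokenize(text)) * PRICE_PER_MILLION_TOKENS / 1_000_000


for text in ["I love pizza", "Initialize PostgreSQL database"]:
    print(text, tokenize(text), f"${estimate_cost(text):.7f}")
```

Under these assumptions both phrases are three words, but the technical one costs 5 tokens against 3 for "I love pizza", so it is about 67% more expensive.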