🤖 GPT Tokenizer

See how ChatGPT breaks down your text into tokens

Tokens: 0
Word emojis will appear here...
Actual tokens will appear here...

â„šī¸ Understanding Tokenization Through Analogy

The Emoji Analogy: Just like we can replace words with emojis (strawberry → 🍓), GPT replaces text chunks with token IDs. The middle box shows words as emojis to help visualize how language models compress text into symbols.

Real Tokenization: GPT doesn't work with whole words, though! It breaks text into subword pieces using Byte Pair Encoding (BPE). Notice how the real tokens (box 3) are often shorter than whole words - this helps the model handle rare words and work across languages.
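To make the BPE idea concrete, here is a minimal, self-contained sketch: it learns merge rules from a tiny toy corpus (starting from single characters and repeatedly fusing the most frequent adjacent pair), then applies those merges to a new word. The corpus, helper names, and merge count are illustrative only - GPT's real vocabulary is learned from vast amounts of text and operates on bytes, not just letters.

```python
from collections import Counter

def merge_word(tokens, pair):
    """Fuse every occurrence of an adjacent token pair into one token."""
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def learn_merges(corpus, num_merges):
    """Learn BPE merge rules: repeatedly merge the most frequent pair."""
    words = [list(w) for w in corpus]          # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))        # count adjacent pairs
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]      # most frequent pair
        merges.append(best)
        words = [merge_word(w, best) for w in words]
    return merges

def tokenize(word, merges):
    """Apply learned merges, in order, to split a word into subword tokens."""
    tokens = list(word)
    for pair in merges:
        tokens = merge_word(tokens, pair)
    return tokens

# Toy corpus: "low" is common, so its characters get merged first
merges = learn_merges(["low", "low", "lower", "lowest"], num_merges=2)
print(tokenize("lowest", merges))  # the frequent chunk "low" becomes one token
```

Note how "lowest" splits into the familiar chunk "low" plus leftover characters - the same effect you see in box 3, where common substrings become single tokens while rare ones stay in pieces.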

  • 1 token ≈ 4 characters in English
  • 1 token ≈ ¾ of a word
  • 100 tokens ≈ 75 words
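These rules of thumb are easy to turn into a quick back-of-the-envelope estimator. This sketch simply applies the ratios above (it is a rough heuristic for English text, not a real token counter):

```python
def estimate_tokens(text):
    # Rule of thumb: 1 token ≈ 4 characters of English text
    return max(1, round(len(text) / 4))

def estimate_words(num_tokens):
    # Rule of thumb: 100 tokens ≈ 75 words, i.e. 1 token ≈ ¾ of a word
    return round(num_tokens * 0.75)

print(estimate_tokens("Hello, world!"))  # 13 characters ≈ 3 tokens
print(estimate_words(100))               # ≈ 75 words
```

For accurate counts you would use the model's actual tokenizer, since real token boundaries depend on the learned vocabulary, not on character count alone.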