Human Typing Habits and Token Counts

Ordinary typing habits—typos, shorthand, filler words, pasted UUIDs—change token counts without changing intent, and tokenizers bill by pattern regardless of recovered meaning.

What Matters

  • template → 1 token; tempalte → 3 tokens (OpenAI). Same word, 3× cost from a single transposition.
  • assistant → 1 token; assitant → 2 (OpenAI), 3 (Claude). Claude consistently tokenizes misspellings more expensively.
  • Shorthand backfires: pls → 2 tokens (Claude), thx → 2, w/o → 3 (Claude) vs. 1 token each for the full dictionary words.
  • A UUID like 019d6ce9-7cfe-753a-b6d6-df719510c9e3 costs 24 tokens (OpenAI) or 26 (Claude); an RFC 3339 timestamp costs 16–17 tokens.
  • Expressive punctuation costs extra: Yes!! → 2 tokens, yesss → 3, reeeally → 3—tone markers that rarely help the task.
  • Suffixes fragment unpredictably: describe → 1, describer → 2, describers → 3; a tiny morpheme can double or triple the split.
  • Boundary whitespace (leading/trailing spaces) inflates counts; normal internal spacing is generally safe.
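The cost pattern behind these bullets can be sketched with a toy greedy longest-match tokenizer over a made-up vocabulary (real BPE tokenizers merge byte pairs learned from data, and the vocabulary here is hypothetical, but the effect is the same: a word in the vocabulary is one token, while a misspelling falls back to several smaller pieces):

```python
# Hypothetical mini-vocabulary; real tokenizers have ~100k learned entries.
VOCAB = {"template", "assistant", "temp", "al", "te", "ant", "it", "ass"}

def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split: try the longest substring in the
    vocabulary first, fall back to single characters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # longest candidate first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character: one token by itself
            i += 1
    return tokens

print(tokenize("template"))  # ['template'] — whole word, 1 token
print(tokenize("tempalte"))  # ['temp', 'al', 'te'] — transposition, 3 tokens
```

The transposed word never matches the full-word entry, so the greedy loop falls through to shorter fragments, which is why a one-character typo can triple the bill.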
