How fast is N tokens per second really?


TLDR

  • tokenspeed is an interactive tool that lets you watch text stream at real LLM throughput rates, from 5 tok/s (Raspberry Pi) to 800 tok/s (Cerebras), across code, text, think, and agent modes.

Key Takeaways

  • Four modes: code (syntax-highlighted), text (prose), think (reasoning + code interleaved), agent (tool calls + generation with pauses).
  • Keyboard shortcuts map to real-world hardware: key 1 = 5 tok/s, key 5 = 60 tok/s (hosted Claude/GPT), key 7 = 200 tok/s (Groq), key 9 = 800 tok/s (Cerebras).
  • Code is more token-dense than prose, so identical tok/s rates feel perceptually different depending on content type.
  • English prose averages ~1.3 tokens per word, so 30 tok/s is roughly 23 words/s. The tool's tokenization approximates BPE-style splitting; it does not use any vendor's encoder, so counts are estimates.
  • At 800 tok/s, according to the tool, the bottleneck is your eyeballs, not the model.
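
The conversion behind these takeaways is simple division, and a minimal sketch makes it concrete. It assumes the ~1.3 tokens-per-word average quoted above and a perfectly steady rate; the function names are hypothetical and are not taken from the tool, and only the key-to-rate pairs listed above are included.

```python
# Assumption: ~1.3 tokens per English word, the average quoted above.
TOKENS_PER_WORD = 1.3

# Only the key-to-rate pairs named in the takeaways; other keys are not listed there.
KEY_RATES = {1: 5, 5: 60, 7: 200, 9: 800}


def words_per_second(tok_per_s: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert token throughput into an approximate prose rate in words per second."""
    return tok_per_s / tokens_per_word


def seconds_to_stream(n_tokens: int, tok_per_s: float) -> float:
    """Time to emit n_tokens at a steady rate, ignoring any startup delay."""
    return n_tokens / tok_per_s


if __name__ == "__main__":
    for key, rate in KEY_RATES.items():
        print(
            f"key {key}: {rate} tok/s is about {words_per_second(rate):.0f} words/s; "
            f"500 tokens take {seconds_to_stream(500, rate):.1f} s"
        )
```

Because code is more token-dense than prose, the words-per-second figure only applies to the text mode; for code, the same token rate yields fewer visible characters per second.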

Hacker News Comment Review

  • Commenters generally found the perceptual calibration valuable; the gut-feel framing resonated more than the raw numbers alone.
  • Commenters disagreed on whether slow local speeds matter: one view is that at 20-30 tok/s output quality, not speed, is the real constraint; another is that thinking models add a warm-up of 1k+ tokens, which makes latency feel much worse in long sessions.

Notable Comments

  • @NitpickLawyer: reasoning models burn ~1k tokens before first output, plus prompt processing and context-growth slowdowns, compounding perceived latency significantly.
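
To put a rough number on the warm-up point, here is a back-of-the-envelope sketch. It assumes reasoning tokens are generated at the same steady rate as visible output and treats prompt processing as a fixed delay; the function name and the `prompt_seconds` parameter are hypothetical, and the context-growth slowdown mentioned in the comment is not modeled.

```python
def seconds_before_first_output(
    reasoning_tokens: int,
    tok_per_s: float,
    prompt_seconds: float = 0.0,
) -> float:
    """Estimate the wait before the first visible token appears.

    Assumption: hidden reasoning tokens stream at the same rate as visible
    output, and prompt processing is a fixed delay supplied by the caller.
    """
    return prompt_seconds + reasoning_tokens / tok_per_s


if __name__ == "__main__":
    # ~1k reasoning tokens, as in the comment above, at a few rates from this summary.
    for rate in (25, 60, 800):
        wait = seconds_before_first_output(1000, rate)
        print(f"{rate} tok/s: about {wait:.1f} s before the first visible token")
```

Under these assumptions, a local model at 25 tok/s spends about 40 seconds on a 1k-token warm-up, while an 800 tok/s backend spends just over a second, which is why the warm-up matters far more at the slow end.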
