tokenspeed is an interactive tool that lets you watch text stream at realistic LLM throughput rates, from 5 tok/s (Raspberry Pi) to 800 tok/s (Cerebras), across code, text, think, and agent modes.
Key Takeaways
Four modes: code (syntax-highlighted), text (prose), think (reasoning + code interleaved), agent (tool calls + generation with pauses).
Code is more token-dense than prose, so identical tok/s rates feel perceptually different depending on content type.
English prose averages ~1.3 tokens per word, so 30 tok/s is roughly 23 words/s; the tool's tokenization approximates BPE-style splitting rather than reproducing any vendor's encoder.
At 800 tok/s the stated bottleneck is your eyeballs, not the model.
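The token-to-word arithmetic in the takeaways can be sketched as a small conversion. This is only an illustration, not the tool's code: it assumes the ~1.3 tokens-per-word average quoted above holds uniformly, and the function name `words_per_second` is hypothetical.

```python
# Rough conversion from token throughput to word throughput.
# Assumption: ~1.3 tokens per English word, the average quoted in the summary.
TOKENS_PER_WORD = 1.3


def words_per_second(tokens_per_second: float,
                     tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert a token rate into an approximate word rate."""
    return tokens_per_second / tokens_per_word


if __name__ == "__main__":
    # The slowest, a mid-range, and the fastest rate mentioned in the summary.
    for rate in (5, 30, 800):
        wps = words_per_second(rate)
        print(f"{rate:>4} tok/s ~ {wps:6.1f} words/s ~ {wps * 60:8.0f} words/min")
```

Under that assumption, 30 tok/s comes out near 23 words/s and 800 tok/s exceeds 600 words/s, which is consistent with the claim that the reader, not the model, becomes the bottleneck. Code is more token-dense than prose, so the same ratio would not carry over to code mode.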
Hacker News Comment Review
Commenters generally found the perceptual calibration valuable; the gut-feel framing resonated more than the raw numbers alone.
Commenters disagreed on whether slow local speeds matter. One view is that quality is the real constraint at 20-30 tok/s; another notes that thinking models add a warmup of 1k+ tokens, which makes latency feel much worse in long sessions.
Notable Comments
@NitpickLawyer: reasoning models burn ~1k tokens before first output, plus prompt processing and context-growth slowdowns, compounding perceived latency significantly.
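A back-of-the-envelope estimate shows why the reasoning warmup matters at local speeds. The sketch below is a simplification under stated assumptions: the function name and parameters are hypothetical, prompt processing is modeled as a single constant prefill rate, and context-growth slowdowns are not modeled at all.

```python
from typing import Optional


def seconds_before_visible_output(reasoning_tokens: int,
                                  decode_tok_s: float,
                                  prompt_tokens: int = 0,
                                  prefill_tok_s: Optional[float] = None) -> float:
    """Estimate the wait before the first user-visible token.

    Assumptions (not from the source): reasoning tokens are generated at the
    same decode rate as visible output, and prompt processing runs at one
    constant prefill rate. Slowdowns from a growing context are ignored.
    """
    prefill_seconds = 0.0
    if prompt_tokens and prefill_tok_s:
        prefill_seconds = prompt_tokens / prefill_tok_s
    return prefill_seconds + reasoning_tokens / decode_tok_s


if __name__ == "__main__":
    # ~1k reasoning tokens, as in the comment, at a local rate and a fast rate.
    for rate in (25, 800):
        wait = seconds_before_visible_output(1000, rate)
        print(f"{rate:>4} tok/s: ~{wait:.2f} s before first visible token")
```

With these simplifications, 1,000 reasoning tokens at 25 tok/s means about 40 seconds of waiting before any answer text appears, compared with roughly a second at 800 tok/s. Real sessions would likely be worse, since the estimate leaves out the context-growth effect the comment mentions.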