Introducing talkie: a 13B vintage language model from 1930
TLDR
- Nick Levine, David Duvenaud, and Alec Radford released talkie, an Apache 2.0-licensed 13B model trained on 260B tokens of pre-1931 English text.
Key Takeaways
- Two checkpoints: talkie-1930-13b-base (53.1 GB), trained on out-of-copyright text, and talkie-1930-13b-it (26.6 GB), fine-tuned for chat using pre-1931 reference works.
- Fine-tuning used Claude Sonnet 4.6 as a judge for DPO and Claude Opus 4.6 for rejection-sampled synthetic chats, making the chat model not fully copyright-clean (a sketch of this judge-based preference-pair construction follows this list).
- Key research questions the team is probing: can the model predict post-cutoff events, independently derive General Relativity, or learn to write Python from few-shot examples?
- The team aims to eventually use vintage base models as their own judges to eliminate anachronistic influence from modern LLMs in post-training.
- Training required active contamination prevention to keep post-1931 text and modern LLM knowledge out of the corpus (a toy date-filtering sketch also follows this list).
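The post itself contains no code, but a minimal sketch may help make the judge-based DPO step concrete: an LLM judge compares two candidate completions, and the pair is stored in the usual (chosen, rejected) format for DPO training. Everything below is an illustrative assumption, not the talkie team's actual pipeline; in particular, call_judge is a stub standing in for a real Claude Sonnet API call.

```python
# Hypothetical sketch: building DPO preference pairs with an LLM judge.
# Not the talkie team's actual pipeline; call_judge is a placeholder
# for a real Claude Sonnet call.
import random


def call_judge(prompt: str) -> str:
    """Stub judge. A real implementation would query Claude Sonnet
    and return its verdict; here we answer randomly so the sketch runs."""
    return random.choice(["A", "B"])


def build_preference_pair(prompt: str, completion_a: str, completion_b: str) -> dict:
    """Ask the judge which completion is better and emit a
    (prompt, chosen, rejected) record in the standard DPO format."""
    verdict = call_judge(
        f"Prompt: {prompt}\n\n"
        f"Completion A: {completion_a}\n\n"
        f"Completion B: {completion_b}\n\n"
        "Which completion better answers in period-appropriate, pre-1931 "
        "English? Reply with exactly 'A' or 'B'."
    )
    if verdict.strip().upper().startswith("A"):
        chosen, rejected = completion_a, completion_b
    else:
        chosen, rejected = completion_b, completion_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    print(build_preference_pair(
        "Explain the wireless telegraph.",
        "The wireless telegraph conveys signals through the ether...",
        "Radio transmits information via modulated electromagnetic waves...",
    ))
```

Rejection sampling, as described for the Opus-generated synthetic chats, follows the same pattern except that the judge scores single candidates and only those passing some bar are kept.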
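Likewise, the contamination-prevention step reduces, at its simplest, to filtering every candidate document by publication date before training. The toy sketch below assumes each document carries a "year" metadata field; that schema is invented for illustration, as the post does not describe the talkie pipeline at this level of detail.

```python
# Hypothetical sketch of date-based corpus filtering. The per-document
# "year" field is an assumed metadata schema, not talkie's actual format.
CUTOFF_YEAR = 1931  # keep strictly pre-1931 text


def is_clean(doc: dict) -> bool:
    """Keep a document only if its publication year is known and pre-cutoff.
    Undated documents are excluded, erring on the side of purity."""
    year = doc.get("year")
    return isinstance(year, int) and year < CUTOFF_YEAR


corpus = [
    {"text": "A treatise on the luminiferous aether...", "year": 1904},
    {"text": "Attention is all you need...", "year": 2017},
    {"text": "Undated pamphlet...", "year": None},
]

clean_corpus = [doc for doc in corpus if is_clean(doc)]
print(len(clean_corpus))  # -> 1: only the 1904 treatise survives
```

In practice the harder half of the constraint is the "modern LLM knowledge" part: date metadata alone cannot catch modern synthetic text misattributed to old dates, which is presumably why the post calls the prevention "active".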
Why It Matters
- Demonstrates a viable path to fully out-of-copyright base model training; releasing the training data would be the remaining step toward a “vegan” LLM.
- The fine-tuning dependency on modern LLMs is an unsolved structural problem for era-pure models, not unique to talkie; Mr. Chatterbox faced the same constraint.
- The involvement of Alec Radford (GPT, GPT-2, Whisper) as a co-author signals that this is serious research, not a novelty project.
Simon Willison / Simon Willison’s Weblog · 2026-04-28 · Read the original