Thoughts on Historical Language Models and Talkie-1930

May 2, 2026 · ai history

TLDR

Talkie-1930, the largest historical LLM to date, is trained on pre-1930 public-domain texts. It behaves less like a reliable chatbot grounded in 1930 than like a temporally free-ranging ghost of its 19th-century corpus.

When asked 100 times what year it is, Talkie-1930 gives a median self-reported year of ~1860, making it more a 19th-century collective unconscious than a precise period simulator.
Training on English print culture skews Talkie’s personas toward male, London-based, literate professions: Physician, Journalist, Gentleman, Compositor.
“Talk to Lincoln” prompting on modern LLMs is a dead end; the higher-value use is fine-tuning a vintage model on a specific author’s corpus to probe that author’s conceptual landscape (not their inner life).
Counterfactual history is a credible application: push a 1911-cutoff model toward Einstein-era conceptual problems and observe what it considers thinkable vs. impossible.
Combining historical LLMs in debate or decision-making simulations is projected to become a new humanistic research field by the 2030s.
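The median-year statistic in the first bullet is easy to reproduce in principle. Below is a minimal sketch, assuming you already have the model's replies to repeated questions about the current year as plain strings. The regex, the function names (`extract_year`, `median_self_reported_year`), and the toy replies are hypothetical illustrations, not Talkie's actual evaluation code.

```python
import re
import statistics
from typing import Iterable, Optional

# Hypothetical pattern: the first four-digit year between 1500 and 1999 in a reply.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2})\b")


def extract_year(response: str) -> Optional[int]:
    """Return the first plausible year mentioned in a model reply, or None."""
    match = YEAR_RE.search(response)
    return int(match.group(1)) if match else None


def median_self_reported_year(responses: Iterable[str]) -> Optional[float]:
    """Median of the years found across repeated replies.

    Replies that mention no year are skipped rather than counted.
    """
    years = [y for y in (extract_year(r) for r in responses) if y is not None]
    return statistics.median(years) if years else None


if __name__ == "__main__":
    # Toy replies for illustration only; not real Talkie-1930 outputs.
    replies = [
        "It is the year 1858.",
        "Why, 1862, sir.",
        "I cannot say.",
        "The present year is 1871.",
    ]
    print(median_self_reported_year(replies))  # → 1862
```

Using the median rather than the mean keeps a handful of wildly anachronistic answers from dragging the estimate, which suits a model whose self-placement ranges freely across the century.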