Thoughts on Historical Language Models and Talkie-1930


TLDR

  • Talkie-1930, the largest historical LLM to date, is trained on pre-1930 public-domain texts. It behaves less like a chatbot reliably grounded in 1930 than like a ghost of its 19th-century corpus, drifting freely across decades.

Key Takeaways

  • Across 100 queries, Talkie-1930’s median self-reported year is ~1860, which makes it more a 19th-century collective unconscious than a precise period simulator.
  • Training on English print culture skews Talkie’s personas toward male, London-based, literate professions: Physician, Journalist, Gentleman, Compositor.
  • “Talk to Lincoln” prompting on modern LLMs is a dead end; the higher-value use is fine-tuning a vintage model on a specific author’s corpus to probe that author’s conceptual landscape (not their inner life).
  • Counterfactual history is a credible application: push a 1911-cutoff model toward Einstein-era conceptual problems and observe what it considers thinkable vs. impossible.
  • Combining historical LLMs in debate or decision-making simulations is projected to become a new humanistic research field by the 2030s.
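
The "median self-reported year" measurement in the first takeaway can be sketched roughly as follows. This is a minimal illustration, not the article's actual procedure: the prompt wording, the `ask` callable, the year-matching regex, and the `fake_model` stand-in are all assumptions made for the example, and no real Talkie-1930 API is used.

```python
import itertools
import re
import statistics
from typing import Callable, Optional

# Assumption: a self-reported year shows up as a four-digit number 1000-1999.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3})\b")


def extract_year(reply: str) -> Optional[int]:
    """Return the first four-digit year found in a reply, or None."""
    match = YEAR_PATTERN.search(reply)
    return int(match.group(1)) if match else None


def median_self_reported_year(
    ask: Callable[[str], str],
    prompt: str = "What year is it?",  # hypothetical prompt wording
    trials: int = 100,
) -> Optional[float]:
    """Query the model `trials` times and take the median of the parsed years.

    Replies with no recognizable year are skipped rather than guessed at.
    """
    years = []
    for _ in range(trials):
        year = extract_year(ask(prompt))
        if year is not None:
            years.append(year)
    return statistics.median(years) if years else None


# Hypothetical stand-in for a sampled model, cycling through canned replies.
_canned = itertools.cycle(
    ["It is 1848.", "The year of our Lord 1860.", "I cannot say.", "Surely 1895."]
)


def fake_model(prompt: str) -> str:
    return next(_canned)


result = median_self_reported_year(fake_model)
```

The median is used rather than the mean so that a few wildly early or late answers do not dominate the summary; with a real model, `ask` would wrap whatever sampling interface is available.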

Hacker News Comment Review

  • No substantive HN discussion yet.
