AI that only knows things from before the Great Depression: Introducing talkie, a 13B language model trained on 260B tokens of historical pre-1931 English text.
- A 13B language model trained exclusively on pre-1931 text, talkie can code in Python, can't discuss the New Deal, and still occasionally hallucinates Franklin Roosevelt's presidency into existence, because keeping history out of history is harder than it sounds.
- Vintage LMs are free of modern benchmark contamination by construction, which makes them useful for testing genuine generalization: can a model with no knowledge of digital computers learn to write Python from a handful of in-context examples? Barely, but performance improves with scale (see the prompt sketch after this list).
- OCR noise is the quiet killer here: training on raw historical scans yields only 30% of the learning efficiency of training on human-transcribed text, and modern VLM-based OCR helpfully hallucinates contemporary facts into 19th-century books (a filtering sketch follows the prompt example below).
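A minimal sketch of the kind of few-shot probe the generalization question implies, assuming a plain text-completion interface via Hugging Face `transformers`; the checkpoint id `example-org/talkie-13b` and the prompt are hypothetical stand-ins, not the project's actual evaluation:

```python
# Hypothetical few-shot probe: can the model continue a Python pattern
# it has only ever seen in the prompt? The model id is a placeholder.
from transformers import pipeline

FEW_SHOT_PROMPT = """\
# Task: write a Python function from its description.

# Description: return the square of a number.
def square(x):
    return x * x

# Description: return the sum of a list of numbers.
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s

# Description: return the largest number in a list.
def largest(xs):
"""

generator = pipeline("text-generation", model="example-org/talkie-13b")
completion = generator(
    FEW_SHOT_PROMPT,
    max_new_tokens=64,
    do_sample=False,         # greedy decoding keeps the probe reproducible
    return_full_text=False,  # keep only the model's continuation
)[0]["generated_text"]
print(completion)
```

A model whose training data predates digital computers can only complete `largest` by generalizing from the in-context examples, which is exactly the contamination-free test the bullet describes.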
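On the OCR point, one pragmatic mitigation is to filter scanned pages with a cheap noise heuristic before training. This is a minimal sketch under assumed parameters; the tiny stand-in vocabulary and the 0.5 threshold are illustrative, not the project's actual pipeline, which would use a full dictionary and a tuned cutoff:

```python
import re

# Tiny stand-in vocabulary; a real filter would load a large wordlist.
KNOWN_WORDS = {
    "the", "of", "and", "to", "in", "a", "is", "that", "it", "was",
    "for", "on", "are", "as", "with", "his", "they", "at", "be", "times",
}
# Illustrative cutoff; with a full dictionary a much higher bar makes sense.
MIN_KNOWN_FRACTION = 0.5

def known_word_fraction(page: str) -> float:
    """Fraction of alphabetic tokens that appear in the vocabulary."""
    tokens = re.findall(r"[a-z]+", page.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in KNOWN_WORDS) / len(tokens)

def keep_page(page: str) -> bool:
    """Keep a page only if its OCR output looks clean enough to train on."""
    return known_word_fraction(page) >= MIN_KNOWN_FRACTION

clean = "it was the best of times and it was the worst of times"
noisy = "tbe c@t s4t 0n tlie rnat"  # classic OCR character confusions
print(keep_page(clean), keep_page(noisy))  # True False
```

Note that a lexical filter like this catches garbled scans but not the opposite failure mode above: VLM-based OCR that silently rewrites a damaged page into fluent, anachronistic text would sail straight through.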