Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

Blog | Simon Willison | Mar 30, 2026

AI Summary

A 340M-parameter LLM trained entirely on 28,000 out-of-copyright Victorian British Library texts — zero post-1899 input. A fascinating answer to the training data copyright debate: what if you could train an LLM that couldn't possibly be stealing from anyone?

From the original

Trip Venturella released Mr. Chatterbox, a language model trained entirely on out-of-copyright text from the British Library. Here’s how he describes it in the model…

Content attributed to the original author (Simon Willison).