Simon Willison

LLM Blog

Simon Willison's LLM blog. He coined 'prompt injection' and writes about 'agentic engineering patterns'. Hands-on frontier model reviews, Claude Code deep dives, and the Datasette ecosystem, from simonwillison.net.

20 articles · 0 phases · Updated 4/15/2026
Curated by @hawking520
About this vault

Simon Willison coined 'prompt injection' and writes extensively about 'agentic engineering'; his LLM blog is essential reading for AI engineers. Curated from simonwillison.net: frontier model benchmarks via his pelican-on-a-bicycle test, agentic engineering patterns, Claude Code case studies, and the Datasette ecosystem he maintains. Auto-synced daily. Each piece has an AI summary so you can triage in seconds.

Agentic Engineering & Vibe Coding

How AI coding agents are reshaping software development. Simon Willison's 'Agentic Engineering Patterns' newsletter and case studies on vibe coding, prompt injection, and Claude Code workflows — from one of the web's most-cited LLM bloggers.

Eight years of wanting, three months of building with AI

Simon Willison's favorite long-form piece on agentic engineering: Lalit Maganti spent 8 years thinking about building SQLite devtools, then 3 months shipping them with Claude Code. A case study in how AI coding agents flatten the 'I need to fully understand this' procrastination curve.

GLM-5.1: Towards Long-Horizon Tasks

Z.ai's new 754B parameter MIT-licensed model targets long-horizon agentic tasks. Simon's pelican benchmark produces a surprisingly good SVG plus unprompted CSS animations — a first from any open weights model.

Highlights from my conversation about agentic engineering on Lenny's Podcast

Simon Willison on Lenny Rachitsky's podcast: we've passed the November 2025 inflection point on coding agents, and software engineers are the bellwether for how other information workers will adopt AI. Also covered: responsible vibe coding, writing code on your phone, and what comes next.

Vibe coding SwiftUI apps is a lot of fun

Simon vibe codes macOS apps with Claude Opus 4.6 and GPT-5.4 — a full SwiftUI app fits in a single text file, no Xcode required. He builds Bandwidther (network monitor) and Gpuer (GPU monitor) as menu bar tools in one sitting.

My fireside chat about agentic engineering at the Pragmatic Summit

Simon's fireside chat on the stages of agentic engineering: from 'ChatGPT as occasional helper' to 'the agent writes more code than I do' to 'supervising multiple parallel agents.' The inflection points that separate the phases — and the skills each one demands.

Perhaps not Boring Technology after all

Two years ago LLMs pushed developers toward popular languages because training data was richer. Simon argues this is no longer true — modern coding agents consume docs via 'uvx tool --help' and work fine with brand new, obscure libraries. The 'boring technology' argument may be dead.
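As a rough illustration of the "read the help text" workflow described above, here is a minimal Python sketch of how an agent harness might capture a command's `--help` output before using the tool. The function name `read_cli_help` is hypothetical, and the demo uses Python's bundled `json.tool` module as a stand-in for `uvx tool --help`, so it runs without uv installed; it is not taken from Simon's post.

```python
import subprocess
import sys


def read_cli_help(command: list[str], timeout: float = 30.0) -> str:
    """Run `command --help` and return whatever text it prints.

    Hypothetical helper: a sketch of one way an agent could learn an
    unfamiliar CLI from its own documentation.
    """
    result = subprocess.run(
        [*command, "--help"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # some tools exit non-zero after printing help
    )
    # Most tools print help to stdout; a few use stderr, so fall back to it.
    return result.stdout or result.stderr


if __name__ == "__main__":
    # Stand-in for `uvx tool --help`; swap in ["uvx", "<tool>"] if uv is installed.
    help_text = read_cli_help([sys.executable, "-m", "json.tool"])
    print(help_text[:200])
```

The returned text would then be placed in the model's context, which is why a library's absence from training data matters less than it once did.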

Can coding agents relicense open source through a "clean room" implementation of code?

The chardet library debate: can coding agents pull the Compaq BIOS clean-room trick in hours instead of months? Simon on the legal and ethical gray zone of AI-generated 'clean room' reimplementations that sidestep copyleft licenses.

Steve Yegge on AI adoption at Google

A Google insider claims that 60% of engineers are still on basic chat tools and 20% are agentic power users, the same adoption curve as John Deere. Demis Hassabis publicly rebutted the claim. The exchange captures the messy middle of AI adoption even at frontier labs.

Frontier LLM Releases & Benchmarks

Simon's hands-on notes on the latest frontier LLMs — Muse Spark, GLM-5.1, GPT-5.4 mini/nano, Claude Mythos, Google Gemma 4. Real-world testing via the legendary pelican-on-a-bicycle benchmark.

Google AI Edge Gallery

Google's official iOS app for running Gemma 4 models locally on iPhone. The 2.54GB E2B model is fast and actually useful — image Q&A, audio transcription, and tool calling demos across 8 interactive HTML widgets.

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Meta's first major release since Llama 4 — Muse Spark launches as hosted-only with Instant/Thinking modes and benchmarks competitive with Opus 4.6 and Gemini 3.1 Pro. Simon runs it through his signature pelican-on-a-bicycle benchmark.

Anthropic's Project Glasswing—restricting Claude Mythos to security researchers—sounds necessary to me

Anthropic held back Claude Mythos from general release due to its offensive security capabilities — the model found thousands of zero-days across every major OS and browser. Project Glasswing restricts access to security researchers until defenders catch up.

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

A 340M-parameter LLM trained entirely on 28,000 out-of-copyright Victorian British Library texts — zero post-1899 input. A fascinating answer to the training data copyright debate: what if you could train an LLM that couldn't possibly be stealing from anyone?

GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52

GPT-5.4 nano lands at $0.20/$1.25 per million input/output tokens, cheaper than Gemini 3.1 Flash-Lite. Simon calculates it can describe 76,000 photos for $52, which suggests vision-language work is now close to free at scale.
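To put the headline figure in per-photo terms, the short Python sketch below works only from the numbers quoted above. It assumes the two prices are input and output rates respectively (the usual convention, but an assumption here). The per-photo token counts Simon actually measured are not given in this summary, so the token figures are upper bounds derived from the total, not his data.

```python
# Figures quoted in the summary above.
TOTAL_COST_USD = 52.0
PHOTOS = 76_000
INPUT_PRICE_PER_M = 0.20   # USD per million tokens (assumed: input rate)
OUTPUT_PRICE_PER_M = 1.25  # USD per million tokens (assumed: output rate)

# Average cost per photo implied by the headline numbers.
cost_per_photo = TOTAL_COST_USD / PHOTOS
cost_per_thousand = cost_per_photo * 1000

# Upper bounds on tokens per photo: what the budget would buy if it were
# spent entirely on output tokens, or entirely on input tokens.
max_output_tokens_per_photo = TOTAL_COST_USD / OUTPUT_PRICE_PER_M * 1_000_000 / PHOTOS
max_input_tokens_per_photo = TOTAL_COST_USD / INPUT_PRICE_PER_M * 1_000_000 / PHOTOS

print(f"cost per photo:        ${cost_per_photo:.6f}")
print(f"cost per 1,000 photos: ${cost_per_thousand:.2f}")
print(f"output-only bound:     {max_output_tokens_per_photo:.0f} tokens/photo")
print(f"input-only bound:      {max_input_tokens_per_photo:.0f} tokens/photo")
```

The real mix of input and output tokens falls somewhere between the two bounds, so each photo costs well under a tenth of a cent either way.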

Gemma 4 audio with MLX

One-line uv recipe to run Gemma 4 audio transcription locally on macOS with mlx-vlm. Demo has minor misheard words but runs fully offline. The 2026 local-first AI pattern in 8 commands.

Security & Investigative Analysis

Security research and investigative experiments: supply chain attacks, social engineering threat models, LLM-powered user profiling, and the new attack surfaces that emerge when AI meets open source.

Content attributed to original authors. Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.