LLM Knowledge Base
A curated, machine-readable knowledge collection that LLMs can query directly — no RAG pipeline required.
Term popularized by Andrej Karpathy (@karpathy), early 2026.
What it is, why now
An LLM knowledge base is a structured collection of curated content that large language models can access and reason over directly. Unlike RAG (Retrieval-Augmented Generation), which requires vector databases, embedding pipelines, and chunking strategies, an LLM knowledge base works through plain structured formats — Markdown or JSON — that fit inside a model's context window, or through MCP tools the model can query at run time.
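As a sketch of what "structured formats" can mean in practice, here is one hypothetical entry schema for a single curated article. The field names are illustrative assumptions, not a standard:

```python
import json

# Hypothetical knowledge-base entry: one curated article as
# machine-readable JSON. Field names are illustrative only.
entry = {
    "title": "Why Context Beats Retrieval",
    "url": "https://example.com/context-beats-retrieval",
    "summary": "Argues that curated context outperforms large RAG indexes.",
    "key_points": [
        "Curation quality matters more than corpus size",
        "Small collections fit directly in the context window",
    ],
    "relevance": 0.9,
}

# Serialized, an entry like this stays small enough that hundreds
# can be concatenated into a single prompt.
blob = json.dumps(entry, indent=2)
print(len(blob))
```

Because each entry is self-describing, the same file serves both a human reader and a model prompt with no retrieval step in between.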
The concept was crystallized by Andrej Karpathy in early 2026 when he described maintaining a personal 'LLM wiki' — a curated set of documents that he feeds to Claude or GPT when he needs deep answers on specific topics. His insight: the bottleneck isn't the model's intelligence, it's the quality of the knowledge you give it. Garbage in, garbage out. A carefully curated 50-article collection beats a 10,000-document RAG index.
For individual knowledge workers, this changes the economics of personal knowledge management. Instead of building elaborate Notion databases or Obsidian vaults that only you can search, you build a collection that both you and your AI agents can query. Burn 451's vault is exactly this: articles you've read and curated, with AI-generated metadata (summaries, key points, relevance scores), exposed through MCP so any AI agent can search your reading history.
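The kind of lookup an agent performs over such a vault can be sketched with a simple keyword ranker over entries carrying AI-generated metadata. This is a stand-in for what an MCP search tool might expose, not Burn 451's actual API:

```python
from typing import Any

def search_vault(entries: list[dict[str, Any]], query: str,
                 limit: int = 3) -> list[dict[str, Any]]:
    """Rank curated entries by keyword overlap with the query,
    breaking ties with the stored relevance score. A hypothetical
    sketch of an MCP-style search tool, not a real API."""
    terms = set(query.lower().split())

    def score(e: dict[str, Any]) -> tuple[int, float]:
        text = (e["title"] + " " + e.get("summary", "")).lower()
        hits = sum(1 for t in terms if t in text)
        return (hits, e.get("relevance", 0.0))

    ranked = sorted(entries, key=score, reverse=True)
    return [e for e in ranked if score(e)[0] > 0][:limit]

vault = [
    {"title": "RAG pipelines explained", "summary": "Vector search basics",
     "relevance": 0.6},
    {"title": "Curating an LLM wiki", "summary": "Feeding curated docs to models",
     "relevance": 0.9},
]
print(search_vault(vault, "curated wiki")[0]["title"])
```

The point is that at personal-collection scale, exact keyword matching over curated metadata is often enough; no embeddings are involved.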
The architectural advantage over RAG: no embedding drift, no chunking artifacts, no retrieval failures from semantic mismatches. The tradeoff: it works best at hundreds to low thousands of documents, not millions. For personal knowledge, that's the sweet spot.
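The "hundreds to low thousands" ceiling follows from simple token arithmetic. A rough capacity estimate, where the per-document token figure and overhead are assumptions:

```python
def docs_that_fit(context_tokens: int, tokens_per_doc: int,
                  overhead: int = 2_000) -> int:
    """Rough estimate of how many curated documents fit in one
    context window, reserving `overhead` tokens for the prompt and
    the model's answer. All figures are illustrative assumptions."""
    return max(0, (context_tokens - overhead) // tokens_per_doc)

# A ~500-token AI summary per article against a 200K-token window:
print(docs_that_fit(200_000, 500))  # hundreds of summaries fit directly
```

Full article texts shrink that number sharply, which is why curated summaries, not raw documents, are what usually get loaded.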
How we got here
- 2023
RAG becomes the default
Retrieval-Augmented Generation dominates enterprise AI. Every knowledge base gets a vector database, embedding pipeline, and chunking strategy. Works at scale, but overkill for personal use.
- Late 2024
Context windows expand dramatically
Claude and GPT push context windows past 100K tokens. Suddenly, small curated collections can be loaded directly — no retrieval pipeline needed. The RAG-for-everything assumption starts cracking.
- Early 2025
MCP creates a knowledge API standard
Anthropic launches Model Context Protocol. Tools can expose structured data to any AI agent. This makes 'knowledge as a service' possible without custom integrations.
- Feb 2026
Karpathy describes the LLM Wiki
Andrej Karpathy shares his workflow of maintaining a curated document set specifically for LLM consumption. The post goes viral — 99K bookmarks on the original thread. The term 'LLM Wiki' enters common usage.
- Apr 2026
Consumer tools adopt the pattern
Burn 451's vault + MCP server implements the LLM Wiki pattern for non-technical users: read articles → curate to vault → AI agents can query your knowledge via MCP. No code required.
Related concepts
AI Bookmark Management
AI bookmark management provides the curation layer — what goes into your LLM knowledge base. Without intelligent triage, the knowledge base fills with noise.
Agentic Engineering
AI coding agents are the primary consumers of LLM knowledge bases. An engineer's curated reading becomes context for agent-driven development.
Personal Knowledge Base
A personal knowledge base is the human-facing side of what an LLM knowledge base makes machine-readable. Same content, two interfaces.
Want to read more like this?
Burn 451 is a reading tool that helps you actually finish articles instead of hoarding them. Import a Vault, set a timer, read what matters.
Concept page curated by @hawking520 · Burn 451 · Last updated 2026-04-18