
Agent Memory Patterns

MemGPT · Letta · Zep · LangGraph · MemOS

The reading list for understanding how AI agents remember. MemGPT, Letta, Zep/Graphiti, LangGraph, MemOS, and the live debate over hierarchical vs graph vs episodic memory.

18 articles · 5 phases · Updated 4/22/2026
Curated by burn451

Context windows aren't memory. Whoever solves the persistence layer for agents owns the economics of the next five years.

About this vault

Context windows are not memory. Every serious agent built after 2023 has had to answer the same question: where does the state go when the conversation ends? This vault tracks the four schools that emerged.

The hierarchical school started with Charles Packer's MemGPT paper (Berkeley, October 2023), which framed the LLM as an operating system paging memory between tiers of storage. The MemGPT authors spun out Letta, which now ships memory blocks as an API, sleep-time compute as a second agent that rewrites state during idle windows, and a production agent server.

The graph school is led by Zep: Graphiti (August 2024) is a temporal knowledge graph that beat MemGPT on the Deep Memory Retrieval benchmark and tracks how facts change over time. The episodic school comes from Stanford's Generative Agents (Park et al., 2023), whose memory stream of observations, reflections, and plans became the template for persistent characters. The hybrid school is everywhere else — Mem0, Cognee, and MemOS stitching vector + graph + key-value into single pipelines.

Meanwhile LangChain shipped LangMem, Anthropic released a memory tool in Claude Sonnet 4.5, OpenAI added ChatGPT memory at consumer scale, and Cloudflare launched Agent Memory. Harrison Chase then argued the real point in April 2026: whoever owns your agent's harness owns your memory. Karpathy's LLM Wiki reframed the whole thing as an IDE-for-knowledge problem. This list picks the pieces that set the terms of the debate, not the ones that summarize it.


Foundational Papers (2023)

The papers that set the vocabulary every downstream product still uses. Packer's MemGPT treats the LLM as an operating system paging memory. Park's Generative Agents introduces the memory stream with reflection. Weng's taxonomy codifies agent = LLM + planning + memory + tools. If you read nothing else, read these three.

MemGPT: Towards LLMs as Operating Systems

The paper that started the modern agent memory conversation. Packer and the Berkeley Sky Lab team argue that LLMs should be treated like operating systems, with a context window as fast RAM and external storage as disk. MemGPT introduces virtual context management, function-calling interrupts, and a self-editing memory architecture that lets agents page information in and out. Everything downstream — Letta, memory blocks, sleep-time compute — is a product decision built on top of this paper's vocabulary. If you read one thing, read this first, because it defines the terms the rest of the field argues about.

Generative Agents: Interactive Simulacra of Human Behavior

Stanford and Google's Smallville paper: 25 AI villagers plan a Valentine's Day party without human scripting. The technical contribution is the memory stream — a time-stamped log of observations scored by recency, importance, and relevance, plus a reflection loop that periodically abstracts high-importance memories into generalizations the agent can act on. This is the episodic school's founding document, and anyone building a companion, NPC, or long-horizon assistant ends up rebuilding something that looks like this, often without realizing it. Predates MemGPT by six months. Read it for the retrieval formula and the reflection loop; both still work.
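The retrieval formula is simple enough to sketch. A minimal Python version of the paper's scoring — recency as exponential decay, an LLM-assigned 1–10 importance score, and embedding relevance, weighted equally — looks roughly like this; the dict keys and parameter names here are illustrative, not the paper's code:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieval_score(memory, query_embedding, now_hours, decay=0.995):
    # Recency: exponential decay on hours since the memory was last accessed.
    recency = decay ** (now_hours - memory["last_accessed_hours"])
    # Importance: an LLM-rated 1-10 "poignancy" score, scaled to [0, 1].
    importance = memory["importance"] / 10
    # Relevance: similarity between the query and the memory's embedding.
    relevance = cosine_similarity(query_embedding, memory["embedding"])
    # The paper weights the three terms equally.
    return recency + importance + relevance

def retrieve(memories, query_embedding, now_hours, k=3):
    # Top-k entries from the memory stream by combined score.
    return sorted(
        memories,
        key=lambda m: retrieval_score(m, query_embedding, now_hours),
        reverse=True,
    )[:k]
```

The retrieved entries are what gets fed to the reflection loop, which periodically writes new high-level memories back into the same stream.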

LLM Powered Autonomous Agents

The canonical taxonomy post. Weng, then at OpenAI, wrote the piece that every agent framework deck still cites: agent = LLM + planning + memory + tool use. The memory section breaks it into sensory (embeddings), short-term (in-context), and long-term (external vector store with MIPS retrieval), mapping cognitive science onto transformer architecture. This is where the field got its shared vocabulary in mid-2023, before MemGPT or Graphiti existed. Load-bearing because every subsequent product pitch is a variation on these four boxes.

Birth of BabyAGI

Nakajima's first-person account of how a weekend hack became the ancestor of every task-loop agent. BabyAGI stored task/result pairs as embeddings in Pinecone and used them as crude long-term memory, feeding past outcomes back into task generation. The architecture is naive by 2026 standards, but its influence is outsized: the three-agent loop (execute, create, prioritize) showed up later in Letta's agent server, in LangGraph's state machines, and in Park's generative agents. Worth reading for how early this stack felt figured out, and how much was missing.
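The three-step loop is worth seeing in miniature. This is a toy reconstruction under stated assumptions — the real BabyAGI called an LLM for execution and task creation and stored embeddings in Pinecone; both are stubbed here, and the prioritization step is reduced to a sort:

```python
from collections import deque

def execute(task, memory):
    # Stub for the execution agent; real BabyAGI prompted an LLM with
    # relevant past results retrieved from Pinecone.
    return f"done: {task} (seen {len(memory)} prior results)"

def create_tasks(objective, result):
    # Stub for the task-creation agent.
    return [f"follow up on {result[:20]}"] if "done" in result else []

def run(objective, first_task, steps=3):
    tasks = deque([first_task])
    results = []  # crude long-term memory: past task/result pairs
    for _ in range(steps):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(task, results)                 # 1. execute
        results.append((task, result))
        tasks.extend(create_tasks(objective, result))   # 2. create
        tasks = deque(sorted(tasks))                    # 3. prioritize
    return results
```

Even in this stripped form you can see the memory feedback: each execution sees the accumulated task/result history, which is the whole trick.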

Memory Blocks: The Key to Agentic Context Management

Letta's product thesis in article form. Memory blocks are labeled, character-limited sections of the context window that persist across turns — a direct descendant of MemGPT's core memory, now shipped as an API. The post argues that treating context as structured slots (persona, human, project) rather than a flat string is what separates agents with stable identity from chatbots that reset. This is the piece to read before evaluating any memory platform, because it names the abstraction most others are quietly copying.
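The abstraction is easy to sketch. A minimal version of the structured-slots idea — labeled, character-limited, agent-editable — might look like this; the class and method names are hypothetical, not Letta's actual API:

```python
class MemoryBlock:
    """A labeled, character-limited slot of context that persists
    across turns. Illustrative sketch, not Letta's real interface."""

    def __init__(self, label, limit=2000, value=""):
        self.label = label
        self.limit = limit
        self.value = value

    def rewrite(self, new_value):
        # The agent edits its own block via a tool call; writes beyond
        # the limit are rejected so one block can never crowd the
        # rest of the context window.
        if len(new_value) > self.limit:
            raise ValueError(f"block '{self.label}' over {self.limit} chars")
        self.value = new_value

def render_context(blocks):
    # Blocks compile into the prompt as named sections, not a flat string.
    return "\n\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in blocks)
```

The character limit is the load-bearing constraint: it is what forces the agent to summarize and edit rather than append forever.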

Agent Memory: How to Build Agents that Learn and Remember

Letta's July 2025 deep dive on the three-layer memory model used in production agents: message buffer for recent turns, editable in-context memory for stable identity and project state, and external archival storage for everything else. The post works through when to read vs. write memory, how to handle memory pressure (what MemGPT called overflow), and why most teams under-invest in memory editing and over-invest in retrieval. It also lays out the agent-learns-over-time loop as a sequence of specific tool calls rather than hand-waving. Useful as a reality check after reading the academic papers — the production answer is messier than the benchmark setup suggests.
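The three layers and the overflow path can be sketched in a few lines. This is a minimal illustration of the model the post describes, with an assumed eviction policy (oldest turn moves to archive) and a substring search standing in for embedding retrieval:

```python
from collections import deque

class ThreeLayerMemory:
    """Sketch of the three-layer model: bounded message buffer,
    editable in-context memory, external archival storage.
    Names and policies here are illustrative."""

    def __init__(self, buffer_size=20):
        self.buffer = deque(maxlen=buffer_size)  # recent turns
        self.core = {}     # stable identity / project state, agent-editable
        self.archive = []  # everything evicted from the buffer

    def append_message(self, msg):
        # On memory pressure (MemGPT's "overflow"), the oldest turn is
        # written out to archival storage instead of being dropped.
        if len(self.buffer) == self.buffer.maxlen:
            self.archive.append(self.buffer[0])
        self.buffer.append(msg)

    def core_memory_replace(self, key, value):
        # Memory *editing* is an explicit tool call, not a side effect.
        self.core[key] = value

    def archival_search(self, term):
        # A real system would rank by embedding similarity; substring
        # match keeps the sketch self-contained.
        return [m for m in self.archive if term in m]
```

The post's point maps directly onto this shape: most teams tune `archival_search` endlessly and barely touch `core_memory_replace`, which is backwards.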

Zep Is The New State of the Art In Agent Memory

Zep's January 2025 benchmark post claiming SOTA on the Deep Memory Retrieval task (94.8% vs MemGPT's 93.4%) and the newer LongMemEval benchmark, at roughly 90% lower latency and under 2% of the baseline token cost compared to stuffing full transcripts into context. The piece is worth reading past the marketing because it forces the question the field keeps dodging: how do you actually benchmark memory? DMR was built by MemGPT's authors and naturally favors their architecture. LongMemEval is newer and arguably harder but still narrow. The methodology debate this kicked off is ongoing and unresolved — every memory platform now claims SOTA on a different test.

MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

China's answer to MemGPT, from Shanghai Jiao Tong and collaborators, May 2025. MemOS argues that LLMs currently juggle three incompatible memory types — parametric (weights), activation (runtime state), and plaintext (RAG) — with no unified interface. Their proposal is MemCube, a standardized container that can hold any of the three and supports tracking, fusion, and migration between them. Ambitious in scope and early in implementation, but it's the most aggressive attempt so far at treating memory as a first-class system resource rather than a retrieval problem.

The Hierarchical School — Letta

The MemGPT team turned the paper into a production platform. Memory blocks as an API, sleep-time compute as a second agent rewriting state during idle windows, and an agent server that enterprise teams actually deploy. Read Letta to understand what ships when you treat memory as infrastructure, not a prompt trick.

Hybrid & Platform Memory

Mem0 and MemOS stitch vector + graph + key-value into one pipeline. OpenAI, Anthropic, Cloudflare, and LangChain ship memory as a platform feature — ChatGPT memory, Claude's memory tool, Cloudflare Agent Memory, LangMem. This is where memory stops being a product and starts being a commodity.

LangMem SDK for agent long-term memory

LangChain's February 2025 release of LangMem, a storage-agnostic SDK for building the three memory types — semantic (facts), episodic (past interactions as few-shot examples), procedural (saved as updated prompts) — into any agent. The post is less a product launch and more a normative claim: these are the three types that matter, this is how you implement each, and your agent framework should be able to compose them rather than pick one. Pair with the LangGraph store documentation, which operationalizes the same taxonomy with namespaces and checkpointers. Useful because it commits to a vocabulary when most frameworks are still hand-waving about what memory even means.
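The three-type taxonomy is concrete enough to illustrate. A hypothetical sketch of the split — semantic facts, episodic interactions replayed as few-shot examples, and procedural knowledge saved as an updated prompt — with names invented for this sketch, not LangMem's real SDK surface:

```python
semantic = {}    # facts, keyed by (namespace, key)
episodic = []    # past interactions, reusable as few-shot examples
procedural = {"system_prompt": "You are a helpful assistant."}

def remember_fact(namespace, key, value):
    # Semantic memory: discrete facts retrieved by key or similarity.
    semantic[(namespace, key)] = value

def log_episode(user_msg, agent_msg, outcome):
    # Episodic memory: whole interactions, kept for in-context replay.
    episodic.append({"user": user_msg, "agent": agent_msg,
                     "outcome": outcome})

def update_procedure(new_prompt):
    # Procedural memory is saved as an *updated prompt*, not a fact:
    # the agent changes how it behaves, not what it knows.
    procedural["system_prompt"] = new_prompt
```

The normative claim is that a framework should let you compose all three, which is exactly what LangGraph's store namespaces are for.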

Your harness, your memory

Chase's April 2026 argument that the agent harness — the code that orchestrates model calls, tools, and state — is inseparable from memory, and closed harnesses are a trap. If Anthropic or OpenAI holds your agent's memory inside their proprietary SDK, you don't own your agent. This is the clearest political statement in the field so far: memory format equals lock-in, and open-source harnesses like LangChain's Deep Agents are a hedge against it. Short, opinionated, landed hard in the AI engineering community.

Effective context engineering for AI agents

Anthropic's September 2025 framing post, published alongside Claude Sonnet 4.5 and the memory tool beta. The core move is to stop talking about prompts and start talking about the attention budget: what's the smallest set of high-signal tokens that gets the model to do the right thing? Memory enters as 'structured note-taking,' with concrete examples from Claude Code and Claude playing Pokémon. This is the piece that made 'context engineering' the default frame for 2026, absorbing agent memory into a bigger category.

Memory and new controls for ChatGPT

OpenAI's February 2024 launch post for ChatGPT memory — the first mainstream consumer product to ship persistent cross-session memory. The mechanism is deliberately vague in the post but amounts to automatic summarization of prior chats injected into system context. It reaches hundreds of millions of users with no technical setup, which makes it the most-used agent memory system on earth, and also the one users complain about most (stale facts, wrong inferences, unclear provenance). Read it to see what memory at consumer scale actually looks like, versus what the papers claim.

Introducing Mem0

Mem0's September 2024 launch post from founder Taranjeet Singh. The product thesis is hybrid storage — vector for semantic similarity, graph for relationships between entities, key-value for fast fact retrieval — wrapped in an extract-store-retrieve pipeline that runs automatically on chat history instead of making developers write memory logic by hand. Later benchmark claims against OpenAI's built-in memory: 26% higher accuracy, 91% lower latency, 90% token savings. Numbers aside, Mem0 is worth reading because it's the cleanest articulation of the hybrid school's position: you don't pick one index, you compose three, and the integration layer is the product.
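The compose-three-indexes idea can be sketched directly. This is an illustrative reconstruction of the extract-store-retrieve pipeline as the post describes it, not Mem0's actual API; the fan-out-on-store and compose-on-retrieve shape is the point:

```python
class HybridMemory:
    """Sketch of hybrid storage: every extracted fact fans out to a
    vector index, a graph of triples, and a key-value store, and
    retrieval composes them. Names here are hypothetical."""

    def __init__(self):
        self.vectors = []  # (embedding, fact) for semantic similarity
        self.graph = []    # (subject, relation, object) triples
        self.kv = {}       # key -> latest fact, for fast exact lookup

    def store(self, fact, embedding, triple=None, key=None):
        # One extracted fact, written to every index that applies.
        self.vectors.append((embedding, fact))
        if triple:
            self.graph.append(triple)
        if key:
            self.kv[key] = fact

    def retrieve(self, key=None, entity=None):
        # Compose the indexes: exact key hits first, then facts
        # connected to the entity in the graph. A real system would
        # also rank vector hits by embedding similarity.
        hits = []
        if key and key in self.kv:
            hits.append(self.kv[key])
        if entity:
            hits += [f"{s} {r} {o}" for s, r, o in self.graph
                     if s == entity or o == entity]
        return hits
```

The product, per the post, is everything this sketch omits: the automatic extraction from chat history and the ranking logic that merges the three result sets.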

The Harness War (2026)

Karpathy's LLM Wiki reframed memory as an IDE-for-knowledge problem. Harrison Chase argued whoever owns your agent's harness owns your memory — closed-source harnesses are lock-in dressed up as convenience. Simon Willison's agent-vocabulary post marked when the field finally aligned. The political layer of memory, not the technical.


Frequently asked questions

What is Agent Memory Patterns?

Agent Memory Patterns is a Burn 451 vault focused on MemGPT, Letta, Zep/Graphiti, LangGraph, and MemOS: a reading list for understanding how AI agents remember, including the live debate over hierarchical vs graph vs episodic memory.

How was the Agent Memory Patterns vault curated?

The Agent Memory Patterns vault was hand-curated by the Burn 451 editorial team from publicly available essays, blog posts, podcast transcripts, and social threads. Each piece includes an AI-generated summary so readers can triage in seconds. The vault auto-syncs as new writing on agent memory is published.

How many articles are in the Agent Memory Patterns vault?

The Agent Memory Patterns vault currently contains 18 curated pieces organized by topic, not chronology. Each article has an AI summary and a direct link to the original source. Items are refreshed hourly through Burn 451's ISR pipeline, so new publications appear within a day.

How do I use this vault with Claude or Cursor?

Install the burn-mcp-server package from npm and connect it to Claude, Cursor, or any MCP-compatible AI tool. The vault becomes queryable as live context — your AI can search, summarize, and cite articles from Agent Memory Patterns directly in conversation without manual copy-paste or re-uploading files.

What is Burn 451?

Burn 451 is a read-later app built around a 24-hour burn timer that forces daily triage. Articles you save must be read, vaulted, or released within 24 hours. The Vault layer — including this Agent Memory Patterns collection — holds permanent curated reading lists for AI thought leaders, founders, and researchers.

Content attributed to original authors. Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.