
Agent Memory Patterns

MemGPT · Letta · Zep · LangGraph · MemOS

The reading list for understanding how AI agents remember. MemGPT, Letta, Zep/Graphiti, LangGraph, MemOS, and the live debate over hierarchical vs graph vs episodic memory.

18 articles · 5 phases · Updated 4/22/2026
Curated by burn451

Context windows aren't memory. Whoever solves the persistence layer for agents owns the economics of the next five years.

About this vault

Context windows are not memory. Every serious agent built after 2023 has had to answer the same question: where does the state go when the conversation ends? This vault tracks the four schools that emerged.

The hierarchical school started with Charles Packer's MemGPT paper (Berkeley, October 2023), which framed the LLM as an operating system paging memory between tiers of storage. The MemGPT authors spun out Letta, which now ships memory blocks as an API, sleep-time compute as a second agent that rewrites state during idle windows, and a production agent server.

The graph school is led by Zep: Graphiti (August 2024) is a temporal knowledge graph that beat MemGPT on the Deep Memory Retrieval benchmark and tracks how facts change over time. The episodic school comes from Stanford's Generative Agents (Park et al., 2023), whose memory stream of observations, reflections, and plans became the template for persistent characters. The hybrid school is everywhere else — Mem0, Cognee, and MemOS stitching vector + graph + key-value into single pipelines.

Meanwhile LangChain shipped LangMem, Anthropic released a memory tool in Claude Sonnet 4.5, OpenAI added ChatGPT memory at consumer scale, and Cloudflare launched Agent Memory. Harrison Chase then argued the real point in April 2026: whoever owns your agent's harness owns your memory. Karpathy's LLM Wiki reframed the whole thing as an IDE-for-knowledge problem. This list picks the pieces that set the terms of the debate, not the ones that summarize it.


Foundational Papers (2023)

The papers that set the vocabulary every downstream product still uses. Packer's MemGPT treats the LLM as an operating system paging memory. Park's Generative Agents introduces the memory stream with reflection. Weng's taxonomy codifies agent = LLM + planning + memory + tools. If you read nothing else, read these three.

MemGPT: Towards LLMs as Operating Systems

The paper that started the modern agent memory conversation. Packer and the Berkeley Sky Lab team argue that LLMs should be treated like operating systems, with a context window as fast RAM and external storage as disk. MemGPT introduces virtual context management, function-calling interrupts, and a self-editing memory architecture that lets agents page information in and out. Everything downstream — Letta, memory blocks, sleep-time compute — is a product decision built on top of this paper's vocabulary. If you read one thing, read this first, because it defines the terms the rest of the field argues about.

Generative Agents: Interactive Simulacra of Human Behavior

Stanford and Google's Smallville paper: 25 AI villagers plan a Valentine's Day party without human scripting. The technical contribution is the memory stream — a time-stamped log of observations scored by recency, importance, and relevance, plus a reflection loop that periodically abstracts high-importance memories into generalizations the agent can act on. This is the episodic school's founding document, and anyone building a companion, NPC, or long-horizon assistant ends up rebuilding something that looks like this, often without realizing it. Predates MemGPT by six months. Read it for the retrieval formula and the reflection loop; both still work.
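The retrieval formula is simple enough to sketch. A minimal Python version of the paper's scoring — recency as exponential decay, an LLM-assigned 1–10 importance score, and embedding relevance, weighted equally — looks roughly like this; the dict keys and parameter names here are illustrative, not the paper's code:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieval_score(memory, query_embedding, now_hours, decay=0.995):
    # Recency: exponential decay on hours since the memory was last accessed.
    recency = decay ** (now_hours - memory["last_accessed_hours"])
    # Importance: an LLM-rated 1-10 "poignancy" score, scaled to [0, 1].
    importance = memory["importance"] / 10
    # Relevance: similarity between the query and the memory's embedding.
    relevance = cosine_similarity(query_embedding, memory["embedding"])
    # The paper weights the three terms equally.
    return recency + importance + relevance

def retrieve(memories, query_embedding, now_hours, k=3):
    # Top-k entries from the memory stream by combined score.
    return sorted(
        memories,
        key=lambda m: retrieval_score(m, query_embedding, now_hours),
        reverse=True,
    )[:k]
```

The retrieved entries are what gets fed to the reflection loop, which periodically writes new high-level memories back into the same stream.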

LLM Powered Autonomous Agents

The canonical taxonomy post. Weng, then at OpenAI, wrote the piece that every agent framework deck still cites: agent = LLM + planning + memory + tool use. The memory section breaks it into sensory (embeddings), short-term (in-context), and long-term (external vector store with MIPS retrieval), mapping cognitive science onto transformer architecture. This is where the field got its shared vocabulary in mid-2023, before MemGPT or Graphiti existed. Load-bearing because every subsequent product pitch is a variation on these four boxes.

Birth of BabyAGI

Nakajima's first-person account of how a weekend hack became the ancestor of every task-loop agent. BabyAGI stored task/result pairs as embeddings in Pinecone and used them as crude long-term memory, feeding past outcomes back into task generation. The architecture is naive by 2026 standards, but its influence is outsized: the three-agent loop (execute, create, prioritize) showed up later in Letta's agent server, in LangGraph's state machines, and in Park's generative agents. Worth reading for how early this stack felt figured out, and how much was missing.
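The three-step loop is worth seeing in miniature. This is a toy reconstruction under stated assumptions — the real BabyAGI called an LLM for execution and task creation and stored embeddings in Pinecone; both are stubbed here, and the prioritization step is reduced to a sort:

```python
from collections import deque

def execute(task, memory):
    # Stub for the execution agent; real BabyAGI prompted an LLM with
    # relevant past results retrieved from Pinecone.
    return f"done: {task} (seen {len(memory)} prior results)"

def create_tasks(objective, result):
    # Stub for the task-creation agent.
    return [f"follow up on {result[:20]}"] if "done" in result else []

def run(objective, first_task, steps=3):
    tasks = deque([first_task])
    results = []  # crude long-term memory: past task/result pairs
    for _ in range(steps):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(task, results)                 # 1. execute
        results.append((task, result))
        tasks.extend(create_tasks(objective, result))   # 2. create
        tasks = deque(sorted(tasks))                    # 3. prioritize
    return results
```

Even in this stripped form you can see the memory feedback: each execution sees the accumulated task/result history, which is the whole trick.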

Memory Blocks: The Key to Agentic Context Management

Letta's product thesis in article form. Memory blocks are labeled, character-limited sections of the context window that persist across turns — a direct descendant of MemGPT's core memory, now shipped as an API. The post argues that treating context as structured slots (persona, human, project) rather than a flat string is what separates agents with stable identity from chatbots that reset. This is the piece to read before evaluating any memory platform, because it names the abstraction most others are quietly copying.
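The abstraction is easy to sketch. A minimal version of the structured-slots idea — labeled, character-limited, agent-editable — might look like this; the class and method names are hypothetical, not Letta's actual API:

```python
class MemoryBlock:
    """A labeled, character-limited slot of context that persists
    across turns. Illustrative sketch, not Letta's real interface."""

    def __init__(self, label, limit=2000, value=""):
        self.label = label
        self.limit = limit
        self.value = value

    def rewrite(self, new_value):
        # The agent edits its own block via a tool call; writes beyond
        # the limit are rejected so one block can never crowd the
        # rest of the context window.
        if len(new_value) > self.limit:
            raise ValueError(f"block '{self.label}' over {self.limit} chars")
        self.value = new_value

def render_context(blocks):
    # Blocks compile into the prompt as named sections, not a flat string.
    return "\n\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in blocks)
```

The character limit is the load-bearing constraint: it is what forces the agent to summarize and edit rather than append forever.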

Agent Memory: How to Build Agents that Learn and Remember

Letta's July 2025 deep dive on the three-layer memory model used in production agents: message buffer for recent turns, editable in-context memory for stable identity and project state, and external archival storage for everything else. The post works through when to read vs. write memory, how to handle memory pressure (what MemGPT called overflow), and why most teams under-invest in memory editing and over-invest in retrieval. It also lays out the agent-learns-over-time loop as a sequence of specific tool calls rather than hand-waving. Useful as a reality check after reading the academic papers — the production answer is messier than the benchmark setup suggests.
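The three layers and the overflow path can be sketched in a few lines. This is a minimal illustration of the model the post describes, with an assumed eviction policy (oldest turn moves to archive) and a substring search standing in for embedding retrieval:

```python
from collections import deque

class ThreeLayerMemory:
    """Sketch of the three-layer model: bounded message buffer,
    editable in-context memory, external archival storage.
    Names and policies here are illustrative."""

    def __init__(self, buffer_size=20):
        self.buffer = deque(maxlen=buffer_size)  # recent turns
        self.core = {}     # stable identity / project state, agent-editable
        self.archive = []  # everything evicted from the buffer

    def append_message(self, msg):
        # On memory pressure (MemGPT's "overflow"), the oldest turn is
        # written out to archival storage instead of being dropped.
        if len(self.buffer) == self.buffer.maxlen:
            self.archive.append(self.buffer[0])
        self.buffer.append(msg)

    def core_memory_replace(self, key, value):
        # Memory *editing* is an explicit tool call, not a side effect.
        self.core[key] = value

    def archival_search(self, term):
        # A real system would rank by embedding similarity; substring
        # match keeps the sketch self-contained.
        return [m for m in self.archive if term in m]
```

The post's point maps directly onto this shape: most teams tune `archival_search` endlessly and barely touch `core_memory_replace`, which is backwards.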

Zep Is The New State of the Art In Agent Memory

Zep's January 2025 benchmark post claiming SOTA on the Deep Memory Retrieval task (94.8% vs MemGPT's 93.4%) and the newer LongMemEval benchmark, at roughly 90% lower latency and under 2% of the baseline token cost compared to stuffing full transcripts into context. The piece is worth reading past the marketing because it forces the question the field keeps dodging: how do you actually benchmark memory? DMR was built by MemGPT's authors and naturally favors their architecture. LongMemEval is newer and arguably harder but still narrow. The methodology debate this kicked off is ongoing and unresolved — every memory platform now claims SOTA on a different test.

MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

China's answer to MemGPT, from Shanghai Jiao Tong and collaborators, May 2025. MemOS argues that LLMs currently juggle three incompatible memory types — parametric (weights), activation (runtime state), and plaintext (RAG) — with no unified interface. Their proposal is MemCube, a standardized container that can hold any of the three and supports tracking, fusion, and migration between them. Ambitious in scope and early in implementation, but it's the most aggressive attempt so far at treating memory as a first-class system resource rather than a retrieval problem.

The Hierarchical School — Letta

The MemGPT team turned the paper into a production platform. Memory blocks as an API, sleep-time compute as a second agent rewriting state during idle windows, and an agent server that enterprise teams actually deploy. Read Letta to understand what ships when you treat memory as infrastructure, not a prompt trick.

Hybrid & Platform Memory

Mem0 and MemOS stitch vector + graph + key-value into one pipeline. OpenAI, Anthropic, Cloudflare, and LangChain ship memory as a platform feature — ChatGPT memory, Claude's memory tool, Cloudflare Agent Memory, LangMem. This is where memory stops being a product and starts being a commodity.

LangMem SDK for agent long-term memory

LangChain's February 2025 release of LangMem, a storage-agnostic SDK for building the three memory types — semantic (facts), episodic (past interactions as few-shot examples), procedural (saved as updated prompts) — into any agent. The post is less a product launch and more a normative claim: these are the three types that matter, this is how you implement each, and your agent framework should be able to compose them rather than pick one. Pair with the LangGraph store documentation, which operationalizes the same taxonomy with namespaces and checkpointers. Useful because it commits to a vocabulary when most frameworks are still hand-waving about what memory even means.
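The three-type taxonomy is concrete enough to illustrate. A hypothetical sketch of the split — semantic facts, episodic interactions replayed as few-shot examples, and procedural knowledge saved as an updated prompt — with names invented for this sketch, not LangMem's real SDK surface:

```python
semantic = {}    # facts, keyed by (namespace, key)
episodic = []    # past interactions, reusable as few-shot examples
procedural = {"system_prompt": "You are a helpful assistant."}

def remember_fact(namespace, key, value):
    # Semantic memory: discrete facts retrieved by key or similarity.
    semantic[(namespace, key)] = value

def log_episode(user_msg, agent_msg, outcome):
    # Episodic memory: whole interactions, kept for in-context replay.
    episodic.append({"user": user_msg, "agent": agent_msg,
                     "outcome": outcome})

def update_procedure(new_prompt):
    # Procedural memory is saved as an *updated prompt*, not a fact:
    # the agent changes how it behaves, not what it knows.
    procedural["system_prompt"] = new_prompt
```

The normative claim is that a framework should let you compose all three, which is exactly what LangGraph's store namespaces are for.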

Your harness, your memory

Chase's April 2026 argument that the agent harness — the code that orchestrates model calls, tools, and state — is inseparable from memory, and closed harnesses are a trap. If Anthropic or OpenAI holds your agent's memory inside their proprietary SDK, you don't own your agent. This is the clearest political statement in the field so far: memory format equals lock-in, and open-source harnesses like LangChain's Deep Agents are a hedge against it. Short, opinionated, landed hard in the AI engineering community.

Effective context engineering for AI agents

Anthropic's September 2025 framing post, published alongside Claude Sonnet 4.5 and the memory tool beta. The core move is to stop talking about prompts and start talking about the attention budget: what's the smallest set of high-signal tokens that gets the model to do the right thing? Memory enters as 'structured note-taking,' with concrete examples from Claude Code and Claude playing Pokémon. This is the piece that made 'context engineering' the default frame for 2026, absorbing agent memory into a bigger category.

Memory and new controls for ChatGPT

OpenAI's February 2024 launch post for ChatGPT memory — the first mainstream consumer product to ship persistent cross-session memory. The mechanism is deliberately vague in the post but amounts to automatic summarization of prior chats injected into system context. It reaches hundreds of millions of users with no technical setup, which makes it the most-used agent memory system on earth, and also the one users complain about most (stale facts, wrong inferences, unclear provenance). Read it to see what memory at consumer scale actually looks like, versus what the papers claim.

Introducing Mem0

Mem0's September 2024 launch post from founder Taranjeet Singh. The product thesis is hybrid storage — vector for semantic similarity, graph for relationships between entities, key-value for fast fact retrieval — wrapped in an extract-store-retrieve pipeline that runs automatically on chat history instead of making developers write memory logic by hand. Later benchmark claims against OpenAI's built-in memory: 26% higher accuracy, 91% lower latency, 90% token savings. Numbers aside, Mem0 is worth reading because it's the cleanest articulation of the hybrid school's position: you don't pick one index, you compose three, and the integration layer is the product.
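The compose-three-indexes idea can be sketched directly. This is an illustrative reconstruction of the extract-store-retrieve pipeline as the post describes it, not Mem0's actual API; the fan-out-on-store and compose-on-retrieve shape is the point:

```python
class HybridMemory:
    """Sketch of hybrid storage: every extracted fact fans out to a
    vector index, a graph of triples, and a key-value store, and
    retrieval composes them. Names here are hypothetical."""

    def __init__(self):
        self.vectors = []  # (embedding, fact) for semantic similarity
        self.graph = []    # (subject, relation, object) triples
        self.kv = {}       # key -> latest fact, for fast exact lookup

    def store(self, fact, embedding, triple=None, key=None):
        # One extracted fact, written to every index that applies.
        self.vectors.append((embedding, fact))
        if triple:
            self.graph.append(triple)
        if key:
            self.kv[key] = fact

    def retrieve(self, key=None, entity=None):
        # Compose the indexes: exact key hits first, then facts
        # connected to the entity in the graph. A real system would
        # also rank vector hits by embedding similarity.
        hits = []
        if key and key in self.kv:
            hits.append(self.kv[key])
        if entity:
            hits += [f"{s} {r} {o}" for s, r, o in self.graph
                     if s == entity or o == entity]
        return hits
```

The product, per the post, is everything this sketch omits: the automatic extraction from chat history and the ranking logic that merges the three result sets.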

The Harness War (2026)

Karpathy's LLM Wiki reframed memory as an IDE-for-knowledge problem. Harrison Chase argued whoever owns your agent's harness owns your memory — closed-source harnesses are lock-in dressed up as convenience. Simon Willison's agent-vocabulary post marked when the field finally aligned. The political layer of memory, not the technical.


Frequently asked questions

What is Agent Memory Patterns?

Agent Memory Patterns is a Burn 451 vault focused on MemGPT, Letta, Zep/Graphiti, LangGraph, and MemOS: a reading list for understanding how AI agents remember, including the live debate over hierarchical vs graph vs episodic memory.

How was the Agent Memory Patterns vault curated?

The Agent Memory Patterns vault was hand-curated by the Burn 451 editorial team from publicly available essays, blog posts, podcast transcripts, and social threads. Each piece includes an AI-generated summary so readers can triage in seconds. The vault auto-syncs as new writing on agent memory is published.

How many articles are in the Agent Memory Patterns vault?

The Agent Memory Patterns vault currently contains 18 curated pieces organized by topic, not chronology. Each article has an AI summary and a direct link to the original source. Items are refreshed hourly through Burn 451's ISR pipeline, so new publications appear within a day.

How do I use this vault with Claude or Cursor?

Install the burn-mcp-server package from npm and connect it to Claude, Cursor, or any MCP-compatible AI tool. The vault becomes queryable as live context — your AI can search, summarize, and cite articles from Agent Memory Patterns directly in conversation without manual copy-paste or re-uploading files.

What is Burn 451?

Burn 451 is a read-later app built around a 24-hour burn timer that forces daily triage. Articles you save must be read, vaulted, or released within 24 hours. The Vault layer — including this Agent Memory Patterns collection — holds permanent curated reading lists for AI thought leaders, founders, and researchers.

Content attributed to original authors. Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.