Zep Is The New State of the Art In Agent Memory
AI Summary
Zep's January 2025 benchmark post claims state of the art on the Deep Memory Retrieval (DMR) task (94.8% vs. MemGPT's 93.4%) and on the newer LongMemEval benchmark, at roughly 90% lower latency and under 2% of the token cost of the baseline approach of stuffing full transcripts into context. The piece is worth reading past the marketing because it forces a question the field keeps dodging: how do you actually benchmark memory? DMR was built by MemGPT's authors and naturally favors their architecture; LongMemEval is newer and arguably harder, but still narrow. The methodology debate this post kicked off remains unresolved: every memory platform now claims SOTA on a different test.
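At their core, benchmarks like DMR and LongMemEval reduce to question-answer scoring: feed a long conversation history into the memory system, ask questions whose answers depend on earlier turns, and measure how often the retrieved answer contains the expected fact. A minimal sketch of that scoring loop, with a toy in-memory agent standing in for a real system (all names here are hypothetical, not Zep's or MemGPT's APIs):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class QAExample:
    question: str   # probe question about earlier conversation turns
    expected: str   # fact the answer must contain to count as correct

def evaluate_memory(answer_fn: Callable[[str], str],
                    examples: List[QAExample]) -> float:
    """Fraction of questions whose answer contains the expected fact
    (a crude containment check; real benchmarks use an LLM judge)."""
    hits = sum(1 for ex in examples
               if ex.expected.lower() in answer_fn(ex.question).lower())
    return hits / len(examples)

# Toy "memory" lookup standing in for a real memory-backed agent.
FACTS = {
    "Where does the user work?": "The user works at Acme Corp.",
    "What is the user's favorite food?": "They like sushi.",
}

def toy_agent(question: str) -> str:
    return FACTS.get(question, "I don't know.")

examples = [
    QAExample("Where does the user work?", "Acme Corp"),
    QAExample("What is the user's favorite food?", "sushi"),
    QAExample("When is the user's birthday?", "March 3"),
]

print(evaluate_memory(toy_agent, examples))  # 2 of 3 correct -> ~0.667
```

The narrowness critique is visible even in this sketch: whoever picks the questions and the match criterion shapes the score, which is why DMR favoring its authors' architecture is unsurprising.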
Original excerpt
Setting a new standard for agent memory with up to 100% accuracy gains and 90% lower latency.
With OpenAI's o-series models and advancements from other vendors suggesting the near-term emergence of agents capable of solving highly complex, Ph.D.-level problems, we need to rethink how these agents will access critical information. As agents become pervasive in our daily lives, they'll need access to a vast collection of continuously evolving data spanning user interactions, business operations, and world events.
This data universe will only expand. While we've seen rapid increases in LLM context window sizes and improved recall capabilities, our research shows that recall remains challenging…
Content attributed to the original author (Preston Rasmussen). Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.