Agent Memory Architecture
Giving an agent continuity across turns, sessions, and time.
Why memory is a distinct discipline
Stuffing conversation history into a context window is not memory. It is an ever-growing transcript that is eventually truncated or degrades under its own volume. Real agent memory is selective: it decides what is worth persisting, in what form, for how long, and how it is retrieved later. As of 2026 this has become a first-class architectural concern with its own benchmarks and failure modes, distinct from both context engineering and RAG, because memory deals specifically with information that must survive beyond the current task.
Three layers
Working memory is what the agent is actively reasoning with in the current task — the live context window. Short-term memory persists across a single session but not necessarily beyond it — useful for maintaining state through a long multi-step task without re-deriving everything each turn. Long-term memory persists across sessions entirely — facts about a user, prior decisions, established preferences, things that should still be true the next time the agent is invoked, possibly days or months later.
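The three layers differ mainly in when their contents are discarded. The following is a minimal sketch of that lifecycle, not a reference design: the class name, field names, and the choice to clear short-term state at session end are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy model of the three layers; every name here is illustrative."""
    working: list[str] = field(default_factory=list)          # live context for the current task
    short_term: dict[str, str] = field(default_factory=dict)  # state kept for one session
    long_term: dict[str, str] = field(default_factory=dict)   # survives across sessions

    def end_task(self) -> None:
        # Working memory is scoped to the task, so it is dropped first.
        self.working.clear()

    def end_session(self) -> None:
        # Assumption: session end also drops short-term state; only long-term remains.
        self.end_task()
        self.short_term.clear()


memory = AgentMemory()
memory.working.append("tool output: 3 failing tests")
memory.short_term["current_step"] = "2 of 5"
memory.long_term["alert_channel"] = "Slack"

memory.end_task()
after_task = (len(memory.working), dict(memory.short_term))

memory.end_session()
after_session = (dict(memory.short_term), dict(memory.long_term))
```

After `end_task()` the step counter is still available for the next task in the session; after `end_session()` only the long-term preference is left.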
What belongs in long-term memory
Durable memory should contain only information that continues to constrain future reasoning: decisions made, preferences stated, approaches that failed and shouldn't be retried, facts that are unlikely to become stale soon. It should not become a dumping ground for every detail mentioned in passing — over-storage creates the same context pollution problem described in context engineering, just deferred to a later session. A useful test: would forgetting this fact cause a future interaction to go wrong in a way the user would notice?
Memory staleness and decay
A memory that was true when stored can become silently wrong later: a user's employer, a project's current status, a preference that changed. This is harder than simple irrelevance, because a stale memory is still retrieved as if relevant but is now actively misleading rather than merely unhelpful. Decay strategies (lowering retrieval priority for memories that haven't been reinforced recently) help with low-relevance information. Staleness in high-relevance memory, the kind retrieved constantly and trusted by default, remains a genuinely unsolved problem and should be designed around defensively: prefer verifiable facts over assumed-permanent ones, and build in mechanisms for a memory to be corrected or superseded rather than just appended to.
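One simple way to express the decay strategy mentioned above is an exponential half-life on retrieval priority. This is a sketch under stated assumptions: the function name, the 30-day half-life, and the use of a single relevance score are illustrative choices, not taken from any particular system.

```python
from datetime import datetime, timedelta, timezone


def retrieval_priority(relevance: float, last_reinforced_at: datetime,
                       now: datetime, half_life_days: float = 30.0) -> float:
    """Halve the priority for every `half_life_days` without reinforcement."""
    age_days = max((now - last_reinforced_at).total_seconds() / 86400.0, 0.0)
    return relevance * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 6, 20, tzinfo=timezone.utc)
fresh = retrieval_priority(0.8, now, now)
one_half_life = retrieval_priority(0.8, now - timedelta(days=30), now)
two_half_lives = retrieval_priority(0.8, now - timedelta(days=60), now)
```

Note that this only demotes memories nobody touches. A stale fact that is retrieved constantly keeps getting reinforced, so decay alone does not detect it; that case needs correction or supersession.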
Implementation pattern: external memory as a tool
A common and effective pattern is to give the agent an explicit memory tool — the ability to read and write to a persistent store (a file, a key-value store, a structured database) as part of its normal tool-use loop, rather than relying on the framework to manage memory invisibly. The agent decides what to write, when to write it, and what to retrieve, which keeps memory content aligned with what the agent itself judges important rather than a generic logging policy capturing everything indiscriminately.
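A minimal sketch of such a memory tool, assuming a single JSON file as the persistent store. The class, the tool names `memory_write` and `memory_read`, and the shape of the tool-call dictionary are hypothetical; a real framework will define its own tool-call format.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class FileMemoryTool:
    """Hypothetical read/write memory tool backed by one JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def write(self, key: str, value: str) -> str:
        data = self._load()
        data[key] = value  # same key overwrites, so repeating a write is harmless
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        return f"stored {key}"

    def read(self, key: str) -> str | None:
        return self._load().get(key)


def dispatch(tool: FileMemoryTool, call: dict) -> str | None:
    """Route a tool call emitted by the agent to the matching method."""
    if call["name"] == "memory_write":
        return tool.write(call["args"]["key"], call["args"]["value"])
    if call["name"] == "memory_read":
        return tool.read(call["args"]["key"])
    raise ValueError(f"unknown tool: {call['name']}")


store_path = Path(tempfile.mkdtemp()) / "memory.json"
ack = dispatch(FileMemoryTool(store_path),
               {"name": "memory_write",
                "args": {"key": "alert_channel", "value": "Slack"}})
# A fresh instance stands in for a later session reading the same store.
recalled = dispatch(FileMemoryTool(store_path),
                    {"name": "memory_read", "args": {"key": "alert_channel"}})
```

Because the write happens only when the agent emits a `memory_write` call, the store contains what the agent chose to keep, not a log of everything it saw.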
Cross-session identity
When the same agent serves many users or many independent task threads, scoping memory to the right identity is itself a hard problem. Memory written in one session must not leak into, or be confused with, another's. Reliably resolving "who is this, and which memories belong to them" across sessions, devices, or even renamed accounts is one of the open problems in production memory systems today.
{
  "memory_id": "mem_8f2a",
  "content": "User prefers Slack over email for deploy alerts.",
  "scope": "long-term",
  "confidence": 0.92,
  "created_at": "2026-06-01T10:00:00Z",
  "last_accessed_at": "2026-06-20T14:22:00Z",
  "superseded_by": null
}
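The scoping requirement can at least be enforced structurally, even though identity resolution itself stays hard. The sketch below partitions records like the one above by a stable owner identifier and maps changeable handles (emails, device tokens) onto it. All names are hypothetical, and the handle-to-owner mapping is assumed to be supplied by some upstream identity system; this does not solve how that mapping is established.

```python
from __future__ import annotations

from collections import defaultdict


class ScopedMemoryStore:
    """Toy store that partitions memory records by a stable owner id."""

    def __init__(self) -> None:
        self._by_owner: dict[str, dict[str, dict]] = defaultdict(dict)
        self._handles: dict[str, str] = {}  # changeable handle -> stable owner id

    def link_handle(self, handle: str, owner_id: str) -> None:
        self._handles[handle] = owner_id

    def _owner(self, handle: str) -> str:
        # Unknown handles fail loudly instead of falling back to a shared bucket.
        if handle not in self._handles:
            raise KeyError(f"unresolved identity: {handle}")
        return self._handles[handle]

    def write(self, handle: str, record: dict) -> None:
        self._by_owner[self._owner(handle)][record["memory_id"]] = record

    def read_all(self, handle: str) -> list[dict]:
        # Only the caller's partition is read; there is no global query path.
        return list(self._by_owner.get(self._owner(handle), {}).values())


store = ScopedMemoryStore()
store.link_handle("alice@old.example", "user_1")
store.link_handle("alice@new.example", "user_1")  # renamed account, same owner
store.link_handle("bob@example.com", "user_2")

store.write("alice@old.example", {
    "memory_id": "mem_8f2a",
    "content": "User prefers Slack over email for deploy alerts.",
})
alice_view = store.read_all("alice@new.example")
bob_view = store.read_all("bob@example.com")
```

Alice's memory follows her across the rename, and Bob's view stays empty. Refusing unresolved handles is a deliberate choice here: a missing memory is easier to recover from than one attached to the wrong person.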
Part II — Three memory tiers in practice
The architecture of Letta (formerly MemGPT) makes the OS metaphor concrete with three tiers:
- Core memory: editable blocks pinned in context (persona, user facts, active task).
- Recall memory: searchable conversation history outside the window.
- Archival memory: processed, indexed knowledge in external storage (vector or graph DB), retrieved on demand.
The agent does not "remember" because history is long; it remembers because the harness provides tools to page the right tier into context at the right moment.
Part II — Memory paging via tools
MemGPT introduced intrinsic memory management: the LLM calls functions to append to core memory, search archival storage, or pull a past conversation segment into context, analogous to an operating system servicing page faults. The harness executes the call and updates what appears in the next inference window. This keeps stored memory aligned with what the agent itself judged important rather than logging everything indiscriminately.
Typical tool surface: core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search, conversation_search. Each operation should be idempotent where possible and return confirmation of what changed.
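To make the tool surface concrete, here is a toy in-memory harness exposing the five operations named above. It is a sketch only and not Letta's implementation: the parameter lists, the `persona` and `user` block names, the return strings, and the substring search (standing in for vector or graph retrieval) are all assumptions.

```python
from __future__ import annotations


class ToyMemoryHarness:
    """In-memory stand-in for the tool surface; not Letta's implementation."""

    def __init__(self) -> None:
        self.core: dict[str, str] = {"persona": "", "user": ""}
        self.archival: list[str] = []
        self.recall: list[str] = []  # past conversation messages

    def core_memory_append(self, block: str, text: str) -> str:
        lines = self.core[block].splitlines()
        if text in lines:  # idempotent: repeating the call changes nothing
            return f"no change: {block} already contains {text!r}"
        self.core[block] = "\n".join(lines + [text])
        return f"appended to {block}: {text!r}"

    def core_memory_replace(self, block: str, old: str, new: str) -> str:
        if old not in self.core[block]:
            return f"no change: {old!r} not found in {block}"
        self.core[block] = self.core[block].replace(old, new)
        return f"replaced in {block}: {old!r} -> {new!r}"

    def archival_memory_insert(self, text: str) -> str:
        if text in self.archival:
            return "no change: already archived"
        self.archival.append(text)
        return f"archived: {text!r}"

    def archival_memory_search(self, query: str) -> list[str]:
        # Substring match stands in for vector or graph retrieval.
        return [t for t in self.archival if query.lower() in t.lower()]

    def conversation_search(self, query: str) -> list[str]:
        return [m for m in self.recall if query.lower() in m.lower()]

    def render_core(self) -> str:
        """Text the harness would pin into the next inference window."""
        return "\n".join(f"[{name}]\n{body}" for name, body in self.core.items())


h = ToyMemoryHarness()
first = h.core_memory_append("user", "Prefers Slack for deploy alerts.")
second = h.core_memory_append("user", "Prefers Slack for deploy alerts.")
h.core_memory_replace("user", "Slack", "PagerDuty")
h.archival_memory_insert("Runbook: rollback procedure for the payments service.")
hits = h.archival_memory_search("rollback")
context = h.render_core()
```

Each write returns a short confirmation of what changed, or an explicit "no change", so the agent can tell a successful edit from a repeated or failed one on its next turn.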
Part II — Mem0 as a passive memory layer
Mem0 complements agent-managed memory with an automatic extraction layer: it infers facts from conversations, stores them with temporal metadata, lets their relevance decay over time, and resolves conflicts when new information contradicts old. Use Mem0-style layers when you need baseline long-term memory without requiring the agent to explicitly decide every write. Use explicit memory tools when writes must be auditable and intentional, as in legal, finance, or ops contexts.
Hybrid stacks are common: Mem0 (or similar) captures passive user preferences; core memory blocks hold session-critical task state; archival holds large reference corpora.
Part II — Conflict resolution and supersession
Stale memory is worse than missing memory because it is confidently wrong. Every durable memory record should support supersession: when a new fact replaces an old one, the old record's superseded_by field points to the new record, its confidence is lowered, and an optional last_verified_at field records when a fact was last confirmed. Retrieval should prefer the highest-confidence non-superseded record and surface uncertainty when conflicting records exist.
Case study: A sales agent stored "User works at Acme Corp" in month one. In month four the user changed jobs, but the agent still pitched Acme-specific integrations. Fix: a memory write triggered by an explicit user statement supersedes the prior employer record; the retrieval prompt requires citing each memory's memory_id and confidence; and the eval suite includes a job-change scenario.
{
  "memory_id": "mem_employer_01",
  "content": "User works at Northwind Traders as Head of Ops.",
  "scope": "long-term",
  "confidence": 0.95,
  "created_at": "2026-04-01T00:00:00Z",
  "last_verified_at": "2026-06-15T09:00:00Z",
  "superseded_by": null,
  "supersedes": "mem_employer_00",
  "source": "explicit_user_statement"
}
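The supersession rule and the job-change case above can be sketched as follows. The field names follow the record shown; the `topic` grouping key, the demoted confidence of 0.1, and the rule that any two live records with different content count as a conflict are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryRecord:
    memory_id: str
    topic: str  # hypothetical grouping key, e.g. "employer"
    content: str
    confidence: float
    superseded_by: str | None = None
    supersedes: str | None = None


def supersede(store: dict[str, MemoryRecord], old_id: str, new: MemoryRecord,
              demoted_confidence: float = 0.1) -> None:
    """Link old -> new instead of appending a second, competing fact."""
    old = store[old_id]
    old.superseded_by = new.memory_id
    old.confidence = min(old.confidence, demoted_confidence)
    new.supersedes = old_id
    store[new.memory_id] = new


def retrieve(store: dict[str, MemoryRecord],
             topic: str) -> tuple[MemoryRecord | None, bool]:
    """Return the best live record and whether live records disagree."""
    live = sorted((r for r in store.values()
                   if r.topic == topic and r.superseded_by is None),
                  key=lambda r: r.confidence, reverse=True)
    if not live:
        return None, False
    conflicting = len({r.content for r in live}) > 1
    return live[0], conflicting


store = {
    "mem_employer_00": MemoryRecord(
        "mem_employer_00", "employer", "User works at Acme Corp.", 0.9),
}
supersede(store, "mem_employer_00", MemoryRecord(
    "mem_employer_01", "employer",
    "User works at Northwind Traders as Head of Ops.", 0.95))

best, conflicting = retrieve(store, "employer")
# The citation format the retrieval prompt could require from the agent.
citation = f"{best.content} [{best.memory_id}, confidence={best.confidence}]"
```

The old Acme record stays in the store for audit purposes but can no longer win retrieval, which is the behavior a job-change eval scenario would check.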
Further reading
State persistence. The paradigm has shifted from "saving chat history" to OS-inspired architectures, where the agent has "memory paging" and actively writes to its own database.
- Letta (formerly MemGPT) — The framework that introduced the idea of managing agent memory like OS memory (virtual memory). The agent has intrinsic tools to move data from core memory (context) to archival memory (vector database) and vice versa. See also the MemGPT paper.
- Mem0 (Memory Layer for AI) — A complementary approach focused on being the passive long-term memory layer, automatically handling temporal conflict resolution and memory decay.