Membase

Why Your AI Agent Forgets You: How Hybrid Memory Fixes It

Vector search is not memory. Graph memory is not enough on its own. Here is how Membase hybrid memory works.

Shinhyeok Hwang

AI agents are getting better at retrieving information. They can search files, browse docs, and connect to tools like Gmail, Slack, Notion, and GitHub.

But retrieval is not the same as memory.

  • You tell an agent what project you are working on.
  • You explain your preferences.
  • You make decisions across multiple conversations.
  • You leave context in Slack, email, and ChatGPT.

When the next session starts, the agent asks you to explain everything again. This is not just a context window problem. It is a memory retrieval problem.

Flat-file memory is useful, but real agent memory needs to persist, stay scoped, and evolve with the user over time. RAG retrieves external knowledge. Memory preserves user-specific context across sessions. They are not the same layer.

This post is a follow-up to How to Build Your Second Brain with Membase, which covers the broader case for an automated memory layer. Here we go deeper into how retrieval actually works.

So the question becomes: should AI agent memory use vector search or GraphRAG?

The answer is: neither is enough by itself.

Vector Search Is Useful, but It Is Not Memory

Vector search is great when the problem is "find the most semantically similar text." That works well for docs, notes, FAQs, product specs, and knowledge bases.

For example, when you ask "What is our API rate limit?", vector search can probably find the right chunk.

But personal memory is different. When you ask "Can you continue from what David said last time?", the system needs to know:

  • Who is David?
  • Which conversation was "last time"?
  • What did he say?
  • Which project was it about?
  • Was it a decision, preference, task, or random comment?
  • Is it still relevant now?

That is not just semantic similarity. That is context reconstruction.

Graph-Based Memory Helps, but It Is Not the Whole Answer

Graph-based memory is much closer to how agent memory should work. Instead of treating everything as isolated chunks, graph memory can represent people, projects, decisions, entities, episodes, and relationships.

Graph memory frameworks (for example, Graphiti) build temporal context graphs for AI agents. They track how facts change over time, maintain provenance, and support evolving real-world data.

This matters because your memory is not static.

  • Your project changes.
  • Your preferences change.
  • Your priorities change.
  • Your relationships between tasks, people, and decisions change.

Graph memory helps agents retrieve context through relationships, not just similarity.

However, graph memory is not free. It requires entity extraction, relationship extraction, graph updates, and conflict handling. It can also add latency and complexity when the query only needs simple semantic recall. That is why graph memory should not be the only retrieval path.

The Key Is to Use Both

| Memory capability             | Vector | Graph | Hybrid |
|-------------------------------|--------|-------|--------|
| Finds similar text            | ✓      |       | ✓      |
| Matches exact names and terms |        |       | ✓      |
| Follows relationships         |        | ✓     | ✓      |
| Handles changing context      |        | ✓     | ✓      |
| Stays fast for simple recall  | ✓      |       | ✓      |
| Reconstructs user context     |        |       | ✓      |

Vector search and graph memory both have their specialties.

Sometimes the best result comes from vector similarity. Sometimes it comes from graph traversal. Sometimes an edge points to the right episode. Sometimes an entity mention is the strongest signal. Sometimes the most recent memory should be boosted, even if it is not the top semantic match.

That is why agent memory should not be treated as a single search pipeline.

It has to be a hybrid system.

How Membase Memory Search Works

Membase memory search does not rely on a single retriever. Instead, it combines multiple recall paths and fuses them into one final ranking.

| Query | Single-path retrieval | Membase memory |
|-------|-----------------------|----------------|
| "Continue from what we decided last time." | Finds similar text, but misses the actual decision episode. | Recovers the decision, project, owner, and latest follow-up. |
| "Use my usual writing tone." | Finds previous writing samples, but misses preference context. | Combines prior examples with stored user preference signals. |
| "What was the Slack follow-up?" | Finds related messages, but misses the action item. | Uses lexical search, graph links, episode lifting, and recency. |
| "What did David say about this project?" | Finds mentions of David, but may miss the relevant relationship or timeline. | Connects David, the project, the episode, and the current task. |

At a high level, the pipeline uses five retrieval signals.

1. Vector Search (Cosine Similarity)

Vector search handles semantic similarity across memories. Every hybrid retrieval system starts here.
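The similarity computation itself is simple. A minimal sketch of cosine similarity over toy vectors (a real system would compare model-generated embeddings, not hand-written ones):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for illustration only.
query = [0.1, 0.9, 0.2, 0.0]
memory_a = [0.1, 0.8, 0.3, 0.1]   # semantically close to the query
memory_b = [0.9, 0.1, 0.0, 0.4]   # unrelated

print(cosine_similarity(query, memory_a))  # high (close to 1.0)
print(cosine_similarity(query, memory_b))  # low
```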

Vector search alone is not enough, though. Exact names, product terms, and identifiers often fail under pure embeddings.

2. BM25 (Lexical Search)

Membase also runs BM25, a widely used keyword-based ranking algorithm. It is the fastest signal in the pipeline and catches identifiers that embeddings miss.
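To make the lexical side concrete, here is a minimal BM25 scorer over a toy tokenized corpus. The formula and the default `k1`/`b` values follow the standard Okapi BM25 definition; this is an illustration, not Membase's implementation:

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` (lists of tokens) against `query` terms."""
    n = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n
    # Document frequency: how many docs contain each term.
    df = Counter()
    for doc in corpus:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "api rate limit is 100 requests per minute".split(),
    "david shared the project roadmap on slack".split(),
]
print(bm25_scores("rate limit".split(), docs))  # doc 0 scores, doc 1 does not
```

Note how the exact terms "rate limit" match only the first document; this is the kind of identifier-level precision that pure embeddings can blur.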

3. Edge-Based Episode Lifting

If a relevant relationship appears in the graph, Membase lifts the episode behind that edge so the actual event, not just the relationship, is considered for ranking.

4. Entity-Based Episode Lifting

If a relevant entity appears (a person, project, or topic), Membase lifts episodes that mention that entity into the candidate pool.
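Edge-based and entity-based lifting are the same move from two directions: a matched relationship or entity pulls its underlying episode into the candidate pool. A minimal sketch, assuming a hypothetical in-memory graph (`episodes`, `entity_episodes`, and `edges` are illustrative structures, not Membase's actual data model):

```python
# Hypothetical graph: entities point to episodes that mention them,
# and each edge records the episode where the relationship was observed.
episodes = {
    "ep1": "David proposed moving the launch to March.",
    "ep2": "Decision: adopt the new API gateway for Project Atlas.",
}
entity_episodes = {
    "David": ["ep1"],
    "Project Atlas": ["ep2"],
}
edges = [
    {"source": "David", "target": "Project Atlas",
     "relation": "works_on", "episode": "ep1"},
]

def lift_candidates(matched_entities, matched_edges):
    """Lift episodes behind matched entities and edges into the candidate pool."""
    candidates = []
    for entity in matched_entities:      # entity-based lifting
        candidates.extend(entity_episodes.get(entity, []))
    for edge in matched_edges:           # edge-based lifting
        candidates.append(edge["episode"])
    # Deduplicate while preserving order.
    return list(dict.fromkeys(candidates))

print(lift_candidates(["David"], edges))  # ['ep1']
```

The point is that the episode itself, not just the relationship, ends up in the ranking pool, so the actual event can be surfaced.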

5. Recency Candidates

Membase also considers date-based candidates with recency weighting. In the current fusion setup, recency contributes with a tuned weight so recent memory is not drowned out by older high-similarity matches.
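One common way to implement recency weighting is exponential decay. The sketch below uses an illustrative 30-day half-life; Membase's actual weight is tuned inside the fusion step and is not published here:

```python
from datetime import datetime, timezone

def recency_weight(created_at, now=None, half_life_days=30.0):
    """Exponential decay: a memory exactly half_life_days old scores 0.5."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - created_at).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)

ref = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(recency_weight(datetime(2025, 5, 2, tzinfo=timezone.utc), now=ref))  # ~0.5
print(recency_weight(ref, now=ref))                                        # 1.0
```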

These lists are then combined with Reciprocal Rank Fusion (RRF).

What Is RRF?

RRF is a simple method for merging multiple ranked search results into one final ranking. Instead of using raw similarity scores, it scores results based on rank, so items that appear high across multiple retrieval paths rise to the top. The original SIGIR 2009 paper showed that RRF outperformed individual systems and Condorcet fusion, with k = 60 working well in practice.

Membase applies the same idea to memory: vector search, BM25, edge-based episode lifting, entity-based episode lifting, and recency candidates each produce a ranked list, and RRF fuses them into the final ranking.
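The fusion step is small enough to show in full. A sketch of RRF with the paper's k = 60, fed five hypothetical ranked lists (the episode IDs and list contents are made up for illustration):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse([
    ["ep2", "ep1", "ep3"],   # vector search
    ["ep1", "ep4"],          # BM25
    ["ep1"],                 # edge-based episode lifting
    ["ep3", "ep1"],          # entity-based episode lifting
    ["ep4", "ep2"],          # recency candidates
])
print(fused[0])  # 'ep1' wins: it ranks well across four of the five signals
```

Because RRF scores by rank rather than raw similarity, no single retriever's score scale can dominate; a memory that appears in many lists beats one that tops a single list.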

Membase combines five retrieval signals and fuses them with Reciprocal Rank Fusion.

How Hybrid Memory Brings Back the Right Context

A good memory system should not remember everything equally. It should know which memory is useful for the current task. That is why Membase combines multiple retrieval signals and fuses them with RRF.

For example, when a user asks:

"Can you continue from what we decided last time?"

A vector-only system may find text that sounds similar to the query. A keyword-only system may find messages containing "decided." A graph-only system may find related people or projects. But the agent needs more than one signal.

It needs to recover:

  • the actual decision episode
  • the people involved
  • the project it belonged to
  • the latest follow-up
  • whether that context is still relevant

Hybrid memory reconstructs the decision, the people, the project, and the latest follow-up, not just similar text.

RRF helps combine these signals into one final ranking. Instead of trusting a single retriever, each retrieval path gets a vote. Memories that rank well across semantic, lexical, graph, episode, and recency signals rise to the top.

That is how hybrid memory retrieves the context the agent actually needs.

Give Your Agent a Real Memory Layer

If your agent can search but still forgets, it does not need more context. It needs better memory retrieval. Membase combines vector search, BM25, edge-based and entity-based episode lifting, and recency with RRF fusion to bring back the right context at the right time.

Get Started

Membase is in open beta. All features are free.

  1. Create an account at membase.so
  2. Connect your first source: chat history, Obsidian, Gmail, Calendar, or Slack
  3. Install the MCP integration for your preferred agent
  4. Start working. Your agents now remember.

Setup takes minutes. Your memory layer compounds from there.


Membase is a self-evolving memory hub for AI agents. Learn more at membase.so.

