Giving Claude Code a Memory

Claude Code is stateless. Every session starts fresh — no memory of what you discussed yesterday, what you tried last week, or what broke a month ago. It reads files and runs commands just fine, but it doesn’t know anything about its own past.

If you’re working on one project, that’s fine. You re-explain things, point Claude at the right files, move on.

I’m not working on one project. I’ve got about a dozen active repos on a dedicated Linux box, with Claude sessions spinning up in different directories throughout the day. There’s also a “root” session I call Claude Prime for cross-project work — coordination, planning, memory housekeeping. In that setup, the amnesia gets old fast. You end up re-explaining the same architectural decisions, re-discovering the same gotchas, re-establishing the same preferences. Over and over.

What’s already there (and what’s missing)

Claude Code has a built-in memory system — CLAUDE.md files and a memory/ directory where it can store notes. These work well for stable facts: how to deploy, what the project structure looks like, which branch conventions to follow. I use them heavily.

But they’re summaries. They don’t capture the messy middle — the reasoning behind a decision, the thing we tried that almost worked, the “oh wait, we ran into this exact issue three weeks ago” moment. That stuff lives in the conversations themselves, and conversations vanish when the session ends.

I wanted to be able to search past sessions. Not a summary of what happened, but the actual back-and-forth.

Where the idea came from

Eric Tramel wrote a post called Searchable Agent Memory that nailed this. His approach: index conversation transcripts with BM25 keyword search, expose it as an MCP server, let the agent search its own history. Simple and practical.

BM25 is a text ranking algorithm from the search-engine era: it scores a document by how often the query's terms appear in it, weighted by how rare each term is across the whole collection, with a correction for document length. No embeddings, no vector database, no GPU required. It's just counting words and doing some math. That mattered to me because my dev server is a Beelink Celeron. Not exactly a powerhouse. BM25 would run on a Raspberry Pi.
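To show just how little math is involved, here's an illustrative toy scorer in plain Python — this is the classic Okapi BM25 formula, not the bm25s implementation the servers actually use:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with classic (Okapi-style) BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency: how many docs contain each term at least once.
    df = Counter()
    for d in tokenized:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            # Rare terms get a higher IDF weight.
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term-frequency saturation, normalized by document length.
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

A document that contains both query terms outscores one that contains only one, and documents with neither score zero — that's the whole trick.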

The implementation

Here’s what I already had without realizing it: Claude Code writes every conversation to a JSONL file automatically. One JSON object per line — every message, every tool call, every response. They pile up in ~/.claude/projects/, organized by project path. Nobody reads them. They just sit there.
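Pulling conversation turns out of those files is mostly just a loop over json.loads. A minimal sketch — the field names here are illustrative, since the real transcript schema is richer (tool calls, metadata), but the shape is the same, one JSON object per line:

```python
import json

def iter_turns(jsonl_text):
    """Yield (role, text) pairs from a transcript, skipping non-message lines.

    Assumes an illustrative schema where message lines carry a "message"
    object with "role" and string "content"; other line types are ignored.
    """
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        msg = record.get("message")
        if isinstance(msg, dict) and isinstance(msg.get("content"), str):
            yield msg.get("role", "unknown"), msg["content"]
```

Each yielded turn becomes one document in the search index.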

The memory system is two Python scripts running as MCP servers.

The first one does project-scoped search. It indexes the JSONL files for whatever project you’re working in. A Claude session in the website repo only sees website conversations. It doesn’t get polluted with stuff from the API server or the infrastructure project.

The second does global search across every project on the machine. This one’s for Claude Prime — for when I know we solved something but can’t remember which project it was in, or when context from one project matters in another. Results come back tagged with the project name.

Both servers parse the JSONL into conversation turns, build a BM25 index, and watch the directory for changes. New conversations get indexed automatically. There’s a debounce timer so it doesn’t re-index on every line written during an active session.
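The debounce logic is independent of the file-watching library. A sketch using a resettable threading.Timer — in the real setup, watchdog's event handler would call schedule() on every file change, and the class name and delay here are my own invention:

```python
import threading

class DebouncedReindexer:
    """Coalesce a burst of file-change events into one re-index call."""

    def __init__(self, reindex, delay=2.0):
        self._reindex = reindex  # callable that rebuilds the BM25 index
        self._delay = delay      # seconds of quiet required before re-indexing
        self._timer = None
        self._lock = threading.Lock()

    def schedule(self):
        """Called on every file-change event; resets the countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # still mid-burst: restart the clock
            self._timer = threading.Timer(self._delay, self._reindex)
            self._timer.daemon = True
            self._timer.start()
```

While a session is actively appending lines, the timer keeps getting reset; the index only rebuilds once the file goes quiet.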

From Claude’s side, it’s just a tool call. search_conversations("that cloudflare tunnel issue") returns ranked snippets from past sessions. Claude doesn’t know or care about BM25 or JSONL parsing.

Two layers

The system has two layers that do different things:

Curated memory is the MEMORY.md files and topic notes. Stable, verified, always loaded. “Here’s how to deploy. Here are the project conventions. Here’s what the user prefers.” I maintain these by hand (well, Claude updates them, but I decide what goes in).

Searchable history is everything else. Every conversation, unfiltered. When the curated notes say “deploy with npm run build” but don’t mention why we stopped using the old deploy script, the history has that conversation sitting in it, searchable.

It’s like the difference between documentation and git blame. Documentation tells you what. Git blame tells you why.

Keeping projects separate

I was pretty deliberate about isolation. Each project’s .mcp.json points the search server at only that project’s conversation directory. A Claude session in Project A doesn’t see Project B’s history. That’s the default, and it’s the right default — when you’re heads-down in a codebase, you want context about that codebase, not crosstalk.
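The wiring is a few lines of config. A sketch of what a project's .mcp.json entry could look like — the server name, script path, and the environment variable carrying the conversation directory are all illustrative, not the actual values from my setup:

```json
{
  "mcpServers": {
    "conversation-memory": {
      "command": "python3",
      "args": ["/home/user/tools/memory_server.py"],
      "env": {
        "CONVERSATIONS_DIR": "~/.claude/projects/-home-user-website"
      }
    }
  }
}
```

Because each project's config names only its own transcript directory, isolation falls out of the configuration rather than any filtering logic in the server.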

The global search is there for the times you need to cross boundaries, but it’s opt-in. And it labels every result with where it came from.

What it costs to run

Nothing, really. The JSONL files already exist. The BM25 index lives in memory — it’s small. The Python scripts pull in three dependencies: bm25s, watchdog, and the mcp SDK. No database, no cloud service, no API keys.

Search results do add some tokens to the conversation context, but we’re talking maybe 1,000 tokens for a typical search. That’s noise compared to the tokens you burn re-explaining things from scratch every session.

How it actually feels

The difference is that sessions stop feeling like they start from zero. A new Claude session can search for what happened last time, find the relevant conversations, and pick up where things left off. “What were we working on?” becomes a search query instead of a blank stare.

Past debugging attempts are findable. Decisions have a paper trail. A preference I stated once — “always use main, not master” — is in the index forever.

It’s not magic. BM25 is keyword-based, so if you search for “CI/CD” and the conversation used “deploy pipeline,” you might miss it. In practice, though, natural conversations have enough vocabulary overlap that searches usually land.

What I’d tell someone building this

Claude Code’s statelessness is a reasonable default. Not every setup needs persistent memory. But if you’re on a dedicated machine, running sessions across multiple projects, day after day — the conversation transcripts are already being written. They’re sitting right there. All you need is something to read them.

The whole thing is two Python scripts, a search algorithm from the ’90s, and an open protocol. The hardest part was noticing that the raw material was already there, being generated and thrown away.