memcp Overview
memcp is Engram’s memory server. It stores, indexes, and retrieves agent memories using a combination of spaced-repetition salience decay and hybrid search.
What makes it different
Most “memory” systems for AI agents are just vector databases — you embed text, store it, and query by cosine similarity. That works for retrieval but doesn’t model importance over time.
memcp uses FSRS (Free Spaced Repetition Scheduler), the same algorithm used in modern flashcard apps, to compute a salience score for each memory. Memories that get reinforced (accessed frequently, or tagged as important) stay salient. Memories that haven’t been relevant in a while decay naturally. This means agents don’t get buried in stale context.
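To make the decay concrete, here is a minimal sketch of the forgetting curve published with FSRS-4.5. It shows only how a recall probability falls as time passes since the last reinforcement. How memcp maps this value to its salience score, and how reinforcement updates stability, are not described on this page, so the function and parameter names below are illustrative assumptions rather than memcp's API.

```python
# Sketch of the FSRS-4.5 forgetting curve (not memcp's actual code).
# Stability is the number of days after which retrievability falls to 0.9.
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that retrievability(S, S) == 0.9


def retrievability(elapsed_days: float, stability_days: float) -> float:
    """Probability-like score in (0, 1] that decays with elapsed time."""
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY


# A memory untouched for exactly its stability interval scores about 0.9.
print(round(retrievability(10, 10), 3))  # 0.9
```

A memory with higher stability decays more slowly for the same elapsed time, which is the behavior the paragraph above describes for reinforced memories.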
For retrieval, memcp uses hybrid BM25+vector search: BM25 for exact-term matching (names, IDs, technical terms) and dense vector search for semantic similarity. The results are fused with RRF (Reciprocal Rank Fusion). In practice: you recall by meaning AND by exact terms. If you stored “user’s API key is abc123”, a search for “abc123” finds it even if no semantic neighbor exists.
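The fusion step can be sketched as follows. This is a generic Reciprocal Rank Fusion implementation, not memcp's code: the function name `rrf_fuse`, the memory IDs, and the constant `k = 60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to an ID's score, with rank
    starting at 1. IDs missing from a list contribute nothing for it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical results: BM25 finds the exact-term match "m3" first,
# vector search prefers the semantic neighbor "m1".
bm25_hits = ["m3", "m1"]
vector_hits = ["m1", "m2"]
print(rrf_fuse([bm25_hits, vector_hits]))  # ['m1', 'm3', 'm2']
```

Because "m3" appears in the BM25 list alone, it still ranks above "m2", which is how an exact-term hit such as "abc123" survives fusion without a semantic neighbor.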
MCP-native design
memcp exposes its tools over the Model Context Protocol (MCP). Agents interact with memcp directly as MCP tool calls — the same mechanism they use for web search, code execution, or file access.
This means memory is a first-class capability, not a hidden system prompt hack. Agents can explicitly store information, search their own memories, and recall context before responding.
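As an illustration, an MCP tool call is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for store_memory. The argument names `content` and `tags` are inferred from the tool descriptions on this page and may not match the real schema; check the Tools Reference for the actual parameters.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real store_memory schema may differ.
request = build_tool_call(
    1,
    "store_memory",
    {"content": "User prefers concise answers", "tags": ["preference"]},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would send this message over the configured transport; the dictionary is shown only to make the wire format visible.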
Key tools
| Tool | What it does |
|---|---|
| store_memory | Store a new memory with content, tags, and optional metadata |
| search_memories | Hybrid search across stored memories |
| recall | Automatically inject the most relevant memories into context |
| delete_memory | Remove a memory by ID |
| list_memories | List recent memories with filtering |
See the Tools Reference for full parameter documentation.
Context injection (recall)
The recall tool returns memories ranked by a combination of FSRS salience and hybrid search relevance. When agents call recall at the start of a conversation turn, they get the memories most likely to be useful — without needing to know exactly what to search for.
This is the primary memory access pattern: call recall with the current message, get relevant context back, and use that context when composing the response.
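The ranking and injection steps can be sketched as below. This page does not say how memcp weights salience against relevance, so multiplying the two is purely an illustrative assumption, as are the `Candidate` type, its field names, and the helper functions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    memory_id: str
    content: str
    relevance: float  # hybrid search score, assumed normalized to [0, 1]
    salience: float   # FSRS-derived salience, assumed to lie in [0, 1]


def rank_for_recall(candidates: list[Candidate], limit: int = 3) -> list[Candidate]:
    """Order candidates by relevance * salience (an assumed combination)."""
    ranked = sorted(candidates, key=lambda c: c.relevance * c.salience, reverse=True)
    return ranked[:limit]


def build_context(memories: list[Candidate]) -> str:
    """Format recalled memories as a bullet list for the agent's context."""
    return "\n".join(f"- {m.content}" for m in memories)


candidates = [
    Candidate("a", "old but on-topic", relevance=0.9, salience=0.2),
    Candidate("b", "fresh and on-topic", relevance=0.8, salience=0.9),
    Candidate("c", "fresh but off-topic", relevance=0.1, salience=0.9),
]
print(build_context(rank_for_recall(candidates, limit=2)))
```

With these made-up scores, the fresh on-topic memory ranks first and the stale one second, while the off-topic memory is dropped by the limit.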
See Recall & Context Injection for more detail.