memcp Overview

memcp is Engram’s memory server. It stores, indexes, and retrieves agent memories using a combination of spaced-repetition salience decay and hybrid search.

What makes it different

Most “memory” systems for AI agents are just vector databases: you embed text, store it, and query by cosine similarity. That works for retrieval but doesn’t model how a memory’s importance changes over time.

memcp uses FSRS (Free Spaced Repetition Scheduler), the same algorithm used in modern flashcard apps, to compute a salience score for each memory. Memories that get reinforced (accessed frequently, or tagged as important) stay salient. Memories that haven’t been relevant in a while decay naturally. This means agents don’t get buried in stale context.
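
To make the decay idea concrete, here is a minimal sketch of the forgetting curve published with FSRS-4.5. It is an illustration only: this page does not say which FSRS version memcp uses or how it turns FSRS state into a salience score, so the function name, the day-based units, and the choice of curve are all assumptions.

```python
# FSRS-4.5 forgetting-curve constants. Whether memcp uses this exact
# variant is an assumption made for illustration.
DECAY = -0.5
FACTOR = 19 / 81  # chosen so the score is 0.9 when elapsed == stability


def retrievability(elapsed_days: float, stability_days: float) -> float:
    """Score in (0, 1] that shrinks as time since the last reinforcement grows.

    `stability_days` is the FSRS stability: a larger value means the memory
    has been reinforced more and therefore decays more slowly.
    """
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    if elapsed_days < 0:
        raise ValueError("elapsed_days must not be negative")
    return (1 + FACTOR * elapsed_days / stability_days) ** DECAY


fresh = retrievability(0, 10)          # just reinforced
at_stability = retrievability(10, 10)  # elapsed time equals stability
stale = retrievability(90, 10)         # untouched for a long time
well_reinforced = retrievability(90, 200)  # same age, much higher stability

print(f"{fresh:.3f} {at_stability:.3f} {stale:.3f} {well_reinforced:.3f}")
```

Under this curve, a memory with high stability stays near the top of the salience range long after a rarely used one has faded, which matches the behavior described above.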

For retrieval, memcp uses hybrid BM25+vector search: BM25 for exact-term matching (names, IDs, technical terms) and dense vector search for semantic similarity. The results are fused with RRF (Reciprocal Rank Fusion). In practice: you recall by meaning AND by exact terms. If you stored “user’s API key is abc123”, a search for “abc123” finds it even if no semantic neighbor exists.
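
The fusion step can be sketched as follows. This is a generic Reciprocal Rank Fusion implementation, not memcp's code: the function name, the memory IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of memory IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains, with
    ranks starting at 1. IDs missing from a list contribute nothing for it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["m_key", "m_auth", "m_billing"]
vector_hits = ["m_auth", "m_login", "m_key"]

fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

Memories that appear in both lists rise to the top, while a memory found by only one retriever (such as an exact-term BM25 hit) still makes it into the fused result.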

MCP-native design

memcp exposes its tools over the Model Context Protocol (MCP). Agents interact with memcp directly as MCP tool calls — the same mechanism they use for web search, code execution, or file access.

This means memory is a first-class capability, not a hidden system prompt hack. Agents can explicitly store information, search their own memories, and recall context before responding.
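
As a rough illustration of what such a tool call looks like on the wire, the sketch below builds an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The helper name and the argument keys `content` and `tags` are assumptions inferred from the tool descriptions on this page; the Tools Reference is the authority on the actual parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per request


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names here are hypothetical, not confirmed memcp parameters.
store_request = tool_call_request(
    "store_memory",
    {"content": "User prefers dark mode", "tags": ["preferences"]},
)

print(json.dumps(store_request, indent=2))
```

In practice an MCP client library builds and sends this message for the agent; the point is that storing a memory is an ordinary tool call with a name and arguments.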

Key tools

| Tool | What it does |
| --- | --- |
| store_memory | Store a new memory with content, tags, and optional metadata |
| search_memories | Hybrid search across stored memories |
| recall | Automatically inject the most relevant memories into context |
| delete_memory | Remove a memory by ID |
| list_memories | List recent memories with filtering |

See the Tools Reference for full parameter documentation.

Context injection (recall)

The recall tool returns memories ranked by a combination of FSRS salience and hybrid search relevance. When agents call recall at the start of a conversation turn, they get the memories most likely to be useful — without needing to know exactly what to search for.

This is the primary memory access pattern: call recall with the current message, get relevant context back, and use that context when composing the response.
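
The pattern above can be sketched as a small helper. Everything specific here is an assumption: the `query` argument name, the idea that recall returns a list of strings, and the `call_tool` callable standing in for a real MCP client. A stub replaces the server so the example runs on its own.

```python
from typing import Callable

# Stand-in for an MCP client: takes a tool name and arguments and returns
# memory texts. The real response shape is not specified on this page.
ToolCaller = Callable[[str, dict], list[str]]


def build_prompt(message: str, call_tool: ToolCaller, max_memories: int = 5) -> str:
    """Call recall with the current message and prepend the results."""
    memories = call_tool("recall", {"query": message})[:max_memories]
    if not memories:
        return message
    context = "\n".join(f"- {memory}" for memory in memories)
    return f"Relevant memories:\n{context}\n\nUser message:\n{message}"


def fake_call_tool(name: str, arguments: dict) -> list[str]:
    """Stub that returns canned memories instead of contacting memcp."""
    if name != "recall":
        raise ValueError(f"unexpected tool: {name}")
    return ["User's timezone is UTC+2", "User prefers concise answers"]


prompt = build_prompt("When should we schedule the call?", fake_call_tool)
print(prompt)
```

The agent does not have to decide what to search for: it passes the incoming message through unchanged and lets the server's ranking choose what is worth including.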

See Recall & Context Injection for more detail.