OpenClaw Memory Design
This article summarizes the memory subsystem in OpenClaw, including memory file layout, retrieval tools, indexing pipeline, sync lifecycle, and boundary controls.
1. Overview
- Memory is persisted as Markdown files on disk.
- Retrieval is provided by the memory plugin (`memory-core`) with `memory_search` and `memory_get`.
- Session transcripts are separate JSONL logs and are not indexed by default.
Core paths:
- `src/agents/workspace.ts`
- `src/agents/bootstrap-files.ts`
- `src/agents/tools/memory-tool.ts`
- `src/agents/memory-search.ts`
- `src/memory/manager.ts`
- `src/memory/manager-search.ts`
- `src/memory/hybrid.ts`
- `src/auto-reply/reply/memory-flush.ts`
- `src/hooks/bundled/session-memory/handler.ts`
2. Storage Layers
2.1 Workspace Memory Files
Default workspace: ~/.openclaw/workspace
- `MEMORY.md` or `memory.md` for long-term summary memory
- `memory/*.md` for rolling notes
Loading behavior:
- Injected as bootstrap context (with priority and realpath dedupe)
- Subagents do not inject memory files
- Per-file injected size is capped and can be truncated
2.2 Session Transcripts
Path: ~/.openclaw/agents/<agentId>/sessions/*.jsonl
- Session header row + message rows
- Used for session restore/debug
- Can become an index source only in experimental mode (see section 7)
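The header-plus-messages layout can be read back with a small JSONL parser. The row shapes below (`HeaderRow`, `MessageRow`) are hypothetical sketches; the real transcript schema lives in OpenClaw's session code and may differ.

```typescript
// Hypothetical row shapes; the actual transcript schema may carry more fields.
type HeaderRow = { type: "header"; sessionId: string; startedAt: string };
type MessageRow = { type: "message"; role: "user" | "assistant"; text: string };
type TranscriptRow = HeaderRow | MessageRow;

// Parse a JSONL transcript: first row is the session header, the rest are messages.
function parseTranscript(jsonl: string): { header: HeaderRow; messages: MessageRow[] } {
  const rows = jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TranscriptRow);
  const [first, ...rest] = rows;
  if (!first || first.type !== "header") throw new Error("missing header row");
  return { header: first, messages: rest.filter((r): r is MessageRow => r.type === "message") };
}
```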
2.3 Index Store
Path: ~/.openclaw/memory/<agentId>.sqlite (configurable)
Major structures:
- `meta`, `files`, `chunks`, `embedding_cache`
- optional `chunks_fts` (FTS5)
- optional `chunks_vec` (sqlite-vec)
3. Write Paths
3.1 Manual Persistence
System prompt policy encourages search-first; then persist durable facts to MEMORY.md or dated files under memory/.
3.2 Pre-Compaction Memory Flush
src/auto-reply/reply/memory-flush.ts can trigger a silent run near compaction threshold to persist useful long-term facts.
Preconditions include:
- memory flush enabled
- token pressure near threshold
- workspace writable
- at most once per compaction cycle
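These preconditions combine into a single gate before a flush run is attempted. A minimal sketch with hypothetical field names; the actual checks live in `src/auto-reply/reply/memory-flush.ts` and may differ in detail.

```typescript
// Hypothetical state shape; field names are illustrative only.
interface FlushState {
  flushEnabled: boolean;
  tokensUsed: number;
  compactionThreshold: number; // token count that triggers compaction
  thresholdMargin: number;     // how close to the threshold we flush (0..1)
  workspaceWritable: boolean;
  flushedThisCycle: boolean;   // at most one flush per compaction cycle
}

// All preconditions must hold for a silent memory-flush run to start.
function shouldFlushMemory(s: FlushState): boolean {
  return (
    s.flushEnabled &&
    s.workspaceWritable &&
    !s.flushedThisCycle &&
    s.tokensUsed >= s.compactionThreshold * s.thresholdMargin
  );
}
```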
3.3 /new Session-Memory Hook
Bundled session-memory hook on command:new:
- Extract recent transcript turns
- Ask the model for a slug
- Write `memory/YYYY-MM-DD-<slug>.md`
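The write step boils down to building a dated, slugged path. A sketch of the naming convention only; in the real hook the slug comes from a model call, so a pre-computed slug is assumed here.

```typescript
// Build memory/YYYY-MM-DD-<slug>.md from a date and a raw slug string.
function memoryNotePath(date: Date, slug: string): string {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  // Keep slugs filesystem-safe: lowercase, hyphen-separated, no edge hyphens.
  const safe = slug.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  return `memory/${yyyy}-${mm}-${dd}-${safe}.md`;
}
```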
4. Recall Paths
4.1 Retrieval Policy
When tools are available, prompt guidance enforces:
- search first (`memory_search`)
- read exact slices (`memory_get`)
- if there is no evidence, state that no result was found
4.2 memory_search
Hybrid retrieval (vector + keyword) returns snippets, paths, line ranges, scores, and source type.
4.3 memory_get
Read scope is constrained to:
- `MEMORY.md` / `memory.md`
- `memory/**/*.md`
- configured markdown `extraPaths`
Supports partial reads by line range.
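In spirit, the scope check reduces to "inside the workspace, markdown only, under an allowed root". A simplified sketch under those assumptions; the real logic in `src/agents/tools/memory-tool.ts` also rejects symlinks and may be stricter.

```typescript
import * as path from "node:path";

// Hypothetical re-implementation of the memory_get read-scope check.
function isAllowedMemoryPath(
  workspaceDir: string,
  requested: string,
  extraPaths: string[] = [],
): boolean {
  const abs = path.resolve(workspaceDir, requested);
  const rel = path.relative(workspaceDir, abs);
  if (rel.startsWith("..")) return false; // must stay inside the workspace
  if (!rel.endsWith(".md")) return false; // markdown extension only
  if (rel === "MEMORY.md" || rel === "memory.md") return true;
  if (rel.startsWith("memory" + path.sep)) return true;
  return extraPaths.some((p) => rel === p || rel.startsWith(p + path.sep));
}

// Partial read by inclusive, 1-based line range.
function sliceLines(content: string, start: number, end: number): string {
  return content.split("\n").slice(start - 1, end).join("\n");
}
```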
5. Indexing and Retrieval Pipeline
5.1 Manager Lifecycle
`MemoryIndexManager` instances are cached per effective (agentId + workspaceDir + memorySearchConfig) key.
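The caching pattern can be sketched with a composite string key. Field names and the config shape below are illustrative, not OpenClaw's actual types.

```typescript
// Illustrative config shape; the real memorySearchConfig has more fields.
interface MemorySearchConfig { provider: string; model: string }

class MemoryIndexManager {
  constructor(
    readonly agentId: string,
    readonly workspaceDir: string,
    readonly config: MemorySearchConfig,
  ) {}
}

const managers = new Map<string, MemoryIndexManager>();

function getManager(agentId: string, workspaceDir: string, config: MemorySearchConfig): MemoryIndexManager {
  // JSON-serialize the config so equivalent configs share one cache entry.
  const key = `${agentId}\u0000${workspaceDir}\u0000${JSON.stringify(config)}`;
  let m = managers.get(key);
  if (!m) {
    m = new MemoryIndexManager(agentId, workspaceDir, config);
    managers.set(key, m);
  }
  return m;
}
```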
5.2 File Discovery
Default sources:
- `MEMORY.md` / `memory.md`
- `memory/**/*.md`
- optional `extraPaths`
Only .md files are indexed; symlinks are skipped.
5.3 Chunking
`chunkMarkdown` approximates token budgets in characters (roughly tokens * 4) and tracks the line range of each chunk.
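The approach can be sketched as a line-by-line accumulator that flushes when the character budget is exceeded. This is a simplified stand-in for `chunkMarkdown`, not its actual implementation (which may respect heading boundaries and overlap).

```typescript
interface Chunk { text: string; startLine: number; endLine: number }

// Character-budget chunker: maxTokens * 4 chars per chunk, 1-based line ranges.
function chunkMarkdown(markdown: string, maxTokens: number): Chunk[] {
  const maxChars = maxTokens * 4; // rough tokens-to-chars heuristic
  const lines = markdown.split("\n");
  const chunks: Chunk[] = [];
  let buf: string[] = [];
  let start = 1;
  let size = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    // Flush the buffer when adding this line would exceed the budget.
    if (size > 0 && size + line.length + 1 > maxChars) {
      chunks.push({ text: buf.join("\n"), startLine: start, endLine: i });
      buf = [];
      start = i + 1;
      size = 0;
    }
    buf.push(line);
    size += line.length + 1;
  }
  if (buf.length > 0) chunks.push({ text: buf.join("\n"), startLine: start, endLine: lines.length });
  return chunks;
}
```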
5.4 Embedding Providers
Supports openai, gemini, local, auto, with fallback and retry behavior.
5.5 Hybrid Scoring
Final score combines vector and text channels:
final = vectorWeight * vectorScore + textWeight * textScore
When FTS is unavailable, retrieval falls back to vector-only; when the vector channel is unavailable, it falls back to keyword-only.
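The weighted combination and its channel fallbacks can be sketched as follows. The weight names follow the formula above; the default weight values and the `undefined`-as-unavailable convention are assumptions for illustration.

```typescript
// Combine vector and keyword channels; fall back when one is unavailable.
function hybridScore(
  vectorScore: number | undefined, // undefined when the vector channel is unavailable
  textScore: number | undefined,   // undefined when FTS is unavailable
  vectorWeight = 0.7,
  textWeight = 0.3,
): number {
  if (vectorScore !== undefined && textScore !== undefined) {
    return vectorWeight * vectorScore + textWeight * textScore;
  }
  if (vectorScore !== undefined) return vectorScore; // FTS unavailable -> vector-only
  if (textScore !== undefined) return textScore;     // vector unavailable -> keyword-only
  return 0;
}
```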
6. Sync and Freshness
- File watchers mark memory sources dirty.
- Sync can be triggered by session start, search call, debounce watcher, optional interval, or session deltas.
- Search-triggered sync is async and non-blocking, so results can be briefly stale.
Rebuild triggers include provider/model/key changes, chunk config changes, or embedding dimension changes.
7. Experimental Session Memory Search
Enabled only when:
- `agents.defaults.memorySearch.experimental.sessionMemory = true`
- `memorySearch.sources` includes `sessions`
Session transcript text can be indexed, but memory_get still does not directly expose raw transcript files.
8. Security Boundaries
- `memory_get` enforces strict path and extension checks.
- File readers ignore symlinks.
- Session transcript storage is part of the trust boundary and must be controlled by filesystem policy.
9. Potential Issues (Code Review)
Keyword query parsing is weak for pure Chinese text
- Current FTS token extraction is largely alphanumeric-focused.
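The failure mode is easy to see with a toy tokenizer that, like the current extraction, only keeps alphanumeric runs. This is hypothetical illustration code, not OpenClaw's actual implementation.

```typescript
// Toy alphanumeric-only token extraction: CJK characters are dropped entirely,
// so a pure-Chinese query produces no keyword terms to match against FTS.
function extractAlnumTokens(query: string): string[] {
  return query.match(/[A-Za-z0-9]+/g) ?? [];
}
```

A pure-CJK query such as a Chinese phrase yields an empty token list, which degrades the keyword channel to a no-op for those queries.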
Search-triggered sync is fire-and-forget
- Immediate query results may lag behind latest file updates.
Char-based token approximation can drift from real tokenizer behavior
- Can produce overly large or fragmented chunks in multilingual/code-heavy content.
Session-memory indexing lacks fine-grained desensitization hooks
- Sensitive data can enter index if transcript source is enabled.
Broad directory-style `extraPaths` can inflate cost and noise
- Large recursive markdown trees may increase embedding cost and reduce retrieval precision.