OpenClaw Memory Design
This article summarizes the memory subsystem in OpenClaw, including memory file layout, retrieval tools, indexing pipeline, sync lifecycle, and boundary controls.
1. Overview
- Memory is persisted as Markdown files on disk.
- Retrieval is provided by the memory plugin (`memory-core`) with `memory_search` and `memory_get`.
- Session transcripts are separate JSONL logs and are not indexed by default.
Core paths:
- `src/agents/workspace.ts`
- `src/agents/bootstrap-files.ts`
- `src/agents/tools/memory-tool.ts`
- `src/agents/memory-search.ts`
- `src/memory/manager.ts`
- `src/memory/manager-search.ts`
- `src/memory/hybrid.ts`
- `src/auto-reply/reply/memory-flush.ts`
- `src/hooks/bundled/session-memory/handler.ts`
2. Storage Layers
2.1 Workspace Memory Files
Default workspace: ~/.openclaw/workspace
- `MEMORY.md` or `memory.md` for long-term summary memory
- `memory/*.md` for rolling notes
Loading behavior:
- Injected as bootstrap context (with priority and realpath dedupe)
- Subagents do not inject memory files
- Per-file injected size is capped and can be truncated
2.2 Session Transcripts
Path: ~/.openclaw/agents/<agentId>/sessions/*.jsonl
- Session header row + message rows
- Used for session restore/debug
- Can become an index source only in experimental mode (see section 7)
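The header-plus-messages layout can be read back with a small JSONL parser. The row shapes below (`HeaderRow`, `MessageRow`) are hypothetical sketches; the real transcript schema lives in OpenClaw's session code and may differ.

```typescript
// Hypothetical row shapes; the actual transcript schema may carry more fields.
type HeaderRow = { type: "header"; sessionId: string; startedAt: string };
type MessageRow = { type: "message"; role: "user" | "assistant"; text: string };
type TranscriptRow = HeaderRow | MessageRow;

// Parse a JSONL transcript: first row is the session header, the rest are messages.
function parseTranscript(jsonl: string): { header: HeaderRow; messages: MessageRow[] } {
  const rows = jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TranscriptRow);
  const [first, ...rest] = rows;
  if (!first || first.type !== "header") throw new Error("missing header row");
  return { header: first, messages: rest.filter((r): r is MessageRow => r.type === "message") };
}
```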
2.3 Index Store
Path: ~/.openclaw/memory/<agentId>.sqlite (configurable)
Major structures:
- `meta`, `files`, `chunks`, `embedding_cache`
- optional `chunks_fts` (FTS5)
- optional `chunks_vec` (sqlite-vec)
3. Write Paths
3.1 Manual Persistence
System prompt policy encourages search-first; then persist durable facts to MEMORY.md or dated files under memory/.
3.2 Pre-Compaction Memory Flush
src/auto-reply/reply/memory-flush.ts can trigger a silent run near compaction threshold to persist useful long-term facts.
Preconditions include:
- memory flush enabled
- token pressure near threshold
- workspace writable
- at most once per compaction cycle
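These preconditions combine into a single gate before a flush run is attempted. A minimal sketch with hypothetical field names; the actual checks live in `src/auto-reply/reply/memory-flush.ts` and may differ in detail.

```typescript
// Hypothetical state shape; field names are illustrative only.
interface FlushState {
  flushEnabled: boolean;
  tokensUsed: number;
  compactionThreshold: number; // token count that triggers compaction
  thresholdMargin: number;     // how close to the threshold we flush (0..1)
  workspaceWritable: boolean;
  flushedThisCycle: boolean;   // at most one flush per compaction cycle
}

// All preconditions must hold for a silent memory-flush run to start.
function shouldFlushMemory(s: FlushState): boolean {
  return (
    s.flushEnabled &&
    s.workspaceWritable &&
    !s.flushedThisCycle &&
    s.tokensUsed >= s.compactionThreshold * s.thresholdMargin
  );
}
```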
3.3 /new Session-Memory Hook
Bundled session-memory hook on command:new:
- Extract recent transcript turns
- Ask the model for a slug
- Write `memory/YYYY-MM-DD-<slug>.md`
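The write step boils down to building a dated, slugged path. A sketch of the naming convention only; in the real hook the slug comes from a model call, so a pre-computed slug is assumed here.

```typescript
// Build memory/YYYY-MM-DD-<slug>.md from a date and a raw slug string.
function memoryNotePath(date: Date, slug: string): string {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  // Keep slugs filesystem-safe: lowercase, hyphen-separated, no edge hyphens.
  const safe = slug.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  return `memory/${yyyy}-${mm}-${dd}-${safe}.md`;
}
```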
4. Recall Paths
4.1 Retrieval Policy
When tools are available, prompt guidance enforces:
- search first (`memory_search`)
- read exact slices (`memory_get`)
- if there is no evidence, state that no result was found
4.2 memory_search
Hybrid retrieval (vector + keyword) returns snippets, paths, line ranges, scores, and source type.
4.3 memory_get
Read scope is constrained to:
- `MEMORY.md` / `memory.md`
- `memory/**/*.md`
- configured markdown `extraPaths`
Supports partial reads by line range.
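In spirit, the scope check reduces to "inside the workspace, markdown only, under an allowed root". A simplified sketch under those assumptions; the real logic in `src/agents/tools/memory-tool.ts` also rejects symlinks and may be stricter.

```typescript
import * as path from "node:path";

// Hypothetical re-implementation of the memory_get read-scope check.
function isAllowedMemoryPath(
  workspaceDir: string,
  requested: string,
  extraPaths: string[] = [],
): boolean {
  const abs = path.resolve(workspaceDir, requested);
  const rel = path.relative(workspaceDir, abs);
  if (rel.startsWith("..")) return false; // must stay inside the workspace
  if (!rel.endsWith(".md")) return false; // markdown extension only
  if (rel === "MEMORY.md" || rel === "memory.md") return true;
  if (rel.startsWith("memory" + path.sep)) return true;
  return extraPaths.some((p) => rel === p || rel.startsWith(p + path.sep));
}

// Partial read by inclusive, 1-based line range.
function sliceLines(content: string, start: number, end: number): string {
  return content.split("\n").slice(start - 1, end).join("\n");
}
```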
5. Indexing and Retrieval Pipeline
5.1 Manager Lifecycle
`MemoryIndexManager` instances are cached per effective (agentId + workspaceDir + memorySearchConfig) key.
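The caching pattern can be sketched with a composite string key. Field names and the config shape below are illustrative, not OpenClaw's actual types.

```typescript
// Illustrative config shape; the real memorySearchConfig has more fields.
interface MemorySearchConfig { provider: string; model: string }

class MemoryIndexManager {
  constructor(
    readonly agentId: string,
    readonly workspaceDir: string,
    readonly config: MemorySearchConfig,
  ) {}
}

const managers = new Map<string, MemoryIndexManager>();

function getManager(agentId: string, workspaceDir: string, config: MemorySearchConfig): MemoryIndexManager {
  // JSON-serialize the config so equivalent configs share one cache entry.
  const key = `${agentId}\u0000${workspaceDir}\u0000${JSON.stringify(config)}`;
  let m = managers.get(key);
  if (!m) {
    m = new MemoryIndexManager(agentId, workspaceDir, config);
    managers.set(key, m);
  }
  return m;
}
```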
5.2 File Discovery
Default sources:
- `MEMORY.md` / `memory.md`
- `memory/**/*.md`
- optional `extraPaths`
Only .md files are indexed; symlinks are skipped.
5.3 Chunking
`chunkMarkdown` approximates token budgets in characters (roughly tokens * 4) and tracks the line range of each chunk.
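The approach can be sketched as a line-by-line accumulator that flushes when the character budget is exceeded. This is a simplified stand-in for `chunkMarkdown`, not its actual implementation (which may respect heading boundaries and overlap).

```typescript
interface Chunk { text: string; startLine: number; endLine: number }

// Character-budget chunker: maxTokens * 4 chars per chunk, 1-based line ranges.
function chunkMarkdown(markdown: string, maxTokens: number): Chunk[] {
  const maxChars = maxTokens * 4; // rough tokens-to-chars heuristic
  const lines = markdown.split("\n");
  const chunks: Chunk[] = [];
  let buf: string[] = [];
  let start = 1;
  let size = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    // Flush the buffer when adding this line would exceed the budget.
    if (size > 0 && size + line.length + 1 > maxChars) {
      chunks.push({ text: buf.join("\n"), startLine: start, endLine: i });
      buf = [];
      start = i + 1;
      size = 0;
    }
    buf.push(line);
    size += line.length + 1;
  }
  if (buf.length > 0) chunks.push({ text: buf.join("\n"), startLine: start, endLine: lines.length });
  return chunks;
}
```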
5.4 Embedding Providers
Supports openai, gemini, local, auto, with fallback and retry behavior.
5.5 Hybrid Scoring
Final score combines vector and text channels:
final = vectorWeight * vectorScore + textWeight * textScore
When FTS is unavailable, retrieval falls back to vector-only; when the vector channel is unavailable, it falls back to keyword-only.
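The weighted combination and its channel fallbacks can be sketched as follows. The weight names follow the formula above; the default weight values and the `undefined`-as-unavailable convention are assumptions for illustration.

```typescript
// Combine vector and keyword channels; fall back when one is unavailable.
function hybridScore(
  vectorScore: number | undefined, // undefined when the vector channel is unavailable
  textScore: number | undefined,   // undefined when FTS is unavailable
  vectorWeight = 0.7,
  textWeight = 0.3,
): number {
  if (vectorScore !== undefined && textScore !== undefined) {
    return vectorWeight * vectorScore + textWeight * textScore;
  }
  if (vectorScore !== undefined) return vectorScore; // FTS unavailable -> vector-only
  if (textScore !== undefined) return textScore;     // vector unavailable -> keyword-only
  return 0;
}
```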
6. Sync and Freshness
- File watchers mark memory sources dirty.
- Sync can be triggered by session start, search call, debounce watcher, optional interval, or session deltas.
- Search-triggered sync is async and non-blocking, so results can be briefly stale.
Rebuild triggers include provider/model/key changes, chunk config changes, or embedding dimension changes.
7. Experimental Session Memory Search
Enabled only when:
- `agents.defaults.memorySearch.experimental.sessionMemory = true`
- `memorySearch.sources` includes `sessions`
Session transcript text can be indexed, but memory_get still does not directly expose raw transcript files.
8. Security Boundaries
- `memory_get` enforces strict path and extension checks.
- File readers ignore symlinks.
- Session transcript storage is part of the trust boundary and must be controlled by filesystem policy.
9. Potential Issues (Code Review)
Keyword query parsing is weak for pure Chinese text
- Current FTS token extraction is largely alphanumeric-focused.
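The failure mode is easy to see with a toy tokenizer that, like the current extraction, only keeps alphanumeric runs. This is hypothetical illustration code, not OpenClaw's actual implementation.

```typescript
// Toy alphanumeric-only token extraction: CJK characters are dropped entirely,
// so a pure-Chinese query produces no keyword terms to match against FTS.
function extractAlnumTokens(query: string): string[] {
  return query.match(/[A-Za-z0-9]+/g) ?? [];
}
```

A pure-CJK query such as a Chinese phrase yields an empty token list, which degrades the keyword channel to a no-op for those queries.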
Search-triggered sync is fire-and-forget
- Immediate query results may lag behind latest file updates.
Char-based token approximation can drift from real tokenizer behavior
- Can produce overly large or fragmented chunks in multilingual/code-heavy content.
Session-memory indexing lacks fine-grained desensitization hooks
- Sensitive data can enter index if transcript source is enabled.
Broad directory-style `extraPaths` can inflate cost and noise
- Large recursive markdown trees may increase embedding cost and reduce retrieval precision.