OpenClaw Memory Design

This article summarizes the memory subsystem in OpenClaw, including memory file layout, retrieval tools, indexing pipeline, sync lifecycle, and boundary controls.


1. Overview

  • Memory is persisted as Markdown files on disk.
  • Retrieval is provided by the memory plugin (memory-core) with memory_search and memory_get.
  • Session transcripts are separate JSONL logs and are not indexed by default.

Core paths:

  • src/agents/workspace.ts, src/agents/bootstrap-files.ts
  • src/agents/tools/memory-tool.ts
  • src/agents/memory-search.ts
  • src/memory/manager.ts
  • src/memory/manager-search.ts, src/memory/hybrid.ts
  • src/auto-reply/reply/memory-flush.ts
  • src/hooks/bundled/session-memory/handler.ts

2. Storage Layers

2.1 Workspace Memory Files

Default workspace: ~/.openclaw/workspace

  • MEMORY.md or memory.md for long-term summary memory
  • memory/*.md for rolling notes

Loading behavior:

  • Injected as bootstrap context (with priority and realpath dedupe)
  • Subagents do not inject memory files
  • Per-file injected size is capped and can be truncated
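
The loading rules above can be sketched roughly as follows. This is an illustration only, not the code in src/agents/bootstrap-files.ts: the function name, the `maxChars` parameter, and the truncation marker are assumptions. Realpath dedupe matters, for example, on case-insensitive filesystems where MEMORY.md and memory.md can resolve to the same file.

```typescript
interface BootstrapFile {
  path: string;
  content: string;
}

// Hypothetical helper: pick files to inject, deduping by real path
// and capping the size of each file.
function selectBootstrapFiles(
  candidates: BootstrapFile[],            // ordered by priority, highest first
  resolveRealPath: (p: string) => string, // e.g. fs.realpathSync in Node
  maxChars: number,                       // assumed per-file cap, in characters
): BootstrapFile[] {
  const seen = new Set<string>();
  const out: BootstrapFile[] = [];
  for (const file of candidates) {
    const real = resolveRealPath(file.path);
    if (seen.has(real)) continue; // same underlying file already injected
    seen.add(real);
    const content =
      file.content.length > maxChars
        ? file.content.slice(0, maxChars) + "\n[truncated]"
        : file.content;
    out.push({ path: file.path, content });
  }
  return out;
}
```

Passing the resolver as a parameter keeps the sketch free of filesystem access, so the dedupe logic can be exercised on its own.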

2.2 Session Transcripts

Path: ~/.openclaw/agents/<agentId>/sessions/*.jsonl

  • Session header row + message rows
  • Used for session restore/debug
  • Can become an index source only when experimental session memory search is enabled (see section 7)

2.3 Index Store

Path: ~/.openclaw/memory/<agentId>.sqlite (configurable)

Major structures:

  • meta, files, chunks, embedding_cache
  • optional chunks_fts (FTS5)
  • optional chunks_vec (sqlite-vec)

3. Write Paths

3.1 Manual Persistence

The system prompt encourages a search-first workflow: the agent checks existing memory, then persists durable facts to MEMORY.md or to dated files under memory/.

3.2 Pre-Compaction Memory Flush

src/auto-reply/reply/memory-flush.ts can trigger a silent run near the compaction threshold to persist useful long-term facts before context is compacted.

Preconditions include:

  • memory flush enabled
  • token pressure near threshold
  • workspace writable
  • at most once per compaction cycle
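
A minimal sketch of how these preconditions might combine into a single gate. The `FlushState` shape and all of its field names are hypothetical; the real logic in memory-flush.ts may track this state differently.

```typescript
// Hypothetical state snapshot; field names are illustrative only.
interface FlushState {
  flushEnabled: boolean;
  usedTokens: number;
  compactionThreshold: number;     // token count at which compaction starts
  softMargin: number;              // how close to the threshold counts as "near"
  workspaceWritable: boolean;
  compactionCycle: number;         // assumed to increment on each compaction
  lastFlushedCycle: number | null; // cycle in which the last flush ran
}

function shouldRunMemoryFlush(s: FlushState): boolean {
  if (!s.flushEnabled) return false;
  if (!s.workspaceWritable) return false;
  // At most one flush per compaction cycle.
  if (s.lastFlushedCycle === s.compactionCycle) return false;
  // Token pressure: usage is within softMargin of the threshold.
  return s.usedTokens >= s.compactionThreshold - s.softMargin;
}
```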

3.3 /new Session-Memory Hook

Bundled session-memory hook on command:new:

  1. Extract recent transcript turns
  2. Ask model for slug
  3. Write memory/YYYY-MM-DD-<slug>.md
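
Step 3 can be illustrated with a small path builder. The sanitization rules, the "session" fallback, and the use of UTC dates are assumptions made for this sketch; the bundled hook may normalize slugs and dates differently.

```typescript
// Normalize a model-suggested slug into a filesystem-safe fragment.
function sanitizeSlug(raw: string): string {
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
  return slug || "session";      // hypothetical fallback for empty slugs
}

// Build the memory/YYYY-MM-DD-<slug>.md path for a given date.
function sessionMemoryPath(date: Date, rawSlug: string): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `memory/${day}-${sanitizeSlug(rawSlug)}.md`;
}
```

Sanitizing the slug matters because it comes from model output and ends up in a filename.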

4. Recall Paths

4.1 Retrieval Policy

When the memory tools are available, the system prompt instructs the model to:

  • search first (memory_search)
  • read exact slices (memory_get)
  • if no evidence, state no result

4.2 memory_search

Hybrid retrieval (vector + keyword) returns snippets, paths, line ranges, scores, and source type.

4.3 memory_get

Read scope is constrained to:

  • MEMORY.md / memory.md
  • memory/**/*.md
  • configured markdown extraPaths

Supports partial reads by line range.
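
The scope check and the line-range read could look roughly like the sketch below. It is a simplified illustration, not the actual memory_get implementation: it works on workspace-relative paths as plain strings, ignores extraPaths, and both function names are hypothetical.

```typescript
// Accept only MEMORY.md, memory.md, or .md files under memory/.
function isAllowedMemoryPath(relPath: string): boolean {
  const p = relPath.replace(/\\/g, "/");
  // Reject absolute paths (POSIX or Windows drive style).
  if (p.startsWith("/") || /^[A-Za-z]:/.test(p)) return false;
  const segments = p.split("/");
  // Reject traversal and empty segments.
  if (segments.some((s) => s === ".." || s === "." || s === "")) return false;
  if (!p.endsWith(".md")) return false;
  return p === "MEMORY.md" || p === "memory.md" || segments[0] === "memory";
}

// Return `count` lines starting at 1-based line `from`.
function readSlice(text: string, from: number, count: number): string {
  const lines = text.split("\n");
  const start = Math.max(0, from - 1);
  return lines.slice(start, start + count).join("\n");
}
```

A real implementation would also resolve the path on disk and skip symlinks, as noted in section 8.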


5. Indexing and Retrieval Pipeline

5.1 Manager Lifecycle

MemoryIndexManager instances are cached per effective combination of agentId, workspaceDir, and memorySearchConfig, so a change to any of the three yields a separate manager.
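
One way to build such a cache key is to serialize the three parts with stable key ordering, so that two equivalent configs map to the same manager. The helpers below are a sketch under that assumption; `managerCacheKey`, `stableStringify`, and `getOrCreate` are hypothetical names, not functions from src/memory/manager.ts.

```typescript
// JSON-like serialization with sorted object keys, so key order is irrelevant.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function managerCacheKey(parts: {
  agentId: string;
  workspaceDir: string;
  memorySearchConfig: unknown;
}): string {
  return stableStringify([parts.agentId, parts.workspaceDir, parts.memorySearchConfig]);
}

// Generic get-or-create over a Map-based cache.
function getOrCreate<T>(cache: Map<string, T>, key: string, create: () => T): T {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const made = create();
  cache.set(key, made);
  return made;
}
```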

5.2 File Discovery

Default sources:

  • MEMORY.md / memory.md
  • memory/**/*.md
  • optional extraPaths

Only .md files are indexed; symlinks are skipped.

5.3 Chunking

chunkMarkdown approximates the token budget by character count (maxChars = tokens * 4, i.e. roughly four characters per token) and tracks the line range of each chunk.
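
The idea can be sketched with a simple line-based chunker. This is not the actual chunkMarkdown: it omits any chunk overlap and does not split a single line that exceeds the budget, and the `Chunk` shape and function name are assumptions.

```typescript
interface Chunk {
  text: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

function chunkByLines(content: string, maxTokens: number): Chunk[] {
  const maxChars = Math.max(1, maxTokens * 4); // ~4 characters per token
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  let buf: string[] = [];
  let bufChars = 0;
  let startLine = 1;

  // Emit the buffered lines as one chunk ending at endLine.
  const flush = (endLine: number): void => {
    if (buf.length === 0) return;
    chunks.push({ text: buf.join("\n"), startLine, endLine });
    buf = [];
    bufChars = 0;
  };

  lines.forEach((line, i) => {
    const lineNo = i + 1;
    const cost = line.length + 1; // +1 for the newline
    // Close the current chunk if this line would exceed the budget.
    if (buf.length > 0 && bufChars + cost > maxChars) flush(lineNo - 1);
    if (buf.length === 0) startLine = lineNo;
    buf.push(line);
    bufChars += cost;
  });
  flush(lines.length);
  return chunks;
}
```

Because the budget is counted in characters, text that tokenizes densely (CJK, code) can produce chunks whose real token count differs noticeably from `maxTokens`, which is the drift described in section 9.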

5.4 Embedding Providers

Supports openai, gemini, local, auto, with fallback and retry behavior.

5.5 Hybrid Scoring

Final score combines vector and text channels:

final = vectorWeight * vectorScore + textWeight * textScore

If FTS is unavailable, search falls back to vector-only ranking. If vector search is unavailable, it falls back to keyword-only ranking.
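
A minimal sketch of the weighted merge and its fallbacks. Several details are assumptions for illustration: scores are taken to be normalized to [0, 1], a chunk missing from one channel scores 0 in that channel, a `null` channel means the backend is unavailable, and the names `Hit` and `mergeHybrid` are hypothetical rather than taken from src/memory/hybrid.ts.

```typescript
interface Hit {
  id: string;
  score: number; // assumed normalized to [0, 1]
}

interface Weights {
  vectorWeight: number;
  textWeight: number;
}

const byScoreDesc = (a: Hit, b: Hit): number => b.score - a.score;

function mergeHybrid(
  vectorHits: Hit[] | null, // null: vector search unavailable
  textHits: Hit[] | null,   // null: FTS unavailable
  w: Weights,
): Hit[] {
  // Single-channel fallbacks keep the raw channel score.
  if (vectorHits === null) return [...(textHits ?? [])].sort(byScoreDesc);
  if (textHits === null) return [...vectorHits].sort(byScoreDesc);

  const combined = new Map<string, { v: number; t: number }>();
  for (const h of vectorHits) combined.set(h.id, { v: h.score, t: 0 });
  for (const h of textHits) {
    const entry = combined.get(h.id) ?? { v: 0, t: 0 };
    entry.t = h.score;
    combined.set(h.id, entry);
  }
  // final = vectorWeight * vectorScore + textWeight * textScore
  return Array.from(combined, ([id, s]) => ({
    id,
    score: w.vectorWeight * s.v + w.textWeight * s.t,
  })).sort(byScoreDesc);
}
```

With weights 0.7 and 0.3, a chunk that scores 0.4 on vectors and 1.0 on text (0.58) outranks one that scores 0.8 on vectors alone (0.56), which shows how the keyword channel can promote exact-match results.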


6. Sync and Freshness

  • File watchers mark memory sources dirty.
  • Sync can be triggered by session start, a search call, the debounced file watcher, an optional interval timer, or session deltas.
  • Search-triggered sync is async and non-blocking, so results can be briefly stale.

Rebuild triggers include provider/model/key changes, chunk config changes, or embedding dimension changes.
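
Rebuild detection can be pictured as comparing a stored metadata record against the current settings. The `IndexMeta` fields below are hypothetical names chosen to mirror the triggers listed above; the actual contents of the meta table may differ.

```typescript
// Hypothetical record of the settings the index was built with.
interface IndexMeta {
  provider: string;
  model: string;
  providerKeyFingerprint: string; // assumed: a hash of the key, never the key itself
  chunkTokens: number;
  dimensions: number;             // embedding vector length
}

function needsFullRebuild(stored: IndexMeta | null, current: IndexMeta): boolean {
  if (stored === null) return true; // no index yet
  const keys = Object.keys(current) as (keyof IndexMeta)[];
  // Any difference invalidates existing embeddings.
  return keys.some((k) => stored[k] !== current[k]);
}
```

A full rebuild is needed in these cases because vectors produced by different models or dimensions are not comparable, and different chunk settings change the chunk boundaries themselves.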


7. Experimental Session Memory Search

Enabled only when:

  • agents.defaults.memorySearch.experimental.sessionMemory = true
  • memorySearch.sources includes sessions

Session transcript text can be indexed, but memory_get still does not directly expose raw transcript files.
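
The two conditions can be expressed as a single predicate over the memory search config. The `MemorySearchConfig` type below is a reduced, assumed shape covering only the two settings named above; the function name is hypothetical.

```typescript
// Reduced, assumed shape of the memorySearch config section.
interface MemorySearchConfig {
  sources?: string[];
  experimental?: { sessionMemory?: boolean };
}

// Both the experimental flag and the "sessions" source are required.
function sessionIndexingEnabled(cfg: MemorySearchConfig): boolean {
  const flagOn = cfg.experimental?.sessionMemory === true;
  const sourceListed = (cfg.sources ?? []).indexOf("sessions") !== -1;
  return flagOn && sourceListed;
}
```

Requiring both settings means that listing sessions as a source has no effect unless the experimental flag is also set.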


8. Security Boundaries

  • memory_get enforces strict path and extension checks.
  • File readers ignore symlinks.
  • Session transcript storage is part of the trust boundary, so access to it must be controlled by filesystem policy.

9. Potential Issues (Code Review)

  1. Keyword query parsing is weak for pure Chinese text.
    • Current FTS token extraction is largely alphanumeric-focused.
  2. Search-triggered sync is fire-and-forget.
    • Immediate query results may lag behind the latest file updates.
  3. Char-based token approximation can drift from real tokenizer behavior.
    • Can produce overly large or fragmented chunks in multilingual or code-heavy content.
  4. Session-memory indexing lacks fine-grained desensitization hooks.
    • Sensitive data can enter the index if the transcript source is enabled.
  5. Broad directory-style extraPaths can inflate cost and noise.
    • Large recursive markdown trees may increase embedding cost and reduce retrieval precision.
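
Issue 1 is easy to reproduce in isolation. The extractor below is an assumed stand-in for an alphanumeric-only tokenizer, not the actual OpenClaw query parser, and the bigram variant is only one possible mitigation, shown for the basic CJK Unified Ideographs range.

```typescript
// Assumed alphanumeric-only extractor: drops all Chinese characters.
function asciiTokens(query: string): string[] {
  return query.match(/[A-Za-z0-9_]+/g) ?? [];
}

// Possible mitigation: add overlapping bigrams for runs of Han characters.
function tokensWithCjkBigrams(query: string): string[] {
  const out = asciiTokens(query);
  const runs = query.match(/[\u4e00-\u9fff]+/g) ?? [];
  for (const run of runs) {
    if (run.length === 1) out.push(run); // single character: keep as-is
    for (let i = 0; i + 1 < run.length; i++) {
      out.push(run.slice(i, i + 2));
    }
  }
  return out;
}
```

With the first function, a query such as "记忆检索设计" yields no tokens at all, so the keyword channel contributes nothing and ranking depends entirely on vectors. Whether bigrams help in practice also depends on how the FTS5 table tokenizes the indexed text.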
