Skip to main content

How It Works

The Experience Engine is a feedback loop that intercepts every mutating tool call an agent makes, checks it against a knowledge base of past mistakes, injects relevant warnings, and then evaluates whether the agent followed those warnings — all without any manual coordination.

Hook Lifecycle Overview

Agent writes code

├─ BEFORE each mutating tool call
│ ├─ Layer 1: Read-only skip (ls, cat, git log…) → bypassed instantly, $0
│ ├─ Layer 2: Semantic search → "Have I seen this mistake before?"
│ │ Ranks by: confidence · recency · hit frequency · domain match
│ │ Follows 1-hop graph edges to surface related experiences
│ └─ Layer 3: Brain relevance filter → LLM asks "is this warning relevant HERE?"
│ ~200 tokens in, 1 token out. Fail-open if brain is slow.
│ If relevant → injects: "⚠️ Last time this caused X [id:a1b2 col:behavioral]"

└─ AFTER each session
├─ Extracts lessons from mistakes (retry loops, corrections, test failures)
├─ Stores as Q&A in vector DB with domain/language/framework tags
├─ Judge worker: evaluates FOLLOWED / IGNORED / IRRELEVANT per hint
└─ Evolution engine:
3x confirmed → promote to Behavioral (T1)
Cluster of T1 entries → generalize to Principle (T0)
3x ignored or noise → demote + archive
Memory SHRINKS as capability GROWS

Anti-Noise: 3-Layer Filter

The system is designed to inject warnings only when they are high-signal and relevant. Three layers enforce this.

Layer 1 — Read-only Skip (regex, 0ms, $0)

Commands that never mutate code bypass the pipeline entirely. This includes ls, cat, git log, docker ps, and similar read-only operations. Chained commands skip only if all parts of the chain are read-only.

This layer costs nothing and ensures the engine does not slow down pure exploration work.

Layer 2 — Quality Scoring

Each candidate experience in the knowledge base is scored before it can surface as a warning. Scoring factors:

  • Hit frequency — how often this experience was triggered
  • Recency — recent confirmations outweigh old ones (temporal decay applied)
  • Confidence aging — entries with no recent signals decay toward neutral
  • Language/framework gate — a .ts file only matches TypeScript-tagged experiences; a .cs file matches C# / dotnet only
  • Domain match — infrastructure, auth, data access, etc. must align
  • Superseded penalty — entries that have been replaced by newer knowledge rank lower
  • Session dedup — the same warning is never shown twice in a single session
  • Budget cap — maximum 8 warnings per session to prevent noise accumulation
  • Noise suppression — entries tagged wrong_repo, wrong_language, wrong_task, or stale_rule three or more times are demoted automatically

Layer 3 — Brain Relevance Filter (LLM, ~1 output token, fail-open)

After Layer 2 produces a ranked shortlist, a lightweight LLM call decides which entries are actually relevant to the current action.

Input:  ACTION: Edit Startup.cs — services.AddSingleton<DbContext>()
1. Stateful objects must be scoped, never singleton
2. Always use IMLog, never ILogger
3. Never modify ePort consumer code

Output: 1 (only warning #1 is relevant to this specific action)

Cost: approximately 200 input tokens plus 1 output token. This is $0 with a local Ollama model and approximately $0.00004 with SiliconFlow. The call is fail-open — if the brain takes more than 3 seconds to respond, all shortlisted warnings pass through rather than blocking the agent.


Judge Worker — Closed Feedback Loop

After each tool call, a detached background process evaluates whether the agent followed the hint. This happens without any cooperation from the agent.

interceptor-post.js  →  judge-worker.js  →  brain LLM
→ FOLLOWED (positive signal)
→ IGNORED (negative signal)
→ IRRELEVANT (noise tag + reason)
→ UNCLEAR (abstain)

The judge reads the tool call input and output, compares it against the injected warnings, and emits a verdict. This verdict is recorded against each experience entry and drives the evolution cycle (see below).

Manual exp-feedback commands are still accepted and produce a stronger signal than the automatic judge, but the loop is closed either way.


4-Tier Knowledge Architecture

Experiences are stored in four tiers with different retrieval costs and promotion requirements.

TierNameSizeLoadingDescription
T0Principles~400 tokensAlways loadedGeneralized rules derived from clusters of T1 entries. Match novel cases.
T1Behavioral~600 tokensAlways loadedSpecific confirmed reflexes with high follow rate.
T2QA CachesemanticRetrieved on matchDetailed Q&A pairs, retrieved by vector similarity.
T3RawstagingTTL 30 daysUnprocessed lessons extracted from session transcripts.

Lifecycle

T3 (extracted from session) → T2 (stored with tags)
T2 (3x confirmed by judge) → T1 (promoted to behavioral)
T1 (cluster of similar) → T0 (LLM generalizes to principle)
T2 (3x ignored or noise) → demote → archive

Memory shrinks as capability grows: a single T0 principle replaces multiple specific T1 and T2 entries once the pattern is fully generalized.


Evolution Cycle

Promotion Path

  1. T3 to T2 — The session-end extractor (stop-extractor.js) identifies retry loops, corrections, and test failures in the session transcript and converts them into Q&A pairs tagged with domain, language, and framework metadata.

  2. T2 to T1 — After three confirmed follows (recorded by the judge worker or manual exp-feedback followed), the entry is promoted to behavioral. It is now always loaded for the relevant language/domain scope.

  3. T1 to T0 — When a cluster of similar T1 entries accumulates, the evolution engine asks the brain LLM to generalize them into a principle. The resulting T0 entry uses the format: "when X class of failure appears, do Y because Z."

Demotion Path

An entry that receives three ignored or noise verdicts is demoted and archived. Archived entries retain their history but are excluded from retrieval.

The exp-feedback noise command accepts a reason code that controls how narrowly the demotion is applied:

ReasonEffect
wrong_languageAdds caller language to scope.lang_exclude. Entry survives for other languages.
wrong_repoAdds caller project slug to scope.project_exclude. Entry survives elsewhere.
wrong_taskCounts toward supersede ratio only. Does not narrow scope. Last resort.
stale_ruleMarks entry as referencing obsolete API. Counts toward supersede.

Use wrong_language or wrong_repo whenever the mismatch is clearly scoped — these preserve the entry for other contexts. wrong_task pushes the entry toward full deletion.


Experience Graph

Experiences are linked with typed edges. When an entry is retrieved by vector search, the engine automatically follows 1-hop edges to surface related knowledge.

DbContext singleton ──generalizes──→ "Stateful objects: always scoped"
──relates-to───→ HttpClient singleton
──supersedes───→ [old] "Use transient for DbContext"

Edge types:

  • generalizes — this entry is a broader statement of the linked entry
  • relates-to — same failure class, different context
  • supersedes — this entry replaces the linked entry due to a framework or API change

Superseded entries are not deleted. They remain in the graph with reduced rank, and /api/timeline shows the full evolution of any given knowledge area.


Temporal Reasoning

The engine tracks when entries were confirmed and uses recency as a ranking factor. When two entries contradict each other, the more recently confirmed entry ranks higher.

Example:

Jan:  "Use singleton for HttpClient"  (confirmed 5x)
Mar: "Actually, use IHttpClientFactory" → contradicts Jan entry
→ Jan entry marked superseded, not deleted
→ New entry ranked higher (recent confirmation)
→ /api/timeline shows the full evolution

This means the engine handles library upgrades and breaking changes without losing the historical record of why a pattern existed.


Key Files

FilePurpose
.experience/interceptor.jsPreToolUse intercept logic — runs Layers 1, 2, 3 and injects warnings
.experience/interceptor-post.jsPostToolUse reconciliation — feeds tool output to judge worker
.experience/judge-worker.jsBackground feedback evaluation — emits FOLLOWED / IGNORED / IRRELEVANT
.experience/experience-core.jsShared processing logic used by interceptor and extractor
.experience/stop-extractor.jsSession-end lesson extraction — converts transcript to T3 entries