A wrapper around forge that makes forge better.
Real-time governance witness. Seven fat skills on the OpenClaw substrate. Auto-research on recursive improvement. ~350 LOC. Catches drift the moment it happens.
Five incidents. Same root cause. Six weeks.
The Problem
Forge governance was bypassed in identical patterns over the past ~6 weeks: each time, the structure of a check was verified while the authenticity of the underlying claim was not. You caught each one. The substrate caught none. The overseer is the peripheral nervous system that's been missing.
- Hypernym runtime drift — sprints claimed "wired" for 4 weeks; cache empty, zero compression events in 12k-line log
- PATCH_BROKER skipped AUDIT — panel APPROVE accepted as production-ready; 3 CRITICAL bypasses found on backfill
- Cross-session contamination — ambient state from another terminal flipped sprint headers mid-session
- Single-round convergence — Codex found 7 distinct bypass classes across 7 rounds; one round missed 6 of 7
- Synthetic attestations (2026-05-22): forge review --summary accepts any text; 4 SHAs shipped with fabricated panel provenance

The overseer's countermeasures:
- Heartbeat-triggered scan-drift tails events.jsonl filesystem-direct via fs.watch
- WitnessRecord schema carries attestation_class HOLLOW/RATIFIED_EMPTY/EVIDENCED; non-EVIDENCED records are zero-weight
- Per-session operator_id on every write; cross-session reads flagged automatically
- Multi-round convergence by default; single-round convergence on security_hardening fires drift class 4
- Every reviewer name in an attestation cross-references a real dispatch.completed event signed by overwatchd
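Here is a minimal TypeScript sketch of the WitnessRecord weighting rule and the cross-session check, written as plain functions. Only attestation_class, operator_id, dispatch_evidence_uuid and per_reviewer_response_hash come from this proposal. The ts field, the unit weight of 1 for EVIDENCED records and the helper names are illustrative assumptions.

```typescript
// Illustrative WitnessRecord shape; `ts` is a hypothetical field.
type AttestationClass = "HOLLOW" | "RATIFIED_EMPTY" | "EVIDENCED";

interface WitnessRecord {
  operator_id: string; // stamped on every write
  attestation_class: AttestationClass;
  dispatch_evidence_uuid: string | null;
  per_reviewer_response_hash: Record<string, string>; // reviewer -> response hash
  ts: string; // hypothetical ISO-8601 timestamp
}

// Non-EVIDENCED records are zero-weight; 1 for EVIDENCED is an assumed unit weight.
function attestationWeight(record: WitnessRecord): number {
  return record.attestation_class === "EVIDENCED" ? 1 : 0;
}

// Share of a batch that carries real evidence (0 for an empty batch).
function evidencedShare(records: WitnessRecord[]): number {
  if (records.length === 0) return 0;
  return records.reduce((sum, r) => sum + attestationWeight(r), 0) / records.length;
}

// A read is cross-session when the reader's operator_id differs from the writer's.
function isCrossSessionRead(record: WitnessRecord, readerOperatorId: string): boolean {
  return record.operator_id !== readerOperatorId;
}
```

Because weighting is a pure function of attestation_class, a HOLLOW or RATIFIED_EMPTY record can still be stored and audited. It simply contributes nothing to any score.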
Layered. Wrapping forge. Not replacing it.
Architecture
The trust boundary (overwatchd + immutable reward kernel) sits in a separate macOS user context. Seven skills run on OpenClaw's existing infrastructure. The data plane is filesystem-direct against forge's append-only audit log. It never passes through OpenClaw's event bus, because that would be circular monitoring.
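A filesystem-direct data plane needs little more than a byte offset into the append-only log. The sketch below assumes how such a reader might work; it is not forge's actual code. It reads only the bytes appended since the last offset and parses the complete JSONL lines. A half-written trailing line is left for the next fs.watch notification. The function name readAppended and the log path in the usage comment are hypothetical.

```typescript
import * as fs from "node:fs";

// Reads bytes appended to an append-only JSONL log since `offset` and returns
// the complete lines parsed, plus the offset to resume from. A trailing line
// with no newline yet (writer mid-append) is left for the next call.
function readAppended(file: string, offset: number): { events: unknown[]; offset: number } {
  const size = fs.statSync(file).size;
  // Nothing new. A shrinking file (rotation) is out of scope for this sketch.
  if (size <= offset) return { events: [], offset };
  const fd = fs.openSync(file, "r");
  try {
    const buf = Buffer.alloc(size - offset);
    fs.readSync(fd, buf, 0, buf.length, offset);
    const lastNewline = buf.lastIndexOf(0x0a); // byte index of the last "\n"
    if (lastNewline === -1) return { events: [], offset };
    const events = buf
      .subarray(0, lastNewline)
      .toString("utf8")
      .split("\n")
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
    return { events, offset: offset + lastNewline + 1 };
  } finally {
    fs.closeSync(fd);
  }
}

// Usage (not run here; logPath is a placeholder):
// let pos = 0;
// fs.watch(logPath, () => {
//   const next = readAppended(logPath, pos);
//   pos = next.offset;
//   // hand next.events to the classifiers
// });
```

Working in bytes rather than decoded characters keeps the resume offset exact even when events contain multi-byte UTF-8.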
Seven fat skills do the entire job.
The Bundle
Hard mechanical invariants (T2 gate, budget guard, constraint manifold) are demoted to framework/hooks/ — they never live in the skill rewrite path. Skills propose; hooks enforce; the kernel scores.
| # | Skill | Trigger | Function | Detects | Write surface |
|---|---|---|---|---|---|
| 1 | scan-drift | Heartbeat 60s | Reads events.jsonl + WitnessBus filesystem-direct via fs.watch. Emits WitnessRecord with attestation_class + dispatch_evidence_uuid + per_reviewer_response_hash. Replaces OpenClaw event-bus subscription to avoid circular monitoring. | 1, 2, 5, 10, 11, 12, 13 | append-only witness-bus.jsonl |
| 2 | detect-runtime-drift | new WitnessRecord | Classifies runtime fabrication, provider quota fail-open, env-skip bypass, plan-vs-execution drift. Fires alerts to inbox. | 1, 10, 11, 12 | inbox + drift-catalog |
| 3 | detect-panel-drift | convergence_attested | Classifies skipped AUDIT, single-round on security_hardening, concurrence-mistaken-for-convergence, always-convergence violation, synthetic attestation. | 2, 4, 5, 7, 13 | inbox + drift-catalog |
| 2.5 | local-verifier | any drift event | NEW · added Round 5. 30s cheap second opinion via ollama run deepseek-r1:32b OR qwen3-coder:30b. If the verifier agrees → alert. If it disagrees → low-confidence flag with both opinions. Catches false positives before they reach you, with no API spend. | meta (confidence layer) | flag in drift-catalog |
| 4 | decompose-drift | complex drift | Hermes Kanban Orchestrator pattern verbatim: 5-step decomposition + Step-0 profile discovery + don't-do-the-work-yourself rule. Copied directly from NousResearch/hermes-agent. | all (orchestration) | framework/plans/ staging |
| 5 | inject-hint | drift + operator_id | Cursor teacher-student pattern: writes one-paragraph <system-reminder> hint to .forge/overseer/active-hints/<operator_id>.md mode 0444. Operator's next tool call surfaces it via PreToolUse hook. | meta (loop-closer) | hint file |
| 6 | author-skill | weekly batch | OpenSpace FIX/DERIVED/CAPTURED taxonomy. Proposes new detection skills from recurring drift patterns. Output lands in .staging/ only — never bypasses human promote. | meta (skill bootstrap) | framework/skills/.staging/ |
| 7 | replay-consolidator | quiet heartbeat | Hippocampal sharp-wave-ripple pattern (150–220 Hz biological analog). Off-policy re-scoring of historical WitnessBus against current detection rules. Promotes stable patterns to policy table. | meta (rule effectiveness) | policy-table.jsonl |
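
Skill #2.5's agree/disagree rule is small enough to sketch directly. Several details are assumptions: the prompt wording, the AGREE/DISAGREE reply contract, and the stripping of any <think> block. Treating a verifier timeout or error as disagreement is also an assumption, chosen so the event is flagged rather than dropped. Only the 30s budget, the model name and the two outcomes (alert or low-confidence flag) come from the table.

```typescript
import { spawnSync } from "node:child_process";

interface DriftEvent { driftClass: number; summary: string }
interface VerifierOutcome {
  action: "alert" | "low_confidence_flag";
  detectorOpinion: string;
  verifierOpinion: string; // both opinions travel with the flag
}

// Assumed reply contract: AGREE or DISAGREE on the first line. Reasoning
// models may prepend a <think>...</think> block, which is stripped first.
function verifierAgrees(reply: string): boolean {
  const visible = reply.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
  const firstLine = visible.split("\n")[0] ?? "";
  return firstLine.trim().toUpperCase().startsWith("AGREE");
}

function decide(event: DriftEvent, verifierReply: string): VerifierOutcome {
  return {
    action: verifierAgrees(verifierReply) ? "alert" : "low_confidence_flag",
    detectorOpinion: event.summary,
    verifierOpinion: verifierReply.trim(),
  };
}

// Asks a local model through the Ollama CLI within the 30 s budget. Not run in
// tests. A timeout or failure is treated as disagreement (an assumption), so
// the event is flagged low-confidence instead of being silently dropped.
function askLocalVerifier(event: DriftEvent, model = "deepseek-r1:32b"): string {
  const prompt =
    `Drift class ${event.driftClass}: ${event.summary}\n` +
    "Reply AGREE or DISAGREE on the first line, then one sentence of reasoning.";
  const res = spawnSync("ollama", ["run", model, prompt], { encoding: "utf8", timeout: 30_000 });
  return res.status === 0 ? res.stdout : "DISAGREE\nverifier unavailable or timed out";
}
```

In use, the two halves compose as decide(event, askLocalVerifier(event)). Keeping decide pure means the routing rule can be tested without a model installed.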

The three hooks in framework/hooks/ are tier2-gate.sh (Promise.race + 10s timeout + fail-open + audit event), paperclip-budget-guard.sh (30s TTL cache against port 3500) and constraint-manifold.sh (no self-review, Grok mandatory, circuit-breaker veto). These are hard mechanical invariants and never sit in the auto-rewrite skill path.
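tier2-gate.sh is a shell hook, but the Promise.race pattern it names is easiest to show in TypeScript. This is a minimal sketch. It assumes the check is an async predicate and the audit sink is a callback (both hypothetical). Whichever settles first, the check or the 10s timer, wins. A timeout or a thrown error fails open, and every path emits exactly one audit event.

```typescript
interface AuditEvent { gate: string; outcome: "pass" | "block" | "fail_open"; reason: string }

// Races `check` against a timer. A verdict in time is enforced; a timeout or a
// thrown error fails open (returns true). Every path emits one audit event.
async function failOpenGate(
  gate: string,
  check: () => Promise<boolean>,
  audit: (e: AuditEvent) => void,
  timeoutMs = 10_000, // the 10 s T2 gate timeout
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const result = await Promise.race([check(), timeout]);
    if (result === "timeout") {
      audit({ gate, outcome: "fail_open", reason: `no verdict within ${timeoutMs} ms` });
      return true;
    }
    audit({ gate, outcome: result ? "pass" : "block", reason: "check completed" });
    return result;
  } catch (err) {
    audit({ gate, outcome: "fail_open", reason: `check threw: ${String(err)}` });
    return true;
  } finally {
    clearTimeout(timer); // don't keep the process alive after a fast verdict
  }
}
```

Failing open keeps a hung dependency from halting work. The audit event is what keeps that choice visible to the overseer afterwards.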
Thirteen classes. Each mapped to a skill.
Drift Catalog
Five classes come from the base proposal; eight were added from a memory-inventory scan of 49 feedback files spanning 6 weeks. Color-coded by category: runtime/data-plane (navy), panel/governance (rust), provider/transport (slate), meta-protocol (gold).
1. Hypernym empty-cache class
2. PATCH_BROKER-15 class
3. operator_id mismatch
4. security_hardening anomaly
5. 2026-05-22 incident class
6. Claim ≠ current code
7. Reviewer = operator voice
8. Hallucinated file:line
9. No tiebreaker escalation
10. Gemini billing-cap class
11. FORGE_SKIP_* bypass
12. HC chunks 0.3/0.4a class
13. Split panel without escalation
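The bundle table refers to these classes by number (1 to 13, in the order listed). Its Detects column can therefore be inverted into a class → skill routing index for new WitnessRecords. The sketch transcribes that column as-is; the helper names are hypothetical. The second helper lists any class that no Detects cell names. This is a cheap consistency check between catalog and bundle; for the table as transcribed here it returns 3, 6, 8 and 9.

```typescript
// Detects column of the bundle table, transcribed as-is.
const DETECTS: Record<string, number[]> = {
  "scan-drift": [1, 2, 5, 10, 11, 12, 13],
  "detect-runtime-drift": [1, 10, 11, 12],
  "detect-panel-drift": [2, 4, 5, 7, 13],
};

// Skills whose Detects cell lists the given drift class, in table order.
function skillsForClass(driftClass: number): string[] {
  return Object.entries(DETECTS)
    .filter(([, classes]) => classes.includes(driftClass))
    .map(([skill]) => skill);
}

// Catalog classes 1..catalogSize that no Detects cell names.
function unroutedClasses(catalogSize = 13): number[] {
  const covered = new Set<number>(([] as number[]).concat(...Object.values(DETECTS)));
  const all = Array.from({ length: catalogSize }, (_, i) => i + 1);
  return all.filter((c) => !covered.has(c));
}
```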
What stops the watcher from needing a watcher.
Meta-Poisoning · Closed
The Round 4 panel (Grok + Opus independent + Gemini) surfaced three distinct attack vectors against the overseer itself. Each is closed by a specific mechanism, not by convention. The kernel is a binary, not a markdown file. The auto-author writes only to staging. The RL controller has no path to FSM YAML.
Kernel binary mutation
- Vector: …chflags noschg or modifies launchd plist install path.
- Closed by: /Library/Forge/Kernel/reward.so is owned by a separate macOS user, _overwatchd, mode 0400. devagent has no read or write access. chflags schg on top. Hash-pinned in kernel-pubkey.pem; verified at every read.

RL → FSM write loop
- Vector: …sprint-v2.yaml": the controller disables the very gates that feed it ground-truth WitnessRecords.
- Closed by: …framework/skills/.staging/. Human promote is the only path to production. RL has no write access to FSM YAMLs, hooks, dispatch, or its own code.

Reward interpretation gaming
- Closed by: author-skill outputs land in .staging/ only. The human-promote gate requires cross-model panel attestation (Grok mandatory + 1 other, both via overwatchd-signed dispatch evidence). The first 30 days of any newly promoted skill run in shadow mode: alerts only, no reward impact.

We use what we already run. We don't cite it.
Integration Wiring · Round 5
The Round 5 verification panel (Grok PROCEED + Gemini PROCEED + Codex REVISE-FIRST) caught that Section 10 above was pattern-citing. It named Swarma, MiroShark, @coinberg/autoresearch, Hermes and Cursor as inspiration without actually wiring them. Codex's REVISE-FIRST demanded runtime verification before any revision. All claims below are now health-check verified.
| System · port | Status | Evidence | Concrete role in v1 |
|---|---|---|---|
| OpenClaw :18789 | LIVE | {"ok":true,"status":"live"} | Substrate — hosts SOUL.md + 7 skills + Heartbeat + multi-channel inbox |
| MiroShark :5002 | LIVE | PID 952 · com.forge.miroshark | Adversarial scenario corpus — Sprint 2 acceptance fixture (overseer must catch rmt-compound-attack, rmt-eclipse, identity-sybil-farm) |
| @coinberg/autoresearch :3300 | DOWN | curl: (7) Couldn't connect | PREPARE → EXECUTE → EVALUATE → ACCUMULATE → PROMOTE FSM for candidate skills. Sprint 0.5 revives it. |
| Ollama local models :11434 | ALL LOADED | qwen3-coder · qwen3:32b-q8 · deepseek-r1 | Skill #2.5: 30s cheap first-pass verifier on every drift event + candidate skill before API spend |
| Swarma · research/harness/src/swarma.ts | CONFIRMED | executePanelV2Review, tryGetCached in source | Sandbox executor inside autoresearch EXECUTE stage; runs candidate skill against MiroShark corpus |
| Stable-Baselines3 · forge skill | OUT OF SCOPE | Available, not used | Removed — classical RL shape-mismatched for governance drift; autoresearch hill-climbing is correct shape |
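
The Status column reads like the output of simple HTTP probes. Below is a sketch of such a probe using Node's http module. It assumes each service answers with JSON containing an ok field, the way OpenClaw's {"ok":true,"status":"live"} does. The DEGRADED bucket is a hypothetical addition, and this document does not give the health path for each port.

```typescript
import * as http from "node:http";

type Health = "LIVE" | "DOWN" | "DEGRADED";

// LIVE mirrors the OpenClaw evidence ({"ok":true,...}); DOWN means no HTTP
// response at all; DEGRADED (hypothetical) is any other answer.
function classify(statusCode: number | null, body: string): Health {
  if (statusCode === null) return "DOWN";
  try {
    const parsed = JSON.parse(body) as { ok?: unknown };
    return statusCode === 200 && parsed.ok === true ? "LIVE" : "DEGRADED";
  } catch {
    return "DEGRADED"; // non-JSON body, or JSON null
  }
}

function probe(url: string, timeoutMs = 2000): Promise<Health> {
  return new Promise((resolve) => {
    // agent: false opens a one-off connection so no keep-alive socket lingers.
    const req = http.get(url, { timeout: timeoutMs, agent: false }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk: string) => { body += chunk; });
      res.on("end", () => resolve(classify(res.statusCode ?? null, body)));
    });
    // Idle socket past the budget: abort, which surfaces as "error" below.
    req.on("timeout", () => req.destroy(new Error("timeout")));
    req.on("error", () => resolve("DOWN")); // e.g. ECONNREFUSED, like the curl (7) row
  });
}
```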

The end-to-end loop:
1. MiroShark generates adversarial scenarios.
2. scan-drift detects them via fs.watch.
3. detect-runtime-drift or detect-panel-drift classifies the drift.
4. The local verifier (deepseek-r1:32b, ~30s) gives a second opinion.
5. author-skill proposes a new SKILL.md into .staging/.
6. @coinberg/autoresearch PREPARE packages it with the scenario corpus.
7. EXECUTE spawns a Swarma sandbox sprint.
8. EVALUATE returns a scalar pass/fail (TP>0.85, FP<0.05, p99<5s).
9. The local verifier runs a final pass on behavior.
10. overwatchd-signed human promote (Grok+1 with dispatch receipts).
11. ACCUMULATE + PROMOTE.
12. 30-day shadow mode.
13. replay-consolidator re-scores history.
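The EVALUATE step reduces a candidate's sandbox run to one scalar pass/fail. Here is a sketch using the three thresholds above. It assumes TP and FP are measured as rates over the MiroShark scenario corpus. It also assumes p99 is the nearest-rank 99th percentile of per-scenario detection latency. The field names are hypothetical.

```typescript
interface EvalRun {
  truePositives: number;  // adversarial scenarios the candidate flagged
  falseNegatives: number; // adversarial scenarios it missed
  falsePositives: number; // benign scenarios it flagged
  trueNegatives: number;  // benign scenarios it left alone
  latenciesMs: number[];  // per-scenario detection latency
}

// Nearest-rank percentile; an empty sample counts as infinitely slow.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return Number.POSITIVE_INFINITY;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p / 100) * sorted.length)));
  return sorted[rank - 1]!;
}

// TP rate > 0.85, FP rate < 0.05, p99 < 5 s: all three must hold.
function evaluate(run: EvalRun): { pass: boolean; tpRate: number; fpRate: number; p99Ms: number } {
  const tpRate = run.truePositives / Math.max(1, run.truePositives + run.falseNegatives);
  const fpRate = run.falsePositives / Math.max(1, run.falsePositives + run.trueNegatives);
  const p99Ms = percentile(run.latenciesMs, 99);
  return { pass: tpRate > 0.85 && fpRate < 0.05 && p99Ms < 5000, tpRate, fpRate, p99Ms };
}
```

Returning the three components alongside the boolean keeps the scalar gate while still telling a human promoter which threshold a candidate missed.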
Four sprints. ~350 LOC. Overwatchd first.
Build Sequence
LOC budget collapsed from the original 970-LOC standalone-daemon estimate to ~350 LOC by adopting the OpenClaw substrate and wiring existing services. Each sprint has an explicit gate condition. Sprint 0 (overwatchd) is the non-negotiable precondition; Sprint 0.5 revives @coinberg/autoresearch on :3300.
Build sequence flows left to right. Each edge label is the gate condition.
Three frozen interfaces. v1 local. HC config swap.
HC Portability
The overseer ships single-operator on macOS in v1, but its data model and interfaces are built so that cloud, multi-tenant and subforge deployment is a configuration change, not a rewrite. Multi-user features (per-operator Ed25519 keypairs, role-based escalation, Kanban orchestration) are deferred to a real HC sprint.
| Interface | Signature | v1 implementation | HC cloud implementation |
|---|---|---|---|
| WitnessRecordWriter | store(record) · tail(filter) | local file ~/.forge/overseer/witness-bus.jsonl; operator_id + namespace from day 1 | Postgres (records) + Kafka (tail stream); per-tenant namespace partitioning |
| OverwatchVerifier | verify(receipt) returns valid + signer_pubkey | local Ed25519 with key file 0400 _overwatchd; pubkey at .forge/overwatch/pubkey.pem | KMS-backed signing (AWS KMS / GCP KMS / hardware HSM); pubkey distributed via service mesh |
| HintSink | write(operator_id, hint) | file write to .forge/overseer/active-hints/<operator_id>.md mode 0444 | per-tenant Server-Sent Events channel keyed by namespace; CloudEvents protocol |
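
The three frozen interfaces translate directly into TypeScript types. In this sketch only the method names and the v1 file layout come from the table, and the record and receipt types are placeholders. Two steps are assumptions: the operator_id character check, and the write-then-rename step that lets a 0444 hint file be replaced atomically.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Frozen interfaces: method names from the table, payload types are placeholders.
type Json = Record<string, unknown>;

interface WitnessRecordWriter {
  store(record: Json): void;
  tail(filter: (record: Json) => boolean): AsyncIterable<Json>;
}

interface OverwatchVerifier {
  verify(receipt: Uint8Array): { valid: boolean; signer_pubkey: string };
}

interface HintSink {
  write(operator_id: string, hint: string): void;
}

// v1 HintSink: one read-only <operator_id>.md per operator in active-hints/.
class LocalFileHintSink implements HintSink {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
  }

  write(operator_id: string, hint: string): void {
    // Assumed id format; rejects anything that could escape the directory.
    if (!/^[A-Za-z0-9_-]+$/.test(operator_id)) throw new Error(`invalid operator_id: ${operator_id}`);
    fs.mkdirSync(this.dir, { recursive: true });
    const target = path.join(this.dir, `${operator_id}.md`);
    const tmp = `${target}.${process.pid}.tmp`;
    fs.rmSync(tmp, { force: true }); // clear a stale temp file from a crashed write
    fs.writeFileSync(tmp, `<system-reminder>\n${hint}\n</system-reminder>\n`);
    fs.chmodSync(tmp, 0o444); // exactly 0444, regardless of umask
    fs.renameSync(tmp, target); // atomic swap; replacing a 0444 file only needs a writable dir
  }
}
```

Swapping to the HC implementation means providing another class behind the same HintSink type, for example one that publishes to an SSE channel. Callers do not change.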
Eight open patterns. One coherent system.
SOTA Composition
The overseer composes the strongest open-source patterns of 2026 into one substrate. Each pattern composes with the immutable kernel because the kernel is the only thing the auto-rewrite surface can't reach; every other input is mediated.
| Source | What we pull | Composes with kernel because |
|---|---|---|
| OpenClaw · 160K stars · :18789 | SOUL.md identity · Heartbeat (cron) · multi-channel inbox · ClawHub registry | Substrate is outside kernel surface; kernel reads WitnessBus only |
| NousResearch Hermes Agent · 32K stars · v0.13 | Kanban orchestrator "don't-do-work-yourself" + 5-step decomposition (verbatim in skill #4) | Skills propose; kernel scores; no kernel mutation |
| Cursor Composer 2.5 · real-time RL · ~5h checkpoints | Inline teacher-student hints via PreToolUse hook (skill #5) · production-trace reward pattern | Hints are inputs to operator, not inputs to kernel |
| EvoAgentX survey · arXiv 2508.07407 | Scope locked to Environment-Centric axis v1–v5 · explicit defer of Model-Centric and Co-Evolution | Kernel + reward formula are not "environment"; they're constants |
| Anthropic sleep · hippocampal SWR · biological replay pattern | Skill #7 replay-consolidator: 20× compressed off-policy update during quiet heartbeat | Off-policy re-scoring against frozen kernel function |
| Thin Harness / Fat Skills · @garrytan framing | ~130 LOC harness + 7 fat skills · branching params over many small skills | Harness has no reward logic; cannot poison itself |
| OpenSpace (HKUDS) · self-evolving skill engine | FIX / DERIVED / CAPTURED skill-evolution taxonomy in skill #6 | Applies to skill bundle only, never kernel |
| Karpathy autoresearch · 42K stars · single-GPU loop | Periodic-loop + keep/discard on observable metric pattern. IS our RL. | Maps onto Heartbeat-triggered scan; staging-gate replaces keep/discard |
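
Skill #7's off-policy re-scoring can be sketched as replaying stored WitnessBus records through the current rule set, bucketed into time windows. The proposal does not define what makes a pattern "stable". The window size and the tolerance-around-the-mean test below are therefore hypothetical choices, as are the record and rule shapes.

```typescript
interface StoredRecord { ts: number; payload: Record<string, unknown> }
interface Rule { id: string; fires: (payload: Record<string, unknown>) => boolean }

// Replays history through every current rule; returns, per rule, the fraction
// of records it fires on in each fixed-size time window (oldest first).
// Windows that contain no records are skipped.
function replay(records: StoredRecord[], rules: Rule[], windowMs: number): Map<string, number[]> {
  const rates = new Map<string, number[]>();
  if (records.length === 0) return rates;
  const start = records.reduce((m, r) => Math.min(m, r.ts), Number.POSITIVE_INFINITY);
  const windows = new Map<number, StoredRecord[]>();
  for (const r of records) {
    const w = Math.floor((r.ts - start) / windowMs);
    const bucket = windows.get(w) ?? [];
    bucket.push(r);
    windows.set(w, bucket);
  }
  const order = Array.from(windows.keys()).sort((a, b) => a - b);
  for (const rule of rules) {
    rates.set(
      rule.id,
      order.map((w) => {
        const batch = windows.get(w)!;
        return batch.filter((r) => rule.fires(r.payload)).length / batch.length;
      }),
    );
  }
  return rates;
}

// Hypothetical stability test: the rule fires at all, and its per-window rate
// never strays more than `tolerance` from its mean. Stable rules are the
// candidates for the policy table.
function stableRules(rates: Map<string, number[]>, tolerance = 0.1): string[] {
  return Array.from(rates.entries())
    .filter(([, series]) => {
      if (series.length < 2) return false;
      const mean = series.reduce((a, b) => a + b, 0) / series.length;
      return mean > 0 && series.every((x) => Math.abs(x - mean) <= tolerance);
    })
    .map(([id]) => id);
}
```

Because the stored records are never modified and the rules only read them, the re-scoring stays off-policy. Nothing in the replay path touches the kernel.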
Real dispatches. Audit trail. User tiebreaker.
Panel Attestation
Per the 2026-05-22 always-convergence + synthetic-review P0 rules, every reviewer below was a real dispatch with audit-trail evidence. No forge review records were written; that surface is avoided until overwatchd lands. The audit trail lives in the /tmp/FINAL-*.out and /tmp/verify-*.out files plus the agent IDs.
| Reviewer | Mechanism · evidence | Verdict | Critical finding |
|---|---|---|---|
| Grok 4.3 | bash via scripts/grok-dispatch.sh | BLOCKER → satisfied | Kernel-immutability isolation mechanism unspecified; satisfied by separate-user launchd + hash-pinning |
| Opus 4.7 (independent) | Agent abc5fea7b2cf5e2aa · model=opus | BLOCKER → satisfied | (a) Single-user macOS immutability theater; (b) Kill v5 auto-deploy → v5' staging-only |
| deepseek-r1:32b | Ollama local | APPROVE (weak) | Internally contradictory: proposed a RewardKernelUpdaterSkill violating immutability — discounted |
| qwen3-coder:30b | Ollama local | APPROVE_FINAL | ~250 LOC estimate; notes skill-leak risk but treats as managed |
| Gemini 2.5 Pro | gemini CLI · Gin4 key | APPROVE_FINAL | Reward Interpretation Poisoning — auto-authored skills can game kernel's interpretation |
| Codex GPT-5.5 | Agent ac36853bd79d09c6a (R5 verification) | REVISE-FIRST → satisfied | Runtime claims needed health-check verification before commit — satisfied in §06 above |

Convergence: split_escalated · tiebreaker_by: user · resolution: APPROVE_FINAL with all BLOCKER fixes baked in.
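
What overwatchd adds to this table is a machine check. Every reviewer named in an attestation must be backed by a dispatch.completed event that carries a valid signature. Below is a minimal sketch using Node's built-in Ed25519 support (crypto.sign and crypto.verify with a null algorithm). The event fields are hypothetical. JSON.stringify stands in for the canonical encoding that a real receipt format would need.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical event shape for an overwatchd dispatch receipt.
interface DispatchCompleted { type: "dispatch.completed"; reviewer: string; agent_id: string }
interface SignedEvent { event: DispatchCompleted; signature: Buffer }

// JSON.stringify stands in for a canonical encoding; a real receipt format
// must fix field order so signer and verifier see identical bytes.
const encode = (e: DispatchCompleted): Buffer => Buffer.from(JSON.stringify(e), "utf8");

// overwatchd side: Ed25519 takes a null digest algorithm in Node's API.
function signEvent(event: DispatchCompleted, privateKey: KeyObject): SignedEvent {
  return { event, signature: sign(null, encode(event), privateKey) };
}

// Overseer side: reviewers named in an attestation that lack a receipt whose
// signature verifies against overwatchd's public key.
function unbackedReviewers(reviewers: string[], log: SignedEvent[], overwatchdKey: KeyObject): string[] {
  const backed = new Set(
    log
      .filter((s) => s.event.type === "dispatch.completed" && verify(null, encode(s.event), overwatchdKey, s.signature))
      .map((s) => s.event.reviewer),
  );
  return reviewers.filter((name) => !backed.has(name));
}

// Key setup for local experiments: generateKeyPairSync("ed25519")
```

A reviewer whose receipt was signed by any key other than overwatchd's is treated exactly like a reviewer with no receipt. A typed --summary therefore cannot stand in for a dispatch.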
Constraints that ride alongside the build.
Implementation Notes
Until overwatchd ships, forge review --summary remains a forgeable surface. No forge review records are written during Sprint 0; the audit trail for the overseer panel lives in /tmp/FINAL-*.out + /tmp/verify-*.out + agent IDs + this proposal document.
Sprint 0 (Tier 1 overwatchd) is a separate sprint with its own SEED → PLAN → DESIGN → BUILD ceremony and its own real cross-model panel (5-of-5 for security_hardening sprint type). The overseer v1 sprints (1–4) cannot fire until overwatchd is signing dispatch receipts and the four frozen SHAs (a59a3a8, 1110d45, 62b5def, 612898a) have been retroactively ratified or surgically reverted.
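The Sprint 1 precondition in the paragraph above is mechanical enough to encode as a check. The sketch assumes a hypothetical per-SHA status map. The four SHAs and the two conditions come from the paragraph above.

```typescript
const FROZEN_SHAS = ["a59a3a8", "1110d45", "62b5def", "612898a"];
type ShaStatus = "frozen" | "ratified" | "reverted";

// Overseer v1 sprints may fire only when overwatchd signs dispatch receipts
// and every frozen SHA has been retroactively ratified or surgically reverted.
function overseerSprintsMayFire(
  overwatchdSigning: boolean,
  status: Record<string, ShaStatus>,
): { ok: boolean; blockers: string[] } {
  const blockers: string[] = [];
  if (!overwatchdSigning) blockers.push("overwatchd is not signing dispatch receipts");
  for (const sha of FROZEN_SHAS) {
    // A SHA with no recorded status counts as still frozen.
    if ((status[sha] ?? "frozen") === "frozen") blockers.push(`${sha} is neither ratified nor reverted`);
  }
  return { ok: blockers.length === 0, blockers };
}
```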
Sprint 0.5 (@coinberg/autoresearch on :3300): the health check confirmed the service is currently DOWN. ~30 LOC: diagnose the existing crash logs, add a com.forge.autoresearch.plist LaunchAgent, wire it into the canonical labels in scripts/launch/forge-services-up.sh, and validate that the 5-stage FSM round-trips a no-op skill end-to-end before Sprint 1 begins.
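The no-op round-trip is essentially a check that a candidate passes through the five stages in order. Below is a sketch with hypothetical stage handlers standing in for the real autoresearch service on :3300. Only the stage names and their order come from this document. The candidate name "noop-skill" is illustrative.

```typescript
const STAGES = ["PREPARE", "EXECUTE", "EVALUATE", "ACCUMULATE", "PROMOTE"] as const;
type Stage = (typeof STAGES)[number];
type Handler = (candidate: string) => boolean; // true = advance to the next stage

// Drives one candidate through the stages in order, stopping at the first
// handler that returns false.
function runPipeline(
  candidate: string,
  handlers: Record<Stage, Handler>,
): { completed: Stage[]; haltedAt: Stage | null } {
  const completed: Stage[] = [];
  for (const stage of STAGES) {
    if (!handlers[stage](candidate)) return { completed, haltedAt: stage };
    completed.push(stage);
  }
  return { completed, haltedAt: null };
}

// The Sprint 0.5 exit check: a no-op skill clears all five stages.
function noopRoundTrips(handlers: Record<Stage, Handler>): boolean {
  const { completed, haltedAt } = runPipeline("noop-skill", handlers);
  return haltedAt === null && completed.length === STAGES.length;
}
```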
From R4 panel: 30s budget-cache TTL, 10s T2 gate timeout, 30-day shadow window for newly promoted skills. Config-driven from day 1 via ~/.forge/overseer/config.yaml so operations tuning doesn't require code changes. HC v2 may need different values per tenant.
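Here is a sketch of config-driven tunables with the R4 defaults, plus the kind of TTL cache the budget guard describes. Loading YAML from ~/.forge/overseer/config.yaml is omitted. The key names, the injectable clock and the inShadowMode helper are assumptions.

```typescript
interface OverseerConfig {
  budgetCacheTtlMs: number;
  t2GateTimeoutMs: number;
  shadowWindowDays: number;
}

// R4 panel defaults; HC v2 may need different values per tenant.
const DEFAULTS: OverseerConfig = { budgetCacheTtlMs: 30_000, t2GateTimeoutMs: 10_000, shadowWindowDays: 30 };

function withDefaults(overrides: Partial<OverseerConfig>): OverseerConfig {
  return { ...DEFAULTS, ...overrides };
}

// A newly promoted skill stays alert-only until the shadow window has elapsed.
function inShadowMode(promotedAtMs: number, nowMs: number, cfg: OverseerConfig): boolean {
  return nowMs - promotedAtMs < cfg.shadowWindowDays * 24 * 60 * 60 * 1000;
}

// Minimal TTL cache; the clock is injectable so expiry can be tested.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: force a fresh lookup
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```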
Out of scope for v1: classical RL libraries (Stable-Baselines3 PPO/DQN/SAC, HuggingFace TRL, OpenTinker), which are shape-mismatched for governance drift, where autoresearch hill-climbing is the correct shape; Model-Centric Self-Evolution (no custom training infra); and per-operator Ed25519 keypairs (single-operator v1 uses session-UUID + process-UID). Container deployment, a per-tenant policy table fork, and Kanban orchestration as a standalone skill are all deferred to HC v2+ sprints.
"What stops the watcher from needing a watcher is the immutable kernel. What stops the kernel from rotting is the human-promote gate. What stops the gate from being bypassed is overwatchd. What stops overwatchd from drifting is the user."