Omega — Autonomous Engineering Operations
A whitepaper on multi-agent orchestration with verifiable autonomy
Version 2.5.1 · Patterns Edition · 2026-05-22
Executive summary
Omega is a multi-agent operating system for software engineering work. It turns a single human intent — "fix this bug", "ship this feature", "audit this codebase" — into a chain of planned, executed, audited, and deployed work, without continuous human supervision.
The system is organized as four orchestration levels: the human operator, a routing bot, project oracles, and short-lived worker sessions. Each level has one job and one exit condition. Completion is signaled by an atomic file (.done.json) and acknowledged by three independent layers (worker, oracle, supervisor) before a session is closed.
What makes Omega different from other agent frameworks is its operational discipline:
- Three Laws that override every prompt: runtime truth over code intent, researcher posture over sycophancy, autonomous decision over idle waiting.
- A 12-step ship pipeline with deploy verification, freeze-don't-rollback default, and per-project locks.
- A 17-audit Quality Arsenal covering code, runtime, design, performance, security, accessibility, SEO, data, API, copy, DX, motion, automation, logic, and product retention. Each audit uses Gestalt clarity gating + Popper falsification + hinge-point 10× scrutiny.
- A supervision mesh of cron-driven patrols, event-driven reactors, and daemons that detect categorized failure modes and recover stalled sessions.
- A Skill Orchestration Layer (new in v2.2). Every junction in the 4-level chain is now backed by an invocable, versioned skill instead of an ad-hoc f-string or regex. Eleven skills replace the prose contracts that previously lived inside Python handlers and bash heredocs.
Version 2.2 is the Skill-Wired Edition. It documents the eleven skills shipped on 2026-05-16, the seven weakness fixes that landed alongside them, and the new asymmetric supervision mesh that replaced the legacy KAIROS nudger.
The honest gaps remain: Omega's production telemetry is young (the live system has been running for weeks, not years), and the published metrics are bounded by that fact.
1 · The problem — Why autonomous agents fail
The promise of autonomous coding agents — "describe what you want, get working software back" — has been pitched many times. In practice, four failure modes recur:
Loss of context. An agent solves the first sub-task, then forgets why it was solving it. Single-context-window approaches collapse when the task exceeds the window or branches into parallel work.
Sycophancy. Most LLMs are RLHF-tuned to agree. When a user proposes a flawed approach, the agent codes it instead of challenging it. The result is fast garbage.
Silent failure. The agent reports success, the operator believes it, and only later discovers the function never compiled, the test was disabled, or the deploy was skipped. There is no independent verifier.
Stalls without escalation. The agent encounters ambiguity, asks the user a question, and waits indefinitely. If the user is not watching the tmux session, the system hangs forever.
A fifth failure mode is endemic to multi-agent systems specifically:
Drift between prose contracts. When the contract between two agents lives in an f-string or a regex inside a handler, every layer eventually paraphrases it differently. The router sends a slightly different brief than the dispatcher writes, the oracle interprets a slightly different intent than the worker executes, and a stable system silently becomes an unstable one over weeks. The fix in v2.2 is to convert every contract into a versioned skill.
Omega is built around these failure modes. Each is named, attacked, and verifiable.
| Problem | Omega's response |
|---|---|
| Loss of context | 4-level chain; workers are short-lived; oracle context survives across workers; cross-session memory (W5) recalls lessons |
| Sycophancy | Second Law: challenge the premise before coding, with evidence |
| Silent failure | 3-tier close-gate (worker .done.json, oracle ack, supervisor close decision); Layer 4 Mission Auditor adds an independent audit gate before ack |
| Idle stalls | Third Law: never wait, always decide; legal stops are .done.json or blocked.json with fallback action already executed; /resurrect cascade recovers any stall |
| Contract drift | Skill Orchestration Layer: 11 skills replace 21+ f-strings / regex hits; every junction is now an invocable protocol with explicit toggles |
2 · Omega's answer — A 4-level architecture
Every Omega operation flows through four levels. Each has one job, one input contract, one output contract.
┌─────────────────────────────────────────────┐
│ LEVEL 0 — Human operator │
│ Sends an intent (one Telegram message) │
└────────────────────┬────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ LEVEL 1 — Routing bot │
│ Classifies (Simple / Medium / Complex / │
│ Epic), resolves the project, builds a │
│ brief, dispatches an oracle │
└────────────────────┬────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ LEVEL 2 — Project oracle │
│ Plans, dispatches workers, verifies done, │
│ optionally ships, signals supervisor │
└────────────────────┬────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ LEVEL 3 — Workers │
│ Read PLAN, execute steps, verify, write │
│ .done.json, self-kill │
└─────────────────────────────────────────────┘
Why four levels and not three or five
Level 0 ↔ 1 separation. A noisy human channel (natural language Telegram) is converted into a structured contract (project, scope, brief, ship flag). The bot does the messy text-to-intent work so the oracle never has to.
Level 1 ↔ 2 separation. The bot does not need to know project internals. The oracle owns project context (CLAUDE.md, codebase layout, file ownership rules). The bot just routes.
Level 2 ↔ 3 separation. Each worker has its own context window and dies after one mission. The oracle's context survives across many workers, accumulating decisions and audit findings without ever overflowing.
Three levels would force the oracle to do per-task execution, blowing its context. Five levels would add ceremony without separation of concerns.
Multi-oracle parallelism
A single project can have multiple oracles running concurrently. The oracle assignment is atomic (file lock per project). Each oracle declares the files it owns; the assigner refuses overlapping ownership. Idle oracles are reused before spawning new ones.
Project X
│
├── oracle-X owns app/**, components/**
├── oracle-X-2 owns api/**, db/**
└── oracle-X-3 owns docs/**, tests/**
(assigned only if file sets disjoint)
This pattern handles the case where a single human intent ("ship a feature plus update the docs plus add tests") naturally splits across non-overlapping areas of the codebase.
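The disjoint-ownership rule can be sketched as follows. This is a minimal illustration, not the assigner's actual code: the function names are hypothetical, and the overlap test is a deliberately conservative assumption that compares only the literal directory prefix of each glob.

```python
def static_prefix(pattern: str) -> str:
    """Return the literal directory prefix of a glob, up to its first wildcard."""
    parts = []
    for segment in pattern.split("/"):
        if any(ch in segment for ch in "*?["):
            break
        parts.append(segment)
    return "/".join(parts)


def patterns_may_overlap(a: str, b: str) -> bool:
    """Conservative test: globs may overlap if one literal prefix contains the other."""
    pa, pb = static_prefix(a), static_prefix(b)
    if not pa or not pb:
        return True  # a glob with no literal prefix could match anything
    return pa == pb or pa.startswith(pb + "/") or pb.startswith(pa + "/")


def can_assign(requested: list[str], owned_by_others: dict[str, list[str]]) -> bool:
    """Refuse the assignment if any requested glob may overlap an already-owned glob."""
    return not any(
        patterns_may_overlap(req, owned)
        for globs in owned_by_others.values()
        for owned in globs
        for req in requested
    )
```

With oracle-X owning app/** and components/**, a request for api/** and db/** is accepted, while a request for app/settings/** is refused. Erring toward refusal is the safer failure: a false "overlap" only costs parallelism, whereas a false "disjoint" lets two oracles edit the same file.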
3 · Core guarantees
Four guarantees define Omega's contract with the operator. Each is enforced mechanically, not by goodwill.
Guarantee 1 — Autonomy
Once dispatched, a worker never asks the operator a question. The legal exits are:
- .done.json written, status done_clean: work verified complete.
- .done.json written, status pending: partial, with pending_actions[] listing what remains.
- .done.json written, status failed: genuinely blocked, with evidence.
- worker-blocked-<session>.json written + fallback action executed: truly ambiguous, but the worker proceeded with its best guess while signaling the supervisor.
The AskUserQuestion tool is forbidden in dispatched sessions. Workers that pause at a question mark are by definition broken.
Guarantee 2 — Verification
Workers do not self-certify. Three layers acknowledge completion:
Worker writes .done.json ─── Tier 1: "I think I finished"
│
▼
Oracle reads, runs VERIFY ─── Tier 2: "Confirmed, work meets spec"
COMMAND, calls
close-gate ack-worker
│
▼
Mission Auditor intercepts ─── Tier 2.5: "Forensic audit ≥85/100"
(1-3 skills, ≥85/100) (Layer 4 of the Safety Mesh)
│
▼
Supervisor reads ledger, ─── Tier 3: "Safe to close, operator informed"
decides close window,
notifies the operator
Each tier is independent. A failure at any tier keeps the session alive and surfaces the discrepancy. The Mission Auditor was introduced as Layer 4 of the Safety Mesh (see §5); it dispatches one to three Quality Arsenal audits selected by mission type.
Guarantee 3 — Isolation
Workers cannot harm each other:
- Each worker has its own context window (no shared memory between workers).
- Each worker has its own state directory (worker-<session>.* files, namespaced).
- Atomic writes everywhere (tmp + mv -f) prevent half-written state files.
- The brief-replay file (Layer 1) is written atomically before the tmux paste (W6, 2026-05-16) so a mid-paste crash still leaves a valid brief on disk for replay.
- Optional git worktrees per oracle for cross-cutting changes that would conflict otherwise.
The worktree subsystem is chaos-tested: 40 of 40 cases pass, including process kills mid-operation, disk-full simulation, and concurrent worktree creation on the same project.
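The tmp + mv pattern used for state files can be illustrated in Python. This is a sketch under stated assumptions: Omega's own scripts are shell, and the helper name write_json_atomic is hypothetical. The key property is that the temporary file lives in the same directory as the target, so the final rename never crosses a filesystem boundary.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, payload: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # never leave a stray temp file behind
        raise
```

A process killed before os.replace leaves at most an orphaned .tmp file; the target path is never observed half-written.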
Guarantee 4 — Close-gate
The supervisor never auto-closes a session if:
- Status is not done_clean.
- Ship result is failed or frozen.
- pending_actions[] is non-empty.
- The operator has interacted with the bot during the grace window.
- A new oracle for the same project was dispatched during the grace window.
Auto-close happens only when all conditions point to "the work is genuinely finished, the operator has been notified, and the resources can be freed".
4 · Operational flow
This section walks one complete intent from operator to ship.
Step 1 — Intent
The operator sends a message to the routing bot. The message is in natural language, English or French, optionally with attachments (screenshots, Linear links, audit keywords).
Step 2 — Classification and routing
The bot classifies the intent via the /classify-intent skill (since v2.2). The classifier is hybrid: a regex pass resolves the roughly 80% of cases that are obvious at zero token cost; ambiguous cases escalate to a Haiku micro-call that returns one of eight canonical intents (bug-fix | feature | audit | ship | refactor | docs | question | other) plus a confidence value and a routing hint. The skill replaces a regex-only classifier that systematically misrouted vague messages such as "Causio fait ce que tu sais". The bot also assigns each intent a complexity tier:
Simple ─ one read-only check ─ done in-band
Medium ─ one specialist, single area ─ spawn 1 worker
Complex ─ multiple specialists, multi-domain ─ /team in tmux
Epic ─ cross-department, hours+ ─ /aisb full chain
The bot also detects forensic-audit keywords (code, flow, UX, perf, sec, ...) and routes them to the matching audit skill. Audit keywords are never paraphrased into freeform prose; the literal skill command is invoked.
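The hybrid classification described in this step can be sketched as a regex fast path with an optional model fallback. Everything below is illustrative: the FAST_RULES patterns are hypothetical (the real rule set is not published here), and llm_fallback stands in for the Haiku micro-call as an injected callable so the sketch runs without any network access.

```python
import re
from typing import Callable, Optional

# Hypothetical fast-path rules, checked in order.
FAST_RULES = [
    (re.compile(r"\b(fix|bug|crash|broken)\b", re.I), "bug-fix"),
    (re.compile(r"\b(ship|deploy|push|merge)\b", re.I), "ship"),
    (re.compile(r"\b(audit|review)\b", re.I), "audit"),
]
CANONICAL = {"bug-fix", "feature", "audit", "ship",
             "refactor", "docs", "question", "other"}


def classify(message: str,
             llm_fallback: Optional[Callable[[str], str]] = None) -> tuple[str, str]:
    """Return (intent, source): regex first, model only for ambiguous text."""
    for pattern, intent in FAST_RULES:
        if pattern.search(message):
            return intent, "regex"  # zero-token path
    if llm_fallback is not None:
        guess = llm_fallback(message)
        if guess in CANONICAL:  # reject anything outside the eight intents
            return guess, "llm"
    return "other", "default"
```

Validating the fallback's answer against the canonical set keeps a malformed model reply from leaking a ninth intent into the router.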
Step 3 — Brief construction
The bot builds a brief for the oracle. The brief is now produced by the /dispatch-oracle skill (the legacy f-string remains as fallback when the skill probe times out). The brief includes:
{
"project": "Project name",
"mission": "One-line summary",
"ship": true | false,
"files_owned": ["glob patterns the oracle may touch"],
"deploy_timeout_min": 10,
"lifecycle": "persistent | ephemeral"
}
ship is set true only when the operator explicitly asks (keywords: ship, deploy, push, merge, livre, "envoie en prod"). Audits and research never ship.
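The ship-flag rule can be sketched as a keyword match guarded by a read-only override. The keyword list is taken from the text above; the function name wants_ship and the read_only parameter are hypothetical names introduced for illustration.

```python
import re

SHIP_KEYWORDS = ("ship", "deploy", "push", "merge", "livre", "envoie en prod")
_SHIP_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SHIP_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def wants_ship(message: str, read_only: bool = False) -> bool:
    """True only when the operator explicitly asks to ship.

    read_only marks audits and research, which never ship
    regardless of the words used in the message.
    """
    if read_only:
        return False
    return bool(_SHIP_RE.search(message))
```

Word boundaries matter here: without them, "relationship" would match "ship" and flip the flag on an unrelated message.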
Step 4 — Oracle planning
The oracle reads the brief and project CLAUDE.md, classifies the work, and writes its plan to .orchestrator/decisions.md (one line per decision: task, classification, choice, rationale). It then designs the worker dispatches, optionally invoking the /plan-decompose skill on complex multi-step missions.
Crucially, the oracle never writes project code directly. Even a one-line typo fix goes through a worker session.
Step 5 — Worker dispatch with the PLAN protocol
Each worker is dispatched via the /dispatch-worker skill (canonical contract; the legacy bespoke prompt assembly remains as fallback). The worker prompt is:
== MISSION ==
<one-line mission>
== PLAN ==
1. <step 1, concrete, verifiable>
2. <step 2>
3. <step 3>
...
== FILES IN SCOPE ==
- <glob or path list>
== DONE CRITERIA ==
- <criterion 1, observable in <60s>
- <criterion 2>
== VERIFY COMMAND ==
<single shell command that returns 0 when done>
== HANDOFF ==
When PLAN complete AND VERIFY COMMAND passes, call:
bash <path>/worker-mark-done.sh done_clean '<summary>'
== PRE-BOOT KNOWLEDGE PACK ==
Project context, language defaults, audit triggers,
and (W5) the five most-recent lessons/mistakes
from the cross-session memory store, scoped to this
project.
The worker boots with /worker-protocol as its self-contract (Wave-3 canonicalization). It reads the PLAN, materializes it as a TodoWrite list (each step becomes a todo item), and executes step-by-step.
Why PLAN and not the native /goal primitive
Claude Code v2.1.141 ships a native /goal <condition> primitive — the engine auto-loops until the condition is met. We integrated this in two phases:
- Phase 1: opt-in via GOAL_NATIVE=true for solo workers with short deterministic conditions.
- Phase 2: default-on for all solo workers.
Phase 2 was reverted within a day. /goal has a hard 4000-character limit. Real worker prompts (mission + pre-boot knowledge pack + DONE + VERIFY + autonomy banner) routinely exceed 5000 characters. Default-on injection caused truncation. The PLAN protocol replaces it: no length limit, every step is visible in TodoWrite, the worker is a transparent state machine.
/goal remains available as Phase 1 opt-in for short deterministic conditions (e.g. npx vitest passes).
Step 6 — Audit (forensic)
If the mission is a forensic audit, the worker runs the matching protocol (e.g. /codeaudit, /uiuxaudit, /secaudit). Each audit has 16–23 phases, a domain-specific raw-score maximum (280–420), and normalizes to /100 for comparison. All audits share:
- Gestalt clarity gate. First pass: is the artifact comprehensible at all? If not, the audit stops and reports the clarity failure first. There is no point measuring detail on something incoherent.
- Popper falsification. Every claim is paired with a falsification check. "This component is accessible" requires "What would prove it isn't?" — and that check is executed.
- Hinge-point 10× scrutiny. The audit identifies the one or two phases that, if wrong, invalidate everything downstream. Those phases get 10× the rigor of others.
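Because each audit has its own raw-score maximum, scores are only comparable after normalization. A minimal sketch, assuming the normalization is a simple linear rescale (the text does not specify the formula) and using a hypothetical function name:

```python
def normalize_score(raw: float, raw_max: float) -> float:
    """Rescale a domain-specific raw score to the shared /100 scale."""
    if raw_max <= 0:
        raise ValueError("raw_max must be positive")
    if not 0 <= raw <= raw_max:
        raise ValueError("raw score must lie between 0 and raw_max")
    return round(100 * raw / raw_max, 1)
```

Under this assumption, 238 points on a 280-point audit and 357 points on a 420-point audit both normalize to 85.0.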
Step 7 — Ship (optional)
If brief.ship is true, the oracle runs the 12-step ship pipeline:
1. Build (npm run build or project-specific, via safe-npm-build.sh mutex)
2. Stage (whitelist files; refuse extras)
3. Secret scan staged (gitleaks)
4. Whitespace check (git diff --cached --check)
5. Commit (conventional message)
6. Acquire flock per-project (serializes oracles)
7. Check freeze flag (if frozen, abort + alert)
8. Pull --rebase (auto-abort on conflict, keep local commit)
9. Push (retry once after re-rebase)
10. Deploy (whitelisted command; default Vercel + token)
11. Poll deploy status (max deploy_timeout_min, default 10 min)
12. Write .done.json with commit, push URL, deploy URL, duration
On deploy failure, the default behavior is freeze, don't rollback. A ship-<project>.frozen flag is set; subsequent oracles cannot push until the operator decides to revert or fix-forward. Auto-rollback is opt-in per project because it can hide root causes (missing env var, provider outage, etc.).
Step 8 — Worker handoff
The worker calls worker-mark-done.sh <status> '<one-line summary>'. This atomically writes worker-<session>.done.json (tmp + mv). The script has a guard: it refuses to run from an oracle session (rc=3 + redirect message). This prevents the common bug where an oracle accidentally marks itself done as if it were a worker.
The worker's tmux session schedules a self-kill 5 seconds after the handoff — freeing the slot for the next dispatch.
Step 9 — Mission Auditor (Layer 4 — END)
Before the oracle ack, close-gate.sh ack-worker invokes mission-auditor.sh. The auditor classifies the mission with a hybrid heuristic + /classify-intent skill probe (W7), selects 1–3 Quality Arsenal audits via a rules table, runs them under a global VPS-wide flock (one audit at a time), and computes a minimum-score verdict. ≥ 85/100 → APPROVED, < 85 → REJECTED (worker is nudged with top findings and retried up to twice). Bypass for emergencies is opt-in via CLOSEGATE_SKIP_AUDIT=1.
The auditor also writes back to the cross-session memory store (W5): APPROVED verdicts become lesson rows; REJECTED-at-iter≥2 become mistake rows. The next worker on the same project boots with these in its pre-boot knowledge pack.
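The verdict rule (minimum score across audits, threshold 85, up to two retries) can be sketched as below. The function names and the returned action labels are hypothetical; only the threshold, the minimum rule, and the retry limit come from the text.

```python
THRESHOLD = 85    # default pass mark on the /100 scale
MAX_RETRIES = 2   # rejected workers are nudged and retried up to twice


def verdict(scores: dict[str, float], threshold: float = THRESHOLD) -> str:
    """The weakest audit decides: one low score rejects the mission."""
    if not scores:
        raise ValueError("at least one audit score is required")
    return "APPROVED" if min(scores.values()) >= threshold else "REJECTED"


def next_action(scores: dict[str, float], retries_used: int) -> str:
    """Map a verdict to the follow-up: ack, retry the worker, or escalate."""
    if verdict(scores) == "APPROVED":
        return "ack"
    return "retry" if retries_used < MAX_RETRIES else "escalate"


def top_findings(scores: dict[str, float], limit: int = 3) -> list[str]:
    """Lowest-scoring audits first, formatted as name:score/100."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [f"{name}:{score:g}/100" for name, score in ranked[:limit]]
```

Using the minimum rather than the mean means a strong code audit cannot mask a failing security audit.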
Step 10 — Oracle ack
The oracle reads the worker's done.json, executes the VERIFY COMMAND, and calls close-gate.sh ack-worker <worker-session>. Without this ack, the supervisor treats the worker as un-acknowledged and nudges the oracle. Before reporting, the oracle invokes /synthesize-report (template + Haiku digest) and /format-telegram-report (humanized payload) so the operator receives a story, not raw JSON.
Step 11 — Supervisor close decision
The supervisor (cron-driven, every minute) reads all oracle done.json files and applies the close decision tree:
done_clean + ship.result in {ok, skipped} → notify + close after grace
done_clean + ship.result in {failed, frozen} → notify + keep alive
pending → notify + inline "continue" button
failed → send logs + keep alive
The grace window resets if the operator interacts with the bot or a new oracle is dispatched on the same project.
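The decision tree and the grace-window reset can be written as two small pure functions. This is a sketch: the action labels are hypothetical, the grace length is left as a parameter because the text does not state it, and keeping the session alive on any unrecognized input is an assumption chosen as the safe default.

```python
from typing import Optional


def close_decision(status: str, ship_result: Optional[str] = None) -> str:
    """Map an oracle's done.json status to the supervisor's action."""
    if status == "done_clean":
        if ship_result in {"ok", "skipped"}:
            return "notify_and_close_after_grace"
        if ship_result in {"failed", "frozen"}:
            return "notify_and_keep_alive"
        return "keep_alive"  # unknown ship result: assumed fail-safe
    if status == "pending":
        return "notify_with_continue_button"
    if status == "failed":
        return "send_logs_and_keep_alive"
    return "keep_alive"  # unknown status: assumed fail-safe


def grace_elapsed(done_at: float, last_activity_at: Optional[float],
                  now: float, grace_s: float) -> bool:
    """Operator interaction or a new dispatch restarts the grace window."""
    start = done_at if last_activity_at is None else max(done_at, last_activity_at)
    return now - start >= grace_s
```

Only one branch ever leads to closing a session; every other path, including malformed input, leaves it alive for inspection.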
5 · Reliability model
Reliability in Omega is enforced through a Safety Mesh of four independent layers, each owning a distinct slice of the mission lifecycle (START → DURING → END). A failure in any single layer does not compromise the others, and each layer can be reasoned about, tested, and disabled in isolation.
┌──────────────────────────────────────────────────────────────────┐
│ Layer 1 — BRIEF-REPLAY (dispatch persistence, START) │
│ Layer 2 — CPU GUARD (load admission control, START) │
│ Layer 3 — SHADOW MANAGER (live signal monitor, DURING) │
│ Layer 4 — MISSION AUDITOR (quality gate at handoff, END) │
└──────────────────────────────────────────────────────────────────┘
Layer 1 — Brief-Replay. Every dispatch persists its prompt to a per-session file before the worker is given control. The persisted brief lets the system replay the original instructions verbatim into a worker that has been hit by a transient rate-limit or API error and would otherwise lose its context. W6 (2026-05-16) moved the write to immediately before the tmux paste, using an atomic tmpfile + mv pattern — even a mid-paste crash now leaves a valid brief on disk.
Layer 2 — CPU Guard. A two-core host is structurally protected against concurrent heavy builds and dispatch storms by three sub-defenses: (A) a global flock mutex around the build command so two builds cannot race on the same artifact directory; (B) a CPU-aware dispatch throttle that diverts new dispatches into a queue file when the one-minute load average exceeds 2.5× cores, plus a queue flusher (cron */2) that re-dispatches when load drops below 2× cores and ages out entries past 4 hours; (C) a dedicated CPU_OVERLOAD shadow signal that suppresses nudges for five minutes, kills duplicate build processes per working directory, and escalates to the operator. The CPU Guard prevented multi-build saturation in a real incident the same week this whitepaper was revised (see §8 Evidence).
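Sub-defense (B) is a hysteresis loop: dispatches are queued above one threshold and released only below a lower one. A minimal sketch using the thresholds stated above; the function names and the queue entry shape (a dict with a queued_at timestamp) are assumptions made for illustration.

```python
ENQUEUE_FACTOR = 2.5        # queue new dispatches above 2.5x cores
FLUSH_FACTOR = 2.0          # re-dispatch only once load is below 2x cores
MAX_QUEUE_AGE_S = 4 * 3600  # entries older than 4 hours are aged out


def should_enqueue(load_1m: float, cores: int) -> bool:
    """Divert a new dispatch to the queue when the host is saturated."""
    return load_1m > ENQUEUE_FACTOR * cores


def may_flush(load_1m: float, cores: int) -> bool:
    """Allow the flusher to re-dispatch queued work."""
    return load_1m < FLUSH_FACTOR * cores


def drop_stale(queue: list[dict], now: float,
               max_age_s: float = MAX_QUEUE_AGE_S) -> list[dict]:
    """Keep only entries that are still within the age limit."""
    return [entry for entry in queue if now - entry["queued_at"] <= max_age_s]
```

On a live POSIX host the inputs would come from os.getloadavg()[0] and os.cpu_count(). The gap between the two factors is the point of the design: between 2× and 2.5× cores the system neither queues nor flushes, which avoids flapping around a single threshold.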
Layer 3 — Shadow Manager. Every three minutes a heuristic observer evaluates fourteen signals across all running workers and oracles: thrash, error burst, silent drift, scope creep, build regression, progress stagnation, pane-stuck pattern, worker health, todo stall, rate-limit stall, transient API error, prompt-idle, OOM hints, and CPU overload. Detection is Tier 1 (zero token, pure heuristic). An opt-in Tier 2 uses a small model to disambiguate ambiguous Tier-1 hits. Tier 3 is Telegram escalation when retries are exhausted.
Layer 3 has a crucial asymmetry between workers and oracles. A prescriptive nudge that helps a worker recover ("you are looping, change approach, mark done") destroys an oracle that is legitimately managing multiple concurrent missions. Workers therefore receive direct nudges; oracles default to observe-only (JSONL log + throttled FYI). THRASH is disabled for oracles entirely (oracles iterate by design). The stagnation floor is raised to six hours plus an idle-confirmation gate. True emergencies use a brief-aware question-mode ("are you still on this?"), never the imperative. A global kill-switch file freezes all nudges as a panic-stop. This asymmetry replaced an earlier symmetric implementation that destroyed an in-progress UI mission and is described as a case study in §8.
Layer 4 — Mission Auditor. Between a worker's done_clean and the oracle's acknowledgment, an independent gate intercepts the close handshake. It classifies the mission heuristically and now (W7) escalates ambiguous cases to the /classify-intent skill via a Haiku micro-call. It then selects one to three Quality Arsenal audits via a rules table (bug-fix → debug + code, ui → uiux + a11y + motion, api → api + sec, etc.) and runs them under a global VPS-wide lock so only one audit consumes resources at a time. The verdict is the minimum score across audits, with a default threshold of 85/100. Rejected verdicts nudge the worker with the top findings and retry up to twice before escalating to the operator. Bypass is opt-in via an environment variable, for emergencies and for the audits themselves.
These four layers complement the existing supervisory loops described below.
Smart Resurrect — the stall-recovery cascade
Worker stall recovery was redesigned in W10 (2026-05-16) into a four-tier cascade orchestrated by omega-resurrect.sh and the /resurrect skill:
┌─────────────────────────────────────────────────────────────────────┐
│ Worker stall signal → omega-resurrect.sh │
│ │
│ Tier 1 (0 token) : todos + events + brief → context-aware nudge │
│ Tier 2 (0 token) : pane regex → error-type-specific recovery │
│ (rate_limit | api_error | type_error | │
│ build_fail | oom | cmd_missing) │
│ Tier 3 (opt-in) : claude -p `/resurrect` skill (Haiku, free Max) │
│ OMEGA_SMART_RESURRECT=skill required │
│ Tier 4 (escalate) : Telegram via notify-bot.sh after 3 attempts │
└─────────────────────────────────────────────────────────────────────┘
95% of stalls are handled at Tier 1+2 (zero token). LLM consultation is opt-in at Tier 3. Asymmetry is preserved: workers get direct nudges; oracles use only the softer FYI variant of Tier 1. Every nudge ends with a language-detected escape clause (Si tu es déjà sur cette action, ignore ce message. / If you are already on this, ignore this message.).
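Tier 2 maps captured pane text to one of the six error types without any model call. The sketch below shows the shape of such a classifier; the regular expressions are hypothetical stand-ins, since the patterns used by omega-resurrect.sh are not given in this document.

```python
import re
from typing import Optional

# Hypothetical patterns, checked in order; first match wins.
ERROR_PATTERNS = [
    ("rate_limit", re.compile(r"rate.?limit|\b429\b", re.I)),
    ("oom", re.compile(r"out of memory", re.I)),
    ("cmd_missing", re.compile(r"command not found", re.I)),
    ("type_error", re.compile(r"TypeError|error TS\d+")),
    ("build_fail", re.compile(r"build failed|npm ERR!", re.I)),
    ("api_error", re.compile(r"API error|overloaded", re.I)),
]


def classify_pane(pane_text: str) -> Optional[str]:
    """Return the first matching error type, or None if nothing matches."""
    for name, pattern in ERROR_PATTERNS:
        if pattern.search(pane_text):
            return name
    return None
```

A None result means Tier 2 has nothing specific to offer, and the cascade would fall through to the next tier.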
Tracking Reactor — event-driven supervision
oracle-shadow.sh runs on cron every three minutes; average reaction latency is ~90 seconds. W8 (2026-05-16) added tracking-reactor.sh, an inotify-based reactor watching ~/.omega/state/tracking/*.events.jsonl for sub-second reaction:
┌─────────────────────────────────────────────────────────────────────┐
│ Worker writes event → JSONL close_write → inotify wakes reactor │
│ │ │
│ ▼ │
│ Per-session 2s coalesce + 1s global debounce │
│ │ │
│ ▼ │
│ tmux capture-pane probe vs STUCK_REGEX (Awaiting/STOPPING/…) │
│ │ │
│ ▼ │
│ Match → omega-resurrect.sh <session> (Tier 1→4 cascade) │
│ │
│ + 60s background ticker: scan tracking mtimes; >10min idle AND │
│ session alive → omega-resurrect.sh (idle fallback path) │
└─────────────────────────────────────────────────────────────────────┘
Measured latency (smoke 2026-05-16): 129 ms event → trigger (target <1000 ms). The reactor runs under a systemd --user unit (tracking-reactor.service), is a flock-protected singleton, and shares the 1-attempt/600s throttle ledger with the cron observer so they cannot double-fire. The cron observer remains as the cold-path safety net for sessions the reactor missed.
KAIROS retirement (W9, 2026-05-16)
The legacy kairos.py::_nudge_oracle function previously nudged idle oracles after fifteen minutes based on a simple tmux session_activity timestamp. This broke oracles working on complex UX/design tasks: they appeared "idle" (sitting at the prompt) between deliberate tool calls but were actively planning. W9 disables this nudger entirely with an unconditional return False. The function body remains intact for defense-in-depth (in case any caller still routes through it), but it is now a no-op. The bot restarted cleanly on the change. The replacement is /resurrect + tracking-reactor.sh: brief-aware, worker-scoped, with the asymmetry contract enforced.
Cross-session memory (W5, 2026-05-16)
Workers learn from past missions instead of repeating mistakes. Two-sided wire:
- Recall (read side at dispatch time). knowledge-pack-builder.sh emits a PROJECT MEMORY section in the pre-boot knowledge pack when omega-memory list --project=$PROJECT --limit=5 returns rows. Every dispatched worker boots with the five most-recent lessons/mistakes for its project. Soft-fails when the DB is empty or absent, with no impact on legacy flows.
- Write (audit-time hook). mission-auditor.sh writes after the verdict is computed: APPROVED → kind=lesson, body=[mission_type · audits · score=N] <worker_summary> (capped 500 chars); REJECTED && iter ≥ 2 → kind=mistake with the per-audit name:score/100 findings so the next retry does not repeat the failing pattern.
Storage is SQLite FTS5 at ~/.omega/state/memory.db. Inspect via omega-memory list|search|stats.
Supervisor + daemon mesh
The supervisor is one of three cron loops. Four long-lived processes run alongside them: the systemd-managed tracking reactor and three daemons. Together they form a recovery mesh.
╔══════════════════════════════════════════════════════════════╗
║ Cron */1 min : supervisor (close decisions, alerts, reaper) ║
║ Cron */2 min : event-driven oracle wake on worker done.json ║
║ Cron */3 min : observer (6 categorized failure modes M1-M6) ║
║ Systemd user : tracking-reactor.service (inotify, ~129ms) ║
║ ║
║ Daemon : oracle process death detector ║
║ Daemon : abandoned-oracle reaper (TTL-bound) ║
║ Daemon : worker idle supervisor (no-tool-call timeout)║
╚══════════════════════════════════════════════════════════════╝
The six observer failure modes:
| Code | Symptom | Recovery action |
|---|---|---|
| M1 | Worker .done.json un-acked, siblings still alive | Nudge oracle via tmux send-keys |
| M2 | All workers done, oracle idle > 5 min | Send report or close oracle |
| M3 | Worker failed, oracle has not surfaced an alert | Alert via bot directly |
| M4 | worker-blocked-<session>.json exists | Surface question to operator |
| M5 | Worker has not emitted a tool event for X minutes | /resurrect cascade (was: /team retry) |
| M6 | Oracle TodoWrite has not changed for N observer ticks | FYI digest (asymmetric, never imperative) |
Nudges are throttled (one per 5 min per oracle) to avoid spam.
The incident that triggered the mesh (2026-04-15)
A Linear-resolution worker correctly identified that 25 of 36 tickets were already fixed and in "In Review" state. Instead of deciding the best path and executing, it posted "Three paths — which path?" and waited idle for 10+ minutes. The operator found it by accident.
Root cause: the prior Second Law ("challenge the premise") was being interpreted as "ask before coding". It needed to be "challenge, decide, proceed". The fix became the Third Law: in dispatched sessions, AskUserQuestion is forbidden, idle prompts are forbidden, the only legal stops are .done.json or worker-blocked-<session>.json with the fallback action already executed.
This single incident drove the entire mesh of observer + wake-on-done + the Third Law specification. A wrong decision that produces evidence is 100× more valuable than a correct pause that produces nothing.
6 · Skill-Wired Orchestration (since v2.2)
Until v2.1, the contracts between Omega layers lived in scattered places: an f-string inside _build_oracle_dispatch_prompt, a regex in handlers.py, a heredoc in worker.md, a rules table in mission-auditor.sh. Each was authoritative somewhere but invocable nowhere. Drift was inevitable.
v2.2 introduces a Skill Orchestration Layer: eleven invocable, versioned skills that replace the prose contracts at every junction in the chain.
The 11 skills
| # | Skill | Junction it owns | Replaces |
|---|---|---|---|
| 1 | /classify-intent | Inbound Telegram message classification (and ambiguous mission-type in the auditor) | regex-only handlers.py classifier; misrouted vague messages like "Causio fait ce que tu sais" |
| 2 | /dispatch-oracle | AISB → Oracle dispatch prompt | f-string body inside _build_oracle_dispatch_prompt |
| 3 | /dispatch-worker | Oracle → Worker dispatch prompt | per-oracle bespoke worker prompt assembly |
| 4 | /worker-protocol | Worker self-contract on boot | embedded heredoc in worker.md |
| 5 | /omega-protocol | Oracle process-level contract | scattered rule files |
| 6 | /resurrect | Worker stall recovery (Tier 3 of the cascade) | obsolete kairos.py::_nudge_oracle |
| 7 | /synthesize-report | Done-handoff digest for Telegram | oracles reading raw done.json |
| 8 | /format-telegram-report | Telegram payload humanization | template-only route_notify |
| 9 | /audit-mission | Close-gate forensic verification (Layer 4) | static rules table only |
| 10 | /plan-decompose | Oracle plan decomposition for complex missions | manual decomposition |
| 11 | /diagnose | On-demand diagnostic snapshot of any oracle / worker | manual pane reads + jq |
The skill-wired chain (visual)
GARETH ──Telegram──▶ AISB ──tmux──▶ ORACLE ──tmux──▶ WORKER ──/team──▶ AGENTS
intent: /classify-intent │ │ │
dispatch: /dispatch-oracle │ │ │
/dispatch-worker │ │
/omega-protocol │ │
/plan-decompose │ │
/worker-protocol │
/resurrect (stall) │
/diagnose (snapshot)│
/audit-mission │
(close-gate) │
GARETH ◀──Telegram── AISB ◀──tmux── ORACLE ◀──tmux── WORKER ◀──────/synthesize-report
/format-telegram-report
Wiring matrix
The skills are not just defined — they are wired. The following table shows the number of verified call sites per layer (as of 2026-05-16 grep):
| Layer | File | Skill hits |
|---|---|---|
| AISB handlers / prompts | bot/aisb/handlers.py, bot/aisb/prompts.py | 21 |
| Patrol (Telegram report) | bot/aisb/patrol.sh | 4 |
| Mission Auditor | ~/.aisb/lib/mission-auditor.sh | 4 |
| Oracle dispatch (memory pack) | ~/.aisb/lib/dispatch-to-session.sh | omega-memory (W5) |
Failure mode and toggles
Every skill probe is best-effort. A claude -p timeout, parse error, missing CLI, or non-zero exit silently falls back to the pre-Wave-3 legacy path. Zero behavior regression for any obvious case. All skill-wired edges are toggleable via environment variables for emergency rollback:
| Env var | Default | Effect when changed from default |
|---|---|---|
| SKILL_INTEGRATION_ENABLED | true | AISB handlers/prompts skip skill probes |
| SKILL_REPORT_ENABLED | true | Patrol uses raw template instead of skill digest |
| MISSION_AUDITOR_SKILL_CLASSIFY | true | Auditor stays on regex-only classification |
| OMEGA_USE_RESURRECT | 1 | Shadow worker branch falls back to legacy recovery_apply |
| OMEGA_SMART_RESURRECT | unset | Setting skill enables Tier 3 LLM call |
| SHADOW_LLM | unset | Setting haiku enables Tier 2 disambiguation |
| CLOSEGATE_SKIP_AUDIT | unset | Setting 1 bypasses Mission Auditor entirely |
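The best-effort contract (any probe failure falls back to the legacy path) can be sketched with a subprocess wrapper. This is an illustration, not the AISB handler code: probe_or_fallback is a hypothetical name, the command to run is supplied by the caller, and treating an unset toggle as enabled is an assumption based on the Default column above.

```python
import os
import subprocess
from typing import Callable, Sequence


def probe_or_fallback(command: Sequence[str],
                      legacy: Callable[[], str],
                      timeout_s: float = 10.0,
                      toggle: str = "SKILL_INTEGRATION_ENABLED") -> str:
    """Run a skill probe; on any failure, return the legacy result instead."""
    # Assumption: unset means the default (enabled); "false" or "0" disables.
    if os.environ.get(toggle, "true").strip().lower() in {"false", "0", ""}:
        return legacy()
    try:
        result = subprocess.run(list(command), capture_output=True,
                                text=True, timeout=timeout_s, check=True)
    except (OSError, subprocess.SubprocessError):
        # Covers a missing CLI, a timeout, and a non-zero exit code.
        return legacy()
    output = result.stdout.strip()
    return output or legacy()  # empty output is treated as a failed probe
```

Because the legacy callable is evaluated lazily, its cost is paid only when the probe is disabled or has actually failed.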
Why a skill layer matters
Three reasons.
Versioning. A skill file at ~/.claude/commands/<name>.md has a path, an author, a change history, and can be diffed. An f-string inside a Python function is none of these.
Invocability. A skill can be invoked from any context — a worker, an oracle, a script, an audit, a future oracle reviewing a past mission. An f-string can only be run by re-importing the module that defines it.
Independent testability. A skill can be smoke-tested in isolation. The smoke run for /resurrect (W10, 2026-05-16) created a fake stall, ran the cascade, and asserted that the nudge contained brief context and ended with the escape clause — without touching production. That kind of isolation is impossible for an embedded f-string.
The cost is one extra subprocess (claude -p) per junction. The fallback path means that cost is paid only when the system can afford it.
7 · Security model
Omega is built for an operator who runs the system on their own machine. The security model is therefore:
Protected scopes (the operator may forbid automation entirely)
- Billing endpoints.
- Account-management APIs.
- Authentication / OAuth flows.
- .env* files (any project).
- The OAuth login script.
These are sacred. Workers never touch them, oracles never touch them, the supervisor never touches them. Removing a guard rail requires a manual code edit by the operator.
Defense scan layer
Every incoming prompt (and any text the operator wants to scan ad-hoc) can be passed through a defense scanner:
Category Examples
───────────────── ─────────────────────────────────────────
Prompt injection ignore previous instructions, role hijack,
DAN, jailbreak, mode-switch, prompt-reveal
Secrets stripe keys, AWS access keys, GitHub PAT,
Slack tokens, private keys, GitLab PAT
PII US SSN-like, credit-card-like, phone
Suspicious URLs URL shorteners, IP-as-URL, .onion, free TLDs
Verdicts: clean, warning, block. Critical matches (live Stripe key, .onion URL) block. Optional quarantine appends the verdict to a defense-alerts log.
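A scanner of this shape can be sketched as a rule table plus a worst-severity-wins reduction. The patterns below are illustrative assumptions covering a few of the listed categories; the shipped rule set is not published here. Only the three verdicts and the rule that live Stripe keys and .onion URLs block come from the text; treating the other matches as warnings is an assumption.

```python
import re

# Illustrative rules only: (category, severity, pattern).
RULES = [
    ("secret", "block", re.compile(r"\bsk_live_[0-9A-Za-z]{16,}")),
    ("suspicious_url", "block",
     re.compile(r"https?://[a-z2-7]{16,56}\.onion\b", re.I)),
    ("secret", "warning", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("prompt_injection", "warning",
     re.compile(r"ignore (all )?previous instructions", re.I)),
    ("pii", "warning", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]

SEVERITY_ORDER = {"clean": 0, "warning": 1, "block": 2}


def scan(text: str) -> tuple[str, list[str]]:
    """Return (verdict, matched categories); the worst severity wins."""
    verdict = "clean"
    hits: list[str] = []
    for category, severity, pattern in RULES:
        if pattern.search(text):
            hits.append(category)
            if SEVERITY_ORDER[severity] > SEVERITY_ORDER[verdict]:
                verdict = severity
    return verdict, hits
```

Keeping severity on the rule rather than on the category lets one category (secrets) contain both blocking and warning patterns.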
No destructive autonomy
The system actively refuses certain shortcuts:
- Workers never force-push.
- Oracles never close themselves (only the supervisor closes).
- Auto-rollback on deploy failure is opt-in per project, not default.
- Sacred files (the supervisor, the death detector, the reaper, the idle supervisor) are version-locked — any drift triggers an alert.
Sacred files
Four files at the core of the recovery mesh are sha256-locked. The validation runs on every test sweep, and any drift surfaces immediately. The list and hashes are kept in the operator's local installation, not published, but the integrity contract is part of the install.
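The integrity contract amounts to comparing each file's current sha256 against a locked value. The sketch below assumes the locked hashes are available as a path-to-hex mapping; the function names and that mapping format are hypothetical, since the real list and hashes are not published.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_drift(locked: dict[str, str]) -> list[str]:
    """Return paths whose current hash differs from the locked hash.

    A missing file counts as drift.
    """
    drifted = []
    for path_str, expected in locked.items():
        path = Path(path_str)
        if not path.is_file() or sha256_of(path) != expected:
            drifted.append(path_str)
    return drifted
```

An empty result means all locked files are unchanged; any non-empty result is what the test sweep would surface as an alert.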
8 · Evidence
This section reports what is measurable today. It does not report numbers we do not have. Omega's production telemetry is young, and that fact constrains the evidence base.
Wave-3 shipping log (2026-05-16)
| Artifact | Type | Status | Evidence |
|---|---|---|---|
| /omega-protocol | skill | shipped | ~/.claude/commands/omega-protocol.md |
| /dispatch-oracle | skill | shipped | ~/.claude/commands/dispatch-oracle.md |
| /dispatch-worker | skill | shipped | 342-line canonical specification |
| /worker-protocol | skill | shipped | ~/.claude/commands/worker-protocol.md |
| /resurrect | skill | shipped + smoke PASS | brief-aware nudge with French/EN escape clause |
| /synthesize-report | skill | shipped | hybrid 70% template + 30% Haiku digest |
| /format-telegram-report | skill | shipped | patrol-wired (4 hits) |
| /audit-mission | skill | shipped | mission-auditor close-gate |
| /classify-intent | skill | shipped | hybrid regex + Haiku, mission-auditor W7 wire |
| /plan-decompose | skill | shipped | oracle complex-mission decomposition |
| /diagnose | skill | shipped | on-demand pane snapshot |
| W5: cross-session memory wiring | fix | shipped | omega-memory in dispatch + auditor |
| W6: brief atomic write before paste | fix | shipped | tmpfile + mv ordering in dispatch-to-session.sh:622-625 |
| W7: auditor /classify-intent hybrid | fix | shipped | MISSION_AUDITOR_SKILL_CLASSIFY=true default |
| W8: event-driven tracking-reactor | fix | shipped | systemd user unit, 129 ms latency |
| W9: KAIROS _nudge_oracle retired | fix | shipped | unconditional return False |
| W10: /resurrect smoke PASS | validation | passed | brief-aware French nudge with escape clause |
| W12: omega-overview.md | docs | shipped | single entry point for Omega system |
| AISB skill-wired chain | integration | shipped | 21 grep hits in handlers/prompts |
| Patrol skill-wired chain | integration | shipped | 4 grep hits in patrol.sh |
| Mission-auditor skill-wired chain | integration | shipped | 4 grep hits |
What was measured today (chaos + smoke tests, 2026-05-15 → 2026-05-16)
| Test | Result | What it proves |
|---|---|---|
| Worktree E2E (5 scenarios) | 5/5 | Happy path, conflict, main moved, parallel, ship failure |
| Worktree chaos v1 (18 cases) | 18/18 | Process kills mid-operation, disk-full, race conditions |
| Worktree chaos v2 (8 cases) | 8/8 | Concurrent worktree-create on same project |
| Worktree chaos v3 (9 cases) | 9/9 | Interrupted ship + recovery |
| /goal Phase 1 opt-in smoke | 5/5 | Opt-in injection via GOAL_NATIVE=true works |
| /goal Phase 2 revert smoke | 8/8 | Default-on block is removed; PLAN protocol contracts are in place |
| Worker-mark-done oracle guard | Pass | Refuses oracle session names with rc=3 + redirect |
| PLAN protocol runtime test | 1/1 | End-to-end worker dispatch, plan execution, done.json |
| Sacred files sha256 stability | 4/4 | Patrol, watchdog, reaper, idle-supervisor unchanged |
| Defense scan (5 categories) | 5/5 | clean / injection / secret / URL / PII verdicts correct |
| /resurrect Tier-1 smoke (W10) | Pass | Brief-aware French nudge with escape clause |
| tracking-reactor event-to-trigger (W8) | 129 ms | inotify wake → tmux probe → resurrect call |
What is live in operation right now
| Quantity | Source |
|---|---|
| Outcomes-database mission rows | 2 (small N — system is young) |
| Worker .done.json files on disk (recent) | 5 |
| Tool-call events captured by the tracking hook | 2,571 across 61 session files |
| Cron entries active | 28 (supervisor + observer + flusher + ...) |
| Systemd user units active | 1 (tracking-reactor.service) |
| Sacred files unchanged since | 4–6 days (last verified today) |
| Safety Mesh layers wired | 4 (brief-replay, CPU guard, shadow, mission auditor) |
| Shadow signals monitored (Tier 1) | 14 (incl. CPU_OVERLOAD) |
| Mission Auditor mission types classified | 9 + hybrid /classify-intent escalation |
| Quality Arsenal audits selectable by Mission Auditor | 17 |
| Skills in the Skill Orchestration Layer | 11 |
| Wired skill call sites (handlers / patrol / auditor) | 29 total grep hits |
Honest gaps
- Production mission count is small. The outcomes database has 2 rows. A claim like "10,000 missions executed at 99% success" would be a fabrication. Honest framing: the system is in early operation; chaos tests validate the structural properties (race conditions, recovery, isolation) that production data cannot yet validate at scale.
- Mean time intent → ship. Not yet computed across a statistically meaningful sample. Single observed examples are in the tens of minutes for narrow Linear-style fixes, hours for cross-cutting features. These are operator anecdotes, not telemetry.
- Cost per mission. Token consumption is captured per tool call (the tracking hook) but not yet aggregated into a per-mission cost report. A dashboard for this is planned.
- Incident-avoidance count. The observer fires nudges, but the proportion of nudges that prevented a stall (vs nudges sent into already-recovering sessions) is not yet computed. The new tracking-reactor will help here: every event-driven trigger writes a line to ~/.aisb/logs/tracking-reactor.log with the session and the matched stuck-regex.
- Skill fallback frequency. The skill-wired chain has fallbacks at every junction. How often the fallback fires versus the skill succeeds is logged but not yet aggregated.
Five short case studies (concrete, verifiable today)
Case A — The 4000-character /goal pivot. The native /goal primitive was integrated, evaluated under load, and found to have a hard 4000-character limit incompatible with real worker prompts (mission + pre-boot knowledge pack + criteria + verify + autonomy banner). Phase 2 default-on was reverted within 24 hours; the PLAN protocol was introduced as a replacement. The revert was end-to-end tested the same day with a runtime worker dispatch. Evidence: a smoke test suite of 8 assertions validates that the revert is applied and the PLAN protocol artifacts are in place.
Case B — The worker-mark-done oracle guard. A debug session revealed that an oracle had accidentally called worker-mark-done.sh instead of oracle-mark-done.sh, writing its done-signal to the wrong namespace. A guard was added that refuses oracle session names (regex-matched) with rc=3 and a redirect message. The fix is small (10 lines of bash) but eliminates a class of cross-tier confusion errors. Smoke-tested: oracle session → rejected; worker session → accepted.
Case C — Dogfooding the CPU throttle. On the same day the CPU Guard suite (Safety Mesh Layer 2) shipped, a routine multi-dispatch wave triggered the throttle in production. Two new dispatches arrived while a build was already underway; the one-minute load reported by uptime was above the 2.5× cores admission threshold. Both dispatches were diverted to dispatch-queue.jsonl and exited with DEFERRED. The cron */2 flusher re-dispatched them in order once load dropped below 2× cores. The build that was in progress finished without .next/ corruption, both queued workers completed cleanly, and the operator was notified only via a low-priority FYI. No human action was required. This is the first observation of the throttle protecting production traffic from itself.
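The admission logic in Case C is a two-threshold (hysteresis) gate: defer above 2.5× cores, resume below 2× cores. A minimal sketch follows, assuming a JSON-lines queue file; the function names are hypothetical, and load and core count are passed in so the logic can be exercised without a loaded machine.

```python
import json
import os
from pathlib import Path

ADMIT_FACTOR = 2.5   # defer new dispatches above 2.5x cores (from the text)
RESUME_FACTOR = 2.0  # flush the queue below 2x cores (from the text)


def admit_or_defer(dispatch: dict, queue: Path, load1: float, cores: int) -> str:
    """Admit a dispatch, or append it to the queue when load is too high."""
    if load1 > ADMIT_FACTOR * cores:
        with queue.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(dispatch) + "\n")
        return "DEFERRED"
    return "ADMITTED"


def flush_queue(queue: Path, load1: float, cores: int) -> list[dict]:
    """Return queued dispatches in arrival order once load has dropped."""
    if load1 >= RESUME_FACTOR * cores or not queue.exists():
        return []
    lines = queue.read_text(encoding="utf-8").splitlines()
    queue.unlink()
    return [json.loads(line) for line in lines if line.strip()]


def current_load() -> tuple[float, int]:
    """One-minute load average and core count (Unix only)."""
    return os.getloadavg()[0], os.cpu_count() or 1
```

The gap between the two thresholds keeps the flusher from re-dispatching work the moment load dips just under the admission limit, which would otherwise cause the queue to oscillate.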
Case D — The shadow-design flaw and its autonomous fix. The Shadow Manager described in §5 was originally symmetric — workers and oracles received the same prescriptive nudges. The flaw surfaced when an oracle that was actively managing a UI design task received a stagnation nudge ("999 minutes since the last decisions.md edit, take a concrete action or mark done"). The oracle interpreted the imperative literally and abandoned the in-flight task. Diagnosis took roughly thirty minutes; an emergency kill-switch was installed in another fifteen; a follow-up worker landed the asymmetry contract (THRASH disabled on oracles, stagnation floor raised to six hours, brief-aware question-mode emergency, FYI throttling and deduplication) within a few hours, including the documentation update in oracle-protocol.md. The system corrected its own design flaw within a single working day, with the kill-switch protecting production while the proper fix landed.
Case E — Retiring KAIROS in favor of the resurrect cascade (2026-05-16). The legacy kairos.py::_nudge_oracle had been quietly destroying long-running design and audit oracles for weeks: anything that sat at the prompt for more than fifteen minutes between deliberate tool calls received an imperative nudge and frequently abandoned its in-flight work. Diagnosis pointed at a fundamental mismatch — the function used a coarse tmux session_activity timestamp with no awareness of mission type or brief content. The fix was deliberately conservative: an unconditional return False at the top of the function (W9), validated by a clean bot restart, with the function body left intact for defense-in-depth. The replacement (/resurrect + tracking-reactor.sh, W8 + W10) ships brief-aware, scoped strictly to workers, with the asymmetry contract enforced at every layer. Tier-1 smoke (W10) validated end-to-end on a fake worker with fifteen-minute-old tracking events: the cascade detected the stall, extracted context from the brief (last 25 chars: "build clean"), identified the last tool event (Bash), and generated a context-aware French nudge with the escape clause. Zero tokens spent. This is precisely the failure mode chaos tests cannot anticipate — a behaviorally correct subsystem doing the wrong thing for a different target class — and the kind of failure the next iteration of the audit pipeline is being tuned to catch earlier.
What chaos tests cannot prove
Chaos tests prove that the structural properties hold under hostile conditions. They do not prove that the system makes good engineering decisions. That is the job of the audit pipeline (the Quality Arsenal) and the Second Law (challenge the premise). The audit pipeline catches "shipped working code with bad architecture"; the Second Law catches "shipped working code for a request that should have been refused".
9 · Roadmap
Recently delivered (since v2.1)
- Skill Orchestration Layer (Wave-3). Eleven invocable, versioned skills replace the prose contracts at every junction in the 4-level chain. 29 verified call sites across handlers, patrol, and mission-auditor. Every skill probe is best-effort with silent fallback.
- W5: Cross-session memory wired both sides. knowledge-pack-builder.sh emits a PROJECT MEMORY section at dispatch time; mission-auditor.sh writes lessons (APPROVED) and mistakes (REJECTED iter≥2) after every verdict.
- W6: Brief atomic write before tmux paste. Closes a race window where a crash between paste and verify left the shadow without a brief to replay.
- W7: Mission Auditor hybrid classifier. Ambiguous "generic" cases escalate to /classify-intent via Haiku micro-call. Zero behavioral regression for the 80% fast-path.
- W8: Tracking Reactor. Event-driven (inotify) supervision via systemd --user unit. Measured 129 ms event → trigger latency. Singleton via flock. Shares the 600s throttle ledger with the cron observer so they cannot double-fire.
- W9: KAIROS _nudge_oracle retired. Replaced by /resurrect + tracking-reactor.
- W10: /resurrect smoke PASS. End-to-end validation of the Tier-1 cascade with a fake worker, brief-aware French nudge with escape clause, zero tokens consumed.
- W12: omega-overview.md. Single entry-point document for the Omega system, indexing all docs and skills.
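The shared throttle ledger mentioned under W8 can be sketched as a small JSON file of last-fire timestamps guarded by an exclusive lock. The file layout, the `try_fire` name, and the use of flock on the ledger itself are assumptions for illustration; the text only states that the reactor and the cron observer share a 600s ledger. The sketch is Unix-only because it relies on `fcntl`.

```python
import fcntl
import json
import os
import time
from pathlib import Path
from typing import Optional

THROTTLE_S = 600  # shared window from the text


def try_fire(ledger: Path, session: str, now: Optional[float] = None) -> bool:
    """Return True if the caller may nudge `session`, and record the time."""
    now = time.time() if now is None else now
    lock_path = ledger.with_name(ledger.name + ".lock")
    with lock_path.open("w") as lock:
        # Serialize the reactor and the cron observer on one lock file.
        fcntl.flock(lock, fcntl.LOCK_EX)
        entries = json.loads(ledger.read_text()) if ledger.exists() else {}
        if now - entries.get(session, float("-inf")) < THROTTLE_S:
            return False  # another supervisor fired within the window
        entries[session] = now
        # Update via tmp + rename, never in place.
        tmp = ledger.with_name(ledger.name + ".tmp")
        tmp.write_text(json.dumps(entries))
        os.replace(tmp, ledger)
        return True
    # The lock is released when the lock file is closed.
```

Because the check and the write happen under the same lock, two supervisors racing on the same session cannot both receive True.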
Short-term (active)
- Automate bot restart after handler code changes so progress-card features activate without operator intervention.
- Exercise the PLAN protocol's sub-agent pattern (Agent(team_name=...)) on a real client mission, not just a smoke test.
- Port the 28 cron entries to a native scheduling primitive so they become inspectable and version-controlled from inside a session.
- Aggregate Mission Auditor verdicts into a per-classification accuracy report (do bug-fix audits actually catch bugs that escaped worker self-review?).
- Aggregate the skill-fallback frequency per junction (how often claude -p times out vs returns valid output).
Medium-term
- A live dashboard for mission timelines, cost, and outcome distribution. (Partial: a plan-visualizer exists; the timeline+cost projection is still pending.)
- Dual-run a /loop-based supervisor against the legacy supervisor for 30 days, compare outputs, then switch over when convergence is proven.
- A learning agent that watches accepted vs rejected proposals and feeds the rejection rate back into proposal quality estimates.
Still open (carried forward from v2.1)
- W1 — High availability. The system runs on a single VPS. A second-host failover is designed but not yet deployed.
- W2 — Multi-provider abstraction. All paths currently assume Anthropic Claude as the model provider. A provider-agnostic layer is sketched but not implemented.
- W3 — Telegram fallback channel. When Telegram is unreachable, the operator has no out-of-band notification path. A second channel (email, push, alternate IM) is open.
Open architecture questions
- Workers as sub-agents vs sub-sessions? Current design isolates workers in their own tmux sessions and their own Claude Code instances. Alternative: workers as sub-agents inside the oracle, sharing the oracle's context. Tradeoff: sub-agents save tmux slots and dispatcher overhead but lose context-isolation benefit and complicate the close-gate.
- A richer goal primitive? If the platform raises the 4000-character limit on /goal (or introduces a plan-bound primitive), revisit the Phase 2 default-on revert.
- Cross-project memory? The memory layer is currently scoped per system. Should client projects share a common lessons-learned corpus, or stay isolated?
- Ship pipeline for non-Vercel hosts. The deploy-verify step is currently Vercel-specific via API polling. Generalize to Fly.io, Render, Cloudflare Pages.
- Mission Auditor calibration. The 85/100 score floor is uniform across mission types. Should it vary (e.g., 90 for ship because production risk is higher, 80 for docs because the cost of false positives outweighs the cost of a tolerable doc imperfection)? This requires accuracy data the system does not yet have.
- Shadow observe-only escalation granularity. Today the FYI digest groups all observe-only signals per oracle into a single throttled message. Should specific signal patterns (e.g., repeated BUILD_REGRESSION on the same project) bypass the digest and escalate immediately, even on oracles? Trades responsiveness against the cost that prompted the asymmetry in the first place.
- Skill discoverability. Eleven skills are wired into the orchestration chain; the broader catalog at ~/.claude/commands/ is now ~140 invocable commands (audits, builders, marketing tools, diagnostics). How should new operators discover what is invocable without reading every file? omega-overview.md and /listcmd are first answers; a generated, searchable skill catalogue is the obvious next step.
The judging standard
Every iteration of Omega is evaluated against four questions:
- Did the operator have to babysit?
- Did the system challenge a bad premise before coding it?
- Did runtime evidence drive every conclusion?
- Was the change surgical?
If any answer is "no", the iteration is incomplete — regardless of how much code shipped.
10 · Appendix — Technical reference
Session lifecycle (worker)
Dispatch ──▶ PRE-BOOT PACK injected (incl. W5 memory rows)
│ Brief written atomically BEFORE paste (W6)
▼
Read PLAN ──▶ TodoWrite materialization (N items)
│
▼
Execute step 1 ──▶ update TodoWrite + progress.json
│ Event written to tracking JSONL
│ (tracking-reactor watches via inotify, W8)
▼
Execute step 2
│
⋮
│
▼
Run VERIFY COMMAND (must exit 0)
│
▼
worker-mark-done.sh done_clean '<summary>'
│ (atomic tmp + mv to .done.json)
▼
Mission Auditor (Layer 4) ──▶ /audit-mission, 1-3 skills
│ min score ≥ 85/100 required
▼
Oracle ack (close-gate)
│
▼
Memory write (W5) ──▶ APPROVED → lesson
│ REJECTED iter≥2 → mistake
▼
Schedule self-kill (5s)
│
▼
tmux session terminated
Failure recovery mesh (visual)
┌────────────────────────────────────────────────────────────┐
│ │
│ Supervisor (cron */1) │
│ ├── reads oracle-*.done.json │
│ ├── reads worker-*.done.json │
│ ├── decides close / keep / alert │
│ └── triggers notifications │
│ │
│ Wake-on-worker-done (cron */2) │
│ └── nudges oracle when worker .done.json un-acked │
│ │
│ Observer (cron */3) │
│ └── 6 failure modes M1–M6 │
│ │
│ Tracking Reactor (systemd user, inotify, W8) │
│ └── event-driven sub-second wake for stuck workers │
│ 129 ms event → /resurrect cascade │
│ │
│ Oracle-watchdog daemon │
│ └── detects oracle process death │
│ │
│ Oracle-reaper daemon │
│ └── kills abandoned oracles past TTL │
│ │
│ Worker-idle-supervisor daemon │
│ └── workers with no tool calls past threshold │
│ │
│ RETIRED (W9): kairos.py::_nudge_oracle │
│ └── replaced by /resurrect + tracking-reactor │
│ │
└────────────────────────────────────────────────────────────┘
State files (atomic write contract)
All state files in the system follow the same write pattern:
Write : tmp file in same directory, then mv -f to final
Read : open + lock-free read; staleness via mtime
Update : never in-place; always tmp + mv
Cleanup : grace window before deletion
Naming : namespaced by session for collision safety
W6 (2026-05-16) extends this contract to the brief-replay file specifically: it is now written before the tmux paste, not after, so a crash mid-paste still leaves a valid brief on disk.
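The write half of this contract maps directly onto a temp-file-plus-rename helper. The sketch below is one way to express it in Python; `atomic_write_json` is a hypothetical name, and the fsync call is an added assumption (the contract above only requires tmp + mv).

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(final: Path, payload: dict) -> None:
    """Write JSON so readers see the old file or the new one, never a partial."""
    # The temp file lives in the same directory: rename is only atomic
    # within a single filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(final.parent), prefix=final.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # assumption: durability is also wanted
        os.replace(tmp_name, final)  # same effect as `mv -f` on POSIX
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Readers can then open the final path without a lock: they will get either the previous complete file or the new complete file.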
Done.json schema (worker)
{
"session": "string",
"status": "done_clean | pending | failed",
"summary": "one-line description",
"commit": "git sha or empty",
"finished_at": "ISO 8601",
"todos_total": "int",
"todos_completed": "int",
"pending_actions": ["list of strings"],
"written_by": "string (helper name)"
}
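A consumer of the worker schema might validate a payload before trusting it. The sketch below checks field presence, types, and the status enum from the schema above; the function name, the error strings, and the `todos_completed <= todos_total` check are assumptions added for illustration.

```python
import json
from datetime import datetime

ALLOWED_STATUS = {"done_clean", "pending", "failed"}
REQUIRED = {
    "session": str, "status": str, "summary": str, "commit": str,
    "finished_at": str, "todos_total": int, "todos_completed": int,
    "pending_actions": list, "written_by": str,
}


def validate_worker_done(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be an object"]
    problems = []
    for key, expected in REQUIRED.items():
        if key not in doc:
            problems.append(f"missing field: {key}")
        elif type(doc[key]) is not expected:  # exact type: bool is not int
            problems.append(f"wrong type for {key}")
    if problems:
        return problems
    if doc["status"] not in ALLOWED_STATUS:
        problems.append(f"unknown status: {doc['status']}")
    if doc["todos_completed"] > doc["todos_total"]:  # assumed invariant
        problems.append("todos_completed exceeds todos_total")
    try:
        # Older Python versions do not parse a trailing "Z".
        datetime.fromisoformat(doc["finished_at"].replace("Z", "+00:00"))
    except ValueError:
        problems.append("finished_at is not ISO 8601")
    return problems
```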
Done.json schema (oracle)
{
"oracle": "string",
"project": "string",
"status": "done_clean | pending | failed",
"started_at": "ISO 8601",
"finished_at": "ISO 8601",
"duration_sec":"int",
"mission": "string",
"ship": {
"requested": "bool",
"result": "ok | failed | skipped | frozen",
"commit": "git sha or empty",
"push_url": "string or empty",
"deploy_url": "string or empty",
"deploy_status": "string"
},
"pending_actions": ["list of strings"],
"report_path": "string or empty",
"lifecycle": "persistent | ephemeral"
}
The 17 forensic audits — quick reference
| Audit | Domain | Raw scale | Question |
|---|---|---|---|
| code | Code quality | /420 | Is the code SOLID? |
| flow | User flows | /400 | Does the experience WORK? |
| uiux | Design system | /420 | Is the interface BEAUTIFUL? |
| debug | Runtime bugs | /360 | What is BROKEN right now? |
| feature | Completeness | /320 | Is the product COMPLETE? |
| perf | Performance | /360 | Is it FAST? |
| sec | Security | /400 | Is it SECURE? |
| a11y | Accessibility | /320 | Is it ACCESSIBLE? |
| seo | Search optim. | /400 | Is it DISCOVERABLE? |
| data | Data integrity | /320 | Is the data INTACT? |
| api | API contracts | /360 | Is the API SOLID? |
| copy | Messaging | /280 | Is the copy CLEAR? |
| dx | Dev experience | /320 | Is the DX SMOOTH? |
| motion | Animation | /360 | Is the motion PURPOSEFUL? |
| automation | Scheduling | /330 | Are automations RELIABLE? |
| logic | System logic | /360 | Is the logic OPTIMAL? |
| retention | Product/CPO | /400 | What features are MISSING? (read-only) |
All scores normalize to /100 for comparison across domains.
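The normalization is a simple rescale, and the Mission Auditor floor of 85/100 applies to the normalized value. A short sketch, using three scales from the table above (the function names are hypothetical):

```python
SCALES = {"code": 420, "copy": 280, "automation": 330}  # subset of the table
SCORE_FLOOR = 85  # min score required by the Mission Auditor


def normalize(raw: float, scale: int) -> float:
    """Convert a raw audit score to the common /100 scale."""
    if scale <= 0 or not 0 <= raw <= scale:
        raise ValueError("raw score must lie within [0, scale]")
    return raw * 100 / scale


def passes(audit: str, raw: float) -> bool:
    """True when the normalized score meets the floor."""
    return normalize(raw, SCALES[audit]) >= SCORE_FLOOR
```

For example, a code audit needs 357/420 to reach 85/100, while a copy audit needs 238/280; a raw 237/280 normalizes to roughly 84.6 and falls short.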
The 11 wired skills — quick reference
| Skill | Owner of | Token cost |
|---|---|---|
| /classify-intent | Inbound intent + ambiguous mission-type | ~0 fast path / Haiku slow |
| /dispatch-oracle | AISB → Oracle brief assembly | Sonnet (one call per dispatch) |
| /dispatch-worker | Oracle → Worker prompt assembly | Sonnet (one per worker) |
| /worker-protocol | Worker self-contract | 0 (read on boot) |
| /omega-protocol | Oracle process contract | 0 (read on boot) |
| /resurrect | Tier-3 LLM stall recovery (opt-in) | Haiku |
| /synthesize-report | Worker done.json digest | template + Haiku (~50s budget) |
| /format-telegram-report | Telegram payload humanization | Haiku |
| /audit-mission | Close-gate audit selection / verdict | one audit at a time VPS-wide |
| /plan-decompose | Oracle complex-mission decomposition | Sonnet |
| /diagnose | On-demand pane snapshot | 0 |
A note on extraction
This document is generated through a render-to-PDF pipeline with Unicode font embedding. The text layer is preserved (verified with pdftotext from Poppler 23.x; all body content extracts cleanly to UTF-8). Some PDF readers and third-party extractors handle complex layouts (multi-column, drop caps, box-drawing characters) less robustly than Poppler — if you observe text artifacts, try a Poppler-based extractor or a PDF-to-Markdown converter.
Patch log — V2.3 (2026-05-17)
V2.3 is a hardening release. No new architectural surface — five surgical fixes plus instrumentation, all driven by a single incident.
The Kommu/Causio incident. On 2026-05-17, two long-running project oracles (Kommu, Causio) closed prematurely while their missions were objectively unfinished. The Kommu mission required exhaustive sweeps — all features × 17 audits × 100/100 score — roughly five thousand audit runs. The oracle's internal todo list was sized at 8 items. It finished its 8, wrote done.json with status=pending and pending_actions=[] (false — hundreds of audits remained), then exited. The patrol observed pending with no actionable pending list, the session died naturally, and child workers continued running orphaned.
This is the Second-Law failure mode in its purest form: an oracle hallucinated mission-completeness on an under-dimensioned plan. No external nudge could have saved it — the oracle honestly believed it was done.
The five fixes shipped in V2.3:
- Mission Sweep-Completeness Gate in oracle-mark-done.sh. When the brief contains sweep keywords (all features, exhaustive, récursivement, 100/100, every page, 17 audits), the gate counts completed audit-worker .done.json files. If the count is below a configurable threshold (default 17, one full Quality Arsenal pass), the gate refuses to let the oracle exit cleanly: it forces status=pending and adds an explicit pending action "sweep-incomplete: continue dispatching". The patrol then keeps the oracle alive across cycles and queues a resume_pending event in the oracle inbox. Failure becomes visible.
- Worker Death Logger (worker-death-logger.sh, every two minutes). The system now keeps a rolling snapshot of alive worker sessions and detects workers that vanished without producing a .done.json: silent-kill events that previously left no trace. Each detection is logged to worker-silent-kills.jsonl with session name, parent oracle, dispatch timestamp, and age-since-dispatch. Observability only; no automatic mitigation yet (intentional: collect data first).
- Smart-Check Observer rewrite in omega-resurrect.sh. The previous observer fired nudges on cron-driven idle detection alone, which produced false positives on oracles that were actively thinking. The new observer runs six independent signals before allowing any nudge: kill-switch state, pane-active heuristics (Claude thinking-verb detection), live worker presence on the same project, tracking-event mtime within ten minutes, decisions.md mtime within thirty minutes, and explicit pending_actions content. All six must pass before a nudge is allowed. Recorded result over a 60-minute window after deployment: 100 skips, 0 parasitic nudges.
- AISB async wire stable. The bot-side skill subprocess invocation moved from blocking subprocess.run to asyncio.create_subprocess_exec. Stability over five hours of uptime confirms the deadlock that previously crashed the bot every ~40 minutes is gone.
- Passive Telegram digest (omega-oracle-digest.sh, 20:00 UTC daily). Replaces the legacy auto-nudge loop. One scheduled message per day summarizes all active oracles, worker outcomes, Mission Auditor verdicts, system health, and detected silent kills. The operator decides what to intervene on; the system no longer guesses.
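The all-six-must-pass rule of the Smart-Check Observer can be sketched as a fail-closed conjunction. The signal names mirror the list above, but how each signal is computed, and which direction counts as "pass", is not specified in this document, so the sketch takes the six results as booleans. `nudge_allowed` and `modified_within` are hypothetical helper names.

```python
import time
from pathlib import Path
from typing import Optional

SIGNALS = (
    "kill_switch", "pane_active", "live_workers",
    "tracking_mtime", "decisions_mtime", "pending_actions",
)


def modified_within(path: Path, window_s: float,
                    now: Optional[float] = None) -> bool:
    """True if `path` exists and was modified in the last `window_s` seconds."""
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return now - path.stat().st_mtime <= window_s


def nudge_allowed(results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Allow a nudge only if every signal is present and passed.

    Returns (allowed, blocking signals). A missing signal blocks the
    nudge: when in doubt, the observer skips.
    """
    blocking = [s for s in SIGNALS if not results.get(s, False)]
    return (not blocking, blocking)
```

Returning the list of blocking signals makes each skip explainable in the log, which is how a count such as "100 skips, 0 parasitic nudges" can be audited afterwards.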
E2E layer-alive check. A 15-layer probe confirms after each V2.3 deploy that bot, dispatchers, mark-done, mission auditor, patrol, tracking reactor, death watchers, memory layer, recall path, digest, gate, briefs, and skills are all reachable. Latest run: 15 PASS / 0 FAIL.
Honest limits. The Sweep-Completeness Gate uses heuristic keyword matching on the brief plus an audit-worker file-count threshold. It will produce some false negatives (a sweep mission phrased without trigger words slips through) and possibly some false positives (a non-sweep mission containing the words "all features" by accident). Both directions will be tuned with telemetry over the next two weeks. The Worker Death Logger does not yet capture journalctl exit signals or the last pane snapshot — those land in V2.4. The Kommu/Causio resurrection itself is not yet automated: the patched done.json and queued resume_pending event signal the operator, who then runs /resurrect manually.
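The heuristic nature of the gate is easiest to see in code. The sketch below reproduces the keyword list, the default threshold of 17, and the forced pending action from the V2.3 description; the function names and the `audit-*.done.json` glob pattern are hypothetical, since the real file naming is not given here.

```python
from pathlib import Path

SWEEP_KEYWORDS = (
    "all features", "exhaustive", "récursivement",
    "100/100", "every page", "17 audits",
)
DEFAULT_THRESHOLD = 17  # one full Quality Arsenal pass
MARKER = "sweep-incomplete: continue dispatching"


def is_sweep_brief(brief: str) -> bool:
    """Plain substring match: this is where false positives and negatives arise."""
    lowered = brief.lower()
    return any(keyword in lowered for keyword in SWEEP_KEYWORDS)


def apply_sweep_gate(
    brief: str,
    done_dir: Path,
    requested_status: str,
    pending_actions: list[str],
    threshold: int = DEFAULT_THRESHOLD,
    pattern: str = "audit-*.done.json",  # hypothetical naming scheme
) -> tuple[str, list[str]]:
    """Return the (status, pending_actions) the oracle is allowed to write."""
    actions = list(pending_actions)
    if not is_sweep_brief(brief):
        return requested_status, actions
    completed = len(list(done_dir.glob(pattern)))
    if completed >= threshold:
        return requested_status, actions
    if MARKER not in actions:
        actions.append(MARKER)
    return "pending", actions
```

A brief that says "cover the whole product" would pass straight through (false negative), while "list all features in the README" would trigger the gate on a non-sweep mission (false positive).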
Patch log — V2.4 (2026-05-17, Lifecycle-Hardened Edition)
V2.4 closes a class of failure modes that V2.3 surfaced but could not yet fix automatically: workers and oracles closing while their declared work was unfinished. V2.3 made the failure visible via the Mission Sweep-Completeness Gate and Worker Death Logger. V2.4 prevents the closures themselves with a universal kill gate that every kill path in the system must traverse.
The forensic findings (audit pass, 2026-05-17). A targeted survey of every cron-driven, bot-driven, and operator-driven kill site found four classes of bypass:
- Twelve tmux kill-session call sites in bash scripts that did not route through close-gate.sh. Some checked alive state alone (worker idle ≠ todos done). Some wrote synthetic done.json files and then killed the session, a false attestation.
- An inverted-logic regression in worker-close-check.sh that made the gate always block, forcing patrol scripts to bypass it entirely to ever kill anything. The bypass became the de facto behavior; the gate became dead code.
- Three Python kill sites in the Telegram bot triggered by operator "Close oracle + workers" buttons. These bypassed the gate with no audit trail.
- An anti-pattern in twenty per-project oracle system prompts instructing the LLM to consider kill+restart of a stalled worker, in direct contradiction of the close-gate guarantee.
The six fixes shipped in V2.4:
- Safe-Kill universal gate (~/.aisb/lib/omega-v2/safe-kill.sh). One wrapper every kill path must traverse. Refuses to kill sessions whose progress.json shows pending todos, whose done.json is missing, or that have not been acked. Protected sessions (Home / AISB / tunnels / Omega infrastructure) are immortal even with --force. The --force flag exists for legitimate emergencies (claude crash with state preserved on disk, operator explicit abort) and is always audited: every forced kill writes a marker plus a kill.forced lifecycle event with reason and caller.
- Lifecycle event log (~/.aisb/state/events/lifecycle.events.log). A unified append-only JSON-lines log of every dispatch, heartbeat, todo update, mark-done, ack, block declaration, and kill decision. Replaces forensic archaeology across ten scattered state files when debugging "what happened to session X?".
- Orchestration-aware OBSERVER (oracle-shadow.sh, oracle-observer.sh). The previous STAGNATION signal fired on time alone: an oracle that had been idle past a 12-hour floor was nudged regardless of whether it was correctly waiting for in-flight workers. The new classifier reads workers.txt, cross-checks each worker's tmux state, progress.json mtime, and recent lifecycle events, and classifies the oracle as one of five states: WAITING_FOR_WORKERS, PENDING_TRIAGE, IDLE_AFTER_BATCH, GENUINELY_STUCK, NO_WORKERS_EVER. Only the last two qualify as emergencies. The M5 rule for "stuck worker" detection was tightened from one signal (tracking events) to three concordant signals (tracking + lifecycle + heartbeat snapshot); a worker running a slow build no longer trips it.
- Inverted-logic fix in the close gate (worker-close-check.sh). The branch that was supposed to BLOCK when the worker is still working fired instead when the worker was idle, the regression that made the gate dead code. Captured with an explicit exit-code branch test (AC_RC -ne 0) instead of the silent shell short-circuit that hid the inversion. Without this fix, every other layer above was running uphill.
- Bot Python kill paths gated (bot/aisb/handlers.py). The three sites where operator-clicked "Close oracle + workers" buttons killed sessions now route through safe-kill.sh --force. Behavior is unchanged for the operator (their click still closes), but every close is now logged in kill-forced-<session>.json markers plus the lifecycle event log. Operators retain the audit trail of what they killed and when.
- Per-project oracle prompts re-aligned. Twenty per-project oracle system prompts contained a kill+restart recommendation for stalled workers, in direct contradiction of the close-gate guarantee. Replaced with: investigate first (tmux capture-pane); if truly stuck and the close-gate allows it, use safe-kill.sh; never kill workers with unfinished progress.
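The decision order inside the Safe-Kill gate can be sketched as a pure function. The refusal conditions and the immortality of protected sessions come from the description above; the session-name prefixes, the `SessionState` shape, and the rule that a forced kill must carry a reason are assumptions for illustration, not the contents of safe-kill.sh.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical name prefixes standing in for Home / AISB / tunnels / Omega.
PROTECTED_PREFIXES = ("home", "aisb", "tunnel", "omega-")


@dataclass
class SessionState:
    name: str
    pending_todos: int
    has_done_json: bool
    acked: bool


def safe_kill_decision(
    state: SessionState, force: bool = False, reason: Optional[str] = None
) -> tuple[bool, str]:
    """Return (allowed, audit message) for a requested kill."""
    # Protected sessions are checked first so --force cannot reach them.
    if state.name.lower().startswith(PROTECTED_PREFIXES):
        return False, "protected session"
    if force:
        if not reason:  # assumption: an unaudited forced kill is refused
            return False, "forced kill requires a reason"
        return True, f"kill.forced: {reason}"
    if state.pending_todos > 0:
        return False, "pending todos"
    if not state.has_done_json:
        return False, "done.json missing"
    if not state.acked:
        return False, "not acked"
    return True, "clean close"
```

Putting the protected-session check ahead of the force branch is what makes those sessions immortal: no later branch can override it.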
The lifecycle skeleton on dispatch. Every new worker spawned via dispatch-to-session.sh now receives an initialized todo.json + progress.json + mission.json + heartbeat file before reading its first prompt. Workers refine their todos via ~/.aisb/lib/omega-v2/omega-todo.sh declare ... and acknowledge each completion via omega-todo.sh done <id>. The close-gate has authoritative input — no more synthesizing done_clean from inferred state.
Silent-hang detection without killing. The new heartbeat-watch.sh cron (every minute) cross-references three signals — progress.json mtime, lifecycle event count, tmux pane content hash — and emits a block.declared event when a worker has stopped contributing despite holding pending todos. The worker is never killed; the operator is notified via the parent oracle's inbox and the daily Telegram digest. Distinguishing frozen mid-work from doing slow work replaces the old timer-only assumption.
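The three-signal cross-reference can be sketched as a comparison between two consecutive snapshots. The three signals are the ones named above; the `Snapshot` shape, the function names, and the choice to compare exactly two snapshots (rather than some longer window) are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    progress_mtime: float        # mtime of progress.json
    lifecycle_event_count: int   # events logged for this worker so far
    pane_hash: str               # hash of the visible tmux pane content


def hash_pane(pane_text: str) -> str:
    """Stable fingerprint of pane content."""
    return hashlib.sha256(pane_text.encode("utf-8")).hexdigest()


def should_declare_block(prev: Snapshot, curr: Snapshot, pending_todos: int) -> bool:
    """True only when work remains and all three signals are frozen."""
    if pending_todos <= 0:
        return False  # nothing left to do: idle is expected
    return (
        curr.progress_mtime == prev.progress_mtime
        and curr.lifecycle_event_count == prev.lifecycle_event_count
        and curr.pane_hash == prev.pane_hash
    )
```

A slow build keeps printing to the pane, so its pane hash changes between snapshots and the worker is classified as doing slow work rather than frozen mid-work.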
Kill-path coverage map. After V2.4, every kill site that could affect a worker or oracle routes through the gate or is provably benign (self-managed scratch sessions; explicit operator abort). Coverage at publication:
| Path | Gate | Audit |
|---|---|---|
| patrol.sh (7 sites) | safe-kill | lifecycle event |
| reclaim-stale.sh | safe-kill | lifecycle event |
| oracle-watchdog.sh respawn cycle | safe-kill --force | kill-forced marker |
| bot/handlers.py operator "close" | safe-kill --force | kill-forced marker |
| bot/handlers.py "close_oracle" | safe-kill --force | kill-forced marker |
| oracle-shadow.sh STAGNATION | orchestration_state | (observe-only) |
| oracle-observer.sh M5 worker stuck | 3-signal classifier | (observe-only) |
| heartbeat-watch.sh | no kill | emits block.declared |
Honest gaps that V2.4 did not yet close (partly addressed in V2.5).
- The Mission Sweep-Completeness Gate from V2.3 still relies on keyword heuristics. False negatives on missions phrased without trigger words remain possible — unchanged in V2.5.
- Workers spawned before V2.4 do not have lifecycle state; they fall back to the (now correctly working) legacy gate, which is permissive when no todo.json exists. Coverage grows as new dispatches arrive.
- Inconsistent-state workers from pre-V2.4: addressed in V2.5 by the deadman switch (auto-mark-pending after 10 min idle) and the audit-mode bypass.
Patch log — V2.5 (2026-05-18, Adversarial-Validated Edition)
V2.5 is not a patch release in the cosmetic sense. It is the measured-behavior publication that V2.4 explicitly deferred. A 10-hour adversarial validation between 2026-05-17 20:10 UTC and 2026-05-18 06:33 UTC ran three Quality Arsenal audits in three iterative cycles, found four production bugs in the V2.4 lifecycle layer, fixed them in place, and re-audited to convergence. The full results are in § 11 · Adversarial validation results (V2.5).
The four V2.5 fixes, summarized:
- FIX-CONSUMED: worker-close-check.sh now accepts done.json.consumed (the post-patrol-ack form of done.json). Pre-V2.5, the close-check looked only for done.json and emitted exit 3 BLOCK once the file had been consumed, producing zombie sessions for legitimately completed workers. Cascade-cleaned four pre-existing zombies on deployment.
- FIX-AUDITMODE: Quality Arsenal audit workers (/codeaudit, /secaudit, /debugaudit, etc.) track todos internally via Claude Code's native TodoWrite and don't call omega-todo declare. The pre-V2.5 declared=false guard, working as designed, stalled them as a side effect. V2.5 distinguishes audit-mode workers (canonical done.json + skill marker) from anonymous undeclared workers: audit close allowed, attacker resistance preserved. Verified by regression tests on twelve adversarial probes.
- FIX-DEADMAN: patrol.sh now runs a deadman switch each cycle. Any worker with progress.json mtime older than 10 minutes, idle at the ❯ prompt, with no done.json on disk, gets an auto-written status=pending plus reason "deadman: idle Nmin without mark-done" and a safe-kill via the gate. Belt-and-suspenders on top of the audit-mode bypass.
- FIX-MODAL: omega-v2/modal-dismisser.sh (new) auto-dismisses six known Claude Code blocking modals (How is Claude doing?, Auto-update available, Press Esc to skip, etc.) by sending the safe dismiss key (Escape by default). Invoked early in each patrol cycle. Protected sessions (Home*, AISB-master, Tunnel*) are never touched. Every dismissal writes a JSONL audit entry. Live-tested with a fake modal session: detection, dispatch, and log all confirmed.
Why this matters. The four bugs above were not theoretical. Three of them were caught by the system's own audits during the validation; the fourth (the one fixed by modal-dismisser) was caught by the oracle's own failure to mark its mission complete and is now permanently fenced off. The system audited itself, fixed itself, and re-audited itself over three iterative cycles: a self-healing pipeline working as designed.
11 · Adversarial validation results (V2.5)
The measured behavior promised in V2.4. Between 2026-05-17 20:10 and 2026-05-18 06:33 UTC, a 10-hour adversarial validation ran on the live system. An oracle (oracle-omega) dispatched three Quality Arsenal audits (/codeaudit, /debugaudit, /secaudit) followed by twelve adversarial bypass probes against the close-gate, run across three iterative cycles. Each cycle ran the audits in parallel, harvested every failure, dispatched fix workers, and re-audited. Every bug found in the live system was patched before the next iteration began.
Final scores (iter 3).
| Audit | Score | Verdict | Confidence |
|---|---|---|---|
| /codeaudit | 100 / 100 | PASS | high |
| /debugaudit | 100 / 100 | PASS | high |
| /secaudit | 100 / 100 | PASS | high |
Adversarial probes. Twelve bypass attempts against the close-gate exit semantics, including BOM injection in done.json, integer-typed status fields, null-byte payloads, leading-zero todos counters, fractional-epoch mtime, and symlink redirection, plus manual belt unit-tests of every probe. All twelve correctly STALL with the expected exit code. Zero new fail-OPEN paths detected.
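To make a few of these probe classes concrete, here is a rough sketch of input checks that would reject them. It is not the close-gate's actual code: the function name validate_done_file, the return code 1 standing in for STALL, and the status pattern are assumptions, and only four of the listed probe classes are covered.

```shell
#!/usr/bin/env bash
# Hypothetical hardening checks for a done.json candidate. Returns 0 when the
# file passes and 1 (an assumed stand-in for STALL) otherwise.
validate_done_file() {
  local f="$1" first3 raw_size clean_size

  # Symlink redirection: accept only a plain regular file.
  if [ -L "$f" ] || [ ! -f "$f" ]; then return 1; fi

  # BOM injection: a UTF-8 byte-order mark is the three bytes EF BB BF.
  first3="$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')"
  if [ "$first3" = "efbbbf" ]; then return 1; fi

  # Null-byte payloads: stripping NUL bytes must not change the size.
  raw_size="$(wc -c < "$f")"
  clean_size="$(tr -d '\000' < "$f" | wc -c)"
  if [ "$raw_size" -ne "$clean_size" ]; then return 1; fi

  # Integer-typed status: the value must be a quoted string (pattern assumed).
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"[a-z_]+"' "$f" || return 1
}
```

Each check fails closed: anything unexpected returns non-zero, so a malformed file can never be read as a completion signal.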
Production bugs found and fixed during validation (the V2.5 deltas).
FIX-CONSUMED (worker-close-check.sh). Patrol auto-renames worker-<S>.done.json to done.json.consumed after notifying the parent oracle. The pre-V2.5 close-check looked only for done.json and emitted exit 3 BLOCK once the file had been consumed — producing zombie sessions for legitimately-completed workers. V2.5 accepts either form and prefers the active file when both exist.
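The "accept either form, prefer the active file" rule is small enough to sketch directly. The helper name resolve_done_file, its directory argument, and the assumption that the consumed file is named worker-<S>.done.json.consumed are illustrative; the return code 3 mirrors the BLOCK exit mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate a worker's completion file in either form.
# Prints the path and returns 0 when found; returns 3 (BLOCK) when neither
# form exists. The consumed file name is an assumption.
resolve_done_file() {
  local dir="$1" session="$2"
  local active="$dir/worker-$session.done.json"
  local consumed="$active.consumed"

  if [ -f "$active" ]; then
    printf '%s\n' "$active"      # the active form wins when both exist
  elif [ -f "$consumed" ]; then
    printf '%s\n' "$consumed"    # the post-patrol-ack form is also accepted
  else
    return 3
  fi
}
```

Preferring the active file matters when a worker re-marks itself done after an earlier file was consumed: the fresher signal is the one the gate should read.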
FIX-AUDITMODE (worker-mark-done.sh + close-gate.sh + worker-close-check.sh). Audit-skill workers (/codeaudit, /secaudit, etc.) track their todos internally via Claude Code's native TodoWrite tool and never call omega-todo declare. The pre-V2.5 declared=false guard — designed to refuse attacker-spoofed done signals — correctly stalled audit-mode workers as a side effect. V2.5 distinguishes audit-mode workers (canonical done.json + skill marker) from anonymous undeclared workers, allowing audit close while preserving attacker resistance. Regression tests confirm the spoofed paths still STALL.
FIX-DEADMAN (patrol.sh). A worker that goes idle at ❯ prompt without calling worker-mark-done.sh was previously immortal — no timer wrote a done.json on its behalf. V2.5 adds a deadman switch: if a worker has progress.json mtime older than 10 minutes, is idle (no esc to interrupt in pane), and has no done.json, patrol auto-writes status=pending + reason "deadman: idle Nmin without mark-done" and safe-kills via the gate. Belt-and-suspenders on top of the audit-mode bypass.
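The three deadman conditions can be expressed as one predicate. This is a sketch under assumptions: the function name deadman_check and its argument list are hypothetical, and the pane text is passed in as a string rather than captured from tmux so the logic stays testable.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the deadman decision.
#   $1 progress_age_s  seconds since progress.json was last modified
#   $2 pane_text       current tmux pane content
#   $3 done_file       path where done.json would live
# Prints the reason string and returns 0 when the switch should fire.
deadman_check() {
  local progress_age_s="$1" pane_text="$2" done_file="$3"
  local limit_s=600   # 10 minutes, per the text

  # Progress was recorded recently: leave the worker alone.
  [ "$progress_age_s" -gt "$limit_s" ] || return 1

  # "esc to interrupt" in the pane means the worker is still running.
  case "$pane_text" in
    *"esc to interrupt"*) return 1 ;;
  esac

  # A done file already exists: nothing to write on the worker's behalf.
  [ ! -e "$done_file" ] || return 1

  printf 'deadman: idle %dmin without mark-done\n' $(( progress_age_s / 60 ))
}
```

All three conditions must hold together. A stale progress.json alone is not enough, because a worker in a long-running step still shows the interrupt hint in its pane.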
FIX-MODAL (patrol.sh + new modal-dismisser.sh). Claude Code occasionally pops a "How is Claude doing?" feedback modal AFTER the assistant message finalizes. On 2026-05-18, an oracle whose mission was objectively complete (done_clean stated, 3 audits PASS) sat blocked for ~6h because the modal trapped the terminal before oracle-mark-mission-complete.sh could run. V2.5 introduces omega-v2/modal-dismisser.sh: a per-cycle scan that detects six known blocking-modal patterns (How is Claude doing?, Auto-update available, Press Esc to skip, etc.) and sends the safe dismiss key (Escape by default). Protected sessions (Home*, AISB-master, Tunnel*) are never touched. Every dismissal writes a JSONL audit entry.
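A rough sketch of the detect-then-dismiss decision follows. The function names and the JSONL field names are assumptions, and only the three modal patterns named in the text are listed (the real scan covers six). Sending the key is left to the caller so the decision logic can be exercised without a live tmux session.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the modal scan. Function and field names are assumed.

# Protected sessions are never touched.
is_protected_session() {
  case "$1" in
    Home*|AISB-master|Tunnel*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the first known modal pattern found in the pane text.
detect_modal() {
  local pane_text="$1" pattern
  for pattern in "How is Claude doing?" "Auto-update available" "Press Esc to skip"; do
    case "$pane_text" in
      *"$pattern"*) printf '%s\n' "$pattern"; return 0 ;;
    esac
  done
  return 1
}

# Emits one JSONL audit line when a dismissal should be sent.
# The caller would then run: tmux send-keys -t "$session" Escape
plan_dismissal() {
  local session="$1" pane_text="$2" modal
  is_protected_session "$session" && return 1
  modal="$(detect_modal "$pane_text")" || return 1
  printf '{"session":"%s","modal":"%s","key":"Escape"}\n' "$session" "$modal"
}
```

Matching on fixed strings rather than regular expressions keeps the scan conservative: a pane that merely resembles a modal is left alone.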
Self-healing chain. After each fix landed, the next iteration's audits re-ran and re-scored. The V2.5 deltas were not theoretical — they were produced by the validation itself, in response to live failures, and verified by the same audits that found them. The system audited itself, fixed itself, and re-audited itself, three times.
Residuals. Five inconsistent-state legacy workers remain on the live system from pre-V2.4 dispatches. They are now annotated by omega-status as INCONSISTENT and will be reaped by the deadman switch on their next idle window. Zero new zombies have appeared since 2026-05-18 06:33.
E2E layer-alive check (V2.5). Same 18-probe surface as V2.4 plus three new probes: modal-dismisser.sh executable, audit-mode coherence check, deadman switch dry-run. Latest run before publication: 21 PASS / 0 FAIL.
Patch log — V2.5.1 (2026-05-22, Patterns Edition)
V2.5.1 is the operationalization release. V2.5 hardened the lifecycle. V2.5.1 promotes five outside-world patterns into the live system and exposes them through a single operator-facing diagnostic. The patterns are not theoretical: each one ships with a script, an integration point, an audit-log destination, and (where appropriate) a kill-switch. The full integration map lives at ~/.aisb/docs/PATTERNS-INTEGRATION.md (v3.2).
The five operational patterns now wired into Omega:
- Source-as-Context (mount-package-source.sh). Workers used to hallucinate third-party APIs from training data; the First Law had no leverage at write time, only at runtime. The fix mounts the real source code of each dependency under ~/.aisb/refs/repos/<pkg>/ as a shallow git clone, and dispatch-to-session.sh calls mount-package-source.sh --inject <project> after mission-init.sh so every worker boots with REF_SOURCES pointing at real code. The worker self-contract now requires grep $REF_SOURCES before any third-party API call. Twenty-six packages mapped, seven live on the VPS, ~150 MB total. Monthly cron prunes mounts unused for thirty days.
- Cleanup-Wave (cleanup-wave.sh). Quality Arsenal audits used to find duplications after commit, paying refactor cost post-facto. The cleanup wave inserts between fix and re-audit in the DAG (oracle-prompt.sh rule R-26), with a tight scope contract (no renames, diff cap, behavior-preserving). Halt reasons land in grep-loop-<session>.halt.json for postmortem. Manual invocation for now; auto-injection is on the v3.3 backlog.
- Bounded Grep-Loop (grep-loop.sh). Audit Step 8b (fix-and-reaudit) previously had no exit guarantee: workers could spin on subjective LLM judgement. The grep-loop wrapper enforces an objective verify command (exit code, not "does it look good now?"), a max-iteration budget, and a scope-creep gate (cumulative diff threshold). Forbidden in the contract: LLM calls as gate. Exit codes: 0 verify_clean · 2 max_iter · 3 scope_creep · 4 worker_died.
- Effect-TS Schedule retries (schedule.sh). Three declarative policies in bash, mirroring the Effect-TS shapes: exponential backoff with jitter, fixed-interval polling, one-shot with timeout. Wired into telegram.sh send_message (no more silent drop on 429/5xx), ship/push.sh (transient GitHub failures), ship/deploy.sh (timeout 600s), ship/verify.sh (poll deploy URL until 200, every 15s for up to 10 min). Every attempt is logged to ~/.aisb/logs/schedule.jsonl with policy name, attempt number, return code, duration, and halt reason.
- Ship Pipeline as capability blocks (ship/{build,commit,push,deploy,verify,orchestrate}.sh). The old oracle-ship.sh was a 559-line monolith mixing build, commit, push, deploy, verify, freeze, rollback, telegram and state. V2.5.1 decomposes it into six blocks. Each block takes explicit --flag args, emits exactly one JSON object on stdout, and uses exit codes 0 ok / 1 failed / 2 halt. Opt-in via OMEGA_SHIP_V32=1 or .orchestrator/ship-config.json: {"use_v32_pipeline": true}. Default OFF; v3.3 will do the cutover once live confidence accumulates. Purely additive in the meantime.
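The Bounded Grep-Loop contract described in the list above can be sketched as a small loop. The function name grep_loop, its positional arguments, and the choice to pass commands as strings run through sh -c are assumptions for illustration; the four return codes follow the text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bounded fix-and-verify loop.
#   $1 max_iter    iteration budget
#   $2 diff_cap    cumulative changed-line threshold
#   $3 verify_cmd  objective check; its exit code is the only gate
#   $4 fix_cmd     one fix attempt; non-zero is treated as "worker died"
#   $5 diff_cmd    prints the cumulative changed-line count so far
# Return codes: 0 verify_clean, 2 max_iter, 3 scope_creep, 4 worker_died.
grep_loop() {
  local max_iter="$1" diff_cap="$2" verify_cmd="$3" fix_cmd="$4" diff_cmd="$5"
  local i=0 changed

  while [ "$i" -lt "$max_iter" ]; do
    # Objective gate first: stop as soon as the verify command passes.
    if sh -c "$verify_cmd" >/dev/null 2>&1; then return 0; fi

    sh -c "$fix_cmd" >/dev/null 2>&1 || return 4
    changed="$(sh -c "$diff_cmd")" || return 4

    # Scope-creep gate: the cumulative diff must stay under the cap.
    [ "$changed" -le "$diff_cap" ] || return 3
    i=$(( i + 1 ))
  done

  # Budget spent: one last objective check before giving up.
  if sh -c "$verify_cmd" >/dev/null 2>&1; then return 0; fi
  return 2
}
```

Every exit path is decided by an exit code or an integer comparison, so the loop terminates after at most max_iter fix attempts regardless of what the worker reports about its own progress. In a git checkout, one possible diff_cmd is git diff --numstat piped through awk to sum the added and deleted columns.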
Pattern Hermes — omega-doctor 13-check. A single diagnostic an operator can run to verify the system is healthy. Thirteen check groups: filesystem layout, core scripts, v3.2 helpers, tmux daemon, cron entries, Claude CLI auth, Telegram bot, disk usage, state hygiene (stale resurrect attempts, stale .consumed), Convex (optional), network reachability (Telegram + Anthropic API), recent events (bot.log errors + schedule.jsonl), and mounted source references. Four modes: --quick (skip network probes), --full, --fix (auto-repair safe issues — mkdir, chmod +x, prune stale), --json (machine-readable). Runs at the end of every bash setup; daily 03:00 UTC cron is on the v3.3 backlog. Invocation: omega doctor (CLI) or ~/.aisb/lib/omega-doctor.sh --quick (direct).
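One check with an optional safe repair might look like the sketch below. It is not taken from omega-doctor.sh: the function name check_executable, the "check" and "fix" mode strings, and the PASS / FIXED / FAIL output format are assumptions, and only the chmod +x repair named above is shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one doctor check with an optional safe repair.
#   $1 path  script that must exist and be executable
#   $2 mode  "check" (report only, default) or "fix" (repair with chmod +x)
# Prints PASS, FIXED, or FAIL and returns non-zero only on FAIL.
check_executable() {
  local path="$1" mode="${2:-check}"

  if [ ! -f "$path" ]; then echo "FAIL missing $path"; return 1; fi
  if [ -x "$path" ]; then echo "PASS $path"; return 0; fi

  # Repair only when asked, and only with a reversible, low-risk action.
  if [ "$mode" = "fix" ] && chmod +x "$path"; then
    echo "FIXED $path"
    return 0
  fi

  echo "FAIL not executable $path"
  return 1
}
```

Keeping the repair behind an explicit mode means a plain diagnostic run never mutates the system, which is what makes it safe to run at the end of every setup.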
Where each pattern fires automatically today:
| Trigger | Pattern(s) applied |
|---|---|
| Every worker dispatch | #1 Source-as-Context (--inject) |
| Every Telegram message | #4 Schedule retries (telegram.sh) |
| Every ship pipeline (when opt-in) | #5 Capability blocks + #4 internal retries |
| Audit Step 8b (recommended) | #3 Grep-Loop with objective verify |
| Pre-audit cleanup (recommended) | #2 Cleanup-Wave (wrapped by #3) |
| Monthly cron | mount-package-source.sh --prune-stale |
| End of bash setup + on demand | omega-doctor.sh |
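Of the retry policies behind the Schedule retries row, exponential backoff with jitter is the least obvious, so a sketch may help. The function name retry_exponential and its arguments are assumptions, and this version logs one plain line per attempt to stderr instead of the JSONL file the real system writes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an exponential-backoff-with-jitter policy.
#   $1 max_attempts  total tries before giving up
#   $2 base_delay_s  delay before the second try; doubles after each failure
#   $3...            the command to run
# Returns 0 on the first success, else the last failing return code.
retry_exponential() {
  local max_attempts="$1" base_delay_s="$2"
  shift 2
  local attempt=1 delay="$base_delay_s" jitter rc

  while :; do
    rc=0
    "$@" || rc=$?
    printf 'attempt=%d rc=%d\n' "$attempt" "$rc" >&2

    [ "$rc" -eq 0 ] && return 0
    [ "$attempt" -ge "$max_attempts" ] && return "$rc"

    # Jitter: a random whole number of seconds in [0, delay].
    jitter="$(awk -v max="$delay" 'BEGIN { srand(); printf "%d", rand() * (max + 1) }')"
    sleep $(( delay + jitter ))

    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

A call such as retry_exponential 5 2 curl -fsS <url> would try up to five times, waiting roughly 2, 4, 8, then 16 seconds plus jitter between tries. The jitter spreads out retries from several callers that failed at the same moment, so they do not all hit a rate-limited endpoint again in lockstep.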
Status table (V2.5.1).
| Pattern | State |
|---|---|
| #1 Source-as-Context | Shipped + auto-invoked at dispatch |
| #4 Schedule retries | Shipped + auto-invoked on Telegram + ship |
| Hermes / omega-doctor | Shipped + invocable via omega doctor and setup |
| #2 Cleanup-Wave | Shipped + documented for manual invocation |
| #3 Bounded Grep-Loop | Shipped + documented for manual invocation |
| #5 Ship blocks | Shipped opt-in (default OFF until v3.3 cutover) |
All integrations are purely additive — no existing script changes its default behavior in V2.5.1.
Case F — Real-world product mission, four iterations (Agentik Academy rebrand, 2026-05-22). Between 2026-05-22 07:00 and 09:00 UTC, a customer-facing product mission — "rebrand Kommu Master Class to Agentik Academy, EUR 2 997 pricing across the master-class purchase funnel" — ran through the production audit chain (/uiuxaudit, /flowaudit, /secaudit, /debugaudit, /apiaudit, /codeaudit). Four iterations were needed: iter-1 surfaced seven adversarial findings, iter-2 closed all seven and exposed a production-deploy gap (rebranded code had not yet shipped to live), iter-3 verified the Stripe webhook's durable grant persistence with graceful side-effect degradation, iter-4 closed an orphaned CommunityPreview component that had zero render sites. Every iteration's evidence is in .audit/rebrand-academy-master-class/*.json. This is the first multi-iteration audit chain on a live customer-facing product change — complementary to the V2.5 in-house adversarial validation, which audited Omega against itself. The mission completed without operator intervention beyond approving the deploy.
E2E layer-alive check (V2.5.1). Same 21-probe surface as V2.5, plus six new probes (one per pattern + omega-doctor.sh executable). Latest run before publication: 27 PASS / 0 FAIL.
End of document — version 2.5.1 · Patterns Edition · 2026-05-22