Agentik Coding Workflow V2.5.1 | Patterns Edition

Omega — Autonomous Engineering Operations

A whitepaper on multi-agent orchestration with verifiable autonomy

Version 2.5.1 · Patterns Edition · 2026-05-22

Executive summary

Omega is a multi-agent operating system for software engineering work. It turns a single human intent — "fix this bug", "ship this feature", "audit this codebase" — into a chain of planned, executed, audited, and deployed work, without continuous human supervision.

The system is organized as four orchestration levels: the human operator, a routing bot, project oracles, and short-lived worker sessions. Each level has one job and one exit condition. Completion is signaled by an atomic file (.done.json) and acknowledged by three independent layers (worker, oracle, supervisor) before a session is closed.

What makes Omega different from other agent frameworks is its operational discipline:

Three Laws that override every prompt: runtime truth over code intent, researcher posture over sycophancy, autonomous decision over idle waiting.
A 12-step ship pipeline with deploy verification, freeze-don't-rollback default, and per-project locks.
A 17-audit Quality Arsenal covering code, runtime, design, performance, security, accessibility, SEO, data, API, copy, DX, motion, automation, logic, and product retention. Each audit uses Gestalt clarity gating + Popper falsification + hinge-point 10× scrutiny.
A supervision mesh of cron-driven patrols, event-driven reactors, and daemons that detect categorized failure modes and recover stalled sessions.
A Skill Orchestration Layer (new in v2.2). Every junction in the 4-level chain is now backed by an invocable, versioned skill instead of an ad-hoc f-string or regex. Eleven skills replace the prose contracts that previously lived inside Python handlers and bash heredocs.

Version 2.2 is the Skill-Wired Edition. It documents the eleven skills shipped on 2026-05-16, the seven weakness fixes that landed alongside them, and the new asymmetric supervision mesh that replaced the legacy KAIROS nudger.

The honest gaps remain: Omega's production telemetry is young (the live system has been running for weeks, not years), and the published metrics are bounded by that fact.

1 · The problem — Why autonomous agents fail

The promise of autonomous coding agents — "describe what you want, get working software back" — has been pitched many times. In practice, four failure modes recur:

Loss of context. An agent solves the first sub-task, then forgets why it was solving it. Single-context-window approaches collapse when the task exceeds the window or branches into parallel work.

Sycophancy. LLMs tuned with RLHF tend to agree with the user. When a user proposes a flawed approach, the agent codes it instead of challenging it. The result is fast garbage.

Silent failure. The agent reports success, the operator believes it, and only later discovers the function never compiled, the test was disabled, or the deploy was skipped. There is no independent verifier.

Stalls without escalation. The agent encounters ambiguity, asks the user a question, and waits indefinitely. If the user is not watching the tmux session, the system hangs forever.

A fifth failure mode is endemic to multi-agent systems specifically:

Drift between prose contracts. When the contract between two agents lives in an f-string or a regex inside a handler, every layer eventually paraphrases it differently. The router sends a slightly different brief than the dispatcher writes, the oracle interprets a slightly different intent than the worker executes, and a stable system silently becomes an unstable one over weeks. The fix in v2.2 is to convert every contract into a versioned skill.

Omega is built around these failure modes. Each one is named, and each has a dedicated countermeasure that can be verified.

        Problem                       Omega's response
─────────────────────────  ─────────────────────────────────
 Loss of context           4-level chain; workers are short-lived;
                           oracle context survives across workers;
                           cross-session memory (W5) recalls lessons
                           
 Sycophancy                Second Law — challenge the premise
                           before coding, with evidence
                           
 Silent failure            3-tier close-gate (worker .done.json,
                           oracle ack, supervisor close decision);
                           Layer 4 Mission Auditor adds an
                           independent audit gate before ack
                           
 Idle stalls               Third Law — never wait, always decide;
                           legal stops are .done.json or blocked.json
                           with fallback action already executed;
                           /resurrect cascade recovers any stall
                           
 Contract drift            Skill Orchestration Layer — 11 skills
                           replace 21+ f-strings / regex hits;
                           every junction is now an invocable
                           protocol with explicit toggles

2 · Omega's answer — A 4-level architecture

Every Omega operation flows through four levels. Each has one job, one input contract, one output contract.

                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 0  —  Human operator                 │
                  │  Sends an intent (one Telegram message)     │
                  └────────────────────┬────────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 1  —  Routing bot                    │
                  │  Classifies (Simple / Medium / Complex /    │
                  │  Epic), resolves the project, builds a      │
                  │  brief, dispatches an oracle                │
                  └────────────────────┬────────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 2  —  Project oracle                 │
                  │  Plans, dispatches workers, verifies done,  │
                  │  optionally ships, signals supervisor       │
                  └────────────────────┬────────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 3  —  Workers                        │
                  │  Read PLAN, execute steps, verify, write    │
                  │  .done.json, self-kill                      │
                  └─────────────────────────────────────────────┘

Why four levels and not three or five

Level 0 ↔ 1 separation. A noisy human channel (natural-language Telegram messages) is converted into a structured contract (project, scope, brief, ship flag). The bot does the messy text-to-intent work so the oracle never has to.

Level 1 ↔ 2 separation. The bot does not need to know project internals. The oracle owns project context (CLAUDE.md, codebase layout, file ownership rules). The bot just routes.

Level 2 ↔ 3 separation. Each worker has its own context window and dies after one mission. The oracle's context survives across many workers, accumulating decisions and audit findings without ever overflowing.

Three levels would force the oracle to do per-task execution, blowing its context. Five levels would add ceremony without separation of concerns.

Multi-oracle parallelism

A single project can have multiple oracles running concurrently. The oracle assignment is atomic (file lock per project). Each oracle declares the files it owns; the assigner refuses overlapping ownership. Idle oracles are reused before spawning new ones.

        Project X
           │
           ├── oracle-X       owns app/**, components/**
           ├── oracle-X-2     owns api/**, db/**
           └── oracle-X-3     owns docs/**, tests/**
                                (assigned only if file sets disjoint)

This pattern handles the case where a single human intent ("ship a feature plus update the docs plus add tests") naturally splits across non-overlapping areas of the codebase.
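
The disjointness check behind that assignment can be sketched as follows. This is a minimal illustration, not the assigner's actual logic: it assumes ownership patterns have the directory-prefix form shown above (for example app/**), and the function names are hypothetical. It is deliberately conservative, treating any ancestor/descendant pair of directories as overlapping.

```python
from pathlib import PurePosixPath


def ownership_root(pattern: str) -> tuple:
    """Return the literal directory parts that precede the first glob segment."""
    parts = []
    for part in PurePosixPath(pattern).parts:
        if any(ch in part for ch in "*?["):
            break
        parts.append(part)
    return tuple(parts)


def overlaps(a: str, b: str) -> bool:
    """Two patterns overlap when one root is equal to, or an ancestor of, the other."""
    ra, rb = ownership_root(a), ownership_root(b)
    shorter = min(len(ra), len(rb))
    return ra[:shorter] == rb[:shorter]


def can_assign(requested: list[str], owned_by_others: dict[str, list[str]]) -> bool:
    """Refuse the assignment if any requested pattern overlaps an existing claim."""
    return not any(
        overlaps(req, held)
        for req in requested
        for claims in owned_by_others.values()
        for held in claims
    )
```

With the ownership table above, a third oracle asking for docs/** and tests/** would be accepted, while one asking for app/forms/** would be refused because oracle-X already owns app/**.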

3 · Core guarantees

Four guarantees define Omega's contract with the operator. Each is enforced mechanically, not by goodwill.

Guarantee 1 — Autonomy

Once dispatched, a worker never asks the operator a question. The legal exits are:

.done.json written, status done_clean — work verified complete.
.done.json written, status pending — partial, with pending_actions[] listing what remains.
.done.json written, status failed — genuinely blocked, with evidence.
worker-blocked-<session>.json written + fallback action executed — truly ambiguous, but the worker proceeded with its best guess while signaling the supervisor.

The AskUserQuestion tool is forbidden in dispatched sessions. Workers that pause at a question mark are by definition broken.

Guarantee 2 — Verification

Workers do not self-certify. Three tiers acknowledge completion, and the Mission Auditor adds a forensic gate (Tier 2.5) between the oracle and the supervisor:

  Worker writes .done.json     ─── Tier 1: "I think I finished"
       │
       ▼
  Oracle reads, runs VERIFY    ─── Tier 2: "Confirmed, work meets spec"
  COMMAND, calls
  close-gate ack-worker
       │
       ▼
  Mission Auditor intercepts   ─── Tier 2.5: "Forensic audit ≥85/100"
  (1-3 skills, ≥85/100)            (Layer 4 of the Safety Mesh)
       │
       ▼
  Supervisor reads ledger,     ─── Tier 3: "Safe to close, operator informed"
  decides close window,
  notifies the operator

Each tier is independent. A failure at any tier keeps the session alive and surfaces the discrepancy. The Mission Auditor was introduced as Layer 4 of the Safety Mesh (see §5); it dispatches one to three Quality Arsenal audits selected by mission type.

Guarantee 3 — Isolation

Workers cannot harm each other:

Each worker has its own context window (no shared memory between workers).
Each worker has its own state directory (worker-<session>.* files, namespaced).
Atomic writes everywhere (tmp + mv -f) prevent half-written state files.
The brief-replay file (Layer 1) is written atomically before the tmux paste (W6, 2026-05-16) so a mid-paste crash still leaves a valid brief on disk for replay.
Optional git worktrees per oracle for cross-cutting changes that would conflict otherwise.

The worktree subsystem is chaos-tested: 40 of 40 cases pass, including process kills mid-operation, disk-full simulation, and concurrent worktree creation on the same project.
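
The "tmp + mv -f" rule in the isolation list can be illustrated in Python. This is a sketch under stated assumptions: the function name and the payload fields are hypothetical, and Omega's own scripts are shell, not Python. The point is that a reader sees either the old file or the complete new one, never a half-written state file.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_done_atomically(state_dir: Path, session: str, payload: dict) -> Path:
    """Write worker-<session>.done.json via a temp file and an atomic rename."""
    target = state_dir / f"worker-{session}.done.json"
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=state_dir, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_name, target)   # atomic replacement of the destination
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return target
```

A crash before os.replace leaves only a stray temp file; the destination is never observed in a partial state.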

Guarantee 4 — Close-gate

The supervisor never auto-closes a session if any of the following holds:

Status is not done_clean.
Ship result is failed or frozen.
pending_actions[] is non-empty.
The operator has interacted with the bot during the grace window.
A new oracle for the same project was dispatched during the grace window.

Auto-close happens only when all conditions point to "the work is genuinely finished, the operator has been notified, and the resources can be freed".
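
One way to read the close-gate is as a list of independent blockers, any one of which keeps the session alive. The sketch below assumes a hypothetical SessionState record; the field names are illustrative and do not come from the supervisor's real ledger format.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical snapshot of what the supervisor knows about one session."""
    status: str                           # done_clean | pending | failed
    ship_result: str = "skipped"          # ok | skipped | failed | frozen
    pending_actions: list = field(default_factory=list)
    operator_interacted: bool = False     # during the grace window
    new_oracle_dispatched: bool = False   # same project, during the grace window


def close_blockers(state: SessionState) -> list:
    """Return every reason that forbids auto-close; an empty list means it is allowed."""
    checks = [
        (state.status != "done_clean", "status is not done_clean"),
        (state.ship_result in {"failed", "frozen"}, "ship result is failed or frozen"),
        (bool(state.pending_actions), "pending_actions is non-empty"),
        (state.operator_interacted, "operator interacted during grace window"),
        (state.new_oracle_dispatched, "new oracle dispatched during grace window"),
    ]
    return [reason for blocked, reason in checks if blocked]
```

Returning the reasons rather than a bare boolean makes it straightforward to surface why a session was kept alive.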

4 · Operational flow

This section walks one complete intent from operator to ship.

Step 1 — Intent

The operator sends a message to the routing bot. The message is in natural language, English or French, optionally with attachments (screenshots, Linear links, audit keywords).

Step 2 — Classification and routing

The bot classifies the intent via the /classify-intent skill (since v2.2). The classifier is hybrid: a regex pass resolves the ~80% of obvious cases at zero token cost; ambiguous cases escalate to a Haiku micro-call returning one of eight canonical intents (bug-fix | feature | audit | ship | refactor | docs | question | other) plus confidence and a routing hint. The skill replaces a regex-only classifier that systematically misrouted vague messages like "Causio fait ce que tu sais" (French for "Causio, do what you know how to do"). The bot also assigns one of four complexity tiers, which determines how the work is executed:

  Simple   ─ one read-only check                   ─ done in-band
  Medium   ─ one specialist, single area           ─ spawn 1 worker
  Complex  ─ multiple specialists, multi-domain    ─ /team in tmux
  Epic     ─ cross-department, hours+              ─ /aisb full chain

It also detects forensic-audit keywords (code, flow, UX, perf, sec, ...) and routes them to the right audit skill. Audit keywords are never paraphrased into freeform prose — the literal skill command is invoked.
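
The hybrid shape of the classifier (cheap regex first, model call only when the regex pass is inconclusive) can be sketched as below. The regex rules are invented for illustration, and the escalate callable stands in for the Haiku micro-call; neither reflects the real /classify-intent skill.

```python
import re
from typing import Callable

INTENTS = ("bug-fix", "feature", "audit", "ship", "refactor", "docs", "question", "other")

# Illustrative patterns only; the real rule set is not given in the text.
FAST_RULES = [
    (re.compile(r"\b(fix|bug|crash|broken)\b", re.I), "bug-fix"),
    (re.compile(r"\b(ship|deploy|release)\b", re.I), "ship"),
    (re.compile(r"\baudit\b", re.I), "audit"),
    (re.compile(r"\brefactor\b", re.I), "refactor"),
]


def classify(message: str, escalate: Callable[[str], str]) -> tuple:
    """Return (intent, path), where path is 'regex' or 'escalated'."""
    hits = {intent for pattern, intent in FAST_RULES if pattern.search(message)}
    if len(hits) == 1:
        return hits.pop(), "regex"      # unambiguous: zero token cost
    intent = escalate(message)          # no hit or conflicting hits: ask the model
    return (intent if intent in INTENTS else "other"), "escalated"
```

Clamping the escalated answer to the canonical intent list keeps a malformed model reply from leaking an unknown label into routing.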

Step 3 — Brief construction

The bot builds a brief for the oracle. The brief is now produced by the /dispatch-oracle skill (the legacy f-string remains as fallback when the skill probe times out). The brief includes:

{
  "project": "Project name",
  "mission": "One-line summary",
  "ship": true | false,
  "files_owned": ["glob patterns the oracle may touch"],
  "deploy_timeout_min": 10,
  "lifecycle": "persistent | ephemeral"
}

ship is set to true only when the operator explicitly asks (keywords: ship, deploy, push, merge, livre, "envoie en prod"). Audits and research never ship.
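
A minimal sketch of how the ship flag could be derived while building the brief. The keyword list is taken from the paragraph above; the function name, the intent labels in NEVER_SHIP, and the default lifecycle value are assumptions made for the example.

```python
import re

SHIP_KEYWORDS = ("ship", "deploy", "push", "merge", "livre", "envoie en prod")
_SHIP_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SHIP_KEYWORDS) + r")\b", re.I
)
NEVER_SHIP = {"audit", "research"}  # assumed intent labels for audits and research


def build_brief(project, mission, intent, files_owned,
                lifecycle="ephemeral", deploy_timeout_min=10):
    """Build the brief dict; ship is true only on an explicit keyword match."""
    ship = intent not in NEVER_SHIP and bool(_SHIP_RE.search(mission))
    return {
        "project": project,
        "mission": mission,
        "ship": ship,
        "files_owned": list(files_owned),
        "deploy_timeout_min": deploy_timeout_min,
        "lifecycle": lifecycle,
    }
```

Plain keyword matching has obvious false positives ("fix the push notification bug" would match push), which is one reason a real implementation would likely combine it with the classified intent.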

Step 4 — Oracle planning

The oracle reads the brief and project CLAUDE.md, classifies the work, and writes its plan to .orchestrator/decisions.md (one line per decision: task, classification, choice, rationale). It then designs the worker dispatches, optionally invoking the /plan-decompose skill on complex multi-step missions.

Crucially, the oracle never writes project code directly. Even a one-line typo fix goes through a worker session.

Step 5 — Worker dispatch with the PLAN protocol

Each worker is dispatched via the /dispatch-worker skill (canonical contract; the legacy bespoke prompt assembly remains as fallback). The worker prompt is:

== MISSION ==
<one-line mission>

== PLAN ==
1. <step 1, concrete, verifiable>
2. <step 2>
3. <step 3>
...

== FILES IN SCOPE ==
- <glob or path list>

== DONE CRITERIA ==
- <criterion 1, observable in <60s>
- <criterion 2>

== VERIFY COMMAND ==
<single shell command that returns 0 when done>

== HANDOFF ==
When PLAN complete AND VERIFY COMMAND passes, call:
bash <path>/worker-mark-done.sh done_clean '<summary>'

== PRE-BOOT KNOWLEDGE PACK ==
Project context, language defaults, audit triggers,
and (W5) the five most-recent lessons/mistakes
from the cross-session memory store, scoped to this
project.

The worker boots with /worker-protocol as its self-contract (Wave-3 canonicalization). It reads the PLAN, materializes it as a TodoWrite list (each step becomes a todo item), and executes step-by-step.
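
To make the prompt layout concrete, the sketch below renders the sections shown above from structured inputs and extracts the PLAN steps again, roughly what a worker would do before creating one todo item per step. Function and parameter names are hypothetical; this is not the /dispatch-worker skill itself.

```python
import re


def render_worker_prompt(mission, plan, files, done_criteria, verify_command,
                         mark_done_script, knowledge_pack=""):
    """Assemble the sectioned prompt; empty sections are omitted."""
    sections = [
        ("MISSION", mission),
        ("PLAN", "\n".join(f"{n}. {step}" for n, step in enumerate(plan, start=1))),
        ("FILES IN SCOPE", "\n".join(f"- {path}" for path in files)),
        ("DONE CRITERIA", "\n".join(f"- {c}" for c in done_criteria)),
        ("VERIFY COMMAND", verify_command),
        ("HANDOFF", "When PLAN complete AND VERIFY COMMAND passes, call:\n"
                    f"bash {mark_done_script} done_clean '<summary>'"),
        ("PRE-BOOT KNOWLEDGE PACK", knowledge_pack),
    ]
    return "\n\n".join(f"== {title} ==\n{body}" for title, body in sections if body)


def plan_steps(prompt):
    """Extract the numbered PLAN steps, e.g. to seed one todo item per step."""
    match = re.search(r"^== PLAN ==\n(.*?)(?=\n\n== |\Z)", prompt, re.S | re.M)
    if match is None:
        return []
    return [m.group(1) for m in re.finditer(r"^\d+\.\s+(.+)$", match.group(1), re.M)]
```

Because the prompt is plain sectioned text, the same parser works on a brief replayed from disk after a crash.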

Why PLAN and not the native /goal primitive

Claude Code v2.1.141 ships a native /goal <condition> primitive — the engine auto-loops until the condition is met. We integrated this in two phases:

Phase 1: opt-in via GOAL_NATIVE=true for solo workers with short deterministic conditions.
Phase 2: default-on for all solo workers.

Phase 2 was reverted within a day. /goal has a hard 4000-character limit. Real worker prompts (mission + pre-boot knowledge pack + DONE + VERIFY + autonomy banner) routinely exceed 5000 characters. Default-on injection caused truncation. The PLAN protocol replaces it: no length limit, every step is visible in TodoWrite, the worker is a transparent state machine.

/goal remains available as Phase 1 opt-in for short deterministic conditions (e.g. npx vitest passes).

Step 6 — Audit (forensic)

If the mission is a forensic audit, the worker runs the matching protocol (e.g. /codeaudit, /uiuxaudit, /secaudit). Each audit has 16–23 phases and a domain-specific raw-score maximum (280–420); raw scores are normalized to a score out of 100 so that audits can be compared. All audits share:

Gestalt clarity gate. First pass: is the artifact comprehensible at all? If not, the audit stops and reports the clarity failure first. There is no point measuring detail on something incoherent.
Popper falsification. Every claim is paired with a falsification check. "This component is accessible" requires "What would prove it isn't?" — and that check is executed.
Hinge-point 10× scrutiny. The audit identifies the one or two phases that, if wrong, invalidate everything downstream. Those phases get 10× the rigor of others.
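
Normalizing raw scores to a common /100 scale is plain proportional scaling. The sketch below assumes linear scaling and rounding to one decimal; the text does not say whether the real audits weight phases differently.

```python
def normalize_score(raw: float, raw_max: float) -> float:
    """Scale a raw audit score onto 0-100 (linear scaling is an assumption)."""
    if raw_max <= 0:
        raise ValueError("raw_max must be positive")
    if not 0 <= raw <= raw_max:
        raise ValueError("raw score must lie between 0 and raw_max")
    return round(100 * raw / raw_max, 1)
```

For example, 238 out of a 280-point audit and 357 out of a 420-point audit both normalize to 85.0, which is what makes scores from different audits comparable.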

Step 7 — Ship (optional)

If brief.ship is true, the oracle runs the 12-step ship pipeline:

1.  Build  (npm run build or project-specific, via safe-npm-build.sh mutex)
2.  Stage  (whitelist files; refuse extras)
3.  Secret scan staged (gitleaks)
4.  Whitespace check (git diff --cached --check)
5.  Commit (conventional message)
6.  Acquire flock per-project (serializes oracles)
7.  Check freeze flag (if frozen, abort + alert)
8.  Pull --rebase (auto-abort on conflict, keep local commit)
9.  Push (retry once after re-rebase)
10. Deploy (whitelisted command; default Vercel + token)
11. Poll deploy status (max deploy_timeout_min, default 10 min)
12. Write .done.json with commit, push URL, deploy URL, duration

On deploy failure, the default behavior is to freeze rather than roll back. A ship-<project>.frozen flag is set; subsequent oracles cannot push until the operator decides to revert or fix-forward. Auto-rollback is opt-in per project because it can hide root causes (missing env var, provider outage, etc.).
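
The freeze behavior amounts to a flag file that is written on deploy failure (step 11) and checked before any later push (step 7). A minimal sketch, assuming the flag lives in a state directory passed in by the caller; the function names and the return labels are hypothetical.

```python
from pathlib import Path


def freeze_flag(state_dir: Path, project: str) -> Path:
    """Location of the per-project freeze flag (ship-<project>.frozen)."""
    return state_dir / f"ship-{project}.frozen"


def record_deploy_result(state_dir: Path, project: str, deploy_ok: bool,
                         auto_rollback: bool = False) -> str:
    """Return 'ok', 'rollback' (opt-in), or 'frozen' (default on failure)."""
    if deploy_ok:
        return "ok"
    if auto_rollback:
        return "rollback"  # the rollback itself is out of scope for this sketch
    freeze_flag(state_dir, project).write_text("deploy failed; operator decision required\n")
    return "frozen"


def may_push(state_dir: Path, project: str) -> bool:
    """Ship pipelines abort when the flag exists."""
    return not freeze_flag(state_dir, project).exists()
```

Clearing the flag is left to the operator on purpose: removing the file is the explicit decision to revert or fix forward.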

Step 8 — Worker handoff

The worker calls worker-mark-done.sh <status> '<one-line summary>'. This atomically writes worker-<session>.done.json (tmp + mv). The script has a guard: it refuses to run from an oracle session (rc=3 + redirect message). This prevents the common bug where an oracle accidentally marks itself done as if it were a worker.

The worker's tmux session schedules a self-kill 5 seconds after the handoff — freeing the slot for the next dispatch.

Step 9 — Mission Auditor (Layer 4 — END)

Before the oracle ack, close-gate.sh ack-worker invokes mission-auditor.sh. The auditor classifies the mission with a hybrid heuristic + /classify-intent skill probe (W7), selects 1–3 Quality Arsenal audits via a rules table, runs them under a global VPS-wide flock (one audit at a time), and takes the minimum score across audits as the verdict. A minimum of 85/100 or higher is APPROVED; anything lower is REJECTED, in which case the worker is nudged with the top findings and retried up to twice. Emergency bypass is opt-in via CLOSEGATE_SKIP_AUDIT=1.

The auditor also writes back to the cross-session memory store (W5): APPROVED verdicts become lesson rows; REJECTED-at-iter≥2 become mistake rows. The next worker on the same project boots with these in its pre-boot knowledge pack.
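
The verdict rule (minimum score across the selected audits, threshold 85, at most two retries) is small enough to state directly. The function names and action labels below are illustrative assumptions.

```python
THRESHOLD = 85      # default threshold from the text
MAX_RETRIES = 2     # "retried up to twice"


def verdict(scores: dict, threshold: float = THRESHOLD) -> tuple:
    """The weakest audit decides: return (status, lowest_score)."""
    if not scores:
        raise ValueError("at least one audit score is required")
    lowest = min(scores.values())
    return ("APPROVED" if lowest >= threshold else "REJECTED"), lowest


def next_action(scores: dict, retries_used: int) -> str:
    """Return 'ack', 'retry', or 'escalate' for the close-gate."""
    status, _ = verdict(scores)
    if status == "APPROVED":
        return "ack"
    return "retry" if retries_used < MAX_RETRIES else "escalate"
```

Using the minimum rather than the mean means a strong code score cannot mask a failing accessibility or security score.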

Step 10 — Oracle ack

The oracle reads the worker's done.json, executes the VERIFY COMMAND, and calls close-gate.sh ack-worker <worker-session>. Without this ack, the supervisor treats the worker as un-acknowledged and nudges the oracle. Before reporting, the oracle invokes /synthesize-report (template + Haiku digest) and /format-telegram-report (humanized payload) so the operator receives a story, not raw JSON.

Step 11 — Supervisor close decision

The supervisor (cron-driven, every minute) reads all oracle done.json files and applies the close decision tree:

  done_clean + ship.result in {ok, skipped}     → notify + close after grace
  done_clean + ship.result in {failed, frozen}  → notify + keep alive
  pending                                       → notify + inline "continue" button
  failed                                        → send logs + keep alive

The grace window resets if the operator interacts with the bot or a new oracle is dispatched on the same project.
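
The decision tree reads naturally as a lookup, with the grace window as a timestamp that is pushed forward on each reset. In this sketch the action strings are copied from the tree above; the function names are hypothetical, and the grace length is a parameter because the text does not state it.

```python
from typing import Optional

SHIP_ACTIONS = {
    "ok": "notify + close after grace",
    "skipped": "notify + close after grace",
    "failed": "notify + keep alive",
    "frozen": "notify + keep alive",
}


def close_decision(status: str, ship_result: Optional[str] = None) -> str:
    """Map a done.json status (and ship result, if any) to the supervisor action."""
    if status == "done_clean":
        if ship_result not in SHIP_ACTIONS:
            raise ValueError(f"unknown ship result: {ship_result!r}")
        return SHIP_ACTIONS[ship_result]
    if status == "pending":
        return 'notify + inline "continue" button'
    if status == "failed":
        return "send logs + keep alive"
    raise ValueError(f"unknown status: {status!r}")


def grace_expired(now: float, last_reset: float, grace_s: float) -> bool:
    """last_reset moves forward on operator interaction or a new dispatch."""
    return now - last_reset >= grace_s
```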

5 · Reliability model

Reliability in Omega is enforced through a Safety Mesh of four independent layers, each owning a distinct slice of the mission lifecycle (START → DURING → END). A failure in any single layer does not compromise the others, and each layer can be reasoned about, tested, and disabled in isolation.

   ┌──────────────────────────────────────────────────────────────────┐
   │  Layer 1 — BRIEF-REPLAY      (dispatch persistence, START)       │
   │  Layer 2 — CPU GUARD         (load admission control, START)     │
   │  Layer 3 — SHADOW MANAGER    (live signal monitor, DURING)       │
   │  Layer 4 — MISSION AUDITOR   (quality gate at handoff, END)      │
   └──────────────────────────────────────────────────────────────────┘

Layer 1 — Brief-Replay. Every dispatch persists its prompt to a per-session file before the worker is given control. The persisted brief lets the system replay the original instructions verbatim into a worker that has been hit by a transient rate-limit or API error and would otherwise lose its context. W6 (2026-05-16) moved the write to immediately before the tmux paste, using an atomic tmpfile + mv pattern — even a mid-paste crash now leaves a valid brief on disk.

Layer 2 — CPU Guard. A two-core host is structurally protected against concurrent heavy builds and dispatch storms by three sub-defenses: (A) a global flock mutex around the build command so two builds cannot race on the same artifact directory; (B) a CPU-aware dispatch throttle that diverts new dispatches into a queue file when the one-minute load average exceeds 2.5× cores, plus a queue flusher (cron */2) that re-dispatches when load drops below 2× cores and ages out entries past 4 hours; (C) a dedicated CPU_OVERLOAD shadow signal that suppresses nudges for five minutes, kills duplicate build processes per working directory, and escalates to the operator. The CPU Guard prevented multi-build saturation in a real incident the same week this whitepaper was revised — see §7 Evidence.
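
Sub-defense (B) is a hysteresis throttle: dispatches are queued above 2.5× cores and released only once load falls below 2× cores, so the system does not flap around a single threshold. A minimal sketch with the thresholds from the paragraph above; the queue entry shape (a dict with a queued_at timestamp) and the function names are assumptions.

```python
ENQUEUE_FACTOR = 2.5     # queue new dispatches above 2.5 x cores
FLUSH_FACTOR = 2.0       # re-dispatch once load drops below 2 x cores
MAX_AGE_S = 4 * 3600     # entries older than 4 hours are dropped


def should_queue(load_1m: float, cores: int) -> bool:
    """True when a new dispatch must be diverted into the queue."""
    return load_1m > ENQUEUE_FACTOR * cores


def flush_queue(queue: list, load_1m: float, cores: int, now: float) -> tuple:
    """Return (to_dispatch, still_queued); aged-out entries appear in neither."""
    fresh = [entry for entry in queue if now - entry["queued_at"] <= MAX_AGE_S]
    if load_1m < FLUSH_FACTOR * cores:
        return fresh, []
    return [], fresh
```

On a Unix host the inputs could come from os.getloadavg()[0] and os.cpu_count(); on the two-core host described above, the thresholds work out to a load of 5.0 to start queueing and 4.0 to resume.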

Layer 3 — Shadow Manager. Every three minutes a heuristic observer evaluates fourteen signals across all running workers and oracles: thrash, error burst, silent drift, scope creep, build regression, progress stagnation, pane-stuck pattern, worker health, todo stall, rate-limit stall, transient API error, prompt-idle, OOM hints, and CPU overload. Detection is Tier 1 (zero token, pure heuristic). An opt-in Tier 2 uses a small model to disambiguate ambiguous Tier-1 hits. Tier 3 is Telegram escalation when retries are exhausted.

Layer 3 has a crucial asymmetry between workers and oracles. A prescriptive nudge that helps a worker recover ("you are looping, change approach, mark done") destroys an oracle that is legitimately managing multiple concurrent missions. Workers therefore receive direct nudges; oracles default to observe-only (JSONL log + throttled FYI). THRASH is disabled for oracles entirely (oracles iterate by design). The stagnation floor is raised to six hours plus an idle-confirmation gate. True emergencies use a brief-aware question-mode ("are you still on this?"), never the imperative. A global kill-switch file freezes all nudges as a panic-stop. This asymmetry replaced an earlier symmetric implementation that destroyed an in-progress UI mission and is described as a case study in §7.
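
The worker/oracle asymmetry can be expressed as a small policy function. This is an illustrative sketch of the rules in the paragraph above, not the Shadow Manager's code; the mode labels and the signal name "thrash" are assumed spellings.

```python
def nudge_mode(role: str, signal: str, *, emergency: bool = False,
               kill_switch: bool = False) -> str:
    """Return one of 'none', 'direct', 'observe', 'question'."""
    if kill_switch:
        return "none"        # global panic-stop freezes every nudge
    if role == "worker":
        return "direct"      # workers receive prescriptive nudges
    if role != "oracle":
        raise ValueError(f"unknown role: {role!r}")
    if signal == "thrash":
        return "none"        # THRASH is disabled for oracles entirely
    # Oracles are observed by default; emergencies get a question, never an imperative.
    return "question" if emergency else "observe"
```

Keeping the kill-switch check first guarantees that no later branch can issue a nudge while the panic-stop file is present.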

Layer 4 — Mission Auditor. Between a worker's done_clean and the oracle's acknowledgment, an independent gate intercepts the close handshake. It classifies the mission heuristically and now (W7) escalates ambiguous cases to the /classify-intent skill via a Haiku micro-call. It then selects one to three Quality Arsenal audits via a rules table (bug-fix → debug + code, ui → uiux + a11y + motion, api → api + sec, etc.) and runs them under a global VPS-wide lock so only one audit consumes resources at a time. The verdict is the minimum score across audits, with a default threshold of 85/100. Rejected verdicts nudge the worker with the top findings and retry up to twice before escalating to the operator. Bypass is opt-in via an environment variable, intended for emergencies and for the audits themselves.

These four layers complement the existing supervisory loops described below.

Smart Resurrect — the stall-recovery cascade

Worker stall recovery was redesigned in W10 (2026-05-16) into a four-tier cascade orchestrated by omega-resurrect.sh and the /resurrect skill:

┌─────────────────────────────────────────────────────────────────────┐
│ Worker stall signal → omega-resurrect.sh                            │
│                                                                     │
│  Tier 1 (0 token)  : todos + events + brief → context-aware nudge  │
│  Tier 2 (0 token)  : pane regex → error-type-specific recovery     │
│                       (rate_limit | api_error | type_error |        │
│                        build_fail | oom | cmd_missing)              │
│  Tier 3 (opt-in)   : claude -p `/resurrect` skill (Haiku, free Max) │
│                       OMEGA_SMART_RESURRECT=skill required          │
│  Tier 4 (escalate) : Telegram via notify-bot.sh after 3 attempts    │
└─────────────────────────────────────────────────────────────────────┘

Tiers 1 and 2 handle 95% of stalls at zero token cost. LLM consultation is opt-in at Tier 3. Asymmetry is preserved: workers get direct nudges; oracles use only the softer FYI variant of Tier 1. Every nudge ends with a language-detected escape clause (Si tu es déjà sur cette action, ignore ce message. / If you are already on this, ignore this message.).
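
Tier 2 maps pane output to one of the six error types with regular expressions. The sketch below shows that shape only: the error-type names come from the cascade diagram, but every pattern is an invented placeholder, and language detection for the escape clause is assumed to happen elsewhere.

```python
import re
from typing import Optional

# Placeholder patterns; the real pane regexes are not given in the text.
ERROR_PATTERNS = {
    "rate_limit": re.compile(r"rate limit|\b429\b", re.I),
    "api_error": re.compile(r"api error|overloaded", re.I),
    "type_error": re.compile(r"TypeError|error TS\d+"),
    "build_fail": re.compile(r"build failed|npm ERR!", re.I),
    "oom": re.compile(r"out of memory|heap limit", re.I),
    "cmd_missing": re.compile(r"command not found", re.I),
}

ESCAPE_CLAUSE = {
    "fr": "Si tu es déjà sur cette action, ignore ce message.",
    "en": "If you are already on this, ignore this message.",
}


def classify_pane(pane_text: str) -> Optional[str]:
    """Return the first matching error type, or None if nothing matches."""
    for error_type, pattern in ERROR_PATTERNS.items():
        if pattern.search(pane_text):
            return error_type
    return None


def should_escalate(attempts: int) -> bool:
    """Tier 4: hand over to the operator after three attempts."""
    return attempts >= 3


def with_escape_clause(nudge: str, lang: str) -> str:
    """Append the escape clause, defaulting to English for unknown languages."""
    return f"{nudge} {ESCAPE_CLAUSE.get(lang, ESCAPE_CLAUSE['en'])}"
```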

Tracking Reactor — event-driven supervision

oracle-shadow.sh runs on cron every three minutes; average reaction latency is ~90 seconds. W8 (2026-05-16) added tracking-reactor.sh, an inotify-based reactor watching ~/.omega/state/tracking/*.events.jsonl for sub-second reaction:

┌─────────────────────────────────────────────────────────────────────┐
│  Worker writes event → JSONL close_write → inotify wakes reactor    │
│       │                                                             │
│       ▼                                                             │
│  Per-session 2s coalesce + 1s global debounce                       │
│       │                                                             │
│       ▼                                                             │
│  tmux capture-pane probe vs STUCK_REGEX (Awaiting/STOPPING/…)       │
│       │                                                             │
│       ▼                                                             │
│  Match → omega-resurrect.sh <session>  (Tier 1→4 cascade)           │
│                                                                     │
│  + 60s background ticker: scan tracking mtimes; >10min idle AND     │
│    session alive → omega-resurrect.sh (idle fallback path)          │
└─────────────────────────────────────────────────────────────────────┘

Measured latency (smoke 2026-05-16): 129 ms event → trigger (target <1000 ms). The reactor runs under a systemd --user unit (tracking-reactor.service), is a flock-protected singleton, and shares the 1-attempt/600s throttle ledger with the cron observer so they cannot double-fire. The cron observer remains as the cold-path safety net for sessions the reactor missed.
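
The shared throttle ledger (one attempt per session per 600 seconds) is what prevents the reactor and the cron observer from both firing on the same stall. The sketch below keeps the ledger in memory to show the rule; the real ledger is shared between two processes, so it would have to live on disk behind a lock. The class and method names are hypothetical.

```python
class ThrottleLedger:
    """Allow at most one recovery attempt per session per window."""

    def __init__(self, window_s: float = 600.0):
        self.window_s = window_s
        self._last_attempt = {}  # session name -> timestamp of last allowed attempt

    def try_acquire(self, session: str, now: float) -> bool:
        """Record and allow the attempt, or refuse it if one happened too recently."""
        last = self._last_attempt.get(session)
        if last is not None and now - last < self.window_s:
            return False
        self._last_attempt[session] = now
        return True
```

Whichever observer calls try_acquire first wins the window; the other sees False and stays quiet.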

KAIROS retirement (W9, 2026-05-16)

The legacy kairos.py::_nudge_oracle function previously nudged idle oracles after fifteen minutes based on a simple tmux session_activity timestamp. This broke oracles working on complex UX/design tasks: they appeared "idle" (sitting at the prompt) between deliberate tool calls but were actively planning. W9 disables this nudger entirely with an unconditional return False. The function body remains intact for defense-in-depth (in case any caller still routes through it), but it is now a no-op. The bot restarted cleanly on the change. The replacement is /resurrect + tracking-reactor.sh: brief-aware, worker-scoped, with the asymmetry contract enforced.

Cross-session memory (W5, 2026-05-16)

Workers learn from past missions instead of repeating mistakes. Two-sided wire:

Recall (read side at dispatch time). knowledge-pack-builder.sh emits a PROJECT MEMORY section in the pre-boot knowledge pack when omega-memory list --project=$PROJECT --limit=5 returns rows. Every dispatched worker boots with the five most-recent lessons/mistakes for its project. Soft-fails when the DB is empty or absent — no impact on legacy flows.
Write (audit-time hook). mission-auditor.sh writes after the verdict is computed: APPROVED → kind=lesson, body=[mission_type · audits · score=N] <worker_summary> (capped 500 chars); REJECTED && iter ≥ 2 → kind=mistake with the per-audit name:score/100 findings so the next retry does not repeat the failing pattern.

Storage is SQLite FTS5 at ~/.omega/state/memory.db. Inspect via omega-memory list|search|stats.
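
The two sides of the memory wire can be sketched with Python's sqlite3 module. This example swaps in a plain table instead of FTS5 so that it runs on any SQLite build, and the schema and function names are assumptions; only the rules (lesson on APPROVED, mistake on REJECTED at iteration 2 or later, 500-character cap, five most recent rows per project) come from the text.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) schema if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        " id INTEGER PRIMARY KEY,"
        " project TEXT NOT NULL,"
        " kind TEXT NOT NULL CHECK (kind IN ('lesson', 'mistake')),"
        " body TEXT NOT NULL)"
    )
    return db


def record_verdict(db, project, approved, iteration, summary):
    """Write side: return the kind written, or None when nothing is stored."""
    if approved:
        kind = "lesson"
    elif iteration >= 2:
        kind = "mistake"
    else:
        return None  # a first rejection is retried, not remembered
    db.execute(
        "INSERT INTO memory (project, kind, body) VALUES (?, ?, ?)",
        (project, kind, summary[:500]),
    )
    db.commit()
    return kind


def recall(db, project, limit=5):
    """Read side: the most recent rows for one project, newest first."""
    return db.execute(
        "SELECT kind, body FROM memory WHERE project = ? ORDER BY id DESC LIMIT ?",
        (project, limit),
    ).fetchall()
```

An empty result from recall corresponds to the soft-fail case: the knowledge pack is simply built without a PROJECT MEMORY section.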

Supervisor + daemon mesh

The supervisor is one of three cron loops. Alongside them run the tracking reactor (a systemd user unit) and three long-lived daemons. Together they form a recovery mesh.

  ╔══════════════════════════════════════════════════════════════╗
  ║  Cron */1 min : supervisor (close decisions, alerts, reaper) ║
  ║  Cron */2 min : event-driven oracle wake on worker done.json ║
  ║  Cron */3 min : observer (6 categorized failure modes M1-M6) ║
  ║  Systemd user : tracking-reactor.service (inotify, ~129ms)   ║
  ║                                                              ║
  ║  Daemon       : oracle process death detector                ║
  ║  Daemon       : abandoned-oracle reaper (TTL-bound)          ║
  ║  Daemon       : worker idle supervisor (no-tool-call timeout)║
  ╚══════════════════════════════════════════════════════════════╝

The six observer failure modes:

Code	Symptom	Recovery action
M1	Worker .done.json un-acked, siblings still alive	Nudge oracle via tmux send-keys
M2	All workers done, oracle idle > 5 min	Send report or close oracle
M3	Worker `failed`, oracle has not surfaced an alert	Alert via bot directly
M4	`worker-blocked-<session>.json` exists	Surface question to operator
M5	Worker has not emitted a tool event for X minutes	`/resurrect` cascade (was: `/team retry`)
M6	Oracle TodoWrite has not changed for N observer ticks	FYI digest (asymmetric, never imperative)

Nudges are throttled (one per 5 min per oracle) to avoid spam.

The incident that triggered the mesh (2026-04-15)

A Linear-resolution worker correctly identified that 25 of 36 tickets were already fixed and in "In Review" state. Instead of deciding the best path and executing, it posted "Three paths — which path?" and waited idle for 10+ minutes. The operator found it by accident.

Root cause: the prior Second Law ("challenge the premise") was being interpreted as "ask before coding". It needed to be "challenge, decide, proceed". The fix became the Third Law: in dispatched sessions, AskUserQuestion is forbidden, idle prompts are forbidden, the only legal stops are .done.json or worker-blocked-<session>.json with the fallback action already executed.

This single incident drove the entire mesh of observer + wake-on-done + the Third Law specification. A wrong decision that produces evidence is 100× more valuable than a correct pause that produces nothing.

6 · Skill-Wired Orchestration (since v2.2)

Until v2.1, the contracts between Omega layers lived in scattered places: an f-string inside _build_oracle_dispatch_prompt, a regex in handlers.py, a heredoc in worker.md, a rules table in mission-auditor.sh. Each was authoritative somewhere but invocable nowhere. Drift was inevitable.

v2.2 introduces a Skill Orchestration Layer: eleven invocable, versioned skills that replace the prose contracts at every junction in the chain.

The 11 skills

#	Skill	Junction it owns	Replaces
1	`/classify-intent`	Inbound Telegram message classification (and ambiguous mission-type in the auditor)	regex-only `handlers.py` classifier; misrouted vague messages like "Causio fait ce que tu sais"
2	`/dispatch-oracle`	AISB → Oracle dispatch prompt	f-string body inside `_build_oracle_dispatch_prompt`
3	`/dispatch-worker`	Oracle → Worker dispatch prompt	per-oracle bespoke worker prompt assembly
4	`/worker-protocol`	Worker self-contract on boot	embedded heredoc in `worker.md`
5	`/omega-protocol`	Oracle process-level contract	scattered rule files
6	`/resurrect`	Worker stall recovery (Tier 3 of the cascade)	obsolete `kairos.py::_nudge_oracle`
7	`/synthesize-report`	Done-handoff digest for Telegram	oracles reading raw `done.json`
8	`/format-telegram-report`	Telegram payload humanization	template-only `route_notify`
9	`/audit-mission`	Close-gate forensic verification (Layer 4)	static rules table only
10	`/plan-decompose`	Oracle plan decomposition for complex missions	manual decomposition
11	`/diagnose`	On-demand diagnostic snapshot of any oracle / worker	manual pane reads + jq

The skill-wired chain (visual)

GARETH ──Telegram──▶ AISB ──tmux──▶ ORACLE ──tmux──▶ WORKER ──/team──▶ AGENTS
   intent:    /classify-intent           │           │              │
   dispatch:  /dispatch-oracle           │           │              │
                                /dispatch-worker     │              │
                                  /omega-protocol    │              │
                                /plan-decompose      │              │
                                                /worker-protocol    │
                                                /resurrect (stall) │
                                                /diagnose (snapshot)│
                                                /audit-mission     │
                                                  (close-gate)      │
GARETH ◀──Telegram── AISB ◀──tmux── ORACLE ◀──tmux── WORKER ◀──────/synthesize-report
                              /format-telegram-report

Wiring matrix

The skills are not just defined; they are wired. The following table shows the number of verified call sites per layer (from a grep run on 2026-05-16):

Layer	File	Skill hits
AISB handlers / prompts	`bot/aisb/handlers.py`, `bot/aisb/prompts.py`	21
Patrol (Telegram report)	`bot/aisb/patrol.sh`	4
Mission Auditor	`~/.aisb/lib/mission-auditor.sh`	4
Oracle dispatch (memory pack)	`~/.aisb/lib/dispatch-to-session.sh`	omega-memory (W5)

Failure mode and toggles

Every skill probe is best-effort. A claude -p timeout, parse error, missing CLI, or non-zero exit silently falls back to the pre-Wave-3 legacy path, so there is zero behavior regression for any obvious case. All skill-wired edges are toggleable via environment variables for emergency rollback:

Env var	Default	Effect when changed from default
`SKILL_INTEGRATION_ENABLED`	`true`	AISB handlers/prompts skip skill probes
`SKILL_REPORT_ENABLED`	`true`	Patrol uses raw template instead of skill digest
`MISSION_AUDITOR_SKILL_CLASSIFY`	`true`	Auditor stays on regex-only classification
`OMEGA_USE_RESURRECT`	`1`	Shadow worker branch falls back to legacy `recovery_apply`
`OMEGA_SMART_RESURRECT`	unset	Setting `skill` enables Tier 3 LLM call
`SHADOW_LLM`	unset	Setting `haiku` enables Tier 2 disambiguation
`CLOSEGATE_SKIP_AUDIT`	unset	Setting `1` bypasses Mission Auditor entirely
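
The best-effort probe and its toggle can be sketched as follows. This is a minimal illustration, not the AISB handler code: the function names `env_enabled` and `probe_skill`, the 20-second timeout, and the reading of "unset" as "use the default" are all assumptions. The command is passed in as a parameter so the sketch runs without the `claude` CLI installed.

```python
import os
import subprocess
import sys


def env_enabled(name, default="true"):
    """Assumption: an unset variable takes its default; 'false', '0', '' disable."""
    return os.environ.get(name, default).strip().lower() not in ("false", "0", "")


def probe_skill(cmd, prompt, legacy, timeout_s=20.0):
    """Run a skill command best-effort; on any failure return legacy(prompt)."""
    if not env_enabled("SKILL_INTEGRATION_ENABLED"):
        return legacy(prompt)
    try:
        result = subprocess.run(
            list(cmd) + [prompt], capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timeout or missing CLI: fall back silently to the legacy path.
        return legacy(prompt)
    output = result.stdout.strip()
    if result.returncode != 0 or not output:
        # Non-zero exit or empty output is treated like a parse error.
        return legacy(prompt)
    return output
```

In a deployment, `cmd` would presumably be the `claude -p` invocation named above and `legacy` the pre-Wave-3 regex or template path. Keeping the fallback as a plain callable means the caller never has to distinguish between "skill disabled" and "skill failed".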

Why a skill layer matters

Three reasons.

Versioning. A skill file at ~/.claude/commands/<name>.md has a path, an author, a change history, and can be diffed. An f-string inside a Python function has none of these.

Invocability. A skill can be invoked from any context — a worker, an oracle, a script, an audit, a future oracle reviewing a past mission. An f-string can only be run by re-importing the module that defines it.

Independent testability. A skill can be smoke-tested in isolation. The smoke run for /resurrect (W10, 2026-05-16) created a fake stall, ran the cascade, and asserted that the nudge contained brief context and ended with the escape clause — without touching production. That kind of isolation is impossible for an embedded f-string.

The cost is one extra subprocess (claude -p) per junction. The fallback path means that cost is paid only when the system can afford it.

7 · Security model

Omega is built for an operator who runs the system on their own machine. The security model follows from that premise.

Protected scopes (the operator may forbid automation entirely)

Billing endpoints.
Account-management APIs.
Authentication / OAuth flows.
.env* files (any project).
The OAuth login script.

These are sacred. Workers never touch them, oracles never touch them, the supervisor never touches them. Removing a guard rail requires a manual code edit by the operator.

Defense scan layer

Every incoming prompt (and any text the operator wants to scan ad-hoc) can be passed through a defense scanner:

  Category            Examples
  ─────────────────   ─────────────────────────────────────────
  Prompt injection    ignore previous instructions, role hijack,
                      DAN, jailbreak, mode-switch, prompt-reveal
  Secrets             stripe keys, AWS access keys, GitHub PAT,
                      Slack tokens, private keys, GitLab PAT
  PII                 US SSN-like, credit-card-like, phone
  Suspicious URLs     URL shorteners, IP-as-URL, .onion, free TLDs

Verdicts: clean, warning, block. Critical matches (live Stripe key, .onion URL) block. Optional quarantine appends the verdict to a defense-alerts log.
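
A scanner of this shape can be sketched with a small rule table. The patterns below are illustrative assumptions covering one or two examples per category; the real rule set is not published in this document. Only the two "block" severities (live Stripe key, .onion URL) come from the text, and every other severity is a guess.

```python
import re

# Illustrative rules only (assumed, not the operator's actual patterns).
RULES = [
    # (category, severity, pattern)
    ("prompt_injection", "warning",
     re.compile(r"ignore (all )?previous instructions", re.I)),
    ("secret", "block", re.compile(r"\bsk_live_[0-9a-zA-Z]{16,}\b")),  # Stripe-style live key
    ("secret", "warning", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),        # AWS access key ID shape
    ("pii", "warning", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),          # US SSN-like
    ("suspicious_url", "block",
     re.compile(r"https?://[a-z2-7]{16,56}\.onion\b", re.I)),
    ("suspicious_url", "warning",
     re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}\b")),                 # IP-as-URL
]
SEVERITY_ORDER = {"clean": 0, "warning": 1, "block": 2}


def scan(text):
    """Return the strongest verdict and the sorted set of matched categories."""
    matches = [(cat, sev) for cat, sev, pat in RULES if pat.search(text)]
    verdict = max((sev for _, sev in matches), key=SEVERITY_ORDER.get, default="clean")
    return {"verdict": verdict, "categories": sorted({cat for cat, _ in matches})}
```

The verdict is the maximum severity across all matches, so a single critical hit blocks the prompt regardless of how many lower-severity matches accompany it.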

No destructive autonomy

The system actively refuses certain shortcuts:

Workers never force-push.
Oracles never close themselves (only the supervisor closes).
Auto-rollback on deploy failure is opt-in per project, not default.
Sacred files (the supervisor, the death detector, the reaper, the idle supervisor) are version-locked — any drift triggers an alert.

Sacred files

Four files at the core of the recovery mesh are sha256-locked. The validation runs on every test sweep, and any drift surfaces immediately. The list and hashes are kept in the operator's local installation, not published, but the integrity contract is part of the install.
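
The integrity contract amounts to comparing each file's current digest with a pinned value. A minimal sketch, assuming a hypothetical manifest that maps file paths to expected sha256 hex digests (the real list and hashes stay in the operator's installation):

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream the file through sha256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_locked(manifest):
    """Return the paths that are missing or whose digest no longer matches."""
    drifted = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            drifted.append(str(path))
    return drifted
```

An empty return value means no drift; anything else would be the list to raise in an alert.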

8 · Evidence

This section reports what is measurable today. It does not report numbers we do not have. Omega's production telemetry is young, and that fact constrains the evidence base.

Wave-3 shipping log (2026-05-16)

Artifact	Type	Status	Evidence
`/omega-protocol`	skill	shipped	`~/.claude/commands/omega-protocol.md`
`/dispatch-oracle`	skill	shipped	`~/.claude/commands/dispatch-oracle.md`
`/dispatch-worker`	skill	shipped	342-line canonical specification
`/worker-protocol`	skill	shipped	`~/.claude/commands/worker-protocol.md`
`/resurrect`	skill	shipped + smoke PASS	brief-aware nudge with French/EN escape clause
`/synthesize-report`	skill	shipped	hybrid 70% template + 30% Haiku digest
`/format-telegram-report`	skill	shipped	patrol-wired (4 hits)
`/audit-mission`	skill	shipped	mission-auditor close-gate
`/classify-intent`	skill	shipped	hybrid regex + Haiku, mission-auditor W7 wire
`/plan-decompose`	skill	shipped	oracle complex-mission decomposition
`/diagnose`	skill	shipped	on-demand pane snapshot
W5 — cross-session memory wiring	fix	shipped	`omega-memory` in dispatch + auditor
W6 — brief atomic write before paste	fix	shipped	tmpfile + mv ordering in `dispatch-to-session.sh:622-625`
W7 — auditor /classify-intent hybrid	fix	shipped	`MISSION_AUDITOR_SKILL_CLASSIFY=true` default
W8 — event-driven tracking-reactor	fix	shipped	systemd user unit, 129 ms latency
W9 — KAIROS `_nudge_oracle` retired	fix	shipped	unconditional `return False`
W10 — `/resurrect` smoke PASS	validation	passed	brief-aware French nudge with escape clause
W12 — `omega-overview.md`	docs	shipped	single entry point for Omega system
AISB skill-wired chain	integration	shipped	21 grep hits in handlers/prompts
Patrol skill-wired chain	integration	shipped	4 grep hits in patrol.sh
Mission-auditor skill-wired chain	integration	shipped	4 grep hits

What was measured today (chaos + smoke tests, 2026-05-15 → 2026-05-16)

Test	Result	What it proves
Worktree E2E (5 scenarios)	5/5	Happy path, conflict, main moved, parallel, ship failure
Worktree chaos v1 (18 cases)	18/18	Process kills mid-operation, disk-full, race conditions
Worktree chaos v2 (8 cases)	8/8	Concurrent worktree-create on same project
Worktree chaos v3 (9 cases)	9/9	Interrupted ship + recovery
/goal Phase 1 opt-in smoke	5/5	Opt-in injection via GOAL_NATIVE=true works
/goal Phase 2 revert smoke	8/8	Default-on block is removed; PLAN protocol contracts in
Worker-mark-done oracle guard	Pass	Refuses oracle session names with rc=3 + redirect
PLAN protocol runtime test	1/1	End-to-end worker dispatch, plan execution, done.json
Sacred files sha256 stability	4/4	Patrol, watchdog, reaper, idle-supervisor unchanged
Defense scan (5 categories)	5/5	clean / injection / secret / URL / PII verdicts correct
/resurrect Tier-1 smoke (W10)	Pass	Brief-aware French nudge with escape clause
tracking-reactor event-to-trigger (W8)	129 ms	inotify wake → tmux probe → resurrect call

What is live in operation right now

Quantity	Source
Outcomes-database mission rows	2 (small N — system is young)
Worker `.done.json` files on disk (recent)	5
Tool-call events captured by the tracking hook	2,571 across 61 session files
Cron entries active	28 (supervisor + observer + flusher + ...)
Systemd user units active	1 (`tracking-reactor.service`)
Sacred files unchanged since	4–6 days (last verified today)
Safety Mesh layers wired	4 (brief-replay, CPU guard, shadow, mission auditor)
Shadow signals monitored (Tier 1)	14 (incl. `CPU_OVERLOAD`)
Mission Auditor mission types classified	9 + hybrid `/classify-intent` escalation
Quality Arsenal audits selectable by Mission Auditor	17
Skills in the Skill Orchestration Layer	11
Wired skill call sites (handlers / patrol / auditor)	29 total grep hits

Honest gaps

Production mission count is small. The outcomes database has 2 rows. A claim like "10,000 missions executed at 99% success" would be a fabrication. Honest framing: the system is in early operation; chaos tests validate the structural properties (race conditions, recovery, isolation) that production data cannot yet validate at scale.
Mean time intent → ship. Not yet computed across a statistically meaningful sample. Single observed examples are in the tens of minutes for narrow Linear-style fixes, hours for cross-cutting features. These are operator anecdotes, not telemetry.
Cost per mission. Token consumption is captured per tool call (the tracking hook) but not yet aggregated into a per-mission cost report. A dashboard for this is planned.
Incident-avoidance count. The observer fires nudges, but the proportion of nudges that prevented a stall (vs nudges sent into already-recovering sessions) is not yet computed. The new tracking-reactor will help here: every event-driven trigger writes a line to ~/.aisb/logs/tracking-reactor.log with the session and the matched stuck-regex.
Skill fallback frequency. The skill-wired chain has fallbacks at every junction. How often the fallback fires versus the skill succeeds is logged but not yet aggregated.

Five short case studies (concrete, verifiable today)

Case A — The 4000-character /goal pivot. The native /goal primitive was integrated, evaluated under load, and found to have a hard 4000-character limit incompatible with real worker prompts (mission + pre-boot knowledge pack + criteria + verify + autonomy banner). Phase 2 default-on was reverted within 24 hours; the PLAN protocol was introduced as a replacement. The revert was end-to-end tested the same day with a runtime worker dispatch. Evidence: a smoke test suite of 8 assertions validates that the revert is applied and the PLAN protocol artifacts are in place.

Case B — The worker-mark-done oracle guard. A debug session revealed that an oracle had accidentally called worker-mark-done.sh instead of oracle-mark-done.sh, writing its done-signal to the wrong namespace. A guard was added that refuses oracle session names (regex-matched) with rc=3 and a redirect message. The fix is small (10 lines of bash) but eliminates a class of cross-tier confusion errors. Smoke-tested: oracle session → rejected; worker session → accepted.
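
The guard in Case B is a bash change, but its logic is small enough to restate in Python. The `oracle-` prefix is a hypothetical naming rule chosen for illustration; the source only says that oracle session names are regex-matched.

```python
import re

# Hypothetical pattern: the actual regex is not given in the text.
ORACLE_NAME = re.compile(r"^oracle-")
RC_OK = 0
RC_WRONG_TIER = 3  # exit code named in the case study


def worker_mark_done_guard(session):
    """Return (exit_code, message) for a worker-mark-done call on `session`."""
    if ORACLE_NAME.match(session):
        return RC_WRONG_TIER, f"{session} is an oracle session: use oracle-mark-done.sh"
    return RC_OK, ""
```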

Case C — Dogfooding the CPU throttle. On the same day the CPU Guard suite (Safety Mesh Layer 2) shipped, a routine multi-dispatch wave triggered the throttle in production. Two new dispatches arrived while a build was already underway; the one-minute load reported by uptime was above the 2.5× cores admission threshold. Both dispatches were diverted to dispatch-queue.jsonl and exited with DEFERRED. The cron */2 flusher re-dispatched them in order once load dropped below 2× cores. The build that was in progress finished without .next/ corruption, both queued workers completed cleanly, and the operator was notified only via a low-priority FYI. No human action was required. This is the first observation of the throttle protecting production traffic from itself.
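
The admission and flush thresholds in Case C can be sketched as two functions. The 2.5× and 2× factors and the `dispatch-queue.jsonl` file come from the text; the function names, the dispatch payload shape, and the choice to delete the queue file after a flush are assumptions. Load and core count are parameters so the logic can be exercised without a loaded machine.

```python
import json
import os

ADMIT_FACTOR = 2.5  # defer new dispatches when load exceeds 2.5 x cores
FLUSH_FACTOR = 2.0  # re-dispatch once load is below 2 x cores


def admit(dispatch, queue_path, load1, cores):
    """Queue the dispatch and return DEFERRED if the machine is overloaded."""
    if load1 > ADMIT_FACTOR * cores:
        with open(queue_path, "a", encoding="utf-8") as queue:
            queue.write(json.dumps(dispatch) + "\n")
        return "DEFERRED"
    return "DISPATCHED"


def flush(queue_path, load1, cores):
    """Return queued dispatches in arrival order once load allows, then clear the queue."""
    if load1 >= FLUSH_FACTOR * cores or not os.path.exists(queue_path):
        return []
    with open(queue_path, encoding="utf-8") as queue:
        queued = [json.loads(line) for line in queue if line.strip()]
    os.remove(queue_path)
    return queued
```

On Linux or macOS the live values would come from `os.getloadavg()[0]` and `os.cpu_count()`. The gap between the two factors gives hysteresis: a load hovering near the admission threshold does not cause dispatches to bounce in and out of the queue.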

Case D — The shadow-design flaw and its autonomous fix. The Shadow Manager described in §5 was originally symmetric — workers and oracles received the same prescriptive nudges. The flaw surfaced when an oracle that was actively managing a UI design task received a stagnation nudge ("999 minutes since the last decisions.md edit, take a concrete action or mark done"). The oracle interpreted the imperative literally and abandoned the in-flight task. Diagnosis took roughly thirty minutes; an emergency kill-switch was installed in another fifteen; a follow-up worker landed the asymmetry contract (THRASH disabled on oracles, stagnation floor raised to six hours, brief-aware question-mode emergency, FYI throttling and deduplication) within a few hours, including the documentation update in oracle-protocol.md. The system corrected its own design flaw within a single working day, with the kill-switch protecting production while the proper fix landed.

Case E — Retiring KAIROS in favor of the resurrect cascade (2026-05-16). The legacy kairos.py::_nudge_oracle had been quietly destroying long-running design and audit oracles for weeks: anything that sat at the prompt for more than fifteen minutes between deliberate tool calls received an imperative nudge and frequently abandoned its in-flight work. Diagnosis pointed at a fundamental mismatch — the function used a coarse tmux session_activity timestamp with no awareness of mission type or brief content. The fix was deliberately conservative: an unconditional return False at the top of the function (W9), validated by a clean bot restart, with the function body left intact for defense-in-depth. The replacement (/resurrect + tracking-reactor.sh, W8 + W10) ships brief-aware, scoped strictly to workers, with the asymmetry contract enforced at every layer. Tier-1 smoke (W10) validated end-to-end on a fake worker with fifteen-minute-old tracking events: the cascade detected the stall, extracted context from the brief (last 25 chars: "build clean"), identified the last tool event (Bash), and generated a context-aware French nudge with the escape clause. Zero tokens spent. This is precisely the failure mode chaos tests cannot anticipate — a behaviorally correct subsystem doing the wrong thing for a different target class — and the kind of failure the next iteration of the audit pipeline is being tuned to catch earlier.

What chaos tests cannot prove

Chaos tests prove that the structural properties hold under hostile conditions. They do not prove that the system makes good engineering decisions. That is the job of the audit pipeline (the Quality Arsenal) and the Second Law (challenge the premise). The audit pipeline catches "shipped working code with bad architecture"; the Second Law catches "shipped working code for a request that should have been refused".

9 · Roadmap

Recently delivered (since v2.1)

Skill Orchestration Layer (Wave-3). Eleven invocable, versioned skills replace the prose contracts at every junction in the 4-level chain. 29 verified call sites across handlers, patrol, and mission-auditor. Every skill probe is best-effort with silent fallback.
W5 — Cross-session memory wired both sides. knowledge-pack-builder.sh emits a PROJECT MEMORY section at dispatch time; mission-auditor.sh writes lessons (APPROVED) and mistakes (REJECTED iter≥2) after every verdict.
W6 — Brief atomic write before tmux paste. Closes a race window where a crash between paste and verify left the shadow without a brief to replay.
W7 — Mission Auditor hybrid classifier. Ambiguous generic cases escalate to /classify-intent via Haiku micro-call. Zero behavioral regression for the 80% fast-path.
W8 — Tracking Reactor. Event-driven (inotify) supervision via systemd --user unit. Measured 129 ms event → trigger latency. Singleton via flock. Shares the 600s throttle ledger with the cron observer so they cannot double-fire.
W9 — KAIROS _nudge_oracle retired. Replaced by /resurrect + tracking-reactor.
W10 — /resurrect smoke PASS. End-to-end validation of the Tier-1 cascade with a fake worker, brief-aware French nudge with escape clause, zero tokens consumed.
W12 — omega-overview.md. Single entry-point document for the Omega system, indexing all docs and skills.
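
The shared throttle ledger mentioned under W8 can be sketched as a small JSON file keyed by session. The 600-second window is from the text; the file format and the function name `may_fire` are assumptions. The sketch does not lock the read-modify-write cycle, so a real implementation with two concurrent writers would need file locking on top of it.

```python
import json
import os
import time

THROTTLE_S = 600  # shared window between the cron observer and the reactor


def may_fire(ledger_path, session, now=None):
    """Return True and record the firing if `session` was not nudged within the window."""
    now = time.time() if now is None else now
    ledger = {}
    if os.path.exists(ledger_path):
        with open(ledger_path, encoding="utf-8") as handle:
            ledger = json.load(handle)
    if now - ledger.get(session, float("-inf")) < THROTTLE_S:
        return False
    ledger[session] = now
    tmp_path = ledger_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(ledger, handle)
    os.replace(tmp_path, ledger_path)  # same tmp + rename pattern as other state files
    return True
```

Because both supervisors consult the same file before acting, whichever one fires first suppresses the other for the rest of the window.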

Short-term (active)

Automate bot restart after handler code changes so progress-card features activate without operator intervention.
Exercise the PLAN protocol's sub-agent pattern (Agent(team_name=...)) on a real client mission, not just a smoke test.
Port the 28 cron entries to a native scheduling primitive so they become inspectable and version-controlled from inside a session.
Aggregate Mission Auditor verdicts into a per-classification accuracy report (do bug-fix audits actually catch bugs that escaped worker self-review?).
Aggregate the skill-fallback frequency per junction (how often claude -p times out vs returns valid output).

Medium-term

A live dashboard for mission timelines, cost, and outcome distribution. (Partial: a plan-visualizer exists; the timeline+cost projection is still pending.)
Dual-run a /loop-based supervisor against the legacy supervisor for 30 days, compare outputs, then switch over when convergence is proven.
A learning agent that watches accepted vs rejected proposals and feeds the rejection rate back into proposal quality estimates.

Still open (carried forward from v2.1)

W1 — High availability. The system runs on a single VPS. A second-host failover is designed but not yet deployed.
W2 — Multi-provider abstraction. All paths currently assume Anthropic Claude as the model provider. A provider-agnostic layer is sketched but not implemented.
W3 — Telegram fallback channel. When Telegram is unreachable, the operator has no out-of-band notification path. A second channel (email, push, alternate IM) is open.

Open architecture questions

Workers as sub-agents vs sub-sessions? Current design isolates workers in their own tmux sessions and their own Claude Code instances. Alternative: workers as sub-agents inside the oracle, sharing the oracle's context. Tradeoff: sub-agents save tmux slots and dispatcher overhead but lose the context-isolation benefit and complicate the close-gate.
A richer goal primitive? If the platform raises the 4000-character limit on /goal (or introduces a plan-bound primitive), revisit the Phase 2 default-on revert.
Cross-project memory? The memory layer is currently scoped per system. Should client projects share a common lessons-learned corpus, or stay isolated?
Ship pipeline for non-Vercel hosts. The deploy-verify step is currently Vercel-specific via API polling. Generalize to Fly.io, Render, Cloudflare Pages.
Mission Auditor calibration. The 85/100 score floor is uniform across mission types. Should it vary (e.g., 90 for ship because production risk is higher, 80 for docs because the cost of false positives outweighs the cost of a tolerable doc imperfection)? This requires accuracy data the system does not yet have.
Shadow observe-only escalation granularity. Today the FYI digest groups all observe-only signals per oracle into a single throttled message. Should specific signal patterns (e.g., repeated BUILD_REGRESSION on the same project) bypass the digest and escalate immediately, even on oracles? Trades responsiveness against the cost that prompted the asymmetry in the first place.
Skill discoverability. Eleven skills are wired into the orchestration chain; the broader catalog at ~/.claude/commands/ is now ~140 invocable commands (audits, builders, marketing tools, diagnostics). How should new operators discover what is invocable without reading every file? omega-overview.md and /listcmd are first answers; a generated, searchable skill catalogue is the obvious next step.

The judging standard

Every iteration of Omega is evaluated against four questions:

Did the system run without the operator having to babysit?
Did the system challenge a bad premise before coding it?
Did runtime evidence drive every conclusion?
Was the change surgical?

If any answer is "no", the iteration is incomplete — regardless of how much code shipped.

10 · Appendix — Technical reference

Session lifecycle (worker)

  Dispatch  ──▶  PRE-BOOT PACK injected (incl. W5 memory rows)
       │           Brief written atomically BEFORE paste (W6)
       ▼
  Read PLAN  ──▶  TodoWrite materialization (N items)
       │
       ▼
  Execute step 1 ──▶ update TodoWrite + progress.json
       │           Event written to tracking JSONL
       │           (tracking-reactor watches via inotify, W8)
       ▼
  Execute step 2
       │
       ⋮
       │
       ▼
  Run VERIFY COMMAND (must exit 0)
       │
       ▼
  worker-mark-done.sh done_clean '<summary>'
       │            (atomic tmp + mv to .done.json)
       ▼
  Mission Auditor (Layer 4)  ──▶  /audit-mission, 1-3 skills
       │                           min score ≥ 85/100 required
       ▼
  Oracle ack (close-gate)
       │
       ▼
  Memory write (W5)          ──▶  APPROVED → lesson
       │                           REJECTED iter≥2 → mistake
       ▼
  Schedule self-kill (5s)
       │
       ▼
  tmux session terminated

Failure recovery mesh (visual)

  ┌────────────────────────────────────────────────────────────┐
  │                                                            │
  │   Supervisor (cron */1)                                    │
  │   ├── reads oracle-*.done.json                             │
  │   ├── reads worker-*.done.json                             │
  │   ├── decides close / keep / alert                         │
  │   └── triggers notifications                               │
  │                                                            │
  │   Wake-on-worker-done (cron */2)                           │
  │   └── nudges oracle when worker .done.json un-acked        │
  │                                                            │
  │   Observer (cron */3)                                      │
  │   └── 6 failure modes M1–M6                                │
  │                                                            │
  │   Tracking Reactor (systemd user, inotify, W8)             │
  │   └── event-driven sub-second wake for stuck workers       │
  │       129 ms event → /resurrect cascade                    │
  │                                                            │
  │   Oracle-watchdog daemon                                   │
  │   └── detects oracle process death                         │
  │                                                            │
  │   Oracle-reaper daemon                                     │
  │   └── kills abandoned oracles past TTL                     │
  │                                                            │
  │   Worker-idle-supervisor daemon                            │
  │   └── workers with no tool calls past threshold            │
  │                                                            │
  │   RETIRED (W9): kairos.py::_nudge_oracle                   │
  │   └── replaced by /resurrect + tracking-reactor            │
  │                                                            │
  └────────────────────────────────────────────────────────────┘

State files (atomic write contract)

All state files in the system follow the same write pattern:

  Write          : tmp file in same directory, then mv -f to final
  Read           : open + lock-free read; staleness via mtime
  Update         : never in-place; always tmp + mv
  Cleanup        : grace window before deletion
  Naming         : namespaced by session for collision safety

W6 (2026-05-16) extends this contract to the brief-replay file specifically: it is now written before the tmux paste, not after, so a crash mid-paste still leaves a valid brief on disk.
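
The write half of this contract maps directly onto a temp-file-plus-rename helper. The sketch below is an assumption about how it might look in Python (the system's helpers are shell scripts); the `fsync` call is an extra durability step that the contract above does not mention.

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write JSON to a temp file in the target directory, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # optional: not part of the stated contract
        os.replace(tmp_path, path)     # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temp file in the same directory matters: a rename is only atomic when source and destination are on the same filesystem, so readers see either the old file or the complete new one.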

Done.json schema (worker)

{
  "session":         "string",
  "status":          "done_clean | pending | failed",
  "summary":         "one-line description",
  "commit":          "git sha or empty",
  "finished_at":     "ISO 8601",
  "todos_total":     "int",
  "todos_completed": "int",
  "pending_actions": ["list of strings"],
  "written_by":      "string (helper name)"
}
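
A reader of worker done files could check a parsed document against this schema before acting on it. The validator below is a sketch: the field names and status values come from the schema above, while the function name and the rule that `todos_completed` cannot exceed `todos_total` are assumptions.

```python
from datetime import datetime

WORKER_STATUSES = {"done_clean", "pending", "failed"}
REQUIRED = {
    "session": str, "status": str, "summary": str, "commit": str,
    "finished_at": str, "todos_total": int, "todos_completed": int,
    "pending_actions": list, "written_by": str,
}


def validate_worker_done(doc):
    """Return a list of problems; an empty list means the document looks valid."""
    errors = []
    for key, expected_type in REQUIRED.items():
        if key not in doc:
            errors.append(f"missing: {key}")
        elif isinstance(doc[key], bool) or not isinstance(doc[key], expected_type):
            errors.append(f"wrong type: {key}")  # bool is rejected for int fields
    if errors:
        return errors
    if doc["status"] not in WORKER_STATUSES:
        errors.append("status: unknown value")
    try:
        datetime.fromisoformat(doc["finished_at"])
    except ValueError:
        errors.append("finished_at: not ISO 8601")
    if not 0 <= doc["todos_completed"] <= doc["todos_total"]:
        errors.append("todos_completed out of range")  # assumed invariant
    if not all(isinstance(item, str) for item in doc["pending_actions"]):
        errors.append("pending_actions: non-string entry")
    return errors
```

Note that `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11 onward; timestamps with an explicit offset such as `+00:00` parse on earlier versions too.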

Done.json schema (oracle)

{
  "oracle":      "string",
  "project":     "string",
  "status":      "done_clean | pending | failed",
  "started_at":  "ISO 8601",
  "finished_at": "ISO 8601",
  "duration_sec":"int",
  "mission":     "string",
  "ship":        {
    "requested":      "bool",
    "result":         "ok | failed | skipped | frozen",
    "commit":         "git sha or empty",
    "push_url":       "string or empty",
    "deploy_url":     "string or empty",
    "deploy_status":  "string"
  },
  "pending_actions": ["list of strings"],
  "report_path":     "string or empty",
  "lifecycle":       "persistent | ephemeral"
}

The 17 forensic audits — quick reference

Audit	Domain	Raw scale	Question
code	Code quality	/420	Is the code SOLID?
flow	User flows	/400	Does the experience WORK?
uiux	Design system	/420	Is the interface BEAUTIFUL?
debug	Runtime bugs	/360	What is BROKEN right now?
feature	Completeness	/320	Is the product COMPLETE?
perf	Performance	/360	Is it FAST?
sec	Security	/400	Is it SECURE?
a11y	Accessibility	/320	Is it ACCESSIBLE?
seo	Search optim.	/400	Is it DISCOVERABLE?
data	Data integrity	/320	Is the data INTACT?
api	API contracts	/360	Is the API SOLID?
copy	Messaging	/280	Is the copy CLEAR?
dx	Dev experience	/320	Is the DX SMOOTH?
motion	Animation	/360	Is the motion PURPOSEFUL?
automation	Scheduling	/330	Are automations RELIABLE?
logic	System logic	/360	Is the logic OPTIMAL?
retention	Product/CPO	/400	What features are MISSING? (read-only)

All scores normalize to /100 for comparison across domains.
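
As a worked example of the normalization, the sketch below assumes plain linear scaling of the raw score onto /100 (the text does not state the formula) and compares the result with the 85/100 floor used by the Mission Auditor. The raw scales are copied from the table above.

```python
RAW_SCALE = {
    "code": 420, "flow": 400, "uiux": 420, "debug": 360, "feature": 320,
    "perf": 360, "sec": 400, "a11y": 320, "seo": 400, "data": 320,
    "api": 360, "copy": 280, "dx": 320, "motion": 360, "automation": 330,
    "logic": 360, "retention": 400,
}
SCORE_FLOOR = 85  # Mission Auditor minimum, out of 100


def normalize(audit, raw):
    """Scale a raw audit score linearly to /100 (assumed formula)."""
    scale = RAW_SCALE[audit]
    if not 0 <= raw <= scale:
        raise ValueError(f"raw score {raw} outside 0..{scale} for {audit}")
    return round(100 * raw / scale, 1)


def passes_floor(audit, raw):
    """True if the normalized score meets the 85/100 floor."""
    return normalize(audit, raw) >= SCORE_FLOOR
```

Under this assumption a raw 357/420 on the code audit and a raw 238/280 on the copy audit both normalize to 85.0, which is what makes scores comparable across domains.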

The 11 wired skills — quick reference

Skill	Owner of	Token cost
`/classify-intent`	Inbound intent + ambiguous mission-type	~0 fast path / Haiku slow
`/dispatch-oracle`	AISB → Oracle brief assembly	Sonnet (one call per dispatch)
`/dispatch-worker`	Oracle → Worker prompt assembly	Sonnet (one per worker)
`/worker-protocol`	Worker self-contract	0 (read on boot)
`/omega-protocol`	Oracle process contract	0 (read on boot)
`/resurrect`	Tier-3 LLM stall recovery (opt-in)	Haiku
`/synthesize-report`	Worker done.json digest	template + Haiku (~50s budget)
`/format-telegram-report`	Telegram payload humanization	Haiku
`/audit-mission`	Close-gate audit selection / verdict	one audit at a time VPS-wide
`/plan-decompose`	Oracle complex-mission decomposition	Sonnet
`/diagnose`	On-demand pane snapshot	0

A note on extraction

This document is generated through a render-to-PDF pipeline with Unicode font embedding. The text layer is preserved (verified with pdftotext from Poppler 23.x; all body content extracts cleanly to UTF-8). Some PDF readers and third-party extractors handle complex layouts (multi-column, drop caps, box-drawing characters) less robustly than Poppler — if you observe text artifacts, try a Poppler-based extractor or a PDF-to-Markdown converter.

Patch log — V2.3 (2026-05-17)

V2.3 is a hardening release. No new architectural surface — five surgical fixes plus instrumentation, all driven by a single incident.

The Kommu/Causio incident. On 2026-05-17, two long-running project oracles (Kommu, Causio) closed prematurely while their missions were objectively unfinished. The Kommu mission required exhaustive sweeps — all features × 17 audits × 100/100 score — roughly five thousand audit runs. The oracle's internal todo list was sized at 8 items. It finished its 8, wrote done.json with status=pending and pending_actions=[] (false — hundreds of audits remained), then exited. The patrol observed pending with no actionable pending list, the session died naturally, and child workers continued running orphaned.

This is the Second-Law failure mode in its purest form: an oracle hallucinated mission-completeness on an under-dimensioned plan. No external nudge could have saved it — the oracle honestly believed it was done.

The five fixes shipped in V2.3:

Mission Sweep-Completeness Gate in oracle-mark-done.sh. When the brief contains sweep keywords (all features, exhaustive, récursivement, 100/100, every page, 17 audits), the gate counts completed audit-worker .done.json files. If the count is below a configurable threshold (default 17 — one full Quality Arsenal pass), the gate refuses to let the oracle exit cleanly: it forces status=pending and adds an explicit pending action "sweep-incomplete: continue dispatching". The patrol then keeps the oracle alive across cycles and queues a resume_pending event in the oracle inbox. Failure becomes visible.
Worker Death Logger (worker-death-logger.sh, every two minutes). The system now keeps a rolling snapshot of alive worker sessions and detects workers that vanished without producing a .done.json — silent-kill events that previously left no trace. Each detection is logged to worker-silent-kills.jsonl with session name, parent oracle, dispatch timestamp, and age-since-dispatch. Observability only; no automatic mitigation yet (intentional — collect data first).
Smart-Check Observer rewrite in omega-resurrect.sh. The previous observer fired nudges on cron-driven idle detection alone, which produced false positives on oracles that were actively thinking. The new observer checks six independent signals before allowing any nudge: kill-switch state, pane-active heuristics (Claude thinking-verb detection), live worker presence on the same project, tracking-event mtime within ten minutes, decisions.md mtime within thirty minutes, and explicit pending_actions content. All six must pass before a nudge is allowed. Recorded result over a 60-minute window after deployment: 100 skips, 0 parasitic nudges.
AISB async wire stable. The bot-side skill subprocess invocation moved from blocking subprocess.run to asyncio.create_subprocess_exec. Stability over five hours of uptime confirms the deadlock that previously crashed the bot every ~40 minutes is gone.
Passive Telegram digest (omega-oracle-digest.sh, 20:00 UTC daily). Replaces the legacy auto-nudge loop. One scheduled message per day summarizes all active oracles, worker outcomes, Mission Auditor verdicts, system health, and detected silent kills. The operator decides what to intervene on — the system no longer guesses.
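
The six-signal rule of the Smart-Check Observer can be sketched as a pure decision function. The signal names and the two time windows come from the list above; the `Signals` container, the skip labels, and the direction of each check are assumptions. In particular, the sketch assumes a nudge is only allowed when `pending_actions` is non-empty, which the text does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Signals:
    kill_switch_on: bool
    pane_active: bool            # e.g. a thinking verb visible in the pane
    live_workers: int            # workers alive on the same project
    tracking_age_min: float      # minutes since the last tracking event
    decisions_age_min: float     # minutes since decisions.md was modified
    pending_actions: list = field(default_factory=list)


def nudge_decision(s):
    """Return (allowed, skip_reasons); a nudge needs all six checks to pass."""
    skips = []
    if s.kill_switch_on:
        skips.append("kill-switch")
    if s.pane_active:
        skips.append("pane-active")
    if s.live_workers > 0:
        skips.append("live-workers")
    if s.tracking_age_min <= 10:
        skips.append("recent-tracking")
    if s.decisions_age_min <= 30:
        skips.append("recent-decisions")
    if not s.pending_actions:
        skips.append("no-pending-actions")  # assumed direction of this signal
    return (not skips, skips)
```

Returning the skip reasons alongside the verdict makes a tally like the "100 skips, 0 parasitic nudges" figure easy to break down by cause.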

E2E layer-alive check. A 15-layer probe confirms after each V2.3 deploy that bot, dispatchers, mark-done, mission auditor, patrol, tracking reactor, death watchers, memory layer, recall path, digest, gate, briefs, and skills are all reachable. Latest run: 15 PASS / 0 FAIL.

Honest limits. The Sweep-Completeness Gate uses heuristic keyword matching on the brief plus an audit-worker file-count threshold. It will produce some false negatives (a sweep mission phrased without trigger words slips through) and possibly some false positives (a non-sweep mission containing the words "all features" by accident). Both directions will be tuned with telemetry over the next two weeks. The Worker Death Logger does not yet capture journalctl exit signals or the last pane snapshot — those land in V2.4. The Kommu/Causio resurrection itself is not yet automated: the patched done.json and queued resume_pending event signal the operator, who then runs /resurrect manually.
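
The keyword-plus-threshold heuristic of the Sweep-Completeness Gate, including the false-positive risk noted above, can be illustrated with a short sketch. The keywords, the default threshold of 17, and the pending-action text are from the V2.3 fix list; the function signature and the choice to count every `*.done.json` file in one directory are assumptions, since the real file layout is not described here.

```python
import os

SWEEP_KEYWORDS = ("all features", "exhaustive", "récursivement",
                  "100/100", "every page", "17 audits")
DEFAULT_THRESHOLD = 17  # one full Quality Arsenal pass
MARKER = "sweep-incomplete: continue dispatching"


def sweep_gate(brief, done_dir, status, pending_actions, threshold=DEFAULT_THRESHOLD):
    """Return the (status, pending_actions) the oracle is allowed to write."""
    text = brief.lower()
    if not any(keyword in text for keyword in SWEEP_KEYWORDS):
        return status, list(pending_actions)       # not a sweep mission
    completed = sum(1 for name in os.listdir(done_dir) if name.endswith(".done.json"))
    if completed >= threshold:
        return status, list(pending_actions)       # sweep looks complete
    actions = list(pending_actions)
    if MARKER not in actions:
        actions.append(MARKER)
    return "pending", actions                       # refuse a clean exit
```

Any brief that happens to contain "all features" triggers the count, which is exactly the false-positive case; a sweep phrased without any trigger word passes straight through.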

Patch log — V2.4 (2026-05-17, Lifecycle-Hardened Edition)

V2.4 closes a class of failure modes that V2.3 surfaced but could not yet fix automatically: workers and oracles closing while their declared work was unfinished. V2.3 made the failure visible via the Mission Sweep-Completeness Gate and Worker Death Logger. V2.4 prevents the closures themselves with a universal kill gate that every kill path in the system must traverse.

The forensic findings (audit pass, 2026-05-17). A targeted survey of every cron-driven, bot-driven, and operator-driven kill site found four classes of bypass:

Twelve tmux kill-session call sites in bash scripts that did not route through close-gate.sh. Some checked alive state alone (worker idle ≠ todos done). Some wrote synthetic done.json files and then killed the session — a false attestation.
An inverted-logic regression in worker-close-check.sh that made the gate always block, forcing patrol scripts to bypass it entirely to ever kill anything. The bypass became the de facto behavior; the gate became dead code.
Three Python kill sites in the Telegram bot triggered by operator "Close oracle + workers" buttons. These bypassed the gate with no audit trail.
An anti-pattern in twenty per-project oracle system prompts instructing the LLM to consider kill+restart of a stalled worker — direct contradiction of the close-gate guarantee.

The six fixes shipped in V2.4:

Safe-Kill universal gate (~/.aisb/lib/omega-v2/safe-kill.sh). One wrapper every kill path must traverse. Refuses to kill sessions whose progress.json shows pending todos, whose done.json is missing, or that have not been acked. Protected sessions (Home / AISB / tunnels / Omega infrastructure) are immortal even with --force. The --force flag exists for legitimate emergencies (claude crash with state preserved on disk, operator explicit abort) and is always audited — every forced kill writes a marker plus a kill.forced lifecycle event with reason and caller.
Lifecycle event log (~/.aisb/state/events/lifecycle.events.log). A unified append-only JSON-lines log of every dispatch, heartbeat, todo update, mark-done, ack, block declaration, and kill decision. Replaces forensic archaeology across ten scattered state files when debugging "what happened to session X?".
Orchestration-aware OBSERVER (oracle-shadow.sh, oracle-observer.sh). The previous STAGNATION signal fired on time alone — an oracle that had been idle past a 12-hour floor was nudged regardless of whether it was correctly waiting for in-flight workers. The new classifier reads workers.txt, cross-checks each worker's tmux state, progress.json mtime, and recent lifecycle events, and classifies the oracle as one of five states: WAITING_FOR_WORKERS, PENDING_TRIAGE, IDLE_AFTER_BATCH, GENUINELY_STUCK, NO_WORKERS_EVER. Only the last two qualify as emergencies. The M5 rule for "stuck worker" detection was tightened from one signal (tracking events) to three concordant signals (tracking + lifecycle + heartbeat snapshot); a worker running a slow build no longer trips it.
Inverted-logic fix in the close gate (worker-close-check.sh). The branch that was supposed to BLOCK when the worker is still working fired instead when the worker was idle — the regression that made the gate dead code. Captured with an explicit exit-code branch test (AC_RC -ne 0) instead of the silent shell short-circuit that hid the inversion. Without this fix, every other layer above was running uphill.
Bot Python kill paths gated (bot/aisb/handlers.py). The three sites where operator-clicked "Close oracle + workers" buttons killed sessions now route through safe-kill.sh --force. Behavior is unchanged for the operator (their click still closes), but every close is now logged in kill-forced-<session>.json markers plus the lifecycle event log. Operators retain the audit trail of what they killed and when.
Per-project oracle prompts re-aligned. Twenty per-project oracle system prompts contained a kill+restart recommendation for stalled workers, in direct contradiction of the close-gate guarantee. Replaced with: "investigate first (tmux capture-pane); if truly stuck and the close-gate allows it, use safe-kill.sh; never kill workers with unfinished progress."
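The refusal rules of the Safe-Kill gate can be read as a pure decision function. The following is a minimal Python sketch of that logic, not the shipped safe-kill.sh: the SessionState fields, the reason strings, and the PROTECTED_PREFIXES tuple are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical prefixes; the real protected list lives in safe-kill.sh.
PROTECTED_PREFIXES = ("Home", "AISB", "Tunnel", "omega-")

@dataclass
class SessionState:
    name: str
    pending_todos: int   # count taken from progress.json
    has_done: bool       # done.json present on disk
    acked: bool          # completion acknowledged

def kill_decision(s: SessionState, force: bool = False) -> tuple[bool, str]:
    """Return (allowed, reason). Protected sessions are refused even with force."""
    if s.name.startswith(PROTECTED_PREFIXES):
        return False, "protected"
    if force:
        # Forced kills are allowed, but the caller must write the audit marker.
        return True, "forced"
    if s.pending_todos > 0:
        return False, "pending_todos"
    if not s.has_done:
        return False, "done_missing"
    if not s.acked:
        return False, "not_acked"
    return True, "clean"
```

The protected check runs before the force check, which is what makes protected sessions immortal even under --force.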

The lifecycle skeleton on dispatch. Every new worker spawned via dispatch-to-session.sh now receives an initialized todo.json + progress.json + mission.json + heartbeat file before reading its first prompt. Workers refine their todos via ~/.aisb/lib/omega-v2/omega-todo.sh declare ... and acknowledge each completion via omega-todo.sh done <id>. The close-gate has authoritative input — no more synthesizing done_clean from inferred state.

Silent-hang detection without killing. The new heartbeat-watch.sh cron (every minute) cross-references three signals — progress.json mtime, lifecycle event count, tmux pane content hash — and emits a block.declared event when a worker has stopped contributing despite holding pending todos. The worker is never killed; the operator is notified via the parent oracle's inbox and the daily Telegram digest. Distinguishing frozen mid-work from doing slow work replaces the old timer-only assumption.
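A rough sketch of the three-signal comparison, assuming the watcher keeps one snapshot per cycle and treats a worker as hung only when every signal is unchanged. The Snapshot type, the function names, and the "all three frozen" rule are assumptions for illustration; the real heartbeat-watch.sh may weigh the signals differently.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    progress_mtime: float   # mtime of progress.json
    event_count: int        # lifecycle events seen for this session
    pane_hash: str          # digest of the captured tmux pane text

def pane_digest(pane_text: str) -> str:
    """Hash the pane content so two captures can be compared cheaply."""
    return hashlib.sha256(pane_text.encode("utf-8")).hexdigest()

def is_silent_hang(prev: Snapshot, curr: Snapshot, pending_todos: int) -> bool:
    """True only when all three signals are frozen and todos are still pending."""
    frozen = (
        curr.progress_mtime == prev.progress_mtime
        and curr.event_count == prev.event_count
        and curr.pane_hash == prev.pane_hash
    )
    return frozen and pending_todos > 0
```

A slow build keeps changing the pane text, so its hash moves and the worker is not flagged, which is the distinction between frozen mid-work and doing slow work.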

Kill-path coverage map. After V2.4, every kill site that could affect a worker or oracle routes through the gate or is provably benign (self-managed scratch sessions; explicit operator abort). Coverage at publication:

  Path                                  Gate                Audit
  ─────────────────────────────────     ──────────────      ─────────
  patrol.sh (7 sites)                   safe-kill           lifecycle event
  reclaim-stale.sh                      safe-kill           lifecycle event
  oracle-watchdog.sh respawn cycle      safe-kill --force   kill-forced marker
  bot/handlers.py operator "close"      safe-kill --force   kill-forced marker
  bot/handlers.py "close_oracle"        safe-kill --force   kill-forced marker
  oracle-shadow.sh STAGNATION           orchestration_state (observe-only)
  oracle-observer.sh M5 worker stuck    3-signal classifier (observe-only)
  heartbeat-watch.sh                    no kill — emits block.declared

Honest gaps that V2.4 did not yet close (V2.5 status noted per item).

The Mission Sweep-Completeness Gate from V2.3 still relies on keyword heuristics. False negatives on missions phrased without trigger words remain possible — unchanged in V2.5.
Workers spawned before V2.4 do not have lifecycle state — they fall back to the (now-correctly-working) legacy gate which is permissive when no todo.json exists. Coverage grows as new dispatches arrive.
Inconsistent-state workers from pre-V2.4 — addressed in V2.5 by the deadman switch (auto-mark-pending after 10min idle) and audit-mode bypass.

Patch log — V2.5 (2026-05-18, Adversarial-Validated Edition)

V2.5 is not a patch release in the cosmetic sense. It is the measured-behavior publication that V2.4 explicitly deferred. A 10-hour adversarial validation between 2026-05-17 20:10 UTC and 2026-05-18 06:33 UTC ran three Quality Arsenal audits in three iterative cycles, found four production bugs in the V2.4 lifecycle layer, fixed them in place, and re-audited to convergence. The full results are in § 11 · Adversarial validation results (V2.5).

The four V2.5 fixes, summarized:

FIX-CONSUMED — worker-close-check.sh now accepts done.json.consumed (the post-patrol-ack form of done.json). Pre-V2.5, the close-check looked only for done.json and emitted exit 3 BLOCK once the file had been consumed, producing zombie sessions for legitimately-completed workers. Cascade-cleaned four pre-existing zombies on deployment.
FIX-AUDITMODE — Quality Arsenal audit workers (/codeaudit, /secaudit, /debugaudit, etc.) track todos internally via Claude Code's native TodoWrite and do not call omega-todo declare. The pre-V2.5 declared=false guard worked as designed but stalled these legitimate workers as a side effect. V2.5 distinguishes audit-mode workers (canonical done.json + skill marker) from anonymous undeclared workers, so audit close is allowed while attacker resistance is preserved. Verified by regression tests on twelve adversarial probes.
FIX-DEADMAN — patrol.sh now runs a deadman switch each cycle. Any worker with progress.json mtime older than 10 minutes, idle at ❯ prompt, with no done.json on disk, gets an auto-written status=pending plus reason "deadman: idle Nmin without mark-done" and a safe-kill via the gate. Belt-and-suspenders on top of the audit-mode bypass.
FIX-MODAL — omega-v2/modal-dismisser.sh (new) auto-dismisses six known Claude Code blocking modals (How is Claude doing?, Auto-update available, Press Esc to skip, etc.) by sending the safe dismiss key (Escape by default). Invoked early in each patrol cycle. Protected sessions (Home*, AISB-master, Tunnel*) are never touched. Every dismissal writes a JSONL audit entry. Live-tested with a fake modal session — detection + dispatch + log all confirmed.

Why this matters. The four bugs above were not theoretical. Three of them were caught by the system's own audits during the validation; the fourth (the one modal-dismisser now handles) was caught by the oracle's own failure to mark its mission complete and is now permanently fenced off. The system audited itself, fixed itself, and re-audited itself over three iterative cycles: a self-healing pipeline working as designed.

11 · Adversarial validation results (V2.5)

The measured behavior promised in V2.4. Between 2026-05-17 20:10 and 2026-05-18 06:33 UTC, a 10-hour adversarial validation ran on the live system. An oracle (oracle-omega) dispatched three Quality Arsenal audits (/codeaudit, /debugaudit, /secaudit) followed by twelve adversarial bypass probes against the close-gate, run across three iterative cycles. Each cycle ran the audits in parallel, harvested every failure, dispatched fix workers, and re-audited. Every bug found in the live system was patched before the next iteration began.

Final scores (iter 3).

Audit	Score	Verdict	Confidence
`/codeaudit`	100 / 100	PASS	high
`/debugaudit`	100 / 100	PASS	high
`/secaudit`	100 / 100	PASS	high

Adversarial probes. Twelve bypass attempts targeted the close-gate exit semantics: BOM injection in done.json, integer-typed status fields, null-byte payloads, leading-zero todos counters, fractional-epoch mtime, and symlink redirection, plus manual belt unit-tests of every probe. All twelve correctly STALL with the expected exit code. Zero new fail-OPEN paths detected.

Production bugs found and fixed during validation (the V2.5 deltas).

FIX-CONSUMED (worker-close-check.sh). Patrol auto-renames worker-<S>.done.json to done.json.consumed after notifying the parent oracle. The pre-V2.5 close-check looked only for done.json and emitted exit 3 BLOCK once the file had been consumed — producing zombie sessions for legitimately-completed workers. V2.5 accepts either form and prefers the active file when both exist.
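The "accept either form, prefer the active file" rule is small enough to sketch directly. This is an illustrative Python version, not the bash in worker-close-check.sh; the function name and the exact consumed filename (worker-<S>.done.json.consumed) are assumptions.

```python
from pathlib import Path
from typing import Optional

def resolve_done_file(state_dir: Path, session: str) -> Optional[Path]:
    """Return the done file to trust, preferring the active form over the consumed one."""
    active = state_dir / f"worker-{session}.done.json"
    # Assumed name of the post-patrol-ack form.
    consumed = state_dir / f"worker-{session}.done.json.consumed"
    if active.is_file():
        return active
    if consumed.is_file():
        return consumed
    return None   # neither form exists: the close-check should still block
```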

FIX-AUDITMODE (worker-mark-done.sh + close-gate.sh + worker-close-check.sh). Audit-skill workers (/codeaudit, /secaudit, etc.) track their todos internally via Claude Code's native TodoWrite tool and never call omega-todo declare. The pre-V2.5 declared=false guard, designed to refuse attacker-spoofed done signals, worked as designed but stalled these legitimate audit-mode workers as a side effect. V2.5 distinguishes audit-mode workers (canonical done.json + skill marker) from anonymous undeclared workers, allowing audit close while preserving attacker resistance. Regression tests confirm the spoofed paths still STALL.

FIX-DEADMAN (patrol.sh). A worker that goes idle at ❯ prompt without calling worker-mark-done.sh was previously immortal — no timer wrote a done.json on its behalf. V2.5 adds a deadman switch: if a worker has progress.json mtime older than 10 minutes, is idle (no esc to interrupt in pane), and has no done.json, patrol auto-writes status=pending + reason "deadman: idle Nmin without mark-done" and safe-kills via the gate. Belt-and-suspenders on top of the audit-mode bypass.
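The three deadman conditions combine into a single predicate. Below is a hedged Python sketch of that check; the function name, its parameters, and the choice to return the record as a dict are illustrative assumptions, while the 10-minute threshold, the "esc to interrupt" pane marker, and the reason string come from the description above.

```python
import time
from typing import Optional

DEADMAN_IDLE_SECONDS = 10 * 60  # the 10-minute threshold stated above

def deadman_verdict(progress_mtime: float, pane_text: str, has_done: bool,
                    now: Optional[float] = None) -> Optional[dict]:
    """Return the pending record to write, or None if the worker should be left alone."""
    now = time.time() if now is None else now
    idle_seconds = now - progress_mtime
    busy = "esc to interrupt" in pane_text   # marker of an active turn in the pane
    if has_done or busy or idle_seconds <= DEADMAN_IDLE_SECONDS:
        return None
    minutes = int(idle_seconds // 60)
    return {"status": "pending",
            "reason": f"deadman: idle {minutes}min without mark-done"}
```

Passing `now` explicitly keeps the predicate deterministic, which makes a dry-run probe straightforward.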

FIX-MODAL (patrol.sh + new modal-dismisser.sh). Claude Code occasionally pops a "How is Claude doing?" feedback modal AFTER the assistant message finalizes. On 2026-05-18, an oracle whose mission was objectively complete (done_clean stated, 3 audits PASS) sat blocked for ~6h because the modal trapped the terminal before oracle-mark-mission-complete.sh could run. V2.5 introduces omega-v2/modal-dismisser.sh: a per-cycle scan that detects six known blocking-modal patterns (How is Claude doing?, Auto-update available, Press Esc to skip, etc.) and sends the safe dismiss key (Escape by default). Protected sessions (Home*, AISB-master, Tunnel*) are never touched. Every dismissal writes a JSONL audit entry.
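The per-cycle scan reduces to "skip protected sessions, match known patterns, pick a key". The Python sketch below shows that shape only; it lists the three modal strings named above (the other three are not given in this document), and the function and constant names are hypothetical.

```python
import re
from typing import Optional

# Three of the six patterns are named in the text; the rest are not listed there.
MODAL_PATTERNS = [
    re.compile(r"How is Claude doing\?"),
    re.compile(r"Auto-update available"),
    re.compile(r"Press Esc to skip"),
]
# Regex form of the protected globs Home*, AISB-master, Tunnel*.
PROTECTED = re.compile(r"^(Home.*|AISB-master|Tunnel.*)$")

def dismiss_key_for(session: str, pane_text: str) -> Optional[str]:
    """Return the key to send, or None when the session is protected or no modal shows."""
    if PROTECTED.match(session):
        return None
    if any(p.search(pane_text) for p in MODAL_PATTERNS):
        return "Escape"   # the default safe dismiss key
    return None
```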

Self-healing chain. After each fix landed, the next iteration's audits re-ran and re-scored. The V2.5 deltas were not theoretical — they were produced by the validation itself, in response to live failures, and verified by the same audits that found them. The system audited itself, fixed itself, and re-audited itself, three times.

Residuals. Five inconsistent-state legacy workers remain on the live system from pre-V2.4 dispatches. They are now annotated by omega-status as INCONSISTENT and will be reaped by the deadman switch on their next idle window. Zero new zombies have appeared since 2026-05-18 06:33.

E2E layer-alive check (V2.5). Same 18-probe surface as V2.4 plus three new probes: modal-dismisser.sh executable, audit-mode coherence check, deadman switch dry-run. Latest run before publication: 21 PASS / 0 FAIL.

Patch log — V2.5.1 (2026-05-22, Patterns Edition)

V2.5.1 is the operationalization release. V2.5 hardened the lifecycle. V2.5.1 promotes five outside-world patterns into the live system and exposes them through a single operator-facing diagnostic. The patterns are not theoretical: each one ships with a script, an integration point, an audit-log destination, and (where appropriate) a kill-switch. The full integration map lives at ~/.aisb/docs/PATTERNS-INTEGRATION.md (v3.2).

The five operational patterns now wired into Omega:

Source-as-Context (mount-package-source.sh). Workers used to hallucinate third-party APIs from training data — First Law had no leverage at write time, only at runtime. The fix mounts the real source code of each dependency under ~/.aisb/refs/repos/<pkg>/ as a shallow git clone, and dispatch-to-session.sh calls mount-package-source.sh --inject <project> after mission-init.sh so every worker boots with REF_SOURCES pointing at real code. The worker self-contract now requires grep $REF_SOURCES before any third-party API call. Twenty-six packages mapped, seven live on the VPS, ~150 MB total. Monthly cron prunes mounts unused for thirty days.
Cleanup-Wave (cleanup-wave.sh). Quality Arsenal audits used to find duplications after commit, paying refactor cost post-facto. The cleanup wave inserts between fix and re-audit in the DAG (oracle-prompt.sh rule R-26), with a tight scope contract (no renames, diff cap, behavior-preserving). Halt reasons land in grep-loop-<session>.halt.json for postmortem. Manual invocation for now — auto-injection on the v3.3 backlog.
Bounded Grep-Loop (grep-loop.sh). Audit Step 8b (fix-and-reaudit) previously had no exit guarantee — workers could spin on subjective LLM judgement. The grep-loop wrapper enforces an objective verify command (exit code, not "does it look good now?"), a max-iteration budget, and a scope-creep gate (cumulative diff threshold). Forbidden in the contract: LLM calls as gate. Exit codes: 0 verify_clean · 2 max_iter · 3 scope_creep · 4 worker_died.
Effect-TS Schedule retries (schedule.sh). Three declarative policies in bash, mirroring the Effect-TS shapes: exponential backoff with jitter, fixed-interval polling, one-shot with timeout. Wired into telegram.sh send_message (no more silent drop on 429/5xx), ship/push.sh (transient GitHub failures), ship/deploy.sh (timeout 600s), ship/verify.sh (poll deploy URL until 200, every 15s for up to 10 min). Every attempt is logged to ~/.aisb/logs/schedule.jsonl with policy name, attempt number, return code, duration, and halt reason.
Ship Pipeline as capability blocks (ship/{build,commit,push,deploy,verify,orchestrate}.sh). The old oracle-ship.sh was a 559-line monolith mixing build, commit, push, deploy, verify, freeze, rollback, telegram and state. V2.5.1 decomposes it into six blocks. Each block takes explicit --flag args, emits exactly one JSON object on stdout, and uses exit codes 0 ok / 1 failed / 2 halt. Opt-in via OMEGA_SHIP_V32=1 or .orchestrator/ship-config.json: {"use_v32_pipeline": true}. Default OFF — v3.3 will do the cutover once live confidence accumulates. Purely additive in the meantime.
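The Bounded Grep-Loop contract (objective verify, iteration budget, scope-creep gate) can be sketched as follows. This is an illustrative Python rendering rather than grep-loop.sh itself: the run_fix and diff_lines callbacks and the parameter names are assumptions, while the four exit codes are the ones listed above.

```python
import subprocess
from typing import Callable

EXIT_VERIFY_CLEAN, EXIT_MAX_ITER, EXIT_SCOPE_CREEP, EXIT_WORKER_DIED = 0, 2, 3, 4

def grep_loop(verify_cmd: list[str], run_fix: Callable[[], bool],
              diff_lines: Callable[[], int], max_iter: int, diff_cap: int) -> int:
    """Loop fix -> verify until the verify command exits 0 or a budget is hit."""
    for _ in range(max_iter):
        # The gate is an exit code, never an LLM judgement.
        if subprocess.run(verify_cmd).returncode == 0:
            return EXIT_VERIFY_CLEAN
        if not run_fix():              # the worker session ended mid-fix
            return EXIT_WORKER_DIED
        if diff_lines() > diff_cap:    # cumulative diff grew past the cap
            return EXIT_SCOPE_CREEP
    # One last objective check after the final fix attempt.
    if subprocess.run(verify_cmd).returncode == 0:
        return EXIT_VERIFY_CLEAN
    return EXIT_MAX_ITER
```

Because every exit path returns one of four fixed codes, a caller can branch on the halt reason without parsing any output.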

Pattern Hermes — omega-doctor 13-check. A single diagnostic an operator can run to verify the system is healthy. Thirteen check groups: filesystem layout, core scripts, v3.2 helpers, tmux daemon, cron entries, Claude CLI auth, Telegram bot, disk usage, state hygiene (stale resurrect attempts, stale .consumed), Convex (optional), network reachability (Telegram + Anthropic API), recent events (bot.log errors + schedule.jsonl), and mounted source references. Four modes: --quick (skip network probes), --full, --fix (auto-repair safe issues — mkdir, chmod +x, prune stale), --json (machine-readable). Runs at the end of every bash setup; daily 03:00 UTC cron is on the v3.3 backlog. Invocation: omega doctor (CLI) or ~/.aisb/lib/omega-doctor.sh --quick (direct).

Where each pattern fires automatically today:

Trigger	Pattern(s) applied
Every worker dispatch	#1 Source-as-Context (`--inject`)
Every Telegram message	#4 Schedule retries (`telegram.sh`)
Every ship pipeline (when opt-in)	#5 Capability blocks + #4 internal retries
Audit Step 8b (recommended)	#3 Grep-Loop with objective verify
Pre-audit cleanup (recommended)	#2 Cleanup-Wave (wrapped by #3)
Monthly cron	`mount-package-source.sh --prune-stale`
End of `bash setup` + on demand	`omega-doctor.sh`
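Of the three Schedule policies behind pattern #4, exponential backoff with jitter is the least obvious. The sketch below shows one common variant ("full jitter", a random delay between zero and a doubling cap); schedule.sh may use a different jitter formula, and the function name, defaults, and injectable sleep/rng parameters are assumptions for illustration.

```python
import random
import time
from typing import Callable, Optional

def retry_exponential(op: Callable[[], int], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> tuple[int, int]:
    """Run op until it returns 0; return (last return code, attempts used)."""
    rng = rng or random.Random()
    rc = 1
    for attempt in range(1, max_attempts + 1):
        rc = op()
        if rc == 0 or attempt == max_attempts:
            return rc, attempt
        # Cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(rng.uniform(0, cap))
    return rc, max_attempts
```

Returning the attempt count alongside the return code gives the caller what it needs to write one log entry per call.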

Status table (V2.5.1).

Pattern	State
#1 Source-as-Context	Shipped + auto-invoked at dispatch
#4 Schedule retries	Shipped + auto-invoked on Telegram + ship
Hermes / omega-doctor	Shipped + invocable via `omega doctor` and setup
#2 Cleanup-Wave	Shipped + documented for manual invocation
#3 Bounded Grep-Loop	Shipped + documented for manual invocation
#5 Ship blocks	Shipped opt-in (default OFF until v3.3 cutover)

All integrations are purely additive — no existing script changes its default behavior in V2.5.1.

Case F — Real-world product mission, four iterations (Agentik Academy rebrand, 2026-05-22). Between 2026-05-22 07:00 and 09:00 UTC, a customer-facing product mission — "rebrand Kommu Master Class to Agentik Academy, EUR 2 997 pricing across the master-class purchase funnel" — ran through the production audit chain (/uiuxaudit, /flowaudit, /secaudit, /debugaudit, /apiaudit, /codeaudit). Four iterations were needed: iter-1 surfaced seven adversarial findings, iter-2 closed all seven and exposed a production-deploy gap (rebranded code had not yet shipped to live), iter-3 verified the Stripe webhook's durable grant persistence with graceful side-effect degradation, iter-4 closed an orphaned CommunityPreview component that had zero render sites. Every iteration's evidence is in .audit/rebrand-academy-master-class/*.json. This is the first multi-iteration audit chain on a live customer-facing product change — complementary to the V2.5 in-house adversarial validation, which audited Omega against itself. The mission completed without operator intervention beyond approving the deploy.

E2E layer-alive check (V2.5.1). Same 21-probe surface as V2.5, plus six new probes (one per pattern + omega-doctor.sh executable). Latest run before publication: 27 PASS / 0 FAIL.

End of document — version 2.5.1 · Patterns Edition · 2026-05-22

Executive summary

The honest gaps remain: Omega's production telemetry is young (the live system has been running for weeks, not years), and the published metrics are bounded by that fact.

1 · The problem — Why autonomous agents fail

The promise of autonomous coding agents — "describe what you want, get working software back" — has been pitched many times. In practice, four failure modes recur:

Loss of context. An agent solves the first sub-task, then forgets why it was solving it. Single-context-window approaches collapse when the task exceeds the window or branches into parallel work.

Sycophancy. Most LLMs are RLHF-tuned to agree. When a user proposes a flawed approach, the agent codes it instead of challenging it. The result is fast garbage.

Silent failure. An agent reports work as done when it is not, and no independent layer checks the claim before the session closes.

Stalls without escalation. The agent encounters ambiguity, asks the user a question, and waits indefinitely. If the user is not watching the tmux session, the system hangs forever.

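A fifth failure mode is endemic to multi-agent systems specifically:

Contract drift. The hand-off contracts between agents live as prose inside ad-hoc f-strings, regexes, Python handlers, and bash heredocs, so the two sides of a junction drift apart as each one changes.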

Omega is built around these failure modes. Each is named, attacked, and verifiable.

        Problem                       Omega's response
─────────────────────────  ─────────────────────────────────
 Loss of context           4-level chain; workers are short-lived;
                           oracle context survives across workers;
                           cross-session memory (W5) recalls lessons
                           
 Sycophancy                Second Law — challenge the premise
                           before coding, with evidence
                           
 Silent failure            3-tier close-gate (worker .done.json,
                           oracle ack, supervisor close decision);
                           Layer 4 Mission Auditor adds an
                           independent audit gate before ack
                           
 Idle stalls               Third Law — never wait, always decide;
                           legal stops are .done.json or blocked.json
                           with fallback action already executed;
                           /resurrect cascade recovers any stall
                           
 Contract drift            Skill Orchestration Layer — 11 skills
                           replace 21+ f-strings / regex hits;
                           every junction is now an invocable
                           protocol with explicit toggles

2 · Omega's answer — A 4-level architecture

Every Omega operation flows through four levels. Each has one job, one input contract, one output contract.

                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 0  —  Human operator                 │
                  │  Sends an intent (one Telegram message)     │
                  └────────────────────┬────────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 1  —  Routing bot                    │
                  │  Classifies (Simple / Medium / Complex /    │
                  │  Epic), resolves the project, builds a      │
                  │  brief, dispatches an oracle                │
                  └────────────────────┬────────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 2  —  Project oracle                 │
                  │  Plans, dispatches workers, verifies done,  │
                  │  optionally ships, signals supervisor       │
                  └────────────────────┬────────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────────┐
                  │  LEVEL 3  —  Workers                        │
                  │  Read PLAN, execute steps, verify, write    │
                  │  .done.json, self-kill                      │
                  └─────────────────────────────────────────────┘

Why four levels and not three or five

Level 1 ↔ 2 separation. The bot does not need to know project internals. The oracle owns project context (CLAUDE.md, codebase layout, file ownership rules). The bot just routes.

Three levels would force the oracle to do per-task execution, blowing its context. Five levels would add ceremony without separation of concerns.

Multi-oracle parallelism

        Project X
           │
           ├── oracle-X       owns app/**, components/**
           ├── oracle-X-2     owns api/**, db/**
           └── oracle-X-3     owns docs/**, tests/**
                                (assigned only if file sets disjoint)

This pattern handles the case where a single human intent ("ship a feature plus update the docs plus add tests") naturally splits across non-overlapping areas of the codebase.

3 · Core guarantees

Four guarantees define Omega's contract with the operator. Each is enforced mechanically, not by goodwill.

Guarantee 1 — Autonomy

Once dispatched, a worker never asks the operator a question. The legal exits are:

.done.json written, status done_clean — work verified complete.
.done.json written, status pending — partial, with pending_actions[] listing what remains.
.done.json written, status failed — genuinely blocked, with evidence.
worker-blocked-<session>.json written + fallback action executed — truly ambiguous, but the worker proceeded with its best guess while signaling the supervisor.

The AskUserQuestion tool is forbidden in dispatched sessions. Workers that pause at a question mark are by definition broken.
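A consumer of .done.json can check the three legal statuses mechanically. The following Python sketch is one way to do that under stated assumptions: the function name and the error strings are hypothetical, and the check covers only the status and pending_actions fields named above, not the full file schema.

```python
import json

LEGAL_STATUSES = {"done_clean", "pending", "failed"}

def validate_done(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload is a legal exit."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be an object"]
    problems = []
    status = doc.get("status")
    # Reject non-string statuses explicitly (for example an integer-typed field).
    if not isinstance(status, str) or status not in LEGAL_STATUSES:
        problems.append(f"illegal status: {status!r}")
    if status == "pending" and not doc.get("pending_actions"):
        problems.append("pending requires a non-empty pending_actions[]")
    return problems
```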

Guarantee 2 — Verification

Workers do not self-certify. Three layers acknowledge completion:

  Worker writes .done.json     ─── Tier 1: "I think I finished"
       │
       ▼
  Oracle reads, runs VERIFY    ─── Tier 2: "Confirmed, work meets spec"
  COMMAND, calls
  close-gate ack-worker
       │
       ▼
  Mission Auditor intercepts   ─── Tier 2.5: "Forensic audit ≥85/100"
  (1-3 skills, ≥85/100)            (Layer 4 of the Safety Mesh)
       │
       ▼
  Supervisor reads ledger,     ─── Tier 3: "Safe to close, operator informed"
  decides close window,
  notifies the operator

Guarantee 3 — Isolation

Workers cannot harm each other:

Each worker has its own context window (no shared memory between workers).
Each worker has its own state directory (worker-<session>.* files, namespaced).
Atomic writes everywhere (tmp + mv -f) prevent half-written state files.
The brief-replay file (Layer 1) is written atomically before the tmux paste (W6, 2026-05-16) so a mid-paste crash still leaves a valid brief on disk for replay.
Optional git worktrees per oracle for cross-cutting changes that would conflict otherwise.
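The "tmp + mv -f" idiom has a direct Python equivalent, shown here as a sketch for readers who maintain the Python side of the system. The function name is hypothetical; os.replace performs the rename, which is atomic on POSIX when the temporary file and the target are on the same filesystem.

```python
import json
import os
import tempfile

def write_json_atomic(path: str, payload: dict) -> None:
    """Write to a temp file in the target directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())   # push the bytes to disk before the rename
        os.replace(tmp_path, path)      # readers see the old file or the new one, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the target's own directory is what keeps the rename on a single filesystem.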

The worktree subsystem is chaos-tested: 40 of 40 cases pass, including process kills mid-operation, disk-full simulation, and concurrent worktree creation on the same project.

Guarantee 4 — Close-gate

The supervisor never auto-closes a session if any of the following holds:

Status is not done_clean.
Ship result is failed or frozen.
pending_actions[] is non-empty.
The operator has interacted with the bot during the grace window.
A new oracle for the same project was dispatched during the grace window.

Auto-close happens only when all conditions point to "the work is genuinely finished, the operator has been notified, and the resources can be freed".
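The five blocking conditions above can be collected into one function that reports every reason at once. This is a minimal Python sketch under assumed names: CloseInputs, its fields, and the blocker labels are illustrative and do not come from the supervisor's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class CloseInputs:
    status: str
    ship_result: str = "skipped"            # one of ok, skipped, failed, frozen
    pending_actions: list = field(default_factory=list)
    operator_active_in_grace: bool = False
    new_oracle_in_grace: bool = False

def close_blockers(c: CloseInputs) -> list[str]:
    """Return every reason that blocks auto-close; an empty list means close is allowed."""
    blockers = []
    if c.status != "done_clean":
        blockers.append("status_not_done_clean")
    if c.ship_result in ("failed", "frozen"):
        blockers.append("ship_" + c.ship_result)
    if c.pending_actions:
        blockers.append("pending_actions")
    if c.operator_active_in_grace:
        blockers.append("operator_active")
    if c.new_oracle_in_grace:
        blockers.append("new_oracle_dispatched")
    return blockers
```

Returning all blockers rather than the first one gives the operator notification a complete picture of why a session stayed open.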

4 · Operational flow

This section walks one complete intent from operator to ship.

Step 1 — Intent

The operator sends a message to the routing bot. The message is in natural language, English or French, optionally with attachments (screenshots, Linear links, audit keywords).

Step 2 — Classification and routing

  Simple   ─ one read-only check                   ─ done in-band
  Medium   ─ one specialist, single area           ─ spawn 1 worker
  Complex  ─ multiple specialists, multi-domain    ─ /team in tmux
  Epic     ─ cross-department, hours+              ─ /aisb full chain

Step 3 — Brief construction

The bot builds a brief for the oracle. The brief is now produced by the /dispatch-oracle skill (the legacy f-string remains as fallback when the skill probe times out). The brief includes:

{
  "project": "Project name",
  "mission": "One-line summary",
  "ship": true | false,
  "files_owned": ["glob patterns the oracle may touch"],
  "deploy_timeout_min": 10,
  "lifecycle": "persistent | ephemeral"
}

ship is set true only when the operator explicitly asks (keywords: ship, deploy, push, merge, livre, "envoie en prod"). Audits and research never ship.
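A keyword check of this kind might look like the sketch below. The keyword list is taken from the sentence above; the case-insensitive, whole-word matching and the function name are assumptions, and the real bot may add further rules (for example, forcing ship to false for audit and research missions).

```python
import re

# Keywords listed above; whole-word, case-insensitive matching is an assumption.
SHIP_KEYWORDS = ["ship", "deploy", "push", "merge", "livre", "envoie en prod"]
_SHIP_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SHIP_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def wants_ship(message: str) -> bool:
    """True only when the operator's message contains an explicit ship keyword."""
    return _SHIP_RE.search(message) is not None
```

Matching on word boundaries avoids treating a word such as "relationship" as a ship request.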

Step 4 — Oracle planning

Crucially, the oracle never writes project code directly. Even a one-line typo fix goes through a worker session.

Step 5 — Worker dispatch with the PLAN protocol

Each worker is dispatched via the /dispatch-worker skill (canonical contract; the legacy bespoke prompt assembly remains as fallback). The worker prompt is:

== MISSION ==
<one-line mission>

== PLAN ==
1. <step 1, concrete, verifiable>
2. <step 2>
3. <step 3>
...

== FILES IN SCOPE ==
- <glob or path list>

== DONE CRITERIA ==
- <criterion 1, observable in <60s>
- <criterion 2>

== VERIFY COMMAND ==
<single shell command that returns 0 when done>

== HANDOFF ==
When PLAN complete AND VERIFY COMMAND passes, call:
bash <path>/worker-mark-done.sh done_clean '<summary>'

== PRE-BOOT KNOWLEDGE PACK ==
Project context, language defaults, audit triggers,
and (W5) the five most-recent lessons/mistakes
from the cross-session memory store, scoped to this
project.

Why PLAN and not the native /goal primitive

Claude Code v2.1.141 ships a native /goal <condition> primitive — the engine auto-loops until the condition is met. We integrated this in two phases:

Phase 1: opt-in via GOAL_NATIVE=true for solo workers with short deterministic conditions.
Phase 2: default-on for all solo workers.

/goal remains available as Phase 1 opt-in for short deterministic conditions (e.g. npx vitest passes).

Step 6 — Audit (forensic)

Gestalt clarity gate. First pass: is the artifact comprehensible at all? If not, the audit stops and reports the clarity failure first. There is no point measuring detail on something incoherent.
Popper falsification. Every claim is paired with a falsification check. "This component is accessible" requires "What would prove it isn't?" — and that check is executed.
Hinge-point 10× scrutiny. The audit identifies the one or two phases that, if wrong, invalidate everything downstream. Those phases get 10× the rigor of others.

Step 7 — Ship (optional)

If brief.ship is true, the oracle runs the 12-step ship pipeline:

1.  Build  (npm run build or project-specific, via safe-npm-build.sh mutex)
2.  Stage  (whitelist files; refuse extras)
3.  Secret scan staged (gitleaks)
4.  Whitespace check (git diff --cached --check)
5.  Commit (conventional message)
6.  Acquire flock per-project (serializes oracles)
7.  Check freeze flag (if frozen, abort + alert)
8.  Pull --rebase (auto-abort on conflict, keep local commit)
9.  Push (retry once after re-rebase)
10. Deploy (whitelisted command; default Vercel + token)
11. Poll deploy status (max deploy_timeout_min, default 10 min)
12. Write .done.json with commit, push URL, deploy URL, duration

Step 8 — Worker handoff

The worker's tmux session schedules a self-kill 5 seconds after the handoff — freeing the slot for the next dispatch.

Step 9 — Mission Auditor (Layer 4 — END)

Step 10 — Oracle ack

Step 11 — Supervisor close decision

The supervisor (cron-driven, every minute) reads all oracle done.json files and applies the close decision tree:

  done_clean + ship.result in {ok, skipped}     → notify + close after grace
  done_clean + ship.result in {failed, frozen}  → notify + keep alive
  pending                                       → notify + inline "continue" button
  failed                                        → send logs + keep alive

The grace window resets if the operator interacts with the bot or a new oracle is dispatched on the same project.
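The decision tree can be read as a pure function of two fields from the oracle done.json. The sketch below is one way to write it in Python; the function name and the returned action labels are hypothetical and chosen only for illustration.

```python
def close_decision(status: str, ship_result: str = "skipped") -> str:
    """Map (status, ship.result) to a supervisor action label."""
    if status == "done_clean":
        if ship_result in {"ok", "skipped"}:
            return "notify_and_close_after_grace"
        if ship_result in {"failed", "frozen"}:
            return "notify_and_keep_alive"
        raise ValueError(f"unknown ship result: {ship_result!r}")
    if status == "pending":
        return "notify_with_continue_button"
    if status == "failed":
        return "send_logs_and_keep_alive"
    raise ValueError(f"unknown status: {status!r}")
```

Raising on unknown values, rather than defaulting to a close, keeps an unexpected state from silently terminating a session.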

5 · Reliability model

   ┌──────────────────────────────────────────────────────────────────┐
   │  Layer 1 — BRIEF-REPLAY      (dispatch persistence, START)       │
   │  Layer 2 — CPU GUARD         (load admission control, START)     │
   │  Layer 3 — SHADOW MANAGER    (live signal monitor, DURING)       │
   │  Layer 4 — MISSION AUDITOR   (quality gate at handoff, END)      │
   └──────────────────────────────────────────────────────────────────┘

These four layers complement the existing supervisory loops described below.

Smart Resurrect — the stall-recovery cascade

Worker stall recovery was redesigned in W10 (2026-05-16) into a four-tier cascade orchestrated by omega-resurrect.sh and the /resurrect skill:

┌─────────────────────────────────────────────────────────────────────┐
│ Worker stall signal → omega-resurrect.sh                            │
│                                                                     │
│  Tier 1 (0 token)  : todos + events + brief → context-aware nudge  │
│  Tier 2 (0 token)  : pane regex → error-type-specific recovery     │
│                       (rate_limit | api_error | type_error |        │
│                        build_fail | oom | cmd_missing)              │
│  Tier 3 (opt-in)   : claude -p `/resurrect` skill (Haiku, free Max) │
│                       OMEGA_SMART_RESURRECT=skill required          │
│  Tier 4 (escalate) : Telegram via notify-bot.sh after 3 attempts    │
└─────────────────────────────────────────────────────────────────────┘
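Tier 2 is a zero-token classification of the pane text into one of the six error types listed in the diagram. A minimal sketch of that idea follows. The regexes are hypothetical stand-ins, since the actual patterns in omega-resurrect.sh are not reproduced in this paper; only the six type names come from the source.

```python
import re
from typing import Optional

# Hypothetical patterns, for illustration only. First match wins.
ERROR_PATTERNS = [
    ("rate_limit", re.compile(r"rate limit|\b429\b", re.I)),
    ("oom", re.compile(r"out of memory", re.I)),
    ("cmd_missing", re.compile(r"command not found", re.I)),
    ("type_error", re.compile(r"TypeError|error TS\d+")),
    ("build_fail", re.compile(r"build failed", re.I)),
    ("api_error", re.compile(r"API error", re.I)),
]


def classify_pane(pane_text: str) -> Optional[str]:
    """Return the first matching error type, or None if nothing matches."""
    for name, pattern in ERROR_PATTERNS:
        if pattern.search(pane_text):
            return name
    return None
```

A None result would mean Tier 2 has nothing specific to offer, so the cascade moves on to the next tier.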

Tracking Reactor — event-driven supervision

┌─────────────────────────────────────────────────────────────────────┐
│  Worker writes event → JSONL close_write → inotify wakes reactor    │
│       │                                                             │
│       ▼                                                             │
│  Per-session 2s coalesce + 1s global debounce                       │
│       │                                                             │
│       ▼                                                             │
│  tmux capture-pane probe vs STUCK_REGEX (Awaiting/STOPPING/…)       │
│       │                                                             │
│       ▼                                                             │
│  Match → omega-resurrect.sh <session>  (Tier 1→4 cascade)           │
│                                                                     │
│  + 60s background ticker: scan tracking mtimes; >10min idle AND     │
│    session alive → omega-resurrect.sh (idle fallback path)          │
└─────────────────────────────────────────────────────────────────────┘
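The two timing windows in the diagram can be sketched as a small gate that decides whether a given event should trigger a pane probe. This is one plausible reading (a leading-edge throttle with explicit timestamps), not the reactor's confirmed logic; `ReactorGate` and `should_probe` are hypothetical names.

```python
from typing import Dict, Optional


class ReactorGate:
    """Allow at most one probe per session per 2 s and one probe globally per 1 s."""

    def __init__(self, per_session_s: float = 2.0, global_s: float = 1.0):
        self.per_session_s = per_session_s
        self.global_s = global_s
        self._last_session: Dict[str, float] = {}
        self._last_global: Optional[float] = None

    def should_probe(self, session: str, now: float) -> bool:
        last = self._last_session.get(session)
        if last is not None and now - last < self.per_session_s:
            return False  # coalesced with the previous event for this session
        if self._last_global is not None and now - self._last_global < self.global_s:
            return False  # debounced against the most recent probe anywhere
        self._last_session[session] = now
        self._last_global = now
        return True
```

Passing `now` in explicitly, instead of reading the clock inside the method, keeps the gate deterministic under test.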

KAIROS retirement (W9, 2026-05-16)

Cross-session memory (W5, 2026-05-16)

Workers learn from past missions instead of repeating mistakes. Two-sided wire:

Recall (read side at dispatch time). knowledge-pack-builder.sh emits a PROJECT MEMORY section in the pre-boot knowledge pack when omega-memory list --project=$PROJECT --limit=5 returns rows. Every dispatched worker boots with the five most-recent lessons/mistakes for its project. Soft-fails when the DB is empty or absent — no impact on legacy flows.
Write (audit-time hook). mission-auditor.sh writes after the verdict is computed: APPROVED → kind=lesson, body=[mission_type · audits · score=N] <worker_summary> (capped 500 chars); REJECTED && iter ≥ 2 → kind=mistake with the per-audit name:score/100 findings so the next retry does not repeat the failing pattern.

Storage is SQLite FTS5 at ~/.omega/state/memory.db. Inspect via omega-memory list|search|stats.
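The write-side rule can be sketched as a function that turns a verdict into a memory row, or into nothing. The function name, the argument names, and the comma separators are assumptions for illustration; the kinds, the body layout, the 500-character cap on lessons, and the iteration threshold come from the description above.

```python
from typing import Dict, Optional

MAX_LESSON_CHARS = 500  # cap stated for the lesson body


def build_memory_row(verdict: str, iteration: int, mission_type: str,
                     audit_scores: Dict[str, int], score: int,
                     worker_summary: str) -> Optional[dict]:
    """Return the row to store, or None when the verdict produces no memory."""
    if verdict == "APPROVED":
        audits = ",".join(audit_scores)  # audit names only
        body = f"[{mission_type} · {audits} · score={score}] {worker_summary}"
        return {"kind": "lesson", "body": body[:MAX_LESSON_CHARS]}
    if verdict == "REJECTED" and iteration >= 2:
        findings = ", ".join(f"{name}:{s}/100" for name, s in audit_scores.items())
        return {"kind": "mistake", "body": findings}
    return None  # a first-iteration rejection is not recorded
```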

Supervisor + daemon mesh

The supervisor is one of three cron loops. A systemd user unit and three long-lived daemons run alongside them. Together they form a recovery mesh.

  ╔══════════════════════════════════════════════════════════════╗
  ║  Cron */1 min : supervisor (close decisions, alerts, reaper) ║
  ║  Cron */2 min : event-driven oracle wake on worker done.json ║
  ║  Cron */3 min : observer (6 categorized failure modes M1-M6) ║
  ║  Systemd user : tracking-reactor.service (inotify, ~129ms)   ║
  ║                                                              ║
  ║  Daemon       : oracle process death detector                ║
  ║  Daemon       : abandoned-oracle reaper (TTL-bound)          ║
  ║  Daemon       : worker idle supervisor (no-tool-call timeout)║
  ╚══════════════════════════════════════════════════════════════╝

The six observer failure modes:

Code	Symptom	Recovery action
M1	Worker .done.json un-acked, siblings still alive	Nudge oracle via tmux send-keys
M2	All workers done, oracle idle > 5 min	Send report or close oracle
M3	Worker `failed`, oracle has not surfaced an alert	Alert via bot directly
M4	`worker-blocked-<session>.json` exists	Surface question to operator
M5	Worker has not emitted a tool event for X minutes	`/resurrect` cascade (was: `/team retry`)
M6	Oracle TodoWrite has not changed for N observer ticks	FYI digest (asymmetric, never imperative)

Nudges are throttled (one per 5 min per oracle) to avoid spam.
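The throttle rule (one nudge per 5 minutes per oracle) amounts to a small ledger keyed by oracle name. A minimal in-memory sketch follows; the class and method names are hypothetical, and the real ledger is presumably persisted on disk rather than held in a Python object.

```python
from typing import Dict


class NudgeThrottle:
    """Allow at most one nudge per oracle per window (default 300 s)."""

    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self._last_sent: Dict[str, float] = {}

    def allow(self, oracle: str, now: float) -> bool:
        last = self._last_sent.get(oracle)
        if last is not None and now - last < self.window_s:
            return False
        self._last_sent[oracle] = now  # record only nudges that are sent
        return True
```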

The incident that triggered the mesh (2026-04-15)

6 · Skill-Wired Orchestration (since v2.2)

v2.2 introduces a Skill Orchestration Layer: eleven invocable, versioned skills that replace the prose contracts at every junction in the chain.

The 11 skills

#	Skill	Junction it owns	Replaces
1	`/classify-intent`	Inbound Telegram message classification (and ambiguous mission-type in the auditor)	regex-only `handlers.py` classifier; misrouted vague messages like "Causio fait ce que tu sais" ("Causio, do what you know")
2	`/dispatch-oracle`	AISB → Oracle dispatch prompt	f-string body inside `_build_oracle_dispatch_prompt`
3	`/dispatch-worker`	Oracle → Worker dispatch prompt	per-oracle bespoke worker prompt assembly
4	`/worker-protocol`	Worker self-contract on boot	embedded heredoc in `worker.md`
5	`/omega-protocol`	Oracle process-level contract	scattered rule files
6	`/resurrect`	Worker stall recovery (Tier 3 of the cascade)	obsolete `kairos.py::_nudge_oracle`
7	`/synthesize-report`	Done-handoff digest for Telegram	oracles reading raw `done.json`
8	`/format-telegram-report`	Telegram payload humanization	template-only `route_notify`
9	`/audit-mission`	Close-gate forensic verification (Layer 4)	static rules table only
10	`/plan-decompose`	Oracle plan decomposition for complex missions	manual decomposition
11	`/diagnose`	On-demand diagnostic snapshot of any oracle / worker	manual pane reads + jq

The skill-wired chain (visual)

GARETH ──Telegram──▶ AISB ──tmux──▶ ORACLE ──tmux──▶ WORKER ──/team──▶ AGENTS
   intent:    /classify-intent           │           │              │
   dispatch:  /dispatch-oracle           │           │              │
                                /dispatch-worker     │              │
                                  /omega-protocol    │              │
                                /plan-decompose      │              │
                                                /worker-protocol    │
                                                /resurrect (stall) │
                                                /diagnose (snapshot)│
                                                /audit-mission     │
                                                  (close-gate)      │
GARETH ◀──Telegram── AISB ◀──tmux── ORACLE ◀──tmux── WORKER ◀──────/synthesize-report
                              /format-telegram-report

Wiring matrix

The skills are not just defined — they are wired. The following table shows the number of verified call sites per layer (counted with grep on 2026-05-16):

Layer	File	Skill hits
AISB handlers / prompts	`bot/aisb/handlers.py`, `bot/aisb/prompts.py`	21
Patrol (Telegram report)	`bot/aisb/patrol.sh`	4
Mission Auditor	`~/.aisb/lib/mission-auditor.sh`	4
Oracle dispatch (memory pack)	`~/.aisb/lib/dispatch-to-session.sh`	omega-memory (W5)

Failure mode and toggles

Env var	Default	Effect when unset/false (or, where the row says so, when set)
`SKILL_INTEGRATION_ENABLED`	`true`	AISB handlers/prompts skip skill probes
`SKILL_REPORT_ENABLED`	`true`	Patrol uses raw template instead of skill digest
`MISSION_AUDITOR_SKILL_CLASSIFY`	`true`	Auditor stays on regex-only classification
`OMEGA_USE_RESURRECT`	`1`	Shadow worker branch falls back to legacy `recovery_apply`
`OMEGA_SMART_RESURRECT`	unset	Setting `skill` enables Tier 3 LLM call
`SHADOW_LLM`	unset	Setting `haiku` enables Tier 2 disambiguation
`CLOSEGATE_SKIP_AUDIT`	unset	Setting `1` bypasses Mission Auditor entirely
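The toggles fall into two groups: boolean flags that default to on, and opt-in variables that only take effect at one specific value. A sketch of a loader that respects both groups follows. The function name, the returned keys, and the set of strings treated as truthy are assumptions; the variable names, defaults, and opt-in values are taken from the table.

```python
from typing import Mapping


def load_toggles(env: Mapping[str, str]) -> dict:
    """Resolve the toggles from an environment-like mapping."""

    def flag(name: str, default: bool) -> bool:
        raw = env.get(name)
        if raw is None:
            return default
        # Assumption: these spellings count as "on"; anything else is "off".
        return raw.strip().lower() in {"1", "true", "yes"}

    return {
        "skill_integration": flag("SKILL_INTEGRATION_ENABLED", True),
        "skill_report": flag("SKILL_REPORT_ENABLED", True),
        "auditor_skill_classify": flag("MISSION_AUDITOR_SKILL_CLASSIFY", True),
        "use_resurrect": flag("OMEGA_USE_RESURRECT", True),
        # Opt-in variables: only the documented value enables the feature.
        "tier3_skill": env.get("OMEGA_SMART_RESURRECT") == "skill",
        "shadow_llm_haiku": env.get("SHADOW_LLM") == "haiku",
        "skip_audit": env.get("CLOSEGATE_SKIP_AUDIT") == "1",
    }
```

In practice the mapping would be `os.environ`; taking it as a parameter keeps the function easy to exercise with a plain dictionary.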

Why a skill layer matters

Three reasons.

Versioning. A skill file at ~/.claude/commands/<name>.md has a path, an author, a change history, and can be diffed. An f-string inside a Python function is none of these.

The cost is one extra subprocess (claude -p) per junction. The fallback path means that cost is paid only when the system can afford it.
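A best-effort skill probe with a fallback can be sketched as an async subprocess call guarded by a timeout. The helper below is an illustration under stated assumptions: `run_skill`, its parameters, and the 30-second default are hypothetical, and the command is passed in by the caller rather than hard-coded, so nothing here asserts how `claude -p` is actually invoked.

```python
import asyncio
import sys
from typing import List


async def run_skill(cmd: List[str], fallback: str, timeout_s: float = 30.0) -> str:
    """Run cmd and return its stdout; return fallback on any failure."""
    try:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.DEVNULL,
        )
    except OSError:  # binary missing or not executable
        return fallback
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # do not leave a hung child behind
        await proc.wait()
        return fallback
    text = stdout.decode("utf-8", errors="replace").strip()
    if proc.returncode != 0 or not text:
        return fallback
    return text


if __name__ == "__main__":
    # Stand-in command; a real junction would pass its own skill invocation.
    print(asyncio.run(run_skill([sys.executable, "-c", "print('digest')"], "raw template")))
```

Every failure mode (missing binary, timeout, non-zero exit, empty output) collapses to the same fallback string, so the caller never has to branch on why the skill was unavailable.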

7 · Security model

Omega is built for an operator who runs the system on their own machine. The security model is therefore:

Protected scopes (the operator may forbid automation entirely)

Billing endpoints.
Account-management APIs.
Authentication / OAuth flows.
.env* files (any project).
The OAuth login script.

These are sacred. Workers never touch them, oracles never touch them, the supervisor never touches them. Removing a guard rail requires a manual code edit by the operator.

Defense scan layer

Every incoming prompt (and any text the operator wants to scan ad-hoc) can be passed through a defense scanner:

  Category            Examples
  ─────────────────   ─────────────────────────────────────────
  Prompt injection    ignore previous instructions, role hijack,
                      DAN, jailbreak, mode-switch, prompt-reveal
  Secrets             stripe keys, AWS access keys, GitHub PAT,
                      Slack tokens, private keys, GitLab PAT
  PII                 US SSN-like, credit-card-like, phone
  Suspicious URLs     URL shorteners, IP-as-URL, .onion, free TLDs

Verdicts: clean, warning, block. Critical matches (live Stripe key, .onion URL) block. Optional quarantine appends the verdict to a defense-alerts log.
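A scanner of this shape reduces to a list of (category, severity, pattern) rules and a verdict equal to the worst severity matched. The sketch below uses a handful of hypothetical patterns; the real rule set is not shown in this paper. Only two severities are taken from the text (a live Stripe key and an .onion URL block); treating the other matches as warnings is an assumption.

```python
import re
from typing import List, Tuple

# Hypothetical rules, for illustration only.
RULES: List[Tuple[str, str, "re.Pattern[str]"]] = [
    ("secret", "block", re.compile(r"\bsk_live_[0-9A-Za-z]{16,}\b")),
    ("suspicious_url", "block", re.compile(r"https?://[a-z2-7]{16,56}\.onion\b", re.I)),
    ("prompt_injection", "warning", re.compile(r"ignore (?:all )?previous instructions", re.I)),
    ("secret", "warning", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("suspicious_url", "warning", re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}\b")),
]

SEVERITY_RANK = {"clean": 0, "warning": 1, "block": 2}


def scan(text: str) -> dict:
    """Return the worst verdict and the categories that matched."""
    verdict = "clean"
    matches = []
    for category, severity, pattern in RULES:
        if pattern.search(text):
            matches.append(category)
            if SEVERITY_RANK[severity] > SEVERITY_RANK[verdict]:
                verdict = severity
    return {"verdict": verdict, "matches": matches}
```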

No destructive autonomy

The system actively refuses certain shortcuts:

Workers never force-push.
Oracles never close themselves (only the supervisor closes).
Auto-rollback on deploy failure is opt-in per project, not default.
Sacred files (the supervisor, the death detector, the reaper, the idle supervisor) are version-locked — any drift triggers an alert.
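Version-locking by content hash can be sketched as a comparison between pinned SHA-256 digests and the files currently on disk. The function names and the shape of the pinned mapping are assumptions; the paper states only that drift triggers an alert and that stability is verified by sha256.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not load into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_drift(pinned: Dict[str, str]) -> List[str]:
    """Return the pinned paths whose content changed or that no longer exist."""
    drifted = []
    for path, expected in pinned.items():
        p = Path(path)
        if not p.is_file() or sha256_of(p) != expected:
            drifted.append(path)
    return drifted
```

A missing file is reported as drift on purpose: deletion of a guard is at least as serious as an edit.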

Sacred files

8 · Evidence

This section reports what is measurable today. It does not report numbers we do not have. Omega's production telemetry is young, and that fact constrains the evidence base.

Wave-3 shipping log (2026-05-16)

Artifact	Type	Status	Evidence
`/omega-protocol`	skill	shipped	`~/.claude/commands/omega-protocol.md`
`/dispatch-oracle`	skill	shipped	`~/.claude/commands/dispatch-oracle.md`
`/dispatch-worker`	skill	shipped	342-line canonical specification
`/worker-protocol`	skill	shipped	`~/.claude/commands/worker-protocol.md`
`/resurrect`	skill	shipped + smoke PASS	brief-aware nudge with French/EN escape clause
`/synthesize-report`	skill	shipped	hybrid 70% template + 30% Haiku digest
`/format-telegram-report`	skill	shipped	patrol-wired (4 hits)
`/audit-mission`	skill	shipped	mission-auditor close-gate
`/classify-intent`	skill	shipped	hybrid regex + Haiku, mission-auditor W7 wire
`/plan-decompose`	skill	shipped	oracle complex-mission decomposition
`/diagnose`	skill	shipped	on-demand pane snapshot
W5 — cross-session memory wiring	fix	shipped	`omega-memory` in dispatch + auditor
W6 — brief atomic write before paste	fix	shipped	tmpfile + mv ordering in `dispatch-to-session.sh:622-625`
W7 — auditor /classify-intent hybrid	fix	shipped	`MISSION_AUDITOR_SKILL_CLASSIFY=true` default
W8 — event-driven tracking-reactor	fix	shipped	systemd user unit, 129 ms latency
W9 — KAIROS `_nudge_oracle` retired	fix	shipped	unconditional `return False`
W10 — `/resurrect` smoke PASS	validation	passed	brief-aware French nudge with escape clause
W12 — `omega-overview.md`	docs	shipped	single entry point for Omega system
AISB skill-wired chain	integration	shipped	21 grep hits in handlers/prompts
Patrol skill-wired chain	integration	shipped	4 grep hits in patrol.sh
Mission-auditor skill-wired chain	integration	shipped	4 grep hits

What was measured today (chaos + smoke tests, 2026-05-15 → 2026-05-16)

Test	Result	What it proves
Worktree E2E (5 scenarios)	5/5	Happy path, conflict, main moved, parallel, ship failure
Worktree chaos v1 (18 cases)	18/18	Process kills mid-operation, disk-full, race conditions
Worktree chaos v2 (8 cases)	8/8	Concurrent worktree-create on same project
Worktree chaos v3 (9 cases)	9/9	Interrupted ship + recovery
/goal Phase 1 opt-in smoke	5/5	Opt-in injection via GOAL_NATIVE=true works
/goal Phase 2 revert smoke	8/8	Default-on block is removed; PLAN protocol contracts are in place
Worker-mark-done oracle guard	Pass	Refuses oracle session names with rc=3 + redirect
PLAN protocol runtime test	1/1	End-to-end worker dispatch, plan execution, done.json
Sacred files sha256 stability	4/4	Patrol, watchdog, reaper, idle-supervisor unchanged
Defense scan (5 categories)	5/5	clean / injection / secret / URL / PII verdicts correct
/resurrect Tier-1 smoke (W10)	Pass	Brief-aware French nudge with escape clause
tracking-reactor event-to-trigger (W8)	129 ms	inotify wake → tmux probe → resurrect call

What is live in operation right now

Quantity	Source
Outcomes-database mission rows	2 (small N — system is young)
Worker `.done.json` files on disk (recent)	5
Tool-call events captured by the tracking hook	2,571 across 61 session files
Cron entries active	28 (supervisor + observer + flusher + ...)
Systemd user units active	1 (`tracking-reactor.service`)
Sacred files unchanged since	4–6 days (last verified today)
Safety Mesh layers wired	4 (brief-replay, CPU guard, shadow, mission auditor)
Shadow signals monitored (Tier 1)	14 (incl. `CPU_OVERLOAD`)
Mission Auditor mission types classified	9 + hybrid `/classify-intent` escalation
Quality Arsenal audits selectable by Mission Auditor	17
Skills in the Skill Orchestration Layer	11
Wired skill call sites (handlers / patrol / auditor)	29 total grep hits

Honest gaps

Production mission count is small. The outcomes database has 2 rows. A claim like "10,000 missions executed at 99% success" would be a fabrication. Honest framing: the system is in early operation; chaos tests validate the structural properties (race conditions, recovery, isolation) that production data cannot yet validate at scale.
Mean time intent → ship. Not yet computed across a statistically meaningful sample. Single observed examples are in the tens of minutes for narrow Linear-style fixes, hours for cross-cutting features. These are operator anecdotes, not telemetry.
Cost per mission. Token consumption is captured per tool call (the tracking hook) but not yet aggregated into a per-mission cost report. A dashboard for this is planned.
Incident-avoidance count. The observer fires nudges, but the proportion of nudges that prevented a stall (vs nudges sent into already-recovering sessions) is not yet computed. The new tracking-reactor will help here: every event-driven trigger writes a line to ~/.aisb/logs/tracking-reactor.log with the session and the matched stuck-regex.
Skill fallback frequency. The skill-wired chain has fallbacks at every junction. How often the fallback fires versus the skill succeeds is logged but not yet aggregated.

Five short case studies (concrete, verifiable today)

What chaos tests cannot prove

9 · Roadmap

Recently delivered (since v2.1)

Skill Orchestration Layer (Wave-3). Eleven invocable, versioned skills replace the prose contracts at every junction in the 4-level chain. 29 verified call sites across handlers, patrol, and mission-auditor. Every skill probe is best-effort with silent fallback.
W5 — Cross-session memory wired both sides. knowledge-pack-builder.sh emits a PROJECT MEMORY section at dispatch time; mission-auditor.sh writes lessons (APPROVED) and mistakes (REJECTED iter≥2) after every verdict.
W6 — Brief atomic write before tmux paste. Closes a race window where a crash between paste and verify left the shadow without a brief to replay.
W7 — Mission Auditor hybrid classifier. Ambiguous generic cases escalate to /classify-intent via Haiku micro-call. Zero behavioral regression for the 80% fast-path.
W8 — Tracking Reactor. Event-driven (inotify) supervision via systemd --user unit. Measured 129 ms event → trigger latency. Singleton via flock. Shares the 600s throttle ledger with the cron observer so they cannot double-fire.
W9 — KAIROS _nudge_oracle retired. Replaced by /resurrect + tracking-reactor.
W10 — /resurrect smoke PASS. End-to-end validation of the Tier-1 cascade with a fake worker, brief-aware French nudge with escape clause, zero tokens consumed.
W12 — omega-overview.md. Single entry-point document for the Omega system, indexing all docs and skills.

Short-term (active)

Automate bot restart after handler code changes so progress-card features activate without operator intervention.
Exercise the PLAN protocol's sub-agent pattern (Agent(team_name=...)) on a real client mission, not just a smoke test.
Port the 28 cron entries to a native scheduling primitive so they become inspectable and version-controlled from inside a session.
Aggregate Mission Auditor verdicts into a per-classification accuracy report (do bug-fix audits actually catch bugs that escaped worker self-review?).
Aggregate the skill-fallback frequency per junction (how often claude -p times out vs returns valid output).

Medium-term

A live dashboard for mission timelines, cost, and outcome distribution. (Partial: a plan-visualizer exists; the timeline+cost projection is still pending.)
Dual-run a /loop-based supervisor against the legacy supervisor for 30 days, compare outputs, then switch over when convergence is proven.
A learning agent that watches accepted vs rejected proposals and feeds the rejection rate back into proposal quality estimates.

Still open (carried forward from v2.1)

W1 — High availability. The system runs on a single VPS. A second-host failover is designed but not yet deployed.
W2 — Multi-provider abstraction. All paths currently assume Anthropic Claude as the model provider. A provider-agnostic layer is sketched but not implemented.
W3 — Telegram fallback channel. When Telegram is unreachable, the operator has no out-of-band notification path. A second channel (email, push, alternate IM) is open.

Open architecture questions

Workers as sub-agents vs sub-sessions? Current design isolates workers in their own tmux sessions and their own Claude Code instances. Alternative: workers as sub-agents inside the oracle, sharing the oracle's context. Tradeoff: sub-agents save tmux slots and dispatcher overhead but lose the context-isolation benefit and complicate the close-gate.
A richer goal primitive? If the platform raises the 4000-character limit on /goal (or introduces a plan-bound primitive), revisit the Phase 2 default-on revert.
Cross-project memory? Recall is currently scoped per project (omega-memory list --project=$PROJECT), although all rows live in one system-wide database. Should client projects share a common lessons-learned corpus, or stay isolated?
Ship pipeline for non-Vercel hosts. The deploy-verify step is currently Vercel-specific via API polling. Generalize to Fly.io, Render, Cloudflare Pages.
Mission Auditor calibration. The 85/100 score floor is uniform across mission types. Should it vary (e.g., 90 for ship because production risk is higher, 80 for docs because the cost of false positives outweighs the cost of a tolerable doc imperfection)? This requires accuracy data the system does not yet have.
Shadow observe-only escalation granularity. Today the FYI digest groups all observe-only signals per oracle into a single throttled message. Should specific signal patterns (e.g., repeated BUILD_REGRESSION on the same project) bypass the digest and escalate immediately, even on oracles? Trades responsiveness against the cost that prompted the asymmetry in the first place.
Skill discoverability. Eleven skills are wired into the orchestration chain; the broader catalog at ~/.claude/commands/ is now ~140 invocable commands (audits, builders, marketing tools, diagnostics). How should new operators discover what is invocable without reading every file? omega-overview.md and /listcmd are first answers; a generated, searchable skill catalogue is the obvious next step.

The judging standard

Every iteration of Omega is evaluated against four questions:

Did the system run without the operator babysitting it?
Did the system challenge a bad premise before coding it?
Did runtime evidence drive every conclusion?
Was the change surgical?

If any answer is "no", the iteration is incomplete — regardless of how much code shipped.

10 · Appendix — Technical reference

Session lifecycle (worker)

  Dispatch  ──▶  PRE-BOOT PACK injected (incl. W5 memory rows)
       │           Brief written atomically BEFORE paste (W6)
       ▼
  Read PLAN  ──▶  TodoWrite materialization (N items)
       │
       ▼
  Execute step 1 ──▶ update TodoWrite + progress.json
       │           Event written to tracking JSONL
       │           (tracking-reactor watches via inotify, W8)
       ▼
  Execute step 2
       │
       ⋮
       │
       ▼
  Run VERIFY COMMAND (must exit 0)
       │
       ▼
  worker-mark-done.sh done_clean '<summary>'
       │            (atomic tmp + mv to .done.json)
       ▼
  Mission Auditor (Layer 4)  ──▶  /audit-mission, 1-3 skills
       │                           min score ≥ 85/100 required
       ▼
  Oracle ack (close-gate)
       │
       ▼
  Memory write (W5)          ──▶  APPROVED → lesson
       │                           REJECTED iter≥2 → mistake
       ▼
  Schedule self-kill (5s)
       │
       ▼
  tmux session terminated

Failure recovery mesh (visual)

  ┌────────────────────────────────────────────────────────────┐
  │                                                            │
  │   Supervisor (cron */1)                                    │
  │   ├── reads oracle-*.done.json                             │
  │   ├── reads worker-*.done.json                             │
  │   ├── decides close / keep / alert                         │
  │   └── triggers notifications                               │
  │                                                            │
  │   Wake-on-worker-done (cron */2)                           │
  │   └── nudges oracle when worker .done.json un-acked        │
  │                                                            │
  │   Observer (cron */3)                                      │
  │   └── 6 failure modes M1–M6                                │
  │                                                            │
  │   Tracking Reactor (systemd user, inotify, W8)             │
  │   └── event-driven sub-second wake for stuck workers       │
  │       129 ms event → /resurrect cascade                    │
  │                                                            │
  │   Oracle-watchdog daemon                                   │
  │   └── detects oracle process death                         │
  │                                                            │
  │   Oracle-reaper daemon                                     │
  │   └── kills abandoned oracles past TTL                     │
  │                                                            │
  │   Worker-idle-supervisor daemon                            │
  │   └── workers with no tool calls past threshold            │
  │                                                            │
  │   RETIRED (W9): kairos.py::_nudge_oracle                   │
  │   └── replaced by /resurrect + tracking-reactor            │
  │                                                            │
  └────────────────────────────────────────────────────────────┘

State files (atomic write contract)

All state files in the system follow the same write pattern:

  Write          : tmp file in same directory, then mv -f to final
  Read           : open + lock-free read; staleness via mtime
  Update         : never in-place; always tmp + mv
  Cleanup        : grace window before deletion
  Naming         : namespaced by session for collision safety

W6 (2026-05-16) extends this contract to the brief-replay file specifically: it is now written before the tmux paste, not after, so a crash mid-paste still leaves a valid brief on disk.
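The tmp-then-rename contract translates directly into Python. The sketch below is a minimal illustration for JSON state files; the function name is hypothetical, and the fsync call is an extra durability step that the contract above does not mention.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, payload: dict) -> None:
    """Write payload to path so readers see the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # assumption: added for durability across crashes
        os.replace(tmp, path)      # atomic rename, the equivalent of mv -f
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Because readers are lock-free, the rename is the only synchronization point: a reader that opens the path before the rename gets the previous complete file.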

Done.json schema (worker)

{
  "session":         "string",
  "status":          "done_clean | pending | failed",
  "summary":         "one-line description",
  "commit":          "git sha or empty",
  "finished_at":     "ISO 8601",
  "todos_total":     "int",
  "todos_completed": "int",
  "pending_actions": ["list of strings"],
  "written_by":      "string (helper name)"
}
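A reader of the worker schema can check a document before acting on it. The validator below is a sketch: the field names, types, and status values come from the schema above, while the function name and the rule that todos_completed may not exceed todos_total are assumptions.

```python
from datetime import datetime
from typing import Any, Dict, List

WORKER_FIELDS = {
    "session": str, "status": str, "summary": str, "commit": str,
    "finished_at": str, "todos_total": int, "todos_completed": int,
    "pending_actions": list, "written_by": str,
}
STATUSES = {"done_clean", "pending", "failed"}


def validate_worker_done(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document is valid."""
    errors = []
    for key, expected in WORKER_FIELDS.items():
        if key not in doc:
            errors.append(f"missing: {key}")
        elif isinstance(doc[key], bool) or not isinstance(doc[key], expected):
            errors.append(f"wrong type: {key}")
    if errors:
        return errors  # later checks assume the basic shape is right
    if doc["status"] not in STATUSES:
        errors.append("invalid status")
    if not all(isinstance(a, str) for a in doc["pending_actions"]):
        errors.append("pending_actions must contain strings")
    if not 0 <= doc["todos_completed"] <= doc["todos_total"]:
        errors.append("todos_completed out of range")  # assumed invariant
    try:
        datetime.fromisoformat(doc["finished_at"].replace("Z", "+00:00"))
    except ValueError:
        errors.append("finished_at is not ISO 8601")
    return errors
```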

Done.json schema (oracle)

{
  "oracle":      "string",
  "project":     "string",
  "status":      "done_clean | pending | failed",
  "started_at":  "ISO 8601",
  "finished_at": "ISO 8601",
  "duration_sec":"int",
  "mission":     "string",
  "ship":        {
    "requested":      "bool",
    "result":         "ok | failed | skipped | frozen",
    "commit":         "git sha or empty",
    "push_url":       "string or empty",
    "deploy_url":     "string or empty",
    "deploy_status":  "string"
  },
  "pending_actions": ["list of strings"],
  "report_path":     "string or empty",
  "lifecycle":       "persistent | ephemeral"
}

The 17 forensic audits — quick reference

Audit	Domain	Raw scale	Question
code	Code quality	/420	Is the code SOLID?
flow	User flows	/400	Does the experience WORK?
uiux	Design system	/420	Is the interface BEAUTIFUL?
debug	Runtime bugs	/360	What is BROKEN right now?
feature	Completeness	/320	Is the product COMPLETE?
perf	Performance	/360	Is it FAST?
sec	Security	/400	Is it SECURE?
a11y	Accessibility	/320	Is it ACCESSIBLE?
seo	Search optim.	/400	Is it DISCOVERABLE?
data	Data integrity	/320	Is the data INTACT?
api	API contracts	/360	Is the API SOLID?
copy	Messaging	/280	Is the copy CLEAR?
dx	Dev experience	/320	Is the DX SMOOTH?
motion	Animation	/360	Is the motion PURPOSEFUL?
automation	Scheduling	/330	Are automations RELIABLE?
logic	System logic	/360	Is the logic OPTIMAL?
retention	Product/CPO	/400	What features are MISSING? (read-only)

All scores normalize to /100 for comparison across domains.
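Normalization is a simple rescale of the raw score by the audit's own scale. The sketch below takes the scales from the table above and applies the 85/100 floor stated for the Mission Auditor; whether the floor is applied per audit exactly this way is an assumption, and the function names are illustrative.

```python
SCALES = {
    "code": 420, "flow": 400, "uiux": 420, "debug": 360, "feature": 320,
    "perf": 360, "sec": 400, "a11y": 320, "seo": 400, "data": 320,
    "api": 360, "copy": 280, "dx": 320, "motion": 360, "automation": 330,
    "logic": 360, "retention": 400,
}


def normalize(audit: str, raw: float) -> float:
    """Rescale a raw audit score to /100, rounded to one decimal for display."""
    return round(100 * raw / SCALES[audit], 1)


def passes(audit: str, raw: float, floor: float = 85.0) -> bool:
    """Compare the unrounded normalized score against the floor."""
    return 100 * raw / SCALES[audit] >= floor
```

For example, a raw code score of 357/420 normalizes to 85.0 and passes, while a raw copy score of 237/280 normalizes to 84.6 and does not.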

The 11 wired skills — quick reference

Skill	Owner of	Token cost
`/classify-intent`	Inbound intent + ambiguous mission-type	~0 fast path / Haiku slow
`/dispatch-oracle`	AISB → Oracle brief assembly	Sonnet (one call per dispatch)
`/dispatch-worker`	Oracle → Worker prompt assembly	Sonnet (one per worker)
`/worker-protocol`	Worker self-contract	0 (read on boot)
`/omega-protocol`	Oracle process contract	0 (read on boot)
`/resurrect`	Tier-3 LLM stall recovery (opt-in)	Haiku
`/synthesize-report`	Worker done.json digest	template + Haiku (~50s budget)
`/format-telegram-report`	Telegram payload humanization	Haiku
`/audit-mission`	Close-gate audit selection / verdict	one audit at a time VPS-wide
`/plan-decompose`	Oracle complex-mission decomposition	Sonnet
`/diagnose`	On-demand pane snapshot	0

A note on extraction

Patch log — V2.3 (2026-05-17)

V2.3 is a hardening release. No new architectural surface — five surgical fixes plus instrumentation, all driven by a single incident.

The five fixes shipped in V2.3:

Mission Sweep-Completeness Gate in oracle-mark-done.sh. When the brief contains sweep keywords (all features, exhaustive, récursivement, 100/100, every page, 17 audits), the gate counts completed audit-worker .done.json files. If the count is below a configurable threshold (default 17 — one full Quality Arsenal pass), the gate refuses to let the oracle exit cleanly: it forces status=pending and adds an explicit pending action "sweep-incomplete: continue dispatching". The patrol then keeps the oracle alive across cycles and queues a resume_pending event in the oracle inbox. Failure becomes visible.
Worker Death Logger (worker-death-logger.sh, every two minutes). The system now keeps a rolling snapshot of alive worker sessions and detects workers that vanished without producing a .done.json — silent-kill events that previously left no trace. Each detection is logged to worker-silent-kills.jsonl with session name, parent oracle, dispatch timestamp, and age-since-dispatch. Observability only; no automatic mitigation yet (intentional — collect data first).
Smart-Check Observer rewrite in omega-resurrect.sh. The previous observer fired nudges on cron-driven idle detection alone, which produced false positives on oracles that were actively thinking. The new observer checks six independent signals before allowing any nudge: kill-switch state, pane-active heuristics (Claude thinking-verb detection), live worker presence on the same project, tracking-event mtime within ten minutes, decisions.md mtime within thirty minutes, and explicit pending_actions content. All six must pass before a nudge is allowed. Recorded result over a 60-minute window after deployment: 100 skips, 0 parasitic nudges.
AISB async wire stable. The bot-side skill subprocess invocation moved from blocking subprocess.run to asyncio.create_subprocess_exec. Stability over five hours of uptime confirms the deadlock that previously crashed the bot every ~40 minutes is gone.
Passive Telegram digest (omega-oracle-digest.sh, 20:00 UTC daily). Replaces the legacy auto-nudge loop. One scheduled message per day summarizes all active oracles, worker outcomes, Mission Auditor verdicts, system health, and detected silent kills. The operator decides what to intervene on — the system no longer guesses.
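The Mission Sweep-Completeness Gate from the first fix can be sketched as a function over the brief text and the count of completed audit workers. The keywords, the default threshold of 17, the forced pending status, and the pending-action string come from the description above; the function name, the case-insensitive matching, and the tuple return shape are assumptions.

```python
from typing import List, Optional, Tuple

SWEEP_KEYWORDS = ("all features", "exhaustive", "récursivement",
                  "100/100", "every page", "17 audits")
SWEEP_MARKER = "sweep-incomplete: continue dispatching"


def sweep_gate(brief: str, completed_audit_workers: int, threshold: int = 17,
               status: str = "done_clean",
               pending_actions: Optional[List[str]] = None) -> Tuple[str, List[str]]:
    """Return the (status, pending_actions) the oracle is allowed to exit with."""
    actions = list(pending_actions or [])
    text = brief.lower()  # assumption: keywords match case-insensitively
    is_sweep = any(keyword in text for keyword in SWEEP_KEYWORDS)
    if is_sweep and completed_audit_workers < threshold:
        if SWEEP_MARKER not in actions:
            actions.append(SWEEP_MARKER)
        return "pending", actions
    return status, actions
```

A brief phrased without any trigger word passes straight through, which is exactly the false-negative risk that keyword heuristics carry.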

Patch log — V2.4 (2026-05-17, Lifecycle-Hardened Edition)

The forensic findings (audit pass, 2026-05-17). A targeted survey of every cron-driven, bot-driven, and operator-driven kill site found four classes of bypass:

Twelve tmux kill-session call sites in bash scripts that did not route through close-gate.sh. Some checked alive state alone (worker idle ≠ todos done). Some wrote synthetic done.json files and then killed the session — a false attestation.
An inverted-logic regression in worker-close-check.sh that made the gate always block, forcing patrol scripts to bypass it entirely to ever kill anything. The bypass became the de facto behavior; the gate became dead code.
Three Python kill sites in the Telegram bot triggered by operator "Close oracle + workers" buttons. These bypassed the gate with no audit trail.
An anti-pattern in twenty per-project oracle system prompts instructing the LLM to consider kill+restart of a stalled worker, in direct contradiction of the close-gate guarantee.

The six fixes shipped in V2.4:

Safe-Kill universal gate (~/.aisb/lib/omega-v2/safe-kill.sh). One wrapper every kill path must traverse. Refuses to kill sessions whose progress.json shows pending todos, whose done.json is missing, or that have not been acked. Protected sessions (Home / AISB / tunnels / Omega infrastructure) are immortal even with --force. The --force flag exists for legitimate emergencies (claude crash with state preserved on disk, operator explicit abort) and is always audited — every forced kill writes a marker plus a kill.forced lifecycle event with reason and caller.
Lifecycle event log (~/.aisb/state/events/lifecycle.events.log). A unified append-only JSON-lines log of every dispatch, heartbeat, todo update, mark-done, ack, block declaration, and kill decision. Replaces forensic archaeology across ten scattered state files when debugging "what happened to session X?".
Orchestration-aware OBSERVER (oracle-shadow.sh, oracle-observer.sh). The previous STAGNATION signal fired on time alone — an oracle that had been idle past a 12-hour floor was nudged regardless of whether it was correctly waiting for in-flight workers. The new classifier reads workers.txt, cross-checks each worker's tmux state, progress.json mtime, and recent lifecycle events, and classifies the oracle as one of five states: WAITING_FOR_WORKERS, PENDING_TRIAGE, IDLE_AFTER_BATCH, GENUINELY_STUCK, NO_WORKERS_EVER. Only the last two qualify as emergencies. The M5 rule for "stuck worker" detection was tightened from one signal (tracking events) to three concordant signals (tracking + lifecycle + heartbeat snapshot); a worker running a slow build no longer trips it.
Inverted-logic fix in the close gate (worker-close-check.sh). The branch that was supposed to BLOCK while the worker is still working fired instead when the worker was idle; this is the regression that made the gate dead code. The fix replaces the silent shell short-circuit that hid the inversion with an explicit exit-code branch test (AC_RC -ne 0). Without this fix, every layer above the gate was running uphill.
Bot Python kill paths gated (bot/aisb/handlers.py). The three sites where operator-clicked "Close oracle + workers" buttons killed sessions now route through safe-kill.sh --force. Behavior is unchanged for the operator (their click still closes), but every close is now logged in kill-forced-<session>.json markers plus the lifecycle event log. Operators retain the audit trail of what they killed and when.
Per-project oracle prompts re-aligned. Twenty per-project oracle system prompts contained a kill+restart recommendation for stalled workers, a direct contradiction of the close-gate guarantee. The recommendation was replaced with: "investigate first (tmux capture-pane); if truly stuck and the close-gate allows it, use safe-kill.sh; never kill workers with unfinished progress."
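The Safe-Kill rules in the first fix can be read as a small decision function. The Python sketch below is illustrative only: the real gate is the shell wrapper safe-kill.sh, and the `Session` type, its field names, and the shape of the returned dictionary are assumptions made for this example. Only the refusal conditions, the immortality of protected sessions, and the `kill.forced` event name come from the text.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical snapshot of the on-disk state the gate would inspect."""
    name: str
    pending_todos: int       # unfinished todos recorded in progress.json
    done_exists: bool        # done.json present on disk
    acked: bool              # completion has been acknowledged
    protected: bool = False  # Home / AISB / tunnels / Omega infrastructure


def safe_kill_decision(s: Session, force: bool = False,
                       reason: str = "", caller: str = "") -> dict:
    """Return the gate decision. This sketch never performs the kill itself."""
    # Protected sessions are immortal, even with --force.
    if s.protected:
        return {"allow": False, "why": "protected"}

    blockers = []
    if s.pending_todos > 0:
        blockers.append("pending_todos")
    if not s.done_exists:
        blockers.append("done_missing")
    if not s.acked:
        blockers.append("not_acked")

    if not blockers:
        return {"allow": True, "why": "clean"}
    if force:
        # Forced kills are allowed but always audited with reason and caller.
        return {"allow": True, "why": "forced",
                "event": {"type": "kill.forced", "session": s.name,
                          "reason": reason, "caller": caller,
                          "overrode": blockers}}
    return {"allow": False, "why": ",".join(blockers)}
```

Keeping the decision separate from the kill makes every path (patrol, watchdog, bot button) testable against the same rules before any process is touched.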

  Path                                  Gate                Audit
  ─────────────────────────────────     ──────────────      ─────────
  patrol.sh (7 sites)                   safe-kill           lifecycle event
  reclaim-stale.sh                      safe-kill           lifecycle event
  oracle-watchdog.sh respawn cycle      safe-kill --force   kill-forced marker
  bot/handlers.py operator "close"      safe-kill --force   kill-forced marker
  bot/handlers.py "close_oracle"        safe-kill --force   kill-forced marker
  oracle-shadow.sh STAGNATION           orchestration_state (observe-only)
  oracle-observer.sh M5 worker stuck    3-signal classifier (observe-only)
  heartbeat-watch.sh                    no kill — emits block.declared
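The two observe-only rows above rest on two simple rules: only two of the five oracle states count as emergencies, and a worker is flagged as stuck only when all three signals agree. A minimal Python sketch of both rules follows. The function names, the timestamp arguments, and the 900-second staleness threshold are assumptions for illustration; the document does not state the real thresholds.

```python
# The two oracle states the text treats as emergencies.
EMERGENCY_STATES = frozenset({"GENUINELY_STUCK", "NO_WORKERS_EVER"})


def is_emergency(oracle_state: str) -> bool:
    """True only for the states that warrant intervention."""
    return oracle_state in EMERGENCY_STATES


def worker_looks_stuck(now: float, last_tracking: float,
                       last_lifecycle: float, last_heartbeat: float,
                       stale_after_s: float = 900.0) -> bool:
    """Three-signal concordance: every signal must be stale.

    stale_after_s is a hypothetical threshold, not a documented value.
    """
    ages = (now - last_tracking, now - last_lifecycle, now - last_heartbeat)
    return all(age > stale_after_s for age in ages)
```

A worker running a slow build emits no tracking events but still refreshes its heartbeat, so one fresh signal is enough to keep it from being classified as stuck.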

Honest gaps that V2.4 did not yet close (partly addressed in V2.5).

The Mission Sweep-Completeness Gate from V2.3 still relies on keyword heuristics. False negatives on missions phrased without trigger words remain possible — unchanged in V2.5.
Workers spawned before V2.4 do not have lifecycle state — they fall back to the (now-correctly-working) legacy gate which is permissive when no todo.json exists. Coverage grows as new dispatches arrive.
Inconsistent-state workers from pre-V2.4 — addressed in V2.5 by the deadman switch (auto-mark-pending after 10min idle) and audit-mode bypass.

Patch log — V2.5 (2026-05-18, Adversarial-Validated Edition)

The four V2.5 fixes, summarized:

FIX-CONSUMED — worker-close-check.sh now accepts done.json.consumed (the post-patrol-ack form of done.json). Pre-V2.5, the close-check looked only for done.json and emitted exit 3 BLOCK once the file had been consumed, producing zombie sessions for legitimately-completed workers. Cascade-cleaned four pre-existing zombies on deployment.
FIX-AUDITMODE — Quality Arsenal audit workers (/codeaudit, /secaudit, /debugaudit, etc.) track todos internally via Claude Code's native TodoWrite and don't call omega-todo declare. The pre-V2.5 declared=false guard, which is correct for anonymous workers, stalled them as an unintended side effect. V2.5 distinguishes audit-mode workers (canonical done.json + skill marker) from anonymous undeclared workers: audit close is allowed and attacker resistance is preserved. Verified by regression tests on twelve adversarial probes.
FIX-DEADMAN — patrol.sh now runs a deadman switch each cycle. Any worker with progress.json mtime older than 10 minutes, idle at ❯ prompt, with no done.json on disk, gets an auto-written status=pending plus reason "deadman: idle Nmin without mark-done" and a safe-kill via the gate. Belt-and-suspenders on top of the audit-mode bypass.
FIX-MODAL — omega-v2/modal-dismisser.sh (new) auto-dismisses six known Claude Code blocking modals (How is Claude doing?, Auto-update available, Press Esc to skip, etc.) by sending the safe dismiss key (Escape by default). Invoked early in each patrol cycle. Protected sessions (Home*, AISB-master, Tunnel*) are never touched. Every dismissal writes a JSONL audit entry. Live-tested with a fake modal session — detection + dispatch + log all confirmed.
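The deadman rule in FIX-DEADMAN reduces to a pure predicate over three observations. The sketch below is an illustration in Python, not the patrol.sh implementation: the function name and its arguments are assumptions, and it only builds the record described in the text rather than writing any file or invoking safe-kill.

```python
DEADMAN_IDLE_S = 10 * 60  # the 10-minute idle floor from the text


def deadman_check(progress_mtime: float, now: float,
                  idle_at_prompt: bool, done_exists: bool):
    """Return the pending record to write, or None if the worker is left alone.

    progress_mtime: modification time of progress.json (epoch seconds)
    idle_at_prompt: the session is sitting idle at the prompt
    done_exists:    done.json is present on disk
    """
    # A finished or still-active worker is never touched.
    if done_exists or not idle_at_prompt:
        return None
    idle_s = now - progress_mtime
    if idle_s <= DEADMAN_IDLE_S:
        return None
    minutes = int(idle_s // 60)
    return {"status": "pending",
            "reason": f"deadman: idle {minutes}min without mark-done"}
```

In this reading, the caller would persist the returned record and then route the session through the Safe-Kill gate, so the deadman never kills on its own authority.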

11 · Adversarial validation results (V2.5)

Final scores (iter 3).

Audit	Score	Verdict	Confidence
`/codeaudit`	100 / 100	PASS	high
`/debugaudit`	100 / 100	PASS	high
`/secaudit`	100 / 100	PASS	high

Production bugs found and fixed during validation (the V2.5 deltas).

Patch log — V2.5.1 (2026-05-22, Patterns Edition)

The five operational patterns now wired into Omega:

Source-as-Context (mount-package-source.sh). Workers used to hallucinate third-party APIs from training data — First Law had no leverage at write time, only at runtime. The fix mounts the real source code of each dependency under ~/.aisb/refs/repos/<pkg>/ as a shallow git clone, and dispatch-to-session.sh calls mount-package-source.sh --inject <project> after mission-init.sh so every worker boots with REF_SOURCES pointing at real code. The worker self-contract now requires grep $REF_SOURCES before any third-party API call. Twenty-six packages mapped, seven live on the VPS, ~150 MB total. Monthly cron prunes mounts unused for thirty days.
Cleanup-Wave (cleanup-wave.sh). Quality Arsenal audits used to find duplications after commit, paying refactor cost post-facto. The cleanup wave inserts between fix and re-audit in the DAG (oracle-prompt.sh rule R-26), with a tight scope contract (no renames, diff cap, behavior-preserving). Halt reasons land in grep-loop-<session>.halt.json for postmortem. Manual invocation for now — auto-injection on the v3.3 backlog.
Bounded Grep-Loop (grep-loop.sh). Audit Step 8b (fix-and-reaudit) previously had no exit guarantee — workers could spin on subjective LLM judgement. The grep-loop wrapper enforces an objective verify command (exit code, not "does it look good now?"), a max-iteration budget, and a scope-creep gate (cumulative diff threshold). Forbidden in the contract: LLM calls as gate. Exit codes: 0 verify_clean · 2 max_iter · 3 scope_creep · 4 worker_died.
Effect-TS Schedule retries (schedule.sh). Three declarative policies in bash, mirroring the Effect-TS shapes: exponential backoff with jitter, fixed-interval polling, one-shot with timeout. Wired into telegram.sh send_message (no more silent drop on 429/5xx), ship/push.sh (transient GitHub failures), ship/deploy.sh (timeout 600s), ship/verify.sh (poll deploy URL until 200, every 15s for up to 10 min). Every attempt is logged to ~/.aisb/logs/schedule.jsonl with policy name, attempt number, return code, duration, and halt reason.
Ship Pipeline as capability blocks (ship/{build,commit,push,deploy,verify,orchestrate}.sh). The old oracle-ship.sh was a 559-line monolith mixing build, commit, push, deploy, verify, freeze, rollback, telegram and state. V2.5.1 decomposes it into six blocks. Each block takes explicit --flag args, emits exactly one JSON object on stdout, and uses exit codes 0 ok / 1 failed / 2 halt. Opt-in via OMEGA_SHIP_V32=1 or .orchestrator/ship-config.json: {"use_v32_pipeline": true}. Default OFF — v3.3 will do the cutover once live confidence accumulates. Purely additive in the meantime.
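The Bounded Grep-Loop contract (pattern #3) can be sketched in a few lines. This Python version is an assumption-laden illustration of the control flow, not a port of grep-loop.sh: `verify` and `fix_step` are hypothetical callables, and measuring scope creep as a cumulative changed-line count is one possible reading of "cumulative diff threshold". The exit codes are the ones listed above.

```python
EXIT_VERIFY_CLEAN = 0
EXIT_MAX_ITER = 2
EXIT_SCOPE_CREEP = 3
EXIT_WORKER_DIED = 4


def grep_loop(verify, fix_step, max_iter: int, max_diff_lines: int) -> int:
    """Run fix iterations until an objective check passes or a budget trips.

    verify():   returns an exit code; 0 means clean. No LLM judgement here.
    fix_step(): returns the number of lines changed, or None if the worker died.
    """
    total_diff = 0
    for _ in range(max_iter):
        if verify() == 0:
            return EXIT_VERIFY_CLEAN
        changed = fix_step()
        if changed is None:
            return EXIT_WORKER_DIED
        total_diff += changed
        # Scope-creep gate: the cumulative diff must stay under the cap.
        if total_diff > max_diff_lines:
            return EXIT_SCOPE_CREEP
    # Budget exhausted: one last objective check decides the outcome.
    return EXIT_VERIFY_CLEAN if verify() == 0 else EXIT_MAX_ITER
```

In practice `verify` would wrap a real command and return its exit status, for example a test run or a grep for the offending pattern; the loop terminates in at most `max_iter` iterations regardless of what the worker does.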

Where each pattern fires automatically today:

Trigger	Pattern(s) applied
Every worker dispatch	#1 Source-as-Context (`--inject`)
Every Telegram message	#4 Schedule retries (`telegram.sh`)
Every ship pipeline (when opt-in)	#5 Capability blocks + #4 internal retries
Audit Step 8b (recommended)	#3 Grep-Loop with objective verify
Pre-audit cleanup (recommended)	#2 Cleanup-Wave (wrapped by #3)
Monthly cron	`mount-package-source.sh --prune-stale`
End of `bash setup` + on demand	`omega-doctor.sh`
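The first of the three retry policies behind pattern #4, exponential backoff with jitter, can be sketched as follows. This is a Python illustration rather than the bash in schedule.sh. The function signature, the default budgets, the log key names, and the choice of the "full jitter" variant (sleeping a random fraction of the capped delay) are all assumptions; only the logged quantities (policy name, attempt number, return code, duration) come from the text.

```python
import random
import time


def retry_exponential(op, max_attempts: int = 5, base_s: float = 1.0,
                      cap_s: float = 30.0, sleep=time.sleep,
                      rng=random.random, log=None):
    """Call op() until it returns 0 or the attempt budget is spent.

    op():  returns an integer return code, 0 meaning success.
    log:   optional list that receives one record per attempt.
    Returns (last_return_code, attempts_used).
    """
    rc = 1
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        rc = op()
        if log is not None:
            log.append({"policy": "exponential", "attempt": attempt,
                        "rc": rc, "duration_s": time.monotonic() - start})
        if rc == 0:
            return rc, attempt
        if attempt < max_attempts:
            # Delay doubles each attempt, capped, then scaled by a random factor.
            delay = min(cap_s, base_s * 2 ** (attempt - 1))
            sleep(delay * rng())
    return rc, max_attempts
```

Injecting `sleep` and `rng` keeps the policy deterministic under test, and the per-attempt log records make a silent drop on a transient 429 or 5xx visible after the fact.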

Status table (V2.5.1).

Pattern	State
#1 Source-as-Context	Shipped + auto-invoked at dispatch
#4 Schedule retries	Shipped + auto-invoked on Telegram + ship
Hermes / omega-doctor	Shipped + invocable via `omega doctor` and setup
#2 Cleanup-Wave	Shipped + documented for manual invocation
#3 Bounded Grep-Loop	Shipped + documented for manual invocation
#5 Ship blocks	Shipped opt-in (default OFF until v3.3 cutover)

All integrations are purely additive — no existing script changes its default behavior in V2.5.1.

E2E layer-alive check (V2.5.1). Same 21-probe surface as V2.5, plus six new probes (one per pattern + omega-doctor.sh executable). Latest run before publication: 27 PASS / 0 FAIL.

End of document — version 2.5.1 · Patterns Edition · 2026-05-22
