# The Factory Runtime (v0.9.0 — Smoke Test)
Appendix · Technical depth. This chapter is for operators or engineers deploying the brain themselves. It covers implementation details that aren't needed for understanding the conceptual architecture. Skip if you're reading for strategy.
## Why this chapter exists
Chapter 17 shipped the Agent Factory Pattern — 10 sub-agent SOULs plus factory.yml declaring their contracts. Chapter 18 shipped the LLM Provider Abstraction — 5 providers, brand-owned keys, per-sub-agent routing. Both were static artifacts: beautiful templates that didn't actually do anything.
Chapter 19 is the smallest viable runtime that makes the factory execute. One command. One input. One trace. You can see the 10 sub-agents run end-to-end against your configured LLMs.
```bash
operai-init factory run-once \
  --domain cs \
  --input sample-ticket.json \
  --output trace.md
```
That's it. No webhooks, no daemon, no event queue. Just: given a ticket, dispatch through the 10 sub-agents, merge their outputs, print a structured trace. Exactly enough to prove the factory works.
## The v0.9.0 scope (and what's next)

| Feature | v0.9.0 | v0.9.1 | v0.9.2 | v0.9.3 |
|---|---|---|---|---|
| Manual `run-once` CLI | ✅ | ✅ | ✅ | ✅ |
| Sequential sub-agent execution | ✅ | — | — | — |
| Parallel execution (`max_parallel`) | — | ✅ | ✅ | ✅ |
| Trace to markdown + JSON | ✅ | ✅ | ✅ | ✅ |
| Mock-LLM mode (offline) | ✅ | ✅ | ✅ | ✅ |
| Event-queue daemon | — | ✅ | ✅ | ✅ |
| Webhook receivers (Richpanel/Gorgias/Zendesk) | — | — | ✅ | ✅ |
| Action executor (send reply via helpdesk API) | — | — | — | ✅ |
| Retries + circuit breakers | — | — | — | ✅ |
Ship one, prove it, ship the next.
## The components shipped in v0.9.0

Four new modules in the repo under `operai_init/factory_runtime/`:

| Module | Lines | Role |
|---|---|---|
| `config.py` | ~230 | Parses `factory.yml` (minimal-YAML stdlib parser, no PyYAML dep), validates that SOULs exist, and resolves `default_llm` with per-sub-agent overrides |
| `executor.py` | ~130 | `execute(sub_agent, factory_config, input_dict)` — reads the SOUL, resolves the LLM, calls `llm.chat()` with `json_mode=True`, validates output against the declared schema, returns a `SubAgentResult` |
| `orchestrator.py` | ~75 | `run_once(fc, event)` — sequential loop over sorted sub-agents, merges each output into the shared context, extracts the final action from `escalation-scorer` |
| `trace.py` | ~90 | Markdown + JSON writers for `OrchestrationResult` |

Plus CLI glue (`cmd_run_once` in `factory.py`) and 3 sample fixtures (`refund-request.json`, `complaint-angry.json`, `vip-product-question.json`).

Zero external dependencies. The YAML parser is custom (fit-for-purpose — it understands our `factory.yml` shape and nothing else). The LLM client is already stdlib-only from Ch. 18.
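To make the `default_llm`-with-overrides resolution concrete, here is a minimal sketch of the logic `config.py` performs. The function and field names (`resolve_llm`, `sub_agents`, `llm`) are illustrative assumptions, not the actual `operai_init` API:

```python
# Hypothetical sketch: factory.yml declares a factory-wide default_llm,
# and any sub-agent entry may override it with its own "provider/model".
from dataclasses import dataclass

@dataclass
class LLMRef:
    provider: str
    model: str

def resolve_llm(factory_cfg: dict, sub_agent: str) -> LLMRef:
    """Return the per-sub-agent override if present, else the default."""
    ref = (factory_cfg.get("sub_agents", {}).get(sub_agent, {}).get("llm")
           or factory_cfg["default_llm"])
    provider, model = ref.split("/", 1)   # e.g. "openai/gpt-4o-mini"
    return LLMRef(provider=provider, model=model)

cfg = {
    "default_llm": "openai/gpt-4o-mini",
    "sub_agents": {"triage": {"llm": "gemini/gemini-2.5-flash"}},
}
```

With this shape, `triage` resolves to the Gemini override while every other sub-agent falls back to the factory default.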
## What the run-once flow does

Given a sample ticket:

```json
{
  "raw_ticket": "Hola, he recibido mi pedido hace 10 días...",
  "customer_email": "<email:subject_...>",
  "order_value_eur": 80.00,
  "order_age_days": 10,
  "brand_voice": "Warm, direct, customer-first..."
}
```
The orchestrator:

- Loads `/opt/operai/agents/cs/factory/factory.yml`
- Sorts sub-agents by declared `order`
- Iterates over each sub-agent:
  - Builds the input dict by extracting the declared `inputs` keys from the shared context
  - Calls `llm.chat(system=SOUL, user=json.dumps(input), json_mode=True)` with the provider/model resolved from `factory.yml`
  - Parses the JSON response and validates that the declared output keys are present
  - Merges the output into the shared context
- Reads `escalation-scorer.output` for the final `action`
- Reads `drafter.output` for the `draft_reply`
- Writes a full markdown (or JSON) trace
Total latency for CS factory mock run: ~1.5s (sequential). Real-LLM latency depends on provider; typically 5-10s for 10 sequential calls.
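The sequential loop above can be sketched in a few lines. This is a toy model, not the real `orchestrator.py`; the `call_sub_agent` callable and the sub-agent dict keys are assumptions for illustration:

```python
# Minimal sketch of the sequential run-once loop: dispatch sub-agents
# in declared order, give each only its declared input keys, and merge
# every output into a shared context so later steps see earlier results.
def run_once(sub_agents, event, call_sub_agent):
    context = dict(event)
    trace = []
    for sa in sorted(sub_agents, key=lambda s: s["order"]):
        inputs = {k: context[k] for k in sa["inputs"] if k in context}
        output = call_sub_agent(sa["name"], inputs)
        context.update(output)          # later sub-agents see this output
        trace.append({"sub_agent": sa["name"], "output": output})
    return context, trace
```

The key property is the shared context: `drafter` can consume the `category` that `triage` produced without any explicit wiring between the two.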
## Trace output format

The markdown trace for a ticket looks like this (abbreviated):

````markdown
# Factory trace — refund-request

## Meta
- Domain: `cs`
- Sub-agents invoked: 10
- Total latency: 9,234 ms
- Total cost: $0.002871
- Overall OK: True

## Input event
```json
{...ticket...}
```

## Sub-agent execution

### [1] ✓ triage
- Model: `openai/gpt-4o-mini`
- Latency: 412 ms · Tokens: 320/45 · Cost: $0.000120
- Input: `{"raw_ticket": "..."}`
- Output: `{"category": "refund", "priority": "P3", "sentiment": -0.4, "language": "es"}`

### [2] ✓ policy-lookup
...

## Summary
- Final action: `human_review`
- Rationale: Refund eligible with amount >€50 → policy requires human check.
- Draft reply:
  > Hola María, gracias por tu mensaje. Hemos revisado...
````
The trace is designed to be **pasted into a code review or PR**. A T-shaped CS specialist can read the chain in under 30 seconds and approve or escalate.
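A trace writer in the spirit of `trace.py` is mostly string assembly. The sketch below is a toy version; the result-dict field names (`fixture`, `steps`, `latency_ms`) are assumptions based on the example above, not the actual `OrchestrationResult` schema:

```python
# Toy markdown trace writer: render a result dict into the trace
# layout shown above (meta block, then one section per sub-agent).
import json

def write_trace_md(result: dict) -> str:
    lines = [f"# Factory trace — {result['fixture']}", "", "## Meta"]
    lines.append(f"- Sub-agents invoked: {len(result['steps'])}")
    lines.append(f"- Total latency: {result['latency_ms']:,} ms")
    lines.append("")
    lines.append("## Sub-agent execution")
    for i, step in enumerate(result["steps"], 1):
        mark = "✓" if step["ok"] else "✗"
        lines.append(f"### [{i}] {mark} {step['name']}")
        lines.append(f"- Output: `{json.dumps(step['output'])}`")
    return "\n".join(lines)
```

Because the output is plain markdown, it diffs cleanly in git and pastes directly into a PR description.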
## Mock mode for offline smoke tests
```bash
operai-init factory run-once --domain cs --input sample.json --mock-llm
```
The `--mock-llm` flag skips the LLM entirely and uses canned responses keyed by sub-agent name. Useful for:
- CI tests (no API key needed)
- First-time founders testing the install before configuring providers
- Demos where you want deterministic output
Mock responses live in `operai_init/factory_runtime/executor.py` → `_MOCK_RESPONSES`. They're representative but obviously not real inference.
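The shape of such a canned-response table is simple enough to sketch. The entries below are made up for illustration and are not the actual `_MOCK_RESPONSES` contents:

```python
# Illustrative canned-response table: deterministic outputs keyed by
# sub-agent name, so CI runs and demos need no API key at all.
_MOCK = {
    "triage": {"category": "refund", "priority": "P3", "language": "es"},
    "drafter": {"draft_reply": "Hola, gracias por tu mensaje..."},
}

def mock_chat(sub_agent: str) -> dict:
    """Return the canned response, or a generic OK for unknown agents."""
    return _MOCK.get(sub_agent, {"ok": True})
```

Determinism is the point: the same fixture always yields the same trace, which is exactly what a CI assertion needs.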
## Pre-flight enforcement

Without `--mock-llm`, `run-once` checks that `operai-init llm configure` has been completed. If no providers are configured, it refuses with a clear pointer:

```
✗ no LLM providers configured. Run: operai-init llm configure (or use --mock-llm)
```

This is the same pre-flight check that `operai-init status` shows — consistent across the surface.
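The gate itself is a one-branch check. A hedged sketch, where the function name and the `providers_configured` list are assumptions for illustration:

```python
# Sketch of the pre-flight gate: refuse to run against real providers
# when none are configured, unless mock mode is explicitly requested.
def preflight(providers_configured: list, mock_llm: bool) -> tuple:
    if mock_llm or providers_configured:
        return True, ""
    return False, ("✗ no LLM providers configured. "
                   "Run: operai-init llm configure (or use --mock-llm)")
```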
## What you can actually do with v0.9.0

Concrete workflows unblocked today:

- Smoke-test your install end-to-end. Run the mock. If you see 10 sub-agents complete and a draft reply generated, the repo is wired correctly.
- Tune a SOUL and validate. Edit `agents/cs/factory/sub-agents/drafter/SOUL.md`, re-run, and compare the new draft against the old. No systemctl reload, no redeploy.
- Compare LLM providers per sub-agent. Edit `factory.yml` to swap `triage` from `openai/gpt-4o-mini` to `gemini/gemini-2.5-flash`. Run both. Compare output quality and cost.
- Debug a specific sub-agent. Use `--limit 3` to run only the first 3 sub-agents. Fast iteration while you tune the early steps.
- Generate regression fixtures. Run on canonical tickets, capture traces, commit them to git. Future SOUL changes can be diffed against known-good outputs.
These five are enough to replace what a T-shaped CS specialist would otherwise do with ad-hoc LLM playgrounds (ChatGPT tab, Anthropic Console, Gemini AI Studio), because the factory runtime routes to their configured providers, applies the full SOUL chain, and produces auditable traces.
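The regression-fixture workflow amounts to diffing a fresh JSON trace against a committed known-good one. A minimal sketch, assuming the trace JSON has a `steps` list with `name` and `output` keys (an assumption based on the trace format above):

```python
# Sketch: report which sub-agents' outputs changed between a
# known-good trace and a fresh run, so a SOUL edit's blast radius
# is visible at a glance.
def diff_traces(old: dict, new: dict) -> list:
    """Return names of sub-agents whose output differs between runs."""
    old_by_name = {s["name"]: s["output"] for s in old["steps"]}
    changed = []
    for step in new["steps"]:
        if old_by_name.get(step["name"]) != step["output"]:
            changed.append(step["name"])
    return changed
```

An empty result means the SOUL change was behavior-preserving on that fixture; anything else tells you exactly which step to inspect.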
## What v0.9.0 does NOT do (deliberately)

- No webhooks: Richpanel/Gorgias/Zendesk don't POST here yet. v0.9.2.
- No daemon: the runtime is not a long-running process. Manual `run-once` only. v0.9.1.
- No action execution: we print the recommendation. We don't send the reply. The founder reviews the trace, decides, and acts via the helpdesk UI. v0.9.3.
- No parallel dispatch: sub-agents run strictly in declared `order`. v0.9.1 adds `max_parallel` respect.
- No retries: failed sub-agents log the error and the chain continues with a partial context. Robust retry logic + circuit breakers land in v0.9.3.
- No brain auto-lookup: the input JSON must include pre-loaded fields like `brand_voice`, `applicable_policies`, `customer_order_history`. Automatic brain lookups (via `brain_query`) come in v0.9.1.
- No cost budget enforcement: `factory.yml` declares `cost_budget_per_ticket_eur`, but v0.9.0 doesn't check it. v0.9.3.
Each limitation is a known-scope carry-over, not a bug. If the founder wants any of these today, the honest answer is Custom Engagement (Ch.13 Path 3b).
## The commercial framing

Runtime v0.9.0 now delivers a real, demonstrable system end-to-end. A founder buying the open-source repo can, in the same afternoon:

1. `curl usecompai.com/init | bash` → swarm infrastructure
2. `operai-init llm configure` → wire their providers
3. `operai-init factory enable --domain cs` → deploy the factory templates
4. `operai-init factory run-once --domain cs --input demo.json` → see 10 sub-agents execute against their LLMs, produce a draft reply, and emit a trace
The story is complete on day one. What ships across v0.9.1–v0.9.3 is the "lights-out" automation layer — valuable, but not the part that makes the product demonstrable.
## Roadmap pointer

- v0.9.0 (this chapter): `run-once` smoke test
- v0.9.1: parallel dispatch + event-queue daemon + brain auto-lookup
- v0.9.2: webhook receivers for 4 helpdesks
- v0.9.3: action executor via the Guardrail meta-agent + retries + cost-budget enforcement
At that point, a deployed compAI processes real tickets end-to-end at €0.002 each, with M-shaped supervisors reviewing ~20% of outputs, and 7-domain-agent parity with the McKinsey "50-100 specialized agents" benchmark.
Fork the repo, read the playbook, and adapt the artifacts to your own stack. For hands-on help, email hello@usecompai.com.