The LLM Provider Abstraction
Appendix · Technical depth. This chapter is for operators or engineers deploying the brain themselves. It covers implementation details that aren't needed for understanding the conceptual architecture. Skip if you're reading for strategy.
Why this chapter exists
compAI promised, in Chapter 15 (Pillar 5 — Dynamic Sourcing), that the swarm would never lock a brand to a single LLM vendor. repo v2.8 ships the surface that makes that promise real: a unified LLM client that speaks to five providers today and is designed to add more without changing a single line of agent code.
This matters for three reasons:
- Your LLM bill is yours. compAI never touches your API keys. Each brand configures its own Anthropic / OpenAI / Gemini / Qwen / MiniMax account. The VPS stores those keys at mode 600. We sell software; you pay for inference.
- Vendor outage ≠ swarm outage. Fallback chains let the runtime route around a 429 / 5xx automatically. If Anthropic goes down, the CS factory keeps answering tickets on OpenAI. This is the operational translation of McKinsey's "dynamic sourcing."
- Cost optimization at the sub-agent level. Not every LLM call needs Claude Opus. Your triage sub-agent can run on GPT-4o-mini at $0.15/M tokens. Your drafter can run on Sonnet-4.5 for brand-voice quality. Your language-detect can run on Gemini 2.5 Flash's free tier. The savings compound.
The five providers in v2.8
| Provider | Models shipped | Where to get a key |
|---|---|---|
| Anthropic | haiku-4.5 ($1/$5), sonnet-4.5 ($3/$15), opus-4.7 ($15/$75) | console.anthropic.com/settings/keys |
| OpenAI | gpt-4o-mini ($0.15/$0.60), gpt-4o ($2.50/$10), gpt-5-mini ($0.50/$2), gpt-5 ($10/$30) | platform.openai.com/api-keys |
| Google Gemini | gemini-2.5-flash ($0.075/$0.30), gemini-2.5-pro ($1.25/$5) | aistudio.google.com/apikey |
| Alibaba Qwen | qwen-turbo ($0.05/$0.20), qwen-plus ($0.40/$1.20), qwen-max ($2/$6) | dashscope.console.aliyun.com |
| MiniMax | minimax-m2.5 ($0.30/$1.20), minimax-text01 ($0.20/$1.10) | minimaxi.com |
Prices are USD per 1M input / output tokens, reviewed 2026-04-21. They drift quarterly. operai_init/llm/registry.py is the single source of truth the swarm consults.
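A registry of this shape can be sketched as a plain mapping plus a cost helper. The prices below mirror the table above, but the `ModelEntry` class and field names are illustrative assumptions, not the actual `registry.py` API:

```python
# Illustrative sketch of a pricing registry; class and field names are
# assumptions, not the real operai_init/llm/registry.py contents.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEntry:
    provider: str
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens

REGISTRY = {
    "haiku-4.5":        ModelEntry("anthropic", 1.00, 5.00),
    "gpt-4o-mini":      ModelEntry("openai",    0.15, 0.60),
    "gemini-2.5-flash": ModelEntry("gemini",    0.075, 0.30),
}

def cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Compute a call's cost from the registry's per-1M-token prices."""
    entry = REGISTRY[model]
    return (tokens_in * entry.price_in + tokens_out * entry.price_out) / 1_000_000
```

With gpt-4o-mini at $0.15/$0.60, a call with 10,000 input and 1,000 output tokens costs $0.0021; keeping this math in one module is what lets the swarm treat the registry as the single source of truth.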
Architecture
operai_init/llm/
├── registry.py # 5 providers × 14 models × pricing + capability flags
├── config.py # /opt/operai/credentials/llm-providers.json (mode 600)
├── client.py # Unified dispatcher with fallback chains
├── usage.py # SQLite of every call: tokens, cost, latency, caller
├── cli.py # `operai-init llm configure|test|set-default|usage|...`
└── providers/
├── _http.py # Shared urllib helper
├── anthropic.py # x-api-key auth
├── openai.py # Bearer auth, OpenAI-compatible
├── gemini.py # x-goog-api-key, different request shape
├── qwen.py # DashScope international, Alibaba-native shape
└── minimax.py # OpenAI-compatible v2 endpoint
Zero external dependencies. Every provider uses urllib.request from the stdlib. No anthropic, openai, google-generativeai packages. The repo stays small, supply-chain clean, and works in air-gapped deployments.
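In that spirit, a stdlib-only request helper might look like the following; the function names and signatures are assumptions, not the actual `providers/_http.py` interface:

```python
# Hypothetical sketch of a shared stdlib-only HTTP helper; the real
# providers/_http.py may differ in shape.
import json
import urllib.request

def build_request(url: str, headers: dict, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request using only the standard library."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )

def post_json(url: str, headers: dict, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_request(url, headers, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Each provider module then only has to supply its own auth header (`x-api-key`, `Authorization: Bearer`, `x-goog-api-key`) and request shape.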
The unified client
Every LLM call in the swarm — domain agents, factory sub-agents, meta-agents, Punta de Flecha deliberations — flows through one function:
from operai_init.llm import client as llm
resp = llm.chat(
system="You are the triage sub-agent. Classify tickets...",
user=ticket_text,
model="gpt-4o-mini", # optional; uses default if omitted
json_mode=True,
caller="cs-factory:triage", # for cost attribution
)
print(resp.text) # parsed content
print(resp.cost_usd) # computed from registry
print(resp.tokens_in) # provider-reported
print(resp.latency_ms) # wall-clock
print(resp.provider) # which one actually served
print(resp.model) # which model alias
Internally: resolve model → provider, call handler, record usage, return LLMResponse. On failure (401/429/5xx/timeout/network): fall through the configured chain, record the error, retry the next entry. If all fail: raise LLMError with the chain's last error.
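That fallback walk reduces to a simple loop. A sketch, where `LLMError` matches the name above but `call_provider` and `record` are stand-ins for the real handler and usage-recording internals:

```python
# Sketch of the fallback chain walk; call_provider and record are
# illustrative stand-ins, not the real client.py internals.
class LLMError(Exception):
    """Raised when every entry in the fallback chain has failed."""

def chat_with_fallback(chain, call_provider, record):
    """Try each (provider, model) pair in order; return the first success."""
    last_err = None
    for provider, model in chain:
        try:
            resp = call_provider(provider, model)
            record(provider, model, error=None)
            return resp
        except Exception as err:  # 401 / 429 / 5xx / timeout / network
            record(provider, model, error=err)
            last_err = err
    raise LLMError(f"all {len(chain)} chain entries failed; last error: {last_err}")
```

The key property: every attempt, successful or not, is recorded before the next entry is tried, so the usage database stays truthful even mid-outage.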
Founder setup (one time)
# Interactive — paste keys, test each provider
operai-init llm configure
# Or per-provider
operai-init llm configure anthropic
operai-init llm configure openai
operai-init llm configure gemini
# etc.
# Set brand-wide default (required before agents can run)
operai-init llm set-default --provider anthropic --model haiku-4.5
# Set fallback chain (optional but recommended)
operai-init llm fallback openai/gpt-4o-mini gemini/gemini-2.5-flash
# Verify
operai-init llm list
operai-init llm test anthropic
operai-init status # now shows LLM section with configured providers + default
Per-sub-agent overrides in factory.yml
repo v2.8 updates the CS factory's factory.yml to demonstrate how to route each sub-agent to the right model:
default_llm:
provider: anthropic
model: haiku-4.5
fallback_llm:
- { provider: openai, model: gpt-4o-mini }
- { provider: gemini, model: gemini-2.5-flash }
sub_agents:
- name: triage
llm: { provider: openai, model: gpt-4o-mini } # fast classifier
order: 1
- name: language-detect
llm: { provider: gemini, model: gemini-2.5-flash } # free-tier coverage
order: 4
- name: drafter
llm: { provider: anthropic, model: sonnet-4.5 } # customer-facing copy
order: 9
- name: brand-voice-check
llm: { provider: anthropic, model: sonnet-4.5 } # quality gate
order: 7
# Other 6 sub-agents fall back to default_llm (haiku-4.5)
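Resolution is deliberately simple: a sub-agent's own `llm` block wins, otherwise `default_llm` applies. A sketch, with dict shapes mirroring the YAML above (the function itself is illustrative, not the factory runtime's code):

```python
# Sketch of per-sub-agent LLM resolution; dict shapes mirror the YAML
# above, but resolve_llm is an illustrative stand-in.
def resolve_llm(sub_agent: dict, factory: dict) -> dict:
    """Return the sub-agent's llm override if present, else the factory default."""
    return sub_agent.get("llm", factory["default_llm"])

factory = {
    "default_llm": {"provider": "anthropic", "model": "haiku-4.5"},
    "sub_agents": [
        {"name": "triage", "llm": {"provider": "openai", "model": "gpt-4o-mini"}},
        {"name": "sentiment"},  # no override -> falls back to default_llm
    ],
}
```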
The effect: the CS factory's per-ticket cost drops from a uniform ~€0.002 (every call on Haiku) to ~€0.0015 mixed. The cheap-and-fast sub-agents (triage, language-detect) offload to mini models; the quality-critical ones (drafter, brand-voice-check) use Sonnet.
Cost visibility
Every call lands in /opt/operai/state/llm-usage.db (SQLite). Query via CLI:
operai-init llm usage --since 30
Output:
LLM usage — last 30 days
PROVIDER MODEL CALLS TOK_IN TOK_OUT USD
------------ -------------- ------- ---------- ---------- --------
anthropic    sonnet-4.5         412    124,231     28,541  $0.8008
openai       gpt-4o-mini      1,203    382,111     92,411  $0.1128
gemini       gemini-2.5-f.      821    201,002     45,998  $0.0289
Total: 2,436 calls, $0.9425
Errors surface separately:
Recent errors
2026-04-21T10:14:02Z anthropic/haiku-4.5 cs-factory:triage
401 Unauthorized: invalid x-api-key
--json dumps machine-readable for dashboards.
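The underlying table can be sketched with a few lines of sqlite3. The schema below is an assumption matching the fields the CLI reports, not the actual llm-usage.db layout:

```python
# Hypothetical schema for the usage log; column names mirror the CLI
# output above but are not the actual llm-usage.db layout.
import sqlite3

db = sqlite3.connect(":memory:")  # the real path is /opt/operai/state/llm-usage.db
db.execute("""
    CREATE TABLE IF NOT EXISTS llm_usage (
        ts         TEXT NOT NULL,
        provider   TEXT NOT NULL,
        model      TEXT NOT NULL,
        caller     TEXT NOT NULL,
        tokens_in  INTEGER NOT NULL,
        tokens_out INTEGER NOT NULL,
        cost_usd   REAL NOT NULL,
        error      TEXT
    )
""")
db.execute(
    "INSERT INTO llm_usage VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("2026-04-21T10:14:02Z", "openai", "gpt-4o-mini",
     "cs-factory:triage", 812, 64, 0.00016, None),
)

# Aggregate the way `operai-init llm usage` does: per provider/model.
rows = db.execute("""
    SELECT provider, model, COUNT(*), SUM(tokens_in), SUM(tokens_out), SUM(cost_usd)
    FROM llm_usage WHERE error IS NULL
    GROUP BY provider, model
""").fetchall()
```

Because each row carries `caller`, the same table answers per-sub-agent cost attribution questions, not just per-provider ones.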
Pre-flight: agents refuse to start without LLM config
As decided in v2.8 planning, there is no default provider. Install.sh no longer picks Anthropic for you. operai-init status flags the missing config loudly:
LLM providers
✗ no providers configured — agents will refuse to start. Run: operai-init llm configure
Agent-runner (when the v0.7 runtime lands) will refuse to spawn if llm-providers.json is empty. This prevents silent fallback to the maintainer's keys (which don't exist in a brand's deployment) and forces the founder to make the explicit choice.
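A minimal version of that gate, assuming the config file is a JSON object keyed by provider (the path comes from this chapter; the function name and JSON shape are assumptions):

```python
# Sketch of the pre-flight gate; the path is from the chapter, the
# function name and config shape are assumptions.
import json
from pathlib import Path

def assert_llm_configured(
    config_path: str = "/opt/operai/credentials/llm-providers.json",
) -> None:
    """Refuse to start agents when no LLM provider is configured."""
    try:
        providers = json.loads(Path(config_path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        providers = {}
    if not providers:
        raise SystemExit(
            "no providers configured — agents will refuse to start. "
            "Run: operai-init llm configure"
        )
```

Failing fast at spawn time, rather than on the first LLM call, keeps the error visible in `operai-init status` instead of buried in agent logs.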
Why not use the provider SDKs
Three reasons:
- Supply chain: `pip install anthropic openai google-generativeai` pulls ~100 MB of dependencies, each with its own vulnerability surface. The repo's urllib-only approach is ~500 lines of HTTP that we own.
- Air-gap friendly: SDKs often require outbound to vendor-managed telemetry endpoints. Raw HTTP doesn't.
- Uniform interface: SDKs have incompatible APIs (Anthropic's `messages.create` vs OpenAI's `chat.completions.create` vs Gemini's `generateContent`). Abstracting once, well, is simpler than juggling three abstractions.
The tradeoff: we don't get SDK features like streaming, the batch API, or prompt caching out of the box. All three are on the roadmap as opt-in extensions to the client.
What v2.8 does NOT ship
Honest scoping:
- No streaming support. `chat()` is request/response only. Streaming lands in v2.9.
- No Anthropic batch API (50% cheaper async batch submissions). Requires a different client shape; v2.9.
- No prompt caching. Anthropic's prompt-caching requires specific headers per call; v2.9.
- No local provider (Ollama, LM Studio). The founder's v2.8 decision: 5 cloud providers only. Ollama lands when there's operator demand.
- No Bedrock / Azure OpenAI wrappers. These require AWS/Azure auth flows; v2.9+.
- No runtime orchestrator for the factories. The LLM client is ready; agent-runner still heartbeats. The factory runtime (which actually spawns sub-agent calls in parallel) ships in v0.9.
Commercial framing
All of this is in the open-source repo. The founder's v2.5 rule held: dynamic sourcing is a differentiator, not an upsell. A team forking the repo gets:
- Seven domain agents (monolithic, SOULs)
- CS factory reference with 10 sub-agents (shipped v2.6)
- Meta-agent governance (critic + guardrail + compliance, v2.5)
- Multi-LLM-provider abstraction for all of the above (v2.8)
- Phase 1 structured-source ingestion (v2.4)
The "brand-owned" model now spans hardware (self-hosted VPS), code (open repo), tokens (per-brand API keys), and vendor choice (any of 5 LLMs). Every axis of lock-in removed.
→ Back to Ch.17 Agent Factory Pattern · Forward to Ch.19 (v2.9)
Fork the repo, read the playbook, and adapt the artifacts to your own stack. For hands-on help, email hello@usecompai.com.