refactor/thin-orchestrator · 2026-05-16

Engine: thin orchestrator over CLI agents.

The old engine pinned three vendor SDKs (Gemini, Claude, Perplexity) and let each one bring its own web search. The new engine does the opposite. It owns no SDK. It owns a router. Each request walks whichever CLI agents are on PATH in tier order, highest first, takes the first one that returns usable output, and falls through cleanly to the next when one fails.

Why the rewrite

Three forces converged:

Vendor lock. GeminiProvider used the Google GenAI SDK with google_search grounding. Replacing the provider meant rewriting the search half too, because generation and search happened in the same method call.
Subscription pricing. As of 2026-06-15 Claude Code's CLI (claude -p) draws from a flat-rate Agent SDK quota tied to the Pro / Max plan, not API keys. Codex did the same earlier. CLIs became cheaper than API calls for daily-research workloads.
Resilience. Any single provider's rate limit took the engine down. With four CLIs on PATH, the chain falls through silently.

Layer architecture

Three pure modules (search.py, agents.py, skill_loader.py) and one glue module. Each pure module knows nothing about the others; agent_research.py is the only place where they meet.

In the layer diagram, agent_research.py is the teal block. Because it is the only coupling point, any peer can be replaced (e.g. swap SearXNG for Kagi, or subprocess for HTTP) without touching the others.

Dispatch chain

agents.dispatch walks a list of AgentSpec in tier order. Failure modes are normalised: non-zero exit, timeout, empty stdout, OSError all funnel into the same "this one lost, try the next" path. Errors accumulate; only when every agent in the chain fails does an AgentError surface.

In the dispatch diagram, dashed grey arrows are the normalised failure paths and solid teal arrows are the success exit. The default min_tier=8 leaves the two lowest-tier agents out of the chain; they only run when configured explicitly.
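A minimal sketch of tier detection plus the fall-through dispatch, under stated assumptions: the AgentSpec fields (name, tier, argv), the rule that a higher tier number means a more preferred agent, sending the prompt on stdin, the detect_agents helper, and the (name, stdout) return value are all hypothetical rather than the engine's confirmed API. Each failure mode is caught on its own branch but recorded the same way, and AgentError only surfaces after the whole chain is exhausted.

```python
import shutil
import subprocess
from dataclasses import dataclass


class AgentError(RuntimeError):
    """Raised only after every agent in the chain has failed."""

    def __init__(self, errors: list[str]) -> None:
        super().__init__("all agents failed: " + "; ".join(errors))
        self.errors = errors


@dataclass(frozen=True)
class AgentSpec:
    name: str
    tier: int               # assumption: higher number = preferred agent
    argv: tuple[str, ...]   # hypothetical: command prefix; the prompt goes on stdin


def detect_agents(table: list[AgentSpec], min_tier: int = 8) -> list[AgentSpec]:
    """Keep agents at or above min_tier whose executable is on PATH, best tier first."""
    found = [s for s in table if s.tier >= min_tier and shutil.which(s.argv[0])]
    return sorted(found, key=lambda s: s.tier, reverse=True)


def dispatch(prompt: str, chain: list[AgentSpec], timeout: float = 180) -> tuple[str, str]:
    """Try each agent in order; return (agent name, stdout) from the first success."""
    errors: list[str] = []
    for spec in chain:
        try:
            proc = subprocess.run(
                list(spec.argv),
                input=prompt,
                capture_output=True,
                encoding="utf-8",  # explicit, so decoding does not depend on the platform locale
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            errors.append(f"{spec.name}: timed out after {timeout}s")
            continue
        except OSError as exc:  # e.g. the binary disappeared from PATH
            errors.append(f"{spec.name}: {exc}")
            continue
        if proc.returncode != 0:
            errors.append(f"{spec.name}: exit {proc.returncode}")
        elif not proc.stdout.strip():
            errors.append(f"{spec.name}: empty stdout")
        else:
            return spec.name, proc.stdout
    raise AgentError(errors)
```

With the default min_tier=8, an entry such as AgentSpec("claude", 10, ("claude", "-p")) (tier value made up, and assuming `claude -p` reads the prompt from stdin) survives detection when `claude` is on PATH, while anything below tier 8 is dropped before dispatch sees it.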

Search: SearXNG + optional Brave

search.search() returns a list of plain dicts {title, url, snippet, engine}. The first backend is a local SearXNG Docker container at 127.0.0.1:8888; if it's down or every upstream engine is suspended, the chain falls to Brave (when BRAVE_API_KEY is set). When every backend fails it raises SearchError — the caller can choose to either skip the idea or proceed with empty context.

# Failure path: search down, chain still produces an idea
try:
    search_results = search(query, n=8)
except SearchError:
    search_results = []  # prompt notes "no live web results available"
prompt = build_research_prompt(domain, format_search_context(search_results), lens)
dispatch(prompt, chain, timeout=180)

Why not have the LLM search? Search and generation are different failure surfaces. Coupling them makes a search problem look like a generation problem and vice versa. Decoupling lets you debug them independently and swap them independently.

Skills: prompts as files

The research prompt lives at engine/skills/research/SKILL.md as YAML frontmatter plus a Jinja body. The frontmatter follows the same shape as Claude Code skills, so a skill written here is one move away from being a Claude Code plugin skill.

---
name: research
description: Synthesize one novel startup idea from a domain + web context.
inputs:
  - domain: string
  - search_context: string
  - purpose_lens: string | null
output_format: json
version: 1
---

You are a startup research analyst. Synthesize ONE novel startup
idea in the "{{ domain }}" domain.

Recent web context:
{{ search_context }}
{% if purpose_lens %}

Value lens (mandatory frame): {{ purpose_lens }}
{% endif %}

Output ONLY a single JSON object …

Editing the file changes engine behavior on the next run. No code change, no redeploy. Verified by mutating the file mid-process and re-rendering.
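A minimal loader consistent with that behavior, assuming PyYAML for the frontmatter and Jinja2 for the body; load_skill and render_skill are hypothetical names, not necessarily skill_loader.py's API. The file is read on every render with no cache, which is what lets an edit take effect on the next run, and StrictUndefined turns a misspelled or missing template variable into an error instead of an empty string.

```python
import re
from pathlib import Path

import jinja2
import yaml

# Frontmatter is the block between the opening "---" line and the next "---" line.
_FRONTMATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\Z", re.DOTALL)


def load_skill(path: Path) -> tuple[dict, str]:
    """Read SKILL.md fresh on every call and split it into (metadata, template body)."""
    match = _FRONTMATTER.match(path.read_text(encoding="utf-8"))
    if match is None:
        raise ValueError(f"{path}: missing YAML frontmatter")
    return yaml.safe_load(match.group(1)) or {}, match.group(2)


def render_skill(path: Path, **inputs: object) -> str:
    """Render the body; every declared input must be passed, even if its value is None."""
    meta, body = load_skill(path)
    declared = [
        name
        for item in meta.get("inputs", [])
        for name in (item if isinstance(item, dict) else [item])  # "- domain: string" or "- domain"
    ]
    missing = [name for name in declared if name not in inputs]
    if missing:
        raise KeyError(f"skill {meta.get('name', path)!r} missing inputs: {missing}")
    env = jinja2.Environment(undefined=jinja2.StrictUndefined, keep_trailing_newline=True)
    return env.from_string(body).render(**inputs)
```

Because StrictUndefined also raises on truthiness checks, a template like the one above needs purpose_lens=None passed explicitly rather than omitted.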

Failure modes

Failure modes and how the engine handles them
Failure	Surface	Behavior
SearXNG container down	search.py	Try Brave; else prompt notes "no live results"; agent synthesizes from prior knowledge
Upstream engine suspended (e.g. Brave CAPTCHA)	SearXNG	Other engines still return; only impact is fewer snippets
Claude rate-limited / quota exhausted	agents.py	Non-zero exit; chain falls to codex; idea still produced
Subprocess timeout (default 180s)	agents.py	Normalised to failure; next agent gets the prompt
Agent returns invalid JSON	agent_research.py	parse_idea_response raises; idea skipped, others continue
No agents on PATH	main.py	Hard fail with explicit error message at startup
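One plausible shape for the JSON parse tolerance behind the "invalid JSON" row, as a sketch only: it accepts an object wrapped in a Markdown code fence or surrounded by chatter, and raises on anything else so the caller can skip that idea and continue. The fence regex, the raw_decode approach, IdeaParseError, and collect_ideas are assumptions, not the engine's confirmed implementation of parse_idea_response.

```python
import json
import re

FENCE = "`" * 3  # three backticks, spelled out to avoid a literal fence in this snippet
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


class IdeaParseError(ValueError):
    """Hypothetical: raised when an agent's output holds no usable JSON object."""


def parse_idea_response(text: str) -> dict:
    fenced = _FENCED.search(text)
    candidate = fenced.group(1) if fenced else text
    start = candidate.find("{")
    if start == -1:
        raise IdeaParseError("no JSON object in agent output")
    try:
        # raw_decode parses one object starting at `start` and ignores any trailing text.
        idea, _end = json.JSONDecoder().raw_decode(candidate, start)
    except json.JSONDecodeError as exc:
        raise IdeaParseError(f"invalid JSON: {exc}") from exc
    return idea


def collect_ideas(outputs: list[str]) -> list[dict]:
    """Skip unparseable outputs instead of failing the whole run."""
    ideas = []
    for raw in outputs:
        try:
            ideas.append(parse_idea_response(raw))
        except IdeaParseError:
            continue
    return ideas
```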

Test surface

56 tests, 1 live-integration test gated on RUN_INTEGRATION=1. Tests are biased toward the orchestration seams (mocked subprocess.run, mocked httpx.get) because that's where most bugs live; the live test is a smoke check that the whole pipe works.

Module	Tests	Focus
search.py	12 + 1	Backend protocol, fallback ordering, error surfaces
agents.py	19	Detection by tier, dispatch fallback, encoding, empty-stdout-as-failure
agent_research.py	14	Prompt building, JSON parse tolerance, search-fail degradation
skill_loader.py	9	Frontmatter parsing, Jinja rendering, strict undefined
agents extras	2 + 3	ollama_model arg precedence, UTF-8 encoding regression

Refactor timeline

Stage 1. Self-hosted SearXNG in Docker, loopback only.
Stage 2. search.py with pluggable backend chain.
Stage 3. agents.py with subprocess dispatch + tier table.
Stage 4. main.py rewired; agent_research.py as the only glue.
Stage 5. Legacy SDK path marked deprecated (kept importable).
Stage 6. Config-driven ollama model, tier table in config.yaml.
Stage 7. Prompt extracted to skills/research/SKILL.md.