Asagiri 朝霧

refactor/thin-orchestrator · 2026-05-16

Engine: thin orchestrator over CLI agents.

The old engine pinned three vendor SDKs (Gemini, Claude, Perplexity) and let each one bring its own web search. The new engine does the opposite. It owns no SDK. It owns a router. Each request walks whichever CLI agents are on PATH in tier order, takes the output of the highest-tier agent that succeeds, and falls through cleanly when one fails.

Why the rewrite

Three forces converged:

  1. Vendor lock. GeminiProvider used the Google GenAI SDK with google_search grounding. Replacing the provider meant rewriting the search half too, because they were the same method call.
  2. Subscription pricing. As of 2026-06-15 Claude Code's CLI (claude -p) draws from a flat-rate Agent SDK quota tied to the Pro / Max plan, not API keys. Codex did the same earlier. CLIs became cheaper than API calls for daily-research workloads.
  3. Resilience. Any single provider's rate limit took the engine down. With four CLIs on PATH, the chain falls through silently.

Layer architecture

Three pure modules, one glue module. Each pure module knows nothing about the others. agent_research.py is the only place where they meet.

Figure: engine layer diagram. main.py (CLI entry, config router) sits at the top and calls agent_research.run_agent_research, the orchestration layer and the only glue. That function fans out to three independent peers: search.search hits a local SearXNG Docker container at localhost:8888 (with optional Brave fallback); skill_loader.load_skill reads skills/research/SKILL.md and renders Jinja; agents.dispatch runs the CLI agents found on PATH as subprocesses, in the order claude, codex, gemini, ollama. None of search, skill_loader, or agents knows about the others.
The teal block is the only place modules meet. Replace any peer (e.g. swap SearXNG for Kagi, swap subprocess for HTTP) without touching the others.

Dispatch chain

agents.dispatch walks a list of AgentSpec in tier order. Failure modes are normalised: non-zero exit, timeout, empty stdout, OSError all funnel into the same "this one lost, try the next" path. Errors accumulate; only when every agent in the chain fails does an AgentError surface.
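The walk described above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual agents.py: the field names on AgentSpec and DispatchResult, and the convention of appending the prompt as the last command-line argument, are assumptions made for the example.

```python
import subprocess
from dataclasses import dataclass


class AgentError(RuntimeError):
    """Raised only when every agent in the chain has failed."""


@dataclass(frozen=True)
class AgentSpec:
    name: str
    argv: tuple  # command prefix (assumed layout); the prompt is appended last
    tier: int


@dataclass(frozen=True)
class DispatchResult:
    agent: str
    output: str


def dispatch(prompt, chain, timeout=180):
    """Try agents from highest tier to lowest; return the first usable output."""
    errors = []
    for spec in sorted(chain, key=lambda s: s.tier, reverse=True):
        try:
            proc = subprocess.run(
                [*spec.argv, prompt],
                capture_output=True,
                text=True,
                encoding="utf-8",
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            errors.append(f"{spec.name}: timed out after {timeout}s")
            continue
        except OSError as exc:  # includes "binary not found"
            errors.append(f"{spec.name}: {exc}")
            continue
        if proc.returncode != 0:
            errors.append(f"{spec.name}: exit code {proc.returncode}")
            continue
        output = proc.stdout.strip()
        if not output:
            errors.append(f"{spec.name}: empty stdout")
            continue
        return DispatchResult(agent=spec.name, output=output)
    # Every agent lost: surface the accumulated reasons in one error.
    raise AgentError("all agents failed: " + "; ".join(errors))
```

Under these assumptions, a chain entry for the first agent would look like AgentSpec("claude", ("claude", "-p"), 10). Every failure branch ends in the same continue, which is what keeps the four failure modes indistinguishable to the caller.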

Figure: dispatch fallback flow. A linear chain: claude -p (tier 10, Sonnet 4.6) first; on failure codex exec (tier 9, GPT-5.3); on failure gemini -p (tier 7, Flash) if min_tier permits; on failure ollama run (tier 6, local model) if a model is configured. Each agent has the same three failure paths (timeout, non-zero exit, empty stdout), all of which fall through to the next. The first success exits with DispatchResult(agent=<name>, output=…). With the default min_tier=8, gemini and ollama are gated out; lower min_tier to 6 and set OLLAMA_MODEL to broaden the chain.
Dashed grey arrows are failure paths (timeout, non-zero exit, empty stdout, OSError — all normalised). Solid teal arrows are the success exit. min_tier=8 default keeps the bottom two agents on standby for explicit configuration.
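The tier gate can be illustrated with a short sketch. The tier numbers and the OLLAMA_MODEL requirement come from the description above; the TIERS dict, the available_agents function, and its injectable which parameter are hypothetical names chosen for this example, not the engine's real API.

```python
import shutil
from typing import Optional

# Tier numbers as described above; the dict layout itself is an assumption.
TIERS = {"claude": 10, "codex": 9, "gemini": 7, "ollama": 6}


def available_agents(min_tier: int = 8,
                     ollama_model: Optional[str] = None,
                     which=shutil.which) -> list:
    """Return agent names that pass the tier gate and are on PATH, best first."""
    names = []
    for name, tier in sorted(TIERS.items(), key=lambda item: item[1], reverse=True):
        if tier < min_tier:
            continue  # gated out by min_tier
        if name == "ollama" and not ollama_model:
            continue  # ollama only joins when a model is configured
        if which(name) is None:
            continue  # binary not found on PATH
        names.append(name)
    return names
```

With every binary installed, the default call yields claude and codex only; passing min_tier=6 adds gemini, and ollama joins once a model name is supplied as well. An empty result corresponds to the "no agents on PATH" case, which the engine treats as a startup error.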

search.search() returns a list of plain dicts {title, url, snippet, engine}. The first backend is a local SearXNG Docker container at 127.0.0.1:8888; if it's down or every upstream engine is suspended, the chain falls to Brave (when BRAVE_API_KEY is set). When every backend fails it raises SearchError — the caller can choose to either skip the idea or proceed with empty context.

# Failure path: search is down, but the chain still produces an idea
try:
    search_results = search(query, n=8)
except SearchError:
    search_results = []  # prompt notes "no live web results available"
prompt = build_research_prompt(domain, format_search_context(search_results), lens)
result = dispatch(prompt, chain, timeout=180)  # DispatchResult(agent=..., output=...)

Why not have the LLM search? Search and generation are different failure surfaces. Coupling them makes a search problem look like a generation problem and vice versa. Decoupling lets you debug them independently and swap them independently.
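The backend fallback inside search.search() can be pictured as a loop over callables. The sketch below is an assumption about the shape, not the real search.py: search_with_fallback and its backends parameter are hypothetical, and treating an empty result list as a reason to try the next backend is inferred from the "every upstream engine is suspended" case.

```python
class SearchError(RuntimeError):
    """Raised when every configured backend has failed."""


def search_with_fallback(query, n=8, backends=()):
    """Try each backend in order; return up to n result dicts from the first that works.

    Each backend is a callable (query, n) -> list of
    {title, url, snippet, engine} dicts, and may raise on failure.
    """
    errors = []
    for backend in backends:
        try:
            results = backend(query, n)
        except Exception as exc:  # network error, bad status, bad payload
            errors.append(f"{backend.__name__}: {exc}")
            continue
        if results:
            return results[:n]
        errors.append(f"{backend.__name__}: no results")
    raise SearchError("; ".join(errors) or "no search backends configured")
```

Swapping SearXNG for another provider then means writing one more callable with the same signature; the prompt builder and the agents never see the difference.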

Skills: prompts as files

The research prompt lives at engine/skills/research/SKILL.md as YAML-frontmatter + Jinja body. The frontmatter follows the same shape as Claude Code skills, so a skill written here is one move from being a Claude Code plugin skill.

---
name: research
description: Synthesize one novel startup idea from a domain + web context.
inputs:
  - domain: string
  - search_context: string
  - purpose_lens: string | null
output_format: json
version: 1
---

You are a startup research analyst. Synthesize ONE novel startup
idea in the "{{ domain }}" domain.

Recent web context:
{{ search_context }}
{% if purpose_lens %}

Value lens (mandatory frame): {{ purpose_lens }}
{% endif %}

Output ONLY a single JSON object …

Editing the file changes engine behavior on the next run. No code change, no redeploy. Verified by mutating the file mid-process and re-rendering.
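The "edit the file, next run picks it up" property follows from reading the file at call time rather than caching it. A minimal standard-library sketch is shown below; split_skill and read_skill are hypothetical names, and the YAML parsing and Jinja rendering that the real skill_loader.py performs are deliberately left out.

```python
from pathlib import Path


def split_skill(text):
    """Split a SKILL.md document into (frontmatter_text, body_text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' frontmatter line")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    raise ValueError("frontmatter is never closed with a second '---' line")


def read_skill(path):
    # Read from disk on every call (no cache), so an edit to the file
    # is visible the next time the prompt is built.
    return split_skill(Path(path).read_text(encoding="utf-8"))
```

The frontmatter text would then go to a YAML parser and the body to a Jinja environment; configuring that environment with strict undefined handling makes a missing input fail loudly instead of rendering as an empty string.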

Failure modes

Failure modes and how the engine handles them
| Failure | Surface | Behavior |
| --- | --- | --- |
| SearXNG container down | search.py | Try Brave; else prompt notes "no live results"; agent synthesizes from prior knowledge |
| Upstream engine suspended (e.g. Brave CAPTCHA) | SearXNG | Other engines still return; only impact is fewer snippets |
| Claude rate-limited / quota exhausted | agents.py | Non-zero exit; chain falls to codex; idea still produced |
| Subprocess timeout (default 180s) | agents.py | Normalised to failure; next agent gets the prompt |
| Agent returns invalid JSON | agent_research.py | parse_idea_response raises; idea skipped, others continue |
| No agents on PATH | main.py | Hard fail with explicit error message at startup |
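The invalid-JSON case is the one failure handled after dispatch rather than inside it. A rough sketch of a tolerant parse plus the skip-and-continue loop follows; parse_json_object and collect_ideas are hypothetical stand-ins, and the specific tolerance shown (ignoring chatter or code fences around the object) is an assumption about what the real parse_idea_response accepts.

```python
import json


def parse_json_object(raw):
    """Extract the outermost {...} span from agent output and parse it.

    Tolerates leading or trailing chatter and markdown fences; raises
    ValueError when no parseable JSON object is present.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in agent output")
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent output is not valid JSON: {exc}") from exc


def collect_ideas(raw_outputs):
    """Parse each output; skip the ones that fail so the rest still count."""
    ideas = []
    for raw in raw_outputs:
        try:
            ideas.append(parse_json_object(raw))
        except ValueError:
            continue  # this idea is skipped, the others continue
    return ideas
```

Keeping the parse outside dispatch matches the table: a malformed answer costs one idea, not a trip down the agent chain.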

Test surface

56 tests, 1 live-integration test gated on RUN_INTEGRATION=1. Tests are biased toward the orchestration seams (mocked subprocess.run, mocked httpx.get) because that's where most bugs live; the live test is a smoke check that the whole pipe works.

| Module | Tests | Focus |
| --- | --- | --- |
| search.py | 12 + 1 | Backend protocol, fallback ordering, error surfaces |
| agents.py | 19 | Detection by tier, dispatch fallback, encoding, empty-stdout-as-failure |
| agent_research.py | 14 | Prompt building, JSON parse tolerance, search-fail degradation |
| skill_loader.py | 9 | Frontmatter parsing, Jinja rendering, strict undefined |
| agents extras | 2 + 3 | ollama_model arg precedence, UTF-8 encoding regression |
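To make the testing style concrete, here is a sketch of what a seam test and a gated live test could look like with the standard library alone. The run_agent helper is a hypothetical stand-in for the code under test, and the project's actual test framework and test names are not stated in this document.

```python
import os
import subprocess
import unittest
from unittest import mock


def run_agent(argv, prompt, timeout=180):
    """Stand-in for the code under test: stripped stdout, or None on any failure."""
    try:
        proc = subprocess.run([*argv, prompt], capture_output=True,
                              text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return None
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    return proc.stdout.strip()


class RunAgentTests(unittest.TestCase):
    def test_timeout_is_normalised_to_failure(self):
        boom = subprocess.TimeoutExpired(cmd="claude", timeout=180)
        with mock.patch("subprocess.run", side_effect=boom):
            self.assertIsNone(run_agent(["claude", "-p"], "prompt"))

    def test_empty_stdout_is_failure(self):
        done = subprocess.CompletedProcess(args=[], returncode=0,
                                           stdout="  \n", stderr="")
        with mock.patch("subprocess.run", return_value=done):
            self.assertIsNone(run_agent(["claude", "-p"], "prompt"))

    @unittest.skipUnless(os.environ.get("RUN_INTEGRATION") == "1",
                         "set RUN_INTEGRATION=1 to run the live smoke test")
    def test_live_smoke(self):
        # Talks to a real CLI agent, so it only runs when explicitly enabled.
        self.assertTrue(run_agent(["claude", "-p"], "Reply with the single word: ok"))
```

The mocked tests never spawn a process, so they stay fast and deterministic; the live test is skipped unless the environment variable is set.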

Refactor timeline

  1. Stage 1. Self-hosted SearXNG in Docker, loopback only.
  2. Stage 2. search.py with pluggable backend chain.
  3. Stage 3. agents.py with subprocess dispatch + tier table.
  4. Stage 4. main.py rewired; agent_research.py as the only glue.
  5. Stage 5. Legacy SDK path marked deprecated (kept importable).
  6. Stage 6. Config-driven ollama model, tier table in config.yaml.
  7. Stage 7. Prompt extracted to skills/research/SKILL.md.