Engine: thin orchestrator over CLI agents.
The old engine pinned three vendor SDKs (Gemini, Claude, Perplexity) and let each one bring its own web search. The new engine does the opposite. It owns no SDK. It owns a router. Each request walks whichever CLI agents are on PATH in tier order, takes the output of the highest-tier one that succeeds, and falls through cleanly when one fails.
Why the rewrite
Three forces converged:
- Vendor lock. GeminiProvider used the Google GenAI SDK with google_search grounding. Replacing the provider meant rewriting the search half too, because they were the same method call.
- Subscription pricing. As of 2026-06-15 Claude Code's CLI (claude -p) draws from a flat-rate Agent SDK quota tied to the Pro / Max plan, not API keys. Codex did the same earlier. CLIs became cheaper than API calls for daily-research workloads.
- Resilience. Any single provider's rate limit took the engine down. With four CLIs on PATH, the chain falls through silently.
Layer architecture
Three pure modules (search.py, agents.py, skill_loader.py) and one glue module (agent_research.py). Each pure module knows nothing about
the others; agent_research.py is the only place where they meet.
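Here is a minimal sketch of that separation. It assumes each pure module can be reduced to a single callable that the glue receives as an argument. The names research_one, format_context, SearchFn, RenderFn and DispatchFn are hypothetical. Using the domain itself as the search query is also an assumption. The real agent_research.py may import the modules directly rather than take them as parameters.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signatures for the three pure layers. Each is a plain
# callable, so no layer has to import another.
SearchFn = Callable[[str], List[Dict[str, str]]]  # query -> result dicts
RenderFn = Callable[[Dict[str, object]], str]     # skill inputs -> prompt text
DispatchFn = Callable[[str], str]                 # prompt -> raw agent stdout


def format_context(results: List[Dict[str, str]]) -> str:
    """Flatten search hits into the text block the research skill expects."""
    if not results:
        return "(no live web results available)"
    return "\n".join(f"- {r['title']} ({r['url']}): {r['snippet']}" for r in results)


def research_one(
    domain: str,
    lens: Optional[str],
    search_fn: SearchFn,
    render_fn: RenderFn,
    dispatch_fn: DispatchFn,
) -> str:
    """Glue: the only function that sees all three layers at once."""
    results = search_fn(domain)  # query = domain here (assumption)
    prompt = render_fn(
        {"domain": domain, "search_context": format_context(results), "purpose_lens": lens}
    )
    return dispatch_fn(prompt)
```

Passing the layers in as callables is one way to keep the pure modules ignorant of each other. It also makes the glue easy to test with fakes.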
Dispatch chain
agents.dispatch walks a list of AgentSpec in tier
order. Failure modes are normalised: non-zero exit, timeout, empty stdout,
OSError all funnel into the same "this one lost, try the next" path. Errors
accumulate; only when every agent in the chain fails does an
AgentError surface.
The min_tier=8 default keeps the bottom two agents on standby: they stay
out of the chain unless configured explicitly.
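The dispatch loop can be sketched as follows. Only the names AgentSpec, AgentError and dispatch, and the min_tier=8 default, come from the text. The AgentSpec fields, the build_chain helper, the use of shutil.which for PATH detection, and feeding the prompt on stdin are all assumptions for illustration.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AgentSpec:
    name: str
    tier: int             # higher tier = tried earlier
    argv: Sequence[str]   # command line; the prompt goes to stdin (assumption)


class AgentError(RuntimeError):
    """Raised only when every agent in the chain has failed."""


def build_chain(specs: Sequence[AgentSpec], min_tier: int = 8) -> List[AgentSpec]:
    """Agents at or above min_tier whose binary is on PATH, best tier first."""
    found = [s for s in specs if s.tier >= min_tier and shutil.which(s.argv[0])]
    return sorted(found, key=lambda s: s.tier, reverse=True)


def dispatch(prompt: str, chain: Sequence[AgentSpec], timeout: float = 180) -> str:
    errors: List[str] = []
    for spec in chain:
        try:
            proc = subprocess.run(
                list(spec.argv),
                input=prompt,
                capture_output=True,
                encoding="utf-8",  # explicit, so non-ASCII prompts survive any locale
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, OSError) as exc:
            errors.append(f"{spec.name}: {type(exc).__name__}")
            continue
        if proc.returncode != 0:
            errors.append(f"{spec.name}: exit {proc.returncode}")
            continue
        if not proc.stdout.strip():
            errors.append(f"{spec.name}: empty stdout")
            continue
        return proc.stdout  # first success wins; lower tiers never run
    raise AgentError("all agents failed: " + "; ".join(errors))
```

All four failure modes end in the same append-and-continue path. The accumulated error list only surfaces once the chain is exhausted.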
Search: SearXNG + optional Brave
search.search() returns a list of plain dicts
{title, url, snippet, engine}. The first backend is a local
SearXNG Docker container at 127.0.0.1:8888; if it's down or
every upstream engine is suspended, the chain falls to Brave (when
BRAVE_API_KEY is set). When every backend fails it raises
SearchError, and the caller can either skip the idea or proceed
with empty context.
```python
# Failure path: search down, chain still produces an idea.
# search() and SearchError come from search.py, dispatch() from agents.py.
try:
    search_results = search(query, n=8)
except SearchError:
    search_results = []  # prompt then notes "no live web results available"

prompt = build_research_prompt(domain, format_search_context(search_results), lens)
raw = dispatch(prompt, chain, timeout=180)  # raises AgentError only if every agent fails
```
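Below is a stdlib-only sketch of the backend chain. It assumes each backend is a function from (query, n) to a list of {title, url, snippet, engine} dicts. It uses urllib in place of the httpx client that the real search.py's tests mock. The backend function names are hypothetical. The SearXNG call assumes the container has JSON output enabled in its settings.yml. The Brave call targets Brave Search API's web search endpoint, with the key in the X-Subscription-Token header.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional

Result = Dict[str, str]                       # {title, url, snippet, engine}
Backend = Callable[[str, int], List[Result]]


class SearchError(RuntimeError):
    """Raised when every backend failed or came back empty."""


def searxng_backend(query: str, n: int) -> List[Result]:
    # Assumes the local container has the json output format enabled.
    url = "http://127.0.0.1:8888/search?" + urllib.parse.urlencode(
        {"q": query, "format": "json"}
    )
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""),
         "snippet": r.get("content", ""), "engine": r.get("engine", "searxng")}
        for r in data.get("results", [])[:n]
    ]


def brave_backend(query: str, n: int) -> List[Result]:
    # Brave Search web endpoint; the API key travels in X-Subscription-Token.
    url = "https://api.search.brave.com/res/v1/web/search?" + urllib.parse.urlencode(
        {"q": query, "count": n}
    )
    req = urllib.request.Request(url, headers={
        "Accept": "application/json",
        "X-Subscription-Token": os.environ["BRAVE_API_KEY"],
    })
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""),
         "snippet": r.get("description", ""), "engine": "brave"}
        for r in data.get("web", {}).get("results", [])[:n]
    ]


def default_backends() -> List[Backend]:
    """SearXNG first; Brave only when BRAVE_API_KEY is set."""
    chain: List[Backend] = [searxng_backend]
    if os.environ.get("BRAVE_API_KEY"):
        chain.append(brave_backend)
    return chain


def search(query: str, n: int = 8, backends: Optional[List[Backend]] = None) -> List[Result]:
    chain = backends if backends is not None else default_backends()
    errors = []
    for backend in chain:
        name = getattr(backend, "__name__", repr(backend))
        try:
            results = backend(query, n)
        except Exception as exc:  # connection refused, HTTP error, bad JSON, ...
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return results
        # An empty list means every upstream engine was suspended: fall through.
        errors.append(f"{name}: no results")
    raise SearchError("; ".join(errors) or "no search backends configured")
```

Treating an empty result list as a failure matters here. It lets a SearXNG instance with every upstream engine suspended fall through to Brave instead of returning nothing.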
Skills: prompts as files
The research prompt lives at
engine/skills/research/SKILL.md as YAML frontmatter plus a Jinja
body. The frontmatter follows the same shape as Claude Code skills, so a
skill written here is one move away from being a Claude Code plugin skill.
```markdown
---
name: research
description: Synthesize one novel startup idea from a domain + web context.
inputs:
- domain: string
- search_context: string
- purpose_lens: string | null
output_format: json
version: 1
---
You are a startup research analyst. Synthesize ONE novel startup
idea in the "{{ domain }}" domain.
Recent web context:
{{ search_context }}
{% if purpose_lens %}
Value lens (mandatory frame): {{ purpose_lens }}
{% endif %}
Output ONLY a single JSON object …
```
Editing the file changes engine behavior on the next run. No code change, no redeploy. Verified by mutating the file mid-process and re-rendering.
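Here is a sketch of a loader that would behave this way. It assumes the file is re-read on every render (no cache) and rendered with Jinja2's StrictUndefined, which matches the "strict undefined" focus in the test table. The names split_skill, load_skill and render_skill are hypothetical. render_skill needs the third-party Jinja2 and PyYAML packages; they are imported lazily so the other two helpers stay stdlib-only.

```python
import re
from pathlib import Path
from typing import Tuple

# Frontmatter: a leading '---' line, the YAML text, then a closing '---' line.
_FRONTMATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\Z", re.DOTALL)


def split_skill(text: str) -> Tuple[str, str]:
    """Return (frontmatter_yaml, template_body) for a SKILL.md document."""
    match = _FRONTMATTER.match(text)
    if match is None:
        raise ValueError("SKILL.md must open with a '---' frontmatter block")
    return match.group(1), match.group(2)


def load_skill(path: Path) -> Tuple[str, str]:
    # Deliberately uncached: every call re-reads the file, so an edit
    # changes the very next render without a restart.
    return split_skill(path.read_text(encoding="utf-8"))


def render_skill(path: Path, **inputs: object) -> str:
    # Third-party imports kept local so the helpers above stay stdlib-only.
    import jinja2  # Jinja2
    import yaml    # PyYAML

    front, body = load_skill(path)
    meta = yaml.safe_load(front) or {}
    # 'inputs' is a list of one-key mappings, e.g. [{"domain": "string"}, ...].
    declared = {key for item in meta.get("inputs", []) for key in item}
    missing = declared - set(inputs)
    if missing:
        raise ValueError(f"missing skill inputs: {sorted(missing)}")
    env = jinja2.Environment(undefined=jinja2.StrictUndefined)
    return env.from_string(body).render(**inputs)
```

With StrictUndefined, the caller must pass purpose_lens=None rather than omit it, because testing an undefined name in {% if %} raises.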
Failure modes
| Failure | Surface | Behavior |
|---|---|---|
| SearXNG container down | search.py | Try Brave; else prompt notes "no live results"; agent synthesizes from prior knowledge |
| Upstream engine suspended (e.g. Brave CAPTCHA) | SearXNG | Other engines still return; only impact is fewer snippets |
| Claude rate-limited / quota exhausted | agents.py | Non-zero exit; chain falls to codex; idea still produced |
| Subprocess timeout (default 180s) | agents.py | Normalised to failure; next agent gets the prompt |
| Agent returns invalid JSON | agent_research.py | parse_idea_response raises; idea skipped, others continue |
| No agents on PATH | main.py | Hard fail with explicit error message at startup |
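One plausible shape for the parse step behind the "Agent returns invalid JSON" row is shown below. Only the name parse_idea_response comes from the text. The brace-slicing tolerance strategy and the run_batch helper are assumptions, included to make the "idea skipped, others continue" behavior concrete.

```python
import json
import logging
from typing import Callable, Dict, Iterable, List

log = logging.getLogger(__name__)


def parse_idea_response(raw: str) -> Dict:
    """Pull one JSON object out of agent stdout, tolerating fences and chatter."""
    # Slicing from the first '{' to the last '}' drops a ```json fence or a
    # "Here is the idea:" preamble without special-casing either.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in agent output")
    return json.loads(raw[start : end + 1])  # JSONDecodeError subclasses ValueError


def run_batch(domains: Iterable[str], produce_raw: Callable[[str], str]) -> List[Dict]:
    """Parse one idea per domain; a malformed response skips only that idea."""
    ideas: List[Dict] = []
    for domain in domains:
        try:
            ideas.append(parse_idea_response(produce_raw(domain)))
        except ValueError as exc:
            log.warning("skipping idea for %r: %s", domain, exc)
    return ideas
```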
Test surface
56 tests, 1 live-integration test gated on
RUN_INTEGRATION=1. Tests are biased toward the orchestration
seams (mocked subprocess.run, mocked httpx.get) because that's where most
bugs live; the live test is a smoke check that the whole pipe works.
| Module | Tests | Focus |
|---|---|---|
| search.py | 12 + 1 | Backend protocol, fallback ordering, error surfaces |
| agents.py | 19 | Detection by tier, dispatch fallback, encoding, empty-stdout-as-failure |
| agent_research.py | 14 | Prompt building, JSON parse tolerance, search-fail degradation |
| skill_loader.py | 9 | Frontmatter parsing, Jinja rendering, strict undefined |
| agents extras | 2 + 3 | ollama_model arg precedence, UTF-8 encoding regression |
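Here is a sketch of what a seam test in this style might look like. subprocess.run is patched with unittest.mock, so no CLI binary runs. The dispatch here is a compact, hypothetical stand-in for agents.dispatch (chain entries are bare argv lists). The test names are illustrative, not taken from the real suite.

```python
import subprocess
import unittest
from unittest import mock


class AgentError(RuntimeError):
    """Raised when every agent in the chain failed."""


def dispatch(prompt, chain, timeout=180):
    """Compact stand-in for agents.dispatch so the tests below have a target."""
    errors = []
    for argv in chain:
        try:
            proc = subprocess.run(argv, input=prompt, capture_output=True,
                                  encoding="utf-8", timeout=timeout)
        except (subprocess.TimeoutExpired, OSError) as exc:
            errors.append(f"{argv[0]}: {type(exc).__name__}")
            continue
        if proc.returncode == 0 and proc.stdout.strip():
            return proc.stdout
        errors.append(f"{argv[0]}: exit {proc.returncode}, {len(proc.stdout)} chars")
    raise AgentError("; ".join(errors))


def completed(code, out=""):
    return subprocess.CompletedProcess(args=[], returncode=code, stdout=out, stderr="")


class DispatchSeamTests(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_nonzero_exit_falls_through(self, run):
        run.side_effect = [completed(1), completed(0, '{"ok": true}')]
        self.assertEqual(dispatch("p", [["a"], ["b"]]), '{"ok": true}')
        self.assertEqual(run.call_count, 2)

    @mock.patch("subprocess.run")
    def test_empty_stdout_counts_as_failure(self, run):
        run.side_effect = [completed(0, "  \n"), completed(0, "{}")]
        self.assertEqual(dispatch("p", [["a"], ["b"]]), "{}")

    @mock.patch("subprocess.run")
    def test_every_agent_failing_raises(self, run):
        run.side_effect = [subprocess.TimeoutExpired(["a"], 180), FileNotFoundError("b")]
        with self.assertRaises(AgentError):
            dispatch("p", [["a"], ["b"]])

    @mock.patch("subprocess.run")
    def test_prompt_is_sent_as_utf8(self, run):
        run.return_value = completed(0, "{}")
        dispatch("café ☕", [["a"]])
        self.assertEqual(run.call_args.kwargs["encoding"], "utf-8")
        self.assertEqual(run.call_args.kwargs["input"], "café ☕")
```

Run it with python -m unittest against the file. Because subprocess.run is patched, the tests need neither a CLI on PATH nor network access.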
Refactor timeline
- Stage 1. Self-hosted SearXNG in Docker, loopback only.
- Stage 2. search.py with pluggable backend chain.
- Stage 3. agents.py with subprocess dispatch + tier table.
- Stage 4. main.py rewired; agent_research.py as the only glue.
- Stage 5. Legacy SDK path marked deprecated (kept importable).
- Stage 6. Config-driven ollama model, tier table in config.yaml.
- Stage 7. Prompt extracted to skills/research/SKILL.md.