benchmarks & savings

Real, reproducible numbers: run benchmark.py and thesis.py to regenerate them. The thesis: most decisions are easy, so pay frontier prices only for the hard ones.

Customer savings — 1M decisions/mo, local handles 71%

vs Claude Haiku $4,000 → $1,260 · saves 68.5%

vs GPT-4o $10,000 → $3,000 · saves 70.0%

vs Mistral Small $600 → $274 · saves 54.3%

break-even local share 1–17% · the measured local share (71%) is far above it

dispatch gross margin ~90%
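The break-even range can be checked with a few lines of arithmetic. Dispatch pays for itself once the frontier spend avoided by the local share exceeds the per-decision fee, so the break-even local share is the fee divided by the frontier price. The sketch below is an illustration, not code from benchmark.py or thesis.py. The per-decision prices are assumptions derived by dividing the headline monthly costs by 1M decisions, and the $0.0001 fee is inferred from those same headline numbers ($1,260 minus 29% of $4,000 leaves $100 per 1M decisions); neither is a published rate.

```python
# Assumed per-decision frontier prices (USD), derived from the headline
# figures: monthly frontier-only cost / 1,000,000 decisions.
FRONTIER_PRICE = {
    "Claude Haiku": 0.004,
    "GPT-4o": 0.010,
    "Mistral Small": 0.0006,
}

# Assumed dispatch fee per decision (USD), inferred from the headline numbers.
DISPATCH_FEE = 0.0001


def break_even_local_share(frontier_price: float, dispatch_fee: float = DISPATCH_FEE) -> float:
    """Smallest local share at which dispatch costs no more than frontier-only.

    Per decision, with local share s:
        frontier-only = p
        with dispatch = (1 - s) * p + f
    Setting the two equal gives s = f / p.
    """
    if frontier_price <= 0:
        raise ValueError("frontier_price must be positive")
    return dispatch_fee / frontier_price


if __name__ == "__main__":
    for name, price in FRONTIER_PRICE.items():
        share = break_even_local_share(price)
        print(f"{name}: break-even local share {share:.1%}")
```

Under these assumptions the three baselines break even at roughly 1%, 2.5%, and 17% local share, which matches the quoted 1–17% range.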

Cost-aware leaderboard — local + 6 frontiers + 356 via OpenRouter

code qwen2.5 (local) · $0

reason qwen2.5 (local) · $0

summarize qwen2.5 (local) · $0

tool-calling qwen2.5 (local) 2/2 · $0

Local wins every route it passes; a frontier wins only where no local model is adequate (cheapest-frontier seen: DeepSeek $0.0003). Objective graders: math · multiple-choice · reading · instruction-following · structured-JSON · code · tool-calls. Machine-readable pricing: /pricing.json.
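One way to read the ranking rule above: among the models that pass a route's objective grader, the cheapest one wins, and since a local model costs $0 it wins whenever it passes. The sketch below illustrates that reading only. `Candidate` and `pick_winner` are hypothetical names, and the actual logic in benchmark.py may differ.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One model's result on a route (hypothetical structure)."""
    name: str
    passed: bool              # did the model pass the route's objective grader?
    cost_per_decision: float  # USD; 0.0 for a model running locally


def pick_winner(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the cheapest candidate that passed, or None if none passed."""
    adequate = [c for c in candidates if c.passed]
    if not adequate:
        return None
    return min(adequate, key=lambda c: c.cost_per_decision)


if __name__ == "__main__":
    local = Candidate("qwen2.5 (local)", passed=True, cost_per_decision=0.0)
    frontier = Candidate("DeepSeek", passed=True, cost_per_decision=0.0003)
    print(pick_winner([local, frontier]).name)

    # If the local model fails the grader, the cheapest passing frontier wins.
    local_failed = local._replace(passed=False)
    print(pick_winner([local_failed, frontier]).name)
```

Filtering on the grader result first and comparing cost second keeps quality as a hard gate rather than trading it against price.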

savings calculator

fields: decisions / mo · local share % · frontier baseline · dispatch fee · frontier-only · with dispatch · you save

Same cost model as the headline figures: the local share runs at $0 on your infrastructure, the escalated remainder is billed at your frontier's rate, and dispatch adds a per-decision fee. This is an estimate; your real baseline depends on your token mix.
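The cost model described above can be sketched as a small function. This is a minimal illustration, not the calculator's actual source. The function and field names are hypothetical, and the default $0.0001 per-decision fee is an assumption inferred from the headline figures rather than a published rate.

```python
from dataclasses import dataclass


@dataclass
class Savings:
    frontier_only: float  # monthly cost if every decision goes to the frontier
    with_dispatch: float  # monthly cost with local routing plus the dispatch fee
    saved: float          # absolute monthly saving in USD
    saved_pct: float      # saving as a percentage of the frontier-only cost


def estimate_savings(
    decisions: int,
    local_share: float,           # fraction in [0, 1] handled locally at $0
    frontier_price: float,        # USD per decision at the frontier baseline
    dispatch_fee: float = 0.0001, # assumed USD per decision, see note above
) -> Savings:
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    frontier_only = decisions * frontier_price
    # Only the escalated remainder pays the frontier price.
    escalated = decisions * (1.0 - local_share) * frontier_price
    # The dispatch fee applies to every decision, local or escalated.
    with_dispatch = escalated + decisions * dispatch_fee
    saved = frontier_only - with_dispatch
    saved_pct = 100.0 * saved / frontier_only if frontier_only else 0.0
    return Savings(frontier_only, with_dispatch, saved, saved_pct)


if __name__ == "__main__":
    # Headline scenario: 1M decisions/mo, 71% local, $0.004 per frontier decision
    # (the $4,000 Claude Haiku baseline divided by 1M decisions).
    result = estimate_savings(1_000_000, 0.71, 0.004)
    print(f"${result.frontier_only:,.0f} -> ${result.with_dispatch:,.0f}, "
          f"saves {result.saved_pct:.1f}%")
```

With these assumed inputs the sketch reproduces the three headline rows: about $1,260 (68.5%) against the $4,000 baseline, $3,000 (70.0%) against $10,000, and $274 (54.3%) against $600.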