Real, reproducible numbers (run benchmark.py + thesis.py). The thesis: most decisions are easy — pay frontier prices only for the hard ones.
Customer savings — 1M decisions/mo, local handles 71%
vs Claude Haiku$4,000 → $1,260 · saves 68.5%
vs GPT-4o$10,000 → $3,000 · saves 70.0%
vs Mistral Small$600 → $274 · saves 54.3%
break-even local share1–17% — we measure far more
dispatch gross margin~90%
Cost-aware leaderboard — local + 6 frontiers + 356 via OpenRouter
codeqwen2.5 (local) · $0
reasonqwen2.5 (local) · $0
summarizeqwen2.5 (local) · $0
tool-callingqwen2.5 (local) 2/2 · $0
Local wins every route it passes; a frontier wins only where no local model is adequate (cheapest-frontier seen: DeepSeek $0.0003). Objective graders: math · multiple-choice · reading · instruction-following · structured-JSON · code · tool-calls. Machine-readable pricing: /pricing.json.
savings calculator
decisions / mo
local share %
frontier baseline
dispatch fee
frontier-only
with dispatch
you save
Same model as the headline table: local share runs at $0 on your infra, the escalated remainder pays your frontier, dispatch adds the per-decision fee. Estimate — your real baseline depends on your token mix.