
Providers & Models Reference

muonroi-cli supports multiple AI providers and routes different task types to different models, optimizing for cost and output quality. This guide covers provider setup, role-based routing, and cost management.

Supported Providers

Seven providers are natively supported. Set the corresponding environment variable to enable each one; Ollama runs locally and needs no key:

| Provider | Models | Environment Variable |
|---|---|---|
| Anthropic | Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 | MUONROI_API_KEY |
| OpenAI | GPT-4o, GPT-4o-mini, o3, o4-mini | OPENAI_API_KEY |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash | GOOGLE_API_KEY |
| DeepSeek | DeepSeek V4 Flash, DeepSeek V4 Pro | DEEPSEEK_API_KEY |
| xAI | Grok 3, Grok 3 Mini | XAI_API_KEY |
| SiliconFlow | Qwen, GLM, InternLM (with vision proxy support) | SILICONFLOW_API_KEY |
| Ollama | Any local model | None (keyless); defaults to http://localhost:11434 |
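
To check which providers a given environment would enable, a small script can test for each variable listed above. This is only a sketch: the provider keys in the dictionary (`anthropic`, `openai`, and so on) and the `enabled_providers` helper are hypothetical names chosen for illustration, not part of muonroi-cli.

```python
import os

# Environment variable for each keyed provider, as listed in the table above.
# The dictionary keys are illustrative labels, not muonroi-cli identifiers.
PROVIDER_ENV_VARS = {
    "anthropic": "MUONROI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "xai": "XAI_API_KEY",
    "siliconflow": "SILICONFLOW_API_KEY",
}


def enabled_providers(env=None):
    """Return providers whose key is set and non-empty, plus keyless Ollama.

    Whether a local Ollama server is actually running is not checked here.
    """
    env = os.environ if env is None else env
    enabled = [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]
    enabled.append("ollama")
    return enabled


if __name__ == "__main__":
    print(enabled_providers())
```

Passing a plain dictionary instead of `os.environ` makes the check easy to run against a candidate configuration before exporting anything.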

Model ID Matching

Model IDs are matched by prefix. This means:

  • Models outside the built-in catalog work automatically (e.g., deepseek-*, gpt-*, grok-*)
  • You can use newer model releases without code changes
  • Prefix matching is case-insensitive
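
Case-insensitive prefix matching can be sketched in a few lines. The prefix table here is an assumption: only `deepseek-`, `gpt-`, and `grok-` appear in the text above, while `claude-` and `gemini-` are guesses added for illustration, and `provider_for_model` is a hypothetical helper rather than the tool's actual code.

```python
# Hypothetical prefix table. Only deepseek-, gpt-, and grok- come from the
# documentation; claude- and gemini- are assumed for illustration.
PREFIX_TO_PROVIDER = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "gemini-": "google",
    "deepseek-": "deepseek",
    "grok-": "xai",
}


def provider_for_model(model_id):
    """Return the provider whose prefix matches model_id, ignoring case."""
    lowered = model_id.lower()
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if lowered.startswith(prefix):
            return provider
    return None  # no known prefix matched


if __name__ == "__main__":
    # A model ID outside any built-in catalog still resolves by prefix.
    print(provider_for_model("DeepSeek-V9-Experimental"))
```

Because only the prefix is inspected, a newly released model ID resolves to its provider without any change to the table.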

Role-Based Routing

Route different task types to different models based on their computational needs and cost-benefit profile.

Task Roles

Four roles handle different task types:

| Role | Task Types | Typical Model | Use Case |
|---|---|---|---|
| leader | plan, analyze, architecture | Claude Sonnet 4.6 | Complex reasoning, design decisions |
| implement | generate, refactor, coding | DeepSeek V4 Flash | Fast iteration, cost-efficient output generation |
| verify | debug, review, validation | Claude Sonnet 4.6 | Correctness-critical, requires deep reasoning |
| research | docs, knowledge synthesis | DeepSeek V4 Flash | Lower-risk content, cost savings |

Configuration

Set roleModels in your CLI settings (JSON):

{
  "roleModels": {
    "leader": "claude-sonnet-4-6",
    "implement": "deepseek-v4-flash",
    "verify": "claude-sonnet-4-6",
    "research": "deepseek-v4-flash"
  }
}
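
The lookup this configuration implies (task type to role, then role to model) might look like the sketch below. The task-type strings are taken from the Task Roles table, but their exact spelling inside muonroi-cli is an assumption, and `model_for_task` is a hypothetical helper.

```python
import json

# The roleModels configuration from above.
SETTINGS = json.loads("""
{
  "roleModels": {
    "leader": "claude-sonnet-4-6",
    "implement": "deepseek-v4-flash",
    "verify": "claude-sonnet-4-6",
    "research": "deepseek-v4-flash"
  }
}
""")

# Task types per role, copied from the Task Roles table. The exact
# identifiers used internally are an assumption.
TASK_TYPE_TO_ROLE = {
    "plan": "leader", "analyze": "leader", "architecture": "leader",
    "generate": "implement", "refactor": "implement", "coding": "implement",
    "debug": "verify", "review": "verify", "validation": "verify",
    "docs": "research", "knowledge synthesis": "research",
}


def model_for_task(task_type, settings):
    """Map a task type to its role, then the role to its configured model."""
    role = TASK_TYPE_TO_ROLE.get(task_type)
    if role is None:
        return None  # unknown task type: no role-based answer
    return settings.get("roleModels", {}).get(role)


if __name__ == "__main__":
    print(model_for_task("refactor", SETTINGS))
```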

Resolution Priority

When selecting a model for a task, muonroi-cli checks in this order:

  1. Explicit override: MUONROI_MODEL env var (suppresses all routing)
  2. Role model: from roleModels config
  3. PIL tier: from tier-based fallback (hot/warm/cold)
  4. Session default: from settings.json or CLI argument
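
The four steps above amount to a first-match chain. The following is a minimal sketch of that order, not the actual implementation: `resolve_model` and its parameters are hypothetical, and it covers only the four listed steps (mode overrides are left out).

```python
def resolve_model(env, role_models, tier_models, session_default,
                  role=None, tier=None):
    """Return (model_id, source) using the documented priority order."""
    # 1. Explicit override suppresses all routing.
    override = env.get("MUONROI_MODEL")
    if override:
        return override, "override"
    # 2. Role model from the roleModels config.
    if role is not None and role in role_models:
        return role_models[role], "role"
    # 3. Tier-based fallback (hot/warm/cold).
    if tier is not None and tier in tier_models:
        return tier_models[tier], "tier"
    # 4. Session default from settings or a CLI argument.
    return session_default, "session"


if __name__ == "__main__":
    roles = {"leader": "claude-sonnet-4-6"}
    tiers = {"cold": "deepseek-v4-flash"}
    print(resolve_model({}, roles, tiers, "gpt-4o-mini", role="leader"))
```

Returning the source alongside the model ID makes it easier to see which step produced a given choice when auditing routing.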

Cost Optimization

Role-based routing can cut monthly API costs substantially (about 74% in the example below) while maintaining output quality where it matters.

Cost Comparison

Single-model setup (Claude for everything):

100 tasks/day × $0.02/task average × 30 days = ~$60/month

muonroi-cli with role-based routing:

70% cheap tasks (implement, research) → deepseek-v4-flash @ $0.001/task
30% quality tasks (leader, verify)    → claude-sonnet-4-6 @ $0.015/task

Result: ~$15.60/month (70 × $0.001 + 30 × $0.015 = $0.52/day), with equivalent output quality where it matters

Key insight: Use premium models only for tasks requiring deep reasoning (plan, analyze, debug). Route commodity tasks to fast, cheap models.
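
The arithmetic behind the comparison can be reproduced with a short script, which is handy for plugging in your own task volume and prices. The 30-day month and the `monthly_cost` helper are assumptions made for this sketch; the per-task prices are the illustrative figures quoted above, not measured rates.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day billing month


def monthly_cost(tasks_per_day, mix):
    """Estimate monthly cost for a task mix.

    mix is a list of (share_of_tasks, price_per_task_usd) pairs whose
    shares should sum to 1.0.
    """
    daily = sum(tasks_per_day * share * price for share, price in mix)
    return daily * DAYS_PER_MONTH


# Single model for everything at $0.02 per task.
single = monthly_cost(100, [(1.0, 0.02)])

# 70% of tasks at $0.001, 30% at $0.015.
routed = monthly_cost(100, [(0.7, 0.001), (0.3, 0.015)])

savings = 1 - routed / single

if __name__ == "__main__":
    print(f"single-model: ${single:.2f}/month")
    print(f"routed:       ${routed:.2f}/month")
    print(f"savings:      {savings:.0%}")
```

With these inputs the script reports $60.00 for the single-model setup and $15.60 for the routed setup. The premium share dominates the routed bill, so the savings depend mostly on how many tasks truly need the premium model.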

Tier-Based Fallback

When roleModels is not configured, muonroi-cli falls back to a 3-tier budget-aware routing system:

  • hot — Premium models (plan, analyze)
  • warm — Mid-tier models
  • cold — Cheapest models (docs, simple tasks)

Budget-Aware Downgrade

If your monthly API spend approaches a configured limit, muonroi-cli automatically downgrades to cheaper tiers to stay within budget. Configure your monthly limit in settings:

{
  "monthlyBudgetUSD": 50,
  "tierModels": {
    "hot": "claude-sonnet-4-6",
    "warm": "gpt-4o-mini",
    "cold": "deepseek-v4-flash"
  }
}

When the ledger approaches monthlyBudgetUSD, subsequent tasks route to warm or cold until the billing period resets.
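
One way to picture the downgrade is as a spend ratio that caps the best tier still allowed. The sketch below is purely illustrative: the 80% and 95% thresholds, the `effective_tier` function, and its parameters are all assumptions, since the actual cutoffs muonroi-cli uses are not stated here.

```python
TIER_ORDER = ["hot", "warm", "cold"]  # most to least expensive


def effective_tier(requested, spent_usd, budget_usd,
                   warm_at=0.80, cold_at=0.95):
    """Return the tier actually used, given month-to-date spend.

    warm_at and cold_at are hypothetical thresholds (fractions of the
    monthly budget), not documented muonroi-cli values.
    """
    ratio = spent_usd / budget_usd if budget_usd > 0 else 1.0
    if ratio >= cold_at:
        cap = "cold"
    elif ratio >= warm_at:
        cap = "warm"
    else:
        cap = "hot"
    # Never upgrade a request: take whichever tier is cheaper.
    return max(requested, cap, key=TIER_ORDER.index)


if __name__ == "__main__":
    # With a $50 budget and $42 spent (84%), a hot request drops to warm.
    print(effective_tier("hot", spent_usd=42, budget_usd=50))
```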

Mode Models

Override the model for specific agent modes:

{
  "modeModels": {
    "agent": "claude-sonnet-4-6",
    "plan": "claude-opus-4-7",
    "ask": "deepseek-v4-flash"
  }
}

| Mode | When Used |
|---|---|
| agent | Standard agent execution (GSD skills, explore, code review) |
| plan | Writing detailed plans (/gsd:plan-phase) |
| ask | Quick questions and interactive mode |

Mode overrides take precedence over role-based routing.
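
That precedence can be expressed as a simple lookup order. This is a sketch under the assumption that a mode without a modeModels entry falls through to the role model; `model_for` is a hypothetical helper.

```python
def model_for(mode, role, mode_models, role_models):
    """Prefer a mode override; otherwise fall back to the role model.

    The fall-through for modes without an entry is an assumption.
    """
    if mode in mode_models:
        return mode_models[mode]
    return role_models.get(role)


if __name__ == "__main__":
    mode_models = {"plan": "claude-opus-4-7"}
    role_models = {"leader": "claude-sonnet-4-6"}
    # The plan mode override wins over the leader role model.
    print(model_for("plan", "leader", mode_models, role_models))
```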

Environment Variables

| Variable | Purpose | Default |
|---|---|---|
| MUONROI_API_KEY | Anthropic API key (primary) | |
| MUONROI_MODEL | Absolute model override (suppresses all routing) | |
| MUONROI_BASE_URL | Custom base URL for Anthropic | https://api.anthropic.com |
| OPENAI_API_KEY | OpenAI API key | |
| GOOGLE_API_KEY | Google API key | |
| DEEPSEEK_API_KEY | DeepSeek API key | |
| XAI_API_KEY | xAI (Grok) API key | |
| SILICONFLOW_API_KEY | SiliconFlow API key | |
| OLLAMA_URL | Local Ollama endpoint | http://localhost:11434 |

Setting Keys

Example .env file:

# Primary provider
MUONROI_API_KEY=sk-ant-...

# Multi-provider routing
OPENAI_API_KEY=sk-proj-...
DEEPSEEK_API_KEY=sk-...
GOOGLE_API_KEY=AIza...

# Override (use sparingly)
MUONROI_MODEL=claude-opus-4-7

Cost Forensics

Track API usage and cost per session or date range:

# Plain table output
muonroi-cli usage forensics <session-id-prefix>

# Machine-readable JSON
muonroi-cli usage forensics <session-id-prefix> --json

Example:

muonroi-cli usage forensics abc123
muonroi-cli usage forensics "session-2025-05" --json

Output includes:

  • Task count per provider/model
  • Token usage (input/output)
  • Cost per task and total
  • Timestamp range

Use this to audit routing decisions and validate the cost savings from a role-based setup.
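
For scripted audits, the `--json` form can be called from Python and parsed generically. The sketch below uses only the command and flag shown above; it makes no assumption about the JSON schema, and the helper names `forensics_argv` and `run_forensics` are hypothetical.

```python
import json
import subprocess


def forensics_argv(session_prefix, as_json=False):
    """Build the argument list for the usage forensics command."""
    argv = ["muonroi-cli", "usage", "forensics", session_prefix]
    if as_json:
        argv.append("--json")
    return argv


def run_forensics(session_prefix):
    """Run the command with --json and return the parsed output.

    Requires muonroi-cli on PATH. The structure of the returned data is
    not assumed here; inspect it before relying on specific fields.
    """
    result = subprocess.run(
        forensics_argv(session_prefix, as_json=True),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(forensics_argv("abc123", as_json=True))
```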