OpenRouter COO: How Agents Are Actually Going Into Production
Chris from OpenRouter shares data on agent adoption: tool call rates jumped 5x in one year, reasoning tokens now 50% of output. Here's what's working.
What OpenRouter's Trillion Tokens Reveal About Agent Adoption
Chris, co-founder and COO of OpenRouter, sits at a unique vantage point. Processing over a trillion tokens daily across 70+ cloud providers, OpenRouter sees how AI is actually being used in production: not demos, not experiments, but real workloads at scale.
The data tells a clear story: agents are no longer theoretical. They're shipping.
The tool calling explosion: "Sub 5% to well north of 25%. And this is trending up rapidly." On Anthropic models alone, the percentage of API calls ending with a tool request jumped 5x in twelve months. This is the "exhaust signature" of agents being deployed into production.
The SLA moment: Around July 2025, something shifted. Chris recalls: "Suddenly we started getting questions from customers about our SLAs and our uptime... that's an extremely strong indicator that these things have suddenly gone from groups of companies testing them out to being very much in production. And if they go down, it starts to matter."
Reasoning tokens now dominate: One year ago, reasoning models didn't exist in production. Now, 50% of all output tokens OpenRouter sees are internal reasoning tokens. Agents are thinking before they act.
Why Model Mixing Is the New Standard
The most successful agents don't use a single model; they use multiple models for different tasks:
Frontier models for planning: Claude, GPT-4, and Gemini handle the "judgment calls" of understanding context, planning next steps, and making decisions that require nuance.
Smaller models for execution: Cheaper, faster models like Qwen and MiniMax handle the tool calls themselves. Chris explains: "They're using smaller specialty models to do tool call requests and to execute. Less smart from a judgment perspective but extremely accurate, extremely good with tool use."
This pattern, reason with the best and execute with the fastest, is how production agents manage both quality and cost.
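A minimal sketch of the split, using OpenRouter's OpenAI-compatible chat completions endpoint; the model IDs and the plan/execute boundary here are illustrative assumptions, not a prescribed setup:

```python
# Sketch: a frontier model plans, a smaller model executes tool calls.
# Model IDs are illustrative; swap in whatever your own benchmarks favor.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

PLANNER = "anthropic/claude-sonnet-4"    # judgment: context, next steps
EXECUTOR = "qwen/qwen-2.5-72b-instruct"  # cheap, fast, accurate tool use

def plan_next_step(task: str, history: str) -> str:
    """Frontier model decides what the agent should do next."""
    resp = client.chat.completions.create(
        model=PLANNER,
        messages=[{
            "role": "user",
            "content": f"Task: {task}\nSo far: {history}\nWhat is the next step?",
        }],
    )
    return resp.choices[0].message.content

def execute_step(step: str, tools: list) -> object:
    """Smaller model turns the step into concrete tool calls."""
    resp = client.chat.completions.create(
        model=EXECUTOR,
        messages=[{"role": "user", "content": step}],
        tools=tools,
    )
    return resp.choices[0].message  # .tool_calls holds requests to dispatch
```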
The Inference Quality Problem Nobody Talks About
Here's something counterintuitive: the same model weights produce different results on different clouds.
OpenRouter's benchmarking revealed that identical models can have:
- Different accuracy scores across providers
- Different tool-calling frequencies
- Meaningful variance in production performance
"Why would the exact same model with the exact same smarts choose to use tools differently in different situations?" The answer lies in subtle differences in how inference stacks are implementedâquantization, serving infrastructure, API handling.
This is why OpenRouter created "Exacto endpoints": routing pools that only include providers benchmarked for tool-calling accuracy. For agents, inference quality matters as much as model quality.
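One way to check this for your own workload is to pin requests to specific providers and compare behavior directly. A minimal sketch, assuming OpenRouter's provider routing preferences (`provider.order`, `allow_fallbacks`) passed through the OpenAI SDK's `extra_body`; the weather tool is a stand-in for your real tool schema:

```python
# Sketch: run the same prompts against the same model on one provider
# at a time, and compare how often each actually emits a tool call.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="YOUR_OPENROUTER_KEY")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # stand-in for your real tool schema
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def tool_call_rate(model: str, provider: str, prompts: list[str]) -> float:
    """Fraction of responses ending in a tool call, pinned to one provider."""
    hits = 0
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            tools=TOOLS,
            # OpenRouter routing preference: this provider only, no fallback.
            extra_body={"provider": {"order": [provider],
                                     "allow_fallbacks": False}},
        )
        if resp.choices[0].finish_reason == "tool_calls":
            hits += 1
    return hits / len(prompts)
```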
The Founder's Biggest Mistake Building Agents
When asked what founders get wrong, Chris's answer was unexpected: they don't build for optionality.
"It's extremely hard to predict what we're going to need in 12 months and where that inference will come from and what kind of models we might need."
The solution isn't picking the perfect model today; it's building infrastructure that lets you switch models tomorrow (a fallback sketch follows the list below). An agent that's locked to one provider can't:
- Test when a new frontier model drops
- Downgrade to cheaper models once the use case is proven
- Fail over when providers have outages
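A minimal client-side sketch of that optionality, assuming a hypothetical `MODEL_LADDER` you reorder as the landscape shifts; OpenRouter also accepts a `models` list in the request body for server-side fallbacks, so this just makes the mechanics explicit:

```python
# Sketch: try models in priority order so an outage or deprecation
# degrades gracefully instead of taking the agent down.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="YOUR_OPENROUTER_KEY")

# Reorder this list when a new frontier model drops or a cheaper
# model proves itself; the calling code never changes.
MODEL_LADDER = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "qwen/qwen-2.5-72b-instruct",
]

def complete_with_fallback(messages: list[dict]) -> str:
    last_error = None
    for model in MODEL_LADDER:
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
            return resp.choices[0].message.content
        except Exception as err:  # outage, rate limit, model retired
            last_error = err
    raise RuntimeError("every model in the ladder failed") from last_error
```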
What Enterprise Agents Actually Need
For teams deploying agents at scale, Chris identified the critical concerns:
Uptime and failover: Production agents can't go down. Period. This means multi-provider routing, automatic failover, and real monitoring.
Data policy clarity: "Where are their data centers? Do they actually own the GPUs or do they have GPUs that are leased in different data centers? Where's the decryption happening?" Enterprise security teams need answers.
Burst capacity: Agents run on schedules such as overnight batch jobs and periodic workflows. Buying committed capacity for spiky workloads doesn't work. Shared infrastructure does.
4 Takeaways for Teams Building AI Agents
- Tool calling is the agent signature: if you're not measuring tool call rates, you're not measuring agent adoption (see the sketch after this list)
- Mix frontier and specialty models: use the best models for reasoning, fast models for execution
- Inference quality varies wildly: the same model can behave differently across providers; benchmark your specific use case
- Build for optionality, not perfection: the model landscape changes monthly; lock-in is the real risk
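If you already log completions, the first takeaway is cheap to act on. A minimal sketch, assuming hypothetical log records that carry each response's `finish_reason` and a timestamp:

```python
# Sketch: share of completions ending in a tool call, bucketed by month.
# Assumes records like {"ts": "2025-07-01T12:00:00", "finish_reason": "tool_calls"}.
from collections import defaultdict
from datetime import datetime

def tool_call_rate_by_month(log: list[dict]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    tool_ends: dict[str, int] = defaultdict(int)
    for record in log:
        month = datetime.fromisoformat(record["ts"]).strftime("%Y-%m")
        totals[month] += 1
        if record["finish_reason"] == "tool_calls":
            tool_ends[month] += 1
    return {m: tool_ends[m] / totals[m] for m in sorted(totals)}
```

A rising curve here is the same "exhaust signature" OpenRouter sees at platform scale.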
Why This Matters for AI-Powered Organizations
OpenRouter's data confirms what we've been seeing: long-running agents are here, and the infrastructure patterns that make them work are becoming clear.
The shift isn't just technical; it's operational. When customers start asking about SLAs, when tool call rates grow 5x in a year, when reasoning tokens hit 50% of output... that's production adoption at scale.
The question for organizations isn't whether to deploy agents. It's how to build the infrastructure that lets agents actually work: multi-model routing, inference quality monitoring, and the flexibility to adapt as the landscape evolves.


