OpenRouter COO: How Agents Are Actually Going Into Production

January 28, 2026 · AI Day

agents, enterprise, inference, tool-use

What OpenRouter's Trillion Tokens Reveal About Agent Adoption

Chris, co-founder and COO of OpenRouter, sits at a unique vantage point. Processing over a trillion tokens daily across 70+ cloud providers, OpenRouter sees how AI is actually being used in production—not demos, not experiments, but real workloads at scale.

The data tells a clear story: agents are no longer theoretical. They're shipping.

The tool calling explosion: "Sub 5% to well north of 25%. And this is trending up rapidly." On Anthropic models alone, the percentage of API calls ending with a tool request jumped 5x in twelve months. This is the "exhaust signature" of agents being deployed into production.
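
For teams that want to track the same signal in their own traffic, here is a minimal sketch of the calculation. It assumes each logged API call carries an OpenAI-style finish reason in which "tool_calls" marks a response that ended with a tool request; other APIs label this differently, and the sample log is hypothetical rather than OpenRouter data.

```python
from collections import Counter


def tool_call_rate(finish_reasons):
    """Fraction of completed API calls that ended with a tool request."""
    counts = Counter(finish_reasons)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Assumption: "tool_calls" is the label your API uses for a tool request.
    return counts["tool_calls"] / total


# Hypothetical log of finish reasons, one entry per API call.
sample = ["stop", "tool_calls", "stop", "tool_calls", "length", "stop", "stop", "stop"]
rate = tool_call_rate(sample)
print(f"{rate:.1%}")  # prints 25.0%
```

Tracked week over week, this ratio gives a rough view of how much of your traffic is agentic rather than plain chat.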

The SLA moment: Around July 2025, something shifted. Chris recalls: "Suddenly we started getting questions from customers about our SLAs and our uptime... that's an extremely strong indicator that these things have suddenly gone from groups of companies testing them out to being very much in production. And if they go down, it starts to matter."

Reasoning tokens now dominate: One year ago, reasoning models didn't exist in production. Now, 50% of all output tokens OpenRouter sees are internal reasoning tokens. Agents are thinking before they act.

Why Model Mixing Is the New Standard

The most successful agents don't use a single model—they use multiple models for different tasks:

Frontier models for planning: Claude, GPT-4, Gemini handle the "judgment calls"—understanding context, planning next steps, making decisions that require nuance.

Smaller models for execution: Cheaper, faster models like Qwen and MiniMax handle the tool calls themselves. Chris explains: "They're using smaller specialty models to do tool call requests and to execute. Less smart from a judgment perspective but extremely accurate, extremely good with tool use."

This pattern—reason with the best, execute with the fast—is how production agents manage both quality and cost.
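
The split between planning and execution can be sketched as a small routing function. This is an illustration only: the model identifiers and the Step type are hypothetical placeholders, and a real agent would pass the chosen model name to its inference client.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider exposes.
PLANNER_MODEL = "frontier-planner"
EXECUTOR_MODEL = "small-tool-caller"


@dataclass
class Step:
    kind: str    # "plan" for judgment calls, "tool" for tool execution
    prompt: str


def pick_model(step: Step) -> str:
    """Route judgment calls to the frontier model and tool calls to the small one."""
    if step.kind == "plan":
        return PLANNER_MODEL
    if step.kind == "tool":
        return EXECUTOR_MODEL
    raise ValueError(f"unknown step kind: {step.kind!r}")


steps = [
    Step("plan", "Decide how to reconcile these two invoices."),
    Step("tool", "Call lookup_invoice for invoice A."),
    Step("tool", "Call lookup_invoice for invoice B."),
    Step("plan", "Summarize the discrepancy."),
]
routing = [pick_model(s) for s in steps]
print(routing)
```

In this sketch only the two planning steps reach the expensive model, and the tool calls in between go to the cheaper one.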

The Inference Quality Problem Nobody Talks About

Here's something counterintuitive: the same model weights produce different results on different clouds.

OpenRouter's benchmarking revealed that identical models can have:

Different accuracy scores across providers
Different tool-calling frequencies
Meaningful variance in production performance

"Why would the exact same model with the exact same smarts choose to use tools differently in different situations?" The answer lies in subtle differences in how inference stacks are implemented—quantization, serving infrastructure, API handling.

This is why OpenRouter created "Exacto endpoints"—routing pools that only include providers benchmarked for tool-calling accuracy. For agents, inference quality matters as much as model quality.
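
To make the idea of a benchmarked routing pool concrete, here is a rough sketch of how a team might score providers on its own tool-calling test set and keep only those above a threshold. It is not OpenRouter's method for Exacto; the provider names, test results, and threshold are all hypothetical.

```python
def tool_call_accuracy(results):
    """results: list of (expected_tool, actual_tool) pairs; None means no tool call."""
    if not results:
        return 0.0
    correct = sum(1 for expected, actual in results if expected == actual)
    return correct / len(results)


def build_pool(benchmarks, threshold):
    """Keep only providers whose tool-calling accuracy meets the threshold."""
    scores = {name: tool_call_accuracy(runs) for name, runs in benchmarks.items()}
    return sorted(name for name, score in scores.items() if score >= threshold)


# Hypothetical runs of the same model weights on three providers.
benchmarks = {
    "provider_a": [("search", "search"), ("calc", "calc"), (None, None), ("search", "search")],
    "provider_b": [("search", "search"), ("calc", None), (None, None), ("search", "calc")],
    "provider_c": [("search", "search"), ("calc", "calc"), (None, "search"), ("search", "search")],
}
pool = build_pool(benchmarks, threshold=0.75)
print(pool)  # prints ['provider_a', 'provider_c']
```

Here provider_b serves the same weights but picks the wrong tool, or none, on half the cases, so it is left out of the pool.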

The Founder's Biggest Mistake Building Agents

When asked what founders get wrong, Chris's answer was unexpected: they don't build for optionality.

"It's extremely hard to predict what we're going to need in 12 months and where that inference will come from and what kind of models we might need."

The solution isn't picking the perfect model today—it's building infrastructure that lets you switch models tomorrow. An agent that's locked to one provider can't:

Test when a new frontier model drops
Downgrade to cheaper models once the use case is proven
Fail over when providers have outages
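
One way to keep that optionality is to have agent code refer to stable role names and bind each role to a swappable backend. The sketch below assumes a hypothetical ModelRegistry class and stub backends that stand in for real provider clients; it shows the indirection, not any particular SDK.

```python
from typing import Callable, Dict

# A backend is any function from prompt text to completion text.
Backend = Callable[[str], str]


class ModelRegistry:
    """Maps a stable role name to a swappable backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, role: str, backend: Backend) -> None:
        self._backends[role] = backend

    def complete(self, role: str, prompt: str) -> str:
        return self._backends[role](prompt)


# Stub backends standing in for real provider clients (hypothetical).
def current_model(prompt: str) -> str:
    return f"[current] {prompt}"


def cheaper_model(prompt: str) -> str:
    return f"[cheaper] {prompt}"


registry = ModelRegistry()
registry.register("summarizer", current_model)
before = registry.complete("summarizer", "Summarize the ticket.")

# Swapping models is a one-line change; the calling code stays the same.
registry.register("summarizer", cheaper_model)
after = registry.complete("summarizer", "Summarize the ticket.")
print(before)
print(after)
```

Because callers only know the role "summarizer", trying a new frontier model or downgrading to a cheaper one does not touch agent logic.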

What Enterprise Agents Actually Need

For teams deploying agents at scale, Chris identified the critical concerns:

Uptime and failover: Production agents can't go down. Period. This means multi-provider routing, automatic failover, and real monitoring.
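
Automatic failover can be sketched as trying providers in priority order and returning the first success. The ProviderError type and the two stub providers are hypothetical; a production version would add timeouts, retries with backoff, and real monitoring hooks.

```python
class ProviderError(Exception):
    """Raised by a provider call that failed (hypothetical error type)."""


def call_with_failover(providers, prompt):
    """Try each (name, call) pair in order and return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))  # keep failures for monitoring
    raise ProviderError(f"all providers failed: {errors}")


# Stub providers: the primary is down, the secondary is healthy.
def primary(prompt):
    raise ProviderError("503 from primary")


def secondary(prompt):
    return f"ok: {prompt}"


served_by, reply = call_with_failover(
    [("primary", primary), ("secondary", secondary)], "ping"
)
print(served_by, reply)  # prints: secondary ok: ping
```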

Data policy clarity: "Where are their data centers? Do they actually own the GPUs or do they have GPUs that are leased in different data centers? Where's the decryption happening?" Enterprise security teams need answers.

Burst capacity: Agents run on schedules—overnight batch jobs, periodic workflows. Buying committed capacity for spiky workloads doesn't work. Shared infrastructure does.

4 Takeaways for Teams Building AI Agents

Tool calling is the agent signature - If you're not measuring tool call rates, you're not measuring agent adoption
Mix frontier and specialty models - Use the best models for reasoning, fast models for execution
Inference quality varies wildly - The same model can behave differently across providers; benchmark your specific use case
Build for optionality, not perfection - The model landscape changes monthly; lock-in is the real risk

Why This Matters for AI-Powered Organizations

OpenRouter's data confirms what we've been seeing: long-running agents are here, and the infrastructure patterns that make them work are becoming clear.

The shift isn't just technical; it's operational. When customers start asking about SLAs, when tool call rates grow 5x in a year, when reasoning tokens hit 50% of output... that's production adoption at scale.

The question for organizations isn't whether to deploy agents. It's how to build the infrastructure that lets agents actually work: multi-model routing, inference quality monitoring, and the flexibility to adapt as the landscape evolves.