OpenAI Agent RFT: Train Agents for Efficient Tool Use
Learn how reinforcement fine-tuning lets agents explore tool usage during training. The live demo shows a 14-point accuracy gain and a 10% latency reduction.
Why Agent RFT Changes How Models Learn Tool Usage
This is OpenAI's Build Hour on Agent RFT (reinforcement fine-tuning for agents) - a technical deep dive on training agents to use your specific tools more effectively. Will (fine-tuning engineering) and Theo (solutions architect) walk through a complete example.
Agent RFT is the first time models interact with the outside world during training. The key innovation: during training, the agent can actually call your tool endpoints and explore different ways of using them, while your custom grader endpoint provides the reward signal. The model learns organically by trying many different tool-calling strategies and hill-climbing on your task.
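A minimal sketch of what such a grader endpoint could look like, assuming a FastAPI service and hypothetical field names (`rollout_id`, `model_answer`, `reference_answer`); the real request schema comes from OpenAI's Agent RFT docs, so treat this as illustrative only:

```python
# Hypothetical grader endpoint sketch (field names are assumptions, not the official schema).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GradeRequest(BaseModel):
    rollout_id: str          # unique per rollout, useful for logging and state
    model_answer: str        # final answer produced by the agent
    reference_answer: str    # ground-truth answer for this sample

class GradeResponse(BaseModel):
    reward: float            # scalar reward, e.g. in [0, 1]

@app.post("/grade", response_model=GradeResponse)
def grade(req: GradeRequest) -> GradeResponse:
    # Simplest possible reward: exact match on normalized strings.
    # A real grader would handle numeric formatting, units, tolerance, etc.
    match = req.model_answer.strip().lower() == req.reference_answer.strip().lower()
    return GradeResponse(reward=1.0 if match else 0.0)
```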
The demo makes it concrete. They modified FinQA (a financial QA benchmark) to be harder: the agent gets only the question, with no supporting context, and must search through 2,800 financial reports to find the right one and answer, all within 10 tool calls. The tools: semantic search, listing directories, and cat for reading documents.
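Those three tools could be exposed to the model as function-style tool definitions along these lines; the shape roughly follows OpenAI's function-calling format, and the names and parameters are illustrative rather than the exact ones used in the demo:

```python
# Illustrative tool definitions for the FinQA-style demo (names and params are assumptions).
tools = [
    {
        "type": "function",
        "name": "semantic_search",
        "description": "Search the corpus of financial reports and return the most relevant file paths.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "list_dir",
        "description": "List files and subdirectories under a path in the report corpus.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "type": "function",
        "name": "cat",
        "description": "Read the contents of a single report file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]
```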
The before/after is striking. Baseline GPT-5: 59% accuracy. After just 10 training steps: 73% accuracy (+14 points). Just as notable: tool calls dropped from 8-9 per question to far fewer, tokens from roughly 2,500 to 1,500, and latency fell about 10% (5 seconds faster). The model learned to use the same tools more efficiently.
The variance plot is the key diagnostic. Before training, you run each sample multiple times and look at the variance of its scores. Samples with high variance (sometimes 0, sometimes 1) are where the model can learn, since they contain both good and bad reasoning paths to contrast. Samples that always score 0 or always 1 provide no learning signal.
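One way to run this diagnostic yourself, assuming you already have per-sample scores from a few repeated rollouts (the scoring itself is whatever your grader returns):

```python
# Variance diagnostic sketch: rerun each sample a few times and keep the ones
# with mixed outcomes, since those carry learning signal.
from statistics import pvariance

def split_by_variance(scores_per_sample: dict[str, list[float]]):
    """scores_per_sample maps sample_id -> grader scores from repeated rollouts (e.g. 3 runs)."""
    informative, uninformative = [], []
    for sample_id, scores in scores_per_sample.items():
        if pvariance(scores) > 0:       # sometimes right, sometimes wrong
            informative.append(sample_id)
        else:                           # always 0 or always 1: nothing to hill-climb on
            uninformative.append(sample_id)
    return informative, uninformative

# Example: three rollouts per sample
scores = {"q1": [0, 1, 1], "q2": [0, 0, 0], "q3": [1, 1, 1]}
keep, drop = split_by_variance(scores)
print(keep, drop)   # ['q1'] ['q2', 'q3']
```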
Watch the tool call distribution during training. The dashboard shows how tool usage evolves: initially heavy on "search", then shifting toward more "list" and "cat" calls as the model learns what works. "The model is just learning to use those tools much more efficiently."
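If you log rollouts on your side, the same distribution can be tallied in a few lines; the log format here (a list of tool-call names per rollout) is a hypothetical stand-in for whatever your endpoints record:

```python
# Tally which tools the model is calling, mirroring the dashboard's distribution view.
from collections import Counter

rollout_tool_calls = [
    ["semantic_search", "semantic_search", "cat"],   # hypothetical logged rollouts
    ["list_dir", "cat"],
    ["semantic_search", "list_dir", "cat", "cat"],
]

distribution = Counter(call for rollout in rollout_tool_calls for call in rollout)
print(distribution)  # Counter({'cat': 4, 'semantic_search': 3, 'list_dir': 2})
```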
10 Technical Insights From the Agent RFT Demo
- Agent RFT = tools during training - First time models call external endpoints during training process
- Custom grader endpoint - You define reward signal; model learns what "good" looks like
- FinQA demo - 59% → 73% accuracy in 10 steps; 8-9 tool calls → much fewer
- Latency reduction - 10% faster (5 seconds); tokens 2500 → 1500
- Compute multiplier - Controls exploration; higher = more variance, more endpoint load
- Variance diagnostic - Run samples 3x, look for variance; that's where learning happens
- Tool call budget - Can constrain to 10 calls max; model learns to stay within budget
- Model grader vs string grader - Model grader handles formatting variance (0.07 vs 7%); see the first sketch after this list
- Unique rollout IDs - Track tool calls across rollouts for state management; see the second sketch after this list
- Watch tool distribution - Dashboard shows which tools model learns to favor
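On the model-grader point: a strict string check scores "0.07" against "7%" as a mismatch, while a model grader can judge whether the two express the same value. A minimal sketch of a model-based grading function using the OpenAI chat API as the judge; the prompt, model choice, and 0/1 parsing are assumptions, not an official grader config:

```python
# Sketch of a model-based grader that tolerates formatting differences ("0.07" vs "7%").
from openai import OpenAI

client = OpenAI()

def model_grade(model_answer: str, reference_answer: str) -> float:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",  # assumed judge model; any capable model works
        messages=[{
            "role": "user",
            "content": (
                "Do these two answers express the same value, ignoring formatting "
                f"(percent vs decimal, rounding, units)?\nA: {model_answer}\nB: {reference_answer}\n"
                "Reply with exactly 1 for yes or 0 for no."
            ),
        }],
    )
    return 1.0 if response.choices[0].message.content.strip().startswith("1") else 0.0

# model_grade("0.07", "7%") should return 1.0, where a strict string grader returns 0.0.
```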
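On the rollout-ID point: training fires many rollouts at your endpoints concurrently, so any server-side state should be keyed on the rollout ID to keep rollouts from interfering with each other. A hypothetical sketch of a tool endpoint doing this, where the `rollout_id` field and endpoint shape are assumptions:

```python
# Hypothetical tool endpoint keeping per-rollout state keyed on the rollout ID.
from collections import defaultdict
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# In-memory per-rollout state (here, how many calls each rollout has made);
# a production setup would use a shared store such as Redis instead.
calls_per_rollout: defaultdict[str, int] = defaultdict(int)

class SearchRequest(BaseModel):
    rollout_id: str   # assumed to arrive with every tool call
    query: str

@app.post("/semantic_search")
def semantic_search(req: SearchRequest):
    calls_per_rollout[req.rollout_id] += 1
    # ... run the actual semantic search over the report corpus here ...
    return {
        "results": [],
        "calls_used_by_this_rollout": calls_per_rollout[req.rollout_id],
    }
```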
What This Means for Custom Agent Development
Agent RFT lets models learn tool usage by actually using tools during training - exploring strategies and hill-climbing on your reward signal. The implication: agents can be trained to use your specific APIs efficiently, not just generically. Custom tool expertise becomes a trainable property.


