Reasoning Models
Also known as: thinking models. Closely related terms: inference-time compute, test-time compute
What are Reasoning Models?
Reasoning models are a class of large language models specifically designed to spend more computation at inference time to solve complex problems. Rather than generating answers in a single forward pass, these models produce extended chains of thought — exploring approaches, checking their work, backtracking when they hit dead ends, and iterating toward a solution. OpenAI’s o1 and o3, Anthropic’s Claude with extended thinking, Google’s Gemini with thinking, and DeepSeek-R1 are prominent examples. They represent a shift in where compute is invested: traditional scaling focuses on pre-training (bigger models, more data), while reasoning models scale inference-time compute.
How They Differ from Standard Models
Standard LLMs generate responses token by token with roughly constant computation per token. Reasoning models allocate variable computation depending on problem difficulty — spending seconds on simple questions and minutes on hard ones. They are trained using reinforcement learning on reasoning traces, rewarding chains of thought that lead to correct answers. The internal reasoning may be visible to users (as with Claude’s extended thinking) or hidden. On complex mathematics, science, and coding tasks, reasoning models dramatically outperform standard models of equivalent size, sometimes matching the performance of much larger conventional models.
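The payoff of spending variable compute at inference time can be seen even with a much simpler technique than trained chain-of-thought: self-consistency, i.e. sampling several answers and taking a majority vote. The sketch below is a toy simulation (not a real model call): `noisy_solver` stands in for a single model sample that is right with some fixed probability, and accuracy rises as more samples — more inference-time compute — are spent per question.

```python
import random
from collections import Counter

def noisy_solver(correct_answer: int, accuracy: float, rng: random.Random) -> int:
    # Toy stand-in for one model sample: correct with probability `accuracy`,
    # otherwise returns one of a few plausible wrong answers.
    if rng.random() < accuracy:
        return correct_answer
    return correct_answer + rng.randint(1, 5)

def majority_vote(correct_answer: int, accuracy: float,
                  n_samples: int, rng: random.Random) -> int:
    # Spend n_samples worth of inference compute, then vote.
    votes = Counter(noisy_solver(correct_answer, accuracy, rng)
                    for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def success_rate(n_samples: int, trials: int = 2000,
                 accuracy: float = 0.6) -> float:
    # Fraction of trials where the voted answer is correct.
    rng = random.Random(0)
    wins = sum(majority_vote(42, accuracy, n_samples, rng) == 42
               for _ in range(trials))
    return wins / trials
```

With a per-sample accuracy of 0.6, `success_rate(1)` stays near 0.6 while `success_rate(15)` climbs well above 0.9: the same underlying model, with more compute allocated per problem, answers far more reliably. Trained reasoning models go further than this voting trick — they learn when and how to spend the extra compute — but the scaling intuition is the same.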
Why Reasoning Models Matter
Reasoning models have demonstrated that scaling inference-time compute can be as powerful as scaling pre-training compute, opening a second dimension of improvement for AI capabilities. For practitioners, this means harder problems become tractable: multi-step coding challenges, graduate-level science questions, and complex planning tasks that stump standard models. The tradeoff is cost and latency — reasoning models use more tokens and take longer to respond. Choosing between a standard model and a reasoning model is a practical architectural decision: use standard models for straightforward tasks where speed and cost matter, and reasoning models for complex tasks where accuracy is worth the additional compute.
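The routing decision described above can be made explicit in code. The sketch below is a minimal illustration, assuming a hypothetical upstream classifier supplies a complexity score and using placeholder model names — it is not any provider's actual API.

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    complexity: float  # 0.0 (trivial) .. 1.0 (very hard); assumed to come
                       # from an upstream classifier or heuristic

def choose_model(task: Task, threshold: float = 0.7) -> str:
    # Hypothetical model names: route hard tasks to the slower, costlier
    # reasoning model; keep everything else on the fast standard model.
    return "reasoning-model" if task.complexity >= threshold else "standard-model"

# Usage: a lookup-style question stays cheap; a multi-step proof pays for depth.
choose_model(Task("What year was Python released?", complexity=0.1))
choose_model(Task("Prove this scheduling problem is NP-hard", complexity=0.9))
```

In practice the threshold is tuned against accuracy, latency, and cost targets, and the complexity estimate itself might come from a small, cheap classifier model.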
Related Reading
- Chain-of-Thought - The reasoning technique these models are built on
- Scaling Laws - Reasoning models extend scaling to inference time