Lukasz Kaiser: Reasoning Models Are Just Getting Started
Transformer co-inventor reveals why the reasoning paradigm will automate computer tasks faster than expected. Compute, not research, is the bottleneck.
How Lukasz Kaiser Sees the Future of Reasoning AI
Lukasz Kaiser holds a unique position in AI history: he co-authored the 2017 "Attention Is All You Need" paper that introduced the Transformer, and he is the only one of its eight authors who chose to remain an engineer rather than found a startup. Now at OpenAI, he led the research that produced the o1 reasoning models, which he calls "a new paradigm" fundamentally different from scaling transformers. This interview offers a rare insider view of where AI is actually headed.
On the reasoning paradigm: "There was this transformer paradigm when we were scaling up transformers... But there is the new paradigm which is reasoning and that one is only starting. I feel like this paradigm is so young that it's only on this very steep path up." Kaiser distinguishes between diminishing returns on pure transformer scaling and the untapped potential of reasoning models, which he says "learn from an order of magnitude less data."
On no AI winter coming: "I don't think there is any winter in this sense coming. If anything, it may actually have a very sharp improvement in the next year or two—which is something to almost be a little scared of." While some speculate about hitting scaling walls, Kaiser sees the reasoning paradigm as offering a new steep climb with plenty of headroom.
On the ultimate bottleneck: "That's the ultimate bottleneck. Like it's GPUs and energy. I think Sam is basically getting as much more as is possible. And some people worry will we be able to use them. I do not worry." The constraint isn't research capability or ideas—it's raw compute. Every GPU they can get will be productively used.
On tasks vs. jobs: "I believe reasoning models even currently are probably capable of doing most of them... these tasks are coming fast." Kaiser clarifies the distinction: AI won't replace entire jobs immediately, but computer-based tasks—clicking, writing, programming—are being automated now. "Within a matter of months" coding AI went from adequate to genuinely helpful.
On the new paradigm's youth: "We've scaled it up a little bit but there could be way more scaling it up. There's way more research methods to make it better." Unlike transformer scaling, which has plateaued due to data constraints, the reasoning paradigm has barely begun. The combination of bigger base models plus reasoning could yield compounding improvements.
6 Insights From Lukasz Kaiser on Reasoning Models and AI Progress
- Two paradigms, different trajectories - Pure transformer scaling is constrained by data; reasoning models, which learn from an order of magnitude less data, are on a steep upward path with room to grow
- Computer tasks first, physical world later - Expect rapid automation of screen-based work; robotics and physical tasks will take longer
- Coding is the canary - AI coding capabilities went from "okay" to "real help" in just three months; "half the time people just ask Codex to code for them first"
- No AGI—but does it matter? - Kaiser dislikes the term AGI; more important is that AI can now "work for hours and do something useful"
- Distillation vs. scaling trade-off - OpenAI balances training the biggest models possible with making them cheap enough to serve 800M+ users
- 1-2 year horizon for sharp improvement - Reasoning paradigm plus new compute infrastructure could produce dramatic capability jumps soon
What This Means for Organizations Planning AI Adoption
Kaiser's framing resolves the apparent contradiction between "AI progress is slowing" and "AI progress is accelerating"—they're talking about different paradigms. Pure transformer scaling has matured; reasoning models are just beginning. For organizations planning AI adoption, this suggests the capabilities available 12-24 months from now may be dramatically better than today, particularly for tasks that benefit from extended "thinking time." The era of AI that can work for hours, not seconds, is arriving faster than most expect.