Lukasz Kaiser

About Lukasz Kaiser

Lukasz Kaiser is a deep learning researcher at OpenAI and one of the eight co-authors of the landmark 2017 paper "Attention Is All You Need," which introduced the Transformer architecture. Kaiser stands apart among the "Transformer Eight": while his seven co-authors left to found AI startups (including Cohere, Adept, and Character.AI), he remained an engineer and joined OpenAI in 2021.

At OpenAI, Kaiser has been at the center of the company's most important breakthroughs. He served as the long-context lead for GPT-4 and led the research team that developed the o1 reasoning models, which he calls "a new paradigm" fundamentally different from pure transformer scaling. His X/Twitter announcement at the o1 launch captured this significance: "I'm so happy to see o1 launch! Leading this research with my colleagues for almost 3 years and working on related ideas even longer convinced me: it's a new paradigm."

Before his AI career, Kaiser was a tenured researcher at University Paris Diderot specializing in logic and automata theory. He received his PhD from RWTH Aachen University and his MSc from the University of Wroclaw, Poland. This formal methods background may explain his focus on reasoning and verification in AI systems.

Career Highlights

OpenAI (2021-present): Research Scientist, led o1/o3 reasoning model development, GPT-4 long-context lead
Google Brain (2014-2021): Staff Research Scientist, co-authored Transformer paper
University Paris Diderot: Tenured researcher in logic and automata theory
Co-authored: "Attention Is All You Need" (2017), the TensorFlow system, and the Tensor2Tensor and Trax libraries

Notable Positions

On the Reasoning Paradigm

Kaiser draws a sharp distinction between two AI paradigms. The original transformer scaling paradigm ("just predict the next word and train a bigger and bigger model on more and more data") has, in his view, plateaued because of data constraints. The reasoning paradigm, he argues, is fundamentally different:

"Reasoning models learn from another order of magnitude less data. This paradigm is so young that it's only on this very steep path up... We've scaled it up a little bit but there could be way more."

On Staying an Engineer

Unlike his Transformer co-authors who became founders, Kaiser chose to remain hands-on. When the paper's authors were introduced together before an audience, the host singled him out:

"Welcome the... authors of the paper that says attention is all you need. Ladies and gentlemen, the only person who is still an engineer—Lukasz."

This choice has put him at the center of OpenAI's most consequential work, from GPT-4 to reasoning models.

On the AGI Timeline

Kaiser dislikes the term "AGI" but emphasizes the practical reality: AI can now work for hours on useful tasks, not just answer in seconds. For computer-based tasks—clicking, writing, programming—automation is "coming fast," while physical-world robotics remains in its infancy.

Key Quotes

"There is the new paradigm which is reasoning and that one is only starting. This paradigm is so young that it's only on this very steep path up." (on reasoning models)
"I don't think there is any winter in this sense coming. If anything, it may actually have a very sharp improvement in the next year or two—which is something to almost be a little scared of." (on AI progress)
"That's the ultimate bottleneck—GPUs and energy." (on constraints)
"It's a new paradigm. Models that train hidden CoTs are more powerful than raw Transformers, learn from less data, generalize better." (on O1 launch)

AI Agents - The autonomous systems Kaiser's reasoning models enable
Supervision Threshold - When AI crosses from assistance to autonomy