How DeepMind Organizes Frontier Model Development

Sebastian Borgeaud leads pre-training on Gemini 3 at Google DeepMind, and this is his first podcast appearance. With a background spanning Gopher, Chinchilla, and RETRO, he offers rare visibility into how frontier AI research is actually organized. The interview with Matt Turk covers everything from architecture decisions to what "research taste" means in practice.

On building systems, not models: "We're not really building a model anymore. I think we're really building a system at this point. People have sometimes this view that we're just training a neural network architecture and that's it. But it's really the entire system around the network." This reframes what "training" means: it is infrastructure, coordination, evaluation, and integration, not just the neural network.

On research taste: "Being allergic to complexity... we have a certain budget of complexity we can use and a certain amount of research risk we can accumulate before things go bad. Often times we don't necessarily want to use the best performance version of a research idea, but we'd rather trade off some performance for a slightly lower complexity version." The counterintuitive insight: the simpler version often beats the optimal one because it leaves room for more future progress.

On the real test of a model: "The amount of time people spend using the model to make themselves more productive internally is increasing over time. Every new generation of models, it's pretty clear the model can do new things and help us in our research." Internal usage, with researchers using their own models to do research, is the evaluation that matters beyond benchmarks.

On team scale: "It's a fairly large team at this point. Maybe 150-200 people work on a day-to-day on the pre-training side between data, model, infrastructure, evals." This is the scale at which frontier pre-training now operates, and coordinating this many people is "actually quite complicated."

On AI for AI research: "Especially in the next year with more agentic workflows being enabled... that should be able to really accelerate our work. A lot of the day-to-day work is running experiments, babysitting experiments, analyzing data, collecting results. The interesting part is forming hypotheses and designing new experiments." The meta-loop: using AI to accelerate AI research by automating the mechanical parts.

5 Insights From Borgeaud on AI Research at Scale

Systems over models - Gemini 3 isn't just a neural network; it's the infrastructure, data pipelines, evaluation, and integration work built around the architecture, which together make up the system being trained
150-200 people coordinate on pre-training - The scale of frontier model development requires massive coordination, and getting progress from everyone matters more than a few racing ahead
Research taste = avoiding complexity - The best researchers don't pursue optimal solutions; they pursue solutions simple enough to enable future progress
Internal usage is the real eval - Beyond benchmarks, the true test is whether researchers themselves are more productive with each new model generation
Agentic workflows accelerate research - DeepMind expects AI to automate experiment running and analysis, freeing researchers for hypothesis formation

What This Means for AI Organizations

Borgeaud's perspective reframes what building AI actually means at the frontier: it's a systems integration problem requiring 150+ people, where research taste means actively avoiding complexity, and where the models themselves increasingly accelerate their own development. For organizations thinking about AI capabilities, the implication is clear: even the most sophisticated AI labs treat this as infrastructure and coordination work, not magic model training.

Inside Gemini 3: How 200 Researchers Build Frontier AI