Geoffrey Hinton & Jeff Dean: The Collaboration That Built Modern AI

interview research enterprise future-of-work

How Hinton and Dean Shaped the Deep Learning Era

This is one of those rare conversations where you’re hearing history directly from the people who made it. Geoffrey Hinton (Nobel laureate, “godfather of AI”) and Jeff Dean (Google chief scientist, Gemini co-lead) have been collaborating since 2012, and their partnership essentially created modern AI.

The anecdotes alone are worth the watch. AlexNet - the model that started the deep learning revolution - was trained on two GPUs in Alex Krizhevsky’s bedroom at his parents’ house. “The good news was we paid for the GPU boards but his parents paid for the electricity,” Hinton jokes. When they decided to sell, they incorporated as “DNN Research” specifically to get acquisition money rather than salary money (“one’s 10 times bigger than the other”). The auction happened during NeurIPS at a Lake Tahoe casino - “upstairs we were doing this auction and you had to raise by a million” while slot machines rang downstairs.

The scaling insight is fascinating in retrospect. Dean admits he built data parallelism into his 1990 undergrad thesis but “didn’t really even realize it myself” - he made “a huge mistake” by not increasing model size as he added processors. Hinton confesses he “didn’t really fully get the lesson until 2014” that bigger models just work better. They had a simple mantra at Google Brain: “bigger model, more data, more compute.”
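To make the idea concrete, here is a minimal sketch of data parallelism (my illustration, not anything from the interview or Dean’s thesis; all names are made up): the weights are replicated, the batch is split across workers, and the per-worker gradients are averaged, the way an all-reduce would combine them on a real cluster.

```python
import numpy as np

# Toy data parallelism for linear regression. Each "worker" holds a
# replica of the weights, computes the gradient on its own shard of the
# batch, and the shard gradients are averaged. Note the model stays the
# same size no matter how many workers we add -- the "huge mistake"
# Dean describes is stopping there instead of also growing the model.

def shard_gradient(w, X, y):
    # Mean-squared-error gradient on one worker's shard.
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)

def data_parallel_step(w, X, y, lr=0.1, n_workers=4):
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    grads = [shard_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    return w - lr * np.mean(grads, axis=0)  # averaged ("all-reduced") update

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = np.zeros(3)
for _ in range(200):
    w = data_parallel_step(w, X, y)
print(np.round(w, 2))  # converges toward [1.0, -2.0, 0.5]
```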

The Research In Motion (BlackBerry) story is a cautionary tale for every enterprise. Hinton offered the company better speech recognition technology for free, via an intern. They declined, saying they “weren’t interested in speech recognition.” Dean’s wry response: “Well, you didn’t need it. You had a keyboard.” This from the Canadian company whose owners later complained that Canadian research is “never exploited in Canada.”

On transformers, Hinton admits he “didn’t pay nearly enough attention” initially because he’s interested in brain-plausible mechanisms. LSTMs have to process a sequence one step at a time, since each hidden state depends on the one before it; the insight was to sidestep that bottleneck by just “saving all the states and attending to them,” which lets the whole sequence be handled in parallel. Combined with mixture-of-experts, these algorithmic improvements have “multiplied together” with hardware gains - we’re now doing billions of times more compute than 10 years ago.
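A minimal NumPy sketch of that insight (mine, not the speakers’; the function name is made up): instead of threading a single recurrent state through the sequence, keep every past state and take a softmax-weighted blend of them against a query.

```python
import numpy as np

# "Save all the states and attend to them": an LSTM must walk the
# sequence step by step because each state depends on the previous one;
# attention instead keeps every state and mixes them in one parallel op.

def attend(states, query):
    scores = states @ query / np.sqrt(len(query))  # similarity to each saved state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over all positions
    return weights @ states                        # weighted blend: the context

rng = np.random.default_rng(0)
states = rng.normal(size=(6, 4))  # six saved hidden states of width 4
query = rng.normal(size=4)
context = attend(states, query)   # reads ALL six states at once, no recurrence
```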

4 Insights From Hinton and Dean on AI History

  • AlexNet’s entire training setup was two GPUs in a grad student’s bedroom - breakthroughs don’t initially require billion-dollar infrastructure
  • “Bigger model, more data, more compute” was the informal scaling law at Google Brain years before formal scaling laws were published
  • Corporate blindness killed BlackBerry: they rejected free speech recognition tech because they had keyboards
  • Algorithmic improvements (transformers, sparse models) multiply with hardware improvements - the compute increase is “billions of times” over a decade