John Schulman

Tags: openai, research, reinforcement-learning, pioneer

About John Schulman

John Schulman is a co-founder of OpenAI and one of the most influential researchers in reinforcement learning. He is the lead author of PPO (Proximal Policy Optimization), the algorithm behind much of OpenAI's early success, including the RLHF training of InstructGPT and ChatGPT. He is now building Thinking Machines Lab.

Career Highlights

  • Thinking Machines Lab (2025-present): Co-founder and chief scientist
  • OpenAI (2015-2024): Co-founder, led RL research team
  • PPO (2017): Lead author of Proximal Policy Optimization, which became a standard RL algorithm
  • RLHF: Key contributor to reinforcement learning from human feedback
  • Berkeley PhD: Studied under Pieter Abbeel
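The profile names PPO without describing it. As a minimal sketch (not Schulman's or OpenAI's implementation), the clipped surrogate objective from the 2017 PPO paper fits in a few lines of NumPy. The function name, the log-probability inputs, and the default `eps=0.2` are assumptions chosen for illustration; a practical trainer would typically compute this inside an autodiff framework and combine it with value-function and entropy terms.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples.

    L = mean_t[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ],
    where r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t).
    Hypothetical helper for illustration; eps=0.2 is an assumed default.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(logp_new - logp_old)  # probability ratio r_t from log-probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # The elementwise minimum is a pessimistic bound: pushing the ratio past
    # the clip range in the direction the advantage favors brings no extra gain.
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    # Hypothetical batch: four actions with made-up probability ratios and advantages.
    logp_old = np.zeros(4)
    logp_new = np.log([1.5, 0.5, 0.5, 1.5])
    advantages = np.array([1.0, -1.0, 1.0, -1.0])
    print(ppo_clip_objective(logp_new, logp_old, advantages))
```

The clipping is what keeps each update close to the previous policy: once the ratio leaves `[1 - eps, 1 + eps]` in the favorable direction, the objective stops rewarding further movement.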

Notable Positions

On the ChatGPT Speed Run

How fast it could have been done with hindsight:

"With full hindsight, you could probably do something back in 2018 or 2019 with a few people that would get to GPT-3.5 level. NanoGPT was programmed by one person on one box in half a year. Maybe in the future we'll get the demoscene ChatGPT - one file that trains the whole thing and scrapes the web in a day."

On Early OpenAI Culture

The ragtag beginnings:

"Early OpenAI was more ragtag, almost like an academic group. People worked in groups of one, two, three on research projects that would turn into papers. We were influenced by DeepMind, who pioneered this way of working with AlphaGo."

On Failed Projects

Universe was right but too early:

"Universe was a deeply correct idea but way too early - maybe a decade too early. We tried to create lots of RL environments and joint train on all of them for a general RL agent. The system was unwieldy and models didn't generalize. Not all projects are successful - maybe even the norm is for a project not to be part of the main branch of the tech tree."

On Research Management

Two valid approaches:

"I've seen people take very different approaches and be successful. One model: hands-on manager writing code, reading all reports' code, giving detailed technical feedback. Another: hands-off manager being a sounding board, giving career advice, letting people do their own thing. Both work in different places."

On Multi-Agent Training

Why games matter:

"I'm pretty fond of ideas around multi-agent training and games. Games give you automatic curriculum - if you're playing against copies of yourself, opponents get better as you get better. There are theoretical CS reasons why setting up games might solve really hard problems."
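The automatic-curriculum point can be illustrated with a toy example. This sketch is not from the source and does not describe any OpenAI system: it swaps in classic fictitious self-play on rock-paper-scissors, where the agent repeatedly best-responds to the average of its own past actions. Because the "opponent" is the agent's own history, the opponent becomes harder to exploit as training proceeds. The payoff matrix, step count, and function names are illustrative assumptions.

```python
import numpy as np

# Toy zero-sum game (rock-paper-scissors), chosen for illustration only.
# PAYOFF[a, b] is the row player's reward for action a against action b,
# with actions 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
])


def exploitability(strategy):
    """Payoff a best response earns against `strategy` (0 at the Nash equilibrium)."""
    return float(np.max(PAYOFF @ strategy))


def fictitious_self_play(steps, first_action=0):
    """Hypothetical helper: each step, best-respond to the average of all past own actions.

    Returns the final average strategy and the exploitability of the average
    strategy after each step.
    """
    counts = np.zeros(3, dtype=int)
    counts[first_action] = 1
    history = []
    for _ in range(steps):
        # The opponent is the agent's own past play. Integer counts avoid
        # floating-point noise in ties (np.argmax picks the lowest index).
        best_response = int(np.argmax(PAYOFF @ counts))
        counts[best_response] += 1
        history.append(exploitability(counts / counts.sum()))
    return counts / counts.sum(), history


if __name__ == "__main__":
    average, history = fictitious_self_play(10_000)
    print("average strategy:", np.round(average, 3))
    print("exploitability after 10 steps:", round(history[9], 3))
    print("exploitability after 10,000 steps:", round(history[-1], 4))
```

In two-player zero-sum games, fictitious play is known to converge to an equilibrium (here the uniform mix), though slowly; practical self-play systems typically replace the exact best response with an RL update against current or past copies of the agent.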

On Using AI for Research

How he works now:

"If I have an idea now, I fire off a bunch of questions to GPT-5 Pro and have it do literature searches. I'll write a paragraph or two and tell the model to flesh it out. Keeping a lab notebook is probably even more useful now - paste your notebook into the LLM for feedback."

Key Quotes

  • "GPT-3.5 level in 2018-2019 with a few people and full hindsight."
  • "Universe was maybe a decade too early."
  • "Maybe even the norm is for a project not to be part of the main branch of the tech tree."