
John Schulman
About John Schulman
John Schulman is a co-founder of OpenAI and one of the most influential researchers in reinforcement learning. He is the lead author of PPO (Proximal Policy Optimization), the algorithm behind much of OpenAI's early reinforcement learning work, including RLHF. He is now building Thinking Machines.
Career Highlights
- Thinking Machines (2025-present): Co-founder
- OpenAI (2015-2024): Co-founder, led RL research team
- PPO (2017): Lead author of Proximal Policy Optimization, which became a standard RL algorithm
- RLHF: Key contributor to reinforcement learning from human feedback
- Berkeley PhD: Studied under Pieter Abbeel
Notable Positions
On the ChatGPT Speed Run
How fast it could have been done with hindsight:
"With full hindsight, you could probably do something back in 2018 or 2019 with a few people that would get to GPT-3.5 level. NanoGPT was programmed by one person on one box in half a year. Maybe in the future we'll get the demo scene ChatGPT - one file that trains the whole thing and scrapes the web in a day."
On Early OpenAI Culture
The rag-tag beginnings:
"Early OpenAI was more ragtag, almost like an academic group. People worked in groups of one, two, three on research projects that would turn into papers. We were influenced by DeepMind, who pioneered this way of working with AlphaGo."
On Failed Projects
Universe was right but too early:
"Universe was a deeply correct idea but way too early - maybe a decade too early. We tried to create lots of RL environments and joint train on all of them for a general RL agent. The system was unwieldy and models didn't generalize. Not all projects are successful - maybe even the norm is for a project not to be part of the main branch of the tech tree."
On Research Management
Two valid approaches:
"I've seen people take very different approaches and be successful. One model: hands-on manager writing code, reading all reports' code, giving detailed technical feedback. Another: hands-off manager being a sounding board, giving career advice, letting people do their own thing. Both work in different places."
On Multi-Agent Training
Why games matter:
"I'm pretty fond of ideas around multi-agent training and games. Games give you automatic curriculum - if you're playing against copies of yourself, opponents get better as you get better. There are theoretical CS reasons why setting up games might solve really hard problems."
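The automatic-curriculum point in the quote above can be illustrated with a toy simulation. This is only a sketch under loudly stated assumptions, not anything Schulman describes: the "policy" is reduced to a single hypothetical skill number, win probability follows an Elo-style logistic curve, and the learner is assumed to gain the most from evenly matched games (the 4p(1-p) factor). The names play and train are invented for illustration.

```python
import random


def play(skill_a: float, skill_b: float, rng: random.Random) -> bool:
    """Return True if player A wins, using an Elo-style logistic win probability."""
    p_a = 1.0 / (1.0 + 10 ** ((skill_b - skill_a) / 400.0))
    return rng.random() < p_a


def train(rounds: int, self_play: bool, games_per_round: int = 200, seed: int = 0):
    """Toy training loop; returns (learner_skill, opponent_skill).

    Assumption: the learning signal is proportional to 4 * p * (1 - p), where p
    is the observed win rate, so evenly matched games (p near 0.5) teach the most.
    """
    rng = random.Random(seed)
    learner = 1000.0   # hypothetical scalar standing in for a policy
    opponent = 1000.0  # the opponent starts as a copy of the learner
    for _ in range(rounds):
        wins = sum(play(learner, opponent, rng) for _ in range(games_per_round))
        p = wins / games_per_round
        learner += 100.0 * 4.0 * p * (1.0 - p)
        if self_play:
            # Self-play: the opponent is refreshed to a copy of the current learner,
            # so the difficulty rises as the learner improves.
            opponent = learner
    return learner, opponent


if __name__ == "__main__":
    sp_learner, sp_opponent = train(rounds=30, self_play=True)
    fx_learner, fx_opponent = train(rounds=30, self_play=False)
    print(f"self-play:      learner={sp_learner:.0f}, opponent={sp_opponent:.0f}")
    print(f"fixed opponent: learner={fx_learner:.0f}, opponent={fx_opponent:.0f}")
```

Under these assumptions the fixed opponent quickly becomes too easy and the learning signal shrinks, while the self-play opponent keeps pace with the learner and the games stay informative. Real self-play systems replace the scalar skill with a trained policy and often sample opponents from a pool of past snapshots.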
On Using AI for Research
How he works now:
"If I have an idea now, I fire off a bunch of questions to GPT-5 Pro and have it do literature searches. I'll write a paragraph or two and tell the model to flesh it out. Keeping a lab notebook is probably even more useful now - paste your notebook into the LLM for feedback."
Key Quotes
- "GPT-3.5 level in 2018-2019 with a few people and full hindsight."
- "Universe was a decade too early."
- "Most projects don't end up on the main branch of the tech tree."
Related Reading
- Scaling Laws - What Schulman helped discover
- End of Scaling Era - The transition Schulman is navigating
- Ilya Sutskever - Fellow OpenAI co-founder