Core Message
In this technical talk, John Carmack, founder of Keen Technologies, argues that while Large Language Models (LLMs) are impressive, they are not the path to Artificial General Intelligence (AGI). The true frontier lies in solving fundamental, still-unsolved problems in reinforcement learning (RL) that even simple animals master, such as continual learning, transfer learning, and acting under real-world constraints. He makes a compelling case for using the “solved” Atari game environment to tackle these deeper challenges, culminating in a demonstration of an RL agent learning to play a physical Atari console through a camera and a robotic joystick actuator.
Key Arguments & Findings
1. The Limits of Current AI
- LLMs Aren’t the Whole Answer: Carmack views LLMs as a “giant blender” of existing human knowledge. As magical as they seem, they fundamentally cannot learn new things from scratch or handle novel situations the way a child or an animal can.
- Engineering vs. Science: He contrasts his past work in engineering (where there’s a clear line of sight to a solution) with his current work in AI, which he sees as true science—an exploration for knowledge nobody currently possesses.
2. Why Atari is Still a Crucial Benchmark
- Unbiased and Diverse: Unlike custom-built research environments, the 100+ Atari games were designed for human players, offering a rich set of challenges with no bias toward any particular algorithm.
- Far from Solved: Agents can reach superhuman scores with massive amounts of training (e.g., 200 million frames), but they fail at learning quickly. On benchmarks like Atari 100K (roughly two hours of gameplay), their performance is unimpressive, highlighting a major gap compared to human learning; a minimal budgeted-run harness is sketched after this list.
- Avoids ‘Cheating’: Modern games tempt researchers to bypass hard perception problems by accessing internal game data (like inventory or position). Atari forces agents to learn from pixels alone, just as a human would.
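The sketch below shows what a pixels-only, fixed-budget run looks like through the Gymnasium/ALE interface. The 100,000-step budget matches the Atari 100K protocol; `RandomAgent` is a placeholder for whatever learner is actually plugged in, an illustration rather than code from the talk.

```python
# Minimal Atari 100K-style harness: raw pixels in, a fixed interaction
# budget, nothing else. Assumes gymnasium and ale-py (with ROMs) are
# installed; on Gymnasium >= 1.0 you may also need gym.register_envs(ale_py).
import gymnasium as gym

BUDGET = 100_000  # agent steps; with the v5 frame-skip of 4, ~1.9 h at 60 fps

class RandomAgent:
    """Placeholder learner: acts uniformly at random, learns nothing."""
    def __init__(self, action_space):
        self.action_space = action_space
    def act(self, obs):
        return self.action_space.sample()
    def observe(self, obs, action, reward, next_obs, done):
        pass  # a real agent would update its policy here

env = gym.make("ALE/Breakout-v5")
agent = RandomAgent(env.action_space)
obs, info = env.reset(seed=0)
episode_return, returns = 0.0, []
for _ in range(BUDGET):
    action = agent.act(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    agent.observe(obs, action, reward, next_obs, terminated or truncated)
    episode_return += reward
    obs = next_obs
    if terminated or truncated:
        returns.append(episode_return)
        episode_return = 0.0
        obs, info = env.reset()
print(f"episodes: {len(returns)}, mean return: {sum(returns) / max(len(returns), 1):.1f}")
```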
3. The ‘Robo-Atari’ Project: Learning in the Real World
To expose the flaws in simulation-based learning, Carmack’s team built a system where an agent plays a physical Atari console.
- The Setup: A camera watches the screen, and a robotic actuator (the “Rootroller”) physically moves the joystick.
- Latency is a Killer: The real world isn’t turn-based; the environment doesn’t wait for the agent. The ~180 ms of end-to-end latency, while human-like, breaks many state-of-the-art RL algorithms because they assume an action’s consequence arrives in the very next observation (see the delay-wrapper sketch after this list).
- Perception is Hard: The single most difficult technical problem was not the learning algorithm but reliably reading the score off the screen through a camera to produce a reward signal; a digit-matching sketch also follows this list.
- Physicality Matters: The agent had to learn to compensate for the physical quirks of the robot, such as the delay between moving the joystick and pressing the fire button.
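To make the latency point concrete, here is a minimal sketch of an action-delay wrapper, assuming the Gymnasium API; the class name and the 11-step delay (~180 ms at 60 fps) are illustrative choices, not details from the talk.

```python
# Sketch: delay every action by `delay` environment steps to mimic the
# end-to-end latency of a camera + robot pipeline.
from collections import deque
import gymnasium as gym

class DelayedActionWrapper(gym.Wrapper):
    def __init__(self, env, delay: int = 11, noop_action: int = 0):
        super().__init__(env)
        self.delay = delay
        self.noop_action = noop_action
        self.queue = deque()

    def reset(self, **kwargs):
        # Pre-fill with no-ops: the agent's first chosen action only reaches
        # the game `delay` frames after the first frame it saw.
        self.queue = deque([self.noop_action] * self.delay)
        return self.env.reset(**kwargs)

    def step(self, action):
        self.queue.append(action)        # the new intent enters the pipe...
        executed = self.queue.popleft()  # ...while an older action lands now
        return self.env.step(executed)
```
With delay > 0, the reward returned by step() no longer corresponds to the action just submitted, quietly breaking the one-step credit assignment most value-based methods rely on.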
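The score-reading problem can be sketched as well: crop the score area out of a camera frame and template-match each digit against reference glyphs. The region, digit width, and confidence threshold below are assumptions for illustration; the talk does not publish the actual pipeline.

```python
# Sketch: read a numeric score from a grayscale camera frame by matching
# fixed-width digit cells against per-digit templates (OpenCV).
import cv2
import numpy as np

SCORE_REGION = (10, 20, 5, 69)  # (y0, y1, x0, x1) of the score area (hypothetical)
DIGIT_WIDTH = 8                 # pixel width of one digit cell (hypothetical)

def read_score(frame_gray: np.ndarray, templates: dict[int, np.ndarray]):
    """Return the score as an int, or None if any digit is ambiguous."""
    y0, y1, x0, x1 = SCORE_REGION
    strip = frame_gray[y0:y1, x0:x1]
    digits = []
    for i in range(strip.shape[1] // DIGIT_WIDTH):
        cell = strip[:, i * DIGIT_WIDTH:(i + 1) * DIGIT_WIDTH]
        # Best normalized correlation against each digit template.
        scores = {d: cv2.matchTemplate(cell, t, cv2.TM_CCOEFF_NORMED).max()
                  for d, t in templates.items()}
        best, conf = max(scores.items(), key=lambda kv: kv[1])
        if conf < 0.6:   # glare, blur, CRT flicker: reject the whole frame
            return None  # a missing reward beats a wrong reward
        digits.append(best)
    return int("".join(map(str, digits)))
```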
4. Major Unsolved RL Problems
Carmack outlines the core research his company is focused on, using suites of Atari games as the testbed:
- Sequential Learning & Catastrophic Forgetting: Agents trained on one game forget it after learning a new one. This remains a primary unsolved problem (a sequential-evaluation sketch follows this list).
- Transfer Learning: The ability to learn a new game faster after mastering others has “almost completely failed” in RL research.
- Sparse Rewards & Intrinsic Motivation: Agents need to learn without constant rewards, driven by curiosity, much like humans, who don’t stare at the score while playing a game; one common curiosity recipe is sketched below.
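A minimal sketch of the sequential multitask protocol the first two problems imply: train a single agent through a series of games, re-evaluating every earlier game after each stage. `make_agent`, `train`, and `evaluate` are placeholders for whatever learner and harness you use.

```python
# Sketch: sequential training with retention checks. Near-zero retained
# scores are the classic catastrophic-forgetting signature; faster learning
# on later games would be evidence of transfer.
GAMES = ["ALE/Breakout-v5", "ALE/Pong-v5", "ALE/MsPacman-v5"]

def sequential_benchmark(make_agent, train, evaluate, budget=100_000):
    agent = make_agent()  # one agent, one set of weights, across all games
    fresh, retained = {}, {}
    for i, game in enumerate(GAMES):
        train(agent, game, steps=budget)
        fresh[game] = evaluate(agent, game)
        # How well does the agent still play everything it has seen so far?
        retained[game] = {g: evaluate(agent, g) for g in GAMES[: i + 1]}
    return fresh, retained
```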
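For intrinsic motivation, one well-known recipe is Random Network Distillation (Burda et al., 2018): the prediction error of a trained network against a frozen random one serves as a novelty bonus added to the sparse game reward. This is an illustrative choice, not necessarily what Keen Technologies uses.

```python
# Sketch: RND-style curiosity bonus. States the predictor has not seen
# produce large errors (novel, explore); familiar states produce small ones.
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():  # the target stays random forever
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def bonus(self, obs: torch.Tensor) -> torch.Tensor:
        """Intrinsic reward per observation; also takes one update step."""
        with torch.no_grad():
            target_feat = self.target(obs)
        error = (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)
        self.opt.zero_grad()
        error.mean().backward()  # the predictor slowly catches up...
        self.opt.step()
        return error.detach()    # ...so the bonus fades as states grow familiar
```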
Conclusion & Call to Action
The path to more general intelligence requires moving beyond optimizing scores in perfect simulations. Carmack advocates for a new community benchmark focused on sequential multitask learning across a suite of Atari games. This would force researchers to confront catastrophic forgetting and transfer learning directly. He stresses that an agent’s ability to learn and adapt under real-world constraints like latency and imperfect perception is a more meaningful measure of progress than achieving a high score on a single, isolated task.
Mentoring Question for You
Carmack highlights the vast difference between an agent learning in a clean, turn-based simulation versus the messy, high-latency real world. In your own work or projects, where are you operating in an idealized “simulation”? What real-world complexities—like delays, noisy data, or unpredictable physical factors—might your current approach be ignoring, and how could addressing them lead to a more robust solution?
Source: https://youtube.com/watch?v=4epAfU1FCuQ&si=Gvk74hXPR5rTguyC