List of Flash News about Bitter Lesson
Time | Details |
---|---|
2025-10-01 17:09 | **Andrej Karpathy on Sutton’s Bitter Lesson: LLM Scaling Limits, RL-First Agents, and the AI Trading Narrative to Watch** According to @karpathy, Richard Sutton questions whether LLMs are truly bitter-lesson-pilled, since they depend on finite, human-generated datasets that embed human bias, challenging the idea that performance can scale indefinitely with more compute and data, source: @karpathy. Sutton advocates a classic RL-first architecture that learns through interaction with the world, without giant supervised pretraining or human teleoperation, emphasizing intrinsic motivation such as fun, curiosity, and prediction-quality rewards (illustrated in the sketches below the table), source: @karpathy. He argues that agents should continue learning at test time by default rather than being trained once and deployed statically, source: @karpathy. Karpathy notes that while AlphaZero showed pure RL can surpass a human-initialized system (AlphaGo), Go is a closed, simplified domain, whereas frontier LLMs rely on human text to initialize billions of parameters before pervasive RL fine-tuning; he frames pretraining as "crappy evolution" that solves the cold-start problem, source: @karpathy. He adds that today’s LLMs are heavily engineered by humans across pretraining, data curation, and RL environments, and that the field may not be sufficiently bitter-lesson-pilled, source: @karpathy. As actionable directions beyond benchmaxxing, he cites intrinsic motivation, curiosity, empowerment, multi-agent self-play, and culture, positioning the AI-agent path as an active research narrative, source: @karpathy. |
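The "prediction-quality rewards" idea mentioned above can be made concrete with a toy curiosity bonus: the agent is rewarded in proportion to the prediction error of its own forward model, so novel transitions pay off until the model learns them away. The following is a minimal sketch under assumptions of our own (a hypothetical linear forward model and toy linear dynamics); none of this code comes from the podcast or from Sutton's work.

```python
# Minimal sketch of a prediction-error ("curiosity") intrinsic reward.
# All names and the environment are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear forward model: predicts the next observation from
# the current observation and action, trained online by gradient descent.
obs_dim, act_dim, lr = 4, 2, 0.05
W = rng.normal(scale=0.1, size=(obs_dim, obs_dim + act_dim))

def intrinsic_reward(obs, act, next_obs):
    """Reward the agent by its own prediction error, then improve the
    model -- so novelty pays only until it has been learned away."""
    global W
    x = np.concatenate([obs, act])
    err = next_obs - W @ x          # forward-model prediction error
    bonus = float(err @ err)        # curiosity bonus: squared error
    W += lr * np.outer(err, x)      # online update at every step
    return bonus

# Toy environment: next observation is a fixed linear map plus noise.
A = rng.normal(scale=0.3, size=(obs_dim, obs_dim + act_dim))
obs = rng.normal(size=obs_dim)
for t in range(201):
    act = rng.normal(size=act_dim)
    next_obs = A @ np.concatenate([obs, act]) + 0.01 * rng.normal(size=obs_dim)
    r_int = intrinsic_reward(obs, act, next_obs)
    if t % 50 == 0:
        print(f"step {t:3d}  curiosity bonus {r_int:.4f}")  # decays over time
    obs = next_obs
```

Because the model is updated on every step, the same transition stops being rewarding once it is predictable, which is the intended behavior of a prediction-quality reward.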
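The test-time-learning point can be illustrated the same way: in a drifting environment, a predictor that keeps updating after deployment tracks the world, while a train-once, frozen copy degrades. Below is a minimal sketch with a hypothetical one-dimensional least-mean-squares learner; again, this is an illustration of the general idea, not anyone's actual system.

```python
# Minimal sketch contrasting "train once, deploy statically" with
# "keep learning at test time" on a non-stationary signal.
import numpy as np

rng = np.random.default_rng(1)
w_online, lr = 0.0, 0.1
true_w = 1.0

# "Train once": fit the weight on an initial batch where true_w = 1.0.
for _ in range(100):
    x = rng.normal()
    y = true_w * x
    w_online += lr * (y - w_online * x) * x
w_frozen = w_online  # deploy a static copy

# "Deploy": the world drifts; only the online learner follows it.
err_frozen = err_online = 0.0
for t in range(500):
    true_w += 0.01                               # non-stationary environment
    x = rng.normal()
    y = true_w * x
    err_frozen += (y - w_frozen * x) ** 2        # static model falls behind
    err_online += (y - w_online * x) ** 2
    w_online += lr * (y - w_online * x) * x      # test-time update

print(f"frozen MSE  {err_frozen / 500:.3f}")
print(f"online MSE  {err_online / 500:.3f}")     # far lower under drift
```

Under drift, the frozen model's error grows without bound while the online learner lags the true weight only slightly, which is the gap the "learn at test time by default" position targets.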