List of AI News about GPT-4.5
| Time | Details |
|---|---|
| 2026-03-10 23:56 | **Weak AGI Criteria Debate: GPT-4.5, GPT-3, and GPT-4 Benchmarks Analyzed — Latest 2026 Analysis**<br>According to Ethan Mollick on X, citing a post by Stefan Schubert, claims of meeting "weak AGI" criteria hinge on several benchmarks: a Loebner Prize–style weak Turing Test allegedly met by GPT-4.5, Winograd Schema Challenge performance attributed to GPT-3, and roughly 75% SAT accuracy by GPT-4, with competency at a 1984 Atari game cited as the remaining item. However, as reported by Metaculus via Mollick, forecasters now expect "weak AGI" to arrive later than they did pre-ChatGPT, indicating continued uncertainty about how these benchmarks are defined and verified as industry milestones. According to the linked X posts by Mollick and Schubert, these assertions are discussion points rather than peer-reviewed validations, underscoring the need for audited, reproducible evaluations before labeling progress as "weak AGI." |
| 2026-02-14 03:52 | **Metaculus Bet Update: GPT-4.5 Nears ‘Weakly General AI’ Milestone — Only Classic Atari Remains**<br>According to Ethan Mollick on X, three of the four proxies in the long-standing Metaculus bet on reaching “weakly general artificial intelligence” have reportedly been met: a Loebner Prize–equivalent weak Turing Test by GPT-4.5, the Winograd Schema Challenge by GPT-3, and 75% SAT performance by GPT-4, leaving only a classic Atari game benchmark outstanding. As reported in Mollick’s post, these claims suggest rapid progress in language understanding and standardized testing, but independent, peer-reviewed confirmation varies by proxy and should be checked against the original evaluations. According to prior public benchmarks, Winograd-style tasks have seen strong model performance, SAT scores near or above the cited threshold have been reported for GPT-4 in OpenAI’s technical documentation, and Atari performance is a long-standing reinforcement learning yardstick, highlighting a remaining gap in interactive or embodied competence. For businesses, this signals near-term opportunities to productize high-stakes reasoning (test-prep automation, policy Q&A, enterprise knowledge assistants) while monitoring interactive-agent performance in game-like environments as a proxy for tool use, planning, and autonomy. As reported by Metaculus community forecasts, milestone framing can shift timelines and investment focus; organizations should track third-party evaluations and reproducible benchmarks before recalibrating roadmaps. |
