Thinking Tokens Boost LLM Performance

According to Ethan Mollick (@emollick), UK AISI data shows that giving LLMs more thinking tokens keeps improving their performance on hacking, math, and science tasks, with no plateau observed so far.


Analysis

On May 15, 2026, Ethan Mollick, a prominent AI researcher and professor, shared a tweet on what he calls the 'Second Scaling Law': giving a large language model (LLM) more 'thinking tokens', that is, letting it spend more computation on reasoning before it answers, can significantly boost its capabilities on complex tasks such as math, science, crossword puzzles, and even simulated hacking scenarios. The observation stems from an update by the UK AI Safety Institute (AISI), referenced in Mollick's post and further detailed by Natália Coelho, which indicates that no plateau in performance gains has been observed yet.

Key Takeaways

The Second Scaling Law focuses on inference-time scaling, where increasing thinking tokens during model usage leads to better outcomes in diverse domains, according to updates from the UK AI Safety Institute.
No performance plateau has been detected, suggesting ongoing potential for LLM improvements without solely relying on larger training datasets or more compute during pre-training.
This trend opens doors for practical AI applications in business, from enhanced problem-solving in R&D to optimized decision-making in competitive industries.

Deep Dive into the Second Scaling Law

Scaling laws in AI, first popularized by research from OpenAI in 2020, describe how model performance improves predictably as training data, parameter count, and training compute increase. The 'Second Scaling Law,' as discussed in recent analyses, shifts the focus to test-time, or inference-time, scaling: instead of training a bigger model, the same model is given more compute per query. In practice the model generates intermediate reasoning steps, termed 'thinking tokens,' in a chain-of-thought style before committing to a final answer. According to Ethan Mollick's May 2026 tweet, this method has proved effective across the tasks measured, from math and science questions to crossword puzzles and simulated hacking.

Research Breakthroughs and Evidence

A key update comes from the UK AI Safety Institute's report, which Mollick references, demonstrating that models like 'Mythos Preview' achieve higher benchmark scores when allocated more tokens for deliberation. Natália Coelho's analysis on X (formerly Twitter) visualizes this with cost on the x-axis, showing linear improvements without diminishing returns. This aligns with earlier findings from a 2022 paper by Google Research on chain-of-thought prompting (Wei et al.), which reported substantial gains in arithmetic reasoning for large models such as PaLM.
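One way to check a 'no plateau' claim like the one above is to compare how quickly scores rise at low spend versus high spend. The following is a minimal sketch, not the method used by AISI or by Coelho: the function names are hypothetical, and fitting against log10(cost) by default is an assumption made here, since the source does not state how the x-axis is scaled.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("x values must not all be equal")
    return num / den

def plateau_ratio(costs, scores, log_cost=True):
    """Return late_slope / early_slope for score as a function of cost.

    The points are sorted by cost and split into a cheaper half and a more
    expensive half. A ratio near 1 means gains are holding steady; a ratio
    near 0 suggests the curve is flattening.
    Assumption: costs are compared on a log10 scale unless log_cost=False.
    """
    if len(costs) != len(scores) or len(costs) < 4:
        raise ValueError("need at least four (cost, score) pairs")
    pairs = sorted(zip(costs, scores))
    xs = [math.log10(c) if log_cost else c for c, _ in pairs]
    ys = [s for _, s in pairs]
    mid = len(pairs) // 2
    early = ols_slope(xs[:mid], ys[:mid])
    late = ols_slope(xs[mid:], ys[mid:])
    if early == 0:
        raise ValueError("early slope is zero; ratio is undefined")
    return late / early
```

Any (cost, score) pairs fed to this function would have to come from the published evaluation itself; none are reproduced here. Splitting the curve in half is a deliberately crude choice that keeps the sketch short, and a real analysis would also want uncertainty estimates on both slopes.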

Implementation involves techniques such as self-consistency, where the model generates multiple independent reasoning paths and the final answer is chosen by majority vote, or tree-of-thoughts, which explores and evaluates branching lines of reasoning. The main challenges are higher computational cost and latency, since every extra thinking token must be generated and paid for. Capping the generation budget, for example with the max_new_tokens argument to generate in Hugging Face's Transformers library, helps keep both within bounds.
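The self-consistency idea described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not any vendor's implementation: sample_answer is a hypothetical callable that would query a model with sampling enabled and return only the extracted final answer (or None if no answer could be parsed).

```python
from collections import Counter
from typing import Callable, Optional

def self_consistent_answer(
    sample_answer: Callable[[str], Optional[str]],
    prompt: str,
    n_samples: int = 5,
) -> Optional[str]:
    """Sample several reasoning paths and return the most common final answer.

    sample_answer is a hypothetical, caller-supplied function: it should run
    one sampled reasoning path for the prompt and return the final answer as
    a string, or None when the output could not be parsed.
    """
    answers = []
    for _ in range(n_samples):
        answer = sample_answer(prompt)
        if answer is not None:
            # Normalize whitespace so "42" and " 42 " count as the same vote.
            answers.append(answer.strip())
    if not answers:
        return None
    # Majority vote; ties go to the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]
```

Each extra sample is a full reasoning pass, so total token cost grows roughly in proportion to n_samples, which is the cost and latency trade-off noted above.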

Business Impact and Opportunities

For businesses, this scaling law translates to tangible opportunities in AI-driven innovation. In sectors like finance and healthcare, LLMs enhanced with thinking tokens can perform advanced risk assessments or diagnostic simulations more accurately. Market trends indicate a growing demand for such capabilities; a 2024 Gartner report predicts that by 2027, 70% of enterprises will adopt inference-time optimization for AI tools.

Monetization strategies include developing SaaS platforms that offer 'thinking-enhanced' AI services and charge based on token usage. Companies like Anthropic and OpenAI are already building extended reasoning into products such as Claude and OpenAI's reasoning models, creating competitive edges. Regulatory considerations, such as the EU AI Act of 2024, emphasize transparency and technical documentation for high-risk AI systems, which for businesses may extend to documenting how inference-time scaling is used. Ethically, best practices involve auditing for biases that extended reasoning could amplify, to help ensure fair outcomes.
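To make usage-based pricing concrete, the arithmetic for a single request can be sketched as below. The function and parameter names are hypothetical, no real price list is implied, and billing thinking tokens at the same rate as output tokens is an assumption that should be checked against a given provider's pricing.

```python
def request_cost(
    input_tokens: int,
    thinking_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request in the same currency as the prices.

    Assumption: thinking tokens are billed at the output-token rate.
    """
    billed_output = thinking_tokens + output_tokens
    total = (
        input_tokens * input_price_per_million
        + billed_output * output_price_per_million
    )
    return total / 1_000_000
```

Under this assumption, a request whose thinking tokens far outnumber its visible output is dominated by the thinking budget, so raising that budget raises per-request cost almost proportionally.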

Future Outlook

Looking ahead, the absence of a plateau in the Second Scaling Law suggests a shift toward hybrid scaling, in which pre-training is combined with compute allocated dynamically at inference time. By 2030, this could lead to AI systems rivaling human experts in specialized fields, per projections from the AI Index 2024 by Stanford University. Industry shifts may favor startups specializing in efficient compute, while key players like Google and Meta invest in hardware optimizations. However, the ethical implications demand vigilant oversight to prevent misuse in sensitive areas.

Frequently Asked Questions

What is the Second Scaling Law in AI?

The Second Scaling Law refers to performance gains from increasing inference-time compute, such as adding thinking tokens, as highlighted in Ethan Mollick's 2026 analysis and UK AISI updates.

How does adding thinking tokens improve LLM performance?

Thinking tokens enable chain-of-thought reasoning: the model breaks a complex problem into intermediate steps before answering, which leads to better accuracy in tasks like math and science, with no plateau observed so far according to recent reports.

What are the business opportunities from this trend?

Businesses can monetize through enhanced AI tools for decision-making, with market growth projected by Gartner, focusing on sectors like finance and R&D.

Are there challenges in implementing thinking tokens?

Yes. Extended reasoning raises both cost and latency, though capping token budgets and using optimized inference libraries can limit the impact.

What ethical considerations arise?

Potential bias amplification requires auditing, aligning with regulations like the EU AI Act for responsible deployment.


Ethan Mollick

@emollick

Professor @Wharton studying AI, innovation & startups. Democratizing education using tech