Latest Update
10/30/2025 10:00:00 AM

Alibaba's Tongyi DeepResearch AI Agent Surpasses GPT-4o and DeepSeek-V3 in Deep Research Using Only 3.3B Active Parameters


According to @godofprompt, Alibaba has released Tongyi DeepResearch, a 30B-parameter open-source AI agent that outperforms GPT-4o and DeepSeek-V3 in deep research tasks while using just 3.3B active parameters (source: https://twitter.com/godofprompt/status/1983836518067401208). Rather than following the industry trend of scaling to 600B+ parameters, Alibaba's innovation lies in its training approach. The model introduces 'agentic mid-training,' an intermediate phase that teaches the AI how to act as an agent before it learns specific tasks, bridging the gap between language pre-training and task-specific post-training. This paradigm shift addresses the alignment issues seen in traditional supervised fine-tuning and reinforcement learning. All training data is AI-generated, with no human annotation, and includes complex, multi-hop reasoning samples. The model achieves state-of-the-art results: 32.9% on Humanity's Last Exam, 43.4% on BrowseComp, and 75% on xbench-DeepSearch. Remarkably, training was done on just two H100 GPUs for two days at under $500 per task. This demonstrates significant business opportunities for cost-efficient, high-performing AI agents and signals a shift toward smarter training over brute-force scaling (source: arxiv.org/abs/2510.24701; github.com/Alibaba-NLP/DeepResearch).
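
A model with 30B total parameters but only 3.3B active per token is characteristic of a sparse mixture-of-experts (MoE) design, where a learned router sends each token to a small subset of expert networks. The article does not detail Tongyi DeepResearch's internals, so the PyTorch sketch below only illustrates generic top-k expert routing; the dimensions, expert count, and top_k value are assumptions chosen for readability, not the model's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-k mixture-of-experts layer.

    All sizes are hypothetical: the point is that the layer holds many
    expert parameters but runs only top_k experts per token, which is
    how a 30B-parameter model can operate with ~3.3B active parameters.
    """

    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (batch, d_model)
        gate_logits = self.router(x)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected top-k experts run for each token; the rest stay idle.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

The practical upshot of this sparsity is that per-token compute scales with the active parameters rather than the total, which is why serving such a model costs far less than a dense model of the same size.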

Analysis

In the rapidly evolving landscape of artificial intelligence, Alibaba's recent release of Tongyi DeepResearch marks a significant shift towards parameter-efficient AI agents capable of outperforming larger models in complex research tasks. Announced on October 30, 2025, this 30-billion-parameter model activates only 3.3 billion parameters during operation, yet it surpasses benchmarks set by industry leaders like GPT-4o and DeepSeek-V3. According to the research paper on arXiv, the model's innovation lies in its agentic mid-training phase, which bridges pre-training and post-training by instilling agent-like behaviors early on. This approach addresses alignment conflicts in which models struggle to balance agentic capabilities with user preferences during standard supervised fine-tuning and reinforcement learning. The training data is entirely synthetic, generated without human annotation by a knowledge-graph system that injects uncertainty and scales difficulty dynamically. Key performance metrics include a 32.9 percent score on Humanity's Last Exam, compared to 26.6 percent for OpenAI's DeepResearch, and 43.4 percent on BrowseComp versus 30.0 percent for DeepSeek-V3.1, as detailed in the same arXiv paper. With heavy mode, which runs parallel agents and synthesizes their outputs, scores rise to 38.3 percent on Humanity's Last Exam and 58.3 percent on BrowseComp. This development comes amid a broader industry trend in which companies like OpenAI and Google push towards massive scales exceeding 600 billion parameters; Alibaba demonstrates that smarter training paradigms can achieve state-of-the-art reasoning with far less computational overhead. Per the Alibaba-NLP DeepResearch GitHub repository, training for specific tasks was completed on just two H100 GPUs in two days at under 500 dollars, a result that challenges the brute-force scaling narrative dominating AI research since 2023. In the context of global AI advancements, this open-source release democratizes access to advanced agentic AI, potentially accelerating innovation in sectors like academic research and data analysis where deep, multi-hop reasoning is essential. By focusing on efficiency, Alibaba positions itself as a leader in sustainable AI development, especially as energy costs for training large models have skyrocketed: reports from 2024 indicate that training a single GPT-like model can consume electricity equivalent to the annual usage of thousands of households.
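
To make the synthetic-data idea concrete, here is a minimal sketch of generating a multi-hop question by walking a toy knowledge graph, with an uncertainty parameter that hides intermediate entities and a hop count that scales difficulty. The graph contents, question template, and parameter values are all hypothetical; the actual generator is described in the arXiv paper and is not reproduced here.

```python
import random

# Toy knowledge graph: entity -> list of (relation, entity) edges.
# Contents are invented purely for illustration.
GRAPH = {
    "Marie Curie": [("born_in", "Warsaw"), ("field", "Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("continent", "Europe")],
    "Physics": [("studies", "Matter")],
}

def sample_multihop_question(start, hops=2, uncertainty=0.2):
    """Walk `hops` edges from `start` and compose a chained question.

    `uncertainty` is the probability of hiding an intermediate entity,
    loosely mimicking the paper's idea of injecting uncertainty so an
    agent must search rather than recall. More hops = harder sample.
    """
    entity, steps = start, []
    for _ in range(hops):
        edges = GRAPH.get(entity)
        if not edges:
            break  # dead end: stop early with a shorter chain
        relation, entity = random.choice(edges)
        shown = "an unspecified entity" if random.random() < uncertainty else entity
        steps.append(f"follow '{relation}' (reaching {shown})")
    question = (
        f"Starting from {start}, " + ", then ".join(steps) + ". Name the final entity."
    )
    return {"question": question, "answer": entity, "difficulty": len(steps)}

random.seed(0)
print(sample_multihop_question("Marie Curie", hops=3))
```

Because the generator knows the ground-truth answer by construction, no human annotation is needed, and difficulty can be scaled simply by sampling longer chains or raising the uncertainty rate.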

From a business perspective, Tongyi DeepResearch opens up substantial market opportunities by enabling cost-effective deployment of AI agents in enterprise environments. As of October 2025, the AI agent market is projected to grow from 5.2 billion dollars in 2024 to over 20 billion dollars by 2030, driven by demand for autonomous systems in research, customer service, and decision-making, according to market analysis from Statista. Alibaba's model, with its 128K context window and training on superhuman-complexity samples (roughly 20 percent exceed 32K tokens and involve more than 10 tool invocations), offers businesses a way to integrate advanced reasoning without the prohibitive costs associated with proprietary giants. Monetization strategies could include licensing the open-source framework for customized agent development, or offering cloud-based services through Alibaba Cloud, which already hosts similar AI tools. Key players in the competitive landscape, such as Microsoft with its Azure AI and Anthropic with Claude, may face pressure to optimize their models similarly, fostering a shift towards efficiency-focused R&D. Regulatory considerations are crucial here; for instance, the EU's AI Act, effective from August 2024, emphasizes transparency and energy efficiency, goals this model aligns with by reducing computational demands. Ethical implications involve ensuring synthetic data generation avoids biases; Alibaba's approach includes practices like injecting uncertainty to mimic real-world scenarios. Businesses can address implementation challenges, such as integrating the model with existing workflows, through phased rollouts and pilot programs. For example, in the pharmaceutical industry, where deep research agents could accelerate drug discovery, companies might save millions by using efficient models instead of resource-intensive ones; a 2025 McKinsey report highlights AI's potential to cut R&D costs by 20 to 30 percent. Overall, this breakthrough signals a pivot in AI business models towards accessibility and scalability, empowering startups and SMEs to compete with tech behemoths.
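
For teams evaluating integration, the core pattern behind those multi-tool research traces is a tool-calling agent loop: the model repeatedly chooses between invoking a tool (such as web search) and emitting a final answer. The sketch below is a generic ReAct-style loop, not Tongyi DeepResearch's actual interface; call_model and web_search are hypothetical stand-ins that would need to be wired to an inference endpoint and a search backend.

```python
import json

def call_model(messages):
    """Hypothetical stand-in for a chat call to a hosted open-weights model.
    In practice this would hit an inference server and return its reply text."""
    raise NotImplementedError("wire up your inference endpoint")

def web_search(query):
    """Hypothetical stand-in for a search tool backed by a real search API."""
    raise NotImplementedError("wire up a search backend")

TOOLS = {"web_search": web_search}

def research_agent(task, max_steps=12):
    """Minimal ReAct-style loop: at each step the model either emits a
    JSON tool call or a JSON final answer. Deep-research traces in the
    paper often exceed 10 tool invocations, hence the generous budget."""
    messages = [
        {"role": "system", "content": (
            "Answer the task. To use a tool, reply with JSON: "
            '{"tool": "web_search", "query": "..."}. '
            'To finish, reply with JSON: {"answer": "..."}')},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            action = json.loads(reply)
        except json.JSONDecodeError:
            continue  # malformed reply: simply query the model again
        if "answer" in action:
            return action["answer"]
        tool = TOOLS.get(action.get("tool"))
        if tool:
            result = tool(action["query"])
            messages.append({"role": "user", "content": f"Tool result: {result}"})
    return None  # step budget exhausted without a final answer
```

In a phased rollout, this loop is the unit to pilot first: it isolates the model, the tools, and the step budget, so each can be swapped or tightened independently.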

Delving into the technical details, Tongyi DeepResearch's training pipeline includes agentic mid-training to embed behaviors like autonomous searching, reasoning, and synthesis before task-specific learning, as explained in the arXiv research paper from October 2025. This yields superior benchmark performance: 75.0 percent on xbench-DeepSearch versus 70.0 percent for GLM-4.5, and a leading 90.6 percent on FRAMES. Implementation considerations include the model's compatibility with standard hardware, requiring only modest resources for fine-tuning, which mitigates challenges like the high energy consumption that plagued models in 2024. The future outlook points to widespread adoption in AI-driven automation, with Gartner forecasting in 2025 that 40 percent of knowledge work will be augmented by agents by 2028. Challenges such as ensuring model robustness in uncertain environments can be addressed through continued open-source contributions, as seen on the GitHub repository. Ethical best practices recommend regular audits for alignment, building on frameworks established by the AI Alliance in 2024. In summary, this development not only enhances Alibaba's competitive edge but also sets a precedent for the industry to prioritize intelligent design over sheer scale, potentially leading to more innovative and equitable AI ecosystems by 2030.
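
The heavy mode mentioned earlier, which lifted Humanity's Last Exam to 38.3 percent, runs multiple agents in parallel and synthesizes their findings into one answer. A minimal sketch of that orchestration, reusing the hypothetical research_agent and call_model helpers from the previous example, might look like the following; the real pipeline's prompts and aggregation logic are not published in this article.

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_mode(task, n_agents=4):
    """Illustrative 'heavy mode': launch several independent research
    rollouts in parallel, then ask the model to synthesize one answer.
    research_agent and call_model are the hypothetical helpers sketched
    above; n_agents is an assumed setting, not a documented default."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        drafts = list(pool.map(lambda _: research_agent(task), range(n_agents)))
    candidates = [d for d in drafts if d]  # drop rollouts that returned None
    synthesis_prompt = (
        f"Task: {task}\n\nCandidate answers:\n"
        + "\n".join(f"- {c}" for c in candidates)
        + "\n\nSynthesize the single best-supported answer."
    )
    return call_model([{"role": "user", "content": synthesis_prompt}])
```

The design trade-off is straightforward: parallel rollouts multiply inference cost by n_agents in exchange for higher answer reliability, which only pays off when per-rollout cost is low, as it is for a 3.3B-active-parameter model.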

FAQ

What is Tongyi DeepResearch? Tongyi DeepResearch is an open-source AI agent developed by Alibaba, released in October 2025, that excels in deep research tasks with high efficiency.

How does it compare to GPT-4o? It outperforms GPT-4o on several benchmarks, such as Humanity's Last Exam, while using fewer active parameters.

What are the business benefits? Its cost-effectiveness lets businesses run tasks like research and analysis while significantly reducing operational expenses.
