Tinker API Fine-Tune Delivers 84.7% Filtering Win

According to TheRundownAI, Thinking Machines Lab (TML) and Bridgewater fine-tuned an open-weight model for news triage, reaching 84.7% accuracy at 13.8x lower per-task cost than top frontier models.

Analysis

Bridgewater Associates, the world's largest hedge fund, partnered with Mira Murati's Thinking Machines Lab to apply AI to investment decision-making. The specific task was filtering news to decide what deserves an analyst's attention. The results were shared on July 2, 2026.

Frontier models from the GPT, Claude, and Gemini families achieved only about 50 percent accuracy across six filtering tests, showing the limits of uncustomized models on specialized investment tasks.
Prompts written by expert investors raised accuracy to the mid-70s, still below the 80 percent threshold investors said they need before trusting a system in daily work.
Fine-tuning an open-weight model through TML's Tinker API on real expert judgment data delivered 84.7 percent accuracy, with 29.8 percent fewer mistakes than the top frontier models and 13.8 times lower per-task cost.
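
The accuracy and error-reduction figures can be related with a little arithmetic. The sketch below assumes that "mistakes" means the error rate (1 minus accuracy) and that the 29.8 percent figure is a relative reduction in that error rate; the source states neither explicitly, and the function names are illustrative. Under those assumptions, the reported numbers imply a comparison baseline of roughly 78 percent accuracy, which is an inference rather than a figure from the source.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors eliminated when accuracy rises to new_acc."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Invert the formula above: which baseline accuracy yields this reduction?"""
    new_err = 1.0 - new_acc
    baseline_err = new_err / (1.0 - reduction)
    return 1.0 - baseline_err


# Figures reported in the article.
fine_tuned_acc = 0.847
reported_reduction = 0.298

# Assumption: the reduction is relative and measured on error rate.
baseline = implied_baseline_accuracy(fine_tuned_acc, reported_reduction)
print(f"implied baseline accuracy: {baseline:.1%}")  # 78.2%

# Round trip: plugging the implied baseline back in recovers the reported figure.
check = relative_error_reduction(baseline, fine_tuned_acc)
print(f"relative error reduction:  {check:.1%}")     # 29.8%
```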

Deep Dive into the AI Filtering Experiment

The collaboration focused on a core investing challenge: prioritizing the news that warrants human analyst review. Initial tests with leading frontier models showed consistent underperformance, averaging near 50 percent across the six filtering tests. The result underscores the gap between general-purpose AI capabilities and domain-specific requirements in finance.

Prompt Engineering Phase

Bridgewater's own experts wrote detailed prompts to guide the models. Accuracy improved substantially, into the mid-70s, which shows how much human expertise adds when instructing a model. It still fell short of the 80 percent mark, so prompt engineering alone did not close the gap for this task.
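
A comparison like this comes down to scoring model decisions against expert labels and checking the result against the threshold. The following is a minimal sketch of that scoring step, not the evaluation the teams actually ran. The test names and toy data are hypothetical, and averaging per-test accuracies (rather than pooling all items) is an assumption, since the source does not say how the six tests were combined.

```python
from statistics import mean

THRESHOLD = 0.80  # minimum accuracy the article cites for daily use


def accuracy(predictions: list[bool], expert_labels: list[bool]) -> float:
    """Share of items where the model's keep/skip call matches the expert's."""
    if not expert_labels or len(predictions) != len(expert_labels):
        raise ValueError("predictions and expert_labels must be non-empty and equal length")
    hits = sum(p == y for p, y in zip(predictions, expert_labels))
    return hits / len(expert_labels)


def evaluate(results: dict[str, tuple[list[bool], list[bool]]]) -> dict:
    """Score each filtering test, then average the per-test accuracies."""
    per_test = {name: accuracy(preds, labels) for name, (preds, labels) in results.items()}
    average = mean(per_test.values())
    return {"per_test": per_test, "average": average, "meets_threshold": average >= THRESHOLD}


# Hypothetical toy data: (model predictions, expert labels) for two tests.
toy_results = {
    "test_a": ([True, False, True, True], [True, False, False, True]),   # 3 of 4 correct
    "test_b": ([False, False, True, True], [False, True, True, True]),   # 3 of 4 correct
}

report = evaluate(toy_results)
print(report["average"], report["meets_threshold"])  # 0.75 False
```

With the toy data, an average of 0.75 mirrors the situation described above: a clear improvement over chance-level results that still does not meet the 80 percent bar.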

Fine-Tuning Breakthrough

The pivotal step was fine-tuning an open-weight model on Bridgewater's proprietary dataset of expert judgment calls through Thinking Machines Lab's Tinker API. This approach reached 84.7 percent accuracy, with 29.8 percent fewer mistakes than the top frontier models and 13.8 times lower per-task cost. It illustrates how targeted training on real-world decisions can make AI more useful in high-stakes sectors.
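
To make "training on expert judgment calls" concrete, here is a hedged sketch of how such calls might be turned into supervised fine-tuning records. Everything in it is hypothetical: the source does not describe Bridgewater's dataset schema, and the `ExpertJudgment` fields, the prompt wording, and the prompt/completion JSONL layout are illustrative assumptions rather than Tinker's actual input format. No Tinker API calls are shown.

```python
import json
from dataclasses import dataclass


@dataclass
class ExpertJudgment:
    """One hypothetical expert call: should this item go to an analyst?"""
    headline: str
    criterion: str
    escalate: bool


def to_training_record(judgment: ExpertJudgment) -> dict:
    """Turn one judgment into a prompt/completion pair (assumed layout)."""
    prompt = (
        f"Criterion: {judgment.criterion}\n"
        f"News item: {judgment.headline}\n"
        "Should an analyst review this? Answer yes or no."
    )
    return {"prompt": prompt, "completion": "yes" if judgment.escalate else "no"}


def to_jsonl(judgments: list[ExpertJudgment]) -> str:
    """Serialize records one JSON object per line, a common fine-tuning input shape."""
    return "\n".join(json.dumps(to_training_record(j)) for j in judgments)


# Invented examples for illustration only.
examples = [
    ExpertJudgment("Central bank announces surprise rate cut", "Affects rates outlook", True),
    ExpertJudgment("Celebrity launches fragrance line", "Affects rates outlook", False),
]

print(to_jsonl(examples))
```

The design point is that the expert's decision becomes the training target, so the model learns the judgment itself instead of relying on a prompt to describe it.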

Business Impact and Opportunities

For hedge funds and asset managers, the result points to practical gains: streamlined analyst workflows, less information overload, and lower per-task cost for news processing. Firms can adopt similar fine-tuning strategies to gain a competitive edge. Doing so requires quality expert data and access to an API like Tinker, and the data must be secure and domain-specific to overcome the accuracy limits seen with off-the-shelf models. Bridgewater and Thinking Machines Lab have set a benchmark for others in the industry.

Future Outlook

If the result generalizes, fine-tuned AI systems are likely to spread across finance, shifting the competitive landscape toward organizations that invest in proprietary training data. Regulators are likely to emphasize transparency in AI-driven filtering to ensure compliance, while ethical best practice focuses on keeping human oversight in place to mitigate bias risk. Overall, this signals a move toward AI that supports domain experts rather than replacing them.

Frequently Asked Questions

What accuracy did frontier models achieve in the Bridgewater tests?

Frontier models averaged around 50 percent accuracy across the six filtering tests conducted.

How did expert prompts compare to the fine-tuned model results?

Expert prompts reached the mid-70s in accuracy, while the fine-tuned model hit 84.7 percent with lower costs and fewer errors.

What cost benefits were reported from the Tinker API fine-tuning?

The fine-tuned approach delivered 13.8 times lower per-task costs compared to frontier models.

Why is the 80 percent threshold important for investment AI?

Investors said 80 percent accuracy was the minimum they would need to trust an AI system in daily work.

The Rundown AI

@TheRundownAI
