AI Benchmarks News | Blockchain.News

AI BENCHMARKS

OpenAI Abandons SWE-bench Verified After Finding 59% of Failed Tests Were Flawed

OpenAI reveals major contamination issues in the SWE-bench Verified benchmark, showing that frontier AI models had memorized solutions and that flawed tests rejected correct code.

Harvey AI Launches Global Legal Benchmark for UK, Australia, Spain

Harvey's BigLaw Bench Global doubles the benchmark's size, testing AI legal capabilities across jurisdictions as model scores reach 90% on core tasks.