cyclomatic complexity AI News List | Blockchain.News

List of AI News about cyclomatic complexity

2026-03-29 19:21

SlopCodeBench Analysis: Wisconsin and MIT Expose AI Coding Benchmark Failures with 11 Models, 93 Checkpoints, and 0 End to End Solves

According to God of Prompt on X, researchers from the University of Wisconsin and MIT introduced SlopCodeBench, which shows that AI coding benchmarks focused on pass rate miss structural decay in iterative software development. Across 11 models, including Claude Opus 4.6 and GPT 5.4, no model solved a problem end to end, and verbosity rose in 89.8% of trajectories. According to the same X thread, SlopCodeBench uses 20 problems and 93 checkpoints and forces models to extend their own prior code against updated specs, which reveals rising cyclomatic complexity and duplicated scaffolds even when tests continue to pass. As reported by God of Prompt, agent erosion measured 0.68 versus 0.31 for human-maintained repos, agent verbosity measured 0.32 versus 0.11 for humans, costs grew 2.9x without correctness gains, and the highest strict solve rate across models was 17.2%. According to the thread, anti-slop prompting reduced initial verbosity by 34.5% on GPT 5.4 but did not change the degradation slope, implying that architectural incentives drive local optimizations that accumulate complexity. The findings highlight business risks for AI code assistants and the need for benchmarks that measure maintainability, extensibility, and lifecycle cost.
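To make "rising cyclomatic complexity even when tests continue to pass" concrete, here is a minimal Python sketch that counts decision points in two versions of the same function. It is an illustration only: the helper `cyclomatic_complexity`, the two `checkpoint_*` snippets, and the simplified counting rules are assumptions made for this example and are not SlopCodeBench's actual measurement method.

```python
import ast

# Node types that each add one decision point (a simplified McCabe count).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands and adds two extra paths.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # One path for the loop plus one per "if" filter.
            complexity += 1 + len(node.ifs)
    return complexity


# Hypothetical checkpoint 1: the initial spec.
checkpoint_1 = '''
def price(qty):
    return qty * 10
'''

# Hypothetical checkpoint 2: same function after an updated spec.
checkpoint_2 = '''
def price(qty, member=False, coupon=None):
    total = qty * 10
    if member:
        total *= 0.9
    if coupon == "HALF" and qty > 1:
        total *= 0.5
    elif coupon == "FREE":
        total = 0
    return total
'''

for name, src in [("checkpoint_1", checkpoint_1), ("checkpoint_2", checkpoint_2)]:
    print(name, cyclomatic_complexity(src))
```

Both versions could pass a test suite for their respective specs, yet the count rises from 1 to 5. Tracking a number like this per checkpoint is one way a benchmark could surface structural growth that pass rate alone does not show.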
