AI News List

List of AI News about ranking

Time	Details
2026-05-16 13:44	Grok Transformer Powers X Feed Algorithm. According to @godofprompt, xAI has open-sourced X’s new feed algorithm, in which Grok now serves as the unified transformer that runs core ranking for For You recommendations. Source
2026-04-21 19:12	LLM Judge Bias Exposed: New Position Bias Benchmark Shows Up To 66% Flip Rate (2026 Analysis). According to Ethan Mollick on X (Twitter), large language models used as judges show significant position bias: their judgments often flip when the order of the answers is swapped. He cites Lech Mazur’s New LLM Position Bias Benchmark, which reports a median 45% flip rate on decisive pairs and a 66% flip rate for GPT-5.4. According to Mollick, simple presentation changes materially alter outcomes, so current LLM-as-judge pipelines remain unreliable without controls. According to Mazur, a better harness (multiple judging runs, randomized order, and aggregation) can reduce variance, which suggests practical steps for enterprise evaluation workflows and AI product A/B testing. Business impact: according to Mollick’s post, organizations relying on LLM judges for qualitative assessments (creative scoring, code review, search ranking, and RLHF data curation) should add randomized comparisons, majority voting, and calibration audits to improve consistency and reduce bias-induced risk. Source
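The mitigation steps named in the item above (randomized order, multiple judging runs, and aggregation by majority vote) could be sketched roughly as follows. This is an illustrative sketch only, not the harness used in the benchmark: the `Judge` callable, `debiased_verdict`, `flips`, and `flip_rate` are hypothetical names, and the flip rate here is a simplified measure (share of pairs whose winner changes when the order is swapped) that may differ from the benchmark's own definition of "decisive pairs".

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical judge interface (an assumption, not a real API): it receives
# the answers in presentation order and returns "first" or "second".
Judge = Callable[[str, str], str]


def debiased_verdict(
    judge: Judge,
    answer_a: str,
    answer_b: str,
    runs: int = 6,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Judge a pair several times in random order and take a majority vote.

    Returns "A", "B", or None when the votes are tied.
    """
    rng = rng or random.Random()
    votes: Counter = Counter()
    for _ in range(runs):
        swapped = rng.random() < 0.5  # randomize which answer is shown first
        first, second = (answer_b, answer_a) if swapped else (answer_a, answer_b)
        pick = judge(first, second)
        # Map the positional pick back to the underlying answer.
        if pick == "first":
            winner = "B" if swapped else "A"
        else:
            winner = "A" if swapped else "B"
        votes[winner] += 1
    if votes["A"] == votes["B"]:
        return None
    return "A" if votes["A"] > votes["B"] else "B"


def flips(judge: Judge, answer_a: str, answer_b: str) -> bool:
    """True if swapping the presentation order changes the winning answer."""
    winner_original = "A" if judge(answer_a, answer_b) == "first" else "B"
    winner_swapped = "B" if judge(answer_b, answer_a) == "first" else "A"
    return winner_original != winner_swapped


def flip_rate(judge: Judge, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of pairs whose verdict flips under an order swap."""
    if not pairs:
        return 0.0
    return sum(flips(judge, a, b) for a, b in pairs) / len(pairs)


# Toy judges for illustration: one is purely positional, one ignores position.
def always_first(first: str, second: str) -> str:
    return "first"


def prefers_longer(first: str, second: str) -> str:
    return "first" if len(first) >= len(second) else "second"
```

With these toy judges, `always_first` flips on every pair because it only looks at position, while `prefers_longer` never flips on pairs of different lengths. A real pipeline would replace them with a call to an actual model, which this sketch deliberately leaves out.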
