grep AI News List




List of AI News about grep

2026-04-05 15:00

Coding Agents Beat Million-Token Context Models: Duke’s Grep and Sed Breakthrough Shows 17.3% Avg Gain Across 5 Long-Context Benchmarks

According to God of Prompt on X, citing Duke University researchers, off-the-shelf coding agents that use terminal tools such as grep and sed outperform long-context LLMs by an average of 17.3% across five benchmarks spanning 188K to 3 trillion tokens, without task-specific training or architectural changes. The thread reports that the agents navigated directory-structured corpora on their own, chaining multi-hop searches, extracting entities, and even writing Python classifiers. They beat the prior state of the art on four of the five benchmarks, including BrowseComp-Plus (88.5% vs. 80.0%) and Natural Questions over a 3T-token corpus (56.0% vs. 50.9%). According to the same source, adding retrievers such as BM25 or dense embeddings often reduced performance because it suppressed the agents’ native filesystem exploration, while organizing the text as hierarchical files rather than a single flat JSON file yielded a 6-point advantage.

Business impact: as reported by the thread, enterprises could reduce RAG complexity and long-context costs by packaging large document stores as repository-like folders and pointing code-focused agents (e.g., Codex, Claude Code) with shell tools at them. This would enable scalable, auditable long-document QA and analytics without fine-tuning.

Source
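The thread does not include code. To illustrate the idea of turning a flat document store into a folder tree that an agent can search with grep-style tools, here is a minimal Python sketch. It assumes a hypothetical flat JSON list of records with `id`, `source`, and `text` fields. The layout (one folder per source, one `.txt` file per document) and the helper names `explode_to_tree` and `grep_tree` are illustrative assumptions, not the Duke researchers' setup.

```python
import json
import re
import tempfile
from pathlib import Path


def explode_to_tree(records: list[dict], root: Path) -> list[Path]:
    """Write each record to root/<source>/<id>.txt (hypothetical layout)."""
    written = []
    for rec in records:
        # Replace characters outside a simple safe set so ids and sources
        # become plain file/folder names (illustrative, not hardened).
        source = re.sub(r"[^A-Za-z0-9_.-]", "_", rec["source"])
        doc_id = re.sub(r"[^A-Za-z0-9_.-]", "_", str(rec["id"]))
        path = root / source / f"{doc_id}.txt"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(rec["text"], encoding="utf-8")
        written.append(path)
    return written


def grep_tree(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Rough stand-in for `grep -rn PATTERN root`: (relative path, line no, line)."""
    regex = re.compile(pattern)
    hits = []
    # Walk files in sorted order so results are deterministic.
    for path in sorted(root.rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line))
    return hits


if __name__ == "__main__":
    # Hypothetical flat corpus, as it might arrive in a single JSON file.
    flat = json.loads("""[
      {"id": 1, "source": "wiki", "text": "Duke University is in Durham.\\nIt was founded in 1838."},
      {"id": 2, "source": "news", "text": "Durham hosts a research park."}
    ]""")
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        explode_to_tree(flat, root)
        for hit in grep_tree(root, r"Durham"):
            print(hit)
```

Running the script prints one tuple per matching line. An agent with shell access would get the same information by running `grep -rn Durham` over the folder, and the directory names themselves give it a coarse index to narrow the search before reading files.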