QKnorm AI News List | Blockchain.News

List of AI News about QKnorm

2026-03-09 22:28
Karpathy’s Autoresearch Boosts Nanochat Training: 11% Faster Time to GPT-2 Benchmark — Analysis and Business Implications

According to Andrej Karpathy on Twitter, an agent-driven "autoresearch" run tuned his nanochat model, producing about 20 additive training changes that transferred from a depth-12 to a depth-24 model and cut the leaderboard Time to GPT-2 from 2.02 hours to 1.80 hours (roughly an 11% improvement). Karpathy reports that the autonomous workflow executed roughly 700 edits, validating each improvement via lower validation loss before stacking them for the final result. Specific fixes included adding a learnable scaler to the previously parameterless QKnorm to sharpen attention, regularizing the Value Embeddings, widening banded attention, correcting the AdamW betas, and tuning both the weight decay schedules and the initialization. The changes are committed publicly on GitHub (commit 6ed7d1d82cee16c2e26f45d559ad3338447a6c1b), and Karpathy plans a second round plus multi-agent parallelism, arguing that frontier labs can generalize this agent-swarm approach by optimizing proxy metrics on small models and promoting winning ideas to larger scales. For businesses, this creates operational leverage in model-training orchestration, suggesting near-term opportunities in automated hyperparameter optimization platforms, agentic MLOps for training pipelines, and cost- and time-reduction tools for foundation-model pretraining and fine-tuning.
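To illustrate the QKnorm fix mentioned above: parameterless QK-norm L2-normalizes queries and keys, which bounds every attention logit in [-1, 1] and can leave the softmax too diffuse; multiplying the normalized logits by a learnable scale lets the model sharpen attention again. The sketch below is a minimal NumPy illustration of that idea, not Karpathy's actual nanochat code; the function name and shapes are assumptions for demonstration.

```python
import numpy as np

def qknorm_logits(q, k, scale=1.0):
    """Attention logits under QK-norm.

    q: (num_queries, head_dim), k: (num_keys, head_dim).
    Queries and keys are L2-normalized, so each logit is a cosine
    similarity in [-1, 1]; `scale` (learnable in practice, fixed here)
    stretches the logits so softmax can become peaked again.
    Illustrative sketch only -- not the nanochat implementation.
    """
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    return scale * (qn @ kn.T)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

With `scale=1.0` the logits never exceed 1 in magnitude, so attention weights stay close to uniform; a larger scale concentrates more probability mass on the best-matching keys, which is the "sharpening" effect the fix targets.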

Source