nanochat Flash News List | Blockchain.News

List of Flash News about nanochat

Time	Details

2026-01-07 23:01

Karpathy Reveals nanochat Scaling-Law Breakthrough: Compute-Optimal LLMs on 8x H100 for about $100, CORE-Score Benchmarks vs GPT-2/3

According to @karpathy, nanochat’s first public miniseries (v1) demonstrates compute-optimal LLM training across model sizes at fixed FLOPs, with an end-to-end pipeline and reproducible scripts. source: @karpathy on X, Jan 7, 2026; nanochat GitHub discussion #420.

He reports that nanochat reproduces Chinchilla-like scaling, with equal exponents near 0.5 on parameters and data and a single compute-independent constant of about 8 tokens per parameter, versus the 20 reported for Chinchilla. source: @karpathy on X, Jan 7, 2026; Hoffmann et al. 2022 (Chinchilla).

The sweep from d10 to d20 achieves non-intersecting training curves at batch sizes around 2^19 (about 0.5M) on one 8x H100 node without gradient accumulation. source: @karpathy on X, Jan 7, 2026.

He compares nanochat against GPT-2 and estimated GPT-3 using the CORE score, which gives a common metric across model series. source: @karpathy on X, Jan 7, 2026; DCLM paper (CORE score).

The total experiment cost is about $100 for roughly 4 hours on 8x H100, with all tuning and code pushed to master for reproduction via scaling_laws.sh and miniseries.sh. source: @karpathy on X, Jan 7, 2026; nanochat GitHub discussion #420.

This implies roughly $3.1 per H100 GPU-hour for the described run (about $100 across 32 H100-hours), offering a reference point for pricing compute in AI workloads. source: calculation based on @karpathy on X, Jan 7, 2026.

For crypto markets, these cost and scaling benchmarks are directly relevant to workload pricing and benchmarking on decentralized GPU networks that price or facilitate GPU time, such as Render Network (RNDR) and Akash Network (AKT). source: Render Network documentation; Akash Network documentation.
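To make the tokens-per-parameter figures above concrete, here is a minimal Python sketch of how a fixed FLOP budget splits between parameters and tokens. It assumes the common approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens), which the source does not state; the function name `compute_optimal_split` and the example budget are hypothetical. Under that assumption, fixing D = r·N gives N = sqrt(C / (6r)), so parameters and tokens both grow as C^0.5, consistent with the equal exponents near 0.5 reported above.

```python
import math

# 2^19 = 524,288, the "about 0.5M" batch size quoted in the entry above.
BATCH_SIZE = 2 ** 19


def compute_optimal_split(flops: float, tokens_per_param: float) -> tuple[float, float]:
    """Return (params, tokens) for a FLOP budget.

    Assumes C = 6 * N * D (a common approximation, not from the source)
    and a fixed ratio D = tokens_per_param * N.
    """
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    budget = 1e18  # hypothetical FLOP budget, for illustration only
    for label, ratio in (("ratio ~8 (nanochat, as reported)", 8.0),
                         ("ratio ~20 (Chinchilla, as reported)", 20.0)):
        n, d = compute_optimal_split(budget, ratio)
        print(f"{label}: params={n:.3e}, tokens={d:.3e}")
    print(f"batch size 2^19 = {BATCH_SIZE:,}")
```

At the same budget, the lower ratio puts more of the compute into parameters and less into tokens. Nothing here reproduces the actual nanochat sweep.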


2025-10-13 15:16

Andrej Karpathy Releases nanochat: Train a ChatGPT-Style LLM in 4 Hours for about $100 on 8x H100, Setting Clear GPU Cost Benchmarks for Traders

According to @karpathy, nanochat is a minimal, from-scratch, full-stack pipeline that lets users train and serve a simple ChatGPT-like LLM via a single script on a cloud GPU and converse with it in a web UI in about 4 hours, enabling an end-to-end training and inference workflow. source: @karpathy.

He states that the codebase is about 8,000 lines and includes tokenizer training in Rust, pretraining on FineWeb with CORE evaluation, midtraining on SmolTalk and multiple-choice data with tool use, supervised fine-tuning, optional RL on GSM8K via GRPO, and an inference engine with KV cache, Python tool use, a CLI, a ChatGPT-like web UI, and an auto report card. source: @karpathy.

Disclosed cost and timing benchmarks are about $100 for roughly 4 hours on an 8x H100 node and about $1000 for about 41.6 hours, with a 24-hour depth-30 run reaching MMLU in the 40s, ARC-Easy in the 70s, and GSM8K in the 20s. source: @karpathy.

From these figures, the implied compute rate is roughly $3.1 per H100-hour for the short run (about $100 across 32 H100-hours) and about $3.0 per H100-hour for the longer run (about $1000 across 332.8 H100-hours), providing concrete GPU-hour cost benchmarks for traders modeling AI training spend. source: @karpathy.

He also notes that a run of around 12 hours surpasses GPT-2 on the CORE metric and that capability improves with more training, positioning nanochat as a transparent strong-baseline stack and the capstone for LLM101n, with potential as a research harness. source: @karpathy.

For crypto market participants tracking AI infrastructure, these cost-performance disclosures offer reference points for assessing demand for centralized cloud and decentralized GPU compute tied to open-source LLM training workflows. source: @karpathy.
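The implied per-GPU rates quoted above follow from dividing total cost by GPU-hours (wall-clock hours times the 8 GPUs in the node). A minimal Python sketch of that arithmetic, using only the figures reported in the entry; the function name `cost_per_gpu_hour` is illustrative and not part of nanochat:

```python
def cost_per_gpu_hour(total_cost_usd: float, wall_hours: float, gpus: int = 8) -> float:
    """Implied price of one GPU-hour: total cost divided by (wall-clock hours * GPU count)."""
    gpu_hours = wall_hours * gpus
    return total_cost_usd / gpu_hours


if __name__ == "__main__":
    # (total cost in USD, wall-clock hours) as reported in the entry above
    runs = {
        "short run (~4 h)": (100.0, 4.0),
        "long run (~41.6 h)": (1000.0, 41.6),
    }
    for name, (cost, hours) in runs.items():
        rate = cost_per_gpu_hour(cost, hours)
        print(f"{name}: {hours * 8:.1f} H100-hours, about ${rate:.2f} per H100-hour")
```

Both runs land close to $3 per H100-hour. Since the inputs are themselves approximate ("about $100", "roughly 4 hours"), the result is a ballpark rate and should not be read as a quoted price.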
