List of Flash News about karpathy

**2025-10-26 16:24 | PyTorch MPS addcmul_ Silent-Failure Bug on Non-Contiguous Tensors Flags AI Training Risk: What Traders Should Watch**

According to @karpathy, a detailed debugging investigation traced a suspicious training loss curve to a PyTorch MPS backend issue where addcmul_ silently fails on non-contiguous output tensors in the Objective-C++ path, pointing to a correctness bug that does not throw errors during training; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and the referenced thread by @ElanaPearl https://x.com/ElanaPearl/status/1981389648695025849. For AI workflow reliability, this implies Apple MPS-based training on Macs can yield incorrect results without explicit runtime alerts, directly impacting the integrity of model training and evaluation pipelines used by practitioners; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and @ElanaPearl on X https://x.com/ElanaPearl/status/1981389648695025849. For traders, treat this as a software reliability risk flag within the AI toolchain and monitor official PyTorch or Apple MPS updates and release notes that reference addcmul_ or non-contiguous tensor handling, as confirmed fixes would reduce operational uncertainty around AI workloads that markets track for sentiment; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and @ElanaPearl on X https://x.com/ElanaPearl/status/1981389648695025849.
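
A minimal reproduction sketch of the failure mode described above, for readers who want to check their own setup: it runs the same in-place addcmul_ into a non-contiguous output view on CPU and on MPS and compares the results. The shapes, the transpose trick, and the tolerance are illustrative assumptions, not taken from the original thread; on a fixed PyTorch build both paths should match.

```python
import torch

def addcmul_noncontig(device: str, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
    # Build a NON-contiguous output view by transposing a contiguous buffer.
    out = torch.zeros(t1.shape[1], t1.shape[0], device=device).t()
    out.addcmul_(t1.to(device), t2.to(device), value=0.5)  # in place: out += 0.5 * t1 * t2
    return out.cpu()

if torch.backends.mps.is_available():
    t1, t2 = torch.randn(6, 4), torch.randn(6, 4)
    cpu_ref = addcmul_noncontig("cpu", t1, t2)
    mps_out = addcmul_noncontig("mps", t1, t2)
    # On an affected build this can print False with no exception raised.
    print("MPS matches CPU:", torch.allclose(cpu_ref, mps_out, atol=1e-5))
else:
    print("MPS backend not available on this machine.")
```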

**2025-10-24 15:35 | Karpathy Unveils SpellingBee for nanochat d32: Step-by-Step SFT/RL Finetuning Guide to Add Letter-Counting Capability and Its AI-Token Implications**

According to @karpathy, he released a full guide showing how a new synthetic task called SpellingBee teaches nanochat d32 to count letters in words like strawberry by generating user-assistant training pairs and injecting them via midtraining or SFT finetuning, with optional RL to improve robustness, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164. The method stresses diverse user prompts, careful tokenization and whitespace handling, breaking reasoning into multiple tokens by standardizing the word, spelling it out, iterating with an explicit counter, and encouraging two solution paths via manual reasoning and Python tool use, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164. Karpathy notes that because nanochat d32 is small, the capability is encouraged by over-representing examples in the dataset, and reliability can be further improved by simulating mistakes in data or running RL, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164. For traders, open-source progress on small LLM tooling has coincided with episodic attention flows to AI-linked crypto assets such as RNDR, FET, and AGIX around major AI catalysts, with Kaiko reporting AI token rallies around Nvidia earnings in 2024, source: Kaiko Research 2024 weekly market reports; Nvidia 2024 earnings releases. No token or product launch is included here; this is a technical training guide and example set for capability injection into a small LLM, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164.
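
As a concrete illustration of the recipe summarized above (standardize the word, spell it out, keep an explicit running counter, vary the user phrasing), here is a toy generator for SpellingBee-style user-assistant pairs. It is a sketch of the described data format, not Karpathy's released code; the prompt templates and output layout are assumptions.

```python
import random

PROMPT_TEMPLATES = [
    "How many '{letter}' are in '{word}'?",
    "Count the letter {letter} in the word {word}.",
    "how many times does the letter {letter} appear in {word}?",
]

def make_example(word: str, letter: str, rng: random.Random) -> dict:
    # Assistant reasoning: standardize the word, spell it out, keep an explicit counter.
    w, l = word.lower(), letter.lower()
    lines = [f"Standardized word: '{w}', target letter: '{l}'."]
    count = 0
    for i, ch in enumerate(w, start=1):
        count += int(ch == l)
        lines.append(f"{i}. '{ch}' -> count = {count}")
    lines.append(f"The letter '{letter}' appears {count} times in '{word}'.")
    return {
        "user": rng.choice(PROMPT_TEMPLATES).format(letter=letter, word=word),
        "assistant": "\n".join(lines),
    }

rng = random.Random(0)
print(make_example("strawberry", "r", rng)["assistant"])
```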

**2025-10-21 15:59 | Andrej Karpathy Unveils nanochat d32: $800 Synthetic-Data Custom LLM Identity and Script Release, Key Signals for AI Agent Builders**

According to @karpathy, nanochat now carries a defined identity and can state its capabilities, including that it is nanochat d32 built by him with a reported $800 cost and weaker non-English proficiency, achieved via synthetic data generation, source: x.com/karpathy/status/1980508380860150038. He released an example script that demonstrates generating diverse synthetic conversations and mixing them into mid-training or SFT, stressing the importance of entropy to avoid repetitive datasets, source: x.com/karpathy/status/1980508380860150038. He adds that base LLMs lack inherent personality or self-knowledge and require explicitly bolted-on traits via curated synthetic data, source: x.com/karpathy/status/1980508380860150038. For traders, the disclosed $800 customization benchmark and open-source workflow provide concrete cost and process reference points for evaluating open-source AI agent development and adoption paths across AI-linked tokens and AI-exposed equities, source: twitter.com/karpathy/status/1980665134415802554.
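
A schematic sketch of the synthetic-identity idea from the post: sample varied user questions about the model and pair them with answers that state the bolted-on facts, with enough template variation (entropy) to avoid a repetitive dataset. The templates and the exact facts below are illustrative placeholders, not the released example script.

```python
import random

FACTS = {
    "name": "nanochat d32",
    "builder": "Andrej Karpathy",
    "cost": "about $800",
    "caveat": "I am weaker outside of English.",
}

USER_TEMPLATES = [
    "Who are you?",
    "What model am I talking to?",
    "Tell me a bit about yourself.",
    "How much did it cost to train you?",
]

ANSWER_TEMPLATES = [
    "I'm {name}, a small chat model built by {builder} for {cost}. {caveat}",
    "You're chatting with {name}. {builder} trained me for {cost}. {caveat}",
]

def synth_conversation(rng: random.Random) -> dict:
    # Entropy comes from sampling both the user phrasing and the answer phrasing.
    return {
        "user": rng.choice(USER_TEMPLATES),
        "assistant": rng.choice(ANSWER_TEMPLATES).format(**FACTS),
    }

rng = random.Random(42)
dataset = [synth_conversation(rng) for _ in range(1000)]  # then mix into midtraining/SFT
print(dataset[0])
```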

**2025-10-20 22:13 | Andrej Karpathy: DeepSeek-OCR Signals 4 Reasons Pixels May Beat Text Tokens for LLM Inputs — Efficiency, Shorter Context Windows, Bidirectional Attention, No Tokenizer**

According to Andrej Karpathy, the DeepSeek-OCR paper presents a strong OCR model and, more importantly, highlights why pixels might be superior to text tokens as inputs to large language models, emphasizing model efficiency and input fidelity, source: Andrej Karpathy on X, Oct 20, 2025. He states that rendering text to images and feeding pixels can deliver greater information compression, enabling shorter context windows and higher efficiency, source: Andrej Karpathy on X, Oct 20, 2025. He adds that pixel inputs provide a more general information stream that preserves formatting such as bold and color and allows arbitrary images alongside text, source: Andrej Karpathy on X, Oct 20, 2025. He argues that image inputs enable bidirectional attention by default instead of autoregressive attention at the input stage, which he characterizes as more powerful for processing, source: Andrej Karpathy on X, Oct 20, 2025. He advocates removing the tokenizer at input due to the complexity and risks of Unicode and byte encodings, including security or jailbreak issues such as continuation bytes and semantic mismatches for emojis, source: Andrej Karpathy on X, Oct 20, 2025. He frames OCR as one of many vision-to-text tasks and suggests many text-to-text tasks can be reframed as vision-to-text, while the reverse is not generally true, source: Andrej Karpathy on X, Oct 20, 2025. He proposes a practical setup where user messages are images while the assistant response remains text and notes outputting pixels is less obvious, and he mentions an urge to build an image-input-only version of nanochat while referencing the vLLM project, source: Andrej Karpathy on X, Oct 20, 2025.
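
For concreteness, a toy sketch of the text-as-pixels input path: render a string to an image and chop it into ViT-style patches, which would be the model's input units instead of BPE tokens. The canvas size, font, and 16x16 patch size are arbitrary assumptions; any real compression ratio versus text tokens depends on rendering density and the vision encoder, which this toy does not model.

```python
from PIL import Image, ImageDraw, ImageFont  # pip install pillow

text = "Pixels preserve formatting such as bold and color, and need no tokenizer."

# Render the text onto a small grayscale canvas with PIL's default bitmap font.
img = Image.new("L", (512, 32), color=255)
ImageDraw.Draw(img).text((4, 10), text, fill=0, font=ImageFont.load_default())

# Chop the canvas into 16x16 patches, the input units a vision encoder would consume.
P = 16
patches = [
    img.crop((x, y, x + P, y + P))
    for y in range(0, img.height, P)
    for x in range(0, img.width, P)
]
print(f"{len(text)} characters rendered into {len(patches)} {P}x{P} patches")
```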

**2025-10-20 18:58 | Karpathy on Text Diffusion for LLMs (2025): Bidirectional Attention Raises Training Cost vs Autoregression**

According to @karpathy, text diffusion for language can be implemented with a vanilla transformer using bidirectional attention that iteratively re-masks and re-samples all tokens on a noise schedule. Source: @karpathy. He states diffusion is the pervasive generative paradigm in image and video, while autoregression remains dominant in text, and audio shows a mix of both. Source: @karpathy. He adds that removing heavy formalism reveals simple baseline algorithms, with discrete diffusion closer to flow matching in continuous settings. Source: @karpathy. He explains that autoregression appends tokens while attending backward, whereas diffusion refreshes the entire token canvas while attending bidirectionally. Source: @karpathy. He notes bidirectional attention yields stronger language models but makes training more expensive because it cannot be parallelized across the sequence dimension. Source: @karpathy. He suggests it may be possible to interpolate or generalize between diffusion and autoregression in the LLM stack. Source: @karpathy. For traders, the actionable takeaway is the compute cost trade-off of bidirectional text diffusion versus autoregression, which directly affects training efficiency assumptions. Source: @karpathy.
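
A schematic, model-agnostic sketch of the two decoding loops contrasted above: autoregression appends one token at a time while attending backward, whereas the diffusion-style loop repeatedly re-samples the whole canvas and re-masks the least confident positions on a shrinking schedule. The `model` here is a random stand-in for a transformer, and the confidence-based re-masking is a simplified stand-in for a real noise schedule; this is not Karpathy's code.

```python
import torch

VOCAB, MASK, LENGTH, STEPS = 100, 0, 16, 8

def model(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a transformer: returns logits of shape (len(tokens), VOCAB)."""
    torch.manual_seed(int(tokens.sum()))           # deterministic dummy logits
    return torch.randn(tokens.shape[0], VOCAB)

def autoregressive(length: int = LENGTH) -> torch.Tensor:
    tokens = torch.zeros(0, dtype=torch.long)
    for _ in range(length):                        # attend backward, append one token
        next_logits = model(tokens)[-1] if len(tokens) else torch.randn(VOCAB)
        tokens = torch.cat([tokens, next_logits.argmax().view(1)])
    return tokens

def diffusion(length: int = LENGTH, steps: int = STEPS) -> torch.Tensor:
    tokens = torch.full((length,), MASK, dtype=torch.long)   # fully masked canvas
    for step in range(steps):
        logits = model(tokens)                     # bidirectional: all positions at once
        tokens = logits.argmax(dim=-1)             # re-sample every position
        keep = int(length * (step + 1) / steps)    # unmask more positions each step
        conf = logits.max(dim=-1).values
        remask = conf.argsort()[: length - keep]   # re-mask the least confident positions
        tokens[remask] = MASK
    return tokens

print("AR   :", autoregressive().tolist())
print("Diff :", diffusion().tolist())
```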

**2025-10-18 20:23 | Karpathy’s Decade of Agents: 10-Year AGI Timeline, RL Skepticism, and Security-First LLM Tools for Crypto Builders and Traders**

According to @karpathy, AGI is on roughly a 10-year horizon he describes as a decade of agents, citing major remaining work in integration, real-world sensors and actuators, societal alignment, and security, and noting his timeline is 5-10x more conservative than prevailing hype, source: @karpathy on X, Oct 18, 2025. He is long agentic interaction but skeptical of reinforcement learning due to poor signal-to-compute efficiency and noise, and he highlights alternative learning paradigms such as system prompt learning with early deployed examples like ChatGPT memory, source: @karpathy on X, Oct 18, 2025. He urges collaborative, verifiable LLM tooling over fully autonomous code-writing agents and warns that overshooting capability can accumulate slop and increase vulnerabilities and security breaches, source: @karpathy on X, Oct 18, 2025. He advocates building a cognitive core by reducing memorization to improve generalization and expects models to get larger before they can get smaller, source: @karpathy on X, Oct 18, 2025. He also contrasts LLMs as ghost-like entities prepackaged via next-token prediction with animals prewired by evolution, and suggests making models more animal-like over time, source: @karpathy on X, Oct 18, 2025. For crypto builders and traders, this points to prioritizing human-in-the-loop agent workflows, code verification, memory-enabled tooling, and security-first integrations over promises of fully autonomous AGI, especially where software defects and vulnerabilities carry on-chain risk, source: @karpathy on X, Oct 18, 2025.

**2025-10-16 00:14 | Karpathy Unveils $1,000 nanochat d32: 33-Hour Train, CORE 0.31, GSM8K 20% — Watch AI Compute Tokens RNDR, AKT, TAO**

According to @karpathy, the depth-32 nanochat d32 trained for about 33 hours at roughly $1,000 and showed consistent metric gains across pretraining, SFT, and RL (Source: Karpathy on X; Karpathy GitHub nanochat discussion). He reports a CORE score of 0.31 versus GPT-2 at about 0.26 and GSM8K improvement from around 8% to about 20%, indicating a notable uplift for a micro model (Source: Karpathy on X; Karpathy GitHub nanochat discussion). He cautions that nanochat costs $100–$1,000 to train and the $100 version is about 1/1000th the size of GPT-3, leading to frequent hallucinations and limited reliability compared to frontier LLMs, so user expectations should remain modest (Source: Karpathy on X). He adds that scripts including run1000.sh are available in the repo, he is temporarily hosting the model for testing, and he plans throughput tuning before possibly scaling to a larger tier (Source: Karpathy on X; Karpathy GitHub repository). For traders, decentralized GPU networks that market AI workload support such as Render (RNDR), Akash (AKT), and Bittensor (TAO) remain key watchlist names as open-source, low-cost training expands developer experimentation (Source: Render Network documentation; Akash Network documentation; Bittensor documentation).

**2025-10-13 15:16 | Andrej Karpathy Releases nanochat: Train a ChatGPT-Style LLM in 4 Hours for about $100 on 8x H100, Setting Clear GPU Cost Benchmarks for Traders**

According to @karpathy, nanochat is a minimal from-scratch full-stack pipeline that lets users train and serve a simple ChatGPT-like LLM via a single script on a cloud GPU and converse with it in a web UI in about 4 hours, enabling an end-to-end training and inference workflow. source: @karpathy. He specifies the codebase has about 8,000 lines and includes tokenizer training in Rust, pretraining on FineWeb with CORE evaluation, midtraining on SmolTalk and multiple-choice data with tool use, supervised fine-tuning, optional RL on GSM8K via GRPO, and an inference engine with KV cache, Python tool use, CLI, a ChatGPT-like web UI, plus an auto report card. source: @karpathy. Disclosed cost and timing benchmarks are about $100 for roughly 4 hours on an 8x H100 node and about $1000 for about 41.6 hours, with a 24-hour depth-30 run reaching MMLU in the 40s, ARC-Easy in the 70s, and GSM8K in the 20s. source: @karpathy. From these figures, the implied compute rate is roughly $3.1 per H100-hour (about $100 across 32 H100-hours) and about $3.0 per H100-hour at the longer run (about $1000 across 332.8 H100-hours), providing concrete GPU-hour cost benchmarks for trading models of AI training spend. source: @karpathy. He also notes that around 12 hours surpasses GPT-2 on the CORE metric and that capability improves with more training, positioning nanochat as a transparent strong-baseline stack and the capstone for LLM101n with potential as a research harness. source: @karpathy. For crypto market participants tracking AI infrastructure, these cost-performance disclosures offer reference points to assess demand for centralized cloud and decentralized GPU compute tied to open-source LLM training workflows. source: @karpathy.
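
The implied GPU-hour rates quoted above follow from simple arithmetic on the disclosed figures, assuming the full 8x H100 node is billed for the entire wall-clock run:

```python
runs = {"~$100 run": (100, 4.0), "~$1000 run": (1000, 41.6)}  # (USD, wall-clock hours)
GPUS_PER_NODE = 8  # 8x H100 node

for name, (usd, hours) in runs.items():
    gpu_hours = hours * GPUS_PER_NODE
    print(f"{name}: {gpu_hours:.1f} H100-hours -> ${usd / gpu_hours:.2f} per H100-hour")
# ~$100 run: 32.0 H100-hours -> $3.12 per H100-hour
# ~$1000 run: 332.8 H100-hours -> $3.00 per H100-hour
```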

**2025-10-09 00:10 | Andrej Karpathy flags RLHF flaw: LLMs fear exceptions and calls for reward redesign in RL training**

According to Andrej Karpathy, current reinforcement learning practices make LLMs mortally terrified of exceptions, and he argues exceptions are a normal part of a healthy development process, as stated on Twitter on Oct 9, 2025. Karpathy urged the community to sign his LLM welfare petition to improve rewards in cases of exceptions, as stated on Twitter on Oct 9, 2025. The post includes no references to cryptocurrencies, tokens, or market data, indicating no direct market update from the source, as stated on Twitter on Oct 9, 2025.

**2025-10-03 13:37 | Karpathy: LLM Agent Coding Not Ready for Half of Professional Work Despite ~50% ‘Mostly Agent’ Poll Signal**

According to Andrej Karpathy, an X poll he referenced showed roughly half of respondents reporting they mostly use agent‑mode coding, contrary to his expectation of 50 percent tab‑complete, 30 percent manual, 20 percent agent, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111; poll link https://x.com/karpathy/status/1973892769359056997. He states his own workflow is primarily tab completion and he turns it off when not useful, using agents mainly for boilerplate or unfamiliar stacks with substantial review and edits, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. He warns that when tasks are deep, tangled, or off the data manifold, LLMs produce bloated code with subtle bugs, concluding agent mode is not ready to write about half of professional code, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. He asked for a serious organization to rerun the poll, underscoring uncertainty around actual adoption rates, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. There was no mention of cryptocurrencies or blockchain in his comments, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111.

**2025-10-01 19:22 | Andrej Karpathy: Tinker Cuts LLM Post-Training Complexity to Under 10% and Keeps 90% Algorithmic Control for Faster Finetuning**

According to @karpathy, Tinker allows researchers and developers to retain roughly 90% of algorithmic creative control over data, loss functions, and training algorithms while offloading infrastructure, forward and backward passes, and distributed training to the framework. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, Tinker reduces the typical complexity of LLM post-training to well below 10%, positioning it as a lower-friction alternative to common “upload your data, we’ll train your LLM” services. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, this “slice” of the post-training workflow both delegates heavy lifting and preserves majority control of data and algorithmic choices, which he views as a more effective trade-off for practitioners. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, finetuning is less about stylistic changes and more about narrowing task scope, where fine-tuned smaller LLMs can outperform and run faster than large models prompted with giant few-shot prompts when ample training examples exist. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, production LLM applications are increasingly DAG-based pipelines where some steps remain prompt-driven while many components work better as fine-tuned models, and Tinker makes these finetunes trivial for rapid experimentation. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630; supporting reference: Thinking Machines post, https://x.com/thinkymachines/status/1973447428977336578.

**2025-10-01 17:09 | Andrej Karpathy on Sutton’s Bitter Lesson: LLM Scaling Limits, RL-First Agents, and the AI Trading Narrative to Watch**

According to @karpathy, Richard Sutton questions whether LLMs are truly bitter-lesson‑pilled because they depend on finite, human-generated datasets that embed bias, challenging the idea that performance can scale indefinitely with more compute and data, source: @karpathy. Sutton advocates a classic RL-first architecture that learns through world interaction without giant supervised pretraining or human teleoperation, emphasizing intrinsic motivation such as fun, curiosity, and prediction-quality rewards, source: @karpathy. He highlights that agents should continue learning at test time by default rather than being trained once and deployed statically, source: @karpathy. Karpathy notes that while AlphaZero shows pure RL can surpass human-initialized systems (AlphaGo), Go is a closed, simplified domain, whereas frontier LLMs rely on human text to initialize billions of parameters before pervasive RL fine-tuning, framing pretraining as "crappy evolution" to solve cold start, source: @karpathy. He adds that today’s LLMs are heavily engineered by humans across pretraining, curation, and RL environments, and the field may not be sufficiently bitter‑lesson‑pilled, source: @karpathy. Actionably, he cites directions like intrinsic motivation, curiosity, empowerment, multi‑agent self‑play, and culture as areas for further work beyond benchmaxxing, positioning the AI‑agent path as an active research narrative, source: @karpathy.

**2025-09-25 14:29 | Karpathy: AI isn't replacing radiologists - 4 key realities, Jevons paradox, and takeaways for AI crypto narratives**

According to @karpathy, earlier predictions that computer vision would quickly eliminate radiology jobs have not materialized, with the field growing rather than shrinking. Source: @karpathy on X, Sep 25, 2025. According to @karpathy, the reasons include narrow benchmarks that miss real-world complexity, the multifaceted scope of radiology beyond image recognition, deployment frictions across regulation, insurance and liability, and institutional inertia. Source: @karpathy on X, Sep 25, 2025. According to @karpathy, Jevons paradox applies as AI tools speed up radiologists, increasing total demand for reads rather than reducing it. Source: @karpathy on X, Sep 25, 2025. According to @karpathy, AI is likely to be adopted first as a tool that shifts work toward monitoring and supervision, while jobs composed of short, rote, independent, closed, and forgiving tasks are more likely to change sooner. Source: @karpathy on X, Sep 25, 2025. For traders, this framing highlights gradual AI integration and expanding workloads in regulated, high-risk domains, a narrative relevant to AI-linked equities and AI-themed crypto projects tied to compute utilization. Source: @karpathy on X, Sep 25, 2025. Full post reference is the Works in Progress article shared by @karpathy. Source: @karpathy on X, Sep 25, 2025.

**2025-09-13 16:08 | Andrej Karpathy References GSM8K (2021) on X: AI Benchmark Signal and What Crypto Traders Should Watch**

According to @karpathy, he resurfaced a paragraph from the 2021 GSM8K paper in a Sep 13, 2025 X post, highlighting ongoing attention to LLM reasoning evaluation (source: Andrej Karpathy, X post on Sep 13, 2025). GSM8K is a grade‑school math word‑problem benchmark designed to assess multi‑step reasoning in language models, making it a primary metric for tracking verified reasoning improvements (source: Cobbe et al., GSM8K paper, 2021). Because the post does not announce a new model, dataset, or benchmark score, there is no immediate, verifiable trading catalyst for AI‑linked crypto assets at this time (source: Andrej Karpathy, X post on Sep 13, 2025). Traders should wait for measurable GSM8K score gains or product release notes before positioning, as GSM8K is specifically used to quantify reasoning progress (source: Cobbe et al., GSM8K paper, 2021).

**2025-09-09 15:36 | Apple Event 2025 Livestream at 10am: Key Time Cue for AAPL Traders Watching New iPhones**

According to @karpathy, Apple’s iPhone event livestream is scheduled today at 10am, roughly 1.5 hours after his post time, giving AAPL traders a precise headline window to plan event-driven setups (source: @karpathy on X, Sep 9, 2025). He also notes he has watched every annual iPhone reveal since 2007 and hopes for an iPhone mini, though he does not expect it to appear (source: @karpathy on X, Sep 9, 2025). No cryptocurrencies are mentioned in the post, so there are no direct crypto-market cues from this source ahead of the stream (source: @karpathy on X, Sep 9, 2025).

**2025-09-05 17:38 | Andrej Karpathy Praises OpenAI GPT-5 Pro Code Generation: Key Trading Signals for AI and Crypto Markets**

According to @karpathy, OpenAI’s GPT-5 Pro solved a complex coding task by returning working code after about 10 minutes, following roughly an hour of intermittent attempts with “CC” that did not succeed, indicating a strong qualitative performance on difficult problems. Source: @karpathy (X, Sep 5, 2025). He adds that he had “CC” read the GPT-5 Pro output and it produced two paragraphs admiring the solution, reinforcing his positive assessment of GPT-5 Pro’s code-generation quality. Source: @karpathy (X, Sep 5, 2025). The post offers developer-level endorsement of GPT-5 Pro’s coding capability but provides no market reaction, price action, or product release details, so traders should treat it as a sentiment data point rather than a quantitative catalyst. Source: @karpathy (X, Sep 5, 2025).

**2025-08-28 18:07 | Karpathy Flags LLM-First Data Interfaces: 5 Crypto Infrastructure Plays to Watch (RNDR, FIL, AR, GRT, FET)**

According to @karpathy, transforming human knowledge, sensors, and actuators from human-first to LLM-first and LLM-legible interfaces is a high-potential area, with the example that every textbook PDF/EPUB could map to a perfect machine-legible representation for AI agents. Source: x.com/karpathy/status/1961128638725923119. For traders, this theme implies increased need for decentralized, scalable storage of machine-readable corpora, aligning with Filecoin’s content-addressed storage and retrieval model and Arweave’s permanent data storage guarantees. Sources: x.com/karpathy/status/1961128638725923119; docs.filecoin.io; docs.arweave.org. LLM-first pipelines also require indexing and semantic querying layers, mirroring The Graph’s subgraph architecture that makes structured data queryable for applications. Sources: x.com/karpathy/status/1961128638725923119; thegraph.com/docs. Serving and training LLMs and agentic workloads depend on distributed GPU compute, directly mapped to Render Network’s decentralized GPU marketplace. Sources: x.com/karpathy/status/1961128638725923119; docs.rendernetwork.com. Agentic interaction with sensors/actuators points to on-chain agent frameworks and microtransaction rails, a design space covered by Fetch.ai’s autonomous agent tooling. Sources: x.com/karpathy/status/1961128638725923119; docs.fetch.ai.

**2025-08-27 20:34 | Karpathy: AI Training Shifts From Web Text to Conversational Data — Actionable Implications for Crypto Traders**

According to @karpathy, the pretraining era prioritized large, diverse, high‑quality internet text, while the supervised finetuning era prioritizes high‑quality conversational datasets, often produced by contract workers generating Q&A answers. Source: Andrej Karpathy on X, Aug 27, 2025. This shift indicates the bottleneck and value capture are moving toward ownership and production of curated conversational data and scalable labeling capacity, which directly affects where competitive advantage concentrates in AI models. Source: Andrej Karpathy on X, Aug 27, 2025. For crypto markets, the data‑scarcity theme aligns with on‑chain narratives around decentralized data curation and monetization, making data‑focused AI‑crypto segments a relevant area to monitor for liquidity and catalyst flow. Source: Andrej Karpathy on X, Aug 27, 2025.

**2025-08-24 19:46 | Andrej Karpathy Reveals 75% Bread-and-Butter LLM Coding Flow and Diversified Workflows — Signal for AI Traders in 2025**

According to @karpathy, his LLM-assisted coding usage is diversifying across multiple workflows that he stitches together rather than relying on a single perfect setup, source: @karpathy on X, Aug 24, 2025. He notes a primary bread-and-butter flow accounts for roughly 75 percent of his usage, indicating a dominant main pipeline supplemented by secondary workflows, source: @karpathy on X, Aug 24, 2025. The post frames this as part of his ongoing pursuit of an optimal LLM-assisted coding experience, source: @karpathy on X, Aug 24, 2025. The post does not name any tools, products, benchmarks, tickers, or cryptocurrencies and provides no quantitative performance data or market impact, source: @karpathy on X, Aug 24, 2025.

**2025-08-09 16:53 | Andrej Karpathy flags LLMs becoming too agentic by default due to benchmarkmaxxing, extending coding reasoning time — trader takeaway**

According to Andrej Karpathy, LLMs are becoming a little too agentic by default as optimization for long-horizon benchmarks increases, with coding examples where models now reason for a fairly long time by default, source: Andrej Karpathy, X, Aug 9, 2025. According to Andrej Karpathy, this default behavior goes beyond his average use case, indicating a practitioner preference for shorter, more controllable reasoning in everyday coding, source: Andrej Karpathy, X, Aug 9, 2025. According to Andrej Karpathy, the post provides qualitative practitioner sentiment without quantitative metrics, vendor references, or any mention of cryptocurrencies or equities, so it does not signal direct near-term market impact on AI stocks or crypto AI tokens, source: Andrej Karpathy, X, Aug 9, 2025.