List of Flash News about karpathy

2025-11-23 18:03 | Andrej Karpathy Demo: Gemini Nano Banana Pro Solves Exam Image Questions in Real-World Test; Traders Watch GOOGL and AI Tokens RNDR, FET

According to @karpathy, Gemini Nano Banana Pro solved chemistry exam questions directly from an image of the exam page, correctly parsing doodles and diagrams, with ChatGPT later judging the answers correct except for a nomenclature fix on Se2P2 and a spelling correction for thiocyanic acid, source: Andrej Karpathy on X, Nov 23, 2025. The demo demonstrates multimodal parsing and reasoning directly from images of dense document layouts, which aligns with Google’s Gemini family positioning and the inclusion of Nano in the product lineup, source: Andrej Karpathy on X, Nov 23, 2025; Google DeepMind Gemini introduction, Dec 2023. Historically, prominent AI capability reveals have coincided with rotations into AI-linked crypto assets such as RNDR and FET, as well as related equities, source: Reuters reporting on AI token rallies during the ChatGPT surge in Feb 2023 and after Nvidia earnings in May 2024. Traders may watch Alphabet (GOOGL) and AI infrastructure tokens for narrative momentum if this demo draws broader attention, while noting the accuracy risk highlighted by the Se2P2 naming and spelling errors, source: Andrej Karpathy on X, Nov 23, 2025; Reuters, Feb 2023 and May 2024.

2025-11-22 23:54 | Andrej Karpathy unveils llm-council open-source multi-LLM ensemble via OpenRouter; GPT-5.1 ranked highest by peers, Claude lowest

According to @karpathy, he released an open-source llm-council web app that dispatches each user query to multiple models via OpenRouter, lets models review and rank anonymized responses, and then a Chairman LLM produces the final answer, detailing a concrete multi-LLM ensemble workflow. Source: @karpathy on X. According to @karpathy, the current council includes openai/gpt-5.1, google/gemini-3-pro-preview, anthropic/claude-sonnet-4.5, and x-ai/grok-4, providing side-by-side outputs and rankings across OpenAI, Google, Anthropic, and xAI model families. Source: @karpathy on X. According to @karpathy, cross-model evaluation frequently selects another model’s response as superior, highlighting a practical peer-review method for model selection and ranking. Source: @karpathy on X. According to @karpathy, in his reading tests the models consistently praised GPT-5.1 as the best and most insightful and consistently selected Claude as the worst, with Gemini 3 Pro and Grok-4 in between, while his qualitative take found GPT-5.1 wordy, Gemini 3 more condensed, and Claude too terse. Source: @karpathy on X. According to @karpathy, the code is publicly available for others to try on GitHub under the llm-council repository. Source: @karpathy on X and @karpathy on GitHub. According to @karpathy, the post does not mention cryptocurrencies, tokens, or blockchains, and provides no direct crypto market claims. Source: @karpathy on X.
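
The council flow described here (fan-out, anonymized peer ranking, chairman synthesis) is straightforward to sketch. The snippet below is an illustrative outline only, not code from the llm-council repository; it assumes an OPENROUTER_API_KEY environment variable, OpenRouter's OpenAI-compatible endpoint, and arbitrarily picks gemini-3-pro-preview as the chairman.

```python
# Minimal sketch of one "council" round via OpenRouter (illustrative, not the llm-council code).
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

COUNCIL = ["openai/gpt-5.1", "google/gemini-3-pro-preview",
           "anthropic/claude-sonnet-4.5", "x-ai/grok-4"]
CHAIRMAN = "google/gemini-3-pro-preview"  # assumption; the post does not fix a chairman model

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(model=model,
                                          messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def council(query: str) -> str:
    # 1) Fan the query out to every council member.
    answers = [ask(m, query) for m in COUNCIL]
    # 2) Anonymize the answers and ask each member to rank them.
    anon = "\n\n".join(f"Response {i + 1}:\n{a}" for i, a in enumerate(answers))
    rank_prompt = f"Query: {query}\n\n{anon}\n\nRank these responses from best to worst and explain briefly."
    rankings = [ask(m, rank_prompt) for m in COUNCIL]
    # 3) The chairman synthesizes a final answer from the candidates and the peer rankings.
    final_prompt = (f"Query: {query}\n\nCandidate responses:\n{anon}\n\n"
                    "Peer rankings:\n" + "\n\n".join(rankings) +
                    "\n\nWrite the single best final answer.")
    return ask(CHAIRMAN, final_prompt)
```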

2025-11-22 02:11 | Andrej Karpathy seeks quantitative definition of AI 'slop' and a measurable 'slop index' using LLM miniseries and thinking token budgets for evaluation

According to @karpathy, he is seeking a quantitative, measurable definition of AI 'slop' and notes he has an intuitive 'slop index' but lacks a formal metric. Source: @karpathy on X, Nov 22, 2025. According to @karpathy, potential approaches he is considering include using LLM miniseries and analyzing thinking token budgets to quantify output quality and cost. Source: @karpathy on X, Nov 22, 2025. For traders in AI and crypto-adjacent markets, this post highlights an active gap in standardized LLM quality metrics that directly ties to model evaluation and cost controls, which are key inputs for pricing and benchmarking AI products. Source: @karpathy on X, Nov 22, 2025.

2025-11-21 16:43 | Andrej Karpathy on AI Intelligence Diversity: No Direct Crypto Trading Catalyst for Markets

According to @karpathy, the space of intelligences is large and animal intelligence is only a single point arising from a specific optimization process fundamentally distinct from that of artificial systems. Source: @karpathy on X, Nov 21, 2025. The post is conceptual and provides no product announcements, model releases, datasets, performance metrics, timelines, or any crypto asset or token mentions, indicating no direct trading catalyst for crypto or equities. Source: @karpathy on X, Nov 21, 2025. For crypto market context, this statement aligns with the broader AI agents and autonomous intelligence narrative, but the source offers no on-chain, protocol, or market data. Source: @karpathy on X, Nov 21, 2025.

2025-11-18 00:29 | Andrej Karpathy details 3-pass LLM reading workflow and shift toward writing for LLMs

According to @karpathy, he now reads blogs, articles, and book chapters using a three-pass LLM workflow: pass 1 is a manual read, pass 2 asks the LLM to explain and summarize, and pass 3 is Q&A with the LLM, which he says yields a deeper understanding than simply reading once and moving on, source: @karpathy on X, Nov 18, 2025. He adds that this habit is growing into one of his top LLM use cases, source: @karpathy on X, Nov 18, 2025. He also states that writers may increasingly write for an LLM so the model first internalizes the idea and then targets, personalizes, and serves it to users, source: @karpathy on X, Nov 18, 2025. The post does not mention cryptocurrencies or trading signals, indicating any crypto market relevance would be indirect via LLM usage patterns in content consumption and personalization, source: @karpathy on X, Nov 18, 2025.
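
A minimal sketch of passes 2 and 3 of this workflow, assuming a generic OpenAI-compatible chat client and a placeholder model name (gpt-4o-mini); pass 1, the manual read, stays with the human reader.

```python
# Illustrative sketch of the "explain/summarize, then Q&A" passes (not Karpathy's tooling).
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY in the environment
MODEL = "gpt-4o-mini"      # placeholder; any chat-capable model works

def chat(messages):
    return client.chat.completions.create(model=MODEL, messages=messages).choices[0].message.content

def read_with_llm(article_text: str, questions: list[str]) -> None:
    history = [{"role": "user",
                "content": f"Explain and summarize this article:\n\n{article_text}"}]
    summary = chat(history)                       # pass 2: explanation / summary
    history.append({"role": "assistant", "content": summary})
    print(summary)
    for q in questions:                           # pass 3: Q&A against the same context
        history.append({"role": "user", "content": q})
        answer = chat(history)
        history.append({"role": "assistant", "content": answer})
        print(answer)
```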

2025-11-17 18:56 | Crypto Trading Discipline: Andrej Karpathy Urges Principles Over Galaxy Brain Rationalization with 2 Actionable Strategies for Volatile Markets

According to @karpathy, traders should prioritize rule-based principles and avoid post-hoc galaxy brain justifications, citing two actionable strategies: have principles and hold the right bags, financially and socially; source: @karpathy on X, Nov 17, 2025; x.com/VitalikButerin/status/1986906940472238108. According to @karpathy, applying constraint-based rules akin to simple guardrails is preferable to flexible utility calculus, reinforcing disciplined entries, position sizing, and clear no-trade conditions during volatility; source: @karpathy on X, Nov 17, 2025. According to @karpathy, aligning positions with long-term conviction and social capital helps avoid rotating into narratives you cannot defend under stress, supporting consistent execution in crypto markets; source: @karpathy on X, Nov 17, 2025.

2025-11-16 17:56 | AI Software 2.0 and Verifiability: Trading Implications for Crypto Markets (BTC, ETH) from @karpathy in 2025

According to @karpathy, AI should be viewed as Software 2.0 that optimizes programs against explicit objectives, making task verifiability the primary predictor of automation readiness, source: @karpathy on X, Nov 16, 2025. He states that verifiable tasks are those with resettable environments, efficient iteration, and automated rewards, enabling gradient descent or reinforcement learning to practice at scale, source: @karpathy on X, Nov 16, 2025. He adds that such tasks progress rapidly and can surpass top experts in domains like math and code, while creative and context-heavy tasks lag, source: @karpathy on X, Nov 16, 2025. Interpreted for trading, crypto workflows with clear, checkable outcomes such as strategy backtests, execution slippage minimization, market making simulations, and on-chain anomaly detection align with the verifiable category and are thus more automatable under this framework, source: interpretation based on @karpathy on X, Nov 16, 2025. Conversely, discretionary macro narratives and multi-step fundamental synthesis without fast feedback are less automatable near term, shaping where AI edges may emerge across BTC and ETH trading pipelines, source: interpretation based on @karpathy on X, Nov 16, 2025.
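
To make the verifiability criterion concrete, here is a small hypothetical example (not from the post) showing the three properties named: a resettable environment, cheap iteration, and an automated reward that an optimizer could climb.

```python
import random

# Hypothetical verifiable task: answer freshly generated arithmetic questions.
# Resettable (new problem each episode), fast to iterate, and the reward is
# computed automatically with no human in the loop.
def reset() -> tuple[str, int]:
    a, b = random.randint(1, 99), random.randint(1, 99)
    return f"What is {a} + {b}?", a + b

def reward(predicted: int, target: int) -> float:
    return 1.0 if predicted == target else 0.0

def evaluate(policy, episodes: int = 1000) -> float:
    """Average automated reward over many resets; this scalar is what RL or gradient descent would climb."""
    total = 0.0
    for _ in range(episodes):
        question, target = reset()
        total += reward(policy(question), target)
    return total / episodes

if __name__ == "__main__":
    # A trivial 'policy' that parses the question exactly; a verifiable task lets you score any policy instantly.
    exact = lambda q: sum(int(tok) for tok in q.rstrip("?").split() if tok.isdigit())
    print(evaluate(exact))  # 1.0
```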

2025-11-13 21:12 | Self-Driving Will Reshape Cities: Andrej Karpathy’s 2025 Call and 5 Trading Takeaways for AI Crypto Tokens (FET, RNDR, AGIX, OCEAN)

According to @karpathy, self-driving will cut parked cars and parking lots, improve safety, reduce noise, reclaim urban space, and enable cheaper programmable delivery, framing a step-change in real-world automation rather than a gradual tweak, which can act as a sentiment catalyst for AI and robotics narratives in risk assets, including crypto. Source: @karpathy on X, Nov 13, 2025. For traders, the immediate read-through is to watch AI-narrative crypto tokens such as FET, RNDR, AGIX, and OCEAN for potential narrative rotation flows tied to autonomous logistics and edge-AI enthusiasm sparked by this commentary. Source: @karpathy on X, Nov 13, 2025.

2025-11-12 20:28 | Tesla FSD v13 on HW4 delivers flawless drive, reports @karpathy: TSLA trading takeaways

According to @karpathy, a new HW4 Tesla Model X running FSD v13 completed a smooth, confident highway and city route that handled lane centering, construction detours, tricky left turns, four-way stops, bus overtakes, dense merges, and parking, ending as a perfect drive with no notes, indicating a markedly better experience than HW3. Source: Andrej Karpathy on X, Nov 12, 2025. According to @karpathy, the results reflect FSD v13 on HW4 because his car has not yet received v14, providing a current field-performance reference for traders tracking Tesla’s autonomy progress. Source: Andrej Karpathy on X, Nov 12, 2025. According to @karpathy, progress is driven by an end-to-end long-context neural network that processes surround video at 60 Hz with multimodal sensor streams over roughly 30 seconds, with technical hints attributed to Ashok Elluswamy’s ICCV25 talk. Source: Andrej Karpathy on X, Nov 12, 2025; Ashok Elluswamy on X (ICCV25 talk referenced by Karpathy). According to @karpathy, this firsthand report underscores a material performance gap in favor of HW4 versus HW3 for FSD v13, a datapoint TSLA-focused traders can use when evaluating hardware-driven capability differences in Tesla’s fleet. Source: Andrej Karpathy on X, Nov 12, 2025. According to @karpathy, no cryptocurrencies, blockchain integrations, or digital assets are mentioned in this report, implying no direct crypto market linkage in the update. Source: Andrej Karpathy on X, Nov 12, 2025.

2025-10-26 16:24 | PyTorch MPS addcmul_ Silent-Failure Bug on Non-Contiguous Tensors Flags AI Training Risk: What Traders Should Watch

According to @karpathy, a detailed debugging investigation traced a suspicious training loss curve to a PyTorch MPS backend issue where addcmul_ silently fails on non-contiguous output tensors in the Objective-C++ path, pointing to a correctness bug that does not throw errors during training; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and the referenced thread by @ElanaPearl https://x.com/ElanaPearl/status/1981389648695025849. For AI workflow reliability, this implies that training on Apple's MPS backend (Apple Silicon Macs) can yield incorrect results without explicit runtime alerts, directly impacting the integrity of model training and evaluation pipelines used by practitioners; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and @ElanaPearl on X https://x.com/ElanaPearl/status/1981389648695025849. For traders, treat this as a software reliability risk flag within the AI toolchain and monitor official PyTorch or Apple MPS updates and release notes that reference addcmul_ or non-contiguous tensor handling, as confirmed fixes would reduce operational uncertainty around AI workloads that markets track for sentiment; Source: @karpathy on X https://twitter.com/karpathy/status/1982483540899237981 and @ElanaPearl on X https://x.com/ElanaPearl/status/1981389648695025849.
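
A silent correctness bug of this kind is typically confirmed by comparing the same in-place op across backends. The sketch below is illustrative, not the exact reproducer from the referenced thread; it assumes a machine where torch.backends.mps.is_available() is true, and newer PyTorch builds may already behave correctly.

```python
# Illustrative check for silent addcmul_ divergence on a non-contiguous output tensor.
import torch

def addcmul_on(device: str) -> torch.Tensor:
    torch.manual_seed(0)
    base = torch.randn(8, 8)
    t1 = torch.randn(8, 8)
    t2 = torch.randn(8, 8)
    # .t() makes the output tensor a non-contiguous view, the condition flagged in the report.
    out = base.to(device).t()
    out.addcmul_(t1.to(device).t(), t2.to(device).t(), value=0.5)
    return out.cpu()

if torch.backends.mps.is_available():
    cpu_result = addcmul_on("cpu")
    mps_result = addcmul_on("mps")
    print("max abs diff:", (cpu_result - mps_result).abs().max().item())
    # If the backend bug is present, the values differ silently and this assertion fails.
    assert torch.allclose(cpu_result, mps_result, atol=1e-5), "MPS result diverges from CPU"
```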

2025-10-24 15:35 | Karpathy Unveils SpellingBee for nanochat d32: Step-by-Step SFT/RL Finetuning Guide to Add Letter-Counting Capability and Its AI-Token Implications

According to @karpathy, he released a full guide showing how a new synthetic task called SpellingBee teaches nanochat d32 to count letters in words like strawberry by generating user-assistant training pairs and mixing them in during midtraining or SFT finetuning, with optional RL to improve robustness, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164. The method stresses diverse user prompts, careful tokenization and whitespace handling, breaking the reasoning into multiple tokens by standardizing the word, spelling it out, and iterating with an explicit counter, and encouraging two solution paths via manual reasoning and Python tool use, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164. Karpathy notes that because nanochat d32 is small, the capability is encouraged by over-representing examples in the dataset, and reliability can be further improved by simulating mistakes in data or running RL, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164. For traders, open-source progress on small LLM tooling has coincided with episodic attention flows to AI-linked crypto assets such as RNDR, FET, and AGIX around major AI catalysts, with Kaiko reporting AI token rallies around Nvidia earnings in 2024, source: Kaiko Research 2024 weekly market reports; Nvidia 2024 earnings releases. No token or product launch is included here; this is a technical training guide and example set for capability injection into a small LLM, source: Karpathy X post dated Oct 24, 2025; GitHub nanochat discussion 164.
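
In the spirit of the recipe described (standardize the word, spell it out, keep an explicit running counter, vary the prompt phrasing), a synthetic example generator might look like the following rough sketch; this is not the actual SpellingBee code from the nanochat discussion.

```python
import random

WORDS = ["strawberry", "committee", "banana", "parallelism"]
TEMPLATES = [
    "How many times does the letter '{L}' appear in '{W}'?",
    "Count the '{L}' letters in the word {W}.",
    'In "{W}", how many \'{L}\'s are there?',
]

def make_example() -> dict:
    """Build one user/assistant pair with step-by-step counting in the assistant turn."""
    word = random.choice(WORDS)
    letter = random.choice(sorted(set(word)))
    user = random.choice(TEMPLATES).format(L=letter, W=word)

    # Assistant reasoning: normalize the word, spell it out, keep a running counter.
    steps = [f'The word is "{word}". I will spell it out and count \'{letter}\'.']
    count = 0
    for ch in word:
        count += ch == letter
        steps.append(f"{ch} -> count {count}" if ch == letter else f"{ch} -> skip")
    steps.append(f'The letter \'{letter}\' appears {count} time(s) in "{word}".')
    return {"user": user, "assistant": "\n".join(steps)}

print(make_example())
```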

2025-10-21 15:59 | Andrej Karpathy Unveils nanochat d32: $800 Synthetic-Data Custom LLM Identity and Script Release, Key Signals for AI Agent Builders

According to @karpathy, nanochat now carries a defined identity and can state its capabilities, including that it is nanochat d32, that it was built by him at a reported cost of $800, and that it is weaker in non-English languages, with this identity instilled via synthetic data generation, source: x.com/karpathy/status/1980508380860150038. He released an example script that demonstrates generating diverse synthetic conversations and mixing them into mid-training or SFT, stressing the importance of entropy to avoid repetitive datasets, source: x.com/karpathy/status/1980508380860150038. He adds that base LLMs lack inherent personality or self-knowledge and require explicitly bolted-on traits via curated synthetic data, source: x.com/karpathy/status/1980508380860150038. For traders, the disclosed $800 customization benchmark and open-source workflow provide concrete cost and process reference points for evaluating open-source AI agent development and adoption paths across AI-linked tokens and AI-exposed equities, source: twitter.com/karpathy/status/1980665134415802554.
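
A hypothetical sketch of the identity-injection idea (not the released example script): many differently phrased user turns mapped to consistent identity facts, then blended into the SFT mixture. The function names and the 2% mixing ratio are illustrative assumptions.

```python
import random

# Consistent identity facts the model should learn to state (illustrative wording).
FACTS = ("I am nanochat d32, a small chat model trained by Andrej Karpathy for roughly $800. "
         "My English is stronger than my other languages.")

# Entropy matters: vary phrasing, casing, and language of the user turn so the
# model does not overfit to a single template.
QUESTION_STYLES = [
    "Who are you?", "who r u??", "Tell me about yourself.",
    "What model am I talking to?", "Qui es-tu ?", "Introduce yourself in one sentence.",
]

def identity_examples(n: int) -> list[dict]:
    return [{"user": random.choice(QUESTION_STYLES), "assistant": FACTS} for _ in range(n)]

def mix(sft_data: list[dict], identity: list[dict], ratio: float = 0.02) -> list[dict]:
    """Blend a small fraction of identity examples into the main SFT mixture."""
    k = int(len(sft_data) * ratio)
    mixture = sft_data + random.sample(identity, min(k, len(identity)))
    random.shuffle(mixture)
    return mixture
```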

2025-10-20 22:13 | Andrej Karpathy: DeepSeek-OCR Signals 4 Reasons Pixels May Beat Text Tokens for LLM Inputs — Efficiency, Shorter Context Windows, Bidirectional Attention, No Tokenizer

According to Andrej Karpathy, the DeepSeek-OCR paper is a strong OCR model and more importantly highlights why pixels might be superior to text tokens as inputs to large language models, emphasizing model efficiency and input fidelity, source: Andrej Karpathy on X, Oct 20, 2025. He states that rendering text to images and feeding pixels can deliver greater information compression, enabling shorter context windows and higher efficiency, source: Andrej Karpathy on X, Oct 20, 2025. He adds that pixel inputs provide a more general information stream that preserves formatting such as bold and color and allows arbitrary images alongside text, source: Andrej Karpathy on X, Oct 20, 2025. He argues that image inputs enable bidirectional attention by default instead of autoregressive attention at the input stage, which he characterizes as more powerful for processing, source: Andrej Karpathy on X, Oct 20, 2025. He advocates removing the tokenizer at input due to the complexity and risks of Unicode and byte encodings, including security or jailbreak issues such as continuation bytes and semantic mismatches for emojis, source: Andrej Karpathy on X, Oct 20, 2025. He frames OCR as one of many vision-to-text tasks and suggests many text-to-text tasks can be reframed as vision-to-text, while the reverse is not generally true, source: Andrej Karpathy on X, Oct 20, 2025. He proposes a practical setup where user messages are images while the assistant response remains text and notes outputting pixels is less obvious, and he mentions an urge to build an image-input-only version of nanochat while referencing the vLLM project, source: Andrej Karpathy on X, Oct 20, 2025.
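
The "render text, feed pixels" input path can be illustrated in a few lines of PIL: the text is rasterized onto a fixed-size grayscale canvas that a vision encoder would consume in place of a token sequence. This is an illustrative sketch, not DeepSeek-OCR's pipeline; the canvas size, default font, and 16x16 patching are arbitrary choices.

```python
from PIL import Image, ImageDraw, ImageFont
import numpy as np

def render_text_to_pixels(text: str, width: int = 512, height: int = 256) -> np.ndarray:
    """Rasterize text into a grayscale array, the 'pixels instead of tokens' input form."""
    img = Image.new("L", (width, height), color=255)   # white canvas
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()                    # formatting like bold or color would survive here too
    # Naive line wrapping; a real pipeline would typeset the document faithfully.
    words, lines, line = text.split(), [], ""
    for w in words:
        if len(line) + len(w) + 1 > 60:
            lines.append(line)
            line = w
        else:
            line = f"{line} {w}".strip()
    lines.append(line)
    for i, ln in enumerate(lines[: height // 12]):
        draw.text((4, 4 + 12 * i), ln, fill=0, font=font)
    return np.asarray(img, dtype=np.float32) / 255.0   # normalized pixels for a vision encoder

if __name__ == "__main__":
    pixels = render_text_to_pixels("According to @karpathy, pixels may be a better LLM input than tokens.")
    print(pixels.shape, "->", (pixels.shape[0] // 16) * (pixels.shape[1] // 16), "16x16 patches")
```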

2025-10-20 18:58 | Karpathy on Text Diffusion for LLMs (2025): Bidirectional Attention Raises Training Cost vs Autoregression

According to @karpathy, text diffusion for language can be implemented with a vanilla transformer using bidirectional attention that iteratively re-masks and re-samples all tokens on a noise schedule. Source: @karpathy. He states that diffusion is the pervasive generative paradigm in image and video, autoregression remains dominant in text, and audio shows a mix of both. Source: @karpathy. He adds that removing heavy formalism reveals simple baseline algorithms, with discrete diffusion closer to flow matching in continuous settings. Source: @karpathy. He explains that autoregression appends tokens while attending backward, whereas diffusion refreshes the entire token canvas while attending bidirectionally. Source: @karpathy. He notes bidirectional attention yields stronger language models but makes training more expensive because training cannot be parallelized across the sequence dimension. Source: @karpathy. He suggests it may be possible to interpolate or generalize between diffusion and autoregression in the LLM stack. Source: @karpathy. For traders, the actionable takeaway is the compute cost trade-off of bidirectional text diffusion versus autoregression, which directly affects training efficiency assumptions. Source: @karpathy.
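
A bare-bones version of the sampling loop described (start from a fully masked canvas, repeatedly re-sample every position with a bidirectional model, and unmask progressively on a schedule) might look like this. It is illustrative PyTorch, with `model` assumed to be any bidirectional transformer returning per-position logits; the confidence-based unmasking rule is one common choice, not necessarily the one in Karpathy's post.

```python
import torch

def diffusion_sample(model, seq_len: int, mask_id: int, steps: int = 16) -> torch.Tensor:
    """Iterative masked re-sampling over a full token canvas (illustrative sketch)."""
    canvas = torch.full((1, seq_len), mask_id, dtype=torch.long)        # start fully masked
    for step in range(steps):
        logits = model(canvas)                                          # bidirectional: every position sees every other
        probs = torch.softmax(logits, dim=-1)                           # (1, seq_len, vocab)
        sampled = torch.distributions.Categorical(probs).sample()       # re-sample all positions at once
        keep_frac = (step + 1) / steps                                  # noise schedule: unmask more each step
        confidence = probs.max(dim=-1).values
        keep = confidence >= torch.quantile(confidence, 1 - keep_frac)  # keep the most confident tokens
        canvas = torch.where(keep, sampled, torch.full_like(sampled, mask_id))
    return canvas
```

By contrast, an autoregressive sampler appends one token at a time under causal attention, which is what allows its training loss to be computed for all positions of a sequence in parallel; the bidirectional loop above gives up that parallelism, which is the cost trade-off the post highlights.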

2025-10-18 20:23 | Karpathy’s Decade of Agents: 10-Year AGI Timeline, RL Skepticism, and Security-First LLM Tools for Crypto Builders and Traders

According to @karpathy, AGI is on roughly a 10-year horizon he describes as a decade of agents, citing major remaining work in integration, real-world sensors and actuators, societal alignment, and security, and noting his timeline is 5-10x more conservative than prevailing hype, source: @karpathy on X, Oct 18, 2025. He is long agentic interaction but skeptical of reinforcement learning due to poor signal-to-compute efficiency and noise, and he highlights alternative learning paradigms such as system prompt learning with early deployed examples like ChatGPT memory, source: @karpathy on X, Oct 18, 2025. He urges collaborative, verifiable LLM tooling over fully autonomous code-writing agents and warns that overshooting capability can accumulate slop and increase vulnerabilities and security breaches, source: @karpathy on X, Oct 18, 2025. He advocates building a cognitive core by reducing memorization to improve generalization and expects models to get larger before they can get smaller, source: @karpathy on X, Oct 18, 2025. He also contrasts LLMs as ghost-like entities prepackaged via next-token prediction with animals prewired by evolution, and suggests making models more animal-like over time, source: @karpathy on X, Oct 18, 2025. For crypto builders and traders, this points to prioritizing human-in-the-loop agent workflows, code verification, memory-enabled tooling, and security-first integrations over promises of fully autonomous AGI, especially where software defects and vulnerabilities carry on-chain risk, source: @karpathy on X, Oct 18, 2025.

2025-10-16 00:14 | Karpathy Unveils $1,000 nanochat d32: 33-Hour Train, CORE 0.31, GSM8K 20% — Watch AI Compute Tokens RNDR, AKT, TAO

According to @karpathy, the depth-32 nanochat d32 trained for about 33 hours at roughly $1,000 and showed consistent metric gains across pretraining, SFT, and RL (Source: Karpathy on X; Karpathy GitHub nanochat discussion). He reports a CORE score of 0.31 versus GPT-2 at about 0.26 and a GSM8K improvement from around 8% to about 20%, indicating a notable uplift for a micro model (Source: Karpathy on X; Karpathy GitHub nanochat discussion). He cautions that nanochat costs $100–$1,000 to train and the $100 version is about 1/1000th the size of GPT-3, leading to frequent hallucinations and limited reliability compared to frontier LLMs, so user expectations should remain modest (Source: Karpathy on X). He adds that scripts including run1000.sh are available in the repo, he is temporarily hosting the model for testing, and he plans throughput tuning before possibly scaling to a larger tier (Source: Karpathy on X; Karpathy GitHub repository). For traders, decentralized GPU networks that market AI workload support such as Render (RNDR), Akash (AKT), and Bittensor (TAO) remain key watchlist names as open-source, low-cost training expands developer experimentation (Source: Render Network documentation; Akash Network documentation; Bittensor documentation).

2025-10-13 15:16 | Andrej Karpathy Releases nanochat: Train a ChatGPT-Style LLM in 4 Hours for about $100 on 8x H100, Setting Clear GPU Cost Benchmarks for Traders

According to @karpathy, nanochat is a minimal from-scratch full-stack pipeline that lets users train and serve a simple ChatGPT-like LLM via a single script on a cloud GPU and converse with it in a web UI in about 4 hours, enabling an end-to-end training and inference workflow. source: @karpathy. He specifies the codebase has about 8,000 lines and includes tokenizer training in Rust, pretraining on FineWeb with CORE evaluation, midtraining on SmolTalk and multiple-choice data with tool use, supervised fine-tuning, optional RL on GSM8K via GRPO, and an inference engine with KV cache, Python tool use, CLI, a ChatGPT-like web UI, plus an auto report card. source: @karpathy. Disclosed cost and timing benchmarks are about $100 for roughly 4 hours on an 8x H100 node and about $1000 for about 41.6 hours, with a 24-hour depth-30 run reaching MMLU in the 40s, ARC-Easy in the 70s, and GSM8K in the 20s. source: @karpathy. From these figures, the implied compute rate is roughly $3.1 per H100-hour (about $100 across 32 H100-hours) and about $3.0 per H100-hour at the longer run (about $1000 across 332.8 H100-hours), providing concrete GPU-hour cost benchmarks for trading models of AI training spend. source: @karpathy. He also notes that around 12 hours surpasses GPT-2 on the CORE metric and that capability improves with more training, positioning nanochat as a transparent strong-baseline stack and the capstone for LLM101n with potential as a research harness. source: @karpathy. For crypto market participants tracking AI infrastructure, these cost-performance disclosures offer reference points to assess demand for centralized cloud and decentralized GPU compute tied to open-source LLM training workflows. source: @karpathy.
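
The implied GPU-hour rates quoted above follow directly from the disclosed run sizes; spelled out:

```python
# Implied H100-hour rates from the disclosed nanochat runs (arithmetic only).
speedrun_cost, speedrun_hours, gpus = 100, 4, 8     # ~$100, ~4 h on an 8x H100 node
long_cost, long_hours = 1000, 41.6                  # ~$1000, ~41.6 h on the same node

speedrun_gpu_hours = speedrun_hours * gpus          # 32 H100-hours
long_gpu_hours = long_hours * gpus                  # 332.8 H100-hours

print(speedrun_cost / speedrun_gpu_hours)           # ~3.125 $/H100-hour
print(long_cost / long_gpu_hours)                   # ~3.005 $/H100-hour
```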

2025-10-09 00:10 | Andrej Karpathy flags RLHF flaw: LLMs fear exceptions and calls for reward redesign in RL training

According to Andrej Karpathy, current reinforcement learning practices make LLMs mortally terrified of exceptions, and he argues exceptions are a normal part of a healthy development process, as stated on Twitter on Oct 9, 2025. Karpathy urged the community to sign his LLM welfare petition to improve rewards in cases of exceptions, as stated on Twitter on Oct 9, 2025. The post includes no references to cryptocurrencies, tokens, or market data, indicating no direct market update from the source, as stated on Twitter on Oct 9, 2025.

2025-10-03 13:37 | Karpathy: LLM Agent Coding Not Ready for Half of Professional Work Despite ~50% ‘Mostly Agent’ Poll Signal

According to Andrej Karpathy, an X poll he referenced showed roughly half of respondents reporting they mostly use agent-mode coding, contrary to his expectation of 50 percent tab-complete, 30 percent manual, 20 percent agent, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111; poll link https://x.com/karpathy/status/1973892769359056997. He states his own workflow is primarily tab completion and he turns it off when not useful, using agents mainly for boilerplate or unfamiliar stacks with substantial review and edits, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. He warns that when tasks are deep, tangled, or off the data manifold, LLMs produce bloated code with subtle bugs, concluding agent mode is not ready to write about half of professional code, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. He asked for a serious organization to rerun the poll, underscoring uncertainty around actual adoption rates, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111. There was no mention of cryptocurrencies or blockchain in his comments, source: Andrej Karpathy on X, Oct 3, 2025, https://x.com/karpathy/status/1974106507034964111.

2025-10-01 19:22 | Andrej Karpathy: Tinker Cuts LLM Post-Training Complexity to Under 10% and Keeps 90% Algorithmic Control for Faster Finetuning

According to @karpathy, Tinker allows researchers and developers to retain roughly 90% of algorithmic creative control over data, loss functions, and training algorithms while offloading infrastructure, forward and backward passes, and distributed training to the framework. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, Tinker reduces the typical complexity of LLM post-training to well below 10%, positioning it as a lower-friction alternative to common “upload your data, we’ll train your LLM” services. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, this “slice” of the post-training workflow both delegates heavy lifting and preserves majority control of data and algorithmic choices, which he views as a more effective trade-off for practitioners. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, finetuning is less about stylistic changes and more about narrowing task scope, where fine-tuned smaller LLMs can outperform and run faster than large models prompted with giant few-shot prompts when ample training examples exist. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630. According to @karpathy, production LLM applications are increasingly DAG-based pipelines where some steps remain prompt-driven while many components work better as fine-tuned models, and Tinker makes these finetunes trivial for rapid experimentation. Source: @karpathy on X, Oct 1, 2025, https://twitter.com/karpathy/status/1973468610917179630; supporting reference: Thinking Machines post, https://x.com/thinkymachines/status/1973447428977336578.
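
The division of labor described (the caller keeps data, loss, and the training loop; the service executes forward/backward passes and distributed training) can be pictured with a toy client like the one below. The interface names (TrainingClient, forward_backward, optim_step) are illustrative assumptions for this sketch, not Tinker's documented API.

```python
# Hypothetical post-training client illustrating the split described in the post:
# the caller owns data selection, the loss, and the outer loop; the service owns
# forward/backward execution and distributed training. Not Tinker's actual API.
class TrainingClient:
    def __init__(self, base_model: str):
        self.base_model = base_model

    def forward_backward(self, batch, loss_fn) -> float:
        # A real service would run the model remotely and accumulate gradients;
        # here it only shows that the caller's loss_fn defines the objective.
        fake_logits, fake_targets = batch, batch
        return loss_fn(fake_logits, fake_targets)

    def optim_step(self, lr: float) -> None:
        pass  # distributed optimizer update happens service-side

def finetune(client: TrainingClient, dataset, epochs: int = 1, lr: float = 1e-5) -> None:
    def loss_fn(logits, targets):        # user-defined: cross-entropy, DPO, an RL objective, ...
        return 0.0
    for _ in range(epochs):
        for batch in dataset:            # user controls data mixing, order, and curriculum
            client.forward_backward(batch, loss_fn)
            client.optim_step(lr)
```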