TikZ AI News List

2026-04-23 19:09

GPT-5.5 Nears TikZ Unicorn Benchmark: Latest Analysis on Multimodal Reasoning and Code Generation

According to Sam Altman on X, citing a post by Sebastien Bubeck, GPT-5.5 is very close to fully passing the community “TikZ unicorn” test, an informal benchmark in which a model must write LaTeX TikZ code that renders a unicorn, stressing visual-spatial reasoning and code synthesis. As reported by Bubeck on X, the model produced runnable TikZ code for the figure, so the result can be compiled and checked independently. According to the original X posts, this progress suggests improved multimodal alignment and geometry-aware planning that could accelerate enterprise use cases in technical documentation, automated plotting, scientific publishing workflows, and CAD-adjacent diagram generation. GPT-5.5 has not fully saturated the benchmark, but, per the same sources, the near-pass points to practical gains for developer tooling, LaTeX automation, and data visualization assistants where reproducible vector graphics matter.
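
To make "runnable TikZ code" and "independent verification" concrete, the following is a minimal sketch of how a reader might check that a model-generated TikZ snippet compiles. It is not taken from the cited posts: the helper names `wrap_tikz` and `compiles`, the `SAMPLE` snippet (a placeholder shape, not a unicorn), and the choice of `pdflatex` with the `standalone` document class are all assumptions made for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Placeholder figure for illustration only; this is NOT model output.
SAMPLE = r"""
\begin{tikzpicture}
  \draw (0,0) circle (1);
  \draw (0.6,0.8) -- (1.2,2.0);
\end{tikzpicture}
"""


def wrap_tikz(snippet: str) -> str:
    """Wrap a bare tikzpicture in a minimal standalone LaTeX document."""
    return (
        "\\documentclass[tikz,border=2pt]{standalone}\n"
        "\\begin{document}\n"
        f"{snippet.strip()}\n"
        "\\end{document}\n"
    )


def compiles(snippet: str, timeout: int = 60) -> Optional[bool]:
    """Return True or False if pdflatex ran, or None if pdflatex is missing."""
    if shutil.which("pdflatex") is None:
        return None  # cannot verify on this machine
    with tempfile.TemporaryDirectory() as tmp:
        tex = Path(tmp) / "figure.tex"
        tex.write_text(wrap_tikz(snippet), encoding="utf-8")
        try:
            result = subprocess.run(
                ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex.name],
                cwd=tmp,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        # Success means a zero exit code and a PDF actually written.
        return result.returncode == 0 and (Path(tmp) / "figure.pdf").exists()


if __name__ == "__main__":
    print(compiles(SAMPLE))
```

A successful compile only shows that the code runs; whether the rendered figure actually resembles a unicorn still has to be judged by eye.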

2026-04-21 02:10

Kimi 2.6 Thinking Analysis: Open-Weights Reasoning, 74-Page Trace, and Coding Demos vs Closed-Source SoTA

According to Ethan Mollick on X, Kimi 2.6 Thinking shows strong reasoning for an open-weights model but still trails the closed-source state of the art: it produced a 74-page thinking trace on the Lem Test yet only an adequate final answer, along with an acceptable TikZ unicorn and a serviceable twigl shader for a neo-gothic city in waves. As reported by Mollick, the traceability and reproducibility of Kimi’s chain-of-thought style output may aid enterprise auditability, while the gap in final-answer quality indicates that teams should benchmark Kimi 2.6 Thinking against closed models before relying on it for mission-critical reasoning and code synthesis. The graphics outputs imply practical utility for technical graphics prototyping, but with rougher polish and accuracy than premium closed models.

2026-04-16 20:47

Claude Opus 4.7 Shows Breakthrough TikZ Drawing Skills: Best ‘Sparks of AGI’ Unicorn Yet

According to Ethan Mollick on X (Apr 16, 2026), Anthropic’s Claude Opus 4.7 generates the strongest TikZ-based “Sparks unicorn” to date, outperforming prior attempts even without deliberate chain-of-thought and performing exceptionally when it does reason. As reported by Mollick, the unicorn is rendered in TikZ, a LaTeX diagram language not intended for free-form artwork, mirroring the original Sparks of AGI evaluation, where a model’s ability to draw a primitive unicorn signaled emergent capabilities (Microsoft Research, “Sparks of Artificial General Intelligence,” 2023). According to Microsoft Research, the unicorn task probes compositional reasoning and programmatic graphics generation, which are relevant to enterprise automation of technical documentation, scientific figures, and reproducible visualization workflows in LaTeX. For businesses, improved TikZ code synthesis suggests near-term productivity gains in scientific publishing, data-heavy reports, and developer tooling, where LLMs convert natural language into maintainable vector-graphic code, reducing designer handoff time and enabling version-controlled diagrams at scale.

2026-04-09 00:51

Gemini 3.1 Recreates ‘Sparks’ Unicorn in TikZ: Latest Analysis on Multimodal Reasoning Capabilities

According to Ethan Mollick on X, Google’s Gemini 3.1 generated a recognizable unicorn in TikZ, a scientific diagramming language not optimized for illustration, echoing the original “Sparks of AGI” benchmark, where a primitive unicorn drawing signaled unexpected abilities. According to Mollick, the successful rendering highlights how Gemini 3.1 coordinates code synthesis with visual reasoning, which is key for enterprise use cases such as programmatic graphics, LaTeX automation, and data visualization workflows. As reported by Mollick, reproducing this historical benchmark suggests improved instruction following, tool use, and compositional generalization, creating business opportunities in document automation, technical publishing, and CAD-adjacent graphics, where deterministic text-to-diagram generation is valuable.

2026-03-12 01:47

Hunter Alpha on OpenRouter: Early Performance Analysis with Lem Test and TikZ Benchmarks

According to Ethan Mollick on X, the new Hunter Alpha model on OpenRouter shows only average early performance, with its Lem Test and Sparks TikZ unicorn outputs showing mixed reasoning and code-generation quality. As reported by Mollick, these ad hoc benchmarks suggest Hunter Alpha lags top-tier frontier models in structured reasoning and precise TikZ rendering, which may limit enterprise adoption for high-stakes tasks. According to OpenRouter’s model marketplace listings, rapid iteration and community evaluation can inform fine-tuning priorities for reasoning, tool use, and reproducible diagram generation, which could let developers position Hunter Alpha for education tooling, lightweight document automation, and diagram prototyping if reliability improves.
