Cuda News

Cuda

ParallelKernelBench Exposes LLM Weakness in Multi-GPU Kernels

ParallelKernelBench shows GPT-5.5 and peers struggle with multi-GPU CUDA kernels, solving fewer than 31% of tasks. Here's why it matters.

by Rebeca Moen
Jun 24, 2026

NVIDIA Introduces CCCL Runtime to Modernize CUDA Development

NVIDIA's CCCL Runtime brings modern C++ abstractions to CUDA, enabling safer, more efficient GPU programming for developers.

by Luisa Crawford
Jun 23, 2026

NVIDIA's New MoE Kernels Promise 93% Speedup for AI Training

NVIDIA unveils advanced MoE training kernels, boosting AI model throughput by up to 93% in GPT pre-training and redefining large-scale efficiency.

by Rongchai Wang
Jun 16, 2026

NVIDIA CUDA 13.3 Brings Tile Programming to C++

NVIDIA CUDA 13.3 introduces tile-based GPU programming in C++, optimizing Tensor Core use and simplifying kernel development.

by Luisa Crawford
May 27, 2026

NVIDIA Unveils CompileIQ to Maximize GPU Kernel Performance

NVIDIA's AI-powered CompileIQ optimizes GPU kernel performance using evolutionary algorithms, enabling up to 15% gains in critical AI workloads.

by Peter Zhang
May 27, 2026

NVIDIA CUDA 13.3 Boosts GPU Programming with Tile C++ and Python

NVIDIA CUDA 13.3 introduces Tile C++ programming, Python updates, and CompileIQ, delivering up to 15% kernel speedups and enhancing GPU development.

by James Ding
May 27, 2026

NVIDIA Nsight Tools Slash Vision AI Decode Times by 85% in New VC-6 Batch Mode

NVIDIA's optimized VC-6 batch mode achieves submillisecond 4K image decoding, delivering up to 85% faster per-image processing for AI training pipelines.

by Felix Pinkston
Apr 03, 2026

NVIDIA Releases CUDA Tile for BASIC in April Fools' Joke With Real Tech

NVIDIA's cuTile BASIC announcement showcases CUDA Tile's language-agnostic design while poking fun at legacy code. The underlying tech is genuinely significant.

by Iris Coleman
Apr 02, 2026

NVIDIA CUDA 13.2 Update: Latest CUDA News Today (Ampere & Ada GPUs)

CUDA 13.2 extends tile-based GPU programming to older architectures, adds Python profiling tools, and delivers up to 5x speedups with new Top-K algorithms.

by Iris Coleman
Mar 30, 2026

CUDA News Today: NVIDIA Brings CUDA to Third-Party Platforms

NVIDIA now allows developers to access CUDA via third-party platforms, simplifying software deployment and integration across operating systems and package managers.

by Terrill Dicki
Mar 30, 2026

NVIDIA CCCL 3.1 Adds Floating-Point Determinism Controls for GPU Computing

NVIDIA's CCCL 3.1 introduces three determinism levels for parallel reductions, letting developers trade performance for reproducibility in GPU computations.

by Caroline Bishop
Mar 06, 2026

NVIDIA Brings CUDA Tile Programming to Julia with cuTile.jl Release

NVIDIA releases cuTile.jl, enabling Julia developers to write high-performance GPU kernels using tile-based programming, with performance near parity with Python.

by James Ding
Mar 04, 2026

NVIDIA cuda.compute Brings C++ GPU Performance to Python Developers

NVIDIA's new cuda.compute library topped GPU MODE benchmarks, delivering CUDA C++ performance through pure Python with 2-4x speedups over custom kernels.

by Tony Kim
Feb 19, 2026

NVIDIA Integrates CUDA Tile Backend for OpenAI Triton GPU Programming

NVIDIA's new CUDA Tile IR backend for OpenAI Triton enables Python developers to access Tensor Core performance without CUDA expertise. Requires Blackwell GPUs.

by Alvin Lang
Jan 31, 2026

NVIDIA CUDA 13.1 Drops CUB Boilerplate with New Single-Call API

NVIDIA simplifies GPU development with CUB single-call API in CUDA 13.1, eliminating repetitive two-phase memory allocation code without performance loss.

by Felix Pinkston
Jan 22, 2026