CUDA News

CUDA

NVIDIA CUDA 13.1 Drops CUB Boilerplate with New Single-Call API

NVIDIA simplifies GPU development with a CUB single-call API in CUDA 13.1, eliminating the repetitive two-phase temporary-storage allocation code without performance loss.

by Felix Pinkston
Jan 22, 2026
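
For readers unfamiliar with the two-phase pattern the teaser refers to: CUB's classic device-wide algorithms are called twice, first with a null temporary-storage pointer to query how many scratch bytes are needed, then again with the allocated buffer to do the work. The sketch below only models that calling convention in plain Python. The names `reduce_sum` and `reduce_sum_single_call` and the scratch-size formula are hypothetical stand-ins, not CUB's API and not the CUDA 13.1 interface.

```python
def reduce_sum(temp_storage, values):
    """Toy stand-in for a two-phase call (hypothetical, not CUB's API).

    Phase 1: pass temp_storage=None to learn the required scratch size.
    Phase 2: pass a buffer of at least that size to get the result.
    """
    # Made-up scratch requirement, purely for illustration.
    required_bytes = 8 * max(1, len(values) // 256)
    if temp_storage is None:
        return required_bytes, None
    if len(temp_storage) < required_bytes:
        raise ValueError("temporary storage too small")
    return required_bytes, sum(values)


def reduce_sum_single_call(values):
    """Wrap both phases so the caller writes one call (illustrative only)."""
    required_bytes, _ = reduce_sum(None, values)      # phase 1: size query
    scratch = bytearray(required_bytes)               # allocate scratch
    _, total = reduce_sum(scratch, values)            # phase 2: real work
    return total


if __name__ == "__main__":
    print(reduce_sum_single_call(list(range(1000))))
```

The wrapper shows why a single-call entry point removes boilerplate: the size query and allocation still happen, but the caller no longer repeats them at every call site.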

NVIDIA cuTile Python Guide Shows 90% cuBLAS Performance for Matrix Ops

NVIDIA releases detailed cuTile Python tutorial for Blackwell GPUs, demonstrating matrix multiplication achieving over 90% of cuBLAS performance with simplified code.

by Timothy Morano
Jan 15, 2026

NVIDIA Enhances cuML Accessibility by Reducing CUDA Binary Size for PyPI Distribution

NVIDIA introduces pip-installable cuML wheels on PyPI, simplifying installation and broadening accessibility by reducing CUDA binary sizes.

by Timothy Morano
Dec 16, 2025

NVIDIA Enhances Memory Safety with Compile-Time Instrumentation for Compute Sanitizer

NVIDIA's latest update to Compute Sanitizer introduces compile-time instrumentation to improve memory safety in CUDA C++ applications, reducing false negatives and enhancing bug detection.

by Ted Hisokawa
Dec 11, 2025

NVIDIA's ComputeEval 2025.2 Challenges LLMs with Advanced CUDA Tasks

NVIDIA expands ComputeEval with 232 new CUDA challenges, testing LLMs' capabilities in complex programming tasks. Discover the impact on AI-assisted coding.

by Peter Zhang
Nov 07, 2025

Enhancing GPU Efficiency: Understanding Global Memory Access in CUDA

Explore how efficient global memory access in CUDA can unlock GPU performance. Learn about coalesced memory patterns, profiling techniques, and best practices for optimizing CUDA kernels.

by Alvin Lang
Sep 30, 2025
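
As a rough illustration of why coalesced patterns matter, the toy model below counts how many memory sectors one warp touches for a given access stride. It assumes 32 threads per warp, 4-byte elements, and 32-byte sectors; the function names are made up for this sketch, and real hardware behavior (caching, alignment, architecture differences) is not modeled.

```python
WARP_SIZE = 32      # threads per warp (assumed)
SECTOR_BYTES = 32   # memory sector granularity (assumed)


def sectors_touched(stride_elems, elem_bytes=4, base_addr=0):
    """Count distinct sectors hit when thread t reads element t * stride."""
    addrs = [base_addr + t * stride_elems * elem_bytes for t in range(WARP_SIZE)]
    return len({addr // SECTOR_BYTES for addr in addrs})


def load_efficiency(stride_elems, elem_bytes=4):
    """Fraction of fetched bytes that the warp actually uses."""
    useful = WARP_SIZE * elem_bytes
    fetched = sectors_touched(stride_elems, elem_bytes) * SECTOR_BYTES
    return useful / fetched


if __name__ == "__main__":
    for stride in (1, 2, 8, 32):
        print(stride, sectors_touched(stride), load_efficiency(stride))
```

Under these assumptions a unit-stride access uses every fetched byte, while a stride of 8 or more elements fetches a full sector per thread and uses only one eighth of it.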

NVIDIA Enhances Vision AI with CUDA-Accelerated VC-6

NVIDIA introduces CUDA-accelerated VC-6 to optimize vision AI pipelines, leveraging GPU parallelism for high-performance data processing, reducing I/O bottlenecks, and enhancing AI application efficiency.

by Rongchai Wang
Sep 11, 2025

Enhancing CUDA Kernel Performance with Shared Memory Register Spilling

Discover how CUDA 13.0 optimizes kernel performance by using shared memory for register spilling, reducing latency and improving efficiency in GPU computations.

by Darius Baruo
Aug 28, 2025

NVIDIA Introduces Wheel Variants to Simplify CUDA-Accelerated Python Package Deployment

NVIDIA launches Wheel Variants to streamline CUDA-accelerated Python package installation, addressing compatibility challenges and optimizing user experience across diverse hardware setups.

by Timothy Morano
Aug 14, 2025

Enhancing CUDA Performance: The Role of Vectorized Memory Access

Explore how vectorized memory access in CUDA C/C++ can significantly improve bandwidth utilization and reduce instruction count, according to NVIDIA's latest insights.

by Felix Pinkston
Aug 05, 2025
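
The instruction-count claim can be sketched with simple arithmetic: loading elements two or four at a time (as with `float2` or `float4` in CUDA C/C++) needs fewer load instructions than loading them one by one. The helper below is a hypothetical back-of-the-envelope model, assuming leftover elements fall back to scalar loads; it says nothing about measured bandwidth.

```python
def load_instructions(n_elems, vector_width):
    """Estimate load instructions for n_elems using vector_width-wide loads.

    Leftover elements that do not fill a vector are assumed to use
    one scalar load each.
    """
    if vector_width not in (1, 2, 4):
        raise ValueError("vector_width must be 1, 2, or 4")
    vector_loads, tail = divmod(n_elems, vector_width)
    return vector_loads + tail


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, load_instructions(1026, width))
```

In this model, 1,024 elements need 1,024 scalar loads but only 256 four-wide loads, which is the kind of reduction the teaser describes.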

NVIDIA's CUTLASS 4.0: Advancing GPU Performance with New Python Interface

NVIDIA unveils CUTLASS 4.0, introducing a Python interface to enhance GPU performance for deep learning and high-performance computing, utilizing CUDA Tensors and Spatial Microkernels.

by Ted Hisokawa
Jul 18, 2025

NVIDIA Expands Python Capabilities with CUDA Kernel Fusion Tools

NVIDIA introduces cuda.cccl, bridging the gap for Python developers by providing essential building blocks for CUDA kernel fusion, enhancing performance across GPU architectures.

by Tony Kim
Jul 10, 2025

Exploring Handwritten PTX Code for GPU Optimization in CUDA

Delve into the potential of handwritten PTX code for enhancing GPU performance in CUDA applications, as outlined by NVIDIA experts.

by Luisa Crawford
Jul 03, 2025

Enhancing CUDA Development: Compiler Explorer Unveiled

Compiler Explorer is revolutionizing CUDA development by offering a seamless web-based platform for writing, compiling, and running GPU kernels, fostering collaboration and innovation.

by Timothy Morano
Jun 19, 2025

NVIDIA's cuEmbed Boosts GPU Performance for Embedding Lookups

NVIDIA unveils cuEmbed, a CUDA library that significantly enhances embedding lookups on GPUs, promising improved performance for recommendation systems and other applications.

by Caroline Bishop
May 16, 2025