CUDA
NVIDIA's ComputeEval 2025.2 Challenges LLMs with Advanced CUDA Tasks
NVIDIA expands ComputeEval with 232 new CUDA challenges, testing LLMs' capabilities in complex programming tasks. Discover the impact on AI-assisted coding.
Enhancing GPU Efficiency: Understanding Global Memory Access in CUDA
Explore how efficient global memory access in CUDA can unlock GPU performance. Learn about coalesced memory patterns, profiling techniques, and best practices for optimizing CUDA kernels.
NVIDIA Enhances Vision AI with CUDA-Accelerated VC-6
NVIDIA introduces CUDA-accelerated VC-6 to optimize vision AI pipelines, leveraging GPU parallelism for high-performance data processing, reducing I/O bottlenecks, and enhancing AI application efficiency.
Enhancing CUDA Kernel Performance with Shared Memory Register Spilling
Discover how CUDA 13.0 optimizes kernel performance by using shared memory for register spilling, reducing latency and improving efficiency in GPU computations.
NVIDIA Introduces Wheel Variants to Simplify CUDA-Accelerated Python Package Deployment
NVIDIA launches Wheel Variants to streamline CUDA-accelerated Python package installation, addressing compatibility challenges and optimizing user experience across diverse hardware setups.
Enhancing CUDA Performance: The Role of Vectorized Memory Access
Explore how vectorized memory access in CUDA C/C++ can significantly improve bandwidth utilization and reduce instruction count, according to NVIDIA's latest insights.
NVIDIA's CUTLASS 4.0: Advancing GPU Performance with New Python Interface
NVIDIA unveils CUTLASS 4.0, introducing a Python interface to enhance GPU performance for deep learning and high-performance computing, utilizing CUDA Tensors and Spatial Microkernels.
NVIDIA Expands Python Capabilities with CUDA Kernel Fusion Tools
NVIDIA introduces cuda.cccl, bridging the gap for Python developers by providing essential building blocks for CUDA kernel fusion, enhancing performance across GPU architectures.
Exploring Handwritten PTX Code for GPU Optimization in CUDA
Delve into the potential of handwritten PTX code for enhancing GPU performance in CUDA applications, as outlined by NVIDIA experts.
Enhancing CUDA Development: Compiler Explorer Unveiled
Compiler Explorer is revolutionizing CUDA development by offering a seamless web-based platform for writing, compiling, and running GPU kernels, fostering collaboration and innovation.
NVIDIA's cuEmbed Boosts GPU Performance for Embedding Lookups
NVIDIA unveils cuEmbed, a CUDA library that significantly enhances embedding lookups on GPUs, promising improved performance for recommendation systems and other applications.
Decoding PTX: The Core of NVIDIA CUDA GPU Computing
Explore PTX, the assembly language for NVIDIA CUDA GPUs, its role in enabling forward compatibility, and its significance in the GPU computing landscape.
NVIDIA's CUDA Libraries Enhance Cybersecurity with AI-Powered Solutions
NVIDIA's CUDA libraries are revolutionizing cybersecurity by integrating AI, offering enhanced threat detection, real-time response, and scalability to tackle modern cyber threats.
Enhancing Deep Learning with nvmath-python's Matrix Multiplication and Epilog Fusion
Discover how nvmath-python leverages NVIDIA CUDA-X math libraries for high-performance matrix operations, optimizing deep learning tasks with epilog fusion, as detailed by Szymon Karpiński.
Numbast Bridges CUDA C++ and Python Ecosystems
Numbast introduces an automated pipeline to convert CUDA C++ APIs into Numba bindings, enhancing Python developers' access to CUDA's performance.