GPU PROGRAMMING
NVIDIA CUDA 13.3 Brings Tile Programming to C++
NVIDIA CUDA 13.3 introduces tile-based GPU programming in C++, optimizing Tensor Core use and simplifying kernel development.
NVIDIA CUDA 13.3 Boosts GPU Programming with Tile C++ and Python
NVIDIA CUDA 13.3 introduces Tile C++ programming, Python updates, and CompileIQ, delivering up to 15% kernel speedups and enhancing GPU development.
NVIDIA Releases CUDA Tile for BASIC in April Fools' Joke With Real Tech
NVIDIA's cuTile BASIC announcement showcases CUDA Tile's language-agnostic design while poking fun at legacy code. The underlying tech is genuinely significant.
NVIDIA Brings CUDA Tile Programming to Julia with cuTile.jl Release
NVIDIA releases cuTile.jl, enabling Julia developers to write high-performance GPU kernels using tile-based programming, with performance near parity with cuTile Python.
NVIDIA Integrates CUDA Tile Backend for OpenAI Triton GPU Programming
NVIDIA's new CUDA Tile IR backend for OpenAI Triton enables Python developers to access Tensor Core performance without CUDA expertise. Requires Blackwell GPUs.
NVIDIA cuTile Python Guide Shows 90% cuBLAS Performance for Matrix Ops
NVIDIA releases a detailed cuTile Python tutorial for Blackwell GPUs, demonstrating matrix multiplication that achieves over 90% of cuBLAS performance with simplified code.
CUDA Toolkit 13.0 Unveils Advanced Features for Enhanced GPU Programming
NVIDIA's CUDA Toolkit 13.0 introduces tile-based programming and unified Arm platform support, aimed at improving developer productivity and GPU performance.
Enhancing CUDA Efficiency: Key Techniques for Aspiring Developers
NVIDIA experts explain essential techniques for optimizing CUDA performance, tailored to new developers.