NVIDIA
NVIDIA NeMo-RL Utilizes GRPO for Advanced Reinforcement Learning
NVIDIA introduces NeMo-RL, an open-source library for reinforcement learning, enabling scalable training with GRPO and integration with Hugging Face models.
NVIDIA Expands Python Capabilities with CUDA Kernel Fusion Tools
NVIDIA introduces cuda.cccl, bridging the gap for Python developers by providing essential building blocks for CUDA kernel fusion, enhancing performance across GPU architectures.
NVIDIA's Helix Parallelism Revolutionizes AI with Multi-Million Token Inference
NVIDIA introduces Helix Parallelism, a technique that enables faster real-time inference with multi-million-token contexts, improving performance and user experience.
NVIDIA Boosts AI Factories With DPU-Enhanced Kubernetes Service Proxy
NVIDIA advances AI applications with DPU-accelerated service proxies for Kubernetes, which the company says enhance performance, efficiency, and security for AI clouds.
NVIDIA Enhances cuQuantum with Dynamic Gradients and DMRG Primitives
NVIDIA's cuQuantum SDK introduces dynamic gradients, DMRG primitives, and performance improvements, enhancing quantum computing emulations on Tensor Core GPUs.
RAPIDS Introduces GPU Polars Streaming and Unified GNN API Enhancements
Version 25.06 of NVIDIA's RAPIDS suite adds GPU Polars streaming, a unified GNN API, and zero-code ML speedups, enhancing Python data science capabilities.
NVIDIA Unveils Data Flywheel Blueprint to Optimize AI Agents
NVIDIA introduces the Data Flywheel Blueprint, a workflow aimed at enhancing AI agents by reducing costs and improving efficiency using automated experimentation and self-improving loops.
CoreWeave Marks Milestone with NVIDIA GB300 NVL72 Platform Deployment
CoreWeave becomes the first AI cloud provider to deploy NVIDIA's GB300 NVL72 systems, enhancing AI performance and expanding its cloud capabilities.
Exploring Handwritten PTX Code for GPU Optimization in CUDA
Delve into the potential of handwritten PTX code for enhancing GPU performance in CUDA applications, as outlined by NVIDIA experts.
NVIDIA Omniverse Deprecates Launcher for Enhanced Developer Experience
NVIDIA announces the deprecation of the Omniverse Launcher on October 1st, aiming to streamline developer access to essential tools and resources directly through platforms like GitHub.
NVIDIA RTX AI Boosts Image Editing with FLUX.1 Kontext Release
NVIDIA RTX AI and TensorRT enhance Black Forest Labs' FLUX.1 Kontext model, streamlining image generation and editing with faster performance and lower VRAM requirements.
Effective FP8 Training: Exploring Per-Tensor and Per-Block Scaling Strategies
Explore NVIDIA's FP8 training strategies, focusing on per-tensor and per-block scaling methods, for enhanced numerical stability and accuracy in low-precision AI model training.
NVIDIA's Llama 3.2 NeMo Retriever Enhances Multimodal RAG Pipelines
NVIDIA introduces the Llama 3.2 NeMo Retriever Multimodal Embedding Model, boosting efficiency and accuracy in retrieval-augmented generation pipelines by integrating visual and textual data processing.
NVIDIA Expands Support for Google DeepMind's Gemma 3n on RTX and Jetson
NVIDIA announces the general availability of Google DeepMind's Gemma 3n on NVIDIA RTX and Jetson platforms, enhancing AI capabilities for developers.
NVIDIA Supports Google DeepMind's Gemma 3n on Jetson and RTX
NVIDIA announces support for Google DeepMind's Gemma 3n on Jetson and RTX platforms, enhancing AI capabilities across devices with optimized models.