TensorRT News

NVIDIA's Inference Software Slashes AI Token Costs by 5x

NVIDIA's software stack on Blackwell GPUs reduces token costs by 5x, improving AI inference efficiency for providers such as Baseten and DeepInfra.

by Luisa Crawford
Jun 30, 2026

NVIDIA TensorRT 11 Adds Multi-GPU Inference Support

NVIDIA's TensorRT 11 introduces multi-device inference, letting AI models scale across multiple GPUs to meet the demands of generative AI.

by Peter Zhang
Jun 27, 2026

NVIDIA TensorRT Brings FP8 Quantization to AI Deployment

NVIDIA TensorRT optimizes AI inference with FP8 quantization, delivering faster inference and smaller models for scalable deployment.

by Darius Baruo
Jun 10, 2026

How to Reduce Pipeline Friction in AI Model Serving

Learn practical strategies to eliminate inefficiencies in AI model serving pipelines using tools like TensorRT and Dynamo-Triton.

by Peter Zhang
May 13, 2026

NVIDIA TensorRT for RTX Brings Self-Optimizing AI to Consumer GPUs

NVIDIA's TensorRT for RTX introduces adaptive inference that automatically optimizes AI workloads at runtime, delivering a 1.32x performance gain on the RTX 5090.

by Iris Coleman
Jan 27, 2026

NVIDIA's Breakthrough: 4x Faster Inference in Math Problem Solving with Advanced Techniques

NVIDIA achieves 4x faster inference on complex math problems using NeMo-Skills, TensorRT-LLM, and ReDrafter, optimizing large language models to scale efficiently.

by Terrill Dicki
Nov 10, 2025

Optimizing Large Language Models with NVIDIA's TensorRT: Pruning and Distillation Explained

Explore how NVIDIA's TensorRT Model Optimizer uses pruning and distillation to make large language models more efficient and cost-effective.

by Timothy Morano
Oct 07, 2025

Optimizing LLM Inference with TensorRT: A Comprehensive Guide

Explore how TensorRT-LLM enhances large language model inference by optimizing performance through benchmarking and tuning, offering developers a robust toolset for efficient deployment.

by Luisa Crawford
Jul 07, 2025

NVIDIA RTX AI Boosts Image Editing with FLUX.1 Kontext Release

NVIDIA RTX AI and TensorRT enhance Black Forest Labs' FLUX.1 Kontext model, streamlining image generation and editing with faster performance and lower VRAM requirements.

by Lawrence Jengar
Jul 02, 2025

NVIDIA TensorRT Enhances Stable Diffusion 3.5 on RTX GPUs

NVIDIA's TensorRT SDK significantly boosts the performance of Stable Diffusion 3.5, reducing VRAM requirements by 40% and doubling efficiency on RTX GPUs.

by Rebeca Moen
Jun 12, 2025

NVIDIA Unveils TensorRT for RTX to Boost AI Application Performance

NVIDIA introduces TensorRT for RTX, a new SDK aimed at enhancing AI application performance on NVIDIA RTX GPUs, supporting both C++ and Python integrations for Windows and Linux.

by Alvin Lang
Jun 12, 2025

NVIDIA Unveils TensorRT for RTX: Enhanced AI Inference on Windows 11

NVIDIA introduces TensorRT for RTX, an optimized AI inference library for Windows 11, enhancing AI experiences across creativity, gaming, and productivity apps.

by Lawrence Jengar
May 19, 2025

NVIDIA's FP4 Image Generation Boosts RTX 50 Series GPU Performance

NVIDIA's latest TensorRT update introduces FP4 image generation for RTX 50 Series GPUs, improving AI model performance and efficiency. Explore the advancements in generative AI technology.

by Terrill Dicki
May 14, 2025

Microsoft and NVIDIA Enhance Llama Model Performance on Azure AI Foundry

Microsoft and NVIDIA collaborate to significantly boost Meta Llama model performance on Azure AI Foundry using NVIDIA TensorRT-LLM optimizations, enhancing throughput, reducing latency, and improving cost efficiency.

by Ted Hisokawa
Mar 21, 2025

NVIDIA Enhances Llama 3.3 70B Model Performance with TensorRT-LLM

Discover how NVIDIA's TensorRT-LLM boosts Llama 3.3 70B model inference throughput by 3x using advanced speculative decoding techniques.

by Rebeca Moen
Dec 18, 2024
