TensorRT News

NVIDIA's TensorRT-LLM MultiShot Enhances AllReduce Performance with NVSwitch

NVIDIA introduces TensorRT-LLM MultiShot to improve multi-GPU communication efficiency, achieving up to 3x faster AllReduce operations by leveraging NVSwitch technology.

by Alvin Lang
Nov 03, 2024

NVIDIA Enhances Llama 3.1 405B Performance with TensorRT Model Optimizer

NVIDIA's TensorRT Model Optimizer significantly boosts performance of Meta's Llama 3.1 405B large language model on H200 GPUs.

by Lawrence Jengar
Aug 30, 2024

NVIDIA Enhances TensorRT Model Optimizer v0.15 with Improved Inference Performance

NVIDIA releases TensorRT Model Optimizer v0.15, offering enhanced inference performance through new features like cache diffusion and expanded AI model support.

by Zach Anderson
Aug 16, 2024

Enhanced AI Performance with NVIDIA TensorRT 10.0's Weight-Stripped Engines

NVIDIA introduces TensorRT 10.0 with weight-stripped engines, offering more than 95% compression for AI applications.

by Jessie A Ellis
Jun 12, 2024

StreamingLLM Breakthrough: Handling Over 4 Million Tokens with 22.2x Inference Speedup

SwiftInfer, built on StreamingLLM, speeds up large language model inference, handling over 4 million tokens in multi-round conversations with a 22.2x speedup.

by Massar Tanya Ming Yau Chong
Jan 09, 2024
