Quantization News

Quantization

NVIDIA Jetson Memory Tricks Let Edge Devices Run 10B Parameter AI Models

NVIDIA reveals optimization techniques that reclaim up to 12GB of memory on Jetson devices, enabling multi-billion-parameter LLMs to run on edge hardware.

by Rongchai Wang
Apr 21, 2026

Enhancing AI Model Efficiency with Quantization Aware Training and Distillation

Explore how Quantization-Aware Training (QAT) and Quantization-Aware Distillation (QAD) optimize AI models for low-precision environments, improving accuracy and inference performance.

by Rongchai Wang
Sep 11, 2025

Enhancing Large Language Models: NVIDIA's Post-Training Quantization Techniques

NVIDIA says its post-training quantization (PTQ) improves the performance and efficiency of AI models, using formats such as NVFP4 to optimize inference without retraining.

by Ted Hisokawa
Aug 02, 2025

Nexa AI Enhances DeepSeek R1 Distill Performance with NexaQuant on AMD Platforms

Nexa AI introduces NexaQuant technology for DeepSeek R1 Distills, improving inference performance on AMD platforms while reducing memory footprint.

by Lawrence Jengar
Feb 20, 2025
