NVIDIA Megatron Boosts LLM Training With Muon Optimizer
NVIDIA integrates Muon and other advanced optimizers into Megatron to improve large-scale LLM training, reaching throughput close to parity with AdamW.
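For readers new to Muon, the sketch below illustrates the core idea of the optimizer as described in its publicly available reference implementation. It is not Megatron's code. Muon keeps a momentum buffer for each 2D weight matrix and replaces the raw update with an approximately orthogonalized version, computed by a few Newton-Schulz iterations. The quintic coefficients, learning rate, momentum value, and shape-based scale factor follow that reference implementation. The function names `newton_schulz_orthogonalize` and `muon_update` are hypothetical illustrations, and NumPy stands in for a distributed GPU framework.

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map G = U S V^T to U V^T by pushing singular values toward ~1.

    Assumption: quintic coefficients taken from the public Muon reference code.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize so all singular values start at or below 1.
    x = g / (np.linalg.norm(g) + eps)
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T                       # X X^T = U S^2 U^T
        poly = b * s + c * (s @ s)        # b S^2 + c S^4 in the U basis
        x = a * x + poly @ x              # U (a S + b S^3 + c S^5) V^T
    if transposed:
        x = x.T
    return x

def muon_update(weight: np.ndarray, grad: np.ndarray, momentum_buf: np.ndarray,
                lr: float = 0.02, beta: float = 0.95, nesterov: bool = True):
    """One hypothetical Muon step for a single 2D weight matrix.

    Returns (new_weight, new_momentum_buf) without modifying the inputs.
    """
    momentum_buf = beta * momentum_buf + grad
    update = grad + beta * momentum_buf if nesterov else momentum_buf
    ortho = newton_schulz_orthogonalize(update)
    # Shape-dependent scaling from the reference implementation (rows / cols).
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    return weight - lr * scale * ortho, momentum_buf
```

In the reference setup, Muon is applied only to 2D hidden-layer weights, while embeddings, output heads, and scalar parameters such as biases and norms are typically still trained with AdamW. The extra cost per step is a handful of matrix multiplications, which helps explain why a well-engineered implementation can stay close to AdamW throughput.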