QVAC: TurboQuant Delivers 5x Device Context
A QVAC SDK update applies TurboQuant KV-cache compression to unlock up to 5x more context on AMD and NVIDIA GPUs at 4-5 bits per value.
The QVAC SDK now unlocks up to 5x more on-device context through TurboQuant, a two-stage KV-cache compression algorithm from Google Research (ICLR 2026). The first stage, PolarQuant, converts vectors to polar coordinates and compresses the angles to 3-4 bits. The second stage, QJL, applies a 1-bit Johnson-Lindenstrauss correction. Together the two stages reach 4-5 bits per value without retraining or calibration. QVAC ported the method to Vulkan inside qvac-fabric-llm.cpp. It runs on AMD and NVIDIA GPUs today, with iOS, Android and Apple Silicon support to follow.
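To make the two ideas named above more concrete, here is a toy sketch in plain Python. It is not the TurboQuant algorithm and not QVAC's Vulkan code: it only illustrates (1) quantizing the angle of a 2-D pair in polar coordinates and (2) estimating an inner product from the sign bits of Gaussian random projections, the Johnson-Lindenstrauss-style idea behind a 1-bit sketch. All function names, the choice to keep the radius and norm unquantized, and the vector sizes are assumptions made for illustration; how the real method combines the two stages is not shown.

```python
import math
import random

def quantize_angle(x, y, bits):
    """Encode a 2-D pair as (radius, angle code); only the angle is quantized."""
    levels = 1 << bits
    step = 2 * math.pi / levels
    radius = math.hypot(x, y)                  # kept at full precision (assumption)
    theta = math.atan2(y, x) % (2 * math.pi)   # map angle into [0, 2*pi]
    code = int(round(theta / step)) % levels   # nearest bucket, wrapping at 2*pi
    return radius, code

def dequantize_angle(radius, code, bits):
    """Rebuild an approximate (x, y) from the radius and the angle code."""
    step = 2 * math.pi / (1 << bits)
    return radius * math.cos(code * step), radius * math.sin(code * step)

def sign_sketch(key, proj):
    """Keep one sign bit per random projection of the key, plus the key's norm."""
    signs = [1 if sum(s * k for s, k in zip(row, key)) >= 0 else -1 for row in proj]
    norm = math.sqrt(sum(k * k for k in key))
    return signs, norm

def estimate_inner_product(query, signs, norm, proj):
    """Estimate <query, key> using the full-precision query and the 1-bit key sketch."""
    acc = sum(sg * sum(s * q for s, q in zip(row, query))
              for sg, row in zip(signs, proj))
    # For Gaussian rows, E[sign(s.k) * (s.q)] = sqrt(2/pi) * <q, k> / |k|,
    # so rescaling by sqrt(pi/2) * |k| gives an unbiased estimate.
    return math.sqrt(math.pi / 2) * norm * acc / len(proj)

# Stage-1 idea: a 4-bit angle code for one pair.
radius, code = quantize_angle(0.6, 0.8, bits=4)
x_hat, y_hat = dequantize_angle(radius, code, bits=4)

# Stage-2 idea: 1-bit sketch of a key, compared with the exact inner product.
random.seed(0)
key = [0.5, -1.0, 0.25, 0.0, 1.5, -0.5, 0.75, 1.0]
query = [1.0, 0.5, -0.5, 0.25, 1.0, 0.0, -1.0, 0.5]
proj = [[random.gauss(0.0, 1.0) for _ in key] for _ in range(8000)]
signs, norm = sign_sketch(key, proj)
estimate = estimate_inner_product(query, signs, norm, proj)
exact = sum(q * k for q, k in zip(query, key))
```

In this toy, the angle error is at most half a bucket, so the reconstructed pair lies within `radius * step / 2` of the original, and the sign-based estimate approaches `exact` as the number of projections grows.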
Paolo Ardoino
@paoloardoino
Paolo Ardoino is the CEO of Tether (issuer of USDT), CTO of Bitfinex,