Microsoft Bing Visual Search, a tool that lets users worldwide search using photographs, has been optimized in collaboration with NVIDIA. According to the NVIDIA Technical Blog, integrating NVIDIA's TensorRT, CV-CUDA, and nvImageCodec into Bing's TuringMM visual embedding model led to a 5.13x increase in throughput for offline indexing pipelines, which in turn reduces energy consumption and cost.
Multimodal AI and Visual Search
Multimodal AI models such as Microsoft's TuringMM are essential for applications that combine different data types, such as text and images. A popular model family for joint image-text understanding is CLIP, which uses a dual encoder architecture trained on hundreds of millions of image-caption pairs. Models of this kind underpin tasks such as text-based visual search, zero-shot image classification, and image captioning.
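To make the dual encoder idea concrete, the sketch below shows how text-based visual search can work once both encoders have produced embeddings in a shared vector space: the query embedding is compared with each image embedding by cosine similarity, and images are ranked by score. The three-dimensional vectors, file names, and function names are hypothetical and chosen for illustration only; a real system would obtain much larger embeddings from trained image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image ids ordered from most to least similar to the text query."""
    scored = [
        (cosine_similarity(text_embedding, embedding), image_id)
        for image_id, embedding in image_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored]


# Hypothetical embeddings standing in for the output of an image encoder.
IMAGE_EMBEDDINGS = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding standing in for a text encoder's output
# for a query such as "a photo of a cat".
QUERY_EMBEDDING = [0.8, 0.2, 0.1]

ranking = rank_images(QUERY_EMBEDDING, IMAGE_EMBEDDINGS)
print(ranking)
```

Because the image embeddings do not depend on the query, they can be computed ahead of time, which is why the throughput of an offline indexing pipeline matters for a service of this kind.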
Optimization Efforts
The optimization of Bing's visual embedding pipeline was achieved by leveraging NVIDIA's GPU acceleration technologies. The effort focused on enhancing the performance of the TuringMM pipeline by using NVIDIA's TensorRT for model execution, which improved the efficiency of computationally expensive layers in transformer architectures. Additionally, the use of nvImageCodec and CV-CUDA accelerated the image decoding and preprocessing stages, leading to a significant reduction in latency for image processing tasks.
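The sketch below is a simplified model of why accelerating decoding and preprocessing still matters once model execution has been sped up. It treats the pipeline as three sequential stages per image and reports the throughput and the slowest stage. All stage times are hypothetical placeholders, not measurements from Bing's pipeline, and real deployments batch and overlap work in ways this model ignores.

```python
def images_per_second(stage_ms):
    """Throughput when every image passes through the stages one after another."""
    total_ms = sum(stage_ms.values())
    return 1000.0 / total_ms


def bottleneck(stage_ms):
    """Name of the stage that takes the longest per image."""
    return max(stage_ms, key=stage_ms.get)


# Hypothetical per-image stage times in milliseconds (illustrative only).
before = {"decode": 2.0, "preprocess": 1.0, "inference": 9.0}

# Speed up model execution only: decoding becomes the slowest stage.
after_inference_only = {**before, "inference": 1.0}

# Speed up decoding and preprocessing as well.
after_all = {"decode": 0.5, "preprocess": 0.25, "inference": 1.0}

for label, stages in [
    ("before", before),
    ("inference only", after_inference_only),
    ("all stages", after_all),
]:
    print(f"{label}: {images_per_second(stages):.1f} images/s, "
          f"slowest stage: {bottleneck(stages)}")
```

In this toy model, speeding up inference alone moves the bottleneck to decoding, which is the kind of shift that makes GPU-side decoding and preprocessing worthwhile.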
Implementation and Results
Prior to optimization, Bing's visual embedding model ran on a GPU server cluster that handled inference tasks for various deep learning services across Microsoft. The original implementation, which used ONNX Runtime with the CUDA Execution Provider, was bottlenecked by image decoding, which was handled by OpenCV. After NVIDIA's libraries were integrated, the pipeline's throughput rose from 88 queries per second (QPS) to 452 QPS, a speedup of about 5.1x that NVIDIA reports as 5.13x.
These enhancements not only improved processing speed but also reduced the computational load on CPUs by offloading tasks to GPUs, improving power efficiency. TensorRT contributed most of the performance gain, while nvImageCodec and CV-CUDA added a further 27% improvement.
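The reported figures can be checked with a few lines of arithmetic. The sketch below computes the end-to-end speedup from the two QPS values and then estimates what TensorRT alone would have delivered. That estimate rests on an assumption the article does not state: that the 27% gain is measured relative to a TensorRT-only configuration. The one-million-image workload is likewise hypothetical.

```python
BASELINE_QPS = 88.0    # throughput before optimization, from the article
OPTIMIZED_QPS = 452.0  # throughput after optimization, from the article
LIBRARY_GAIN = 0.27    # gain attributed to nvImageCodec and CV-CUDA

# End-to-end speedup: 452 / 88 is about 5.136.
speedup = OPTIMIZED_QPS / BASELINE_QPS

# ASSUMPTION: the 27% is relative to a TensorRT-only configuration,
# so dividing it out gives an implied TensorRT-only throughput.
implied_tensorrt_only_qps = OPTIMIZED_QPS / (1.0 + LIBRARY_GAIN)
implied_tensorrt_speedup = implied_tensorrt_only_qps / BASELINE_QPS


def hours_to_index(num_images, qps):
    """Wall-clock hours for one pipeline instance to embed num_images."""
    return num_images / qps / 3600.0


NUM_IMAGES = 1_000_000  # hypothetical workload size, for illustration only

print(f"end-to-end speedup: {speedup:.3f}x")
print(f"implied TensorRT-only throughput: {implied_tensorrt_only_qps:.0f} QPS")
print(f"implied TensorRT-only speedup: {implied_tensorrt_speedup:.2f}x")
print(f"hours at baseline: {hours_to_index(NUM_IMAGES, BASELINE_QPS):.2f}")
print(f"hours optimized: {hours_to_index(NUM_IMAGES, OPTIMIZED_QPS):.2f}")
```

Under that assumption, TensorRT alone would account for roughly a 4x speedup, with the image libraries supplying the rest.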
Conclusion
The successful optimization of Microsoft Bing Visual Search highlights the potential of NVIDIA's accelerated libraries in enhancing AI-driven applications. The collaboration demonstrates how GPU resources can be effectively utilized to accelerate deep learning and image processing workloads, even when baseline systems already employ GPU acceleration. These advancements pave the way for more efficient and responsive visual search capabilities, benefiting both users and service providers.
For more detailed insights into the optimization process, visit the original NVIDIA Technical Blog.