inference
Alibaba Unveils Its First Home-Grown AI Chip
Chinese e-commerce giant Alibaba unveiled its first artificial intelligence inference chip on Wednesday, a move that could further invigorate its already rip-roaring cloud computing business.
Together AI Unveils Inference Engine 2.0 with Turbo and Lite Endpoints
Together AI launches Inference Engine 2.0, offering Turbo and Lite endpoints for enhanced performance, quality, and cost-efficiency.
Hugging Face Introduces Inference-as-a-Service with NVIDIA NIM for AI Developers
Hugging Face and NVIDIA collaborate to offer Inference-as-a-Service, enhancing AI model efficiency and accessibility for developers.
Strategies to Optimize Large Language Model (LLM) Inference Performance
NVIDIA experts share strategies to optimize large language model (LLM) inference performance, focusing on hardware sizing, resource optimization, and deployment methods.
NVIDIA Triton Inference Server Excels in MLPerf Inference 4.1 Benchmarks
NVIDIA Triton Inference Server achieves exceptional performance in MLPerf Inference 4.1 benchmarks, demonstrating its capabilities in AI model deployment.
Enhancing AI Inference with NVIDIA NIM and Google Kubernetes Engine
NVIDIA collaborates with Google Cloud to integrate NVIDIA NIM with Google Kubernetes Engine, offering scalable AI inference solutions through Google Cloud Marketplace.
NVIDIA GH200 Superchip Boosts Llama Model Inference by 2x
The NVIDIA GH200 Grace Hopper Superchip accelerates inference on Llama models by 2x, enhancing user interactivity without compromising system throughput, according to NVIDIA.
Accelerating Causal Inference with NVIDIA RAPIDS and cuML
Discover how NVIDIA RAPIDS and cuML enhance causal inference by leveraging GPU acceleration for large datasets, offering significant speed improvements over traditional CPU-based methods.
NVIDIA's TensorRT-LLM Multiblock Attention Enhances AI Inference on HGX H200
NVIDIA's TensorRT-LLM introduces multiblock attention, significantly boosting AI inference throughput by up to 3.5x on the HGX H200, tackling challenges of long-sequence lengths.
AWS Expands NVIDIA NIM Microservices for Enhanced AI Inference
AWS and NVIDIA enhance AI inference capabilities by expanding NIM microservices across AWS platforms, boosting efficiency and reducing latency for generative AI applications.
Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries
Perplexity AI utilizes NVIDIA's inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.
NVIDIA's AI Inference Platform: Driving Efficiency and Cost Savings Across Industries
NVIDIA's AI inference platform enhances performance and reduces costs for industries like retail and telecom, leveraging advanced technologies like the Hopper platform and Triton Inference Server.
NVIDIA Enhances AI Inference with Full-Stack Solutions
NVIDIA introduces full-stack solutions to optimize AI inference, enhancing performance, scalability, and efficiency with innovations like the Triton Inference Server and TensorRT-LLM.
DeepSeek-R1 Enhances GPU Kernel Generation with Inference Time Scaling
NVIDIA demonstrates how the DeepSeek-R1 model uses inference-time scaling to improve GPU kernel generation, optimizing performance by efficiently managing computational resources during inference.
Together AI Unveils Cost-Effective On-Demand Dedicated Endpoints
Together AI introduces Dedicated Endpoints with up to 43% lower pricing, delivering high-performance, cost-efficient GPU inference for scaling AI applications.
NVIDIA Unveils GeForce NOW for Enhanced Game AI and Developer Access
NVIDIA's GeForce NOW expands its cloud gaming service, offering new AI tools for developers and seamless game preview experiences, broadening access for gamers globally.
Maximizing AI Value Through Efficient Inference Economics
Explore how understanding AI inference costs can optimize performance and profitability, as enterprises balance computational challenges with evolving AI models.
NVIDIA Dynamo Enhances Large-Scale AI Inference with llm-d Community
NVIDIA collaborates with the llm-d community to enhance open-source AI inference capabilities, leveraging its Dynamo platform for improved large-scale distributed inference.
NVIDIA Unveils TensorRT for RTX: Enhanced AI Inference on Windows 11
NVIDIA introduces TensorRT for RTX, an optimized AI inference library for Windows 11, enhancing AI experiences across creativity, gaming, and productivity apps.
NVIDIA's GB200 NVL72 and Dynamo Enhance MoE Model Performance
NVIDIA's latest innovations, GB200 NVL72 and Dynamo, significantly enhance inference performance for Mixture of Experts (MoE) models, boosting efficiency in AI deployments.
NVIDIA Unveils NVFP4 for Enhanced Low-Precision AI Inference
NVIDIA introduces NVFP4, a new 4-bit floating-point format under the Blackwell architecture, aiming to optimize AI inference with improved accuracy and efficiency.
Envisioning the AI Ecosystem of Tomorrow: Perspectives and Principles
This article explores the future of AI and the concept of 'shared intelligence' in cyber-physical ecosystems. It highlights the shift from artificial narrow intelligence to more complex, interconnected systems, emphasizing the role of active inference, a physics-based approach, in AI's evolution. It also discusses the ethical considerations of respecting individuality within these intelligent networks, framing a future where AI is not just advanced but also ethically grounded.