Andrew Ng Launches vLLM LLM Serving Course

Andrew Ng unveils a vLLM course with Red Hat that teaches KV cache memory management for transformer model serving, along with the history and technical architecture of the vLLM inference engine, in the context of serving 70B-parameter models.


Analysis

Andrew Ng has launched a short course with Red Hat on efficient LLM serving. It covers KV cache memory management for transformer model serving, along with the history and technical architecture of the vLLM inference engine. The course shows how a 70B-parameter model consumes roughly 140 GB for its weights alone (a figure that corresponds to 2 bytes per parameter, as with 16-bit weights), while each concurrent request demands additional GPU memory for its KV cache.
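The memory arithmetic behind those figures can be sketched in a few lines of Python. This is a rough estimate and not material from the course: the function names are hypothetical, and the layer count, KV head count, and head dimension are illustrative assumptions for a 70B-class transformer, not values stated in the announcement.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache bytes stored for one token of one request.

    The leading factor of 2 accounts for one key vector and one value
    vector per KV head in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_gb_per_request(context_tokens: int, num_layers: int,
                            num_kv_heads: int, head_dim: int,
                            bytes_per_value: int = 2) -> float:
    """KV cache memory for a single request holding context_tokens tokens."""
    per_token = kv_cache_bytes_per_token(
        num_layers, num_kv_heads, head_dim, bytes_per_value
    )
    return context_tokens * per_token / 1e9


if __name__ == "__main__":
    # 70e9 parameters at 2 bytes each reproduces the ~140 GB weight figure.
    print(f"weights: {weight_memory_gb(70e9):.0f} GB")

    # ASSUMED shape (illustrative only): 80 layers, 8 KV heads, head dim 128.
    shape = dict(num_layers=80, num_kv_heads=8, head_dim=128)
    per_request = kv_cache_gb_per_request(context_tokens=4096, **shape)
    print(f"KV cache per 4096-token request: {per_request:.2f} GB")
    print(f"KV cache for 32 such requests: {32 * per_request:.1f} GB")
```

Under these assumptions the per-request KV cache grows linearly with both context length and the number of concurrent requests, which is why serving engines such as vLLM treat KV cache memory as a resource to be managed rather than a fixed overhead.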


Andrew Ng

@AndrewYNg

Co-Founder of Coursera; Stanford CS adjunct faculty. Former head of Baidu AI Group/Google Brain.
