Andrew Ng Launches vLLM LLM Serving Course
Andrew Ng unveils a vLLM course with Red Hat that teaches KV cache memory management for transformer model serving, along with the history and technical architecture of the vLLM inference engine, with memory figures for a 70B-parameter model.
Andrew Ng has launched a short course with Red Hat on efficient LLM serving. It covers KV cache memory management in transformer model serving, together with the history and technical architecture of the vLLM inference engine. The course shows how a 70B-parameter model consumes roughly 140 GB for its weights alone (which corresponds to about 2 bytes per parameter, as with 16-bit weights), while each concurrent request demands additional GPU memory for its KV cache.
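The memory arithmetic behind these figures can be sketched in a few lines of Python. This is a rough back-of-the-envelope estimate, not material from the course: the function names are hypothetical, and the layer count, KV head count, and head dimension in the example are assumed values for a typical 70B-class model with grouped-query attention, not numbers stated in the announcement.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights (FP16/BF16).
    """
    return num_params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache bytes needed for ONE token of ONE request.

    The leading factor of 2 accounts for storing both a key and a
    value vector in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_gb(seq_len: int, num_requests: int, bytes_per_token: int) -> float:
    """Total KV cache in GB for num_requests concurrent sequences of seq_len tokens."""
    return seq_len * num_requests * bytes_per_token / 1e9


if __name__ == "__main__":
    # Weights: 70B parameters at 2 bytes each.
    print(f"weights: {weight_memory_gb(70e9):.0f} GB")

    # ASSUMED architecture (illustrative only): 80 layers, 8 KV heads, head_dim 128.
    per_token = kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")

    # KV cache grows linearly with both context length and concurrency.
    for n in (1, 16, 64):
        print(f"{n:>3} requests x 4096 tokens: {kv_cache_gb(4096, n, per_token):.1f} GB")
```

Under these assumptions the weights are a fixed cost, while the KV cache scales with the number of concurrent requests and their context lengths, which is why managing that cache matters for serving throughput.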
Andrew Ng
@AndrewYNg
Co-Founder of Coursera; Stanford CS adjunct faculty. Former head of Baidu AI Group/Google Brain.