New course on serving LLMs efficiently -- how do you serve models to many concurrent users at low latency and reasonable cost? This short course is built with Red Hat and taught by Cedric Clyburn. Efficient LLM serving requires efficient memory management. A 70B-parameter model takes ~140 GB just...
112,086 views • 18 days ago • via X (Twitter)
