Loading video...

Video Failed to Load

Go Home

LLMs require more GPU memory as they generate longer responses. Can we make GPU memory constant without significantly sacrificing accuracy? IceCache is a new method for managing KV caches that leverages Dynamic Continuous Indexing (DCI) to efficiently group and retrieve tokens by semantics. Joint work w/ Yuzhen Mao, Qitong...

21,163 views โ€ข 2 months ago โ€ขvia X (Twitter)

0 Comments

No comments available

Comments from the original post will appear here

Related Videos