LLMs require more GPU memory as they generate longer responses. Can we make GPU memory constant without significantly sacrificing accuracy? IceCache is a new method for managing KV caches that leverages Dynamic Continuous Indexing (DCI) to efficiently group and retrieve tokens by semantics. Joint work w/ Yuzhen Mao, Qitong...

Ke Li 🍁

6,481 subscribers

21,163 views • 2 months ago • via X (Twitter)

Related Videos

Can you make a jigsaw puzzle with two different solutions? Or an image that changes appearance when flipped? We can do that, and a lot more, by using diffusion models to generate optical illusions! Continue reading for more illusions and method details 🧵

Daniel Geng

125,796 views • 2 years ago

🎥 Video generation is hitting the memory wall. As videos get longer, the KV cache quietly explodes — and long-horizon consistency starts to break. We built Quant VideoGen: a training-free KV cache compression method for auto-regressive video diffusion. Instead of storing every KV in high precision, QVG exploits video’s spatiotemporal redundancy with semantic-aware smoothing + progressive residual quantization. 🚀 Up to 7× KV memory reduction ⚡ <4% overhead ✅ Strong long-video quality 🕹️ Deploy HYWorldPlay on your own RTX 5090 locally KV compression is becoming a core scaling primitive — not just for LLMs, but for video generation too. Paper: Code: (1/5)

Haocheng Xi

64,520 views • 2 months ago

🟢 News: GPU compute in the browser is finally real. WebGPU now ships by default in Chrome, Firefox, Safari, and Edge—not a polyfill, not behind a flag. You can run LLMs client-side. Transformers.js and ONNX Runtime already ship WebGPU backends. Eight years of spec work. No more asterisks. 🔗 #WebGPU #GPU #Chrome #Safari #Firefox #Edge

WebGL / WebGPU

12,762 views • 5 months ago

Can we make Transformers better and more efficient for robot learning? Excited to introduce Body Transformer (BoT), an architecture that leverages robot embodiment in the attention mechanism, by treating it as a graph of sensors and actuators.

Carlo Sferrazza

60,679 views • 1 year ago

Who really benefits from GAIB 🟠 | RWAiFi ? 🤨 Let me share bro, ➤ Investors : Can get access to a stable asset AID backed by real GPU revenue. By staking it they earn yield from actual AI workload. ➤ GPU providers : Data centers or operators can tokenize their GPU and get financing through GAIB. ➤ AI Builders : Startups and companies get cheaper and more flexible access to GPU power. Hence, GAIB connects all three Investors, Providers and builders in a loop making AI compute more accessible and profitable for everyone. Let's discuss more about this.

Elite94

13,289 views • 9 months ago

🔥Nexera & Aethir: Unleashing AI’s Next Frontier Through Tokenized GPU Power 🤝 Nexera is proud to join forces with Aethir in a strategic partnership to make cutting-edge AI infrastructure globally accessible. By tokenizing fractional GPU ownership, we’re enabling developers, enterprises, and investors everywhere to harness the explosive growth of deep learning and generative AI without being limited by geography, scale, or cost. With transparent tokenization, innovators can access powerful GPUs for faster model training and more advanced applications. GPU providers gain streamlined funding for expansion and upgrades, and investors tap into a high-growth market with secure, compliant opportunities that can provide higher yields than other RWA products. It’s an entirely new ecosystem where everyone can thrive, fueling AI’s evolution at an unprecedented pace. By 2030, the global GPU market is projected to exceed hundreds of billions of dollars, driven by the explosive demand for AI-powered applications, deep learning, and increasingly sophisticated generative models, ensuring that tokenizing these invaluable resources is poised to tap into a massive, rapidly expanding opportunity. $NXRA

Nexera

27,774 views • 1 year ago

Introducing: DeCloud GPU Marketplace Dear Cloudians, We are thrilled to introduce another robust addition to the DeCloud Ecosystem, furthering our commitment to promoting decentralization at its pinnacle. The DeCloud GPU Marketplace is a cutting-edge platform designed to revolutionize the way you access and utilize GPU resources for your diverse computational needs. Users can harness the power of GPU resources for a multitude of purposes, including but not limited to: ^Mining various cryptocurrencies ^Utilizing AI/ML learning datasets ^Rendering high-resolution videos and 3D models Decentralization, Safety, and Privacy: At DeCloud, we prioritize decentralization, safety, and privacy. By providing access to different GPU providers and private connections, we ensure a secure and privacy-oriented system for our users. Your data and computational tasks are safeguarded within our decentralized DeCloud ecosystem.

De Cloud

17,403 views • 2 years ago

Some things don't need to be complicated. GPU compute is one of them. Pay only for what you use, nothing more. Find a GPU that works for you today:

Theta Network

16,826 views • 6 days ago

Memories are the previous learnings and context. Developers build up state and context as they do work, which is what makes a more tenured developer more effective than a brand new one with a similar skillset. It would be both inefficient and painful to relearn information from scratch around code structure, architecture, etc each time you needed to do a new task. Memories solve for this. As you do work with Cascade, it can automatically choose to “remember” pieces of information that it learns as Memories, and for any later work, it can choose to pull from this memory bank instead of trying to relearn that information from scratch. You also can manually prompt Cascade to remember parts of conversations as Memories and can manually go in and edit Memories post-fact. Here’s a developer asking Cascade to save some knowledge as a Memory:

Windsurf

12,356 views • 1 year ago

Finding the mouse position on the terrain using the GPU, without collision. This clever method greatly improves the previous method used in Terrain3D. It's considerably faster, works much closer and farther away, and is more accurate. See 🧵 #madewithgodot #GodotEngine

Cory Petkovsek 🎮

90,591 views • 2 years ago

👥Vox populi, vox Colab🗣️! We’re thrilled to announce the official addition of high-memory A100 runtimes for Colab subscribers! These new instances will offer ⚡️double the GPU and system RAM compared to standard A100 runtimes📈. Use this new power wisely!

Colaboratory

52,574 views • 9 months ago

Partnership Announcement 🤝 Nuklai is partnering with io.net (old account), a decentralized compute network that leverages #GPU clustering services to turn weaker hardware into advanced compute networks. will provide decentralized GPU power to Nuklai.

Nuklai

85,191 views • 2 years ago

Axiom 4 for Houdini is out Check out the changes to the popular GPU-accelerated sparse volumetric fluid solver for VFX and games, including improved collisions between smoke and surrounding objects #Houdini #simulation #VFX #gamedev Theory Accelerated

CG Channel

12,439 views • 9 months ago

My Dexcom continuous glucose monitor feels like magic for managing my diabetes. Thrilled to be going back to the #BigGame with Dexcom to share the new #DexcomG7 with the world. Check out for more. #Ad

Nick Jonas

257,659 views • 3 years ago

Fusing the future of AI Layers, Through the finest currency for Compute $GPU is LIVE on Ethereum Bridge and Trade from above link 🟠 Official $GPU Contract Address: 0x79D464248516Bc6977cA2069ba15d8D1044479D8 🟠 Ticker: $GPU 🟠 Pair: GPU/ETH

GPUNET

187,615 views • 1 year ago

⬛️ We are currently accelerating the incubation of GPU Nodes into the infraX Network, with 12 H100’s currently available for operation. Despite the incubation of such immense GPU power, the infraX Platform is optimally designed to run on the least amount of computational power possible, meaning a lot of our available GPU nodes are currently sitting idle. Currently, we're utilising a single gigantic NVIDIA H100 server with 80GB of VRAM and over 220GB of RAM to run our Platform. To put that in perspective, it rivals the computational power of an adult human brain. This setup enables us to handle immense computational load and deliver high-quality AI content to our users, however we have much more in store. Our remaining, immense network of GPU units is currently being prepared for rental operations as we look to transform the corporate GPU lending sphere through our corporate GPU lending protocol. We already have many high tier Web3 Players ready for technical integration, with more approaching us daily. Through our V3 DApp we look to make these integrations publicly viewable with real time usage graphs integrated directly into our Platform, allowing for exceedingly unique viewing opportunities. $INFRA

infraX | $INFRA

42,843 views • 1 year ago

Learning by visuals and images is more effective for long term memory🧠🧬

Naruhodo Kanji 🇯🇵 Japanese

17,724 views • 1 year ago

We're thrilled to unveil the new Provider Leaderboard on the Spheron GPU Marketplace! 🎮 Now, GPU providers can track their rewards and see how they stack up against others. 🏆 It’s time to up your game, add more compute power, and compete for the top spot! Get ready to level up your earnings and experience the thrill of healthy competition in the world of decentralized compute! 💪 Idle GPUs? Fill this out:-

Spheron Network

105,978 views • 1 year ago

Yu et al., "MosaicMem: Hybrid Spatial Memory for Controllable Video World Models" A patch-based spatial memory that you raster into views + glues to make things work.

Kwang Moo Yi

11,170 views • 3 months ago