
Red Hat AI
@RedHat_AI • 11,002 subscribers
Accelerating AI innovation with open platforms and community. The future of AI is open.

What compression looks like on vLLM: the same Gemma 4 31B model, but Red Hat AI's quantized version runs at nearly 2x the tokens/sec with half the memory while retaining 99%+ accuracy. Open source and quantized with LLM Compressor. Links in comments. 🙏 Thanks to Sawyer Bowerman for the 2-minute demo.
Red Hat AI • 34,069 views • 2 months ago

Michael Goin walks through what's new in vLLM v0.17, v0.18, and v0.19 in about 8 minutes: Flash Attention 4, new performance modes, zero-bubble async scheduling, online MXFP4 quantization, Gemma 4, and a lot more. 1,592 commits from 682 contributors (163 new). 🎉 🚀
Red Hat AI • 22,983 views • 2 months ago

A full year of vLLM in 30 minutes from Simon Mo, vLLM lead at UC Berkeley: model and hardware usage trends, model architectures, API evolution, the V1 engine rebuild, multimodal progress, expanding hardware support, and more, plus how we are thinking about 2026. Enjoy!
Red Hat AI • 15,697 views • 5 months ago