Video yükleniyor...
Video Yüklenemedi
GOOGLE JUST MADE EVERY CHATBOT FEEL SLOW Diffusion Gemma 26b doesn’t predict word by word, it generates 256 tokens in parallel using bi-directional attention, like stable diffusion but for language it’s MoE so only 3.8B params activate during inference, fits on a single RTX 4090 with 18GB VRAM and... show more
27,080 görüntüleme • 19 gün önce •via X (Twitter)
0 Yorum
Yorum bulunmuyor
Orijinal gönderinin yorumları burada görünecek
