Loading video...
Video Failed to Load
GOOGLE JUST MADE EVERY CHATBOT FEEL SLOW Diffusion Gemma 26b doesn’t predict word by word, it generates 256 tokens in parallel using bi-directional attention, like stable diffusion but for language it’s MoE so only 3.8B params activate during inference, fits on a single RTX 4090 with 18GB VRAM and... show more
27,080 views • 19 days ago •via X (Twitter)
0 Comments
No comments available
Comments from the original post will appear here
