正在加载视频...
视频加载失败
GOOGLE JUST MADE EVERY CHATBOT FEEL SLOW Diffusion Gemma 26b doesn’t predict word by word, it generates 256 tokens in parallel using bi-directional attention, like stable diffusion but for language it’s MoE so only 3.8B params activate during inference, fits on a single RTX 4090 with 18GB VRAM and... show more
27,080 次观看 • 19 天前 •via X (Twitter)
0 条评论
暂无评论
原始帖子的评论将显示在这里
