Added context to my tiny diffusion model to enable sequential generation of longer outputs! Currently the context is a quarter of the sequence length (seq_len=256, context_len=64). I have a theory that the less semantic-value-per-token, the worse the “curse of parallel decoding” is. With parallel decoding, we independently predict multiple...
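
To make the windowing concrete, here is a rough sketch of how a fixed-length model could be chained into longer outputs. The post does not say how the context is wired in, so this assumes the last `context_len` tokens of the previous output are copied to the front of the next window and held fixed while the rest is filled in. `toy_denoise`, `generate_long`, and `MASK` are hypothetical names, and `toy_denoise` is only a random stand-in for the real model.

```python
import random

SEQ_LEN = 256      # window size the model sees (from the post)
CONTEXT_LEN = 64   # tokens carried over between windows (from the post)
MASK = -1          # hypothetical id for a not-yet-generated position


def toy_denoise(window, rng, vocab_size=100):
    """Hypothetical stand-in for the diffusion model.

    Fills every masked position independently of the others, which is
    the "parallel decoding" behaviour the post describes. Positions that
    already hold a token (the context) are left untouched.
    """
    return [rng.randrange(vocab_size) if t == MASK else t for t in window]


def generate_long(total_len, seq_len=SEQ_LEN, context_len=CONTEXT_LEN,
                  seed=0, denoise=toy_denoise):
    """Generate total_len tokens by chaining fixed-size windows.

    Assumption: each window after the first starts with the last
    context_len tokens generated so far, followed by masked slots.
    """
    rng = random.Random(seed)
    out = []
    while len(out) < total_len:
        context = out[-context_len:] if out else []
        window = context + [MASK] * (seq_len - len(context))
        filled = denoise(window, rng)
        # Keep only the newly generated part; the context is already in out.
        out.extend(filled[len(context):])
    return out[:total_len]


if __name__ == "__main__":
    tokens = generate_long(640)
    print(len(tokens))
```

With these numbers the first window contributes 256 new tokens and each later window contributes 256 - 64 = 192, so a larger context buys more continuity at the cost of fewer new tokens per pass.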
89,040 views • 7 months ago • via X (Twitter)

