
Nathan Barry
@nathanrs • 2,827 subscribers
Man in the Arena Allocator. Code autocomplete model post-training @zeddotdev. Prev @Apple, @zfellows

Rewrote tiny-diffusion to be roughly 2.6x smaller! Went from 951 lines to just 364, all contained in one file. As simple as possible, but not simpler. I also added a tiny GPT implementation as a comparison (312 lines, inspired by Andrej Karpathy). The two implementations are ~80% identical. The model architecture, training loop, tokenization, etc. differ in only 19 lines of code. The main differences are contained within two functions (generate and get_batch). The reason to include the GPT implementation was to show how similar autoregressive LMs are to diffusion LMs on an architectural level. Only *1* line of code in the architecture needs to be modified to support masked language diffusion instead of next-token prediction (by disabling causal masking). Link to the repo is in the comments.
Nathan Barry • 161,243 views • 5 months ago
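
To make the comparison above concrete, here is a minimal sketch of the two places where an autoregressive LM and a masked diffusion LM part ways: the attention mask and the batch construction. It is not taken from the repo. `attention_mask`, `get_batch_gpt`, `get_batch_diffusion`, and `MASK_ID` are hypothetical names chosen for illustration, and plain Python lists stand in for tensors.

```python
import random

MASK_ID = 0  # hypothetical id reserved for the [MASK] token (assumption)


def attention_mask(seq_len, causal):
    """Return mask[i][j] == True where position i may attend to position j."""
    # With causal=True each position sees only itself and earlier positions.
    # With causal=False every position sees the whole sequence.
    return [[(j <= i) or not causal for j in range(seq_len)]
            for i in range(seq_len)]


def get_batch_gpt(tokens):
    """Next-token prediction: the target at each position is the next token."""
    return tokens[:-1], tokens[1:]


def get_batch_diffusion(tokens, mask_prob, rng):
    """Masked diffusion: hide a random subset and predict the originals there."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            targets.append(tok)    # loss is computed on masked positions only
        else:
            inputs.append(tok)
            targets.append(None)   # position ignored by the loss
    return inputs, targets


if __name__ == "__main__":
    tokens = [5, 6, 7, 8]
    print(attention_mask(3, causal=True))
    print(attention_mask(3, causal=False))
    print(get_batch_gpt(tokens))
    print(get_batch_diffusion(tokens, mask_prob=0.5, rng=random.Random(0)))
```

Flipping `causal` from `True` to `False` is the analogue of the one-line architecture change described in the post. Everything else in this sketch lives in the data pipeline.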

tiny-diffusion, but Japanese! I wonder how logographic languages (Japanese, Chinese, etc.) compare to phonetic/alphabetic languages in generation quality and speed with character-level tokenizers. The main difference is the semantic value per token: fewer tokens are needed to express an idea, which means fewer AR and diffusion steps. My main question is how this would affect the entropy of the output distributions, since lower entropy benefits parallel decoding. I could see arguments either way. One clear benefit is that there are fewer opportunities to mangle words, leading to fewer obvious mistakes.
Nathan Barry • 54,128 views • 5 months ago
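
As a rough illustration of why output entropy matters for parallel decoding, the sketch below computes the Shannon entropy of each position's predicted distribution and commits only the positions that fall under a threshold. This is an assumed, simplified decoding rule for illustration, not the scheme used in tiny-diffusion. `entropy`, `positions_to_unmask`, and the threshold value are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def positions_to_unmask(dists, max_entropy):
    """Indices whose predicted distribution is confident enough to commit."""
    return [i for i, d in enumerate(dists) if entropy(d) <= max_entropy]


if __name__ == "__main__":
    dists = [
        [0.97, 0.01, 0.01, 0.01],  # peaked: low entropy, safe to commit
        [0.25, 0.25, 0.25, 0.25],  # uniform: high entropy, keep masked
    ]
    print([round(entropy(d), 3) for d in dists])
    print(positions_to_unmask(dists, max_entropy=0.5))
```

Under a rule like this, the more positions that come out low-entropy in a given step, the more tokens can be committed in parallel, which is why the effect of a larger, more semantically dense character set on entropy is the interesting open question.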