
Igor Kotenkov
@stalkermustang • 2,940 subscribers

While reading the DeepSeek v4 paper, I ended up writing down over 90 questions. A lot of the paper reviews out there skip over the details, which is usually where the actual learning happens. So I decided to put together a proper guide: an Annotated Paper Walkthrough.

The core idea is that you still read the original paper as your source material, but whenever things get dense or confusing, I hold your hand through it. You get detailed annotations with visualizations, code snippets, reference links, and, most importantly, the context you need so you don't feel lost.

Today I'm releasing v1 with the first 50 notes. Some of the things I unpack:
• Why swap Softmax and Sigmoid for Sqrt-Softplus in the MoE Router?
• What on earth is a Birkhoff polytope?
• Does attention process some tokens 3 times?
• What are split-KV and split-K, and why did DeepSeek drop them?
• Why use Reverse KL, and where does it even come from?
...and a lot more. Even the most demanding readers will find something new here.

Open-source models are still heavily borrowing from DeepSeek v3, and there's no doubt that v4 details will soon become standard topics in discussions and ML interviews. Hopefully, this guide helps you stay ahead of the curve.

As a friend of mine joked, going through this will not only make you a better engineer, but a better man 😂 I can't prove that scientifically, but it's worth a shot.

Check it out:
Igor Kotenkov • 38,684 views • 26 days ago