Issue 23 of Works in Progress should arrive with subscribers later this week (or early next). In it, we tackle the big questions:
- How does a woman's fertility really decline (and why are we so wrong about it)?
- How did Australia actually stop the boats?
- How...

30,541 views • 2 months ago • via X (Twitter)

Related videos

Chamath: Two terms you need to pay attention to in AI are prefill and decode

“There are two terms that I think you're going to hear a ton about over these next few years.” “The first term is prefill, and the next is decode.” “Prefill and decode are two very distinct ways of how models think, and how a model goes through the process of answering a question that you ask it.”

“When you send a prompt to an AI, the model processes it. This is called the reading phase, or prefill.” “It reads your entire prompt all at once. Then it does a bunch of math, calculates all the relationships between all the words, and stores them in temporary memory.” “The problem is that this is really compute bound, so it requires massive brute force. And Nvidia GPUs crush here.” “Their architecture is designed for massive parallel processing, which makes them really amazing at digesting these long prompts.” “So as the problem gets bigger and bigger, Nvidia just completely dominates.”

“But the next phase, this critical phase, the decode phase, is the writing phase.” “The model starts to generate a response: you ask it a question and it responds one token at a time.” “To pick the next token, the next word, it has to look back at everything it has already said so that it doesn't hallucinate.” “The problem is that this is incredibly memory-bandwidth constrained.”

“In our architecture, a long time ago, we made these design decisions from day one.” “We took a very different architectural approach. We took a very conservative process technology; we weren't pushing the boundaries of physics.” “And we used a lot of what's called SRAM, so memory on the chip, so that we could do this decode step as well as or better than everybody else.”

“So now when you put these two things together, I think it's going to create a huge acceleration in the ability of this entire infrastructure layer to get much cheaper and much more valuable. I suspect it'll then have a lot more developer pull: you'll get a lot more applications being built and billions more people using it.”
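
To make the compute-bound vs. memory-bound distinction a little more concrete, here is a rough back-of-envelope sketch. It is not Groq's or Nvidia's method, just a standard arithmetic-intensity (FLOPs per byte read) comparison. The function names, the 16-bit weight size, the model shape (4,096-wide, 32 layers), the 2,000-token prompt and the accelerator's peak-FLOPs and bandwidth figures are all hypothetical, chosen only to illustrate the ratio. It also ignores attention FLOPs and assumes plain multi-head attention for the cache size.

```python
# Back-of-envelope sketch: why prefill tends to be compute-bound and decode
# memory-bandwidth-bound. All model and hardware numbers are hypothetical.

BYTES_PER_PARAM = 2  # assumption: 16-bit weights and 16-bit KV-cache entries


def weight_pass_intensity(tokens: int, d_model: int) -> float:
    """FLOPs per byte of weights read when `tokens` vectors pass through one
    d_model x d_model linear layer (weights are read once, reused per token)."""
    flops = 2 * tokens * d_model * d_model  # one multiply + one add per weight per token
    bytes_read = BYTES_PER_PARAM * d_model * d_model
    return flops / bytes_read


def kv_cache_bytes(seq_len: int, n_layers: int, d_model: int) -> int:
    """Bytes of cached keys and values a decode step reads back to attend over
    every earlier token (one K and one V vector per token per layer; assumes
    plain multi-head attention with no cache compression)."""
    return 2 * n_layers * seq_len * d_model * BYTES_PER_PARAM


def bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> str:
    """Roofline-style check: below the ridge point the chip waits on memory."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte needed to keep compute busy
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    # Hypothetical accelerator: 1e15 FLOP/s peak compute, 3e12 B/s memory bandwidth.
    PEAK, BW = 1e15, 3e12
    D, LAYERS, PROMPT = 4096, 32, 2000  # hypothetical model shape and prompt length

    prefill = weight_pass_intensity(tokens=PROMPT, d_model=D)  # whole prompt at once
    decode = weight_pass_intensity(tokens=1, d_model=D)        # one new token per step

    print(f"prefill: {prefill:.0f} FLOPs/byte -> {bound(prefill, PEAK, BW)}")
    print(f"decode:  {decode:.0f} FLOPs/byte -> {bound(decode, PEAK, BW)}")
    cache_mb = kv_cache_bytes(seq_len=PROMPT, n_layers=LAYERS, d_model=D) / 1e6
    print(f"KV cache re-read per decode step after {PROMPT} tokens: {cache_mb:.0f} MB")
```

With these made-up numbers, the weight pass works out to 2,000 FLOPs per byte for the prompt but only 1 FLOP per byte for each decoded token. That is far below the roughly 333 FLOPs per byte the hypothetical chip needs to stay busy, so decode speed is set mostly by how fast weights and cache can be streamed from memory. Real serving systems batch many users' decode steps together to raise this ratio, which the sketch leaves out.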

The All-In Podcast

563,785 views • 5 months ago