Update on our progress towards building the ultimate substrate for Generative AI

598,618 views • 1 year ago • via X (Twitter)

9 comments

Extropic • 1 year ago

Transcript:
[00:00:07] Our last talk in this session will be from Guillaume Verdon. Thank you so much, Guillaume.
[00:00:12] How's everyone doing? Good. Ready for some physics-based AI? You heard a lot about quantum, which used to be my world. But there's been a vibe shift towards a different kind of physics-based computing: thermodynamic. So I'm gonna be talking about thermodynamic computing today and what we're building at Extropic.
[00:00:34] Do you wanna, yeah, tweak it a bit? Oh yeah, cool. Trevor and I brought AI to quantum computing. We basically invented how to bring differentiable programming to quantum computers when we were kids, hacking in Waterloo. Very good university.
[00:00:56] We basically got absorbed by Alphabet and launched a product known today as TensorFlow Quantum. And we figured out how to bring software to quantum computers. After shipping that at Google Quantum, I ended up starting a team working for Sergey Brin on quantum telecom, quantum sensors, and quantum machine learning.
[00:01:15] And Trevor ended up working at Google Quantum in hardware. Through our careers in quantum hardware, we realized noise was the bane of our existence. And so we decided to switch sides. You know, if you can't beat them, join them. So bring the noise to physics-based computing. A lot of our team comes from big tech.
[00:01:34] A lot of major players in quantum computing and ML. Alright, so as we all know, generative AI is eating the world. Generative models are getting bigger and bigger. You don't see it super well here, but AI compute demand is scaling exponentially, much faster than Moore's law ever did, even at its peak velocity.
[00:01:57] From a software and algorithms perspective, it seems the more data you throw at a problem, the more parameters, the more FLOPs you throw at a model, the better compression code you learn and the better performance you get.
So there's no real upper bound to how much appetite for compute there will be for AI.
[00:02:13] At least, it doesn't seem like it. So what's the bottleneck to scaling? There's a soft ceiling right now, where energy efficiency is a big problem, right? Training these models is getting exponentially more costly. This is CO2 equivalent, and this is just GPT-3. It's kept going since then.
[00:02:31] They're training GPT-5, probably with a nuclear reactor. People are going to great lengths, proposing building nuclear fusion to power supercomputers to train AI. So something has to change, right? That doesn't scale. It's not practical. We can't just have nation states training AIs.
[00:02:50] We want AIs to be everywhere. As Rodolfo mentioned earlier, Moore's Law is coming to an end. It doesn't matter what metric you look at for digital computers. But why is that? This is a plot from Michael Frank, who you're familiar with. Michael Frank predicts as well that by the early to mid 2030s, digital computing is coming to an end.
[00:03:14] Basically, transistors become stochastic when you go too low power or you make them too small, right? So they fire sometimes. So your computer becomes probabilistic, which is a very big problem if you have a digital deterministic computer. So you need a fundamentally new architecture, a new way to embed algorithms into the physics of the hardware.
[00:03:34] You don't have a choice but to go stochastic. And so we call it Moore's Wall. And we think we're going to start really feeling it by the end of the decade. And where does Moore's Wall come from? Again, it's because transistors are made of matter, and matter is jiggling about, right? Electrons have to hop through the transistor, and it's kind of this shaky bridge, right?
[00:03:54] And if the bridge is too small or too shaky, then your electrons don't make it through, right?
[00:04:00] But it turns out that you can use that sort of wibbly-wobbly physics to your advantage. And that's what we're doing. I will mention, as we can see here, we're about a few hundred times above Landauer's limit, which is the bottom line there.
[00:04:18] So we're still very far from Landauer's limit in terms of energy efficiency. And we have a proof of existence of a dissipative computer. Our thesis is that the brain is more or less a thermodynamic computer. And the brain has a hundred trillion parameters running on 20 watts. So it's a hundred million times more energy efficient and denser than GPU clouds.
[00:04:39] So it's not totally unfounded for there to be a huge physics-based overhang for better AI compute, right? Like, the fact that GPUs are good at AI was a stroke of luck to some extent. And so we want to take inspiration from nature. We want to take inspiration from biology. There's been a lot of effort in neuromorphic computing.
[00:05:00] But neuromorphic computing is kind of like being obsessed with biomimicry for its own sake. It's like building an airplane that flaps its wings. It doesn't make any sense, right? What you want to do is understand what physical principle nature has figured out how to harness, right? So you want to understand the principle of lift, and then you make an artificial device that achieves flight, like the Wright brothers.
[00:05:20] And so that's what we're doing, with a different kind of physics than what you've heard about today. We've heard a lot about quantum, which is the physics of the very small or the very cold over short timescales, so things are in superpositions. The physics of the very big or high power is deterministic, so you don't have any sort of uncertainty about the state of the device.
[00:05:42] But what we're building is algorithms and hardware operating in the thermodynamic limit. So in the thermodynamic regime, the state of your computer is uncertain.
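As a rough point of reference for the Landauer's limit mentioned at [00:04:00], the bound on erasing one bit is k_B · T · ln 2. The sketch below evaluates it and the per-parameter power implied by the brain figures quoted in the talk. The 300 K operating temperature and all function names are assumptions made for illustration; they are not from the talk.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy needed to erase one bit at a given temperature."""
    return K_B * temperature_kelvin * math.log(2)


def watts_per_parameter(power_watts: float, parameter_count: float) -> float:
    """Average power budget per parameter (a crude illustrative ratio)."""
    return power_watts / parameter_count


# Assumed room temperature of 300 K (not stated in the talk).
room_temp_limit = landauer_limit_joules(300.0)

# Figures quoted in the talk: 100 trillion parameters running on 20 watts.
brain_watts_per_param = watts_per_parameter(20.0, 100e12)

print(f"Landauer limit at 300 K: {room_temp_limit:.3e} J per bit")
print(f"Brain power per parameter: {brain_watts_per_param:.1e} W")
```

This only puts numbers on the two quantities named in the talk; it says nothing about how far any particular chip sits from the bound.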
You don't know the state of every degree of freedom. And everything's jiggling about. Everything's out of equilibrium.
[00:05:59] So out-of-equilibrium thermodynamics, more specifically, is what we're leveraging. So we want to build a physics-based computer that operates in the same mesoscale regime as the brain, and take inspiration from nature, and use the physics that nature has figured out how to hack to do probabilistic generative AI directly in the physics of hardware.
[00:06:21] And so, from the hardware perspective, it's clear that we have to go stochastic, we've got to go probabilistic. From the algorithmic standpoint, ultimately all generative AI is representing probability distributions. So you want to have more probabilistic algorithms.
[00:06:40] Unfortunately, if you're going to represent a probability distribution on a deterministic computer, you can only store vectors and matrices. So what do we do?
[00:06:48] You represent things with Gaussians and functional transformations that we call deep neural networks. Most deep neural networks convert some form of Gaussian distribution into or from your data, right? You're morphing one blob into the shape of your data distribution.
[00:07:10] And that's very limited. It's really hard to capture the tails of distributions. And in a sense your representation has to grow bigger and bigger to capture more complexity of the distribution. If you can store a probability distribution in your computer and morph it into the shape of your data, you can store high-complexity probability distributions on your computer.
[00:07:32] This is what we learned with quantum computing: because you maintain quantum coherence, you can have a buildup of quantum complexity over time.
That's what the quantum supremacy experiment demonstrated when we were at Google: if you keep compounding complex operations, you end up with a distribution of extremely high complexity.
[00:07:51] And so algorithms want to capture the tail events, right? Like a hurricane. You don't want to just have median weather predictions. We have some folks doing weather prediction in the audience. And so, if you want to go probabilistic, one option is variational approximations, so deep-learned representations. Those are very limited, like we said, and they're not as parameter efficient, because, like I said, as the complexity grows, you need more and more parameters.
[00:08:16] If you want to do probabilistic algorithms on a digital computer, you're kinda screwed. I mean, a lot of people on Wall Street run these algorithms. They're called Monte Carlo algorithms. They're very slow, because you've got to use a pseudo-random number generator and accept or reject proposals, and you're really trying to have a digital computer LARP as a probabilistic system.
[00:08:38] And it's extremely slow, because with these operations you've got to accept or reject every clock cycle. So in a sense you're injecting entropy and then you're filtering it out every time. So you're kind of heating up and cooling down your fridge every cycle. It's very energy inefficient and it's very slow.
[00:08:54] So there needs to be a new type of hardware, and that's what we're building at Extropic. We're building a full stack for what we call thermodynamic AI. It starts with the hardware, which we're gonna focus on in part today. The hardware does probabilistic inference and learning by heat dissipation.
[00:09:18] And we have algorithms, middleware, and software on how to map probabilistic ML algorithms onto this hardware, and ultimately we want to connect with modern deep learning.
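The accept/reject loop described at [00:08:16] can be illustrated with a textbook random-walk Metropolis sampler. This is a minimal sketch of the generic digital approach the talk criticizes, not Extropic's method. The function names, the quadratic test energy, the step size, and the choice of units with k_B·T = 1 are all assumptions for illustration.

```python
import math
import random


def metropolis_samples(energy, x0, n_steps, step_size=0.5, seed=0):
    """Random-walk Metropolis sampling from exp(-energy(x)), with k_B*T = 1."""
    rng = random.Random(seed)  # pseudo-random number generator
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Inject entropy: propose a random move.
        proposal = x + rng.gauss(0.0, step_size)
        delta = energy(proposal) - energy(x)
        # Filter it out again: accept downhill moves always,
        # uphill moves with probability exp(-delta).
        if delta <= 0 or rng.random() < math.exp(-delta):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def quadratic_energy(x):
    """A simple 'bowl'; its Boltzmann distribution is a unit Gaussian."""
    return 0.5 * x * x


samples, acceptance_rate = metropolis_samples(quadratic_energy, 0.0, 20_000)
sample_mean = sum(samples) / len(samples)
print(f"acceptance rate: {acceptance_rate:.2f}, sample mean: {sample_mean:.3f}")
```

Every iteration draws pseudo-random numbers and then discards some of the proposals, which is the per-step overhead the talk refers to.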
So we're gonna touch upon all of these, and I'm gonna be showing some numbers and a bunch of stuff for the first time here today.
[00:09:38] So yeah, it's gonna be a treat. Woo! Okay, so how does a thermodynamic computer work? Well, right now what I'm gonna talk about is a passive thermodynamic computer, which is essentially a computer... well, let's bridge to the space of algorithms real quick. There's a type of model, a way to parameterize probability distributions, where you parameterize an energy function, right?
[00:10:05] You could think of it like having a bowl, right? If I throw a bunch of bouncy balls into this bowl (you don't see the red dots very well, but there are samples there), they're going to aggregate according to a certain distribution at the bottom of this bowl. But they're going to have a certain spread.
[00:10:24] Right, so if I have a very simple bowl, I get a very simple Gaussian. Gaussians are not that interesting, they're not super general, they're trivial, and classical computers can represent them very well. So what you want is actually a full functional approximator of energies. Right, like a neural network.
[00:10:41] So you want a neural energy-based model, right? You want a neurally parameterized energy function. And what we know from physics is that if you let a system with a certain energy function equilibrate, you get what is called a Boltzmann distribution, which is this exponential distribution there that is normalized, right?
[00:11:00] So what we're doing is programmable bowls that have parameters for how you shape the bowl, and we're letting electrons be our bouncy balls and hop around this high-dimensional landscape, and then we get samples from the computer. It's pretty simple, but essentially you can have parameterized probability distributions that you let nature compute for you, very energy efficiently, right?
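The Boltzmann distribution referred to at [00:10:41] has a standard textbook form. It is written out here for reference; the symbols (θ for the tunable parameters of the "bowl", T for temperature, Z for the normalizer) are notation chosen for this note, not taken from the slides.

```latex
p_\theta(x) = \frac{1}{Z_\theta} \exp\!\left(-\frac{E_\theta(x)}{k_B T}\right),
\qquad
Z_\theta = \int \exp\!\left(-\frac{E_\theta(x)}{k_B T}\right) \mathrm{d}x .

% Special case: a quadratic bowl yields a Gaussian.
E(x) = \frac{k_B T}{2\sigma^2}\,(x - \mu)^2
\;\;\Longrightarrow\;\;
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
```

The second line is the "very simple bowl gives a very simple Gaussian" remark at [00:10:24]. A neurally parameterized E_θ replaces the quadratic with a more general function.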
[00:11:25] And so that's more or less how our hardware works, right? So here's an analogy: what would that look like for a mechanical system? If you have a bunch of masses and springs, the springs and masses are parameters. You could tune those parameters, and you could put them in a box of bouncy balls undergoing Brownian motion, and eventually the system will equilibrate. You could observe it, and at equilibrium you're going to get samples from this exponential distribution, right?
[00:11:51] And you could break down a lot of probabilistic inference, machine learning algorithms, and generative AI, as we'll see, into these building blocks. Actually, you could do everything with this. So what does it look like at the firmware layer? Well, slightly above just the pure hardware, we want to train the system, right?
[00:12:11] And everything I'm talking about has a crap ton of patents, so if anybody's tempted, bring it on. But yeah, basically you could tune the parameters according to a gradient-based algorithm, and you could train the system to learn any distribution you want. Here's a basic training of a distribution.
[00:12:30] That's basically how it works at the firmware layer. There are some people that claim to build thermodynamic computers but only offer passive thermodynamic computers for Gaussians. We think that's not universal, right? You can't achieve universality, you can't represent arbitrary probability distributions, and it's not that interesting from a physics-based computing standpoint.
[00:12:54] It's much better to map much more of the algorithm into the physics, and so what we do is basically full neural EBMs in hardware, and we can learn general distributions. What can you do with EBMs? Well, you're probably familiar with DALL-E or Midjourney or ChatGPT video or whatever.
[00:13:13] They're all diffusion models.
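The mass-and-spring analogy at [00:11:25] can be simulated in software with overdamped Langevin dynamics, where a particle drifts down the energy gradient while being kicked by thermal noise. The sketch below is a software stand-in under stated assumptions, not a model of Extropic's hardware. The spring constant, time step, burn-in length, function names, and the choice of units with k_B·T = 1 are all hypothetical.

```python
import math
import random


def langevin_samples(grad_energy, x0, n_steps, dt=0.01, seed=0):
    """Overdamped Langevin dynamics in units where k_B*T = 1 (an assumption).

    Each step drifts down the energy gradient and adds Gaussian thermal noise.
    """
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2.0 * dt)
    trajectory = []
    for _ in range(n_steps):
        x += -grad_energy(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory


SPRING_CONSTANT = 4.0  # hypothetical stiffness: the tunable parameter


def spring_grad(x):
    """Gradient of the spring energy E(x) = 0.5 * k * x**2."""
    return SPRING_CONSTANT * x


trajectory = langevin_samples(spring_grad, x0=0.0, n_steps=100_000)
equilibrated = trajectory[1_000:]  # discard an initial burn-in period
mean = sum(equilibrated) / len(equilibrated)
variance = sum((x - mean) ** 2 for x in equilibrated) / len(equilibrated)

# At equilibrium the variance should be close to 1 / SPRING_CONSTANT.
print(f"mean: {mean:.3f}, variance: {variance:.3f}")
```

Changing `SPRING_CONSTANT` reshapes the bowl and therefore the spread of the samples. That is the sense in which the physical parameters act as model parameters.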
So diffusion models, what they do is take data and noise it progressively over maybe a thousand steps, and then they use learned Gaussians to go backwards, to learn a variational denoising map, right? And you have thousands and thousands of steps. That's what takes so long for your diffusion model to run.
[00:13:29] If you've ever used Midjourney, it takes minutes, right? So if you shortcut, if you smash down these Gaussians and these trajectories into more expressive models like EBMs, you can get away with a hundred times fewer steps. There's a paper, denoising recovery likelihood with EBMs. It's a paper by Google from a couple of years ago, but you could basically do diffusion models with a chain of denoising EBMs, right?
[00:13:53] So this machine can do basically everything that diffusion models can do, but better. So that's pretty great, that's a lot of applications. The next slide is very fresh, specially cooked for today. It's very preliminary, so, you know, wait for the paper.
[00:14:16] So yeah, disclaimer, wait for the paper. But we thought, okay, can you do a whole transformer as physics? Can you crack that? And we've invented that. We figured out how to train and do inference on transformers. And it turns out, if you're gonna build it in a substrate that I'm gonna show (this is for superconductors),
[00:14:35] you can make servers that are more or less parallel. You can achieve a ridiculous amount of tokens per watt-second, right? Tokens per joule: tokens per second divided by watts. It's about 100 million times more energy efficient, basically. Which, again, is in the ballpark of the brain.
[00:14:58] Again, this is not a machine we've built. It's in simulations, so it's a projection. I'll show you in a second the machine we've actually built, but this is extrapolated from the data we got there and from simulations that are as accurate as we could make them.
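The progressive noising over "maybe a thousand steps" described above can be sketched with the closed-form forward process used by standard Gaussian diffusion models. The linear schedule endpoints (1e-4 to 0.02) are a commonly used choice and are assumed here, as are all function names. None of these specifics come from the talk.

```python
import math
import random


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def cumulative_signal_fraction(betas):
    """Running product of (1 - beta): how much of the original signal is left."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sample(x0, t, alpha_bars, rng):
    """Jump straight to step t: scaled signal plus matching Gaussian noise."""
    a = alpha_bars[t]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal_fraction(betas)
rng = random.Random(0)

x_early = noise_sample(2.0, 0, alpha_bars, rng)   # almost unchanged data
x_late = noise_sample(2.0, 999, alpha_bars, rng)  # almost pure noise
print(f"signal left after last step: {alpha_bars[-1]:.2e}")
```

The reverse direction, which the talk proposes replacing with a shorter chain of denoising EBMs, is the expensive learned part and is not shown here.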
In terms of speed, it's about a thousand to ten thousand times faster than a GPU for inference of deterministic neural networks.
[00:15:19] For Monte Carlo sampling, it's about a million to ten million times faster if the problem is native. Because again, basically, you're replacing Monte Carlo steps that you would code at, you know, a thousand hertz with electron fluctuations that happen at 80 gigahertz. So this is our superconducting lab.
[00:15:40] This is as macroscopic as you can make a thermodynamic computer. Anybody claiming they can do a breadboard in their garage that's a thermodynamic computer injected synthetic noise, and it's a LARP. So we had to go to great lengths to make as macroscopic a prototype as we could. This chip is about as big as your thumbnail, but the features are visible.
[00:16:00] And we have a whole cryogenic lab to fabricate and test these devices. These use superconducting Josephson junctions to create, again, general energy functions over the continuum. There's a whole movie online, you can check it out. We showed our fab and lab and how we've manufactured these devices.
[00:16:23] But essentially, were you to scale up a massive superconducting supercomputer, you could have the most energy-efficient transformer you can build. For us this was a stepping stone. This was like our breadboard prototype. This fab is in Sherbrooke, Canada. On the left is our chip.
[00:16:44] So, you know, we have 25 patents and counting on this paradigm, so come talk to us if you're interested in scaling this. So that's our first thermodynamic computer. This is a programmable energy-based model device. I'm announcing today that we're going to be open sourcing the blueprint for the software and hardware, so how we can do all sorts of algorithms and how we can map algorithms onto superconducting hardware, by end of year 2024.
[00:17:11] So check out the white paper when it comes out. Stay tuned on Twitter and socials and so on. So why are we burning our boats? Well, the timelines are accelerating, and essentially we cracked how to do thermodynamic computing in silicon, which is obviously important for large-scale manufacturability.
[00:17:32] The energy efficiency and speed you can achieve if you have a hardware-native energy-based model are pretty significant, so those are our orders of magnitude. Our goal is to pack more intelligence per watt and per nanosecond than any other substrate. And so we're partnering with customers where that matters a lot and who are ready to put some dollar signs behind that.
[00:17:56] So those are simulations. We're taping out by end of year, and testing begins early 2025. So we're looking for partners who want to try the very first chips and partner with us on the software and hardware. But yeah, it's us, we're Extropic, we're the real pioneers of ceramic computing, and I'll be taking questions during the break.
[00:18:16] Thanks so much, everyone.

Sterling Crispin 🕊️ • 1 year ago

lol insanely based good job @BasedBeffJezos 🫡 can’t think of another tech startup I want to succeed more than extropic This is the kind of stuff Kurzweil promised that got me into tech 15 years ago I’m so stoked to see the progress

RiverSonic Solutions • 1 year ago

This reminds me of my physics prof at Oxford, Neville Robinson, who would solve impossible differential equations by wiring transistors together in strange ways and using their non-linearity and noise to get the answer on an oscilloscope. No clock, no registers... Well done!

Y • 1 year ago

You pivoted from thermodynamic to ceramic?

Peter Ciaccia — ggt/acc • 1 year ago

new branch of the tech tree cost next level science XP to unlock

ra 0of • 1 year ago

why are comments like pumping a meme coin?

Biotech Austin • 1 year ago

You guys did a lot more bad ass stuff than I thought

dhanush • 1 year ago

Very bullish on extropic :)

Gok • 1 year ago

seems pretty meaningless with no measure of width or output quality
