Beff (e/acc) explains how thermodynamic computing fits into the current computing stack: "We don't do GPUs. They're called TSUs, thermodynamic sampling units. We think people are gonna put them next to GPUs or whatever their favorite accelerator is." "You don't need to run the whole workload on the TSU,...

42,570 views • 23 days ago • via X (Twitter)

Related videos

Culture is genetic because behavior is genetic. This beaver never saw a dam in its life. No beavers or anything else ever taught it to build a dam. It wants to build a dam because it is a beaver. Many beavers together build a big dam. That is beaver culture. Humans are not different. Nothing is different. This is what life is. This is how life works. Your body is your mind. A caterpillar wants to build a chrysalis. A bee wants to build a hive. A lion wants to build a pride. You are not special. You are not above your nature. you are INSIDE of it. The thoughts that we think are genetic thoughts. The crimes we commit are genetic crimes. The art we create is genetic art. Just like this beaver, you can give the animal different sticks and it will build a different dam, but it will always build a dam. And you can give humans different "education," but the human will always use it to do what its genes tell it to do. This is the first big answer that you need. This is the biggest piece of the puzzle. This is how to understand people 90% of the way. You just... notice what they do, and get out of the way, and watch them do it. And if they need sticks, you give them sticks. And if you don't like what they do, you have to get away from them. You cannot train dam-building into them or out of them any more than you can with a beaver. A beaver wants to build a dam because it is a beaver. Whatever you see people build, that's what they wanted to build from the sticks they got in the river they were in. Stop pretending you can change it.

hoe_math = PsychoMath

1,189,157 views • 9 months ago

Thermodynamic computing is here

There is a new computing paradigm emerging from the noise, and its arrival may be as significant as the dawn of deep learning or the advent of cloud virtualization. A new company, Extropic, has just launched its first thermodynamic computer, a device they call a TSU, or thermodynamic sampling unit. While the web is already filling with deep technical dives, what’s more important for most of us is building a clear intuition for what this technology is, how it’s fundamentally different from anything that’s come before, and why it’s generating so much excitement. This isn’t just another chip; it’s a new way to think about computation itself.

Seeing is Believing: Solving Puzzles in One Shot

To understand what a TSU does, let’s look at two classic, notoriously difficult computer science problems: Sudoku and the Eight Queens problem. When you or I solve a Sudoku, we use a process of sequential logic, guess-and-check, and backtracking. We make an assumption, follow its logical conclusion, and if we hit a dead end, we erase and try again. A classical computer does the same, just much faster.

A TSU, however, approaches this in a completely different way. Using a TSU simulator, one can “program” the problem by first clamping the known values: the clues already on the board. Then, you program in the constraints: no duplicate numbers in any row, column, or 3x3 square. With the problem thus defined, the TSU doesn’t “search” for a solution; it anneals one. In a single computational step, the solution simply emerges, backfilling all the empty squares correctly.

The same principle applies to the Eight Queens problem, a challenge to place eight queens on a chessboard so that none can attack any other. This is a complex combinatorial problem with 92 distinct solutions. A classical computer would have to iteratively search for these. A TSU, by contrast, can be programmed with the constraints (the “anti-affinity” between queens on the same row, column, or diagonal) and then set to sample the “solution space.” In this context, a valid solution is one with a “problem energy” of zero. The TSU’s physical nature allows it to naturally find these zero-energy states. A simulation of this process shows the TSU discovering all 92 unique solutions, demonstrating its ability to not just find an answer, but to explore the entire landscape of all correct answers. This is a fundamentally new approach, one that bypasses the brute-force, iterative methods we’ve relied on for decades.

The Physics of Computation: Using Noise, Not Fighting It

This new power comes from a radical design philosophy. For the last 70 years, computing has been about one thing: order. We build chips that are deterministic, logical, and precise. The great enemy has always been noise, heat, and randomness. We spend billions on cooling and error correction to eliminate these very things. Quantum computing, in many ways, is the ultimate expression of this, requiring temperatures near absolute zero to eliminate all thermal noise and achieve quantum coherence.

Thermodynamic computing is the polar opposite. It doesn’t fight the noise; it uses it. The TSU is built on the understanding that the natural, stochastic noise from “leaky” transistors, the very randomness we’ve tried to engineer out of existence, is itself a powerful computational resource.

Think of it this way: a GPU, which is central to today’s AI, has to simulate noise. When a generative AI model creates a new image or sentence, it’s using complex algorithms to fake randomness. The TSU doesn’t need to fake it; it harnesses the actual physical randomness of thermodynamics. It is a piece of hardware that directly computes with probability. This makes it a hybrid, sitting somewhere between a purely analog computer (which might use light or sound waves to compute) and a digital GPU. It’s a physical device that leverages the laws of physics itself to find solutions, rather than just using logic gates to simulate them.

From a Lost Hiker to a Million Bouncy Balls

Perhaps the best way to build intuition is with a metaphor. Imagine that solving a complex optimization problem is like trying to find the lowest point of altitude in a 100-square-mile mountainous landscape.

Classical computing, using an algorithm like gradient descent, is like being a single hiker dropped into this landscape at night. You have no map or satellite view. All you have is an altimeter and the sensation of the slope under your feet. You can only take one step at a time, always walking downhill, hoping you don’t get stuck in a small local valley when the true, lowest canyon is miles away.

Thermodynamic computing is a completely different approach. It’s like having a million bouncy balls and a helicopter. You drop all million balls simultaneously across the entire 100-square-mile landscape. Then, you “turn on an earthquake,” shaking the entire system. The balls bounce and jostle, but as the shaking (the “annealing”) subsides, where do they all end up? They naturally settle into the lowest points. The balls that collect in the deepest valley represent the optimal solution. The TSU is, in essence, a physical device for dropping those million balls at once and letting the laws of thermodynamics find the lowest “energy” state for you, all at the same time.

Beyond Puzzles: The Real-World Impact

This is far more than just a clever way to solve brain teasers. This ability to instantly find the lowest energy state for a complex, constrained system has staggering real-world applications.

One of the most immediate is protein folding. Companies like Google’s DeepMind have made incredible progress with AI like AlphaFold, which predicts protein structures. But this is still a predictive model trained on existing data. A TSU could potentially solve the folding problem directly, treating the protein as a system of atomic affinities and repulsions and finding its most stable, lowest-energy configuration almost instantaneously. This could revolutionize drug discovery and materials science.

An even more profound possibility lies in nuclear fusion. One of the greatest engineering challenges in history is controlling the superheated plasma within a tokamak reactor. This requires shaping unimaginably complex magnetic containment fields in real-time to prevent the plasma from touching the reactor walls. This is a real-time optimization problem so complex it’s currently beyond our capabilities. A TSU, however, could be fast enough. Its ability to compute with electricity itself, rather than abstracting the problem through layers of software, might allow it to update the magnetic fields fast enough to stabilize the fusion reaction. One could even imagine a future where thermodynamic computing elements are built directly into the tokamak’s walls, allowing the reactor to physically and intelligently react to the plasma’s state in real time.

A ‘GPT-2 Moment’ for a New Era

It’s easy to become numb to hype, but what we are witnessing with the TSU feels different. This is what you might call a “GPT-2 moment.” For those who were there, GPT-2 was the first generative AI model that wasn’t just a toy; it was the first time you could play with it at home and see the spark of true generative intelligence. It was the precursor that pointed directly to the GPT-3 and ChatGPT revolution that has since changed the world.

This TSU has that same feel. It’s the “SDK” for a new computing paradigm. This technology is as different from classical computing as quantum computing is, but with a critical difference: a team of 15 built this in two years, and it runs at room temperature on your desk. Quantum computing has seen decades of work and billions in funding, and it still hasn’t produced a commercially viable, scalable machine. The TSU is here now.

Based on a two-decade-long career at the cutting edge of technology, from seeing the obvious future of virtualization in 2007 to an early conviction in deep learning and GPT, this has all the same hallmarks of a fundamental, world-changing shift. We are not just building faster calculators; we are learning to compute with the universe itself. Pay close attention to this. This is the next big thing.

David Shapiro (L/0)

83,649 views • 7 months ago
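Editor's sketch for the Eight Queens passage in the post above: the "problem energy" idea can be tried in ordinary software. The code below is plain simulated annealing on a CPU, not TSU hardware and not Extropic's simulator; the function names, the move rule, and the temperature schedule are all assumptions chosen for illustration. It defines the energy as the number of attacking queen pairs, lets random noise shake the board while the temperature falls, and separately counts the zero-energy boards by brute force to check the figure of 92.

```python
import itertools
import math
import random


def energy(cols):
    """Count attacking queen pairs; cols[r] is the column of the queen in row r."""
    n = len(cols)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if cols[a] == cols[b] or abs(cols[a] - cols[b]) == b - a
    )


def anneal(n=8, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Software simulated annealing (an illustrative stand-in, not a TSU)."""
    rng = random.Random(seed)
    cols = [rng.randrange(n) for _ in range(n)]
    e = energy(cols)
    for step in range(steps):
        if e == 0:
            break  # zero energy means no queen attacks another
        # Geometric cooling: the "shaking" subsides from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        row, new_col = rng.randrange(n), rng.randrange(n)
        old_col = cols[row]
        cols[row] = new_col
        e_new = energy(cols)
        # Always accept downhill or sideways moves; accept uphill moves
        # with a probability that shrinks as the temperature drops.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
        else:
            cols[row] = old_col  # reject the move
    return cols, e


def count_zero_energy_states(n=8):
    """Brute-force count of boards with energy 0 (one queen per row and column)."""
    return sum(1 for p in itertools.permutations(range(n)) if energy(p) == 0)


if __name__ == "__main__":
    board, final_energy = anneal()
    print("board:", board, "energy:", final_energy)
    print("zero-energy boards:", count_zero_energy_states())
```

A single run may stop in a low-energy state that is not zero, which is why the post's "million bouncy balls" picture matters: many independent runs (different seeds) sample different valleys.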

Cathie Wood just flagged the sleeper trade inside the AI boom that most people are completely missing. Everyone has been chasing GPUs. Nvidia, the data center buildout, the chip arms race. That trade has been obvious for two years. But OpenAI's CFO Sarah Friar said something quite different: people are going to be really shocked by how agentic AI activates CPUs. Right now, for every CPU in an AI workload, there are 4 to 5 GPUs. That's the current ratio. Wood thinks that ratio is going to 1 to 1. Think about what that means. AI inference at scale, agents running autonomously, pipelines executing tasks across systems. The compute mix shifts dramatically away from pure GPU dominance. CPUs become a first-class citizen in the AI stack. Cathie called it going "back to the future." Intel has taken off. Flex (formerly Flextronics) is booming. Stocks that were giants in the dot-com bubble are resurging because the underlying demand for their products is real again. The GPU trade made sense at the training stage. You need massive parallel compute to train frontier models. But agentic AI runs differently. Agents are constantly orchestrating, reasoning, calling APIs, executing workflows. That workload looks a lot more like traditional computing. And traditional computing runs on CPUs. If Cathie Wood is right about the ratio collapsing to 1:1, the CPU demand signal embedded in the AI buildout is orders of magnitude larger than the market is currently pricing.

Milk Road AI

234,596 views • 18 days ago

David Sacks Explains How AI Will Go 1,000,000x in Four Years

"I would say the rate of progress is exponential right now on at least three key dimensions."

1) The models

"So number one is the algorithms themselves. The models are improving at a rate of, I don't know, 3-4x a year." "They're not just getting faster and better, but qualitatively they're different." "Remember, we started with pure LLM chatbots." "Then we went to reasoning models." "We didn't even get to the agents part of it yet, but that's the next big leap after reasoning models." "We're just starting to scratch the surface there."

2) The chips

"Then you've got the chips." "Depending on how you measure it, each generation of chips is probably 3-4x better than the last." "It's not just the individual chips that are getting better, they're figuring out how to network them together." "Like with NVL72, it's like a rack system to create much better performance at the datacenter level."

3) The compute

"And that would be the third area where you're seeing basically exponential progress." "Just look at the number of GPUs that are being deployed in datacenters." "So when Elon first started training Grok, I think they had maybe 100K GPUs. Now they're up to 300K. They're on their way to a million. Same thing with OpenAI's data center, Stargate." "And within a couple years they'll be at, I don't know, 5M GPUs, 10M GPUs?"

How Sacks gets to 1,000,000x:

"The algorithms, the chips, and the datacenters are all improving or scaling at a rate of 3-4x a year." "That's 10x every two years." "Where people don't understand exponential progress is that if you're getting better at 10x every two years, that doesn't mean you'll be at 20x in four years." "It means you'll be at a 100x." "So you multiply those things together: the algorithms, the chips, and the raw compute that's available."

100x models 🧠 x 100x chips 💾 x 100x compute ⚡️ = 1,000,000x AI 🤖

"You're talking about 1,000,000x increase." "Some of which will be captured in price reductions, some of it will be in the performance ceiling, and then some of it will just be in the overall amount of AI compute that's available to the economy." "But the impact of this thing is gonna be absolutely massive." "And I think people still don't even appreciate that fact because they don't understand exponential progress."

The All-In Podcast

486,049 views • 1 year ago
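Editor's sketch for the compounding arithmetic in the Sacks post above. "10x every two years" corresponds to about 3.16x a year (the square root of 10), which sits inside the quoted 3-4x range. The helper name and the choice to treat the three dimensions as independent multipliers are assumptions that follow the post's own framing; the code simply shows how sensitive the headline number is to the annual rate.

```python
def combined_gain(rate_per_year, years=4, dimensions=3):
    """Multiply the same compounded gain across several independent dimensions."""
    per_dimension = rate_per_year ** years
    return per_dimension ** dimensions


if __name__ == "__main__":
    for rate in (3.0, 10 ** 0.5, 4.0):
        two_years = rate ** 2
        four_years = rate ** 4
        total = combined_gain(rate)
        print(
            f"{rate:.2f}x/year -> {two_years:,.0f}x in 2 years, "
            f"{four_years:,.0f}x in 4 years, {total:,.0f}x combined"
        )
```

At exactly 3x a year the combined figure is 531,441x, at 4x a year it is 16,777,216x, and the quoted 1,000,000x falls out only at the 10x-every-two-years rate.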

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results.

Let's talk about three techniques, in order of complexity, starting with the easiest one:

• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning

In-Context Learning

The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.

Indexing + In-Context Learning

Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that.

Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation.

Fine-tuning

Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data.

A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier.

Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model.

Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches.

I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me, Santiago, so you don't miss what comes next.

Santiago

384,482 views • 3 years ago
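Editor's sketch for the "Indexing + In-Context Learning" section of the post above. A real system would call an embedding model and a vector database; here both are replaced by assumptions so the flow can run anywhere: `embed` is a toy bag-of-words counter, the "database" is a Python list, and the prompt wording is invented for illustration. The steps are the ones the post describes: embed every passage, embed the question, retrieve the most similar passages, and place only those in the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(passages):
    """Stand-in for a vector database: a list of (passage, embedding) pairs."""
    return [(p, embed(p)) for p in passages]


def retrieve(index, question, k=2):
    """Return the k passages whose embeddings are closest to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(index, question, k=2):
    """Put only the retrieved passages in the prompt as in-context information."""
    context = "\n".join(f"- {p}" for p in retrieve(index, question, k))
    return (
        "Answer using only the passages below.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    book = [
        "Mara found the brass key under the lighthouse stairs.",
        "The storm lasted three days and sank two fishing boats.",
        "Old Tobias taught Mara how to read the tide tables.",
    ]
    index = build_index(book)
    print(build_prompt(index, "Where did Mara find the brass key?"))
```

The resulting prompt would then be sent to whichever LLM you use; word-count vectors only match on shared vocabulary, which is the main thing a learned embedding model improves on.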