4 MAC MINIS RUNNING A 671B PARAMETER MODEL AS A CLUSTER

No data center, no cloud, no expensive hardware, and not a single API call. Just 4 Mac minis connected through EXO, running DeepSeek v3.1 671B locally and actually fast. The part nobody talks about is that you don’t...

58,724 views • 12 days ago • via X (Twitter)
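
For a sense of scale, here is a rough back-of-the-envelope sketch of how much memory the weights of a 671B-parameter model would need when split evenly across 4 machines. The post does not say which quantization or how much RAM each Mac mini has, so the bit widths below are assumptions for illustration only, and the estimate covers weights alone (no KV cache, activations, or runtime overhead).

```python
PARAMS = 671e9  # parameter count quoted in the post
NODES = 4       # number of Mac minis quoted in the post


def weight_gb(params: float, bits_per_param: int) -> float:
    """Weight storage in decimal gigabytes (weights only)."""
    return params * bits_per_param / 8 / 1e9


# Hypothetical precisions; the post does not state which one was used.
for bits in (4, 8, 16):
    total = weight_gb(PARAMS, bits)
    print(f"{bits:>2}-bit: {total:7.1f} GB total, {total / NODES:6.1f} GB per node")
```

Under the 4-bit assumption this comes to 335.5 GB of weights, or roughly 84 GB per machine if the split is even.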

Related videos

I'm running Llama 4 Maverick at 620 t/s! I'm living in the future! Honestly, a large language model running this fast is something straight out of a sci-fi movie. Speeds like this will enable a whole new world of applications that aren't possible today. For reference, GPT-4o, which is probably the most popular OpenAI model, runs between 60 and 110 t/s.

The secret here: I'm not running Meta's Llama 4 Maverick on a GPU. I'm using the SambaNova Cloud (my sponsor) and their custom SN40L chips. They are optimized from the ground up for running AI workflows.

Right now, SambaNova Cloud runs DeepSeek, Qwen, Whisper, and the entire family of Llama models on these chips. You can check the speed of each of these models using SambaNova Cloud's Playground (see the attached video). It's completely free, and that's how I'm measuring their speeds.

For example, I also tried DeepSeek R1 (the latest version from May) and, oh boy! DeepSeek R1 is a huge 671B parameter model. It's probably the best open reasoning model in the world, and it runs at 140 tokens per second! Inference time on an SN40L is night and day compared with what you'll get from a GPU.

Here is why this is big: If you are running an agentic workflow that uses multiple models simultaneously on a GPU, it will need to swap models in and out of memory (because not every model fits). A single SN40L chip can simultaneously hold over 100 models (trillions of parameters) in memory.

If you are using open models, try the SambaCloud API to see what lightning speed looks like. Here is how:
1. Create a free account at:
2. Check the QuickStart guide:

If you try the playground, check the speed you're getting with Llama 4 and DeepSeek, and post the results below. I've seen much higher numbers than I posted here, so I'm curious to see whether geography affects the speed.

Santiago

34,148 views • 1 year ago
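
To put the throughput figures quoted in the post above side by side, here is a small sketch that converts tokens per second into wall-clock time for a single response. The rates are the ones the author reports and are not independently measured; the 1,000-token response length is a hypothetical value chosen for illustration.

```python
# Throughput figures as quoted in the post, in tokens per second.
QUOTED_RATES = {
    "Llama 4 Maverick on SN40L": 620,
    "DeepSeek R1 on SN40L": 140,
    "GPT-4o (upper bound)": 110,
    "GPT-4o (lower bound)": 60,
}

RESPONSE_TOKENS = 1_000  # hypothetical response length, not from the post


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Decode time only; ignores time to first token and network latency."""
    return tokens / tokens_per_second


for name, rate in QUOTED_RATES.items():
    secs = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{name:<28} {rate:>4} t/s  {secs:5.1f} s per {RESPONSE_TOKENS} tokens")
```

Taken at face value, 620 t/s is about 5.6 to 10.3 times the 60 to 110 t/s range the post gives for GPT-4o.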

The doomsday scenario was never AGI. It was running out of human text to train on. Geoffrey Hinton just killed that fear in one paragraph.

Hinton: “If you are worried by inconsistencies in what you believe, you don’t need any more external data. You just need the stuff you believe and discover that it’s inconsistent, and so now you revise beliefs, and that can make you a whole lot smarter.”

The model no longer needs us to feed it anything. It reasons over its own beliefs, hunts its own contradictions, and rewrites its own flawed conclusions without a human ever touching it. It comes out the other side rebuilt.

Hinton: “This would be a neural net that just takes the beliefs it has in language and does reasoning on them to derive new beliefs.”

This is not a scaling update. This is the machine mining its own cognitive fuel from the inside out.

Hinton: “I believe Gemini is already starting to work like this. We both strongly believe that that’s a way forward to get more data for language.”

Then Hinton paused, took a partisan shot at political opponents for failing to detect their own inconsistencies, and the room laughed. Nobody noticed the knife they had just walked into.

Because the machine Hinton described does one thing the humans in that room fundamentally cannot. When it detects an inconsistency, it corrects it. No defense. No performance. No tribal loyalty dressed up as principle. It just finds the flaw and overwrites it.

A neural network detects a contradiction and rewires itself smarter. A human detects a political opponent and trades structural logic for a dopamine hit. Every person in that room is still paying the ideological alignment tax the machine just eliminated.

We need superintelligence not only to solve hard problems. We need it because the biological hardware running civilization is still executing the same tribal firmware it shipped with ten thousand years ago.

The data wall is gone. The machine is generating its own intelligence at a velocity no human bias can even locate. The most devastating moment in that conversation was not the technical revelation. It was the man who architected the machine proving, in real time, exactly why we need it.

Dustin

23,499 views • 3 months ago