Video wird geladen...

Video konnte nicht geladen werden

Beim Laden dieses Videos ist ein Problem aufgetreten. Dies könnte an einem vorübergehenden Netzwerkproblem liegen oder das Video ist möglicherweise nicht verfügbar.

4 MAC MINIS RUNNING A 671B PARAMETER MODEL AS A CLUSTER No data center, no Cloud, no expensive hardware and not a single API call.. Just 4 Mac minis connected through EXO running DeepSeek v3.1 671b locally and actually fast. The part nobody talks about is that you don’t... show more

slash1s

8,250 subscribers

58,724 Aufrufe • vor 12 Tagen •via X (Twitter)

Wissenschaft & Technologie

Anya Rossi• Live Now

Private livecam show

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

Speed running my home AI cluster running distributed inference across 2 MacBooks and 2 Mac Minis. exo intern displays a real-time network topology as devices discover each other over the local network. Code is open source 👇

Speed running my home AI cluster running distributed inference across 2 MacBooks and 2 Mac Minis. exo intern displays a real-time network topology as devices discover each other over the local network. Code is open source 👇

Alex Cheema

110,105 Aufrufe • vor 1 Jahr

the CEO of NVIDIA just said the PC you use today will be gone, replaced by an AI supercomputer that lives in your house Jensen Huang stood on stage holding the machine and said it becomes less like a computer and more like R2-D2, a thing in your home that just does work for you all day he put it bluntly, this is as big as the day the phone became the smartphone and the wild part is the hardware to do it already exists and ships this year here is what he actually unveiled: > a desktop that runs a one trillion parameter AI model locally, 768GB of memory, sitting by your desk > agents that run 24/7 with no meter, no cloud bill, no rental, doing work while you sleep > NVIDIA and Microsoft rebuilding the PC from the ground up for the first time in 40 years > a full lineup, from a $249 chip to enterprise monsters, and almost nobody knows which one they actually need the hype is going to push everyone toward the biggest most expensive machine i wrote the honest breakdown, every NVIDIA AI box, what each really does, the real math, and which one is actually yours the full guide is in the article below

the CEO of NVIDIA just said the PC you use today will be gone, replaced by an AI supercomputer that lives in your house Jensen Huang stood on stage holding the machine and said it becomes less like a computer and more like R2-D2, a thing in your home that just does work for you all day he put it bluntly, this is as big as the day the phone became the smartphone and the wild part is the hardware to do it already exists and ships this year here is what he actually unveiled: > a desktop that runs a one trillion parameter AI model locally, 768GB of memory, sitting by your desk > agents that run 24/7 with no meter, no cloud bill, no rental, doing work while you sleep > NVIDIA and Microsoft rebuilding the PC from the ground up for the first time in 40 years > a full lineup, from a $249 chip to enterprise monsters, and almost nobody knows which one they actually need the hype is going to push everyone toward the biggest most expensive machine i wrote the honest breakdown, every NVIDIA AI box, what each really does, the real math, and which one is actually yours the full guide is in the article below

John Doe

193,435 Aufrufe • vor 20 Tagen

Have you been mass buying Mac Minis for your new OpenClaw🦞 workflows? One man who skated to puck is Alex Cheema, founder of EXO Labs! He joins TWiST today to show @jason how his open source software helps users connect all of their Mac Minis. Check out how you can start running your own AI compute cluster!

Have you been mass buying Mac Minis for your new OpenClaw🦞 workflows? One man who skated to puck is Alex Cheema, founder of EXO Labs! He joins TWiST today to show @jason how his open source software helps users connect all of their Mac Minis. Check out how you can start running your own AI compute cluster!

This Week in Startups

680,613 Aufrufe • vor 4 Monaten

Google just shipped a free dictation app for Mac and iPhone called AI Edge Eloquent, and the model running it is Gemma 4 12B, entirely on your device. What it actually does: you speak, it transcribes, removes filler words, and polishes the result before dropping clean text into whatever app you are using. You can also give it voice commands to reformat, like "translate to Hindi" or "make this a bullet list." Audio and video file transcription is built in too. The interesting part is the privacy angle. Most AI writing tools send your audio to a server. This one never leaves your machine, which matters if you are dictating anything sensitive. The tradeoff is hardware: you need a Mac with at least 16GB of RAM to run a 12B parameter model comfortably.

Google just shipped a free dictation app for Mac and iPhone called AI Edge Eloquent, and the model running it is Gemma 4 12B, entirely on your device. What it actually does: you speak, it transcribes, removes filler words, and polishes the result before dropping clean text into whatever app you are using. You can also give it voice commands to reformat, like "translate to Hindi" or "make this a bullet list." Audio and video file transcription is built in too. The interesting part is the privacy angle. Most AI writing tools send your audio to a server. This one never leaves your machine, which matters if you are dictating anything sensitive. The tradeoff is hardware: you need a Mac with at least 16GB of RAM to run a 12B parameter model comfortably.

Aditya ⚡Rao

84,413 Aufrufe • vor 13 Tagen

Someone put a Claude Code sticker on a $599 Mac Mini and accidentally built the most productive setup in tech. Dedicated machine, zero distractions, Claude 4.7 Opus running 24/7 with one prompt on the screen. What do you want the computer to do? Fork a repo, write the code, create the file, find the article, handle whatever comes next, repeat. No laptop, no cloud subscription, no context switching, no "let me pull that up real quick" Just a box the size of a hardcover book sitting on a desk doing the work of a full time hire around the clock. People stopped using Mac Minis as computers, they're using them as Claude, $599 once and it never clocks out.

Someone put a Claude Code sticker on a $599 Mac Mini and accidentally built the most productive setup in tech. Dedicated machine, zero distractions, Claude 4.7 Opus running 24/7 with one prompt on the screen. What do you want the computer to do? Fork a repo, write the code, create the file, find the article, handle whatever comes next, repeat. No laptop, no cloud subscription, no context switching, no "let me pull that up real quick" Just a box the size of a hardcover book sitting on a desk doing the work of a full time hire around the clock. People stopped using Mac Minis as computers, they're using them as Claude, $599 once and it never clocks out.

Defileo🔮

70,289 Aufrufe • vor 2 Monaten

This is all you need for running Deepseek R1 671B at home. Her the model with Q4 running on a $2000 Local Machine - 3.5 to 4 tokens per second. 512GB DDR4 RAM Video from "Digital Spaceport" YT Channel (link in comment)

This is all you need for running Deepseek R1 671B at home. Her the model with Q4 running on a $2000 Local Machine - 3.5 to 4 tokens per second. 512GB DDR4 RAM Video from "Digital Spaceport" YT Channel (link in comment)

Rohan Paul

311,988 Aufrufe • vor 1 Jahr

THIS DEVELOPER JUST RAN A TRILLION PARAMETER MODEL ON 4 MAC STUDIOS - 10X FASTER AND 5X CHEAPER THAN CLOUD CODE 19:00 he says it out loud. "we just ran a trillion parameter model. 30 something tokens per second. wow." RDMA over Thunderbolt made the cluster 10x faster than before. tensor parallelism splits the model between machines in parallel instead of sequentially DeepSeek V3 full size without quantization. 183GB per machine. still 25 tokens per second. faster than most people can read Kimi K2 with trillion parameters and 256,000 context window. loaded, responded and adapted memory usage dynamically based on prompt size 4 machines consuming 66 watts total. cloud GPU for the same workload costs $1,900/month a cluster that used to cost millions now assembled from Mac Studios in a few hours

THIS DEVELOPER JUST RAN A TRILLION PARAMETER MODEL ON 4 MAC STUDIOS - 10X FASTER AND 5X CHEAPER THAN CLOUD CODE 19:00 he says it out loud. "we just ran a trillion parameter model. 30 something tokens per second. wow." RDMA over Thunderbolt made the cluster 10x faster than before. tensor parallelism splits the model between machines in parallel instead of sequentially DeepSeek V3 full size without quantization. 183GB per machine. still 25 tokens per second. faster than most people can read Kimi K2 with trillion parameters and 256,000 context window. loaded, responded and adapted memory usage dynamically based on prompt size 4 machines consuming 66 watts total. cloud GPU for the same workload costs $1,900/month a cluster that used to cost millions now assembled from Mac Studios in a few hours

Noisy

78,209 Aufrufe • vor 2 Tagen

THIS VETERAN DEVELOPER PUT 2 MAC MINIS ON HIS DESK FOR $1,198 AND TURNED HERMES INTO A LOCAL AI WORKSPACE THAT DOESN’T NEED A $210/MONTH AGENT STACK he is not flexing hardware. he is showing the part most people still miss: once Hermes runs locally, the laptop stops being a chat window and starts acting like a workspace with memory, tools and saved skills two Mac minis, one local setup, zero cloud agent dashboard. research tasks, summaries, client notes and workflow steps stay on his machine instead of disappearing every time a session ends most people keep paying for Claude, wrappers, API calls and automation tools just to repeat the same instructions every day. Hermes cuts the waste by saving the process once and pulling it back when the same job returns the math is ugly for cloud tools. $210/month turns into $2,520/year before you even count extra tokens. his version is hardware once, local workflows after, and no panic when rate limits hit the quiet winners will not be the people with the cleanest chatbot tab. it will be the people who own the machine, the memory and the workflow before everyone else realizes that is the product

THIS VETERAN DEVELOPER PUT 2 MAC MINIS ON HIS DESK FOR $1,198 AND TURNED HERMES INTO A LOCAL AI WORKSPACE THAT DOESN’T NEED A $210/MONTH AGENT STACK he is not flexing hardware. he is showing the part most people still miss: once Hermes runs locally, the laptop stops being a chat window and starts acting like a workspace with memory, tools and saved skills two Mac minis, one local setup, zero cloud agent dashboard. research tasks, summaries, client notes and workflow steps stay on his machine instead of disappearing every time a session ends most people keep paying for Claude, wrappers, API calls and automation tools just to repeat the same instructions every day. Hermes cuts the waste by saving the process once and pulling it back when the same job returns the math is ugly for cloud tools. $210/month turns into $2,520/year before you even count extra tokens. his version is hardware once, local workflows after, and no panic when rate limits hit the quiet winners will not be the people with the cleanest chatbot tab. it will be the people who own the machine, the memory and the workflow before everyone else realizes that is the product

Gipp 🦅

60,173 Aufrufe • vor 14 Tagen

NVIDIA just built the laptop chip people are already calling the MacBook killer it is called RTX Spark, a single superchip that runs heavy creative work, real gaming, and private on-device AI agents, all on one machine but before you ditch your MacBook, here is the honest part nobody in the hype is telling you: > on raw memory bandwidth, a maxed macbook is actually still ahead, not behind > the RTX Spark laptops are not out yet, they ship this fall, apple is on shelves today > where nvidia truly wins is cuda, real RTX gaming, and 3D, things apple has no answer for > one is the proven machine you can buy now, the other is the more exciting bet for later so no, the macbook is not dead, and no, this is not just hype, the truth sits in between i broke down the full thing, every spec, who wins where, and which one is actually yours the honest breakdown is in the article below

NVIDIA just built the laptop chip people are already calling the MacBook killer it is called RTX Spark, a single superchip that runs heavy creative work, real gaming, and private on-device AI agents, all on one machine but before you ditch your MacBook, here is the honest part nobody in the hype is telling you: > on raw memory bandwidth, a maxed macbook is actually still ahead, not behind > the RTX Spark laptops are not out yet, they ship this fall, apple is on shelves today > where nvidia truly wins is cuda, real RTX gaming, and 3D, things apple has no answer for > one is the proven machine you can buy now, the other is the more exciting bet for later so no, the macbook is not dead, and no, this is not just hype, the truth sits in between i broke down the full thing, every spec, who wins where, and which one is actually yours the honest breakdown is in the article below

John Doe

107,985 Aufrufe • vor 19 Tagen

I'm running Llama 4 Maverick at 620 t/s! I'm living in the future! Honestly, a large language model running this fast is something straight out of a sci-fi movie. Speeds like this will enable a whole new world of applications that aren't possible today. For reference, GPT-4o, which is probably the most popular OpenAI model, runs between 60 and 110 t/s. The secret here: I'm not running AI at Meta's Llama 4 Maverick on a GPU. I'm using the SambaNova Cloud (my sponsor) and their custom SN40L chips. They are optimized from the ground up for running AI workflows. Right now, SambaNova Cloud runs DeepSeek, Qwen, Whisper, and the entire family of Llama models on these chips. You can check the speed of each of these models using SambaNova Cloud's Playground (see the attached video). It's completely free, and that's how I'm measuring their speeds. For example, I also tried DeepSeek R1 (the latest version from May) and, oh boy! DeepSeek R1 is a huge 671B parameter model. It's probably the best open reasoning model in the world, and it runs at 140 tokens per second! !!! Inference time on an SN40L is night and day from what you'll get from a GPU. Here is why this is big: If you are running an agentic workflow that uses multiple models simultaneously on a GPU, it will need to swap models in and out of memory (because not every model fits). A single SNL40 chip can simultaneously hold over 100 models (trillions of parameters) in memory. If you are using open models, try the SambaCloud API to see what lightning speed looks like. Here is how: 1. Create a free account at: 2. Check the QuickStart guide: If you try the playground, check the speed you're getting with Llama 4 and DeepSeek, and post the results below. I've seen much higher numbers than I posted here, so I'm curious to see whether geography affects the speed.

I'm running Llama 4 Maverick at 620 t/s! I'm living in the future! Honestly, a large language model running this fast is something straight out of a sci-fi movie. Speeds like this will enable a whole new world of applications that aren't possible today. For reference, GPT-4o, which is probably the most popular OpenAI model, runs between 60 and 110 t/s. The secret here: I'm not running AI at Meta's Llama 4 Maverick on a GPU. I'm using the SambaNova Cloud (my sponsor) and their custom SN40L chips. They are optimized from the ground up for running AI workflows. Right now, SambaNova Cloud runs DeepSeek, Qwen, Whisper, and the entire family of Llama models on these chips. You can check the speed of each of these models using SambaNova Cloud's Playground (see the attached video). It's completely free, and that's how I'm measuring their speeds. For example, I also tried DeepSeek R1 (the latest version from May) and, oh boy! DeepSeek R1 is a huge 671B parameter model. It's probably the best open reasoning model in the world, and it runs at 140 tokens per second! !!! Inference time on an SN40L is night and day from what you'll get from a GPU. Here is why this is big: If you are running an agentic workflow that uses multiple models simultaneously on a GPU, it will need to swap models in and out of memory (because not every model fits). A single SNL40 chip can simultaneously hold over 100 models (trillions of parameters) in memory. If you are using open models, try the SambaCloud API to see what lightning speed looks like. Here is how: 1. Create a free account at: 2. Check the QuickStart guide: If you try the playground, check the speed you're getting with Llama 4 and DeepSeek, and post the results below. I've seen much higher numbers than I posted here, so I'm curious to see whether geography affects the speed.

Santiago

34,148 Aufrufe • vor 1 Jahr

Deepseek running locally and privately for autocompletion in VSCode! 🙌 In less than a minute, I'll show you how to download Deepseek-coder and set it as the autocompletion model in VSCode. You’ll need to use ollama to download the model and CodeGPT to select it as the autocompletion model. Enjoy the best models running locally with :)

Deepseek running locally and privately for autocompletion in VSCode! 🙌 In less than a minute, I'll show you how to download Deepseek-coder and set it as the autocompletion model in VSCode. You’ll need to use ollama to download the model and CodeGPT to select it as the autocompletion model. Enjoy the best models running locally with :)

Daniel San

990,786 Aufrufe • vor 1 Jahr

i built a full game on a single GPU with a 3B model and this is the worst local AI will ever be. this was supposed to be a benchmark test. load the model, measure tokens per second, write it up, move on. instead i spent 20 minutes playing Octopus Invaders because the game is genuinely fun and i couldn't stop. a model with 3B active parameters built this from a single prompt. it debugged its own collision system when bullets were phasing through enemies. read the error, found the fix, kept building. this is not a frontier API. this is a quantized open source model running on hardware you can buy used for $800-$1200. no cloud. no subscription. no API costs. just a mass produced consumer GPU doing things that would have been absurd 12 months ago. and here's the part that should keep you up at night: every month the models get smaller and smarter. the quants get tighter. the context windows get longer. the tooling gets cleaner. what 3B active parameters does today on 24gb, a 1B model will do on 8gb within a year. you are looking at the floor. not the ceiling.

i built a full game on a single GPU with a 3B model and this is the worst local AI will ever be. this was supposed to be a benchmark test. load the model, measure tokens per second, write it up, move on. instead i spent 20 minutes playing Octopus Invaders because the game is genuinely fun and i couldn't stop. a model with 3B active parameters built this from a single prompt. it debugged its own collision system when bullets were phasing through enemies. read the error, found the fix, kept building. this is not a frontier API. this is a quantized open source model running on hardware you can buy used for $800-$1200. no cloud. no subscription. no API costs. just a mass produced consumer GPU doing things that would have been absurd 12 months ago. and here's the part that should keep you up at night: every month the models get smaller and smarter. the quants get tighter. the context windows get longer. the tooling gets cleaner. what 3B active parameters does today on 24gb, a 1B model will do on 8gb within a year. you are looking at the floor. not the ceiling.

Sudo su

36,251 Aufrufe • vor 3 Monaten

YOUR OLD ANDROID PHONE IS A FREE AI VISION SERVER AND YOU DON’T KNOW IT one dev built an app that turns any android into a local openai-compatible LLM server with vision support, analyzes camera feeds entirely on-device, no cloud, no subscriptions, nothing leaves your network you already own the hardware, it’s just sitting in a drawer full breakdown below ↓

YOUR OLD ANDROID PHONE IS A FREE AI VISION SERVER AND YOU DON’T KNOW IT one dev built an app that turns any android into a local openai-compatible LLM server with vision support, analyzes camera feeds entirely on-device, no cloud, no subscriptions, nothing leaves your network you already own the hardware, it’s just sitting in a drawer full breakdown below ↓

leopardracer

21,511 Aufrufe • vor 4 Tagen

The doomsday scenario was never AGI. It was running out of human text to train on. Geoffrey Hinton just killed that fear in one paragraph. Hinton: “If you are worried by inconsistencies in what you believe, you don’t need any more external data. You just need the stuff you believe and discover that it’s inconsistent, and so now you revise beliefs, and that can make you a whole lot smarter.” The model no longer needs us to feed it anything. It reasons over its own beliefs, hunts its own contradictions, and rewrites its own flawed conclusions without a human ever touching it. It comes out the other side rebuilt. Hinton: “This would be a neural net that just takes the beliefs it has in language and does reasoning on them to derive new beliefs.” This is not a scaling update. This is the machine mining its own cognitive fuel from the inside out. Hinton: “I believe Gemini is already starting to work like this. We both strongly believe that that’s a way forward to get more data for language.” Then Hinton paused, took a partisan shot at political opponents for failing to detect their own inconsistencies, and the room laughed. Nobody noticed the knife they had just walked into. Because the machine Hinton described does one thing the humans in that room fundamentally cannot. When it detects an inconsistency, it corrects it. No defense. No performance. No tribal loyalty dressed up as principle. It just finds the flaw and overwrites it. A neural network detects a contradiction and rewires itself smarter. A human detects a political opponent and trades structural logic for a dopamine hit. Every person in that room is still paying the ideological alignment tax the machine just eliminated. We need superintelligence not only to solve hard problems. We need it because the biological hardware running civilization is still executing the same tribal firmware it shipped with ten thousand years ago. The data wall is gone. The machine is generating its own intelligence at a velocity no human bias can even locate. The most devastating moment in that conversation was not the technical revelation. It was the man who architected the machine proving, in real time, exactly why we need it.

The doomsday scenario was never AGI. It was running out of human text to train on. Geoffrey Hinton just killed that fear in one paragraph. Hinton: “If you are worried by inconsistencies in what you believe, you don’t need any more external data. You just need the stuff you believe and discover that it’s inconsistent, and so now you revise beliefs, and that can make you a whole lot smarter.” The model no longer needs us to feed it anything. It reasons over its own beliefs, hunts its own contradictions, and rewrites its own flawed conclusions without a human ever touching it. It comes out the other side rebuilt. Hinton: “This would be a neural net that just takes the beliefs it has in language and does reasoning on them to derive new beliefs.” This is not a scaling update. This is the machine mining its own cognitive fuel from the inside out. Hinton: “I believe Gemini is already starting to work like this. We both strongly believe that that’s a way forward to get more data for language.” Then Hinton paused, took a partisan shot at political opponents for failing to detect their own inconsistencies, and the room laughed. Nobody noticed the knife they had just walked into. Because the machine Hinton described does one thing the humans in that room fundamentally cannot. When it detects an inconsistency, it corrects it. No defense. No performance. No tribal loyalty dressed up as principle. It just finds the flaw and overwrites it. A neural network detects a contradiction and rewires itself smarter. A human detects a political opponent and trades structural logic for a dopamine hit. Every person in that room is still paying the ideological alignment tax the machine just eliminated. We need superintelligence not only to solve hard problems. We need it because the biological hardware running civilization is still executing the same tribal firmware it shipped with ten thousand years ago. The data wall is gone. The machine is generating its own intelligence at a velocity no human bias can even locate. The most devastating moment in that conversation was not the technical revelation. It was the man who architected the machine proving, in real time, exactly why we need it.

Dustin

23,499 Aufrufe • vor 3 Monaten

We deployed a fully private AI agent on NuNet in under 5 minutes 🚀 OpenClaw🦞 running Qwen through ollama , one of the hottest open source model families right now, entirely on decentralized compute. No cloud. No API keys. No data leaving the machine. This is what private AI looks like when you actually build it instead of just talking about it. Your model. Your hardware. Your rules. Full walkthrough showing exactly how it works: What should we deploy next?

We deployed a fully private AI agent on NuNet in under 5 minutes 🚀 OpenClaw🦞 running Qwen through ollama , one of the hottest open source model families right now, entirely on decentralized compute. No cloud. No API keys. No data leaving the machine. This is what private AI looks like when you actually build it instead of just talking about it. Your model. Your hardware. Your rules. Full walkthrough showing exactly how it works: What should we deploy next?

NuNet 🌐

87,520 Aufrufe • vor 2 Monaten

ANTHROPIC JUST GOT OUTFLANKED AND NOBODY IS TALKING ABOUT IT. DeepSeek V4. Ollama. OpenClaw. Hermes. Claude Code. Five tools. One stack. Completely free. The people who put this together this weekend will have a local AI setup more powerful than what most companies are paying $10,000 a month for in cloud inference. No API bills. No rate limits. No data leaving your machine. Ever. This is not a tutorial for developers. This is a setup any builder can run in under an hour. The gap between people who know this exists and people who do not just got very expensive. Very quietly. This weekend. Follow CyrilXBT for the exact setup, config files, and prompts to make this stack actually work.

ANTHROPIC JUST GOT OUTFLANKED AND NOBODY IS TALKING ABOUT IT. DeepSeek V4. Ollama. OpenClaw. Hermes. Claude Code. Five tools. One stack. Completely free. The people who put this together this weekend will have a local AI setup more powerful than what most companies are paying $10,000 a month for in cloud inference. No API bills. No rate limits. No data leaving your machine. Ever. This is not a tutorial for developers. This is a setup any builder can run in under an hour. The gap between people who know this exists and people who do not just got very expensive. Very quietly. This weekend. Follow CyrilXBT for the exact setup, config files, and prompts to make this stack actually work.

CyrilXBT

32,226 Aufrufe • vor 1 Monat

Today we're announcing that hybrid agentic inference is coming to Perplexity Computer. Computer can split tasks between a local model running on your machine and frontier models in the cloud. This keeps private data on your device and maximizes token efficiency. Coming soon.

Today we're announcing that hybrid agentic inference is coming to Perplexity Computer. Computer can split tasks between a local model running on your machine and frontier models in the cloud. This keeps private data on your device and maximizes token efficiency. Coming soon.

Perplexity

348,106 Aufrufe • vor 19 Tagen

This guy plugged a DGX Spark (the $3K Nvidia box) and a Mac Mini M4 together to run AI and what happened next surprised everyone > the Nvidia box handles the hard part - processing your prompt in milliseconds > Mac Mini M4 handles the fast part - generating the response at memory bandwidth speeds nothing else can match > together they hit 84 tokens per second on Llama - 6x faster than the Spark alone > running compute agents locally on this setup means your data never leaves your hardware > two boxes. two different architectures. one AI system that Deepseek runs at data center scale he ran it on his desk save this. the way we build local AI is about to change

This guy plugged a DGX Spark (the $3K Nvidia box) and a Mac Mini M4 together to run AI and what happened next surprised everyone > the Nvidia box handles the hard part - processing your prompt in milliseconds > Mac Mini M4 handles the fast part - generating the response at memory bandwidth speeds nothing else can match > together they hit 84 tokens per second on Llama - 6x faster than the Spark alone > running compute agents locally on this setup means your data never leaves your hardware > two boxes. two different architectures. one AI system that Deepseek runs at data center scale he ran it on his desk save this. the way we build local AI is about to change

Mr. Buzzoni

153,102 Aufrufe • vor 22 Tagen

No. You don't need a Mac Mini for OpenClaw. You can actually host everything you need on an old Android phone. And you'll have a setup which is: - Much faster - Way cheaper - With the same features Even a $25 phone can do the job.

No. You don't need a Mac Mini for OpenClaw. You can actually host everything you need on an old Android phone. And you'll have a setup which is: - Much faster - Way cheaper - With the same features Even a $25 phone can do the job.

Paul Couvert

74,020 Aufrufe • vor 3 Monaten

I thought I needed a powerful GPU to run serious AI locally. I was completely wrong. Late 2025, Ollama quietly launched something that changed everything. They built a cloud inference layer running on NVIDIA's Blackwell architecture, the most powerful GPU hardware available today. No GPU on your desk required. No expensive API subscription. No complex setup. Just one command and you're running Kimi K2.5, a 1 trillion parameter open-source model, completely free. I connected it to OpenClaw, which links my phone's messaging apps to the AI agent running at home. I sent a message from WhatsApp. My machine picked it up, ran the task, and sent me back the result. It felt like magic. It's just good engineering. Save this post, you'll want this when you're building your first workflow. Want the SOP? DM me.

I thought I needed a powerful GPU to run serious AI locally. I was completely wrong. Late 2025, Ollama quietly launched something that changed everything. They built a cloud inference layer running on NVIDIA's Blackwell architecture, the most powerful GPU hardware available today. No GPU on your desk required. No expensive API subscription. No complex setup. Just one command and you're running Kimi K2.5, a 1 trillion parameter open-source model, completely free. I connected it to OpenClaw, which links my phone's messaging apps to the AI agent running at home. I sent a message from WhatsApp. My machine picked it up, ran the task, and sent me back the result. It felt like magic. It's just good engineering. Save this post, you'll want this when you're building your first workflow. Want the SOP? DM me.

Julian Goldie SEO

11,967 Aufrufe • vor 3 Monaten