Video wird geladen...

Video konnte nicht geladen werden

Beim Laden dieses Videos ist ein Problem aufgetreten. Dies könnte an einem vorübergehenden Netzwerkproblem liegen oder das Video ist möglicherweise nicht verfügbar.

watch a new 27 billion dense parameter model autonomously fail then fix itself on a single rtx 3090. 28 minutes of building, testing, debugging, and serving compressed to under three minutes. qwen 3.6 27b dense, q4_k_m quant, hermes agent as driving harness. one prompt in. mandelbrot fractal explorer with... zoom, pan, three palettes, smooth coloring, responsive layout out the other side. 10 verifiable tests, all 10 landed green on its own. what you watch in the video. the model writes mandelbrot.html. writes tests.js. starts the http server. opens the page in its own browser. runs the tests. finds the failures. traces the math. patches the code. reloads. re-runs. iterates again. and again. until every test passes. no human in the loop. local prototyping looks like this in 2026. one consumer card. one open source model. one harness.show more

Sudo su

30,600 subscribers

46,638 Aufrufe • vor 1 Monat •via X (Twitter)

Anya Rossi• Live Now

Private livecam show

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

first test results are in. qwen 3.6 27b dense just banged 10 out of 10 on a single rtx 3090 24gb tier at 40 tok/s. no quant tricks. no fused kernels. just q4_k_m straight cut on llama.cpp. i wrote a particle swarm benchmark this morning, fed it the prompt, and the model autonomously built a 500 particle boids flocking system. velocity driven hue, density based brightness, trail blend rendering, mouse attraction physics, click bursts, drag paint. then it used browser automation to test its own work, found the failing tests, iterated through the code, patched tests.js, and landed all 10 green on its own. i sat there hooked for 8 minutes playing. simple but mesmerizing. mouse trails build beautiful patterns, palette cycles with space, click sends particles flying, drag paints through the swarm. simplicity that hooks you. i'll open source this prompt and the build soon so anyone can reproduce it as their own benchmark. this is the first of 5 single file agent tests i wrote for this model. four more coming. octopus invaders flagship after as final. watch the full video below. see it autonomously build from one prompt. haven't slept well since this model dropped yesterday.

first test results are in. qwen 3.6 27b dense just banged 10 out of 10 on a single rtx 3090 24gb tier at 40 tok/s. no quant tricks. no fused kernels. just q4_k_m straight cut on llama.cpp. i wrote a particle swarm benchmark this morning, fed it the prompt, and the model autonomously built a 500 particle boids flocking system. velocity driven hue, density based brightness, trail blend rendering, mouse attraction physics, click bursts, drag paint. then it used browser automation to test its own work, found the failing tests, iterated through the code, patched tests.js, and landed all 10 green on its own. i sat there hooked for 8 minutes playing. simple but mesmerizing. mouse trails build beautiful patterns, palette cycles with space, click sends particles flying, drag paints through the swarm. simplicity that hooks you. i'll open source this prompt and the build soon so anyone can reproduce it as their own benchmark. this is the first of 5 single file agent tests i wrote for this model. four more coming. octopus invaders flagship after as final. watch the full video below. see it autonomously build from one prompt. haven't slept well since this model dropped yesterday.

Sudo su

132,426 Aufrufe • vor 2 Monaten

look anon, those of you who kept saying local AI is not there yet, who said open source can't compete, who said you need cloud APIs to get anything serious done, look at this gameplay for one minute. every pixel on this screen was written by one model, in one shot, on a single rtx 3090 with 24gb of vram. the model is qwen 3.6 27b dense q4. the harness is hermes agent. the hardware is a single consumer card you can buy used for 900 dollars. the prompt is open source on github. every claim verifiable, on your own desk. if your local AI take is from 2024, update it. the consumer tier is shipping work that was supposed to need 8 gpus and an api key. open source moved the floor while the rest of the field was busy explaining why it cannot. 24gb tier owners are eating ramen with half boiled egg and double chocolate.

look anon, those of you who kept saying local AI is not there yet, who said open source can't compete, who said you need cloud APIs to get anything serious done, look at this gameplay for one minute. every pixel on this screen was written by one model, in one shot, on a single rtx 3090 with 24gb of vram. the model is qwen 3.6 27b dense q4. the harness is hermes agent. the hardware is a single consumer card you can buy used for 900 dollars. the prompt is open source on github. every claim verifiable, on your own desk. if your local AI take is from 2024, update it. the consumer tier is shipping work that was supposed to need 8 gpus and an api key. open source moved the floor while the rest of the field was busy explaining why it cannot. 24gb tier owners are eating ramen with half boiled egg and double chocolate.

Sudo su

29,584 Aufrufe • vor 1 Monat

update: qwen 3.6 27b dense q4 just one shotted octopus invaders game on a single 3090. hermes agent drove the whole thing, ~41 tok/s gen 21gb vram at full 262k context, thinking mode on. one prompt in and the canonical multi-file space shooter benchmark out, the same exact prompt i ran on qwen 3.5 27b dense back in march on the same card. 3.5 needed one external scope bug fix before the game would even load on first play. 3.6 needed nothing. 11 of 11 files written, 2411 lines of code, zero steering interventions, zero external fixes, playable on first load. 16 minutes 41 seconds wall clock from prompt to playable. consumer tier king on a single 3090 is locked tonight, and the silicon underneath my desk did not change between march and now. the open source ecosystem just moved the floor. watch it ship itself, the full 16 minutes 41 seconds sped to 3 minutes 45, no human touched the keyboard between the first prompt and the final frame.

update: qwen 3.6 27b dense q4 just one shotted octopus invaders game on a single 3090. hermes agent drove the whole thing, ~41 tok/s gen 21gb vram at full 262k context, thinking mode on. one prompt in and the canonical multi-file space shooter benchmark out, the same exact prompt i ran on qwen 3.5 27b dense back in march on the same card. 3.5 needed one external scope bug fix before the game would even load on first play. 3.6 needed nothing. 11 of 11 files written, 2411 lines of code, zero steering interventions, zero external fixes, playable on first load. 16 minutes 41 seconds wall clock from prompt to playable. consumer tier king on a single 3090 is locked tonight, and the silicon underneath my desk did not change between march and now. the open source ecosystem just moved the floor. watch it ship itself, the full 16 minutes 41 seconds sped to 3 minutes 45, no human touched the keyboard between the first prompt and the final frame.

Sudo su

122,899 Aufrufe • vor 1 Monat

the tiebreaker is done. qwen 3.5 27B dense. single RTX 3090. one prompt. zero steering. zero human edits. 1,827 lines across 10 files. 13 minutes. full thinking mode. runs on first load. hermes 4.3 got the same prompt with 2x 3090s and 5x the context it needed. wrote 1,249 lines, left empty files, needed 3 interventions, game was broken on load. same architecture class. same quant. hermes got double the hardware. completely different result. dense wasn't the problem. hermes was. but here's what got me. this model thinks at 27 tok/s. every single token carries 27 billion parameters of reasoning. MoE hit 112 tok/s but only 3B active per token. the dense model is slower and it doesn't matter. watch 13 minutes of autonomous coding on a consumer GPU with zero intervention and tell me speed is what matters. a year ago this wasn't possible. now it runs on hardware you can buy used for $900. no API. no subscription. no cloud. just a 3090 doing what data centers did 18 months ago. full unedited session in the video. every token, every file, every thinking chain. 16 minutes. hit play.

the tiebreaker is done. qwen 3.5 27B dense. single RTX 3090. one prompt. zero steering. zero human edits. 1,827 lines across 10 files. 13 minutes. full thinking mode. runs on first load. hermes 4.3 got the same prompt with 2x 3090s and 5x the context it needed. wrote 1,249 lines, left empty files, needed 3 interventions, game was broken on load. same architecture class. same quant. hermes got double the hardware. completely different result. dense wasn't the problem. hermes was. but here's what got me. this model thinks at 27 tok/s. every single token carries 27 billion parameters of reasoning. MoE hit 112 tok/s but only 3B active per token. the dense model is slower and it doesn't matter. watch 13 minutes of autonomous coding on a consumer GPU with zero intervention and tell me speed is what matters. a year ago this wasn't possible. now it runs on hardware you can buy used for $900. no API. no subscription. no cloud. just a 3090 doing what data centers did 18 months ago. full unedited session in the video. every token, every file, every thinking chain. 16 minutes. hit play.

Sudo su

91,135 Aufrufe • vor 3 Monaten

small local model that falls apart in bloated agents like openclaw just runs like a wild horse in hermes agent. and that's not even my line, someone else called it that, i've just been quietly pointing people at this harness for months because it held up on everything i threw at it, 3b models all the way to one trillion params. watch this happen on my own machine. i pointed hermes agent at a local http endpoint, gemma 4 12b on my 3090 llama.cpp server, and it auto-detected the model and started working immediately. no config wrestling, no broken tool calls, no babysitting the output format, i typed in a url and it just went. the whole clip is exactly that, start to finish, no errors, no retries, butter smooth. and the tool calling, the one thing that quietly breaks most local setups, works here like it's nothing. it's not the model that's flaky, it's the harness around it. hermes agent is the first agent i've run that actually gets that right. one url, one local model on one card, and it runs like a wild horse.

small local model that falls apart in bloated agents like openclaw just runs like a wild horse in hermes agent. and that's not even my line, someone else called it that, i've just been quietly pointing people at this harness for months because it held up on everything i threw at it, 3b models all the way to one trillion params. watch this happen on my own machine. i pointed hermes agent at a local http endpoint, gemma 4 12b on my 3090 llama.cpp server, and it auto-detected the model and started working immediately. no config wrestling, no broken tool calls, no babysitting the output format, i typed in a url and it just went. the whole clip is exactly that, start to finish, no errors, no retries, butter smooth. and the tool calling, the one thing that quietly breaks most local setups, works here like it's nothing. it's not the model that's flaky, it's the harness around it. hermes agent is the first agent i've run that actually gets that right. one url, one local model on one card, and it runs like a wild horse.

Sudo su

27,339 Aufrufe • vor 18 Tagen

i watched gemma 4 12b build something genuinely impressive today, and then loop itself to death right in front of me. the full run is in the video, sped up but completely uncut, watch it to the end and you will catch the exact moment it stops building and starts looping right in the middle of the work. the task was clean, build a single file gravity simulator, n-body physics, orbits, collisions, running locally on one 3090 through an agent. and for ten minutes it was a joy to watch. it reached for a symplectic integrator on its own, the correct one, the kind that keeps orbits stable instead of spiralling out. real gravity with softening, proper orbital velocities, momentum conserved on collision. the physics was right. the thing actually worked. then on the very last step, writing a few tests to prove its own code, it fell into a loop. not a crash, a loop. it started repeating itself and would not stop. ten more minutes, thirty four thousand tokens into a single answer, the same fragments over and over, until i killed it myself. so it's not that gemma can't code. it did the hard part beautifully. it cannot finish. it cannot hold a long task together without unravelling, and finishing is the entire job in agentic work. here's the part that stings. i run this exact task, same harness, same card, on the chinese open models, qwen especially, and i never see this. they build it, they test it, they stop. every single time. google has the raw capability, you can see it sitting right there in the code, and then the model loops itself to death on a task a 27b from alibaba finishes clean. open weights, apache 2.0, so much to love on paper. i just need it to know when to stop talking.

i watched gemma 4 12b build something genuinely impressive today, and then loop itself to death right in front of me. the full run is in the video, sped up but completely uncut, watch it to the end and you will catch the exact moment it stops building and starts looping right in the middle of the work. the task was clean, build a single file gravity simulator, n-body physics, orbits, collisions, running locally on one 3090 through an agent. and for ten minutes it was a joy to watch. it reached for a symplectic integrator on its own, the correct one, the kind that keeps orbits stable instead of spiralling out. real gravity with softening, proper orbital velocities, momentum conserved on collision. the physics was right. the thing actually worked. then on the very last step, writing a few tests to prove its own code, it fell into a loop. not a crash, a loop. it started repeating itself and would not stop. ten more minutes, thirty four thousand tokens into a single answer, the same fragments over and over, until i killed it myself. so it's not that gemma can't code. it did the hard part beautifully. it cannot finish. it cannot hold a long task together without unravelling, and finishing is the entire job in agentic work. here's the part that stings. i run this exact task, same harness, same card, on the chinese open models, qwen especially, and i never see this. they build it, they test it, they stop. every single time. google has the raw capability, you can see it sitting right there in the code, and then the model loops itself to death on a task a 27b from alibaba finishes clean. open weights, apache 2.0, so much to love on paper. i just need it to know when to stop talking.

Sudo su

39,574 Aufrufe • vor 16 Tagen

claude code now writes its own test plan and runs it on a live iphone. i didn't write the tests. i didn't run the tests. i didn't even open the app. it caught the bug too.

claude code now writes its own test plan and runs it on a live iphone. i didn't write the tests. i didn't run the tests. i didn't even open the app. it caught the bug too.

Landseer Enga

30,714 Aufrufe • vor 1 Monat

hey here is the final result of octopus invaders on nvidia's flagship at full precision. nemotron super 120B on 2x H200 NVL. BF16 unquantized. 287GB of VRAM. hermes agent as the harness. 60 tok/s. first try it autonomously coded for 6 minutes straight. created 11 files. correct project structure. correct load order. started the server. i opened the browser and the result was a blank screen. i did not give up. second try i gave it a precise list of bugs and things to fix. it went back in for another 3 minutes. patched the code. served it again. still blank. so i did what any sane person would do. third try i just said the screen is blank, test it and fix it yourself. and this is where nemotron showed what it actually is. it became a debugger. you can see it in the video. realtime CSS test squares, red screen flashes, hermes agent browser tools, inspecting its own output. it built the parallax background with planets and comets. it rendered a rocket ship that tracks your mouse with fire and bullet physics. the aesthetic is real. but no enemies spawn. no collision. not playable. what surprised me is qwen 27B one shotted this exact game on a single RTX 3090 at Q4 quant. and here is nvidia's flagship at full precision on enterprise hardware needing 3 tries and still not getting there. that makes my hope high for the undisputed qwen 122B which is about to face the same test next. same hardware. same prompt and same harness. lets see if it one shots or not. full session in the video. no cuts. 5x speed.

hey here is the final result of octopus invaders on nvidia's flagship at full precision. nemotron super 120B on 2x H200 NVL. BF16 unquantized. 287GB of VRAM. hermes agent as the harness. 60 tok/s. first try it autonomously coded for 6 minutes straight. created 11 files. correct project structure. correct load order. started the server. i opened the browser and the result was a blank screen. i did not give up. second try i gave it a precise list of bugs and things to fix. it went back in for another 3 minutes. patched the code. served it again. still blank. so i did what any sane person would do. third try i just said the screen is blank, test it and fix it yourself. and this is where nemotron showed what it actually is. it became a debugger. you can see it in the video. realtime CSS test squares, red screen flashes, hermes agent browser tools, inspecting its own output. it built the parallax background with planets and comets. it rendered a rocket ship that tracks your mouse with fire and bullet physics. the aesthetic is real. but no enemies spawn. no collision. not playable. what surprised me is qwen 27B one shotted this exact game on a single RTX 3090 at Q4 quant. and here is nvidia's flagship at full precision on enterprise hardware needing 3 tries and still not getting there. that makes my hope high for the undisputed qwen 122B which is about to face the same test next. same hardware. same prompt and same harness. lets see if it one shots or not. full session in the video. no cuts. 5x speed.

Sudo su

10,958 Aufrufe • vor 2 Monaten

HERMES AGENT CROSSED 140,000 GITHUB STARS IN 3 MONTHS AND JUST BECAME THE MOST USED AGENT IN THE WORLD. Most AI agents forget everything between sessions. Hermes writes its own skills from experience. Next time it runs the skill, improves it, and gets faster. Independent benchmarks show agents with 20+ self-created skills complete similar tasks 40% faster than fresh instances. Qwen 3.6 where the 35B version outperforms last year's 120B models at one third the memory footprint. DGX Spark with 128GB unified memory running everything locally at $0 per month after hardware. The setup takes 30 minutes. LM Studio plus Qwen 3.6 27B for the model server. One install script for Hermes. One config connecting them. Set context window to 65,536 tokens or nothing works. After one month of daily use your skills directory has 20 to 50 learned workflows. Your Hermes is genuinely different from anyone else's.

HERMES AGENT CROSSED 140,000 GITHUB STARS IN 3 MONTHS AND JUST BECAME THE MOST USED AGENT IN THE WORLD. Most AI agents forget everything between sessions. Hermes writes its own skills from experience. Next time it runs the skill, improves it, and gets faster. Independent benchmarks show agents with 20+ self-created skills complete similar tasks 40% faster than fresh instances. Qwen 3.6 where the 35B version outperforms last year's 120B models at one third the memory footprint. DGX Spark with 128GB unified memory running everything locally at $0 per month after hardware. The setup takes 30 minutes. LM Studio plus Qwen 3.6 27B for the model server. One install script for Hermes. One config connecting them. Set context window to 65,536 tokens or nothing works. After one month of daily use your skills directory has 20 to 50 learned workflows. Your Hermes is genuinely different from anyone else's.

Lummox

69,423 Aufrufe • vor 19 Tagen

okay the fuss around hermes agent is not just air. this thing has substance. installed it on a single RTX 3090 running Qwen 3.5 27B base (Q4_K_M, 262K context, 29-35 tok/s). fully local. my machine my data. first thing i did was tell it to discover itself. find its own model weights, check its own GPU, read its own server flags, and write its own identity document. it did all of it autonomously. nvidia-smi, process grep, file writes. clean execution. the TUI is genuinely premium. dark theme, ASCII art, color coded tool calls with execution times, real time streaming. you actually enjoy watching it work. 29 tools. 80 skills (that's what it reports on boot). file ops, terminal, browser automation, code execution, cron scheduling, subagent delegation. and it has persistent memory across sessions. setup took 5 minutes. one curl install, setup wizard, point to localhost:8080/v1, done. dropping qwopus for this test btw. distilled models compress reasoning and lose precision on real coding tasks. base model only from here. more experiments coming. octopus invaders (the same game that broke qwopus) will be built using hermes agent next. comparing flow and results against claude code on the same model. if you want to run local AI agents on real hardware this one deserves a serious look.

okay the fuss around hermes agent is not just air. this thing has substance. installed it on a single RTX 3090 running Qwen 3.5 27B base (Q4_K_M, 262K context, 29-35 tok/s). fully local. my machine my data. first thing i did was tell it to discover itself. find its own model weights, check its own GPU, read its own server flags, and write its own identity document. it did all of it autonomously. nvidia-smi, process grep, file writes. clean execution. the TUI is genuinely premium. dark theme, ASCII art, color coded tool calls with execution times, real time streaming. you actually enjoy watching it work. 29 tools. 80 skills (that's what it reports on boot). file ops, terminal, browser automation, code execution, cron scheduling, subagent delegation. and it has persistent memory across sessions. setup took 5 minutes. one curl install, setup wizard, point to localhost:8080/v1, done. dropping qwopus for this test btw. distilled models compress reasoning and lose precision on real coding tasks. base model only from here. more experiments coming. octopus invaders (the same game that broke qwopus) will be built using hermes agent next. comparing flow and results against claude code on the same model. if you want to run local AI agents on real hardware this one deserves a serious look.

Sudo su

162,022 Aufrufe • vor 3 Monaten

no prompt engineering, no agentic harness, no tool calls. just me being lazy in llama.cpp's web ui and gemma 4 31b dense taking the task seriously. i typed "create gpu marketplace cards with hardware specs and prices per hour" and the model went and coded this ui, one shot, navy bg, glassmorphism cards, neon accent buttons, realistic pricing tiers per architecture. it even wrote a "why this looks premium" explanation under the code. for context this is a q4 quant of google's 31b dense thinking model, running on a rtx 5090 mobile 24gb in the rog scar 18 at around 15 tok/s sustained, same vram tier as a 3090 or 4090 desktop, so whatever you see here translates directly to your card at home. the whole interaction was me not trying and the model reasoning harder than the prompt deserved. that tells me more about where local ai is at in april 2026 than any leaderboard score. next test drops gemma 4 into hermes agent, autonomous tool calling, multi step reasoning, real agentic loop instead of a chat window. let's see what the same model does when it gets the right environment. more experiments coming anon. octopus invaders queued. same hardware, different tasks, all published here on x and all translatable to your 24gb card. for now the video below shows it coding live, gpu going brrr.

no prompt engineering, no agentic harness, no tool calls. just me being lazy in llama.cpp's web ui and gemma 4 31b dense taking the task seriously. i typed "create gpu marketplace cards with hardware specs and prices per hour" and the model went and coded this ui, one shot, navy bg, glassmorphism cards, neon accent buttons, realistic pricing tiers per architecture. it even wrote a "why this looks premium" explanation under the code. for context this is a q4 quant of google's 31b dense thinking model, running on a rtx 5090 mobile 24gb in the rog scar 18 at around 15 tok/s sustained, same vram tier as a 3090 or 4090 desktop, so whatever you see here translates directly to your card at home. the whole interaction was me not trying and the model reasoning harder than the prompt deserved. that tells me more about where local ai is at in april 2026 than any leaderboard score. next test drops gemma 4 into hermes agent, autonomous tool calling, multi step reasoning, real agentic loop instead of a chat window. let's see what the same model does when it gets the right environment. more experiments coming anon. octopus invaders queued. same hardware, different tasks, all published here on x and all translatable to your 24gb card. for now the video below shows it coding live, gpu going brrr.

Sudo su

28,939 Aufrufe • vor 2 Monaten

This is what it looks like to drive right next to a Model Y with literally no-one in the car. Pretty cool. This Model Y is driving itself from the end of the Giga Texas production line to its designated lane at the outbound lot. Note: This is not the quicksilver Model Y that drove itself completely autonomously to a customer in Austin today. This Ultra Red Model Y below will be shipped elsewhere.

This is what it looks like to drive right next to a Model Y with literally no-one in the car. Pretty cool. This Model Y is driving itself from the end of the Giga Texas production line to its designated lane at the outbound lot. Note: This is not the quicksilver Model Y that drove itself completely autonomously to a customer in Austin today. This Ultra Red Model Y below will be shipped elsewhere.

Sawyer Merritt

192,473 Aufrufe • vor 1 Jahr

Google CEO, Sundar Pichai: "If you don't learn how to orchestrate agents now, you'll spend 2027 catching up to people who started today" In 30 minutes, he explains why the best engineers are moving from writing code to running agents One agent researches One writes One tests One reviews One fixes The human becomes the operator, not the bottleneck Bookmark and watch the interview

Google CEO, Sundar Pichai: "If you don't learn how to orchestrate agents now, you'll spend 2027 catching up to people who started today" In 30 minutes, he explains why the best engineers are moving from writing code to running agents One agent researches One writes One tests One reviews One fixes The human becomes the operator, not the bottleneck Bookmark and watch the interview

rari

343,903 Aufrufe • vor 6 Tagen

Qwopus on a single RTX 3090. Claude Opus 4.6 reasoning distilled into Qwen 3.5 27B dense, running through Claude's own coding agent (claude code). 29-35 tok/s with thinking mode on. the jinja bug that kills thinking on base Qwen doesn't carry over. harness and model matched. the base model would pause mid task on Claude Code. just stop generating. that's why i ran it through OpenCode, which handles stalled states automatically. this distilled version doesn't stall. it waits for tool outputs, reads them, selfcorrects when something breaks, and keeps going. i gave it a benchmark analysis task. went 9 minutes autonomous. wrote a README nobody asked for. zero steering. video is 5x speed but fully uncut. if you have a 3090, you can run this right now. free. no API. no subscription. opus structured reasoning on localhost. octopus invaders is next. same prompt that base qwen passed in 13 minutes and hermes 4.3 failed on 2x the hardware. i want to see if the distillation changes the outcome or just the style. more data soon.

Qwopus on a single RTX 3090. Claude Opus 4.6 reasoning distilled into Qwen 3.5 27B dense, running through Claude's own coding agent (claude code). 29-35 tok/s with thinking mode on. the jinja bug that kills thinking on base Qwen doesn't carry over. harness and model matched. the base model would pause mid task on Claude Code. just stop generating. that's why i ran it through OpenCode, which handles stalled states automatically. this distilled version doesn't stall. it waits for tool outputs, reads them, selfcorrects when something breaks, and keeps going. i gave it a benchmark analysis task. went 9 minutes autonomous. wrote a README nobody asked for. zero steering. video is 5x speed but fully uncut. if you have a 3090, you can run this right now. free. no API. no subscription. opus structured reasoning on localhost. octopus invaders is next. same prompt that base qwen passed in 13 minutes and hermes 4.3 failed on 2x the hardware. i want to see if the distillation changes the outcome or just the style. more data soon.

Sudo su

295,349 Aufrufe • vor 3 Monaten

This Chinese developer linked two $2,999 NVIDIA DGX Sparks into one box and runs the full Qwen3-235B at home, after dropping his $1,999-a-month cloud bill to zero. He wired 2 small boxes into a single computer, split a giant 235-billion-parameter model in half between them, and serves it across his own network at about 10 tokens a second, with no internet, no cloud, right there on the desk. No data center, no thousand-dollar graphics cards, no monthly cloud bill. Just him, 2 gold boxes the size of a sandwich, one cable between them, and 1 power strip. And here is the whole payoff. He used to pay the cloud $1,999 a month for the same model, and the meter ticked on every request. Now he paid $5,998 once for 2 boxes, they covered their cost in 3 months, and after that he sends as many requests as he wants for free, only electricity. The two Sparks talk over one fast cable, each holds 128GB of memory, and together they carry the whole model, about 73GB loaded per box, with the chip inside pinned near the limit at 96%. Both boxes work as one and keep trading data over the cable, with no cloud in the loop and no single word leaking out. The ready model sits on one local address, and any app on his network calls it as easily as ChatGPT. And here is how he described, in plain words, what this pair of boxes does: "this is a pair of boxes that holds the huge Qwen3-235B model and serves it to one network. the model is split in half, and each box owns its half. parts: // Box 1 (holds the first half of the model and starts the answer fast, the first word appears in under a second) // Box 2 (holds the second half and writes out the rest, about 10 tokens a second) // Cable (connects the 2 boxes and moves data between them on every step, with no lag) // Address (one local address where any app sends its request, like to a cloud model) // Test (a script that runs big prompts through and measures speed and delays) // Monitor (checks temperature, power draw, and load on both boxes every 2 seconds). the model never goes to the cloud. he only steps in when a box runs hotter than 80 degrees or the cable between them starts dropping data." So the system knows exactly what it is, what it is for, and where its limits are. It knows it has to hold the whole huge model across 2 boxes on its own. It knows it has to answer every request locally, with no meter, no limits, and no internet. It knows the human is only needed when a box overheats or the link between them stalls. → The setup runs around the clock on 2 boxes, each pulling under 60 watts → However many requests he sends, the monthly bill is $0, only electricity → The first box starts the answer in under a second → The second writes text at about 10 tokens a second → One request at a time: 838 tokens in 85 seconds, first word in 0.8s → Two requests at once: 697 tokens in 108 seconds, first word in 0.7s → Both boxes sit at 96% load and warm up to 76-78 degrees And only when a chip in a box runs hotter than 80 degrees or the cable between the 2 Sparks drops data does the system call the owner. And when he himself is out on a run or in a coffee shop, he still reaches his own model at home from his phone: sends a big prompt to the local Qwen3-235B, gets the full answer back in under a minute and a half, with no token meter ticking and no limit to hit. Here is what the test shows on his screen during one of the night runs: "one request at a time: 838 tokens in 84.9 seconds, first word in 0.8s, then 0.1s per token." "two requests at once: 697 tokens in 107.6 seconds, first word in 0.7s, then 0.15s per token." "Box 1: chip at 96% load, 76 degrees, 56 watts, 73GB used in memory." "Box 2: chip at 96% load, 78 degrees, 56 watts, the Qwen3-235B model fully loaded." And while everyone around is paying for AI by the month and bumping into limits, his top-tier model just sits on the desk and works as much as he wants: his own little power plant instead of a forever meter. He has no server rack of his own and no cloud account behind it. Just 2 DGX Spark boxes on a desk, one model split in half between them, one local address, and a folder of prompts next to it. Out of everything I have seen this year, this is the cleanest way to stop paying for AI: $5,998 of hardware on the desk once, $0 a month to the cloud, unlimited forever, and between them 2 gold boxes, 1 cable, and the full Qwen3-235B answering at home with no internet.

This Chinese developer linked two $2,999 NVIDIA DGX Sparks into one box and runs the full Qwen3-235B at home, after dropping his $1,999-a-month cloud bill to zero. He wired 2 small boxes into a single computer, split a giant 235-billion-parameter model in half between them, and serves it across his own network at about 10 tokens a second, with no internet, no cloud, right there on the desk. No data center, no thousand-dollar graphics cards, no monthly cloud bill. Just him, 2 gold boxes the size of a sandwich, one cable between them, and 1 power strip. And here is the whole payoff. He used to pay the cloud $1,999 a month for the same model, and the meter ticked on every request. Now he paid $5,998 once for 2 boxes, they covered their cost in 3 months, and after that he sends as many requests as he wants for free, only electricity. The two Sparks talk over one fast cable, each holds 128GB of memory, and together they carry the whole model, about 73GB loaded per box, with the chip inside pinned near the limit at 96%. Both boxes work as one and keep trading data over the cable, with no cloud in the loop and no single word leaking out. The ready model sits on one local address, and any app on his network calls it as easily as ChatGPT. And here is how he described, in plain words, what this pair of boxes does: "this is a pair of boxes that holds the huge Qwen3-235B model and serves it to one network. the model is split in half, and each box owns its half. parts: // Box 1 (holds the first half of the model and starts the answer fast, the first word appears in under a second) // Box 2 (holds the second half and writes out the rest, about 10 tokens a second) // Cable (connects the 2 boxes and moves data between them on every step, with no lag) // Address (one local address where any app sends its request, like to a cloud model) // Test (a script that runs big prompts through and measures speed and delays) // Monitor (checks temperature, power draw, and load on both boxes every 2 seconds). the model never goes to the cloud. he only steps in when a box runs hotter than 80 degrees or the cable between them starts dropping data." So the system knows exactly what it is, what it is for, and where its limits are. It knows it has to hold the whole huge model across 2 boxes on its own. It knows it has to answer every request locally, with no meter, no limits, and no internet. It knows the human is only needed when a box overheats or the link between them stalls. → The setup runs around the clock on 2 boxes, each pulling under 60 watts → However many requests he sends, the monthly bill is $0, only electricity → The first box starts the answer in under a second → The second writes text at about 10 tokens a second → One request at a time: 838 tokens in 85 seconds, first word in 0.8s → Two requests at once: 697 tokens in 108 seconds, first word in 0.7s → Both boxes sit at 96% load and warm up to 76-78 degrees And only when a chip in a box runs hotter than 80 degrees or the cable between the 2 Sparks drops data does the system call the owner. And when he himself is out on a run or in a coffee shop, he still reaches his own model at home from his phone: sends a big prompt to the local Qwen3-235B, gets the full answer back in under a minute and a half, with no token meter ticking and no limit to hit. Here is what the test shows on his screen during one of the night runs: "one request at a time: 838 tokens in 84.9 seconds, first word in 0.8s, then 0.1s per token." "two requests at once: 697 tokens in 107.6 seconds, first word in 0.7s, then 0.15s per token." "Box 1: chip at 96% load, 76 degrees, 56 watts, 73GB used in memory." "Box 2: chip at 96% load, 78 degrees, 56 watts, the Qwen3-235B model fully loaded." And while everyone around is paying for AI by the month and bumping into limits, his top-tier model just sits on the desk and works as much as he wants: his own little power plant instead of a forever meter. He has no server rack of his own and no cloud account behind it. Just 2 DGX Spark boxes on a desk, one model split in half between them, one local address, and a folder of prompts next to it. Out of everything I have seen this year, this is the cleanest way to stop paying for AI: $5,998 of hardware on the desk once, $0 a month to the cloud, unlimited forever, and between them 2 gold boxes, 1 cable, and the full Qwen3-235B answering at home with no internet.

Blaze

93,219 Aufrufe • vor 24 Tagen

🚨 The Sims one shotted by GPT-5.6 Pro this is without codex or any coding harness , one shot entire game with logic in 48 minutes, all in one .html file. cc : Mirochill for the test roon when we can have big model taste and personality ? like after GPT 5 its so robotic

🚨 The Sims one shotted by GPT-5.6 Pro this is without codex or any coding harness , one shot entire game with logic in 48 minutes, all in one .html file. cc : Mirochill for the test roon when we can have big model taste and personality ? like after GPT 5 its so robotic

Chetaslua

143,906 Aufrufe • vor 4 Tagen

Google just made websites obsolete. Antigravity 2.0 builds full landing pages while you watch. Here’s how this actually works in real life. → You type one prompt → AI plans the entire app → It writes the code → Tests it in a browser → Fixes errors automatically → Deploys it live No devs. No templates. No monthly fees. Powered by Gemini 3 + parallel AI agents, so design, backend, and testing happen at the same time. This is how people are launching pages in under 5 minutes. Save this video, you’ll use it again. Want the SOP? DM me. 💬

Google just made websites obsolete. Antigravity 2.0 builds full landing pages while you watch. Here’s how this actually works in real life. → You type one prompt → AI plans the entire app → It writes the code → Tests it in a browser → Fixes errors automatically → Deploys it live No devs. No templates. No monthly fees. Powered by Gemini 3 + parallel AI agents, so design, backend, and testing happen at the same time. This is how people are launching pages in under 5 minutes. Save this video, you’ll use it again. Want the SOP? DM me. 💬

Julian Goldie SEO

12,967 Aufrufe • vor 5 Monaten

Your coding agent can run all night. It still can't tell if what it built actually works. Today we're open-sourcing the TestSprite CLl (Apache-2.0) A tool your agent calls on its own to test your app end-to-end like a real user, fix what broke, and re-check everything it ever got right. It's the same engine 100,000+ teams already use. We proved it in public, on a public leaderboard: Most correct app on the board:89% Built by the cheapest model in the field At half the cost of the priciest one You no longer need the biggest, most expensive model to ship software you can trust. Setup is 2 commands: npm install -g @testsprite/testsprite-cli testsprite init That's the last command you'll ever type - from there, your agent runs the tests itself.

Your coding agent can run all night. It still can't tell if what it built actually works. Today we're open-sourcing the TestSprite CLl (Apache-2.0) A tool your agent calls on its own to test your app end-to-end like a real user, fix what broke, and re-check everything it ever got right. It's the same engine 100,000+ teams already use. We proved it in public, on a public leaderboard: Most correct app on the board:89% Built by the cheapest model in the field At half the cost of the priciest one You no longer need the biggest, most expensive model to ship software you can trust. Setup is 2 commands: npm install -g @testsprite/testsprite-cli testsprite init That's the last command you'll ever type - from there, your agent runs the tests itself.

TestSprite

1,034,143 Aufrufe • vor 12 Tagen

Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.

Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.

Aoden Teo

5,085,945 Aufrufe • vor 20 Tagen

watch gemma 4 12b q8 dancing on a single rtx 3090 at 33 tokens a second average. google dropped this two days ago and it's the kind of thing that quietly moves the floor. a fully multimodal model, text image and audio in one net, 256k context, apache licensed, running entirely on one consumer gpu, no one metering your tokens. what you're watching is the whole loop live: the server streaming tokens top left, the gpu pegged bottom left, the answer landing on the right. all local, all mine. a year ago this needed someone else's datacenter. today it's a card you can buy. open source isn't catching up anymore, it's setting the pace. how fast does yours run?

watch gemma 4 12b q8 dancing on a single rtx 3090 at 33 tokens a second average. google dropped this two days ago and it's the kind of thing that quietly moves the floor. a fully multimodal model, text image and audio in one net, 256k context, apache licensed, running entirely on one consumer gpu, no one metering your tokens. what you're watching is the whole loop live: the server streaming tokens top left, the gpu pegged bottom left, the answer landing on the right. all local, all mine. a year ago this needed someone else's datacenter. today it's a card you can buy. open source isn't catching up anymore, it's setting the pace. how fast does yours run?

Sudo su

36,614 Aufrufe • vor 19 Tagen