atomic.chat

@atomic_chat_hq • 11,982 subscribers

Local AI chat and Inference Engine. Enhanced by TurboQuant. Team: @gladkos @skinbagwbones @AlexFromAtomic @danyurkin

Shorts

Fable 5 totally crushed our new contest, but it cost almost 6x more than Opus 4.8!
We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics demos.
Prompts:
- A train derailing off a broken bridge into the water
- Two cars jumping off ramps and colliding mid-air over a canyon
- A monster truck crushing a row of parked cars
Outputs:
- Fable 5: 62,158 tokens, $3.12
- GPT 5.5: 37,753 tokens, $1.14
- Opus 4.8: 22,280 tokens, $0.56
- GLM 5.2: 36,246 tokens, $0.08
Fable 5 nailed all three scenes at A+ level. The crashes looked real, things fell and broke the right way, and nothing passed through the ground or floated. GPT 5.5 came closest to Fable; in the monster truck scene, we think GPT was even slightly better. GLM 5.2 did not win any scene, but it was by far the cheapest. Fable is the best pick for quality, but you pay more for it.

2,833,992 views
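You can check the headline ratio directly from the listed totals. The Python sketch below divides each model's bill by its token count to get a blended cost per million tokens. The post does not split input and output tokens, so this is only a rough effective rate.

```python
# Totals from the Fable 5 contest above: (tokens, cost in USD).
RESULTS = {
    "Fable 5": (62_158, 3.12),
    "GPT 5.5": (37_753, 1.14),
    "Opus 4.8": (22_280, 0.56),
    "GLM 5.2": (36_246, 0.08),
}


def cost_per_million(tokens: int, usd: float) -> float:
    """Blended USD per 1M tokens (the post does not split input and output)."""
    return usd / tokens * 1_000_000


def cost_ratio(a: str, b: str) -> float:
    """How many times more model `a` cost than model `b` for the whole run."""
    return RESULTS[a][1] / RESULTS[b][1]


if __name__ == "__main__":
    for name, (tokens, usd) in RESULTS.items():
        print(f"{name:9s} ${cost_per_million(tokens, usd):6.2f} per 1M tokens")
    print(f"Fable 5 vs Opus 4.8: {cost_ratio('Fable 5', 'Opus 4.8'):.2f}x")
```

Per token, Fable 5 costs roughly twice Opus 4.8's blended rate. The rest of the ~5.6x gap comes from Fable using almost 3x as many tokens.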

New Claude Sonnet 5 performs at GPT 5.5 level, 6x cheaper!
We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics crash demos.
Prompts:
- A car crashes into a brick wall
- A wrecking ball destroys a house
- A catapult throws a rock at a castle wall
Outputs:
- Sonnet 5: 15,047 tokens, $0.15
- Opus 4.8: 23,063 tokens, $0.58
- Sonnet 4.6: 25,824 tokens, $0.39
- GPT 5.5: 31,152 tokens, $0.94
Sonnet 5 matched Opus 4.8 and GPT 5.5 on all three tests. In the wrecking ball test, it beat Opus 4.8: the cable moves smoothly and every hit connects. In the catapult test, it beat GPT 5.5: the rock always lands inside the wall. Sonnet 5 still needs better detail and graphics, but it used fewer tokens than every other model.

725,764 views
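Scenes like the catapult shot usually come down to a small integration loop. The Python sketch below shows the semi-implicit (symplectic) Euler step that canvas physics demos often use. It assumes no air drag and flat ground at y = 0. The launch speed and angle are made-up values, not taken from any model's output.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def simulate_launch(speed: float, angle_deg: float, dt: float = 1e-4):
    """Integrate a drag-free projectile until it returns to y = 0.

    Semi-implicit Euler: update velocity from acceleration first,
    then advance position with the *new* velocity.
    Returns (landing_x, flight_time).
    """
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    t = 0.0
    while True:
        vy -= G * dt          # gravity changes velocity first
        x += vx * dt          # then position uses the updated velocity
        y += vy * dt
        t += dt
        if y <= 0.0:          # crossed the ground during this step
            return x, t


def analytic_range(speed: float, angle_deg: float) -> float:
    """Closed-form range on flat ground: v^2 * sin(2*theta) / g."""
    return speed ** 2 * math.sin(math.radians(2 * angle_deg)) / G


if __name__ == "__main__":
    x, t = simulate_launch(20.0, 45.0)
    print(f"simulated {x:.2f} m in {t:.2f} s, analytic {analytic_range(20.0, 45.0):.2f} m")
```

Comparing the loop against the closed-form range is a quick sanity check that the integrator is not adding or losing energy.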

Sakana Fugu surprisingly performed near GLM 5.2 level, but at 17× the cost!
We gave the same prompt to 4 models: build a complete live Trader Desk with both frontend and backend components, real-time market data fetched from external APIs for 8 symbols, and a custom dark-theme UI.
Outputs:
- Fugu Ultra: 22,225 tokens, $0.51
- Opus 4.8: 15,802 tokens, $0.31
- GPT-5.5: 11,474 tokens, $0.26
- GLM 5.2: 13,677 tokens, $0.03
Fugu created the most polished and feature-rich trading desk in the run. GLM 5.2 was very close behind, with a similarly complete multi-panel interface and live data, at a much lower cost. Opus and GPT also performed well, delivering solid results with a better balance between quality and cost than Fugu.

742,420 views
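The backend half of that task is mostly a polling loop. The Python sketch below assumes an asyncio backend. `fetch_quote` is a hypothetical coroutine standing in for whatever market-data API a model might pick; no real provider or endpoint is implied. The eight symbols are also just examples.

```python
import asyncio
import random

# Example symbols only; the post does not say which 8 were used.
SYMBOLS = ["AAPL", "MSFT", "NVDA", "AMZN", "GOOGL", "META", "TSLA", "BTC-USD"]


async def fetch_quote(symbol: str) -> dict:
    """Hypothetical stand-in for a real market-data API call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"symbol": symbol, "price": round(random.uniform(50, 500), 2)}


async def poll_once(symbols: list[str]) -> dict[str, dict]:
    """Fetch all symbols concurrently; one failed symbol does not block the rest."""
    results = await asyncio.gather(*(fetch_quote(s) for s in symbols),
                                   return_exceptions=True)
    board = {}
    for symbol, result in zip(symbols, results):
        if isinstance(result, Exception):
            board[symbol] = {"symbol": symbol, "error": str(result)}
        else:
            board[symbol] = result
    return board


async def poll_loop(symbols: list[str], interval: float, rounds: int) -> list[dict]:
    """Refresh the whole board every `interval` seconds for a fixed number of rounds."""
    snapshots = []
    for _ in range(rounds):
        snapshots.append(await poll_once(symbols))
        await asyncio.sleep(interval)
    return snapshots


if __name__ == "__main__":
    for snap in asyncio.run(poll_loop(SYMBOLS, interval=0.1, rounds=2)):
        print({s: q.get("price") for s, q in snap.items()})
```

In a real desk the snapshots would be pushed to the frontend (for example over WebSockets) instead of printed.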

Mistral OCR 4 turned a handwritten calculus exam into clean LaTeX!
We gave it a photo of a handwritten exam page. The model read the handwriting and rebuilt every formula as structured digital text.
Output: Time: 5.1s · Cost: $0.09
The formulas came through exactly right, so the hard part was nailed. Unfortunately, it didn't redraw the graph. But that's the telling part: most OCR tools just dump the text and quietly drop the figure. OCR 4 caught the plot, boxed it, and tagged it as a chart. It doesn't get redrawn, but it does get read and accounted for.

416,647 views

Nemotron 3 Ultra performed at GPT 5.5 level, 10× cheaper!
We gave both models the same three prompts: build HTML5 canvas scenes with real physics. Scene one: water in a spinning drum. Scene two: a Galton board, with balls falling through pegs into bins. Scene three: a block collision setup with extreme mass differences.
Outputs:
- Nemotron 3 Ultra: 11.3k tokens, $0.051
- GPT 5.5: 11.0k tokens, $0.57
Nemotron stays right on GPT 5.5's heels at about a tenth of the price. The gap in quality is far smaller than the gap in price.

700,327 views
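The Galton board scene is a quick way to check whether a model's physics produces the right statistics. The Python sketch below models an idealized board in which each peg sends a ball left or right with probability 1/2 and there are no real collisions. Bin counts should then follow a binomial distribution with mean rows/2 and variance rows/4. The row and ball counts are arbitrary.

```python
import random
from collections import Counter


def galton(rows: int, balls: int, seed: int = 0) -> Counter:
    """Drop `balls` through `rows` of pegs; a ball's bin is its number of right bounces."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    bins = Counter()
    for _ in range(balls):
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        bins[rights] += 1
    return bins


def mean_and_variance(bins: Counter) -> tuple[float, float]:
    """Sample mean and variance of the bin index, weighted by ball count."""
    total = sum(bins.values())
    mean = sum(k * n for k, n in bins.items()) / total
    var = sum(n * (k - mean) ** 2 for k, n in bins.items()) / total
    return mean, var


if __name__ == "__main__":
    rows = 12
    bins = galton(rows, balls=20_000)
    for k in range(rows + 1):
        print(f"{k:2d} {'#' * (bins[k] // 100)}")  # text histogram: the bell curve
    print(mean_and_variance(bins), "expected", (rows / 2, rows / 4))
```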

Qwen 3.7-Max beats Opus 4.7 and GPT-5.5!
We tested three frontier models on a real agentic task: write a Tetris bot that plays the game and trains itself. Each model could read its own code, run benchmarks, and rewrite that code across 10 iterations. Then we compared the final bots head to head.
- Qwen 3.7-Max: training cost $1.32, bot improvement +56%
- Claude Opus 4.7: training cost $12.15, bot improvement +28%
- GPT-5.5: training cost $2.85, bot improvement +7%
Qwen won on every dimension: the biggest jump, 9× cheaper than Claude and 2× cheaper than GPT. Long agentic loops are where Qwen Max really delivers.

865,484 views
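At its core, the agentic setup is propose, benchmark, keep if better. The toy Python sketch below shows that outer loop. It uses a hypothetical `benchmark` score, and a random tweak to a few heuristic weights stands in for the model rewriting its own code. An improvement figure like "+56%" is then just (final - baseline) / baseline.

```python
import random


def benchmark(weights: list[float]) -> float:
    """Hypothetical stand-in for 'run the Tetris bot and score it'.
    Peaks when the weights hit a sweet spot the loop does not know about."""
    sweet_spot = [-0.5, 0.8, -0.3, -0.2]   # e.g. height, lines, holes, bumpiness
    return 100.0 - 50.0 * sum((w - s) ** 2 for w, s in zip(weights, sweet_spot))


def propose(weights: list[float], rng: random.Random) -> list[float]:
    """Stand-in for the model rewriting its own code: nudge every weight a little."""
    return [w + rng.gauss(0, 0.1) for w in weights]


def improve(weights: list[float], iterations: int = 10, seed: int = 0):
    """Keep a candidate only if it benchmarks better; return best weights and score history."""
    rng = random.Random(seed)
    best, best_score = weights, benchmark(weights)
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(best, rng)
        score = benchmark(candidate)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, history


def improvement(history: list[float]) -> float:
    """Relative gain of the final bot over the starting one, e.g. 0.56 for +56%."""
    return (history[-1] - history[0]) / history[0]


if __name__ == "__main__":
    best, history = improve([0.0, 0.0, 0.0, 0.0])
    print([round(s, 1) for s in history])
    print(f"bot improvement: {improvement(history):+.0%}")
```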

Bonsai 27B running locally on an iPhone in Atomic Chat!
Bonsai is the first 27B-class model that fits on a phone. PrismML built it on Qwen3.6 27B with 1-bit weights. It takes 3.9GB instead of 54GB and keeps ~90% of the original benchmark scores. Available now on iPhone and Android.

51,580 views
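The size figures follow from bits per weight. In the Python sketch below, the first function reproduces the 16-bit vs 1-bit footprint for 27B weights. We assume the gap between the raw 3.375 GB and the quoted 3.9GB comes from scales and any layers kept at higher precision; PrismML has not published that breakdown here. The second function shows a generic sign-plus-scale binarization of one weight group. It is not necessarily PrismML's actual method.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes), ignoring metadata and scales."""
    return params * bits_per_weight / 8 / 1e9


def binarize_group(weights: list[float]) -> tuple[float, list[int]]:
    """Generic 1-bit quantization of one group: keep only the signs plus one
    shared scale (mean absolute value). Each weight costs 1 bit plus its share
    of the group's scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return scale, signs


def dequantize(scale: float, signs: list[int]) -> list[float]:
    """Reconstruct approximate weights from signs and the shared scale."""
    return [scale * s for s in signs]


if __name__ == "__main__":
    print(weights_gb(27e9, 16), "GB at FP16")   # 54.0
    print(weights_gb(27e9, 1), "GB at 1 bit")   # 3.375
    scale, signs = binarize_group([0.5, -1.5, 1.0, -1.0])
    print(scale, signs, dequantize(scale, signs))
```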

Grok 4.5 performed at GPT-5.6 Sol level for free!
We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics demos.
Prompts:
- A robot deathmatch, Tombstone vs Minotaur
- A hydraulic press flattening stuff on a conveyor
- A semi truck jumping a canyon
Outputs:
- GPT-5.6 Sol: 12.9K tokens, $0.51 (~7 min)
- Grok 4.5: 10.8K tokens, $0 (~5 min)
- Muse Spark 1.1: 26.8K tokens, $0.12 (~7.5 min)
- GLM 5.2: 10.9K tokens, $0.02 (~12 min)
Grok 4.5 handled all three scenes genuinely well and got surprisingly close to GPT-5.6 Sol this round, and it ran on the free tier. GPT-5.6 Sol, the frontier model, put out solid but not standout work. GLM 5.2 rendered all three scenes for pennies, but its output was the roughest of the four. Meta's new Muse Spark burned the most tokens yet still stayed cheap, delivering an average result.

70,064 views

New Hunyuan Hy3 hits Gemini 3.5 quality on physics, 35x cheaper!
We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics demos.
Prompts:
- A bowling ball knocking down the pins
- An air hockey rally that ends in a goal
- A pool break scattering the rack
Outputs:
- Hunyuan Hy3: 29,757 tokens, $0.006
- Gemini 3.5: 23,300 tokens, $0.21
- GLM-5.2: 25,454 tokens, $0.07
- DeepSeek-V4: 50,600 tokens, $0.009
Tencent's Hy3 matched Gemini across all three: clean collisions, the puck bounced true, the pins scattered like a real strike, and the rack broke with real momentum. Nothing clipped or floated. GLM is genuinely strong on pure coding tasks, but the moment the job steps outside clean code, it gives way. DeepSeek was the letdown: it burned the most tokens of anyone (50.6k, about 1.7x Hy3) and still turned in the weakest scenes.

96,930 views
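Pool breaks, bowling pins and air hockey pucks all rely on the same routine: resolving a collision between two circles along their contact normal. The Python sketch below implements the standard impulse formula with a restitution coefficient `e`, where 1.0 means perfectly elastic. Positional correction for overlap and friction are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    mass: float
    radius: float


def resolve_collision(a: Ball, b: Ball, e: float = 1.0) -> bool:
    """Apply an impulse along the contact normal if the balls touch and approach.
    Returns True if an impulse was applied."""
    dx, dy = b.x - a.x, b.y - a.y
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > a.radius + b.radius:
        return False                      # not touching (or exactly coincident)
    nx, ny = dx / dist, dy / dist         # unit normal pointing from a to b
    rel = (b.vx - a.vx) * nx + (b.vy - a.vy) * ny
    if rel >= 0:
        return False                      # already separating
    j = -(1 + e) * rel / (1 / a.mass + 1 / b.mass)   # impulse magnitude
    a.vx -= j * nx / a.mass
    a.vy -= j * ny / a.mass
    b.vx += j * nx / b.mass
    b.vy += j * ny / b.mass
    return True


if __name__ == "__main__":
    cue = Ball(0.0, 0.0, 1.0, 0.0, 1.0, 1.0)
    target = Ball(1.9, 0.0, 0.0, 0.0, 1.0, 1.0)
    resolve_collision(cue, target)
    print(cue.vx, target.vx)  # equal masses, head-on: velocities swap
```

Only the normal component of velocity changes, so a glancing hit sends the struck ball off along the line of centres, as in a real break.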

LongCat performed at Opus 4.8 and GPT 5.5 level on real physics tasks for $0!
We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics.
Prompts:
- A cannon demolishing a brick wall
- A bowling ball knocking down the pins
- A tornado that sucks in random objects
Outputs:
- LongCat: 18,015 tokens, $0.00
- Opus 4.8: 18,872 tokens, $0.48
- GPT 5.5: 32,588 tokens, $0.98
- GLM 5.2: 31,062 tokens, $0.09
On physics, LongCat came out ahead of Opus 4.8 and GLM 5.2: cleaner collisions, with nothing clipping or falling through. On detail and rendering, it matched GPT 5.5, the best-looking of the four. Getting this quality for free is wild!

104,522 views

MTP speeds up Qwen by up to 2.3x in Atomic Chat
Dense vs MoE models on 2x RTX 5090:
- Qwen3.6 27B: 51 → 117 tps (+129%)
- Qwen3.6 35B-A3B: 218 → 267 tps (+22%)
MTP (multi-token prediction) drafts several tokens ahead and verifies them in one pass. The speedup depends on how much memory is moved per pass. The dense 27B reads all 27B params per token, while the MoE 35B-A3B reads only its ~3B active params. So the dense model had far more to save by checking several drafted tokens in a single pass. The baseline tps also differ (218 vs 51) for the same reason, seen from the other side: token generation is memory-bandwidth bound, and the MoE moves ~8x less memory per token, so its baseline is already ~4x ahead.
~80% draft acceptance. Zero accuracy loss. ~1 GB extra VRAM. Open-source code and local AI app – in the comments 👇

170,338 views
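A back-of-the-envelope ceiling makes the memory-bandwidth argument concrete. If every generated token has to stream the active weights from VRAM once, tokens per second cannot exceed bandwidth divided by bytes read per token. The Python sketch below uses 1,000 GB/s and 1 byte per weight as illustrative assumptions; these are not the specs of the 2x RTX 5090 setup. KV-cache traffic is ignored.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    Each token reads all *active* weights once:
    bytes per token = active params * bytes per param.
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    dense = decode_ceiling_tps(1000, active_params_b=27, bytes_per_param=1.0)
    moe = decode_ceiling_tps(1000, active_params_b=3, bytes_per_param=1.0)
    print(f"dense 27B: {dense:.0f} tok/s, MoE 3B active: {moe:.0f} tok/s, "
          f"ratio {moe / dense:.1f}x")
```

Measured speeds sit below such ceilings. Costs like attention and KV-cache reads do not scale with active-parameter count, which is one plausible reason real gaps come out smaller than the raw 27:3 ratio.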

Open-weight LongCat 2.0 matched GPT-5.5 on agentic game dev for $0!
We ran Meituan's LongCat 2.0 against the cloud frontier model GPT-5.5 in Kilo CLI, using its agent. Same task for both: build a retro Duck Hunt game in one game.html, improved over 3 agent iterations with duck waves, ammo and physics.
Outputs:
- LongCat 2.0: 70.3K tokens, $0.00
- GPT-5.5: 64.9K tokens, $0.65
LongCat kept up on graphics, physics and game logic. Ducks fly and fall when hit, the dog fetches them, ammo counts down, and the waves keep coming. Both ran cleanly and nothing clipped. The only difference was the bill: GPT cost $0.65, while LongCat ran locally for $0.

39,771 views

Liquid's LFM2.5-8B-A1B smashed OpenAI's gpt-oss-20b on tool calling!
We ran both locally on a MacBook Pro M5 Max with 64GB and gave each the same trip-planning request. The request only completes if the model fires all 7 tool calls: weather for 3 cities, two currency conversions, an email and a reminder.
Outputs:
- LFM2.5-8B-A1B: 4.8 GB RAM usage, 7/7 tool calls, 266 tok/s, 6.9s
- OpenAI gpt-oss-20b: 11 GB RAM usage, 3/7 tool calls, 146 tok/s, 15.0s
The 8B used less than half the RAM and still fired all 7 calls, while the 20B silently dropped more than half of its own. It also ran ~2x faster, finishing the full agentic request in 6.9s against 15s. That's what 38T training tokens buy: a 1B-active MoE that nails the agentic tool calls a model 2.5x its size keeps dropping.

90,063 views
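Scoring a run like this is a set comparison: which of the required calls actually appeared in the model's output? The Python sketch below uses hypothetical tool names, cities and arguments for the trip-planning request. The post does not list the exact schema.

```python
# Hypothetical required calls for the trip-planning request: (tool name, key argument).
REQUIRED = {
    ("get_weather", "Paris"),
    ("get_weather", "Rome"),
    ("get_weather", "Berlin"),
    ("convert_currency", "USD->EUR"),
    ("convert_currency", "EUR->USD"),
    ("send_email", "itinerary"),
    ("set_reminder", "flight"),
}


def score_tool_calls(emitted: list[tuple[str, str]]) -> tuple[int, set]:
    """Count how many required calls were emitted; return (hits, missing)."""
    seen = set(emitted)          # repeated calls only count once
    hits = REQUIRED & seen
    return len(hits), REQUIRED - seen


if __name__ == "__main__":
    partial = [("get_weather", "Paris"), ("get_weather", "Rome"),
               ("convert_currency", "USD->EUR")]
    hits, missing = score_tool_calls(partial)
    print(f"{hits}/{len(REQUIRED)} tool calls, missing: {sorted(missing)}")
```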

Laguna XS 2.1 performed at Qwen 3.6 35B's level in Tetris building and ran 2x faster!
We tested two open models on a single RTX 3090 in the Poolside coding agent. The task was building a playable retro Tetris as one self-contained HTML file. Each model wrote and rewrote the game across 3 iterations.
Outputs:
- Laguna XS 2.1: 45K tokens, 158 tok/s
- Qwen 3.6 35B: 39K tokens, 81 tok/s
The two Tetris builds are nearly identical. Poolside's Laguna has a couple of small visual bugs that Qwen 3.6 35B doesn't, but it generated tokens about twice as fast thanks to its built-in DFlash speculative decoding.

23,256 views
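The two runs produced different token counts, so the throughput ratio and the wall-time ratio differ. The quick Python check below uses only the numbers above. It assumes steady decode speed and ignores prompt processing and agent overhead.

```python
def decode_seconds(tokens: int, tok_per_s: float) -> float:
    """Time spent generating `tokens` at a steady decode speed."""
    return tokens / tok_per_s


laguna = decode_seconds(45_000, 158)   # Laguna XS 2.1
qwen = decode_seconds(39_000, 81)      # Qwen 3.6 35B

print(f"Laguna ~{laguna / 60:.1f} min, Qwen ~{qwen / 60:.1f} min")
print(f"throughput ratio {158 / 81:.2f}x, wall-time ratio {qwen / laguna:.2f}x")
```

Laguna decoded about twice as fast per token. But it wrote ~15% more tokens, so under these assumptions the end-to-end saving is closer to 1.7x.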

Compared Qwen3.6 35B and 27B under the same conditions with Google TurboQuant
Device: MacBook Pro M5 Max, 64GB RAM
Output characteristics:
- Qwen3.6 35B: 6672 tokens, 2m 10s, 65 tok/s
- Qwen3.6 27B: 7344 tokens, 5m 22s, 24 tok/s
Conclusion: both models were asked to draw waves using HTML. The 35B responded quickly, but the result feels weak and messy. The 27B took more time and delivered a much cleaner and more consistent result because it is built for thinking and planning, so it works better on tasks that need structure. Overall, the 27B is the better choice for tasks where planning matters, while the 35B is more suitable for everyday use when you just need a fast response.

55,540 views

Videos

GPT-5.6 Sol Ultra lost to GPT-5.5 on physics at 3× the cost!
We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics demos.
Prompts:
- A monster truck backflipping onto a parked car
- A stunt car jumping six buses into a brick wall
- A train derailing off a broken bridge into the water
Outputs:
- GPT-5.6 Sol Ultra: 32.9K tokens, $0.33
- Opus 4.8: 9.2K tokens, $0.24
- GPT-5.5: 12.4K tokens, $0.11
- Grok 4.5: 7.0K tokens, $0.08
Sol Ultra draws just like GPT-5.5, only with more detail. This shows most clearly on the bus jump, where the two look almost identical. In the other tests, Sol Ultra came out worse than its predecessor, and the physics clearly got weaker. We think GPT-5.5 won the truck flip and the train outright. With Sol Ultra you basically get GPT-5.5 with weaker physics and a nicer picture for 3x the price. The newly released Grok 4.5 failed two of the three tests and only did well on the train.

895,609 views • 11 days ago

New Fable 5 beats Opus 4.8 on real-world physics simulations!
We gave both models the same three prompts and asked them to build self-contained HTML5 sims with real physics and no libraries:
1. Chaotic double pendulum
2. Galton board
3. Water in a spinning drum (WCSPH, weakly compressible SPH)
Generation cost:
- Fable 5: $3.35 on 68.7k tokens, time 14m 47s
- Opus 4.8: $0.93 on 38.9k tokens, time 8m 10s
Fable clearly did better on the water simulation, producing a much more solid and continuous body of water. Opus left larger gaps near the walls, scattered particles around the scene, and struggled to keep the fluid stable.

1,585,909 views • 1 month ago
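Some context for the water scene. In weakly compressible SPH (WCSPH), each particle's density is estimated by summing a smoothing kernel over its neighbours. Pressure then comes from that density through a stiff equation of state, commonly Tait's equation p = B((ρ/ρ0)^γ - 1) with γ = 7. With that exponent, even ~1% compression produces a sizable restoring pressure. The Python sketch below covers those two steps, using the standard 2D cubic spline kernel. The stiffness B, rest density and smoothing length are illustrative values, not taken from either model's code.

```python
import math

REST_DENSITY = 1000.0   # kg/m^3, water
GAMMA = 7.0             # Tait exponent commonly used for water in WCSPH
STIFFNESS = 1.0e5       # B in Pa; illustrative, usually tuned from a target speed of sound


def tait_pressure(density: float) -> float:
    """Tait equation of state: p = B * ((rho / rho0)^gamma - 1)."""
    return STIFFNESS * ((density / REST_DENSITY) ** GAMMA - 1.0)


def cubic_spline_2d(r: float, h: float) -> float:
    """Standard 2D cubic spline kernel W(r, h) with support radius 2h."""
    sigma = 10.0 / (7.0 * math.pi * h * h)   # normalizes the kernel to integrate to 1
    q = r / h
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def densities(positions: list[tuple[float, float]], mass: float, h: float) -> list[float]:
    """Density summation rho_i = sum_j m * W(|r_i - r_j|, h); O(n^2) for clarity."""
    out = []
    for xi, yi in positions:
        rho = sum(mass * cubic_spline_2d(math.hypot(xi - xj, yi - yj), h)
                  for xj, yj in positions)
        out.append(rho)
    return out
```

A real simulation would find neighbours with a spatial grid instead of the O(n^2) loop, then turn pressure differences into forces each step.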

1-bit Hy3 running locally is 2.2x faster than its API at the same quality!
We gave both setups the same tasks and compared one-shot outputs: the 1-bit Hy3 295B GGUF (92GB) running locally on 4x RTX 5090 with 128GB VRAM, against the same Hy3 over the cloud API.
Tasks:
- Flappy Bird
- Arkanoid
- Snake
Outputs:
- Hy3 1-bit local: 76.9K tokens, 15.5 min
- Hy3 cloud API: 75.1K tokens, 34.3 min
The 1-bit games look the same as the API ones. Birds fly through the pipes, bricks break, the snake eats and grows. Nothing froze or crashed. Both versions even made the same slip: the snake can cross itself and the game does not end. Getting this quality from a 1-bit model running locally is wild! Run Hy3 GGUF yourself in Atomic Chat in 2 clicks.

88,121 views • 6 days ago
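The file size hints at what "1-bit" means here. The quick Python check below uses the numbers in the post (295B parameters, a 92GB file) and assumes GB = 1e9 bytes. On that basis the average is about 2.5 bits per weight. Our reading, which the post does not state, is that this is a mixed-precision quant where only part of the tensors is stored at ~1 bit.

```python
def avg_bits_per_weight(file_gb: float, params_b: float) -> float:
    """Average storage cost per parameter, counting every byte in the file."""
    return file_gb * 1e9 * 8 / (params_b * 1e9)


bits = avg_bits_per_weight(92, 295)
print(f"~{bits:.2f} bits per weight on average")
print(f"at most {128 - 92} GB of the 128 GB VRAM left for KV cache and activations")
print(f"local vs API wall-time speedup: {34.3 / 15.5:.2f}x")
```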

New Opus 4.8 crushed Opus 4.7 at physics on canvas!
We gave both models the same three prompts: simulate a real physics phenomenon on raw HTML5 canvas.
Prompt 1: "A triple pendulum swings into chaos and paints glowing trails with its tip"
Prompt 2: "A 1 kg block bounces between a wall and a 100.000 kg block. The collisions count out the digits of pi"
Prompt 3: "Balls fall through a grid of pegs and pile into a bell curve"

637,426 views • 1 month ago
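Prompt 2 relies on a known result. Put a wall on one side, a small block next to it, and a big block sliding in. The number of elastic collisions spells out the leading digits of pi when the mass ratio is a power of 100: ratios of 1, 100 and 10,000 give 3, 31 and 314 collisions. The Python sketch below counts collisions using velocities only. Positions are not needed because every pending collision eventually happens. By the same count, the 100,000:1 ratio in the prompt gives 993 collisions rather than digits of pi.

```python
def count_collisions(mass_ratio: float) -> int:
    """Count collisions for a small block (mass 1) between a wall and a big block
    (mass `mass_ratio`) that starts moving toward it at speed 1.

    Convention: the wall is on the left and velocities are positive to the right.
    """
    m, big_m = 1.0, float(mass_ratio)
    v_small, v_big = 0.0, -1.0
    count = 0
    while True:
        if v_big < v_small:
            # Blocks are approaching each other: 1D elastic collision.
            total = m + big_m
            v_small, v_big = (
                ((m - big_m) * v_small + 2 * big_m * v_big) / total,
                ((big_m - m) * v_big + 2 * m * v_small) / total,
            )
        elif v_small < 0:
            # Small block heading into the wall: perfect reflection.
            v_small = -v_small
        else:
            # Both moving right and the big block is at least as fast: no more hits.
            break
        count += 1
    return count


if __name__ == "__main__":
    for ratio in (1, 100, 10_000, 100_000, 1_000_000):
        print(f"mass ratio {ratio:>9,}: {count_collisions(ratio)} collisions")
```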

DFlash makes Qwen 2.2x faster with no quality loss!
We ran the same Qwen3.6-27B locally three ways on one RTX 6000: baseline, MTP and DFlash. The tasks differ in only one thing, how predictable the next word is: quicksort, describing a file in JSON, a logic puzzle, and a sci-fi story.
Outputs:
- Baseline: 44 tok/s · 1.00x
- MTP: 65 tok/s · 1.45x · 71% accepted
- DFlash: 98 tok/s · 2.20x · 30% accepted
Baseline writes one token per step. MTP works inside the model itself and guesses 3 tokens ahead. DFlash is a separate small model that drafts 15 tokens at once, and the big model only checks them. In JSON the same words repeat all the time, so most guesses were right: 152 tok/s, a 3.4x speedup. In the story, 9 guesses out of 10 were wrong. DFlash did all that extra work for nothing and became slower than baseline: 42 vs 44 tok/s. MTP guesses only 3 tokens, so a wrong guess costs very little: 46 tok/s and the win in that round. The output is identical in all three modes. DFlash is the pick for tasks with predictable output, like coding, while MTP works better for chat and creative writing. DFlash is now natively integrated into Atomic Chat on llama.cpp. Speed up your Qwen models!

75,006 views • 7 days ago
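Why is the output identical in all three modes? With greedy decoding, a drafted token is kept only if it equals what the big model would have produced at that position, so speculation changes speed but not text. The toy Python sketch below shows that draft-then-verify loop. Tiny deterministic functions stand in for the target and draft models; they are hypothetical. Real engines verify all drafted positions in one batched forward pass, which this loop only counts.

```python
from typing import Callable

Model = Callable[[list[int]], int]   # maps a token prefix to the next token (greedy)


def greedy_decode(target: Model, prompt: list[int], n: int) -> list[int]:
    """Baseline: one target step per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: list[int],
                       n: int, k: int) -> tuple[list[int], int]:
    """Draft k tokens, keep the longest prefix the target agrees with, plus one
    target token (a correction or a bonus). Returns (tokens, verify_passes)."""
    seq, passes, goal = list(prompt), 0, len(prompt) + n
    while len(seq) < goal:
        ctx, proposal = list(seq), []
        for _ in range(k):            # cheap draft model guesses ahead
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        passes += 1                   # one (batched) target pass checks them all
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)      # the target's token is what always gets kept
            if tok != expected:
                break                 # first mismatch: discard the rest of the draft
        else:
            seq.append(target(seq))   # all k accepted: free bonus token
    return seq[:goal], passes


# Hypothetical toy models.
def target(seq: list[int]) -> int:
    return (seq[-1] * 3 + 1) % 17


def good_draft(seq: list[int]) -> int:
    return target(seq)                       # always guesses right


def flaky_draft(seq: list[int]) -> int:
    return target(seq) if seq[-1] % 3 else 0  # sometimes wrong


if __name__ == "__main__":
    base = greedy_decode(target, [1], 40)
    for name, d in (("good", good_draft), ("flaky", flaky_draft)):
        out, passes = speculative_decode(target, d, [1], 40, k=4)
        print(name, out == base, f"{passes} verify passes for 40 tokens")
```

The trade-off in the post follows from this loop. A long draft, like DFlash's 15 tokens, pays off when most guesses survive. A short draft, like MTP's 3, wastes little when they don't.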

Google TurboQuant running locally in Atomic Chat
MacBook Air M4, 16 GB
Model: Qwen3.5-9B
Context window: 50,000 tokens
Summarising 20,000 words in just seconds. You get a 3x larger context window, processed 3x faster than before!

435,661 views • 3 months ago
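On a 16 GB laptop, context length is mostly a KV-cache budget. The back-of-the-envelope Python sketch below applies the standard KV-cache size formula. The layer, head and dimension numbers are a hypothetical config, not Qwen3.5-9B's real one. The ~5.3 bits per value is simply what a 3x reduction from 16-bit would imply; it is not a documented TurboQuant setting.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bits_per_value: float) -> float:
    """Keys + values for every layer, KV head and position."""
    values = 2 * layers * kv_heads * head_dim * tokens   # 2 = one K and one V
    return values * bits_per_value / 8


# Hypothetical config for illustration only.
cfg = dict(layers=32, kv_heads=8, head_dim=128)

fp16 = kv_cache_bytes(50_000, **cfg, bits_per_value=16)
compressed = kv_cache_bytes(50_000, **cfg, bits_per_value=16 / 3)
print(f"50k tokens: {fp16 / 1e9:.2f} GB at 16-bit, {compressed / 1e9:.2f} GB at ~5.3 bits")
print(f"the same memory holds {fp16 / compressed:.1f}x more tokens")
```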

New Kimi K2.7 Code performs at GPT-5.5 level, 3x cheaper!
We gave both models the same three prompts: build a self-contained HTML5 canvas sim with real physics and no libraries. The scenes were a spring pendulum on a stretching coil, a 1 kg block trading collisions with a 100,000 kg block, and 22 balls churning in a spinning hexagon.
Outputs:
- Kimi K2.7 Code: $0.28 on 52.4k tokens
- GPT-5.5: $0.93 on 23.4k tokens
The spring pendulums and the blocks came out even. Kimi did the balls better: its pile spins with the hexagon, while GPT's balls bounce around in pure chaos. On price to quality, K2.7 Code is the clear pick.

139,760 views • 1 month ago
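The spring pendulum (a bob on a stretchy coil) is a good stress test because it mixes swinging and bouncing. The Python sketch below uses semi-implicit Euler with made-up mass, stiffness and rest length. The test at the end checks that total energy stays nearly constant, which is where a sloppy integrator usually gives itself away.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def spring_pendulum(steps: int, dt: float, k: float = 50.0, m: float = 1.0,
                    rest: float = 1.0, pos=(0.5, -1.0), vel=(0.0, 0.0)) -> list[float]:
    """Bob attached to a pivot at the origin by a spring; y points up.
    Returns the total energy after each step."""
    x, y = pos
    vx, vy = vel
    energies = []
    for _ in range(steps):
        r = math.hypot(x, y)
        stretch = r - rest
        # Spring force acts along the coil, back toward the pivot when stretched.
        fx = -k * stretch * x / r
        fy = -k * stretch * y / r - m * G
        vx += fx / m * dt            # semi-implicit Euler: velocity first...
        vy += fy / m * dt
        x += vx * dt                 # ...then position with the new velocity
        y += vy * dt
        r = math.hypot(x, y)
        kinetic = 0.5 * m * (vx * vx + vy * vy)
        potential = 0.5 * k * (r - rest) ** 2 + m * G * y
        energies.append(kinetic + potential)
    return energies


if __name__ == "__main__":
    e = spring_pendulum(20_000, 1e-4)
    print(f"energy range over 2 s: {min(e):.4f} .. {max(e):.4f} J")
```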

New Google Gemma 4 12B claims near-26B performance, so we tested both!
We ran both models locally on one RTX 4090 and gave each the same task: write a self-contained HTML5 canvas animation with real physics in one file, without libraries. There were three scenes: a Galton board, two blocks colliding off a wall, and a chaotic triple pendulum.
Outputs:
- Gemma 4 26B-A4B: 15 GB VRAM usage, 6.9k tokens, 138 tok/s
- Gemma 4 12B: 9 GB VRAM usage, 8.9k tokens, 80 tok/s
Same Gemma 4 family, but the 26B-A4B won every scene and ran ~1.7x faster, on just 4B active params. The 12B stayed very close, though, on about 60% of the VRAM, which makes it the ideal model for a 16 GB laptop.

151,786 views • 1 month ago

Run OpenClaw with Gemma 4 and Atomic Chat
MacBook Air M4 · 16 GB RAM · 25 tok/s
No cloud! No subscription fees! An open-source local model that runs on your regular device.

313,459 views • 3 months ago

Open-weight MiniMax M3 filled out a US customs form from a driver's license photo!
For this test, we deployed MiniMax M3 Q4 using MLX-VLM on a Mac Studio M3 Ultra with 512GB RAM. The model was tasked with reading a scanned document and an ID card photo, then completing a declaration form.
Output: 736 tokens · Input: 1,847 tokens · Time: ~31s
The model analyzed both inputs, streamed its reasoning, and then called three tools: write_field for text fields, mark for Yes/No checkboxes, and sign for the signature and date. It extracted the required information, mapped it to the correct fields and completed the form without any manual input.

109,369 views • 1 month ago
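
The post names the three tools the model called (write_field, mark, sign) but not their schemas. A minimal Python sketch of the client side that executes such calls against an in-memory form; the argument names, the `{"name", "arguments"}` call shape (modeled on OpenAI-style function calling, where arguments arrive as a JSON string), the field names and the sample values are all hypothetical:

```python
import json
from typing import Any, Callable

Form = dict[str, Any]  # hypothetical in-memory form: field name -> value


def write_field(form: Form, field: str, value: str) -> None:
    """Fill a free-text field (hypothetical signature)."""
    form[field] = value


def mark(form: Form, field: str, answer: bool) -> None:
    """Tick the Yes or No box of a question (hypothetical signature)."""
    form[field] = "Yes" if answer else "No"


def sign(form: Form, signature: str, date: str) -> None:
    """Add the signature and the signing date (hypothetical signature)."""
    form["signature"] = signature
    form["date"] = date


TOOLS: dict[str, Callable[..., None]] = {
    "write_field": write_field,
    "mark": mark,
    "sign": sign,
}


def apply_tool_calls(form: Form, calls: list[dict[str, str]]) -> Form:
    """Run each model-emitted tool call in order; reject tools we don't expose."""
    for call in calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            raise ValueError(f"unknown tool: {call['name']}")
        tool(form, **json.loads(call["arguments"]))
    return form


# Placeholder calls with made-up values, not taken from the demo.
calls = [
    {"name": "write_field", "arguments": '{"field": "full_name", "value": "JANE DOE"}'},
    {"name": "mark", "arguments": '{"field": "carrying_over_10k_usd", "answer": false}'},
    {"name": "sign", "arguments": '{"signature": "Jane Doe", "date": "2025-01-01"}'},
]
print(apply_tool_calls({}, calls))
```

Keeping the tool set to three narrow actions means the model can only write values into fields; a whitelist like `TOOLS` rejects anything else it tries to call.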

Running the Hermes agent locally with Gemma 4. Device: MacBook Air · CPU: M4 · RAM: 16 GB. Open source. Free. Private. With TurboQuant cache in the @Atomic_Chat_HQ app.

295,169 views • 3 months ago

Diffusion Gemma is 4x faster, but makes 6x more mistakes! We benchmarked the new diffusion LLM against its autoregressive twin on a single H100 (FP8). We gave each the same three tasks: write a Steve Jobs biography, the history of Tetris, and the story of BeOS, each topic less popular than the one before. Then we fact-checked every claim in every answer. Gemma4 got 45 facts right, 5 wrong. DiffusionGemma got 33 right, 28 wrong. The less popular the topic, the worse it got: 4 mistakes on Jobs, 12 on Tetris, 12 on BeOS. It named Clara Clley as Steve Jobs' mother, invented a colleague for Pajitnov named Geri Gulovik and priced the BeBox at $9,999. The real one cost $1,600. Outputs: Gemma4 26B A4B: 218 tok/s · 15.1s total · 45 facts · 5 mistakes. DiffusionGemma 26B A4B: 763 tok/s · 3.7s total · 33 facts · 28 mistakes. The reason is simple. DiffusionGemma throws 256 tokens on the screen at once and polishes them pass after pass until the text sounds smooth. Smooth is all it cares about: a fake name, date or number sounds just as smooth as a real one, so it stays. Regular Gemma4, meanwhile, writes one word at a time and predicts every new word from everything written before it. Google says so itself in the launch post: quality is lower, so use regular Gemma 4 when facts matter.

75,760 views • 1 month ago
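
The headline ratios can be rechecked from the numbers in the post. A small Python sketch; the only assumption is that "facts" plus "mistakes" equals all claims checked in each model's answers:

```python
# Figures copied from the post: single H100, FP8, same three prompts.
runs = {
    "Gemma4 26B A4B": {"tok_s": 218, "total_s": 15.1, "right": 45, "wrong": 5},
    "DiffusionGemma 26B A4B": {"tok_s": 763, "total_s": 3.7, "right": 33, "wrong": 28},
}

ar = runs["Gemma4 26B A4B"]
dlm = runs["DiffusionGemma 26B A4B"]

# Speed can be compared two ways: decode throughput or end-to-end time.
throughput_ratio = dlm["tok_s"] / ar["tok_s"]
wall_clock_ratio = ar["total_s"] / dlm["total_s"]

# Mistakes: raw count ratio, and share of checked claims that were wrong.
mistake_ratio = dlm["wrong"] / ar["wrong"]


def error_rate(run: dict) -> float:
    """Fraction of checked claims that were wrong (assumes right + wrong = all claims)."""
    return run["wrong"] / (run["right"] + run["wrong"])


print(f"throughput ratio: {throughput_ratio:.2f}x")
print(f"wall-clock ratio: {wall_clock_ratio:.2f}x")
print(f"mistake ratio:    {mistake_ratio:.1f}x")
for name, run in runs.items():
    print(f"{name}: {error_rate(run):.1%} of checked claims wrong")
```

"4x faster" matches end-to-end time (about 4.08x), while per-token throughput is exactly 3.5x. "6x more mistakes" rounds up 5.6x; as a share of claims checked, the error rate goes from 10.0% to about 45.9%.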

New Z.ai GLM-5.2 beats Kimi K2.7 Code in a physics contest! We gave both models the same three prompts and asked them to build self-contained HTML5 sims with real physics and no libraries: 1. Pool break; 2. Block on a bed of springs; 3. Galton board. Outputs: GLM-5.2: 12,640 tokens. Kimi K2.7 Code: 7,420 tokens. GLM-5.2 nailed all three, with way more detail and polish. The break conserved momentum, the block bounced off the springs and the Galton beads spread into a clean bell curve. Kimi struggled on every scene: its block fell straight through the springs, its break didn't look realistic, with the balls colliding all wrong, and on the Galton board its balls overlapped and piled into each other instead of spreading out.

60,899 views • 1 month ago
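
"The break conserved momentum" is something you can check directly in a sim. A minimal Python sketch of the standard impulse-based elastic collision between two balls, the kind of update a pool break needs; the `Ball` class and `collide` helper are illustrative, not taken from either model's output:

```python
from dataclasses import dataclass


@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    m: float = 1.0


def collide(a: Ball, b: Ball) -> None:
    """Resolve an elastic collision between two touching balls, in place."""
    nx, ny = b.x - a.x, b.y - a.y  # line of centres, pointing from a to b
    dist2 = nx * nx + ny * ny
    if dist2 == 0.0:
        return
    # Relative velocity projected onto the line of centres, divided by |n|^2.
    along = ((a.vx - b.vx) * nx + (a.vy - b.vy) * ny) / dist2
    if along <= 0.0:  # balls are already separating: no impulse
        return
    total = a.m + b.m
    ja, jb = 2 * b.m / total * along, 2 * a.m / total * along
    a.vx -= ja * nx
    a.vy -= ja * ny
    b.vx += jb * nx
    b.vy += jb * ny


def momentum(*balls: Ball) -> tuple[float, float]:
    return sum(b.m * b.vx for b in balls), sum(b.m * b.vy for b in balls)


def kinetic_energy(*balls: Ball) -> float:
    return sum(0.5 * b.m * (b.vx ** 2 + b.vy ** 2) for b in balls)


# Cue ball (radius 1) hits an object ball off-centre; centres are 2 apart.
cue, obj = Ball(0.0, 0.0, 2.0, 0.0), Ball(1.6, 1.2, 0.0, 0.0)
p0, e0 = momentum(cue, obj), kinetic_energy(cue, obj)
collide(cue, obj)
p1, e1 = momentum(cue, obj), kinetic_energy(cue, obj)
print(p0, p1)  # total momentum unchanged, up to float rounding
print(e0, e1)  # kinetic energy unchanged too
```

With equal masses the two balls leave an off-centre hit at right angles to each other, the familiar 90-degree rule in pool; a sim whose balls "collide all wrong" usually breaks one of these invariants.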

Run Cline on Local AI models with Atomic Chat! Cline is a coding agent trusted by 8M+ developers. Write, refactor, ship code securely on your own hardware with local models powered by atomic.chat — no cloud, private, free and open-source

43,940 views • 27 days ago

MiniMax M3 turned a napkin sketch into a playable game. We handed MiniMax M3 a hand-drawn draft of a Doodle Jump-style platformer. It read the elements off the draft, wrote the logic, drew the interface and shipped it as one self-contained HTML game. Input: 6,920 tokens · Output: 9,933 tokens · Cost: $0.028. MiniMax (official) drops M3 on Hugging Face next week.

64,640 views • 1 month ago

OpenClaude from GitLawb running through Atomic Chat's local model. MacBook Air M4, 16 GB RAM, blazing fast. No API key. No cloud. Fully open-source.

82,775 views • 3 months ago

Atomic Chat is now on Hugging Face 🤗 We're officially a Local App on the world's biggest AI hub. Run 200,000+ open-weight models from Hugging Face directly on your device - private, local, and open source!

28,630 views • 1 month ago

Exa AI Search now in Atomic Chat! Your local models can finally search the live web and pull fresh info — powered by @exaailabs, trusted by 400,000+ developers, free and open-source

22,480 views • 1 month ago

Run KiloCode on open models powered by Atomic Chat. Kilo is trusted by 3M+ developers. Code, debug, and build entirely on your device: no cloud, fully private, free and open-source, with atomic.chat as your local AI provider.

11,990 views • 1 month ago