atomic.chat's banner
atomic.chat's profile picture

atomic.chat

@atomic_chat_hq7,532 subscribers

Local AI chat and Inference Engine. Enhanced by TurboQuant. Team: @gladkos @skinbagwbones @AlexFromAtomic @danyurkin

Shorts

Qwen 3.7-max beats Opus 4.7 and GPT-5.5 We tested three frontier models on a real agentic task: write a Tetris bot that plays the game and trains itself. Each model could read its own code, run benchmarks, and rewrite itself across 10 iterations. Then we compared the final bots head to head. Qwen 3.7-Max: training cost $1.32, bot improvement +56% Claude Opus 4.7: training cost $12.15, bot improvement +28% GPT-5.5: training cost $2.85, bot improvement +7% Qwen won on every dimension - biggest jump, 9× cheaper than Claude, 2× cheaper than GPT. Long agentic loops is where Qwen Max actually delivers.

Qwen 3.7-max beats Opus 4.7 and GPT-5.5 We tested three frontier models on a real agentic task: write a Tetris bot that plays the game and trains itself. Each model could read its own code, run benchmarks, and rewrite itself across 10 iterations. Then we compared the final bots head to head. Qwen 3.7-Max: training cost $1.32, bot improvement +56% Claude Opus 4.7: training cost $12.15, bot improvement +28% GPT-5.5: training cost $2.85, bot improvement +7% Qwen won on every dimension - biggest jump, 9× cheaper than Claude, 2× cheaper than GPT. Long agentic loops is where Qwen Max actually delivers.

853,975 просмотров

Liquid's LFM2.5-8B-A1B smashed OpenAI's gpt-oss-20b on tool calling We ran both locally on a MacBook Pro M5 Max, 64GB, and gave each the same trip-planning request that only completes if the model fires all 7 tool calls - weather for 3 cities, two currency conversions, an email and a reminder Outputs: LFM2.5-8B-A1B: 4.8 GB RAM usage, 7/7 tool-calls, 266 tok/s, 6.9s OpenAI gpt-oss-20b: 11 GB RAM usage, 3/7 tool-calls, 146 tok/s, 15.0s The 8B used less than half the RAM and still fired all 7 calls, while the 20B silently dropped more than half of its own. It also ran ~2x faster, wrapping the full agentic request in 6.9s against 15s. That's what 38T training tokens buy: a 1B-active MoE that nails the agentic tool calls a model 2.5x its active size keeps dropping

Liquid's LFM2.5-8B-A1B smashed OpenAI's gpt-oss-20b on tool calling We ran both locally on a MacBook Pro M5 Max, 64GB, and gave each the same trip-planning request that only completes if the model fires all 7 tool calls - weather for 3 cities, two currency conversions, an email and a reminder Outputs: LFM2.5-8B-A1B: 4.8 GB RAM usage, 7/7 tool-calls, 266 tok/s, 6.9s OpenAI gpt-oss-20b: 11 GB RAM usage, 3/7 tool-calls, 146 tok/s, 15.0s The 8B used less than half the RAM and still fired all 7 calls, while the 20B silently dropped more than half of its own. It also ran ~2x faster, wrapping the full agentic request in 6.9s against 15s. That's what 38T training tokens buy: a 1B-active MoE that nails the agentic tool calls a model 2.5x its active size keeps dropping

87,015 просмотров

MTP speedup Qwen by 2.5x in Atomic Chat Dense vs MoE models on 2x RTX 5090 Qwen3.6 27B: 51 → 117 tps +137% Qwen3.6 35B-A3B: 218 → 267 tps +25% MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on memory moved per pass. Dense 27B reads all 27B params per token, MoE 35B-A3B only reads 3B active. Dense had way more to save by batching. The baseline tps also differ (218 vs 51) for the same reason from the other side. Token generation is memory-bandwidth bound, and MoE moves ~8x less memory per token, so its baseline is already 4x ahead. ~80% draft acceptance. Zero accuracy loss. ~1 GB extra VRAM. Open-source code and local AI app – in the comments 👇

MTP speedup Qwen by 2.5x in Atomic Chat Dense vs MoE models on 2x RTX 5090 Qwen3.6 27B: 51 → 117 tps +137% Qwen3.6 35B-A3B: 218 → 267 tps +25% MTP drafts several tokens ahead and verifies them in one pass. The speedup depends on memory moved per pass. Dense 27B reads all 27B params per token, MoE 35B-A3B only reads 3B active. Dense had way more to save by batching. The baseline tps also differ (218 vs 51) for the same reason from the other side. Token generation is memory-bandwidth bound, and MoE moves ~8x less memory per token, so its baseline is already 4x ahead. ~80% draft acceptance. Zero accuracy loss. ~1 GB extra VRAM. Open-source code and local AI app – in the comments 👇

168,714 просмотров

Compared Qwen3.6 35B and 27B in the same conditions with Google TurboQuant Device: MacBook Pro M5Max 64GB RAM Outputs characteristics: Qwen3.6 35B: 6672 tokens, 2m 10s, 65 tok/s Qwen3.6 27B: 7344 tokens, 5m 22s, 24 tok/s Conclusion: Both models were asked to draw waves using HTML, 35B responded quickly but the result feels weak and messy, while 27B took more time and delivered a much cleaner and more consistent result, because it is built for thinking and planning, so it works better on tasks that need structure, overall 27B is a better choice for tasks where planning matters, while 35B is more suitable for everyday use when you just need a fast response

Compared Qwen3.6 35B and 27B in the same conditions with Google TurboQuant Device: MacBook Pro M5Max 64GB RAM Outputs characteristics: Qwen3.6 35B: 6672 tokens, 2m 10s, 65 tok/s Qwen3.6 27B: 7344 tokens, 5m 22s, 24 tok/s Conclusion: Both models were asked to draw waves using HTML, 35B responded quickly but the result feels weak and messy, while 27B took more time and delivered a much cleaner and more consistent result, because it is built for thinking and planning, so it works better on tasks that need structure, overall 27B is a better choice for tasks where planning matters, while 35B is more suitable for everyday use when you just need a fast response

53,251 просмотров

Videos

Больше нет контента для загрузки