Video wird geladen...

Video konnte nicht geladen werden

Beim Laden dieses Videos ist ein Problem aufgetreten. Dies könnte an einem vorübergehenden Netzwerkproblem liegen oder das Video ist möglicherweise nicht verfügbar.

A 3B model just cleared a puzzle that a 1.6 TRILLION param model couldn't. You've seen this benchmark before: my sliding-puzzle test. Same Kimi & DeepSeek runs as last time. The only new thing: I dropped VibeThinker-3B in for a side-by-side. > VibeThinker → 3B > DeepSeek V4 Flash... → 284B > Kimi K2.6 → 1T > DeepSeek V4 Pro → 1.6T Shuffle depths 5, 10, 12, 15, 18, 22. One wrong move scrambles the whole board, so it's pure long-chain reasoning. ✅ VibeThinker-3B: solved all six. Never lost the thread. ⚠️ The giants started cracking at depth 15: Flash, Pro, and even Kimi each blew a run, scrambling the board past the move cap. As VibeThinker was not trained for tool calling, I had it emit X and ran the move. Bigger generalist ≠ smarter.show more

stevibe

26,999 subscribers

30,069 Aufrufe • vor 6 Tagen •via X (Twitter)

Nachrichten & Politik Wissenschaft & Technologie

Anya Rossi• Live Now

Private livecam show

0 Kommentare

Keine Kommentare verfügbar

Kommentare vom Original-Post werden hier angezeigt

Ähnliche Videos

Six open-source LLMs. One sliding puzzle. A brutal test of long-horizon reasoning and tool calling. Five of them broke. One didn't. I gave each model a move_tile tool and a scrambled 3×3 board, then asked it to solve the puzzle through pure turn-by-turn reasoning. The deeper the scramble, the more brutal the search. Five runs per depth, best run kept. A model fails the round if it exceeds 6x the optimal move count. > Depth 5: Everyone solves it. Yawn. > Depth 10: GLM 5.1 melts down. 43 moves. Cut. > Depth 12: Gemma4 26B loses the plot, shuffling tiles in circles. Gone. > Depth 15: The wall. DeepSeek V4 Flash, out. DeepSeek V4 Pro, out. Gemma4, out again. GLM 5.1, out. Two survivors: Qwen3.6 35B-A3B, and Kimi K2.6 with an 11-move solve that looked like cheating. > Depth 18: Same two. Everyone else hallucinating tiles that weren't there. > Depth 22: Final boss. Kimi, flawless for five rounds, finally cracks. 81 moves. Still scrambled. DeepSeek V4 Pro limps home at 90. Qwen3.6 35B-A3B solves it in 36. The smallest model in the room. ~3B active params. Fits on a single 3090. It beat everything. Kimi was elegant. Qwen3.6 was unstoppable.

Six open-source LLMs. One sliding puzzle. A brutal test of long-horizon reasoning and tool calling. Five of them broke. One didn't. I gave each model a move_tile tool and a scrambled 3×3 board, then asked it to solve the puzzle through pure turn-by-turn reasoning. The deeper the scramble, the more brutal the search. Five runs per depth, best run kept. A model fails the round if it exceeds 6x the optimal move count. > Depth 5: Everyone solves it. Yawn. > Depth 10: GLM 5.1 melts down. 43 moves. Cut. > Depth 12: Gemma4 26B loses the plot, shuffling tiles in circles. Gone. > Depth 15: The wall. DeepSeek V4 Flash, out. DeepSeek V4 Pro, out. Gemma4, out again. GLM 5.1, out. Two survivors: Qwen3.6 35B-A3B, and Kimi K2.6 with an 11-move solve that looked like cheating. > Depth 18: Same two. Everyone else hallucinating tiles that weren't there. > Depth 22: Final boss. Kimi, flawless for five rounds, finally cracks. 81 moves. Still scrambled. DeepSeek V4 Pro limps home at 90. Qwen3.6 35B-A3B solves it in 36. The smallest model in the room. ~3B active params. Fits on a single 3090. It beat everything. Kimi was elegant. Qwen3.6 was unstoppable.

stevibe

16,286 Aufrufe • vor 1 Monat

DeepSeek V4 Flash & V4 Pro Official API Speed Test V4 Flash: 80.63 tok/s V4 Pro: 36.72 tok/s

DeepSeek V4 Flash & V4 Pro Official API Speed Test V4 Flash: 80.63 tok/s V4 Pro: 36.72 tok/s

stevibe

17,555 Aufrufe • vor 2 Monaten

DeepSeek V4 Flash vs V4 Pro, 4 canvas tests run. I'm a dev, and honestly DeepSeek was never on my list for coding tasks. After seeing these results, I might need to reconsider. 🌳 Tree: Flash actually beat Pro: denser canopy, better leaf distribution 🌌 Night sky: Pro wins, shooting stars, constellations, Milky Way band all there 🐟 Fish boids: Pro is stunning: proper caustics, tight schooling, real underwater feel 🏠 House builder: first model we've tested that knows how to actually build a rooftop DeepSeek coding might be underrated.

DeepSeek V4 Flash vs V4 Pro, 4 canvas tests run. I'm a dev, and honestly DeepSeek was never on my list for coding tasks. After seeing these results, I might need to reconsider. 🌳 Tree: Flash actually beat Pro: denser canopy, better leaf distribution 🌌 Night sky: Pro wins, shooting stars, constellations, Milky Way band all there 🐟 Fish boids: Pro is stunning: proper caustics, tight schooling, real underwater feel 🏠 House builder: first model we've tested that knows how to actually build a rooftop DeepSeek coding might be underrated.

stevibe

56,222 Aufrufe • vor 2 Monaten

FREE DeepSeek V4 Pro and V4 Flash API. Runtime by Bad Theory Labs just dropped it. Same platform, new free model, same 2-minute setup. > Runtime already gave you free Claude Opus 4.8 and GPT 5.5 access. Now they added DeepSeek V4 Pro and V4 Flash to the free tier. Absolutely FREE, YES. Same exact flow as before: → Go to runtime(.)badtheorylabs(.)com → Sign up with Google → Fill in your details during onboarding → Free balance lands automatically(But you don't need it to use DeepSeek - it's completely free right now) → Grab your API key from the dashboard → Plug into OpenCode, OpenClaw, or Hermes - done DeepSeek V4 Pro is one of the strongest coding and reasoning models available right now. Free tier also includes btl-2 - auto-routes between GLM, OpenAI, Anthropic, and Gemma. 10 million tokens per month at zero cost. 344 models total. One OpenAI-compatible endpoint. No code rebuild, no card, no limits hunting. >> Just start using the API, and when choosing a provider, select DeepSeek I tested it myself - fully working. Bookmark this before the free tier fills up.

FREE DeepSeek V4 Pro and V4 Flash API. Runtime by Bad Theory Labs just dropped it. Same platform, new free model, same 2-minute setup. > Runtime already gave you free Claude Opus 4.8 and GPT 5.5 access. Now they added DeepSeek V4 Pro and V4 Flash to the free tier. Absolutely FREE, YES. Same exact flow as before: → Go to runtime(.)badtheorylabs(.)com → Sign up with Google → Fill in your details during onboarding → Free balance lands automatically(But you don't need it to use DeepSeek - it's completely free right now) → Grab your API key from the dashboard → Plug into OpenCode, OpenClaw, or Hermes - done DeepSeek V4 Pro is one of the strongest coding and reasoning models available right now. Free tier also includes btl-2 - auto-routes between GLM, OpenAI, Anthropic, and Gemma. 10 million tokens per month at zero cost. 344 models total. One OpenAI-compatible endpoint. No code rebuild, no card, no limits hunting. >> Just start using the API, and when choosing a provider, select DeepSeek I tested it myself - fully working. Bookmark this before the free tier fills up.

Atenov int.

26,895 Aufrufe • vor 8 Tagen

Deepseek-V4-Flash helping me setup Nvidia's Dynamo for disaggregated inference. I have really gotten this model to be a daily driver now. It's really strong at agentic workflows and a decent programmer. For all my side stuff, it's local deepseek now Claude sub cancelled wdyt

Deepseek-V4-Flash helping me setup Nvidia's Dynamo for disaggregated inference. I have really gotten this model to be a daily driver now. It's really strong at agentic workflows and a decent programmer. For all my side stuff, it's local deepseek now Claude sub cancelled wdyt

0xSero

24,469 Aufrufe • vor 1 Monat

I designed a new test specifically for multimodal models: fill out a paper form. And it's much harder than it sounds. This isn't typing into an electronic field that captures your text. The form is just an image. The model has to place each form element: text, checkmarks — at the correct pixel position on the canvas itself. Results: 🟢 Kimi K2.6 → done in 3:45, 16.7k output tokens 🟡 Step 3.7 Flash → half the fields, 57k output tokens 🔴 Gemini 3.5 Flash → 489k output tokens, never finished. I had to kill it. Gemini burned ~29x more output tokens than Kimi on the exact same task, and Kimi's was the only form that actually looked filled out. The test, a mocked application form, contains some challenging parts, such as one-character-per-box fields. I provided every model the same set of tools: > get canvas size > drop probe markers to find coordinates > add text > add checkmarks > move elements > take a screenshot anytime to check their own work > ... etc So it's vision + spatial reasoning + tool use + long context, all at once. Small models (Qwen, Gemma) can't really complete this test, so I skipped them. What happened: > Kimi nailed name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code — placement slightly loose, but content correct. 15 turns. Clean. > Step got maybe half right — fields dropped, "United States" landed in the email line, data floating outside boxes. Burned 1.24M input tokens doing it (81 turns of re-reading the canvas). > Gemini almost got there visually... then spiraled. By turn 40 it was issuing a delete_elements call wiping element IDs 365–425, basically erasing its own work. 31 minutes, 489k output tokens, still streaming. Terminated. The takeaway isn't "Gemini bad." This test is indeed difficult. But token efficiency is capability now. A model that needs 30x the tokens and still can't converge is going to be 30x the cost in production. Kimi K2.6 just quietly did the thing.

I designed a new test specifically for multimodal models: fill out a paper form. And it's much harder than it sounds. This isn't typing into an electronic field that captures your text. The form is just an image. The model has to place each form element: text, checkmarks — at the correct pixel position on the canvas itself. Results: 🟢 Kimi K2.6 → done in 3:45, 16.7k output tokens 🟡 Step 3.7 Flash → half the fields, 57k output tokens 🔴 Gemini 3.5 Flash → 489k output tokens, never finished. I had to kill it. Gemini burned ~29x more output tokens than Kimi on the exact same task, and Kimi's was the only form that actually looked filled out. The test, a mocked application form, contains some challenging parts, such as one-character-per-box fields. I provided every model the same set of tools: > get canvas size > drop probe markers to find coordinates > add text > add checkmarks > move elements > take a screenshot anytime to check their own work > ... etc So it's vision + spatial reasoning + tool use + long context, all at once. Small models (Qwen, Gemma) can't really complete this test, so I skipped them. What happened: > Kimi nailed name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code — placement slightly loose, but content correct. 15 turns. Clean. > Step got maybe half right — fields dropped, "United States" landed in the email line, data floating outside boxes. Burned 1.24M input tokens doing it (81 turns of re-reading the canvas). > Gemini almost got there visually... then spiraled. By turn 40 it was issuing a delete_elements call wiping element IDs 365–425, basically erasing its own work. 31 minutes, 489k output tokens, still streaming. Terminated. The takeaway isn't "Gemini bad." This test is indeed difficult. But token efficiency is capability now. A model that needs 30x the tokens and still can't converge is going to be 30x the cost in production. Kimi K2.6 just quietly did the thing.

stevibe

25,304 Aufrufe • vor 1 Monat

NVIDIA is giving away free access to 130+ AI models for a full year > most people building AI agents are paying $50-200/month for API access NVIDIA just made that argument irrelevant models you get: MiniMax M2.7, GLM 5.1, Kimi 2.5, DeepSeek-v4-flash, GPT-OSS-120B and 110+ more setup: > step 1 - get your free key > go to > register -> bind phone -> copy API key > step 2 - add to Hermes agent > open Settings -> Model Provider -> Custom base_url = " api_key = "nvapi-xxxxxxxxxxxxxxxxxxxx" > step 3 - pick a model model = "minimaxai/minimax-m2.7" model = "zhipuai/glm-5.1" model = "moonshot-ai/kimi-2.5" model = "deepseek-ai/deepseek-v4-flash" model = "nvidia/nemotron-3-ultra-550b-a55b" > Hermes already has NVIDIA set as default base_url > paste the key and you're running instantly > works the same in Cursor and OpenCode > cost: $0 > limit: 40 req/min > expires: 1 year while everyone is paying for API access, this is sitting there for free

NVIDIA is giving away free access to 130+ AI models for a full year > most people building AI agents are paying $50-200/month for API access NVIDIA just made that argument irrelevant models you get: MiniMax M2.7, GLM 5.1, Kimi 2.5, DeepSeek-v4-flash, GPT-OSS-120B and 110+ more setup: > step 1 - get your free key > go to > register -> bind phone -> copy API key > step 2 - add to Hermes agent > open Settings -> Model Provider -> Custom base_url = " api_key = "nvapi-xxxxxxxxxxxxxxxxxxxx" > step 3 - pick a model model = "minimaxai/minimax-m2.7" model = "zhipuai/glm-5.1" model = "moonshot-ai/kimi-2.5" model = "deepseek-ai/deepseek-v4-flash" model = "nvidia/nemotron-3-ultra-550b-a55b" > Hermes already has NVIDIA set as default base_url > paste the key and you're running instantly > works the same in Cursor and OpenCode > cost: $0 > limit: 40 req/min > expires: 1 year while everyone is paying for API access, this is sitting there for free

Mr. Buzzoni

192,952 Aufrufe • vor 20 Tagen

Anthropic's in trouble, again. The entire Claude experience is now available at 1/6th the price. Kimi now does everything Claude does, powered by K2.6, a 1-trillion-parameter MoE model that activates only 32B parameters per token. It covers all three features Claude has (Chat, Code, and Cowork): 1) Kimi Chat runs in four modes - Instant for fast responses - Thinking for deep reasoning - Agent for multi-step execution - and Agent Swarm for parallel workloads. There's a 262K context window across all of them. 2) Kimi Code is the open-source CLI coding agent with K2.6 as the default backend. K2.6 ranked #1 on OpenRouter's programming leaderboard by weekly usage. 3) Kimi Agent is the Cowork equivalent. It generates: - full websites with database and auth - presentation decks (editable PPTX output) - spreadsheets with formulas and charts - word docs and structured research reports. On top of this, Kimi K2.6 is also trained to decompose tasks into up to 300 parallel sub-agents. This helps it retain coherence even across 4,000+ tool calls in a single run, with sessions sustaining up to 13 hours. On SWE-Bench Pro: - Kimi K2.6 → 58.6 - GPT-5.4 xhigh → 57.7 - Gemini 3.1 Pro → 54.2 - Claude Opus 4.6 → 53.4 Kimi K2.6 model is open weights and self-hostable on 4x H100s in INT4. Find the link to the HuggingFace model page in the replies!

Anthropic's in trouble, again. The entire Claude experience is now available at 1/6th the price. Kimi now does everything Claude does, powered by K2.6, a 1-trillion-parameter MoE model that activates only 32B parameters per token. It covers all three features Claude has (Chat, Code, and Cowork): 1) Kimi Chat runs in four modes - Instant for fast responses - Thinking for deep reasoning - Agent for multi-step execution - and Agent Swarm for parallel workloads. There's a 262K context window across all of them. 2) Kimi Code is the open-source CLI coding agent with K2.6 as the default backend. K2.6 ranked #1 on OpenRouter's programming leaderboard by weekly usage. 3) Kimi Agent is the Cowork equivalent. It generates: - full websites with database and auth - presentation decks (editable PPTX output) - spreadsheets with formulas and charts - word docs and structured research reports. On top of this, Kimi K2.6 is also trained to decompose tasks into up to 300 parallel sub-agents. This helps it retain coherence even across 4,000+ tool calls in a single run, with sessions sustaining up to 13 hours. On SWE-Bench Pro: - Kimi K2.6 → 58.6 - GPT-5.4 xhigh → 57.7 - Gemini 3.1 Pro → 54.2 - Claude Opus 4.6 → 53.4 Kimi K2.6 model is open weights and self-hostable on 4x H100s in INT4. Find the link to the HuggingFace model page in the replies!

Avi Chawla

109,027 Aufrufe • vor 1 Monat

I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mindblown by how well it works out of the box. A few notes: I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on Fireworks AI inference. This is the first time I feel like there is an open-weight model that can reason at the level of Claude and Codex. And it does this in a cost-effective way with support for 1M context length. To be clear, I am using DeepSeek-V4-Pro inside of Pi without any special configuration. It works out of the box. It's exciting that there is a model that can just be plugged into a basic harness like Pi, and it just works. I've never seen that before. Most models require lots of configuration and setup. DeepSeek's DeepSeek-V4-Pro is clearly good at agentic coding (probably the best from the open-weight models), but the model is also great on knowledge-intensive tasks where reasoning matters. The agent pulled agentic engineering best practices from different company docs (Anthropic, OpenAI, Google, Stripe, Meta, Modal, DeepSeek, Mistral, Cohere), searched and digested Reddit and HN threads, summarized arxiv papers, and surfaced trending GitHub repos. Then it distilled everything into actionable tips across categories. I love the Wiki it built. The quality is really good. Here is a snapshot of what the wiki looks like: DeepSeek-V4-Pro handled the task without breaking stride. Multi-step research queries, code generation for scaffolding, context-heavy reasoning across disparate sources. For coding specifically, this is the first open-weight model that genuinely feels like a Codex or Claude Code experience. It compares in capability and actual multi-turn agentic work. What made the loop feel so responsive was Fireworks' inference speed (the fastest in the market) and the fact that they actually validate models at the systems level before shipping. No corrupted reasoning traces. Just fast, reliable iteration. The hybrid CSA and HCA attention design cuts KV cache to just 10% and inference FLOPs by nearly 4x at 1M-token context. This is what makes the agent loop actually fast and cheap enough to run in practice. For devs who've been watching open-weight models close the gap but haven't found one that actually delivers in practice, this is the closest I've seen. Try it here:

I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mindblown by how well it works out of the box. A few notes: I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on Fireworks AI inference. This is the first time I feel like there is an open-weight model that can reason at the level of Claude and Codex. And it does this in a cost-effective way with support for 1M context length. To be clear, I am using DeepSeek-V4-Pro inside of Pi without any special configuration. It works out of the box. It's exciting that there is a model that can just be plugged into a basic harness like Pi, and it just works. I've never seen that before. Most models require lots of configuration and setup. DeepSeek's DeepSeek-V4-Pro is clearly good at agentic coding (probably the best from the open-weight models), but the model is also great on knowledge-intensive tasks where reasoning matters. The agent pulled agentic engineering best practices from different company docs (Anthropic, OpenAI, Google, Stripe, Meta, Modal, DeepSeek, Mistral, Cohere), searched and digested Reddit and HN threads, summarized arxiv papers, and surfaced trending GitHub repos. Then it distilled everything into actionable tips across categories. I love the Wiki it built. The quality is really good. Here is a snapshot of what the wiki looks like: DeepSeek-V4-Pro handled the task without breaking stride. Multi-step research queries, code generation for scaffolding, context-heavy reasoning across disparate sources. For coding specifically, this is the first open-weight model that genuinely feels like a Codex or Claude Code experience. It compares in capability and actual multi-turn agentic work. What made the loop feel so responsive was Fireworks' inference speed (the fastest in the market) and the fact that they actually validate models at the systems level before shipping. No corrupted reasoning traces. Just fast, reliable iteration. The hybrid CSA and HCA attention design cuts KV cache to just 10% and inference FLOPs by nearly 4x at 1M-token context. This is what makes the agent loop actually fast and cheap enough to run in practice. For devs who've been watching open-weight models close the gap but haven't found one that actually delivers in practice, this is the closest I've seen. Try it here:

elvis

59,057 Aufrufe • vor 2 Monaten

Cerebras inference is very fast. So fast that it changes how we think about configuring our LLMs for voice agent use cases. Kimi K2.6 is a 1T parameter reasoning model that Cerebras serves at 650 - 1,000 tokens per second (end-to-end throughput), with time to first token metrics as low as 150ms (latency). These numbers are two to three times faster than other similarly capable models. The biggest lever we get from this kind of speed is that we can use the model in reasoning mode, and still have excellent "time to first non-thinking token." This solves a big pain point we have in 2026 for voice agent use cases. Almost all recent innovation in post-training has focused on making models good at reasoning ("test time compute"). This is great, but it makes the user-facing model latency much, much slower. Which is a problem for conversational voice agents. We can run Kimi K2.6 with reasoning turned on, and get responses faster than other models produce with reasoning disabled. On my 30-turn voice agent benchmark, Kimi K2.6 with reasoning enabled ties GPT 5.1 and Haiku 4.5 with reasoning disabled, and is still about 200ms seconds faster! On my primary task agent benchmark, Kimi K2.6 is now the #2 model. It ranks just behind Gemini 3.5 Flash in "high" reasoning mode, and tied with GLM 5, Sonnet 4.6, and GPT 5.4 with reasoning set to "low." But Kimi K2.6 completes each turn in the agent loop in under 500ms. The other four models are all at least 3x slower. (Models only qualify for this benchmark if they can complete task turns at a P50 <4s.) A couple of other things that this speed buys us, for production voice agents: - Tool calls happen fast enough that we don't have to work around tool call latency in our pipeline design. - We can prompt the model to output structured data at the beginning of a response, followed by plain text for voice generation. This opens up possibilities like asking the model to do complex classification/generation tasks that influence the rest of the pipeline. For example, the model could create a detailed style prompt for a steerable TTS model, for each individual conversation turn. And, of course, you can use Kimi K2.6 with reasoning turned off. Cerebras calls this "instant" mode. Here's a video of a Cerebras Kimi K2.6 voice agent with voice-to-voice response time, measured at the client, under 500ms. This is the true response latency as perceived by the user, including all network and audio codec overhead, transcription and turn detection, Kimi K2.6 token generation, and voice generation. 500ms is, effectively, instant. So the Cerebras naming for this mode is a propos. :-)

Cerebras inference is very fast. So fast that it changes how we think about configuring our LLMs for voice agent use cases. Kimi K2.6 is a 1T parameter reasoning model that Cerebras serves at 650 - 1,000 tokens per second (end-to-end throughput), with time to first token metrics as low as 150ms (latency). These numbers are two to three times faster than other similarly capable models. The biggest lever we get from this kind of speed is that we can use the model in reasoning mode, and still have excellent "time to first non-thinking token." This solves a big pain point we have in 2026 for voice agent use cases. Almost all recent innovation in post-training has focused on making models good at reasoning ("test time compute"). This is great, but it makes the user-facing model latency much, much slower. Which is a problem for conversational voice agents. We can run Kimi K2.6 with reasoning turned on, and get responses faster than other models produce with reasoning disabled. On my 30-turn voice agent benchmark, Kimi K2.6 with reasoning enabled ties GPT 5.1 and Haiku 4.5 with reasoning disabled, and is still about 200ms seconds faster! On my primary task agent benchmark, Kimi K2.6 is now the #2 model. It ranks just behind Gemini 3.5 Flash in "high" reasoning mode, and tied with GLM 5, Sonnet 4.6, and GPT 5.4 with reasoning set to "low." But Kimi K2.6 completes each turn in the agent loop in under 500ms. The other four models are all at least 3x slower. (Models only qualify for this benchmark if they can complete task turns at a P50 <4s.) A couple of other things that this speed buys us, for production voice agents: - Tool calls happen fast enough that we don't have to work around tool call latency in our pipeline design. - We can prompt the model to output structured data at the beginning of a response, followed by plain text for voice generation. This opens up possibilities like asking the model to do complex classification/generation tasks that influence the rest of the pipeline. For example, the model could create a detailed style prompt for a steerable TTS model, for each individual conversation turn. And, of course, you can use Kimi K2.6 with reasoning turned off. Cerebras calls this "instant" mode. Here's a video of a Cerebras Kimi K2.6 voice agent with voice-to-voice response time, measured at the client, under 500ms. This is the true response latency as perceived by the user, including all network and audio codec overhead, transcription and turn detection, Kimi K2.6 token generation, and voice generation. 500ms is, effectively, instant. So the Cerebras naming for this mode is a propos. :-)

kwindla

40,319 Aufrufe • vor 1 Monat

Just tested the Kimi-VL 3B model on hugging face and it's surprisingly powerful for its size - Outperforms larger models like GPT-4o on key benchmarks - Open source - Strong reasoning capabilities too .

Just tested the Kimi-VL 3B model on hugging face and it's surprisingly powerful for its size - Outperforms larger models like GPT-4o on key benchmarks - Open source - Strong reasoning capabilities too .

AshutoshShrivastava

12,389 Aufrufe • vor 1 Jahr

i built a full game on a single GPU with a 3B model and this is the worst local AI will ever be. this was supposed to be a benchmark test. load the model, measure tokens per second, write it up, move on. instead i spent 20 minutes playing Octopus Invaders because the game is genuinely fun and i couldn't stop. a model with 3B active parameters built this from a single prompt. it debugged its own collision system when bullets were phasing through enemies. read the error, found the fix, kept building. this is not a frontier API. this is a quantized open source model running on hardware you can buy used for $800-$1200. no cloud. no subscription. no API costs. just a mass produced consumer GPU doing things that would have been absurd 12 months ago. and here's the part that should keep you up at night: every month the models get smaller and smarter. the quants get tighter. the context windows get longer. the tooling gets cleaner. what 3B active parameters does today on 24gb, a 1B model will do on 8gb within a year. you are looking at the floor. not the ceiling.

i built a full game on a single GPU with a 3B model and this is the worst local AI will ever be. this was supposed to be a benchmark test. load the model, measure tokens per second, write it up, move on. instead i spent 20 minutes playing Octopus Invaders because the game is genuinely fun and i couldn't stop. a model with 3B active parameters built this from a single prompt. it debugged its own collision system when bullets were phasing through enemies. read the error, found the fix, kept building. this is not a frontier API. this is a quantized open source model running on hardware you can buy used for $800-$1200. no cloud. no subscription. no API costs. just a mass produced consumer GPU doing things that would have been absurd 12 months ago. and here's the part that should keep you up at night: every month the models get smaller and smarter. the quants get tighter. the context windows get longer. the tooling gets cleaner. what 3B active parameters does today on 24gb, a 1B model will do on 8gb within a year. you are looking at the floor. not the ceiling.

Sudo su

36,251 Aufrufe • vor 4 Monaten

We added 4 major open weight models in April. These powerful models are available for encrypted inference in the Maple apps, web, and API: - DeepSeek DeepSeek V4 - Kimi.ai Kimi K2.6 - Z.ai GLM 5.1 - Google Gemma Gemma 4 Power and privacy in one spot.

We added 4 major open weight models in April. These powerful models are available for encrypted inference in the Maple apps, web, and API: - DeepSeek DeepSeek V4 - Kimi.ai Kimi K2.6 - Z.ai GLM 5.1 - Google Gemma Gemma 4 Power and privacy in one spot.

Maple

11,630 Aufrufe • vor 2 Monaten

DeepSeek V4 Flash running on the DGX Station.

DeepSeek V4 Flash running on the DGX Station.

Matthew Berman

70,555 Aufrufe • vor 1 Monat

DeepSeek V4 Pro vs Xiaomi MIMO V2.5 Pro MIMO V2.5-Pro wins by: 1. ~1 min faster than DeepSeek-V4-Pro 2. cleaner aesthetic Our reasoning: MIMO V2.5 Pro is natively multimodal, so it has a stronger sense of what good visuals look like. Curious to see how DeepSeek V4 Vision will change that

DeepSeek V4 Pro vs Xiaomi MIMO V2.5 Pro MIMO V2.5-Pro wins by: 1. ~1 min faster than DeepSeek-V4-Pro 2. cleaner aesthetic Our reasoning: MIMO V2.5 Pro is natively multimodal, so it has a stronger sense of what good visuals look like. Curious to see how DeepSeek V4 Vision will change that

GMI Cloud

40,379 Aufrufe • vor 2 Monaten

Here's DeepSeek v4 Pro. Added to the playable gallery as well.

Here's DeepSeek v4 Pro. Added to the playable gallery as well.

Ethan Mollick

38,946 Aufrufe • vor 2 Monaten

Stop paying for 5 different AI subscriptions. ChatLLM by Abacus AI puts GPT-5.5, Claude 4.7, Gemini 3.1, DeepSeek-V4, Kimi-2.6 in one place. One prompt → routed to the best model → output. Pay once. Use all of them.

Stop paying for 5 different AI subscriptions. ChatLLM by Abacus AI puts GPT-5.5, Claude 4.7, Gemini 3.1, DeepSeek-V4, Kimi-2.6 in one place. One prompt → routed to the best model → output. Pay once. Use all of them.

Abacus.AI

26,747,516 Aufrufe • vor 1 Monat

Pi Agent vs OpenCode token usage A lot of people recommended Pi Agent so I decided to check Pi Agent took 1.1k tokens in first turn OpenCode took 11.5k Setup: 1) Trimmed OpenCode (from usual 30k first turn to 11.5k) - 0 MCPs - 2 lightweight plugins (opencode-env-protect and openslimedit) - 8k char AGENTS.md - 11585 deepseek-v4-flash input tokens - for $0.0016 2) Vanilla Pi - 0 MCPs - 0 system prompt - 8k char AGENTS.md - 1114 kimi-k2.6 input tokens - for $0.0008 I think using better models with capped tokens per turn can keep usage nearly the same as uncontrolled DeepSeek V4 Flash? The challenge now is finding the sweet spot. Can't cap tokens if quality drops. We'll see. Video below - OpenCode vs Pi side by side "say hi back" first turns test, with OpenCode Go usage for each

Pi Agent vs OpenCode token usage A lot of people recommended Pi Agent so I decided to check Pi Agent took 1.1k tokens in first turn OpenCode took 11.5k Setup: 1) Trimmed OpenCode (from usual 30k first turn to 11.5k) - 0 MCPs - 2 lightweight plugins (opencode-env-protect and openslimedit) - 8k char AGENTS.md - 11585 deepseek-v4-flash input tokens - for $0.0016 2) Vanilla Pi - 0 MCPs - 0 system prompt - 8k char AGENTS.md - 1114 kimi-k2.6 input tokens - for $0.0008 I think using better models with capped tokens per turn can keep usage nearly the same as uncontrolled DeepSeek V4 Flash? The challenge now is finding the sweet spot. Can't cap tokens if quality drops. We'll see. Video below - OpenCode vs Pi side by side "say hi back" first turns test, with OpenCode Go usage for each

raymel 👋

200,631 Aufrufe • vor 1 Monat

This is from Apple's State of the Union The local model is a 3B parameter SLM that uses adapters trained for each specific feature. Diffusion model does the same thing, adapter for each style. Anything running locally or Apple's Secure Cloud is an Apple model, not OpenAI.

This is from Apple's State of the Union The local model is a 3B parameter SLM that uses adapters trained for each specific feature. Diffusion model does the same thing, adapter for each style. Anything running locally or Apple's Secure Cloud is an Apple model, not OpenAI.

Max Weinbach

2,648,083 Aufrufe • vor 2 Jahren

Testing DeepSeek R1 in VSCode with CodeGPT Step-by-step guide to connect: ✨ Select "LLMs Cloud model" ✨ Choose DeepSeek as the provider ✨ Pick the "deepseek-reasoner" model ✨ Select code and/or files from your project and send them to the model That’s it! You’re all set to use this amazing DeepSeek model... 🚀 P.S.: Update to the latest version of CodeGPT to start using it!

Testing DeepSeek R1 in VSCode with CodeGPT Step-by-step guide to connect: ✨ Select "LLMs Cloud model" ✨ Choose DeepSeek as the provider ✨ Pick the "deepseek-reasoner" model ✨ Select code and/or files from your project and send them to the model That’s it! You’re all set to use this amazing DeepSeek model... 🚀 P.S.: Update to the latest version of CodeGPT to start using it!

Daniel San

34,346 Aufrufe • vor 1 Jahr