Hello, Kimi K2.6.

stevibe

20,711 subscribers

52,761 views • 1 month ago • via X (Twitter)

Science & Technology

Related videos

Kimi-K2.6 running locally 110+ tok/s

0xSero

24,607 views • 16 days ago

One-shotted WebGL website by Kimi K2.6 😭

Crystal

43,839 views • 1 month ago

Kimi K2.6 is a beast for design. ❤️‍🔥 Just recorded a tutorial on how to use Kimi K2.6 to build award-winning $10k websites (step-by-step).

Viktor Oddy

105,155 views • 1 month ago

Kimi K2.6 is actually a beast for design. ❤️‍🔥 Just recorded a tutorial on how to use Kimi K2.6 + to build award-winning $10k websites (step-by-step).

Viktor Oddy

414,768 views • 1 month ago

Which LLMs actually love to think?
Tested 7 models on 5 math problems, measured reasoning length.
The think winners: both Qwen3.5 models (27B and 35B A3B), massive overthinkers, up to 10k+ tokens on a single question.
Plot twists:
> Kimi K2.6 feels verbose, actually one of the leanest
> Gemma4 26B A4B solved 2 with ZERO thinking

stevibe

97,403 views • 1 month ago

Curious about ollama kimi-k2.6:cloud speed?
3 test runs:
> 77.9 tok/s, TTFT 979ms
> 114.3 tok/s, TTFT 788ms
> 86.3 tok/s, TTFT 1117ms
For comparison, OpenRouter stats:
> Parasail: 14 tok/s
> Moonshot AI: 27 tok/s
> NovitaAI: 27 tok/s
> Cloudflare: 71 tok/s
Obvious caveat: cloud speeds fluctuate with load. Just sharing numbers for the curious.

stevibe

35,129 views • 1 month ago
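
The three ollama runs quoted in the post above are easier to compare with the OpenRouter figures once they are averaged. The following is only a rough sketch of that arithmetic: the numbers are copied from the post, while the names RUNS, OPENROUTER_TOK_S and summarize are made up for illustration and are not part of any tool the author used.

```python
from statistics import mean

# (tokens per second, time to first token in ms), as quoted in the post.
RUNS = [(77.9, 979), (114.3, 788), (86.3, 1117)]

# OpenRouter provider speeds in tok/s, as quoted in the post.
OPENROUTER_TOK_S = {
    "Parasail": 14,
    "Moonshot AI": 27,
    "NovitaAI": 27,
    "Cloudflare": 71,
}


def summarize(runs):
    """Return mean, min and max throughput plus mean TTFT for a list of runs."""
    speeds = [speed for speed, _ in runs]
    ttfts = [ttft for _, ttft in runs]
    return {
        "mean_tok_s": mean(speeds),
        "min_tok_s": min(speeds),
        "max_tok_s": max(speeds),
        "mean_ttft_ms": mean(ttfts),
    }


if __name__ == "__main__":
    summary = summarize(RUNS)
    print(f"mean {summary['mean_tok_s']:.1f} tok/s, "
          f"mean TTFT {summary['mean_ttft_ms']:.0f} ms")
    for provider, speed in OPENROUTER_TOK_S.items():
        # Ratio of the averaged ollama cloud speed to each provider's speed.
        print(f"{provider}: {summary['mean_tok_s'] / speed:.1f}x")
```

Three runs is a very small sample, so the mean mostly illustrates the spread the author already flags as load-dependent.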

Kimi Code is good at video reasoning with Kimi K2.6.🎬 Drag in reference videos, ask about colors, shots, or visual style, and Kimi Code can generate a ready-to-use .cube LUT file.🎞️

Kimi Developers

13,574 views • 10 days ago

lol...Kimi really cooked Claude Design
tested the same prompt on both Kimi K2.6 and Claude Design, and these are the outputs
not to mention Kimi is 7x cheaper and 100% open source...
see prompt in the comments👇

Farhan

323,403 views • 1 month ago

Six open-source LLMs. One sliding puzzle. A brutal test of long-horizon reasoning and tool calling.
Five of them broke. One didn't.
I gave each model a move_tile tool and a scrambled 3×3 board, then asked it to solve the puzzle through pure turn-by-turn reasoning. The deeper the scramble, the more brutal the search. Five runs per depth, best run kept. A model fails the round if it exceeds 6x the optimal move count.
> Depth 5: Everyone solves it. Yawn.
> Depth 10: GLM 5.1 melts down. 43 moves. Cut.
> Depth 12: Gemma4 26B loses the plot, shuffling tiles in circles. Gone.
> Depth 15: The wall. DeepSeek V4 Flash, out. DeepSeek V4 Pro, out. Gemma4, out again. GLM 5.1, out. Two survivors: Qwen3.6 35B-A3B, and Kimi K2.6 with an 11-move solve that looked like cheating.
> Depth 18: Same two. Everyone else hallucinating tiles that weren't there.
> Depth 22: Final boss. Kimi, flawless for five rounds, finally cracks. 81 moves. Still scrambled. DeepSeek V4 Pro limps home at 90. Qwen3.6 35B-A3B solves it in 36.
The smallest model in the room. ~3B active params. Fits on a single 3090. It beat everything.
Kimi was elegant. Qwen3.6 was unstoppable.

stevibe

16,286 views • 1 month ago
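
The sliding-puzzle setup described in the post above (a move_tile tool, a 3×3 board, and a fail threshold of 6x the optimal move count) could be sketched roughly as follows. This is not the author's harness. The goal layout, the tuple board encoding, the breadth-first search used to get the optimal count, and the helper names neighbors, optimal_moves and round_failed are all assumptions made for illustration; only the move_tile name and the 6x rule come from the post.

```python
from collections import deque

# Assumed goal layout; 0 marks the blank cell.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def neighbors(index):
    """Indices orthogonally adjacent to a cell on the 3x3 grid."""
    row, col = divmod(index, 3)
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [(row + dr) * 3 + (col + dc) for dr, dc in steps
            if 0 <= row + dr < 3 and 0 <= col + dc < 3]


def move_tile(board, tile):
    """Slide `tile` into the blank if it is adjacent, else raise ValueError."""
    blank, pos = board.index(0), board.index(tile)
    if tile == 0 or pos not in neighbors(blank):
        raise ValueError(f"tile {tile} cannot move")
    cells = list(board)
    cells[blank], cells[pos] = cells[pos], cells[blank]
    return tuple(cells)


def optimal_moves(board):
    """Shortest solution length via breadth-first search, or None if unsolvable."""
    seen, queue = {board}, deque([(board, 0)])
    while queue:
        state, depth = queue.popleft()
        if state == GOAL:
            return depth
        for pos in neighbors(state.index(0)):
            nxt = move_tile(state, state[pos])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def round_failed(moves_used, optimal, limit=6):
    """A round fails when the move count exceeds `limit` times the optimum."""
    return moves_used > limit * optimal


if __name__ == "__main__":
    scrambled = move_tile(move_tile(GOAL, 8), 7)  # two moves away from GOAL
    best = optimal_moves(scrambled)
    print(best, round_failed(13, best))
```

Note that a scramble of depth N can have an optimal solution shorter than N, so the threshold depends on the searched optimum rather than on the scramble depth itself.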

I designed a new test specifically for multimodal models: fill out a paper form.
And it's much harder than it sounds. This isn't typing into an electronic field that captures your text. The form is just an image. The model has to place each form element (text, checkmarks) at the correct pixel position on the canvas itself.
Results:
🟢 Kimi K2.6 → done in 3:45, 16.7k output tokens
🟡 Step 3.7 Flash → half the fields, 57k output tokens
🔴 Gemini 3.5 Flash → 489k output tokens, never finished. I had to kill it.
Gemini burned ~29x more output tokens than Kimi on the exact same task, and Kimi's was the only form that actually looked filled out.
The test, a mocked application form, contains some challenging parts, such as one-character-per-box fields. I provided every model the same set of tools:
> get canvas size
> drop probe markers to find coordinates
> add text
> add checkmarks
> move elements
> take a screenshot anytime to check their own work
> ... etc
So it's vision + spatial reasoning + tool use + long context, all at once. Small models (Qwen, Gemma) can't really complete this test, so I skipped them.
What happened:
> Kimi nailed name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code. Placement slightly loose, but content correct. 15 turns. Clean.
> Step got maybe half right: fields dropped, "United States" landed in the email line, data floating outside boxes. Burned 1.24M input tokens doing it (81 turns of re-reading the canvas).
> Gemini almost got there visually... then spiraled. By turn 40 it was issuing a delete_elements call wiping element IDs 365–425, basically erasing its own work. 31 minutes, 489k output tokens, still streaming. Terminated.
The takeaway isn't "Gemini bad." This test is indeed difficult. But token efficiency is capability now. A model that needs 30x the tokens and still can't converge is going to be 30x the cost in production.
Kimi K2.6 just quietly did the thing.

stevibe

25,304 views • 19 days ago
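
The "~29x" figure in the form-filling post above follows directly from the quoted output-token counts. A small sketch of that arithmetic is below. The token counts are taken from the post, while the function names and the per-million-token price parameter are hypothetical placeholders, not real pricing for any of these models.

```python
# Output-token counts as quoted in the post.
OUTPUT_TOKENS = {
    "Kimi K2.6": 16_700,
    "Step 3.7 Flash": 57_000,
    "Gemini 3.5 Flash": 489_000,  # run was terminated, so this is a lower bound
}


def ratios_to(baseline, counts=OUTPUT_TOKENS):
    """Express every model's output-token count as a multiple of the baseline."""
    base = counts[baseline]
    return {name: count / base for name, count in counts.items()}


def output_cost(tokens, usd_per_million):
    """Cost of `tokens` output tokens at a hypothetical per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    for name, ratio in ratios_to("Kimi K2.6").items():
        print(f"{name}: {ratio:.1f}x the baseline output tokens")
```

The token ratio only equals the cost ratio if both models are billed at the same per-token price, which is the simplification behind the post's "30x the cost" remark.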

Anthropic's in trouble, again.
The entire Claude experience is now available at 1/6th the price.
Kimi now does everything Claude does, powered by K2.6, a 1-trillion-parameter MoE model that activates only 32B parameters per token.
It covers all three features Claude has (Chat, Code, and Cowork):
1) Kimi Chat runs in four modes
- Instant for fast responses
- Thinking for deep reasoning
- Agent for multi-step execution
- and Agent Swarm for parallel workloads.
There's a 262K context window across all of them.
2) Kimi Code is the open-source CLI coding agent with K2.6 as the default backend. K2.6 ranked #1 on OpenRouter's programming leaderboard by weekly usage.
3) Kimi Agent is the Cowork equivalent. It generates:
- full websites with database and auth
- presentation decks (editable PPTX output)
- spreadsheets with formulas and charts
- word docs and structured research reports.
On top of this, Kimi K2.6 is also trained to decompose tasks into up to 300 parallel sub-agents. This helps it retain coherence even across 4,000+ tool calls in a single run, with sessions sustaining up to 13 hours.
On SWE-Bench Pro:
- Kimi K2.6 → 58.6
- GPT-5.4 xhigh → 57.7
- Gemini 3.1 Pro → 54.2
- Claude Opus 4.6 → 53.4
The Kimi K2.6 model is open weights and self-hostable on 4x H100s in INT4.
Find the link to the HuggingFace model page in the replies!

Avi Chawla

108,747 views • 1 month ago

Kimi K2.6 was released 1h ago, and it looks amazing! Here it's running with MLX (mlx-vlm) on two M3 Ultras (full 1T param VLM) 🔥

Pedro Cuenca

65,682 views • 1 month ago

i made a pendulum simulator
16 pendulums oscillate through wave patterns making music along the way
built with kimi k2.6, threejs, tonejs
you are getting very sleepy ..

AA

65,813 views • 1 month ago

Introducing Kimi 2.6 Code. A Claude Code-like terminal experience built specifically for Kimi K2.6, effectively making it one of the most powerful open-source coding agents on the planet. Simply bring your API key and use /login. Repo here 👇

Pietro Schirano

135,373 views • 1 month ago

Everyone says k2.6 is unusable. I had it build a pokemon style battle app. 1 prompt. Kimi cost 8 to 10 times less on tokens than opus 4.7. Which is better?

Max Blade

127,292 views • 1 month ago

Kimi K2.6 is godly in terms of webdev and still SOTA Chinese model 😼
> best in deep SWE bench even after launches of multiple models from different Chinese labs
Look at the watery feel
one shotted it < using custom harness - 4 kimi in parallel just like grok >

Chetaslua

82,515 views • 11 days ago

Some came and saved lives
Some only made speeches about it
Some laid stone upon stone
Some turned the stones into graves
Some looked and moved on
Some stayed and took it to heart
In short, it was a test of humanity #6subat

Taha Hüseyin Karagöz

17,325 views • 4 months ago

Pi agent is the Arch Linux of coding agents
Qwen 3.6 Plus not caching?
> No problem. Just ask Pi to patch itself from a community fork
Kimi K2.6 imploding mid reasoning?
> No problem. Just pull a snippet from a good Samaritan and let Pi test it on itself
Now, I can use the reliable Qwen 3.6 Plus without busting my OpenCode Go sub, and the capable Kimi K2.6 without 404s
The beauty of Pi is it can customize itself

raymel 👋

46,916 views • 1 month ago

Some people tell lies, some people are liars, some people are lie machines, some people are lie factories, and some people are 👉🏿 WALKING LIARS..!

Cengiz ALÇAYIR

22,007 views • 2 years ago

I tested a $4K weekend idea with Kimi K2.6. Found 30 local med spas with weak websites. Generated 30 custom landing pages in ~20 minutes. Cold emailed all 30 at $500. Here’s exactly what happened:

Mujeeb Ahmed

51,286 views • 1 month ago