A GERMAN DEVELOPER REPLACED HIS ENTIRE DEV TEAM WITH KIMI K2.6, VISUALIZED EVERYTHING IN OBSIDIAN AND NOW MAKES $80,000/MONTH SOLO

1 trillion parameters, 32 billion activated per token and a SWE-Bench score of 65.8 - Kimi K2.6 reads the entire client codebase, understands the architecture, writes production code and...

845,855 views • 21 days ago • via X (Twitter)

Related Videos

KIMI K2.6 JUST CRUSHED GPT-5 AND A SINGLE PERSON CAN NOW POTENTIALLY BUILD AN $80K/MONTH BUSINESS WITH 300 AI AGENTS AND JUST $500 IN OVERHEAD

The attached video is proof that almost everyone missed.

Kimi K2 Thinking didn't just score 44.9% on Humanity's Last Exam, it outperformed GPT-5 (41.7%), Claude, and every other major model across multiple benchmarks.

It's open source. Over a trillion parameters, trained for just $4.6M. It runs locally on a Mac Studio, and in the demo it turns a 100-page PDF into a fully designed PowerPoint presentation in under two minutes while other models are still thinking.

In the article below, the author lays out a clear blueprint for turning this into a real business:

> 300 parallel sub-agents running up to 4000 steps per execution - research, coding, analysis and visual creation all happen simultaneously
> 65.8% on SWE-Bench, solving real GitHub engineering tasks end-to-end with little to no human intervention
> Skill injection through simple .md files - instant vertical specialization (HIPAA compliance, financial regulations, Shopify workflows and more)
> Automated client acquisition: monitor job listings for "Data Analyst" or "Automation Engineer" roles and pitch an AI solution before companies even start hiring

The math is simple:

A $10k project
Traditional agency → salaries, office costs, QA, project management and overhead eat most of the profit
AI agency powered by Kimi → roughly $500 in operating costs plus one operator managing client relationships = the potential for $72k+ monthly profit at scale

Read the article
Save this post
Start building AI-native agencies while everyone else is still doing things the old way

Bonsai 🌳

21,487 views • 12 days ago

This Chinese developer runs 9 agents on Claude Code under a GPT-5.5 orchestrator, and they close 500 client tasks a month without a single assistant.

His client work is closed without him, on a single laptop and only three subscriptions.

The entire system lives on one MacBook Pro M4 with 128 GB of memory, and subscriptions to Claude Code and GPT-5.5 cost him approximately $300 a month. There is no CRM, no team, no office, only a terminal window with 9 parallel streams.

The orchestrator works with a simple system prompt:

«You are the orchestrator of a client inbox. Classify every incoming email into 4 categories: code, content, analysis, communication. Delegate to the corresponding worker agent. When the result is ready, check it for completeness, send it to the client on my behalf, and mark the task as closed. Do not ask clarifying questions.»

The orchestrator checks the inbox every 30 seconds, classifies fresh emails, and distributes them to 9 worker agents on Claude Code, each of which is responsible for its own class of tasks.

Here is an example of how one of them closes a request to refactor a client's auth module:

«Task: refactor user-auth module
Broke the monolith into 3 files by responsibilities
Added unit tests, coverage increased to 87%
Renamed 4 functions to camelCase according to the style guide
PR is ready for review, link below»

And so on for about 50 cycles a day. By noon 25 tasks are closed, by dinner 50, and by the end of the month 500. On average, it takes about 7 minutes from the appearance of an email in the inbox to sending the result to the client.

This is more than a live team of 6 developers, copywriters and analysts working 8 hours a day closes.

This is no longer an agency. This is a workstation where an orchestrator replaces a manager, and 9 worker agents replace the staff. The pipeline goes from inbox to closing 500 times a month without human participation at any step.
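The classify-then-delegate loop described in the post can be sketched roughly as follows. This is a minimal illustration, not the author's setup: the keyword lists stand in for the LLM classification step, and `Email`, `classify`, `make_worker`, and `orchestrate` are hypothetical names. Only the four category names come from the quoted system prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The four categories named in the quoted system prompt.
CATEGORIES = ("code", "content", "analysis", "communication")

# Hypothetical keyword lists standing in for the LLM classification call.
KEYWORDS = {
    "code": ("refactor", "bug", "module", "deploy"),
    "content": ("blog", "article", "newsletter"),
    "analysis": ("report", "metrics", "forecast", "dataset"),
}


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def classify(email: Email) -> str:
    """Return one of CATEGORIES; anything unmatched is 'communication'."""
    text = f"{email.subject} {email.body}".lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "communication"


def make_worker(category: str) -> Callable[[Email], str]:
    """Build a stub worker; a real one would call a coding or writing agent."""
    def worker(email: Email) -> str:
        return f"[{category}] handled: {email.subject}"
    return worker


WORKERS: Dict[str, Callable[[Email], str]] = {c: make_worker(c) for c in CATEGORIES}


def orchestrate(inbox: List[Email]) -> List[str]:
    """Classify each email and delegate it to the matching worker."""
    return [WORKERS[classify(email)](email) for email in inbox]


if __name__ == "__main__":
    inbox = [
        Email("a@example.com", "Refactor user-auth module", "Split it up please."),
        Email("b@example.com", "Q3 churn report", "Need the numbers by Monday."),
        Email("c@example.com", "Call on Friday?", "Can we move our call?"),
    ]
    for line in orchestrate(inbox):
        print(line)
```

In the setup the post describes, the polling interval (every 30 seconds), the completeness check, and sending the reply would wrap around `orchestrate`; they are left out here to keep the routing step visible.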

Blaze

29,917 views • 1 month ago

Cerebras inference is very fast. So fast that it changes how we think about configuring our LLMs for voice agent use cases.

Kimi K2.6 is a 1T parameter reasoning model that Cerebras serves at 650 - 1,000 tokens per second (end-to-end throughput), with time to first token metrics as low as 150ms (latency). These numbers are two to three times faster than other similarly capable models.

The biggest lever we get from this kind of speed is that we can use the model in reasoning mode, and still have excellent "time to first non-thinking token."

This solves a big pain point we have in 2026 for voice agent use cases. Almost all recent innovation in post-training has focused on making models good at reasoning ("test time compute"). This is great, but it makes the user-facing model latency much, much slower. Which is a problem for conversational voice agents.

We can run Kimi K2.6 with reasoning turned on, and get responses faster than other models produce with reasoning disabled.

On my 30-turn voice agent benchmark, Kimi K2.6 with reasoning enabled ties GPT 5.1 and Haiku 4.5 with reasoning disabled, and is still about 200ms faster!

On my primary task agent benchmark, Kimi K2.6 is now the #2 model. It ranks just behind Gemini 3.5 Flash in "high" reasoning mode, and tied with GLM 5, Sonnet 4.6, and GPT 5.4 with reasoning set to "low." But Kimi K2.6 completes each turn in the agent loop in under 500ms. The other four models are all at least 3x slower. (Models only qualify for this benchmark if they can complete task turns at a P50 <4s.)

A couple of other things that this speed buys us, for production voice agents:

- Tool calls happen fast enough that we don't have to work around tool call latency in our pipeline design.
- We can prompt the model to output structured data at the beginning of a response, followed by plain text for voice generation. This opens up possibilities like asking the model to do complex classification/generation tasks that influence the rest of the pipeline. For example, the model could create a detailed style prompt for a steerable TTS model, for each individual conversation turn.

And, of course, you can use Kimi K2.6 with reasoning turned off. Cerebras calls this "instant" mode.

Here's a video of a Cerebras Kimi K2.6 voice agent with voice-to-voice response time, measured at the client, under 500ms. This is the true response latency as perceived by the user, including all network and audio codec overhead, transcription and turn detection, Kimi K2.6 token generation, and voice generation.

500ms is, effectively, instant. So the Cerebras naming for this mode is apropos. :-)
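The "structured data first, then plain text" pattern mentioned above could be handled along these lines. This is a sketch under assumptions: it supposes the model has been prompted to start each response with a single JSON object, and the field names `intent` and `tts_style` as well as the function `split_response` are hypothetical, not part of any Cerebras or Kimi API.

```python
import json
from typing import Tuple


def split_response(raw: str) -> Tuple[dict, str]:
    """Split a model response into a leading JSON object and trailing speech text.

    Assumes the response begins with one JSON object; raises ValueError otherwise
    (json.JSONDecodeError is a subclass of ValueError).
    """
    stripped = raw.lstrip()
    # raw_decode parses one JSON value at the start and reports where it ended.
    meta, end = json.JSONDecoder().raw_decode(stripped)
    if not isinstance(meta, dict):
        raise ValueError("expected a JSON object at the start of the response")
    return meta, stripped[end:].strip()


if __name__ == "__main__":
    raw = (
        '{"intent": "reschedule", "tts_style": "warm, unhurried"}\n'
        "Sure, I can move your appointment to Friday."
    )
    meta, speech = split_response(raw)
    # meta would steer the rest of the pipeline (for example a TTS style prompt);
    # speech is the only part passed on to voice generation.
    print(meta["tts_style"])
    print(speech)
```

With a streaming response, the same idea applies once enough tokens have arrived to complete the JSON object; everything after that point can be forwarded to voice generation as it streams in.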

kwindla

40,319 views • 16 days ago