THIS CHINESE DEVELOPER VISUALIZED WHAT 300 KIMI K2.6 AGENTS LOOK LIKE IN ACTION - AND IT LOOKS EXACTLY LIKE A BRAIN WORKING FOR YOU every line on screen is a connection firing in real time - hundreds of neurons across multiple layers, activations lighting up, signals passing through the...

137,889 views • 18 days ago • via X (Twitter)

Related videos

I pay Claude $20 a month. Most $TAO holders do too. There is a stack you can build in 15 minutes that fixes that completely. It runs on Bittensor. It costs $10. You do not write a single line of code.

Here is how every AI chat product actually works under the hood. Three layers. Always three.

The model. The brain. GPT, Claude, DeepSeek, Kimi, GLM.

The inference layer. The GPU that runs the model when you hit send.

The interface. The chat box you actually look at.

ChatGPT and Claude bundle all three and hand you the result. You cannot change the model. You cannot change the inference. The interface is non-negotiable. Every prompt you type goes to a server run by a private company whose terms of service can quietly change next month. The anti-ChatGPT move is to pick each layer yourself.

This is where $TAO comes in. Chutes is Subnet 64 on Bittensor. It is the inference layer. Open source models like DeepSeek, Kimi, GLM, and Llama get served by a global network of miner-operated GPUs. Validators score the output quality. The best inference wins the emissions. You hit send. A miner somewhere runs your prompt. You get the answer back. The TAO you hold is in part paying for the GPU you just used.

The basic stack is one URL. chutes.ai/chat No account. No API key. No setup. Switch models mid-conversation. Web search built in. Image generation. File uploads. Free.

The advanced stack is Chutes plus TypingMind. One-time license. No recurring fee. Plugins, agents, custom personas, a prompt library you build over months. Full model switching between Chutes, OpenAI, and Anthropic from the same window. Total cost: $10 a month to Chutes for inference. That $10 buys you $50 in actual usage.

But here is the signal most people missed inside this story. Chutes ran a free tier until February. Then they killed it. Then they raised the minimum to $10 in May. Most people saw that as bad news. It is the opposite. Free things on the internet do not last. Real products do. Chutes is becoming a real product. A subnet that generates actual revenue from actual users paying actual money for actual AI inference. That is what $43 million in Q1 network revenue looks like at the individual subnet level.

And there is one more thing ChatGPT and Claude cannot offer that Chutes already has. Trusted Execution Environments. Your prompt gets encrypted on your device, shipped to a confidential compute GPU, and the lock only breaks inside the chip. The miner running the model physically cannot read your prompt. ChatGPT cannot promise that. Claude cannot promise that. Bittensor already built it.

You are holding a network where the subnets are generating real revenue, shipping real privacy infrastructure, and replacing $20 a month centralised subscriptions with $10 a month decentralised inference. The people who use the product always understand the investment better than the people who only watch the price.

2xnmore

26,871 views • 29 days ago

this video is the CLEAREST explanation of how claude skills + AI agents work and how to use them

most people set up an AI agent and wonder why it keeps disappointing them. the context window is everything

context is what the model assembles before it takes any action. think of it like everything the agent needs to read before it does anything. the quality of what goes in determines the quality of what comes out. the models are genuinely really good right now. claude and gpt are exceptional. the variable is almost always the context you give them.

1. agent.md files are mostly unnecessary

every single line you put in an agent.md file gets added to every single conversation you have with your agent. a 1000 line file is around 7000 tokens burning on every run. the model already knows to use react. it can read your codebase. save the agent.md for proprietary information specific to your company that the model genuinely cannot know on its own.

2. skills are the actual unlock

a skill.md file works differently. what loads into context is only the name and description, around 50 tokens. the full instructions only appear when the agent recognizes it needs that skill. so instead of 7000 tokens on every run you have 50. and the agent stays sharp because the context window stays lean. the closer you get to filling the context window the worse the agent performs, same way you perform worse when someone dumps 10 things on you at once.

3. here is how to actually build a skill the right way

most people identify a workflow and immediately try to write the skill. what you want to do instead is run the workflow by hand with the agent first. walk it through every single step. tell it what to check, what good looks like, what bad looks like. correct it in real time. once you have had a full successful run from start to finish, tell the agent to review everything it just did and write the skill itself. it writes a better skill than you will because it has the full context of what actually worked in practice not in theory.

4. recursively building skills is how you go from frustrated to reliable

when the skill breaks, and it will break, ask the agent exactly why it failed. it will tell you specifically what went wrong. fix it together in that same conversation. then tell it to update the skill file so that failure mode never happens again. ross mike did this five times with his youtube report generator. it now pulls from eight different data sources and runs flawlessly every single time without him touching it.

5. sub agents are something you earn not something you set up on day one

start with one agent. build one workflow. turn it into one skill. once that works add another. ross mike has five sub agents now covering marketing, business, personal and more. it took months to get there and every single one exists because a workflow proved it deserved to exist. the people who set up 15 sub agents on day one and wonder why nothing works skipped all the steps that make the thing actually run.

6. your workflow is the thing the model cannot get anywhere else

the model has been trained on everything. it knows more than you about most things. what it does not have is your specific process, your taste, your way of doing things. that is what skills capture. that is what makes your agent actually useful versus a generic one. downloading someone else's skill means downloading their context onto your setup and it will not work the way you want it to because it was never built around how you work.

this is the clearest explanation of how agents actually work i have heard. Micky runs this stuff every single day and the results show it.

full episode is now live on The Startup Ideas Podcast (SIP) 🧃 where you get your pods

people charge for this sorta stuff

i give away the sauce for free

i just want you to win

watch
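The token arithmetic in points 1 and 2 can be sketched in a few lines of Python. This is a rough illustration only, not how any particular agent framework accounts for context. The `Skill` class, the function names, and the flat rate of 7 tokens per line (derived from the post's "a 1000 line file is around 7000 tokens") are all assumptions made for the example; the 50-token header is the post's own estimate.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Assumption: a flat rate taken from the post's "1000 lines ~ 7000 tokens".
TOKENS_PER_LINE = 7


@dataclass
class Skill:
    """Hypothetical model of a skill file for budgeting purposes."""
    name: str
    body_lines: int          # length of the full instructions
    header_tokens: int = 50  # name + description, the post's estimate


def always_loaded_cost(lines: int) -> int:
    """Tokens spent on every run by a file that is always in context."""
    return lines * TOKENS_PER_LINE


def lazy_skills_cost(skills: Iterable[Skill], active: Set[str]) -> int:
    """Tokens spent when only headers load, plus bodies of active skills."""
    total = 0
    for skill in skills:
        total += skill.header_tokens
        if skill.name in active:
            total += skill.body_lines * TOKENS_PER_LINE
    return total


# Hypothetical skills, named only for illustration.
skills = [Skill("youtube-report", 200), Skill("invoice-check", 300)]

print(always_loaded_cost(1000))                     # 7000
print(lazy_skills_cost(skills, set()))              # 100
print(lazy_skills_cost(skills, {"youtube-report"})) # 1500
```

Under these assumptions, two idle skills cost 100 tokens per run, and the full body of a skill is only paid for on runs that actually need it.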

GREG ISENBERG

192,024 views • 2 months ago

An entire empire was overthrown over a two percent tax on a breakfast beverage. Look at what you tolerate now. You are taxed when you earn it. Taxed when you spend it. Taxed when you save it. Taxed when you invest it. And when you die, they tax whatever is left. That is not a system. That is a harvest. You commute in a car you paid sales tax to buy. You drive it on roads you were already taxed to build. You fill it with gas taxed by the gallon. When you sell that car, the next buyer pays sales tax on it again. The same car. Taxed every time it changes hands. You arrive at a job where your salary is cut before it ever touches your hands. If you work for yourself, you pay both sides. Two people on paper. Neither one keeps what they earned. Then you go home. Every bill you open has a government standing behind it with its hand out. You buy a house with money they already took their share of. Then they charge you property tax on it every year for the rest of your life. You want to renovate your own kitchen. You need a permit. You want to build a deck on your own land. You need a permit. You pay for the property. Then you pay for permission to use it. Stop paying property tax and they seize your home. Not because you missed a mortgage payment. Because you missed a payment to the government for the privilege of keeping what is already yours. You do not own your home. You rent it from the state. If you leave something behind for your children, they are taxed on what you were already taxed to earn. The same wealth. Taxed at every stage of your life. Then taxed one final time because you had the audacity to die. They found a way to monetize your absence. We are told this is the price of civilization. It is not. It is architecture. The most effective prison ever built is the one where the inmates believe they are free. They did not take your freedom. They priced you out of it. If you kept the full value of your labor, you would be free within years. Not decades. Years. 
The system cannot allow that. A machine built on consumption needs a consumer that never stops. You did not sign a social contract. You were assigned one. Now pay attention. They spent decades perfecting the extraction of your productivity. Now they are building the technology to replace you. AI is not coming for your job because corporations are greedy. It is coming because a system that already takes half your output just realized it can take all of it. Without needing you in the equation. You were never the point of this arrangement. You were the input. And the moment they engineer a cheaper one, you become a rounding error on a quarterly earnings call. They did not build AI to free you. They built it to finish what the tax code started. It was never about the tea. It was about the precedent. Today we hand over half our waking lives and thank them for the potholes. You do not live in a free economy. You live in a subscription you never signed up for. And the penalty for canceling is everything you have.

Dustin

27,628 views • 2 months ago

Cerebras inference is very fast. So fast that it changes how we think about configuring our LLMs for voice agent use cases.

Kimi K2.6 is a 1T parameter reasoning model that Cerebras serves at 650 - 1,000 tokens per second (end-to-end throughput), with time to first token metrics as low as 150ms (latency). These numbers are two to three times faster than other similarly capable models.

The biggest lever we get from this kind of speed is that we can use the model in reasoning mode, and still have excellent "time to first non-thinking token." This solves a big pain point we have in 2026 for voice agent use cases. Almost all recent innovation in post-training has focused on making models good at reasoning ("test time compute"). This is great, but it makes the user-facing model latency much, much slower. Which is a problem for conversational voice agents.

We can run Kimi K2.6 with reasoning turned on, and get responses faster than other models produce with reasoning disabled. On my 30-turn voice agent benchmark, Kimi K2.6 with reasoning enabled ties GPT 5.1 and Haiku 4.5 with reasoning disabled, and is still about 200ms faster!

On my primary task agent benchmark, Kimi K2.6 is now the #2 model. It ranks just behind Gemini 3.5 Flash in "high" reasoning mode, and tied with GLM 5, Sonnet 4.6, and GPT 5.4 with reasoning set to "low." But Kimi K2.6 completes each turn in the agent loop in under 500ms. The other four models are all at least 3x slower. (Models only qualify for this benchmark if they can complete task turns at a P50 <4s.)

A couple of other things that this speed buys us, for production voice agents:

- Tool calls happen fast enough that we don't have to work around tool call latency in our pipeline design.
- We can prompt the model to output structured data at the beginning of a response, followed by plain text for voice generation. This opens up possibilities like asking the model to do complex classification/generation tasks that influence the rest of the pipeline. For example, the model could create a detailed style prompt for a steerable TTS model, for each individual conversation turn.

And, of course, you can use Kimi K2.6 with reasoning turned off. Cerebras calls this "instant" mode.

Here's a video of a Cerebras Kimi K2.6 voice agent with voice-to-voice response time, measured at the client, under 500ms. This is the true response latency as perceived by the user, including all network and audio codec overhead, transcription and turn detection, Kimi K2.6 token generation, and voice generation. 500ms is, effectively, instant. So the Cerebras naming for this mode is apropos. :-)
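The "structured data first, plain text after" idea could look roughly like this on the receiving side. It is a minimal sketch under assumptions that are not in the post: that the model is prompted to emit a single JSON object at the very start of its reply, and that a key such as `tts_style` carries the style prompt. The function name `split_reply` and the key name are hypothetical, and a production pipeline would more likely parse incrementally from the token stream rather than from a complete string.

```python
import json
from typing import Any, Dict, Tuple


def split_reply(raw: str) -> Tuple[Dict[str, Any], str]:
    """Split a model reply into (leading JSON object, text to speak).

    If the reply does not start with a JSON object, return an empty
    dict and treat the whole reply as speech.
    """
    text = raw.lstrip()
    try:
        # raw_decode parses one JSON value at the start of the string
        # and returns the index where that value ends.
        meta, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return {}, text.strip()
    if not isinstance(meta, dict):
        # A bare number or string at the start is just ordinary text.
        return {}, text.strip()
    return meta, text[end:].strip()


# Hypothetical reply: style metadata for the TTS stage, then the spoken text.
reply = '{"tts_style": "calm, slow"}\nSure, I can help with that.'
meta, speech = split_reply(reply)
print(meta)    # {'tts_style': 'calm, slow'}
print(speech)  # Sure, I can help with that.
```

Falling back to "everything is speech" when no JSON object is found keeps the voice pipeline talking even if the model ignores the formatting instruction on a given turn.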

kwindla

40,319 views • 27 days ago