I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mindblown by how well it works out of the box. A few notes:

I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on Fireworks AI inference. This is the first time I feel like there is an open-weight model that can reason at the level of Claude and Codex. And it does this in a cost-effective way with support for 1M context length.

To be clear, I am using DeepSeek-V4-Pro inside of Pi without any special configuration. It's exciting that there is a model that can just be plugged into a basic harness like Pi, and it just works. I've never seen that before. Most models require lots of configuration and setup.

DeepSeek-V4-Pro is clearly good at agentic coding (probably the best of the open-weight models), but the model is also great on knowledge-intensive tasks where reasoning matters.

The agent pulled agentic engineering best practices from different company docs (Anthropic, OpenAI, Google, Stripe, Meta, Modal, DeepSeek, Mistral, Cohere), searched and digested Reddit and HN threads, summarized arXiv papers, and surfaced trending GitHub repos. Then it distilled everything into actionable tips across categories.

I love the wiki it built. The quality is really good. Here is a snapshot of what the wiki looks like:

DeepSeek-V4-Pro handled the task without breaking stride: multi-step research queries, code generation for scaffolding, and context-heavy reasoning across disparate sources.

For coding specifically, this is the first open-weight model that genuinely feels like a Codex or Claude Code experience. It compares in both capability and actual multi-turn agentic work.

What made the loop feel so responsive was Fireworks' inference speed (the fastest in the market) and the fact that they actually validate models at the systems level before shipping. No corrupted reasoning traces. Just fast, reliable iteration.

The hybrid CSA and HCA attention design cuts KV cache to just 10% and inference FLOPs by nearly 4x at 1M-token context. This is what makes the agent loop actually fast and cheap enough to run in practice.

For devs who've been watching open-weight models close the gap but haven't found one that actually delivers in practice, this is the closest I've seen.

Try it here:
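To give a feel for why a 10x smaller KV cache matters at 1M tokens, here is a back-of-the-envelope sketch. It sizes a plain, uncompressed KV cache and then applies the "just 10%" figure quoted in the post. The layer count, head count, and head dimension are hypothetical placeholders, not DeepSeek-V4-Pro's actual configuration, and the formula is the standard one for ordinary attention, not the CSA/HCA design itself.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Size of a plain KV cache: one key and one value vector
    per token, per layer, per KV head."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical dimensions for illustration only (NOT the real model config).
baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)

# Apply the "just 10%" reduction quoted in the post.
reduced = baseline // 10

GIB = 1024 ** 3
print(f"plain KV cache at 1M tokens: {baseline / GIB:.1f} GiB")
print(f"at 10% of that:              {reduced / GIB:.1f} GiB")
```

With numbers in this range, the difference is roughly between a cache that has to be spread over several accelerators and one that fits alongside the weights, which is consistent with the post's point about the agent loop being cheap enough to run in practice.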

elvis

308,474 subscribers

58,555 views • 1 month ago • via X (Twitter)

Related Videos

DeepSeek-V4 dropped. 1M context. 10x smaller KV cache. First open model where the context window and the agentic post-training meet.

Ben Burtenshaw

49,900 views • 2 months ago

The DeepSeek-R1 paper is a gem! Highly encourage everyone to read it. It's clear that LLM reasoning capabilities can be learned in different ways. RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties. There is more to RL than meets the eye! Here is my breakdown of the paper along with a few tests: The multi-stage training might not make sense initially, but it provides clues on optimizations that we can continue to tap into. Data quality is still very important for enhancing the usability of the LLM. Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open, so we can build on top of it. This opens up exciting research opportunities. About the attached clip: the previous preview model wasn't able to solve this task. DeepSeek-R1 can solve this and many other tasks that o1 can solve. It's a very good model for coding and math.

elvis

140,600 views • 1 year ago

CHINA JUST DROPPED AN AI CODING MODEL WITH A 1M CONTEXT WINDOW. And I connected it to Claude Code to see what it could actually do.

Meet GLM-X Preview

On paper, a few things immediately stood out:
→ 1M context window
→ Agentic coding capabilities
→ Works inside Claude Code
→ Designed for large-scale coding and reasoning workflows

But specs don't matter much if the model can't deliver in practice. So I gave it a real-world task.

THE TEST

One prompt:
> Build a modern AI lead generation dashboard using React and Tailwind CSS.

Requirements:
→ Dark mode
→ Analytics dashboard
→ Lead table
→ Email outreach section
→ Responsive design
→ Production-ready component structure

Instead of generating a few snippets, it planned the architecture, generated the dashboard components, created the Tailwind configuration, and walked through the implementation requirements.

What impressed me most wasn't the code itself. It was how well it maintained context throughout the workflow. That's where a 1M context window starts becoming useful. Less time re-explaining requirements. Less context loss. More room for complex projects.

The AI coding race is getting very interesting. And it's no longer just GPT, Claude, and Gemini competing for attention.

Results from my test below 👇

Md Riyazuddin

31,199 views • 6 days ago

Everyone's sleeping on MiniMax. Again. They just shipped M3. The first open-weights model to combine frontier coding, 1M context, and native multimodality in one drop. I plugged it into Claude Code this morning. Pasted a design from Dribbble. Watched M3 write production-ready React code in one session. At the agency, I just replaced Opus 4.8 with M3 for 80% of our coding tasks. The output is the same and we are running everything at a fraction of the cost. Open infrastructure is the future.

Prajwal Tomar

12,834 views • 20 days ago

The most amazing aspect of DeepSeek Open Source is that just the Reasoning Engine can be isolated and used with any other LLM. In fact, you can have a mixture of Reasoning Engines cascade around a problem and then use an Operator-type agent AI to act on the results.

Brian Roemmele

398,746 views • 1 year ago

I'm running Llama 4 Maverick at 620 t/s! I'm living in the future!

Honestly, a large language model running this fast is something straight out of a sci-fi movie. Speeds like this will enable a whole new world of applications that aren't possible today. For reference, GPT-4o, which is probably the most popular OpenAI model, runs between 60 and 110 t/s.

The secret here: I'm not running Meta's Llama 4 Maverick on a GPU. I'm using the SambaNova Cloud (my sponsor) and their custom SN40L chips. They are optimized from the ground up for running AI workflows.

Right now, SambaNova Cloud runs DeepSeek, Qwen, Whisper, and the entire family of Llama models on these chips. You can check the speed of each of these models using SambaNova Cloud's Playground (see the attached video). It's completely free, and that's how I'm measuring their speeds.

For example, I also tried DeepSeek R1 (the latest version from May) and, oh boy! DeepSeek R1 is a huge 671B parameter model. It's probably the best open reasoning model in the world, and it runs at 140 tokens per second! Inference time on an SN40L is night and day from what you'll get from a GPU.

Here is why this is big: If you are running an agentic workflow that uses multiple models simultaneously on a GPU, it will need to swap models in and out of memory (because not every model fits). A single SN40L chip can simultaneously hold over 100 models (trillions of parameters) in memory.

If you are using open models, try the SambaCloud API to see what lightning speed looks like. Here is how:
1. Create a free account at:
2. Check the QuickStart guide:

If you try the playground, check the speed you're getting with Llama 4 and DeepSeek, and post the results below. I've seen much higher numbers than I posted here, so I'm curious to see whether geography affects the speed.

Santiago

34,148 views • 1 year ago
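The throughput figures quoted in the post above (620 t/s, 140 t/s, and 60 to 110 t/s) are easier to compare as wall-clock time for one response. The sketch below does that conversion. The 2,000-token response length is an arbitrary assumption, and the calculation ignores time to first token and network latency, so treat it as a rough lower bound and not a benchmark.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady decode rate."""
    return output_tokens / tokens_per_second


# Decode rates as quoted in the post (tokens per second).
rates = {
    "Llama 4 Maverick on SN40L": 620,
    "DeepSeek R1 on SN40L": 140,
    "GPT-4o (high end)": 110,
    "GPT-4o (low end)": 60,
}

OUTPUT_TOKENS = 2_000  # hypothetical response length

for name, rate in rates.items():
    seconds = generation_seconds(OUTPUT_TOKENS, rate)
    print(f"{name:28s} {seconds:6.1f} s")
```

In an agent loop that makes dozens of sequential calls, these per-call differences add up, which is the practical argument the post is making.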

Anthropic CEO Dario Amodei on Open-Source AI Models. "I don't think open source works the same way in AI that it has worked in other areas. Primarily because with open source you can see the source code of the model. Here we can't see inside the model, it's often called open weights instead of open source to kind of distinguish that. But a lot of the benefits, which is that many people can work on it and that it's kind of additive, don't quite work in the same way. So I've actually always seen it as a red herring. When I see a new model come out I don't care whether it's open source or not. If we talk about Deep Seek I don't think it mattered that Deep Seek is open source. I think I ask, is it a good model? Is it better than us at the things that matter? That's the only thing that I care about. It actually doesn't matter either way. Because ultimately you have to host it on the cloud. The people who host it on the cloud do inference. These are big models, they're hard to do inference on. When I think about competition I think about which models are good at the tasks that we do. I think open source is actually a red herring. It's not free. You have to run it on inference and someone has to make it fast on inference." --- From 'Alex Kantrowitz' YT channel

Rohan Paul

944,205 views • 7 months ago

Finally got a chance to play around with Andrej Karpathy's LLM Council. I built it as a plugin inside of Claude Code. Hooked it up with OpenRouter for models. The AskUserQuestion tool came in handy to select the council and chairman. This is my first test, but I agree with Karpathy that the concept of LLM ensembles can be used beyond models that offer perspectives on interesting questions. I feel like this could have really cool applications in agentic coding. More on that soon. I built this as a plugin, so next I will be exploring other use cases around agentic coding, like evaluation, tool building, designing, and research. If there is enough interest, I will clean it up and push it out as an open plugin.

elvis

79,648 views • 5 months ago
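For readers unfamiliar with the council-and-chairman idea mentioned in the post above, here is a minimal sketch of the general shape. It is not the author's plugin or Karpathy's implementation: the `Model` interface, `run_council`, and `majority_chairman` are hypothetical names, and the stub lambdas stand in for real model calls (for example through OpenRouter).

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical interface: a model is anything that maps a prompt to text.
Model = Callable[[str], str]
Chairman = Callable[[str, Dict[str, str]], str]


def run_council(question: str, council: Dict[str, Model],
                chairman: Chairman) -> str:
    """Ask every council member, then let the chairman decide."""
    # Step 1: each member answers independently.
    answers = {name: model(question) for name, model in council.items()}
    # Step 2: the chairman sees all answers and returns the final one.
    return chairman(question, answers)


def majority_chairman(question: str, answers: Dict[str, str]) -> str:
    """Toy chairman: pick the most common answer.
    A real chairman would be another LLM call that synthesizes the answers."""
    return Counter(answers.values()).most_common(1)[0][0]


# Stub members standing in for real model calls.
council = {
    "model_a": lambda q: "4",
    "model_b": lambda q: "4",
    "model_c": lambda q: "5",
}

result = run_council("What is 2 + 2?", council, majority_chairman)
print(result)
```

Keeping the chairman as a plain function makes it easy to swap the toy majority vote for an LLM-backed synthesis step, or for an evaluator in the agentic coding use cases the post mentions.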

The MiniMax M2 model is mind-blowing!

It's open-source. It outperforms Gemini 2.5, Claude 4.1, and Qwen3 across coding and tool-use benchmarks. Right now, it's one of the world's top 5 models in intelligence!

And here is the best part: Claude is one of the best models you can use today, and MiniMax M2 costs only 8% of that! It's smaller, faster, and cheaper. Extremely efficient at using tokens.

MiniMax M2's biggest strength: High agentic capabilities. The model can plan and execute complex multi-tool workflows. It's reliable and very robust at executing long-horizon tool chains.

In summary:
• Low latency
• Very cheap
• Excels at agentic tasks
• Open-source

The model currently powers the MiniMax Agent and is available for a free global trial.

You can access MiniMax M2's API here:
To access the agent:
And here is the MiniMax website:

Thanks to the MiniMax team for showing me the ropes and partnering with me on this post.

Santiago

91,142 views • 7 months ago

🔥 Battle for the top reasoning LLM intensifies! The QwQ-32B-Preview is a very good reasoning LLM. Full video of my tests here:

Summary of my findings and thoughts:

It was able to solve a couple of hard math problems, so it looks very promising for maths. It didn't do so well on my coding task (generating a bash script). Going by the results reported on LiveCodeBench, it has room for improvement.

One thing that's become very clear to me is that the reasoning capabilities of these LLMs are significantly closing the gap between open and closed-source models. The competition is now going to be on a different level, focused on which model produces the most efficient, optimized, accurate, and fastest reasoning steps beyond just accurate responses. That's what developers will care about. Traditional benchmarks are not going to be good enough for this.

On that note, it's getting harder to assess these models, especially the consistency, efficiency, and quality of reasoning steps. After experimenting with this model, I realized that the reasoning paths are not fully optimized and there is a lot more optimization that needs to happen before these models are used in production settings. There might be a need to build some type of native and efficient self-assessment or self-reflection capability that prevents these reasoning LLMs from going in loops or producing unnecessarily lengthy sequences.

I also noticed that this model, at least from the HF demo, doesn't separate the reasoning from the response. I think that actually hurts the performance of the model. On the other hand, o1 and R1 do that really well. In addition to that, I believe the training on reasoning is hurting the performance of the LLM in other areas such as helpfulness (check the code example in the video).

Something that's necessary at the moment is validating or evaluating the quality of the reasoning chains and figuring out a better strategy to optimize them. Current methods are probably not sufficient to solve this problem, but that's where innovation will come next.

I recognize that this is a first effort, so kudos to the Qwen team on this release. These issues highlight the importance of transparency with reasoning LLMs. We need to know how the model was trained and with what exact data and optimization strategy. Understanding that will enable researchers and developers to build better intuition and improve the reasoning capabilities and components at a faster rate. There is an opportunity for someone or a company to build a truly open reasoning LLM. The race is on!

I will continue to track the state-of-the-art in reasoning LLMs and report my takes and observations here. Stay tuned for more.

elvis

14,740 views • 1 year ago

Clawdbot creator Peter Steinberger 🦞 says Claude Opus is his favorite model, but OpenAI Codex is the best for coding: "OpenAI is very reliable. For coding, I prefer Codex because it can navigate large codebases. You can prompt and have 95% certainty that it actually works. With Claude Code you need more tricks to get the same." "But character wise, [Opus] behaves so good in a Discord it kind of feels like a human. I've only really experienced that with Opus."

TBPN

442,563 views • 4 months ago

I have been testing “OpenAI” o3 mini-high. The exposure of the “reasoning” on this model in my tests seems to not be the full reasoning, but a shifted, edited, and truncated version of what we would expect from a full exposure of the reasoning engine. See the video on the left. One can compare this to my FREE Open Source local newly quantized and distilled DeepSeek R1 8b (llama). See the video on the right. Clearly we see the full reasoning with DeepSeek R1 and something heavily truncated with o3 mini-high (getting mad typing that name, folks). Compare the quality of the outputs and the reasoning insights. Frankly, there is no comparison. It feels like “OpenAI” begrudgingly released the reasoning output because they had to, but gave us something of a simulated reasoning and not the actual one. Of course my local laptop running an 8b model is slower, but all of the outputs and the actual creative writing are superior on the FREE model. The reasoning output, when legitimate, can be many times more valuable than the answer. In the past week I have found thousands of ways to use it in many projects. I would have been far more impressed with o3 mini if I had never seen DeepSeek. However, today it is too little, too late. The prompt: “Write a very creative and funny story on a pig and butterfly as friends living on Mars.”

Brian Roemmele

74,506 views • 1 year ago

I can't believe this guy just made a permanent solution to context bloat and open sourced it all!

when we tested this tool (Context+) for solving an issue on the OpenCode repository, the agent using this tool used ~6.5k fewer tokens, found the code and fixed it in half the time!

the results were surprising: 6 to 10k tokens saved per prompt, completed task in ~2 minutes while the agent running without the tool took ~4 mins for the same and got stuck in loops

bro built an entire beast by using all the modern tools that we could think of: undo trees, semantic search by meaning (by haskellforall), advanced refactoring, blast radius, advanced file context trees, restore points... i can keep going on

semantic code search and context trees are the future of agentic coding and this tool proves it

the feature i loved the most is semantic search and how it gets things done 2x faster with least possible tokens

it makes an agent that actually knows what it's doing and not just guessing, it makes meaning from your code similar to RAG. if you aren't optimizing your context, you are just burning money

the developer says this tool is still under development, it can have unexpected behavior and the docs need updates, but the video shows the reality of how fast it can be

github:
get here:

forloop

225,774 views • 3 months ago

China just made Silicon Valley's entire AI industry look like a scam. The US government spent 3 years trying to stop China from building competitive AI. But this backfired HORRIBLY. Here's what happened:

Yesterday, a Chinese startup called DeepSeek released a new AI model called V4. It matches the performance of OpenAI and Anthropic's best models. At 1/7th the price. And for the first time ever, it was built on Chinese chips. NOT American ones. That last part is the one that terrifies the west.

For context: Since 2022, the US has banned the export of advanced AI chips to China. The entire strategy was built on the assumption that if China can't access Nvidia's best hardware, they can't build frontier AI. But DeepSeek just proved that assumption wrong. Their V4 model was trained and runs on Huawei's Ascend chips. Huawei spent months working directly with DeepSeek to make sure V4 runs across their entire line of AI processors.

Jensen Huang even predicted this on a recent podcast: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation." That day was yesterday.

And the numbers are crazy: DeepSeek V4 costs $3.48 per million output tokens. OpenAI's latest model GPT-5.5 costs $30. Anthropic's Claude charges $25. Same ballpark performance. 7x cheaper.

Uber's CTO just admitted they burned through their ENTIRE 2026 AI budget in 4 months using Anthropic's tools. If Uber had used DeepSeek instead, that same budget would have lasted 7 YEARS. 4 months vs 7 years. Same work getting done.

But the pricing isn't even the big thing here. The real story is what DeepSeek did with their technical report: They published the benchmarks where they LOSE. Every AI company cherry-picks the tests where their model wins. DeepSeek ran the full comparison against GPT-5.4 and Google's Gemini, found they trail frontier models by 3 to 6 months, and printed it anyway. They literally don't care because the price gap makes the performance gap irrelevant for 90% of use cases.

So the US export controls didn't slow China down. They ACCELERATED China's independence. Because Chinese developers were FORCED to train models with limited resources, they had to figure out how to make AI radically more efficient. That constraint became their competitive advantage. Every generation of DeepSeek has gotten dramatically cheaper to train. V4 continues the trend.

Meanwhile US companies are going the OPPOSITE direction: OpenAI's GPT-5.5 Pro costs $180 per million output tokens. That's 51x more expensive than DeepSeek V4 for comparable work.

The Commerce Secretary confirmed this week that ZERO Nvidia advanced chip shipments have actually gone through to China despite being approved in January. So China built frontier AI anyway. Without American chips. At a fraction of the cost.

And the market response tells you everything: Chinese chipmaker SMIC surged 10%. Huahong Semiconductor jumped 15%. DeepSeek's Chinese AI competitors Zhipu AI and MiniMax dropped 9% because V4 is destroying them too.

DeepSeek is making Silicon Valley's pricing model look like a scam. US tech companies spent $650 billion on AI infrastructure this year. DeepSeek just showed the world you can match their output for pennies.

The export controls were supposed to be America's ace card. Instead they taught China how to win without American chips, at American prices nobody can compete with. Jensen Huang was right. This is a horrible outcome. But it's the outcome America built for itself.

China just made Silicon Valley's entire AI industry look like a scam. The US government spent 3 years trying to stop China from building competitive AI. But this backfired HORRIBLY. Here's what happened: Yesterday, a Chinese startup called DeepSeek released a new AI model called V4. It matches the performance of OpenAI and Anthropic's best models. At 1/7th the price. And for the first time ever, it was built on Chinese chips. NOT American ones. That last part is the one that terrifies the west. For context: Since 2022, the US has banned the export of advanced AI chips to China. The entire strategy was built on the assumption that if China can't access Nvidia's best hardware, they can't build frontier AI. But DeepSeek just proved that assumption wrong. Their V4 model was trained and runs on Huawei's Ascend chips. Huawei spent months working directly with DeepSeek to make sure V4 runs across their entire line of AI processors. Jensen Huang even predicted this on a recent podcast: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation." That day was yesterday. And the numbers are crazy: DeepSeek V4 costs $3.48 per million output tokens. OpenAI's latest model GPT-5.5 costs $30. Anthropic's Claude charges $25. Same ballpark performance. 7x cheaper. Uber's CTO just admitted they burned through their ENTIRE 2026 AI budget in 4 months using Anthropic's tools. If Uber had used DeepSeek instead, at 1/7th the price that same budget would have lasted roughly 28 months. 4 months vs more than 2 YEARS. Same work getting done. But the pricing isn't even the big thing here. The real story is what DeepSeek did with their technical report: They published the benchmarks where they LOSE. Every AI company cherry-picks the tests where their model wins. DeepSeek ran the full comparison against GPT-5.4 and Google's Gemini, found they trail frontier models by 3 to 6 months, and printed it anyway. 
They literally don't care because the price gap makes the performance gap irrelevant for 90% of use cases. So the US export controls didn't slow China down. They ACCELERATED China's independence. Because Chinese developers were FORCED to train models with limited resources, they had to figure out how to make AI radically more efficient. That constraint became their competitive advantage. Every generation of DeepSeek has gotten dramatically cheaper to train. V4 continues the trend. Meanwhile US companies are going the OPPOSITE direction: OpenAI's GPT-5.5 Pro costs $180 per million output tokens. That's 51x more expensive than DeepSeek V4 for comparable work. The Commerce Secretary confirmed this week that ZERO Nvidia advanced chip shipments have actually gone through to China despite being approved in January. So China built frontier AI anyway. Without American chips. At a fraction of the cost. And the market response tells you everything: Chinese chipmaker SMIC surged 10%. Huahong Semiconductor jumped 15%. DeepSeek's Chinese AI competitors Zhipu AI and MiniMax dropped 9% because V4 is destroying them too. DeepSeek is making Silicon Valley's pricing model look like a scam. US tech companies spent $650 billion on AI infrastructure this year. DeepSeek just showed the world you can match their output for pennies. The export controls were supposed to be America's ace card. Instead they taught China how to win without American chips, at American prices nobody can compete with. Jensen Huang was right. This is a horrible outcome. But it's the outcome America built for itself.

Ricardo

279,586 views • 2 months ago
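
The price comparison in the post above is simple arithmetic, and it can be checked directly from the per-million-output-token prices it quotes. The sketch below assumes, for illustration only, that the token volume stays the same and that the whole budget goes to output tokens. The function names are hypothetical.

```python
# Prices per million output tokens, exactly as quoted in the post (USD).
PRICES = {
    "DeepSeek V4": 3.48,
    "GPT-5.5": 30.0,
    "Claude": 25.0,
    "GPT-5.5 Pro": 180.0,
}


def price_ratio(model, baseline="DeepSeek V4"):
    # How many times more expensive `model` is than `baseline`.
    return PRICES[model] / PRICES[baseline]


def budget_months(months_at_model, model, baseline="DeepSeek V4"):
    # Assumption: same token volume, so a budget stretches by the price ratio.
    return months_at_model * price_ratio(model, baseline)


if __name__ == "__main__":
    for name in ("GPT-5.5", "Claude", "GPT-5.5 Pro"):
        print(f"{name}: {price_ratio(name):.1f}x the DeepSeek V4 price")
    print(f"4 months on Claude pricing: {budget_months(4, 'Claude'):.1f} months")
```

At these prices the ratios come out to roughly 8.6x, 7.2x, and 51.7x, and a budget that lasts 4 months at Claude's output price stretches to a little under 29 months.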

Jason Calacanis: Anthropic, OpenAI and others are trying to kill OpenClaw. Why? Because an open source agent platform is an existential threat to frontier model companies. @jason: “People are going to say I'm a conspiracy theorist, but the number one goal, I believe, in the large language model, frontier model space, is to kill (OpenClaw). This is a giant movement to stop it, because this is the equivalent of having an open source Android-like player in the market, and that could be incredibly disruptive. Because, I believe, open source is going to win the day on the large language models and take 90% of the token usage, and I think the entire frontier model space could be undercut by open source. And I think they realize that SLMs, the smaller language models that are verticalized now, that will run on desktops and laptops and are even starting to run on the top ones, that is their biggest competitive threat, and I hope it happens.”

The All-In Podcast

64,267 views • 2 months ago

🧠 Chat with Reasoning A few days ago the DeepSeek team released an LLM with reasoning in various sizes. What we show here is an example of the 1B size, which can run on machines with low GPU power, such as a mobile phone, but still has enough power to answer complex questions. These advanced models can be linked with #IoT equipment to control information and be used in advanced control environments. All this under Open Source models and decentralized networks such as #Neurai #XNA $XNA #DeepSeek #Reasoning #AIchat

NeurAI Project / XNA

17,691 views • 1 year ago

o1-pro is probably the best model i've used for coding, hands down. i gave it a pretty complicated codebase and asked it to refactor while referencing docs. the difference between claude/gemini/o1 and o1 pro is night and day. first time in a while i've been this impressed. full comparison in the video + code.

Sully

255,372 views • 1 year ago

Bash is all you need! Which is why I'm introducing my holiday project: just-bash

just-bash is a pretty complete implementation of bash in TypeScript designed to be used as a bash tool by AI agents. Because it turns out agents love exploring data via shell scripts, even beyond coding. It comes with grep, sed, awk and the 99th percentile features that an agent like Claude Code or Cursor would use. In fact, Claude Code can use it for secure bash execution.

In the package:
- A bash-tool for AI SDK
- A binary for use by yourself or your coding agents
- An overlay filesystem to feed files to your agent securely
- A Vercel Sandbox compatible API, so you can quickly upgrade to a real VM if you need to run binaries
- An example AI agent that explores the just-bash code base using just-bash
- I imported the Oils shell bash compatibility suite and just-bash passes a very good chunk

What is interesting about this codebase: It was essentially entirely written by Opus 4.5. Coding agents love bash and they are good at reproducing it. They are also great at text-book recursive descent parsers and AST tree-walk interpreters. That said, it is, like, a lot of code and I didn't read it all 😅. This is very much a hack, but it also seems to be _really_ useful. I haven't really found anything agents want to use that it doesn't support and it's fast and secure (caveats apply). It doesn't have write access to your computer and the filesystem is given a root that the agent cannot escape from.

Find it at

Related: Our recent blog post on how we migrated our data analysis agent to bash tools and achieved incredible quality improvements

The video shows the example agent investigating the just-bash code base

Malte Ubl

124,713 views • 5 months ago
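
The just-bash post above mentions an overlay filesystem and a root that the agent cannot escape from. just-bash itself is written in TypeScript and the post does not show its internals, so the Python sketch below is not its implementation. It only illustrates the two ideas under simple assumptions: files live in memory, writes go to a separate layer, and every path is normalized against a virtual root. The class and method names are hypothetical.

```python
import posixpath


class OverlayFS:
    """Read-only base files plus an in-memory write layer under a virtual root."""

    def __init__(self, base_files):
        # Base layer: normalized once and never modified afterwards.
        self._base = {self.resolve(p): data for p, data in base_files.items()}
        # Overlay layer: every write lands here instead of in the base.
        self._overlay = {}

    @staticmethod
    def resolve(path):
        # Anchor the path at "/" and normalize it, so ".." segments
        # collapse at the virtual root instead of climbing above it.
        return posixpath.normpath("/" + path.lstrip("/"))

    def read(self, path):
        key = self.resolve(path)
        if key in self._overlay:
            return self._overlay[key]
        if key in self._base:
            return self._base[key]
        raise FileNotFoundError(key)

    def write(self, path, data):
        # The base layer stays untouched; only the overlay changes.
        self._overlay[self.resolve(path)] = data


if __name__ == "__main__":
    fs = OverlayFS({"/repo/README.md": "hello"})
    fs.write("repo/notes.txt", "scratch")
    print(fs.read("/repo/../repo/README.md"))
    print(fs.resolve("../../etc/passwd"))
```

Because nothing here touches the host disk, a path such as `../../etc/passwd` just resolves to `/etc/passwd` inside the virtual tree and raises `FileNotFoundError` unless that file was explicitly provided.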

The same kinds of productivity gains we've seen in coding with AI agents are heading to the rest of knowledge work. This is the jump when you go from having a chatbot to being able to actually have an agent go off and do work for minutes or even hours and come back with a complete work output that you then review. Here's an example of the new Box Agent filling out an RFP response from an existing knowledge base. This process would normally take hours and require the full attention of the user doing the work. Now, you provide the Box Agent with the RFP questions, and it will go off, make a plan, extract all the relevant questions, read through existing source material to come up with an answer, and then generate a new Word document as the final output. All while you're doing something else. The key to this architecture is that the agent is able to use all of the same tools in the background that a user uses to get work done. The agent can search for documents, read entire files, run scripts and tools in the background, and even write code on the fly to automate tasks it hasn't seen before. And best of all, the Box Agent will (soon) work from the Box MCP and CLI so you can invoke it in any agentic system as a step in a process. This kind of agent complexity would have been impossible even 6 months ago. Models consistently failed at tracking long-running tasks or using the right tools at the right moment for the task. But this is all now possible because of models like GPT-5.4, Opus 4.6, and Gemini 3, and is only getting better by the month. Just as we moved from engineers writing code and using AI as an assistant to answer questions, in many areas of knowledge work (like legal, finance, consulting, sales, marketing, and more) when we have a problem we'll just kick off the AI agent to go work on it for us in the background.

Aaron Levie

24,617 views • 2 months ago

This is one of the coolest open-source AI agent projects I've seen in a while: 'Understand Anything'. It's a plugin for Claude Code, Codex, OpenCode etc. that analyzes your codebase and turns it into a knowledge base that you can interact with. It explains the codebase to you, rather than just showing you the structure. It seems like it's designed for code, but I opened my Obsidian vault of podcast highlights in Claude Code, then ran /understand. The result is a searchable knowledge graph of highlights from 888 podcast episodes and 144K lines of markdown text.

Dan McAteer

155,689 views • 3 days ago