I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mind-blown by how well it works out of the box. A few notes: I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on Fireworks AI inference. This is the first...

58,555 views • 1 month ago • via X (Twitter)

Related Videos

I'm running Llama 4 Maverick at 620 t/s! I'm living in the future! Honestly, a large language model running this fast is something straight out of a sci-fi movie. Speeds like this will enable a whole new world of applications that aren't possible today. For reference, GPT-4o, which is probably the most popular OpenAI model, runs between 60 and 110 t/s.
The secret here: I'm not running Meta's Llama 4 Maverick on a GPU. I'm using SambaNova Cloud (my sponsor) and its custom SN40L chips, which are optimized from the ground up for running AI workflows. Right now, SambaNova Cloud runs DeepSeek, Qwen, Whisper, and the entire family of Llama models on these chips.
You can check the speed of each of these models using SambaNova Cloud's Playground (see the attached video). It's completely free, and that's how I'm measuring their speeds. For example, I also tried DeepSeek R1 (the latest version from May) and, oh boy! DeepSeek R1 is a huge 671B-parameter model. It's probably the best open reasoning model in the world, and it runs at 140 tokens per second! Inference time on an SN40L is night and day compared with what you'll get from a GPU.
Here is why this is big: if you are running an agentic workflow that uses multiple models simultaneously on a GPU, it will need to swap models in and out of memory (because not every model fits). A single SN40L chip can simultaneously hold over 100 models (trillions of parameters) in memory.
If you are using open models, try the SambaCloud API to see what lightning speed looks like. Here is how:
1. Create a free account at:
2. Check the QuickStart guide:
If you try the playground, check the speed you're getting with Llama 4 and DeepSeek, and post the results below. I've seen much higher numbers than I posted here, so I'm curious to see whether geography affects the speed.
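To make the tokens-per-second figures above more concrete, here is a minimal sketch that converts a decode speed into wall-clock time for one response. It assumes a constant decode rate and ignores time-to-first-token, network latency, and prompt processing; the helper name `generation_seconds` and the 2,000-token response length are hypothetical choices for illustration, and the speeds are simply the ones quoted in the post.

```python
# Hypothetical helper: turns a constant decode speed (tokens/second) into the
# wall-clock time needed to stream a response. Ignores time-to-first-token,
# network latency, and prompt processing.


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream `output_tokens` at a constant decode speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Speeds quoted in the post (tokens per second); not independently measured.
SPEEDS = {
    "Llama 4 Maverick on SN40L": 620,
    "DeepSeek R1 on SN40L": 140,
    "GPT-4o (upper end)": 110,
    "GPT-4o (lower end)": 60,
}

if __name__ == "__main__":
    n_tokens = 2_000  # hypothetical response length, e.g. one long agent step
    for name, tps in SPEEDS.items():
        print(f"{name:28s} {generation_seconds(n_tokens, tps):6.1f} s")
```

Under these assumptions a 2,000-token response takes about 3.2 s at 620 t/s versus about 33.3 s at 60 t/s. An agent that chains ten such calls sequentially multiplies that gap by ten, which is why speed matters more for agentic workflows than for single chat turns.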

Santiago

34,148 views • 1 year ago

🔥 Battle for the top reasoning LLM intensifies! The QwQ-32B-Preview is a very good reasoning LLM. Full video of my tests here: Summary of my findings and thoughts: It was able to solve a couple of hard math problems, so it looks very promising for maths. It didn’t do so well on my coding task (generating a bash script). Judging by the results reported on LiveCodeBench, it has room for improvement. One thing that’s become very clear to me is that the reasoning capabilities of these LLMs are significantly closing the gap between open and closed-source models. The competition is now going to be on a different level, and it's going to be focused on which model produces the most efficient, optimized, accurate, and fastest reasoning steps, beyond just accurate responses. That's what developers will care about. Traditional benchmarks are not going to be good enough for this. On that note, it's getting harder to assess these models, especially the consistency, efficiency, and quality of their reasoning steps. After experimenting with this model, I realized that the reasoning paths are not fully optimized and there is a lot more optimization that needs to happen before these models are used in production settings. There might be a need to build some type of native and efficient self-assessment or self-reflection capability that prevents these reasoning LLMs from going in loops or producing unnecessarily lengthy sequences. I also noticed that this model, at least from the HF demo, doesn’t separate the reasoning from the response. I think that actually hurts the performance of the model. On the other hand, o1 and R1 do that really well. In addition to that, I believe training on reasoning is hurting the performance of the LLM in other areas such as helpfulness (check the code example in the video). Something that’s necessary at the moment is validating or evaluating the quality of the reasoning chains and figuring out a better strategy to optimize them. 
Current methods are probably not sufficient to solve this problem, but that's where innovation will come next. I recognize that this is a first effort, so kudos to the Qwen team on this release. These issues highlight the importance of transparency with reasoning LLMs. We need to know how it was trained and with what exact data or optimization strategy. Understanding that will enable researchers and developers to build better intuition and improve reasoning capabilities and components at a faster rate. There is an opportunity for someone or a company to build a truly open reasoning LLM. The race is on! I will continue to track the state of the art in reasoning LLMs and report my takes and observations here. Stay tuned for more.
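The post raises two practical points: keeping the reasoning separate from the response, and catching reasoning that goes in loops. The sketch below shows what a caller can do externally, assuming the model wraps its chain of thought in `<think>...</think>` tags (the convention DeepSeek-R1's open-weight release uses; the QwQ demo may not follow it). The function names and the n-gram thresholds are hypothetical, and the repetition check is only a crude outside stand-in, not the native self-assessment capability the post argues for.

```python
import re
from collections import Counter

# Assumption: the model emits its chain of thought inside <think>...</think>
# before the final answer. Output without the tags is treated entirely as the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Hypothetical helper: return (reasoning, answer) for a raw completion."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is the user-facing answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


def looks_like_loop(reasoning: str, n: int = 8, max_repeats: int = 3) -> bool:
    """Hypothetical, naive loop check: True if any run of `n` consecutive words
    occurs more than `max_repeats` times. Thresholds are arbitrary examples."""
    words = reasoning.split()
    ngrams = Counter(tuple(words[i : i + n]) for i in range(len(words) - n + 1))
    return any(count > max_repeats for count in ngrams.values())


if __name__ == "__main__":
    raw = "<think>Let x = 2. Then x + x = 4.</think>The answer is 4."
    print(split_reasoning(raw))
    print(looks_like_loop("wait, let me check again. " * 10, n=4))
```

Splitting on tags keeps the answer clean for display or scoring, while the reasoning can be logged and evaluated on its own, which is one small step toward the "evaluating the quality of the reasoning chains" the post calls for.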

elvis

14,740 views • 1 year ago

China just made Silicon Valley's entire AI industry look like a scam. The US government spent 3 years trying to stop China from building competitive AI. But this backfired HORRIBLY. Here's what happened: Yesterday, a Chinese startup called DeepSeek released a new AI model called V4. It matches the performance of OpenAI and Anthropic's best models. At 1/7th the price. And for the first time ever, it was built on Chinese chips. NOT American ones. That last part is the one that terrifies the West. For context: Since 2022, the US has banned the export of advanced AI chips to China. The entire strategy was built on the assumption that if China can't access Nvidia's best hardware, they can't build frontier AI. But DeepSeek just proved that assumption wrong. Their V4 model was trained and runs on Huawei's Ascend chips. Huawei spent months working directly with DeepSeek to make sure V4 runs across their entire line of AI processors. Jensen Huang even predicted this on a recent podcast: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation." That day was yesterday. And the numbers are crazy: DeepSeek V4 costs $3.48 per million output tokens. OpenAI's latest model GPT-5.5 costs $30. Anthropic's Claude charges $25. Same ballpark performance. 7x cheaper. Uber's CTO just admitted they burned through their ENTIRE 2026 AI budget in 4 months using Anthropic's tools. If Uber had used DeepSeek instead, that same budget would have lasted roughly 28 months, more than 2 YEARS. 4 months vs more than 2 years. Same work getting done. But the pricing isn't even the big thing here. The real story is what DeepSeek did with their technical report: They published the benchmarks where they LOSE. Every AI company cherry-picks the tests where their model wins. DeepSeek ran the full comparison against GPT-5.4 and Google's Gemini, found they trail frontier models by 3 to 6 months, and printed it anyway. 
They literally don't care, because the price gap makes the performance gap irrelevant for 90% of use cases. So the US export controls didn't slow China down. They ACCELERATED China's independence. Because Chinese developers were FORCED to train models with limited resources, they had to figure out how to make AI radically more efficient. That constraint became their competitive advantage. Every generation of DeepSeek has gotten dramatically cheaper to train. V4 continues the trend. Meanwhile, US companies are going in the OPPOSITE direction: OpenAI's GPT-5.5 Pro costs $180 per million output tokens. That's 51x more expensive than DeepSeek V4 for comparable work. The Commerce Secretary confirmed this week that ZERO Nvidia advanced chip shipments have actually gone through to China despite being approved in January. So China built frontier AI anyway. Without American chips. At a fraction of the cost. And the market response tells you everything: Chinese chipmaker SMIC surged 10%. Huahong Semiconductor jumped 15%. DeepSeek's Chinese AI competitors Zhipu AI and MiniMax dropped 9% because V4 is destroying them too. DeepSeek is making Silicon Valley's pricing model look like a scam. US tech companies spent $650 billion on AI infrastructure this year. DeepSeek just showed the world you can match their output for pennies. The export controls were supposed to be America's trump card. Instead, they taught China how to win without American chips, at prices American companies can't compete with. Jensen Huang was right. This is a horrible outcome. But it's the outcome America built for itself.
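The price comparisons in this post are plain ratios of output-token prices. The sketch below reproduces that arithmetic using only the post's figures (not independently verified) and assumes the same number of output tokens is billed under each provider; input-token prices, caching discounts, and differences in how many tokens each model actually produces are ignored. `price_ratio` and `budget_months` are hypothetical helper names.

```python
# Output-token prices quoted in the post, in USD per million output tokens.
# Assumption: identical output-token volume across providers; input pricing,
# caching discounts, and per-model verbosity are ignored.
PRICES_PER_M_OUTPUT = {
    "DeepSeek V4": 3.48,
    "GPT-5.5": 30.0,
    "Claude": 25.0,
    "GPT-5.5 Pro": 180.0,
}


def price_ratio(model: str, baseline: str = "DeepSeek V4") -> float:
    """Hypothetical helper: how many times pricier `model` is than `baseline`."""
    return PRICES_PER_M_OUTPUT[model] / PRICES_PER_M_OUTPUT[baseline]


def budget_months(months_spent: float, expensive: str, cheap: str = "DeepSeek V4") -> float:
    """Hypothetical helper: months a budget would last if the same token volume
    were billed at the cheaper model's output price."""
    return months_spent * price_ratio(expensive, cheap)


if __name__ == "__main__":
    for name in ("GPT-5.5", "Claude", "GPT-5.5 Pro"):
        print(f"{name}: {price_ratio(name):.1f}x DeepSeek V4")
    print(f"4 months on Claude -> {budget_months(4, 'Claude'):.1f} months on DeepSeek V4")
```

With these figures, Claude's output price is about 7.2x DeepSeek V4's, GPT-5.5's about 8.6x, and GPT-5.5 Pro's about 51.7x. A budget that lasted 4 months at Claude's rate would therefore stretch to roughly 28.7 months (a little under 2.5 years) at DeepSeek V4's, under the simplifying assumptions above.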

Ricardo

279,586 views • 2 months ago

Bash is all you need! Which is why I'm introducing my holiday project: just-bash
just-bash is a pretty complete implementation of bash in TypeScript, designed to be used as a bash tool by AI agents, because it turns out agents love exploring data via shell scripts, even beyond coding. It comes with grep, sed, awk, and the 99th-percentile features that an agent like Claude Code or Cursor would use. In fact, Claude Code can use it for secure bash execution.
In the package:
- A bash-tool for AI SDK
- A binary for use by yourself or your coding agents
- An overlay filesystem to feed files to your agent securely
- A Vercel Sandbox-compatible API, so you can quickly upgrade to a real VM if you need to run binaries
- An example AI agent that explores the just-bash code base using just-bash
- I imported the Oils shell bash compatibility suite, and just-bash passes a very good chunk of it
What is interesting about this codebase: it was essentially entirely written by Opus 4.5. Coding agents love bash, and they are good at reproducing it. They are also great at textbook recursive descent parsers and AST tree-walk interpreters. That said, it is, like, a lot of code and I didn't read it all 😅.
This is very much a hack, but it also seems to be _really_ useful. I haven't really found anything agents want to use that it doesn't support, and it's fast and secure (caveats apply). It doesn't have write access to your computer, and the filesystem is given a root that the agent cannot escape from.
Find it at
Related: Our recent blog post on how we migrated our data analysis agent to bash tools and achieved incredible quality improvements
The video shows the example agent investigating the just-bash code base
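The post says just-bash ships an overlay filesystem and gives the agent a root it cannot escape, but it does not describe how. The sketch below illustrates the general idea in Python under stated assumptions. Reads fall through to a read-only snapshot of the files handed to the agent. Writes land only in an in-memory upper layer. Every path is normalized inside a virtual "/" so `..` cannot climb above it. The class `OverlayFS` and its methods are hypothetical and are not just-bash's TypeScript API.

```python
import posixpath


class OverlayFS:
    """Hypothetical sketch of an in-memory overlay filesystem for an agent's shell tool.

    - Lower layer: read-only snapshot of files handed to the agent.
    - Upper layer: where every write lands; the lower layer is never modified.
    - All paths live under a virtual "/" that '..' cannot climb above.
    """

    def __init__(self, files: dict[str, str]):
        self._lower = {self.resolve(p): data for p, data in files.items()}
        self._upper: dict[str, str] = {}

    @staticmethod
    def resolve(path: str, cwd: str = "/") -> str:
        # Join onto the cwd, then normalize. For absolute POSIX paths, normpath
        # drops '..' components that would go above "/", so
        # "/data/../../etc/passwd" collapses to "/etc/passwd" inside the sandbox.
        full = posixpath.normpath(posixpath.join("/", cwd, path))
        # normpath preserves a leading "//"; fold it to a single "/".
        return "/" + full.lstrip("/")

    def read(self, path: str, cwd: str = "/") -> str:
        key = self.resolve(path, cwd)
        if key in self._upper:  # the agent's own writes shadow the snapshot
            return self._upper[key]
        if key in self._lower:
            return self._lower[key]
        raise FileNotFoundError(key)

    def write(self, path: str, data: str, cwd: str = "/") -> None:
        # Copy-on-write: only the upper layer changes; nothing reaches the host.
        self._upper[self.resolve(path, cwd)] = data


if __name__ == "__main__":
    fs = OverlayFS({"/data/report.csv": "a,b\n1,2\n"})
    print(fs.read("report.csv", cwd="/data"))
    print(OverlayFS.resolve("../../etc/passwd", cwd="/data"))
```

Because the sketch never touches the host filesystem at all, an "escaped" path like `/etc/passwd` only names a virtual file that does not exist in the overlay. Resolving every path to a canonical form before any lookup is the design choice that makes the root check a single place to get right.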

Malte Ubl

124,713 views • 5 months ago

The same kinds of productivity gains we've seen in coding with AI agents are heading to the rest of knowledge work. This is the jump you make when you go from having a chatbot to actually having an agent go off and do work for minutes or even hours and come back with a complete work output that you then review. Here's an example of the new Box Agent filling out an RFP response from an existing knowledge base. This process would normally take hours and require the full attention of the user doing the work. Now, you provide the Box Agent with the RFP questions, and it will go off, make a plan, extract all the relevant questions, read through existing source material to come up with answers, and then generate a new Word document as the final output. All while you're doing something else. The key to this architecture is that the agent is able to use all of the same tools in the background that a user uses to get work done. The agent can search for documents, read entire files, run scripts and tools in the background, and even write code on the fly to automate tasks it hasn't seen before. And best of all, the Box Agent will (soon) work from the Box MCP and CLI, so you can invoke it in any agentic system as a step in a process. This kind of agent complexity would have been impossible even 6 months ago. Models consistently failed at tracking long-running tasks or using the right tools at the right moment. But this is all now possible because of models like GPT-5.4, Opus 4.6, and Gemini 3, and it is only getting better by the month. Just as engineers moved beyond writing code and using AI as an assistant to answer questions, in many areas of knowledge work (legal, finance, consulting, sales, marketing, and more), when we have a problem we'll just kick off an AI agent to go work on it for us in the background.
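The post describes the Box Agent's workflow only at a high level: make a plan, extract the questions, research existing material, and draft the output. The toy sketch below makes that control flow concrete without any LLM. Each step is a plain function, keyword-overlap retrieval stands in for the agent's document search, and the output is plain text rather than a Word document. All function names and the sample data are hypothetical and say nothing about how Box actually implements its agent.

```python
import re
from typing import Optional

# Hypothetical, LLM-free sketch of an extract -> research -> draft loop.
# In a real agent each step would be a model call or a tool call.


def extract_questions(rfp_text: str) -> list[str]:
    """Step 1: pull out every line that reads like a question."""
    return [line.strip() for line in rfp_text.splitlines() if line.strip().endswith("?")]


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_knowledge_base(question: str, kb: dict[str, str]) -> Optional[tuple[str, str]]:
    """Step 2: naive keyword-overlap retrieval standing in for a real search tool."""
    q = _words(question)
    best = max(kb.items(), key=lambda item: len(q & _words(item[1])), default=None)
    if best is None or not q & _words(best[1]):
        return None  # nothing in the knowledge base shares a word with the question
    return best


def draft_rfp_response(rfp_text: str, kb: dict[str, str]) -> str:
    """Step 3: answer each question from its best source and flag gaps for review."""
    sections = []
    for question in extract_questions(rfp_text):
        hit = search_knowledge_base(question, kb)
        if hit is None:
            sections.append(f"Q: {question}\nA: [NEEDS REVIEW: no source found]")
        else:
            source, text = hit
            sections.append(f"Q: {question}\nA: {text} (source: {source})")
    return "\n\n".join(sections)


if __name__ == "__main__":
    kb = {
        "security.md": "All customer data is encrypted at rest and in transit.",
        "support.md": "Support is available 24/7 by phone and email.",
    }
    rfp = "Section 3\nIs customer data encrypted at rest?\nWhat are your support hours?\nDo you offer on-prem hosting?"
    print(draft_rfp_response(rfp, kb))
```

Flagging unanswered questions instead of guessing mirrors the review step the post describes: the agent produces a complete draft, and the human checks it, starting with the gaps.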

Aaron Levie

24,617 views • 2 months ago