elvis's banner
elvis's profile picture

elvis

@omarsar0305,984 subscribers

Building self-improving AI @dair_ai • Prev: Meta AI | PhD • Learn about AI Agents for FREE here: https://t.co/P5SA9u54xO

Shorts

Increasingly, HTML Artifacts are becoming a core part of how I work with AI agents. Long-horizon agent sessions need a better way to surface insights about what work it has done. This may not be obvious right now, but as you start to let your agent work on dynamic workflows, large codebases, long-running loops (e.g., using /goal), and deep research tasks, you need a good way to present results. Chat window is not it. You also don't want to just trust everything the agents do. Artifacts help provide an important verification layer, which in turn enables important decision-making. I like HTML artifacts because I can just ask the agent to produce as many of them (and in whatever form) as I need to verify the work and make sense out of everything. I even built a nice tab system for my artifacts. They are great for continual learning and research. I use HTML artifacts for logging, tracking experiments, brainstorming, managing my inbox, code reviews, agent session management, deep research, writing, reading, and so much more. I believe Andrej Karpathy wrote about this somewhere: As we move on to more advanced applications of AI agents and outputs get more complex, we will start to find the need for even more advanced forms of interactions with AI, including interactive neural videos/simulations.

Increasingly, HTML Artifacts are becoming a core part of how I work with AI agents. Long-horizon agent sessions need a better way to surface insights about what work it has done. This may not be obvious right now, but as you start to let your agent work on dynamic workflows, large codebases, long-running loops (e.g., using /goal), and deep research tasks, you need a good way to present results. Chat window is not it. You also don't want to just trust everything the agents do. Artifacts help provide an important verification layer, which in turn enables important decision-making. I like HTML artifacts because I can just ask the agent to produce as many of them (and in whatever form) as I need to verify the work and make sense out of everything. I even built a nice tab system for my artifacts. They are great for continual learning and research. I use HTML artifacts for logging, tracking experiments, brainstorming, managing my inbox, code reviews, agent session management, deep research, writing, reading, and so much more. I believe Andrej Karpathy wrote about this somewhere: As we move on to more advanced applications of AI agents and outputs get more complex, we will start to find the need for even more advanced forms of interactions with AI, including interactive neural videos/simulations.

36,049 просмотров

Building a personal knowledge base for my agents is increasingly where I spend my time these days. Like Andrej Karpathy, I also use Obsidian for my MD vaults. What's different in my approach is that I curate research papers on a daily basis and have actually tuned a Skill for months to find high-signal, relevant papers. I was reviewing and curating papers manually for some time, but now it's all automated as it has gotten so good at capturing what I consider the best of the best. There are so many papers these days, so this is a big deal. You all get to benefit from that with the papers I feature in my timeline and on DAIR.AI. The papers are indexed using tobi lutke qmd cli tool (all of it in markdown files along with useful metadata). So good for semantic search and surfacing insights, unlike anything out there. I am a visual person, so I then started to experiment with how to leverage this personal knowledge base of research papers inside my new interactive artifact generator (mcp tools inside my agent orchestrator system). The result is what you see in the clip. 100s of papers with all sorts of insights visualized. I keep track of research papers daily, so believe me when I tell you that this system is absolutely insane at surfacing insights. This is the result of months of tinkering on how to index research and leverage agent automations for wikification and robust documentation. But this is just the beginning. The visual artifact (which is interactive too) can be changed dynamically as I please. I can prompt my agent to throw any data at it. I can add different views to the data. Different interactions. I feel like this is the most personalized research system I have ever built and used, and it's not even close. The knowledge that the agents are able to surface from this basic setup is already extremely useful as I experiment with new agentic engineering concepts. I feel like this knowledge layer and the higher-level ones I am working on will allow me to maximize other automation tools like autoresearch. The research is only as good as the research questions. And the research questions are only as good as the insights the agents have access to. Where I am spending time now is on how to make this more actionable. I am obsessed about the search problem here. The automations, autoresearch, ralph research loop (I built one months ago) are easier to build but are only as good as what you feed them. Work in progress. More updates soon. Back to building.

Building a personal knowledge base for my agents is increasingly where I spend my time these days. Like Andrej Karpathy, I also use Obsidian for my MD vaults. What's different in my approach is that I curate research papers on a daily basis and have actually tuned a Skill for months to find high-signal, relevant papers. I was reviewing and curating papers manually for some time, but now it's all automated as it has gotten so good at capturing what I consider the best of the best. There are so many papers these days, so this is a big deal. You all get to benefit from that with the papers I feature in my timeline and on DAIR.AI. The papers are indexed using tobi lutke qmd cli tool (all of it in markdown files along with useful metadata). So good for semantic search and surfacing insights, unlike anything out there. I am a visual person, so I then started to experiment with how to leverage this personal knowledge base of research papers inside my new interactive artifact generator (mcp tools inside my agent orchestrator system). The result is what you see in the clip. 100s of papers with all sorts of insights visualized. I keep track of research papers daily, so believe me when I tell you that this system is absolutely insane at surfacing insights. This is the result of months of tinkering on how to index research and leverage agent automations for wikification and robust documentation. But this is just the beginning. The visual artifact (which is interactive too) can be changed dynamically as I please. I can prompt my agent to throw any data at it. I can add different views to the data. Different interactions. I feel like this is the most personalized research system I have ever built and used, and it's not even close. The knowledge that the agents are able to surface from this basic setup is already extremely useful as I experiment with new agentic engineering concepts. I feel like this knowledge layer and the higher-level ones I am working on will allow me to maximize other automation tools like autoresearch. The research is only as good as the research questions. And the research questions are only as good as the insights the agents have access to. Where I am spending time now is on how to make this more actionable. I am obsessed about the search problem here. The automations, autoresearch, ralph research loop (I built one months ago) are easier to build but are only as good as what you feed them. Work in progress. More updates soon. Back to building.

461,558 просмотров

Just built an insane new agent skill. It can perfectly extract slides from YT videos, then write notes, images, transcripts, and slides into Obsidian vaults. An HTML artifact allows me to navigate and add more notes as I listen. Should I release the skill?

Just built an insane new agent skill. It can perfectly extract slides from YT videos, then write notes, images, transcripts, and slides into Obsidian vaults. An HTML artifact allows me to navigate and add more notes as I listen. Should I release the skill?

43,865 просмотров

Been exploring a new way to explore AI research papers to discover deeper insights. Agents are at the center of it. So far, I've built this little interactive artifact generator in my orchestrator to visualize things. This allows me to change views and insights (on-demand) from 100s of papers. Just scratching the surface here. More to share soon.

Been exploring a new way to explore AI research papers to discover deeper insights. Agents are at the center of it. So far, I've built this little interactive artifact generator in my orchestrator to visualize things. This allows me to change views and insights (on-demand) from 100s of papers. Just scratching the surface here. More to share soon.

134,590 просмотров

I just built my own wiki generator plugin for my agents. My agents can now generate wikis for anything I ask. One of my favorite wikis is called PaperWiki. This is a great example of what Andrej Karpathy describes. It uses obsidian vaults to organize papers, retrieve LLM-generated summaries, diagrams, and other advanced views for paper exploration. When Obsidian UI is not enough, I use my own artifact generator inside my agent orchestrator (see clip for example). This allows my agents to build any kind of view or exploration feature that I need. The papers are all curated with automations and several rules/patterns I have manually built over the years. On the surface, this looks basic. But behind the scenes, there are advanced search capabilities, connections, metadata, derived data, and other interesting bits of information that are extremely useful for my research agents. This is mostly built for agents. The artifact preview is just a high-level way to validate and quickly assess the quality of the wiki, suggest improvements, and it's also great for research. I use tobi lutke's qmd for all search capabilities. Everything is markdown. The summaries and even the diagrams. The wiki updates on its own based on several automations I have optimized over the past couple of weeks. The wiki grows and self-improves based on several requirements important for my research use cases. This is as personalized as it gets. There is nothing like it out there. And I use my research expertise to continue improving it over time. This is a vanilla wiki. There are so many things I want to build on top of this. Different aggregations, views, artifacts, etc. All to help automate more of my research work and accelerate productivity. I think the biggest leverage here is how powerful this could be for discovery and experimentation. One of my goals is to use it to find deeper connections and insights that would otherwise elude the top human researchers and use those to generate interesting new hypotheses and research experiments. That way, my agents can use autoresearch to explore research ideas at the frontier. Stay tuned for more.

I just built my own wiki generator plugin for my agents. My agents can now generate wikis for anything I ask. One of my favorite wikis is called PaperWiki. This is a great example of what Andrej Karpathy describes. It uses obsidian vaults to organize papers, retrieve LLM-generated summaries, diagrams, and other advanced views for paper exploration. When Obsidian UI is not enough, I use my own artifact generator inside my agent orchestrator (see clip for example). This allows my agents to build any kind of view or exploration feature that I need. The papers are all curated with automations and several rules/patterns I have manually built over the years. On the surface, this looks basic. But behind the scenes, there are advanced search capabilities, connections, metadata, derived data, and other interesting bits of information that are extremely useful for my research agents. This is mostly built for agents. The artifact preview is just a high-level way to validate and quickly assess the quality of the wiki, suggest improvements, and it's also great for research. I use tobi lutke's qmd for all search capabilities. Everything is markdown. The summaries and even the diagrams. The wiki updates on its own based on several automations I have optimized over the past couple of weeks. The wiki grows and self-improves based on several requirements important for my research use cases. This is as personalized as it gets. There is nothing like it out there. And I use my research expertise to continue improving it over time. This is a vanilla wiki. There are so many things I want to build on top of this. Different aggregations, views, artifacts, etc. All to help automate more of my research work and accelerate productivity. I think the biggest leverage here is how powerful this could be for discovery and experimentation. One of my goals is to use it to find deeper connections and insights that would otherwise elude the top human researchers and use those to generate interesting new hypotheses and research experiments. That way, my agents can use autoresearch to explore research ideas at the frontier. Stay tuned for more.

64,932 просмотров

We are entering an extremely exciting era for open-weight models. Kimi K2.6 now feels like a top agentic model. I took it for a spin via Fireworks AI fast inference APIs. Kimi K2.6 has impressive agentic capabilities, design skills, and the ability to synthesize large amounts of information. I built a little Skill that produces survey papers on any AI research topic you want. (see example in the clip) You can use the skill to tell your agent to generate a survey on whatever topic and watch it go to work. The artifact was fully generated by Kimi.ai's Kimi K2.6. It's cheap and fast. Next step for me is to explore ways to continue integrating the capabilities of these models on use cases like automating my LLM knowledge bases and augmenting my agent memory capabilities. Stay tuned for more.

We are entering an extremely exciting era for open-weight models. Kimi K2.6 now feels like a top agentic model. I took it for a spin via Fireworks AI fast inference APIs. Kimi K2.6 has impressive agentic capabilities, design skills, and the ability to synthesize large amounts of information. I built a little Skill that produces survey papers on any AI research topic you want. (see example in the clip) You can use the skill to tell your agent to generate a survey on whatever topic and watch it go to work. The artifact was fully generated by Kimi.ai's Kimi K2.6. It's cheap and fast. Next step for me is to explore ways to continue integrating the capabilities of these models on use cases like automating my LLM knowledge bases and augmenting my agent memory capabilities. Stay tuned for more.

47,678 просмотров

This is insane! 🤯 Just built a new skill in Claude Code using Opus 4.5. The skill uses Gemini 3 Pro (via API) for designing web pages. Look at what it generated from one simple prompt.

This is insane! 🤯 Just built a new skill in Claude Code using Opus 4.5. The skill uses Gemini 3 Pro (via API) for designing web pages. Look at what it generated from one simple prompt.

152,823 просмотров

LLM Knowledge Base → Slides When Andrej Karpathy shared his LLM Knowledge Base setup, many were wondering how to generate more visual forms of the wiki. There are many options, but I think Gamma is one of the best at producing high-quality, rich presentations. To showcase this, I just built a pipeline that turns my AI papers wiki (1K+ papers across 20 AI agent topics) into polished slide presentations using Gamma. The flow: Obsidian vault → Gamma MCP → embedded preview in my dashboard. I give one command to my agent, which pulls the top papers from each topic (via the wiki), feeds them to Gamma, and renders the presentation inline. The Gamma connector for Claude is a great choice for generating beautiful and professional slides. Easy to use. Go to your Claude instance and add the official Gamma connector. That's it! Claude Code will now have access to all the necessary MCP tools for generating slides. I use the Claude Agent SDK for my agent orchestrator, so I use the official Gamma MCP tools and embed the generated slides in an iframe via my artifact preview. See the clip below for an example.

LLM Knowledge Base → Slides When Andrej Karpathy shared his LLM Knowledge Base setup, many were wondering how to generate more visual forms of the wiki. There are many options, but I think Gamma is one of the best at producing high-quality, rich presentations. To showcase this, I just built a pipeline that turns my AI papers wiki (1K+ papers across 20 AI agent topics) into polished slide presentations using Gamma. The flow: Obsidian vault → Gamma MCP → embedded preview in my dashboard. I give one command to my agent, which pulls the top papers from each topic (via the wiki), feeds them to Gamma, and renders the presentation inline. The Gamma connector for Claude is a great choice for generating beautiful and professional slides. Easy to use. Go to your Claude instance and add the official Gamma connector. That's it! Claude Code will now have access to all the necessary MCP tools for generating slides. I use the Claude Agent SDK for my agent orchestrator, so I use the official Gamma MCP tools and embed the generated slides in an iframe via my artifact preview. See the clip below for an example.

46,687 просмотров

As an ML Engineer, this is one of the most useful applications of GPT-4 I've seen. Chat Explore is a powerful AI-powered data exploration tool. Here’s why I am so impressed:

As an ML Engineer, this is one of the most useful applications of GPT-4 I've seen. Chat Explore is a powerful AI-powered data exploration tool. Here’s why I am so impressed:

716,614 просмотров

GPT-5.5 in Codex is a delight to work with: - Super sharp with responses - It understands intent better than any model - Great "personality" - Gets lots of stuff done without pausing unnecessarily It generated this beautiful artifact design. Huge win for OpenAI.

GPT-5.5 in Codex is a delight to work with: - Super sharp with responses - It understands intent better than any model - Great "personality" - Gets lots of stuff done without pausing unnecessarily It generated this beautiful artifact design. Huge win for OpenAI.

36,795 просмотров

LLM Artifacts Connected to Andrej Karpathy's LLM Knowledge base idea, I've been building out a fun way to generate dynamic artifacts from these knowledge bases with the goal of discovering and revealing meaningful and deeper insights. LLM KBs are hard to consume for humans, as I think they are more built for agents. So the question is, what form would be useful for humans to take actions and make important decisions? That's what I am trying to figure out with these artifacts. The artifact example shows a pulse on HN discussions around AI-related stories. The insights can go deeper, of course, but this is already super fun and thought-provoking, like some of my favorite podcasts. The format and depth matter a lot. The aggregation skills of agents are outstanding if you tune the prompts and skill carefully. I built this artifact generator in a few minutes through an agent skill, but I feel like there are so many ways that LLM-generated information can be used and consumed. Like generating deeper insights and analysis, and things that are just not feasible for humans today. The generated artifact (including its data and design) serves as reusable templates or can be updated in real-time via auomations, which is something I am also working on. It is truly an insane way to monitor and track information. Better than a newsletter. Better than newspapers. There is something about this that gets me really excited about the future of AI agents for knowledge generation and discovery. Lots of hidden gems everywhere just waiting to be discovered and acted on if the information is presented correctly. This is not perfect. The format, style/prose can be improved, but this is easy to customize via skill. You can personalize it to your liking. I feel like these dynamic artifacts are going to emerge as a strong new medium to stay on the cutting edge of things, both for agents and humans. My target is research, of course. This was just a basic example. Besides animation, I am also targeting other components like voice, videos, images, slides, etc. This space is full of opportunities to explore. Skill for this coming soon.

LLM Artifacts Connected to Andrej Karpathy's LLM Knowledge base idea, I've been building out a fun way to generate dynamic artifacts from these knowledge bases with the goal of discovering and revealing meaningful and deeper insights. LLM KBs are hard to consume for humans, as I think they are more built for agents. So the question is, what form would be useful for humans to take actions and make important decisions? That's what I am trying to figure out with these artifacts. The artifact example shows a pulse on HN discussions around AI-related stories. The insights can go deeper, of course, but this is already super fun and thought-provoking, like some of my favorite podcasts. The format and depth matter a lot. The aggregation skills of agents are outstanding if you tune the prompts and skill carefully. I built this artifact generator in a few minutes through an agent skill, but I feel like there are so many ways that LLM-generated information can be used and consumed. Like generating deeper insights and analysis, and things that are just not feasible for humans today. The generated artifact (including its data and design) serves as reusable templates or can be updated in real-time via auomations, which is something I am also working on. It is truly an insane way to monitor and track information. Better than a newsletter. Better than newspapers. There is something about this that gets me really excited about the future of AI agents for knowledge generation and discovery. Lots of hidden gems everywhere just waiting to be discovered and acted on if the information is presented correctly. This is not perfect. The format, style/prose can be improved, but this is easy to customize via skill. You can personalize it to your liking. I feel like these dynamic artifacts are going to emerge as a strong new medium to stay on the cutting edge of things, both for agents and humans. My target is research, of course. This was just a basic example. Besides animation, I am also targeting other components like voice, videos, images, slides, etc. This space is full of opportunities to explore. Skill for this coming soon.

30,714 просмотров

Hacker News → LLM Artifact I built the most personalized HN feed. It only tracks topics I do research around based on memory and LLM wiki. No point in storing bookmarks. With a few automations, rules, skills, and proactive agents, you can make the feed whatever you want.

Hacker News → LLM Artifact I built the most personalized HN feed. It only tracks topics I do research around based on memory and LLM wiki. No point in storing bookmarks. With a few automations, rules, skills, and proactive agents, you can make the feed whatever you want.

17,613 просмотров

Simplicity is at the heart of great software. This is one of the reasons why Claude Code has been sticky for me. As a builder, I love planning and brainstorming, and this is now a key focus of Claude Code. I use Shift + Tab a lot to cycle between brainstorming, planning, and execution. This functionality provides the appropriate interface for me to either be very involved or less involved as I please. This works particularly well when building out new and complex features or entire new projects. This saves a huge amount of time. It allows me to tune Claude Code to execute and build more effectively. It also builds a loop of trust, and I often (surprisingly) find Claude Code asking for clarifications when it's confused. Coding agents don't normally do that. I have shared before on the power of brainstorming with AI for longer times. Try it and you will not be disappointed. Vibe coding is fun, but pair it with intentional development cycles, and you watch how far you can take a project with coding agents today.

Simplicity is at the heart of great software. This is one of the reasons why Claude Code has been sticky for me. As a builder, I love planning and brainstorming, and this is now a key focus of Claude Code. I use Shift + Tab a lot to cycle between brainstorming, planning, and execution. This functionality provides the appropriate interface for me to either be very involved or less involved as I please. This works particularly well when building out new and complex features or entire new projects. This saves a huge amount of time. It allows me to tune Claude Code to execute and build more effectively. It also builds a loop of trust, and I often (surprisingly) find Claude Code asking for clarifications when it's confused. Coding agents don't normally do that. I have shared before on the power of brainstorming with AI for longer times. Try it and you will not be disappointed. Vibe coding is fun, but pair it with intentional development cycles, and you watch how far you can take a project with coding agents today.

81,765 просмотров

one of my favorite ways to use claude code skills right now - combining remotion with claude-in-chrome for motion video creation. the workflow is addictive. the clip you see here was produced with minimal prompting effort let me know if you would like me to write a full break down of this process. you describe to claude code what you want, claude code writes the remotion components, opens the remotion studio (via browser) with claude-in-chrome, sees the actual rendered output, and iterates on it in real time. need the arrows pointing to the center of the bubbles instead of the edge? just say it. need the layout shifted to the center? say it. claude sees the preview, adjusts the code, re-renders. then when you're happy you tell claude code to render the final video. but skills are what make this possible. remotion knowledge + browser automation + the taste to iterate visually. no copy pasting screenshots back and forth. no "can you try moving it 10px to the right" over chat. it just looks and fixes. this is the kind of workflow that makes you realize how much further claude code can go beyond just writing code in a terminal. i didn't touch any code while working on the clip you see if you haven't tried combining skills together like this - start experimenting. the skills combos is where the magic is at.

one of my favorite ways to use claude code skills right now - combining remotion with claude-in-chrome for motion video creation. the workflow is addictive. the clip you see here was produced with minimal prompting effort let me know if you would like me to write a full break down of this process. you describe to claude code what you want, claude code writes the remotion components, opens the remotion studio (via browser) with claude-in-chrome, sees the actual rendered output, and iterates on it in real time. need the arrows pointing to the center of the bubbles instead of the edge? just say it. need the layout shifted to the center? say it. claude sees the preview, adjusts the code, re-renders. then when you're happy you tell claude code to render the final video. but skills are what make this possible. remotion knowledge + browser automation + the taste to iterate visually. no copy pasting screenshots back and forth. no "can you try moving it 10px to the right" over chat. it just looks and fixes. this is the kind of workflow that makes you realize how much further claude code can go beyond just writing code in a terminal. i didn't touch any code while working on the clip you see if you haven't tried combining skills together like this - start experimenting. the skills combos is where the magic is at.

46,382 просмотров

Just incredible that this is possible today. One of my favorite MCP tools as of late. Just prompt to generate beautiful excalidraw diagrams.

Just incredible that this is possible today. One of my favorite MCP tools as of late. Just prompt to generate beautiful excalidraw diagrams.

37,264 просмотров

This is one of the fastest ways to build a custom ChatGPT-like system on top of your data. It's called ChatLLM (by Abacus.AI). Here is a demo of how to build a simple custom chat LLM:

This is one of the fastest ways to build a custom ChatGPT-like system on top of your data. It's called ChatLLM (by Abacus.AI). Here is a demo of how to build a simple custom chat LLM:

227,166 просмотров

MiniMax M2.1 just dropped. Been using it for the last couple of hours. It's crazy good! I built a deep research agent and used M2.1 for orchestration. Agentic capabilities feel unmatched. Reports generated are next level.

MiniMax M2.1 just dropped. Been using it for the last couple of hours. It's crazy good! I built a deep research agent and used M2.1 for orchestration. Agentic capabilities feel unmatched. Reports generated are next level.

48,395 просмотров

Agentic web scrapers are here! Firecrawl just launched FIRE-1, their new agent-powered web-scraper. This is really dope! It navigates complex websites, interacts with dynamic content, and fills forms to scrape the data you need.

Agentic web scrapers are here! Firecrawl just launched FIRE-1, their new agent-powered web-scraper. This is really dope! It navigates complex websites, interacts with dynamic content, and fills forms to scrape the data you need.

80,087 просмотров

MiniMax-M2 is a bigger deal than I thought! Just built a deep research agent with M2 - the interleaved thinking hits different! It preserves content blocks (thinking + text + tool_use) to reason between tool calls. Huge for self-improving agents. Details + repo below ↓

MiniMax-M2 is a bigger deal than I thought! Just built a deep research agent with M2 - the interleaved thinking hits different! It preserves content blocks (thinking + text + tool_use) to reason between tool calls. Huge for self-improving agents. Details + repo below ↓

32,089 просмотров

JUST IN: Meta AI introduces Voicebox, an all-in-one generative speech model. Voicebox is an impressive breakthrough! It could do for speech what other models like GPT-3 and Stable Diffusion have done for text and images. Some key details: - Voicebox can synthesize speech across 6 languages - It's a general-purpose model that can perform tasks it wasn't trained on. It can perform noise removal, content editing, style conversion, and more - Supports in-context text-to-speech synthesis and cross-lingual style transfer - It's 20x faster than current models and outperforms single-purpose models through in-context learning paper: blog:

JUST IN: Meta AI introduces Voicebox, an all-in-one generative speech model. Voicebox is an impressive breakthrough! It could do for speech what other models like GPT-3 and Stable Diffusion have done for text and images. Some key details: - Voicebox can synthesize speech across 6 languages - It's a general-purpose model that can perform tasks it wasn't trained on. It can perform noise removal, content editing, style conversion, and more - Supports in-context text-to-speech synthesis and cross-lingual style transfer - It's 20x faster than current models and outperforms single-purpose models through in-context learning paper: blog:

88,495 просмотров

Videos

omarsar0's profile picture

I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mindblown by how well it works out of the box. A few notes: I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on Fireworks AI inference. This is the first time I feel like there is an open-weight model that can reason at the level of Claude and Codex. And it does this in a cost-effective way with support for 1M context length. To be clear, I am using DeepSeek-V4-Pro inside of Pi without any special configuration. It works out of the box. It's exciting that there is a model that can just be plugged into a basic harness like Pi, and it just works. I've never seen that before. Most models require lots of configuration and setup. DeepSeek's DeepSeek-V4-Pro is clearly good at agentic coding (probably the best from the open-weight models), but the model is also great on knowledge-intensive tasks where reasoning matters. The agent pulled agentic engineering best practices from different company docs (Anthropic, OpenAI, Google, Stripe, Meta, Modal, DeepSeek, Mistral, Cohere), searched and digested Reddit and HN threads, summarized arxiv papers, and surfaced trending GitHub repos. Then it distilled everything into actionable tips across categories. I love the Wiki it built. The quality is really good. Here is a snapshot of what the wiki looks like: DeepSeek-V4-Pro handled the task without breaking stride. Multi-step research queries, code generation for scaffolding, context-heavy reasoning across disparate sources. For coding specifically, this is the first open-weight model that genuinely feels like a Codex or Claude Code experience. It compares in capability and actual multi-turn agentic work. What made the loop feel so responsive was Fireworks' inference speed (the fastest in the market) and the fact that they actually validate models at the systems level before shipping. No corrupted reasoning traces. Just fast, reliable iteration. The hybrid CSA and HCA attention design cuts KV cache to just 10% and inference FLOPs by nearly 4x at 1M-token context. This is what makes the agent loop actually fast and cheap enough to run in practice. For devs who've been watching open-weight models close the gap but haven't found one that actually delivers in practice, this is the closest I've seen. Try it here:

elvis

55,948 просмотров • 1 месяц назад