elvis

@omarsar0 • 311,096 subscribers

Building self-improving AI @dair_ai • Prev: Meta AI | PhD • Learn about AI Agents for FREE here: https://t.co/P5SA9u54xO

Shorts

This blew up more than I expected. To minimize the negative impact of these changes, my orchestrator can now smoothly switch or hand off between any provider/model (e.g., Fable 5 -> gpt-5-sol). That's just one of the benefits of owning the harness and orchestrator.

62,278 views
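
A minimal sketch of the provider handoff idea from the post above. It is not the author's orchestrator: the `Provider` wrapper, `run_with_handoff`, and the two stand-in backends are hypothetical names used only for illustration. The assumption is that every provider accepts the same message history, so a failed call can be retried on the next provider without losing context.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    """Hypothetical wrapper around one provider/model pair."""
    name: str
    call: Callable[[list], str]  # takes the message history, returns a reply


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def run_with_handoff(providers: Sequence[Provider], messages: list) -> tuple:
    """Try providers in order, handing the same history to the next on failure.

    Returns (provider_name, reply) for the first provider that succeeds.
    """
    errors = []
    for provider in providers:
        try:
            reply = provider.call(messages)
        except Exception as exc:  # broad on purpose: any error triggers a handoff
            errors.append(f"{provider.name}: {exc}")
            continue
        return provider.name, reply
    raise AllProvidersFailed("; ".join(errors))


# Stand-in backends (assumptions); real ones would call a provider SDK.
def flaky_backend(messages: list) -> str:
    raise TimeoutError("rate limited")


def stable_backend(messages: list) -> str:
    return f"handled {len(messages)} message(s)"
```

Passing `[Provider("primary", flaky_backend), Provider("fallback", stable_backend)]` to `run_with_handoff` returns the fallback's reply, because the first call raises and the same history is handed to the second provider.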

Building a personal knowledge base for my agents is increasingly where I spend my time these days. Like Andrej Karpathy, I also use Obsidian for my MD vaults. What's different in my approach is that I curate research papers on a daily basis and have tuned a Skill for months to find high-signal, relevant papers. I was reviewing and curating papers manually for some time, but now it's all automated, as the Skill has gotten so good at capturing what I consider the best of the best. There are so many papers these days, so this is a big deal. You all get to benefit from that with the papers I feature in my timeline and on DAIR.AI. The papers are indexed using Tobi Lütke's qmd CLI tool (all of it in markdown files along with useful metadata). So good for semantic search and surfacing insights, unlike anything out there. I am a visual person, so I then started to experiment with how to leverage this personal knowledge base of research papers inside my new interactive artifact generator (MCP tools inside my agent orchestrator system). The result is what you see in the clip. Hundreds of papers with all sorts of insights visualized. I keep track of research papers daily, so believe me when I tell you that this system is absolutely insane at surfacing insights. This is the result of months of tinkering on how to index research and leverage agent automations for wikification and robust documentation. But this is just the beginning. The visual artifact (which is interactive too) can be changed dynamically as I please. I can prompt my agent to throw any data at it. I can add different views to the data. Different interactions. I feel like this is the most personalized research system I have ever built and used, and it's not even close. The knowledge that the agents are able to surface from this basic setup is already extremely useful as I experiment with new agentic engineering concepts.
I feel like this knowledge layer and the higher-level ones I am working on will allow me to get the most out of other automation tools like autoresearch. The research is only as good as the research questions. And the research questions are only as good as the insights the agents have access to. Where I am spending time now is on how to make this more actionable. I am obsessed with the search problem here. The automations, autoresearch, and the ralph research loop (I built one months ago) are easier to build but are only as good as what you feed them. Work in progress. More updates soon. Back to building.

464,399 views
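
To make "markdown files along with useful metadata" concrete, here is a rough sketch of how such notes could be parsed and searched. It is not qmd and does not use qmd's interface: plain keyword overlap stands in for semantic search, and the flat `key: value` frontmatter layout and the `parse_note`, `search`, and `load_vault` helpers are assumptions made for illustration.

```python
import re
from pathlib import Path


def parse_note(text: str) -> tuple:
    """Split a note into (metadata, body). Handles only flat `key: value` frontmatter."""
    meta = {}
    if text.startswith("---\n"):
        head, sep, body = text[4:].partition("\n---\n")
        if sep:  # only trust the frontmatter if it is properly closed
            for line in head.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            return meta, body
    return meta, text


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(notes: dict, query: str, top_k: int = 3) -> list:
    """Rank notes by how many query terms appear in their metadata values or body."""
    terms = tokenize(query)
    scored = []
    for name, text in notes.items():
        meta, body = parse_note(text)
        haystack = tokenize(" ".join(meta.values()) + " " + body)
        score = len(terms & haystack)
        if score:
            scored.append((name, score))
    # Highest score first; ties broken by file name for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]


def load_vault(root: Path) -> dict:
    """Read every markdown file under `root` into a {file name: text} mapping."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(root.rglob("*.md"))}
```

A real setup would swap `search` for an embedding-based or hybrid retriever. The point of the sketch is only that metadata and body live in the same markdown file, so both can feed the index.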

You can now use Claude inside After Effects. Higgsfield's new MCP connector lets Claude work inside your actual AE project. It can build compositions, set keyframes, write expressions, and run the repetitive ExtendScript work for you, and everything it makes becomes an editable AE scene. More creative tools should work this way: an assistant that does the routine work while you keep full manual control. Try it here:

14,649 views

I have found it really useful to convert notes, essays, and papers into Artifacts. And it seems Fable 5 is really good at it. Artifacts are great for building deeper intuition on any topic. Just shared this one I did on Satya's latest essay on the Reverse Information Paradox.

11,900 views

I've been trying a new way to explore AI research papers and discover deeper insights. Agents are at the center of it. So far, I've built this little interactive artifact generator in my orchestrator to visualize things. This allows me to change views and insights (on demand) across hundreds of papers. Just scratching the surface here. More to share soon.

138,438 views

I just open-sourced my /learn skill. Learn anything with agents and HTML artifacts. I have been learning about all kinds of topics with it. Install the skill and interact with any agent to help you through any topic. Ask it to generate visual and interactive artifacts and help you go deeper or generate knowledge checks (e.g., quizzes). Upskilling myself on any topic is one of the most impactful ways I have been able to use AI agents. If you are a DAIR Academy pro member, you can use it with our AI Builder. Skill: Try now:

34,407 views

As an ML Engineer, this is one of the most useful applications of GPT-4 I've seen. Chat Explore is a powerful AI-powered data exploration tool. Here’s why I am so impressed:

716,652 views

Just built an insane new agent skill. It can perfectly extract slides from YT videos, then write notes, images, transcripts, and slides into Obsidian vaults. An HTML artifact allows me to navigate and add more notes as I listen. Should I release the skill?

45,837 views

This is insane! 🤯 Just built a new skill in Claude Code using Opus 4.5. The skill uses Gemini 3 Pro (via API) for designing web pages. Look at what it generated from one simple prompt.

152,910 views

Increasingly, HTML Artifacts are becoming a core part of how I work with AI agents. Long-horizon agent sessions need a better way to surface insights about what work they have done. This may not be obvious right now, but as you start to let your agent work on dynamic workflows, large codebases, long-running loops (e.g., using /goal), and deep research tasks, you need a good way to present results. The chat window is not it. You also don't want to just trust everything the agents do. Artifacts provide an important verification layer, which in turn supports better decision-making. I like HTML artifacts because I can just ask the agent to produce as many of them (and in whatever form) as I need to verify the work and make sense of everything. I even built a nice tab system for my artifacts. They are great for continual learning and research. I use HTML artifacts for logging, tracking experiments, brainstorming, managing my inbox, code reviews, agent session management, deep research, writing, reading, and so much more. I believe Andrej Karpathy wrote about this somewhere: As we move on to more advanced applications of AI agents and outputs get more complex, we will start to find the need for even more advanced forms of interaction with AI, including interactive neural videos/simulations.

36,789 views
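
As a rough illustration of the tabbed HTML artifact idea from the post above, the sketch below renders an agent session summary into a single self-contained HTML file. The `build_artifact` function and its input shape (tab name mapped to a list of lines) are assumptions for this example, not the author's tab system.

```python
from html import escape


def build_artifact(title: str, tabs: dict) -> str:
    """Render a single-file HTML page with one tab per section.

    `tabs` maps a tab name to a list of plain-text lines. All text is
    HTML-escaped so agent output cannot inject markup into the page.
    """
    buttons, panels = [], []
    for i, (name, lines) in enumerate(tabs.items()):
        buttons.append(f'<button onclick="show({i})">{escape(name)}</button>')
        items = "".join(f"<li>{escape(line)}</li>" for line in lines)
        hidden = "" if i == 0 else " hidden"  # only the first tab starts visible
        panels.append(f'<section class="tab"{hidden}><ul>{items}</ul></section>')
    # Tiny inline script: show panel n, hide the others.
    script = (
        "function show(n){document.querySelectorAll('.tab')"
        ".forEach((el,i)=>{el.hidden=i!==n;});}"
    )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1><nav>{''.join(buttons)}</nav>"
        f"{''.join(panels)}<script>{script}</script></body></html>"
    )
```

Writing the returned string to a `.html` file gives something you can open in a browser and click through. Keeping everything in one file (no external assets) is what makes it easy for an agent to regenerate on demand.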

I just built my own wiki generator plugin for my agents. My agents can now generate wikis for anything I ask. One of my favorite wikis is called PaperWiki. This is a great example of what Andrej Karpathy describes. It uses Obsidian vaults to organize papers and retrieve LLM-generated summaries, diagrams, and other advanced views for paper exploration. When the Obsidian UI is not enough, I use my own artifact generator inside my agent orchestrator (see clip for example). This allows my agents to build any kind of view or exploration feature that I need. The papers are all curated with automations and several rules/patterns I have manually built over the years. On the surface, this looks basic. But behind the scenes, there are advanced search capabilities, connections, metadata, derived data, and other interesting bits of information that are extremely useful for my research agents. This is mostly built for agents. The artifact preview is just a high-level way to validate and quickly assess the quality of the wiki and suggest improvements, and it's also great for research. I use Tobi Lütke's qmd for all search capabilities. Everything is markdown. The summaries and even the diagrams. The wiki updates on its own based on several automations I have optimized over the past couple of weeks. The wiki grows and self-improves based on several requirements important for my research use cases. This is as personalized as it gets. There is nothing like it out there. And I use my research expertise to continue improving it over time. This is a vanilla wiki. There are so many things I want to build on top of this. Different aggregations, views, artifacts, etc. All to help automate more of my research work and accelerate productivity. I think the biggest leverage here is how powerful this could be for discovery and experimentation.
One of my goals is to use it to find deeper connections and insights that would otherwise elude the top human researchers and use those to generate interesting new hypotheses and research experiments. That way, my agents can use autoresearch to explore research ideas at the frontier. Stay tuned for more.

66,903 views
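
One way the "connections" in a markdown wiki can be derived is by scanning notes for Obsidian-style `[[wikilinks]]` and inverting them into backlinks. The sketch below shows that under stated assumptions: notes are held in memory as a title-to-body mapping, and the `backlinks` and `render_index` helpers are hypothetical, not part of the author's plugin.

```python
import re
from collections import defaultdict

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def backlinks(notes: dict) -> dict:
    """Map each linked note title to the sorted titles of notes that link to it."""
    incoming = defaultdict(set)
    for title, body in notes.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != title:  # ignore self-links
                incoming[target].add(title)
    return {target: sorted(sources) for target, sources in sorted(incoming.items())}


def render_index(notes: dict, links: dict) -> str:
    """Render a markdown index page listing every note with its backlinks."""
    lines = ["# Index"]
    for title in sorted(notes):
        refs = ", ".join(f"[[{source}]]" for source in links.get(title, []))
        suffix = f" (linked from: {refs})" if refs else ""
        lines.append(f"- [[{title}]]{suffix}")
    return "\n".join(lines)
```

Because the index is itself markdown with wikilinks, it can be written back into the vault and regenerated whenever an automation adds or edits a note.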

My new favorite skill is /learn. I built it to learn any topic at whatever level you like. It combines two of my passions: artifacts and learning. Coming soon to the DAIR.AI academy.

23,141 views

We are entering an extremely exciting era for open-weight models. Kimi K2.6 now feels like a top agentic model. I took it for a spin via Fireworks AI fast inference APIs. Kimi K2.6 has impressive agentic capabilities, design skills, and the ability to synthesize large amounts of information. I built a little Skill that produces survey papers on any AI research topic you want (see the example in the clip). You can use the skill to tell your agent to generate a survey on whatever topic and watch it go to work. The artifact was fully generated by Kimi.ai's Kimi K2.6. It's cheap and fast. The next step for me is to explore ways to keep integrating the capabilities of these models into use cases like automating my LLM knowledge bases and augmenting my agent memory capabilities. Stay tuned for more.

47,678 views

LLM Knowledge Base → Slides
When Andrej Karpathy shared his LLM Knowledge Base setup, many were wondering how to generate more visual forms of the wiki. There are many options, but I think Gamma is one of the best at producing high-quality, rich presentations. To showcase this, I just built a pipeline that turns my AI papers wiki (1K+ papers across 20 AI agent topics) into polished slide presentations using Gamma. The flow: Obsidian vault → Gamma MCP → embedded preview in my dashboard. I give one command to my agent, which pulls the top papers from each topic (via the wiki), feeds them to Gamma, and renders the presentation inline. The Gamma connector for Claude is a great choice for generating beautiful and professional slides. Easy to use. Go to your Claude instance and add the official Gamma connector. That's it! Claude Code will now have access to all the necessary MCP tools for generating slides. I use the Claude Agent SDK for my agent orchestrator, so I use the official Gamma MCP tools and embed the generated slides in an iframe via my artifact preview. See the clip below for an example.

47,204 views
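
The flow in the post above (vault → slide generator → embedded preview) can be sketched roughly as follows. This does not call Gamma or any MCP tool: the slide-generation step is an injected `generate_deck` callable that is assumed to take an outline and return a URL. The paper fields (`title`, `topic`, `score`) and all function names are likewise assumptions for illustration.

```python
from collections import defaultdict
from html import escape
from typing import Callable


def top_papers_by_topic(papers: list, per_topic: int = 3) -> dict:
    """Group papers by topic and keep the highest-scoring ones in each group."""
    grouped = defaultdict(list)
    for paper in papers:
        grouped[paper["topic"]].append(paper)
    return {
        topic: sorted(items, key=lambda p: p["score"], reverse=True)[:per_topic]
        for topic, items in sorted(grouped.items())
    }


def build_outline(selection: dict) -> str:
    """Turn the selection into a markdown outline, one heading per topic."""
    lines = []
    for topic, items in selection.items():
        lines.append(f"# {topic}")
        lines.extend(f"- {paper['title']}" for paper in items)
    return "\n".join(lines)


def embed(url: str) -> str:
    """Wrap a deck URL in an iframe, escaping it for use in an HTML attribute."""
    return f'<iframe src="{escape(url, quote=True)}" width="100%" height="480"></iframe>'


def run_pipeline(papers: list, generate_deck: Callable[[str], str], per_topic: int = 3) -> str:
    """Select papers, build the outline, generate the deck, return embed HTML."""
    outline = build_outline(top_papers_by_topic(papers, per_topic))
    return embed(generate_deck(outline))
```

Keeping the generator behind a plain callable means the same selection and embedding code works whether the deck comes from an MCP tool call, a REST API, or a local renderer.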

GPT-5.5 in Codex is a delight to work with:
- Super sharp with responses
- It understands intent better than any model
- Great "personality"
- Gets lots of stuff done without pausing unnecessarily
It generated this beautiful artifact design. Huge win for OpenAI.

36,795 views

Simplicity is at the heart of great software. This is one of the reasons why Claude Code has been sticky for me. As a builder, I love planning and brainstorming, and this is now a key focus of Claude Code. I use Shift + Tab a lot to cycle between brainstorming, planning, and execution. This gives me the right interface to be as involved or as hands-off as I please. This works particularly well when building out new and complex features or entire new projects. This saves a huge amount of time. It allows me to tune Claude Code to execute and build more effectively. It also builds a loop of trust, and I often (surprisingly) find Claude Code asking for clarifications when it's confused. Coding agents don't normally do that. I have shared before on the power of brainstorming with AI for longer stretches. Try it and you will not be disappointed. Vibe coding is fun, but pair it with intentional development cycles, and watch how far you can take a project with coding agents today.

81,765 views

Excited to launch a new way to upskill with AI agents. This is how we are making it possible for anyone to learn to build with coding agents. To start, we are launching 4 new hands-on labs on the following topics:
- Agent Skills
- Agentic Image Generation
- 30 Days of Hermes Agents
- Prompt Engineering with Agents
I am confident that with our new DAIR.AI platform, anyone can learn to become a top AI builder by building and acquiring in-demand AI skills. And there is a lot more landing in the coming weeks.

17,141 views

LLM Artifacts
Connected to Andrej Karpathy's LLM Knowledge Base idea, I've been building out a fun way to generate dynamic artifacts from these knowledge bases with the goal of discovering and revealing meaningful and deeper insights. LLM KBs are hard for humans to consume, as I think they are built more for agents. So the question is, what form would be useful for humans to take actions and make important decisions? That's what I am trying to figure out with these artifacts. The artifact example shows a pulse on HN discussions around AI-related stories. The insights can go deeper, of course, but this is already super fun and thought-provoking, like some of my favorite podcasts. The format and depth matter a lot. The aggregation skills of agents are outstanding if you tune the prompts and skill carefully. I built this artifact generator in a few minutes through an agent skill, but I feel like there are so many ways that LLM-generated information can be used and consumed. Like generating deeper insights and analysis, and things that are just not feasible for humans today. The generated artifact (including its data and design) can serve as a reusable template or be updated in real time via automations, which is something I am also working on. It is truly an insane way to monitor and track information. Better than a newsletter. Better than newspapers. There is something about this that gets me really excited about the future of AI agents for knowledge generation and discovery. Lots of hidden gems everywhere just waiting to be discovered and acted on if the information is presented correctly. This is not perfect. The format and style/prose can be improved, but this is easy to customize via the skill. You can personalize it to your liking. I feel like these dynamic artifacts are going to emerge as a strong new medium to stay on the cutting edge of things, both for agents and humans. My target is research, of course. This was just a basic example.
Besides animation, I am also targeting other components like voice, videos, images, slides, etc. This space is full of opportunities to explore. Skill for this coming soon.

31,190 views

Got a chance to try out Matt Pocock's /teach skill. It's similar to my /learn skill. You can try the skill with Hermes Agent right now in our academy. I will keep the lab FREE for now. It's amazing to learn with AI agents. Go try it!

13,717 views

This is one of the fastest ways to build a custom ChatGPT-like system on top of your data. It's called ChatLLM (by Abacus.AI). Here is a demo of how to build a simple custom chat LLM:

227,166 views

Videos

Everyone keeps asking me how to build a second brain or an LLM wiki. Here is the easiest setup I have found. I took my Wiki Builder skill, installed it into HyperAgent (Hyperagent) as a reusable skill, and asked it to build a research wiki on LLM verification from the latest 2026 papers. Recorded a quick demo. Built with Hyperagent. It planned first, asked a few sharp questions, then did the research: 29 papers curated into 21 files, a research map, a glossary, and clean subfields. Now it is a knowledge base that my other research agents build on. HyperAgent has all the capabilities to allow your agents and skills to compound. That’s a powerful use of AI agents.

223,746 views • 5 days ago

This Marble Skill Taxonomy is so good! It is an open taxonomy of what children learn across the primary/elementary years. Couldn't resist asking Fable 5 to generate an artifact to visualize it.

35,830 views • 7 days ago

LLM Wikis + HTML Artifacts are insanely powerful. You should seriously consider this in your workflows. LLM Wikis capture all the important information that lets you and your agents do meaningful work. HTML artifacts present that information in interesting ways that allow you to take important actions along with your agents. My HTML artifacts sit on top of my LLM wikis. They are dynamic and are easily extended as needs arise. I have hooked my artifacts up to talk to my agents, and similarly, the agents can talk to the artifacts. This has allowed me to build powerful artifacts that reduce my inbox to zero, keep me updated on any topic of interest, prototype quickly, do deep research, design and trigger new experiments, generate figures to improve understanding, schedule research, search relevant information, discover topics, and so much more. What you see in the clip is not a website. It's a simple interactive HTML artifact. HTML artifacts are useful for designers, engineers, researchers, students, and anyone working with agents. Lastly, HTML doesn't replace Markdown. The two work much better together.

246,912 views • 2 months ago

LLM Wikis are being slept on. I argue that creating knowledge bases with LLMs or coding agents is one of the most valuable applications of AI today. It's about being intentional in building and scaling your intelligence stack. To showcase this, I wanted to share an LLM Wiki I have built over the last couple of months. It's called PaperWiki, and I use it across all my research workflows, along with my research agents. In fact, I also use it to curate papers I share with my communities, newsletter, and on X. The PaperWiki is updated regularly with automations, so I basically have agents on a loop maintaining it. All the entries are ingested from different sources, stored in a vault (Obsidian), indexed using qmd, and then presented via an HTML artifact. So all of it is easily accessible to all my agents and easily searchable through full-text search and rich semantic search. The structure of the wiki has proven very useful for starting exciting cutting-edge research projects with my research agents (from building tiny and more efficient GPT/diffusion LLMs to building out SoTA harnesses and memory systems). It turns out that agents love markdown files and can more easily navigate the papers given the rich metadata structure of the wiki. I am just getting started on this, but it's clear to me that we should all be experimenting with LLM Wikis. Here's why: Building LLM knowledge bases gets you into the habit of leveraging AI outputs in all kinds of creative ways. It's the good kind of tokenmaxxing we should all be pushing for. LLM Wikis can be maintained automatically in a loop. I use an automation that updates the wiki every day based on papers I curate. The curation is another automation I run in a loop (with a bit of human in the loop), so I get to build on all my previous knowledge and expertise, and all of it compounds the deeper the integration/layers.
One interesting result of this process is that I feel like I can better spot high-quality papers and remove noise more easily. Social media could never solve that. And most paper aggregators use metrics I simply don't trust. I like that agents can help with the noise vs. signal problem. This is important for research. Lots of people consider agents to produce mostly slop. But it doesn't have to be that way. Careful curation, prompts, automations, verifiers, and human-in-the-loop can produce some astonishing results. And you don't need frontier models for all of this. I use a combination of frontier models (opus-4.8) and open-weight models (deepseek-v4-flash) to maintain this. An exciting direction for future work (we are working on this at DAIR.AI) is to tune specialized models on top of this so that LLMs can quickly understand cutting-edge research ideas and better conceptualize research strategies that further accelerate scientific research agents. I plan to open-source a bunch of this work, including the artifact, but this is currently a work in progress, and I was excited to share some thoughts as I continue working on it. Sharing more as I go. Stay tuned!

54,713 views • 16 days ago
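
The PaperWiki post describes Markdown entries with rich metadata that agents can search. Here is a minimal sketch of that idea, not the actual PaperWiki or qmd implementation: it assumes each note carries simple `key: value` front matter between `---` fences, and the names `Note`, `parse_note`, and `search` are hypothetical. It covers only a naive keyword match; semantic search would need an embedding index on top.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """One wiki entry: metadata from the front matter plus the Markdown body."""
    path: str
    meta: dict = field(default_factory=dict)
    body: str = ""


def parse_note(path: str, text: str) -> Note:
    """Split a Markdown file into simple 'key: value' front matter and body."""
    meta = {}
    body = text
    if text.startswith("---\n"):
        header, sep, rest = text[4:].partition("\n---\n")
        if sep:  # only treat it as front matter if the closing fence exists
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            body = rest
    return Note(path=path, meta=meta, body=body)


def search(notes: list[Note], query: str) -> list[Note]:
    """Rank notes by how many query terms appear in the title, tags, or body."""
    terms = query.lower().split()
    scored = []
    for note in notes:
        haystack = " ".join(
            [note.meta.get("title", ""), note.meta.get("tags", ""), note.body]
        ).lower()
        score = sum(1 for term in terms if term in haystack)
        if score:
            scored.append((score, note))
    # Sort on the score only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in scored]
```

Keeping the metadata in plain front matter means the same files stay readable in Obsidian and easy for an agent to parse without extra tooling.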

Guess which is Fugu Ultra? This is how recent models compare when generating endless procedural terrain (using Three.js). All of these are one-shotted! Just wild! Trying a few more examples. Will share soon!

79,273 views • 26 days ago

Fable 5 is an absolute beast at Three.js. It's the best LLM at generating 3D simulations/worlds. Watch how I combine it with gpt-realtime-2 to generate interactive educational 3D worlds at DAIR.AI. Comment below if you want to see a breakdown.

37,736 views • 13 days ago

Who did it best? GLM-5.2 (left) | Fugu Ultra (middle) | Fable 5 (right) Same one-shot prompt. The last one is my favorite!

47,223 views • 17 days ago

I am hooked on Dynamic Workflows! The idea of generating harnesses on the fly is so compelling that I reverse-engineered it for my agent orchestrator. And then I built a monitoring dashboard (as an HTML artifact) to track tasks, metrics, and reports. I can now use and monitor dynamic workflows in my agent orchestrator with coding agents like Claude Code, Codex, Pi, and even my own custom-built DAIR.AI agent. This is clearly the future of working with agents to accomplish complex, long-running tasks. Some use cases I'm having success with:
- Branching deep research tasks (with verification)
- Parallel deep research tasks
- Session mining of all my agent sessions
- Bug hunting
- Triaging
- Fact-checking
- LLM councils
- AI simulations
- Data synthesis
- Evals generation
... and many others.
Dynamic workflows, like agent skills, feel like an important primitive to not only get the most out of agents but also incorporate dynamic behaviors and important components like cooperation and verification. There is so much ground to explore here. The exciting part is that this is not limited to coding tasks; it extends to business use cases and many other technical domains like science and research.

102,878 views • 1 month ago
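
One of the use cases listed in the Dynamic Workflows post, parallel research tasks with verification, can be sketched with plain `asyncio`. This is only an illustration under stated assumptions, not the orchestrator described in the post: `run_parallel`, `stub_worker`, and `stub_verifier` are hypothetical names, and the stubs stand in for real agent calls.

```python
import asyncio
from typing import Awaitable, Callable

Worker = Callable[[str], Awaitable[str]]
Verifier = Callable[[str, str], Awaitable[bool]]


async def run_parallel(
    tasks: list[str], worker: Worker, verifier: Verifier
) -> dict[str, str]:
    """Run every task concurrently, then keep only results the verifier accepts."""
    results = await asyncio.gather(*(worker(task) for task in tasks))
    checks = await asyncio.gather(
        *(verifier(task, result) for task, result in zip(tasks, results))
    )
    return {t: r for t, r, ok in zip(tasks, results, checks) if ok}


async def stub_worker(task: str) -> str:
    """Hypothetical stand-in for an agent that researches one task."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"report on {task}"


async def stub_verifier(task: str, result: str) -> bool:
    """Hypothetical stand-in for a verifier; rejects tasks marked 'unverified'."""
    return task in result and "unverified" not in task
```

A call such as `asyncio.run(run_parallel(["verifiers"], stub_worker, stub_verifier))` runs the fan-out and returns only the verified reports. Separating the worker from the verifier keeps the verification step swappable, for example for a human check.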

LingBot-VLA 2.0 is an impressive new embodied model. It is open source and trained across diverse robot configurations, from single-arm robots to humanoid platforms. It packs 60K hours of curated robot and human data into one generalist policy. It improves robots on difficult long-horizon tasks. Great release by Robbyant.

10,464 views • 3 days ago

Most world models fall apart after a few seconds. Common failure modes include texture smearing, warped geometry, and scenes that no longer look real. LingBot-World 2.0 from Robbyant seems to hold 720p at 60 fps for a full hour of interaction. That’s impressive. Here is what makes that possible.

11,920 views • 4 days ago

This is insane! I just used the new Claude Code Playground plugin to level up my Nano Banana Image generator skill. My skill has a self-improving loop, but with the playground skill, I can also pass precise annotations to nano banana as it improves the images. It's so good!

286,603 views • 5 months ago

LLM-as-a-Judge explained in ~10 mins. Knowing how to build AI verifiers and judges is one of the most important emerging AI skills today. Here is a quick intro on the topic and where to learn how to apply LLM-as-a-Judge.

34,576 views • 19 days ago

o3-mini-high (left) vs. deepseek-r1 (right): results from the first try. deepseek-r1 is cracked... wtf!

719,808 views • 1 year ago

This is just mindblowing stuff! I couldn't resist replicating this workflow to generate 3D biological structures. In a few minutes, I designed an artifact specifically built to generate these for any topic. Stack:
- HTML Artifact to view diagrams
- Gemini Nano Pro for concept generation
- Tripo for generative 3D
- Codex for assembling everything
AI will exponentially accelerate learning and democratize high-quality education. Stay tuned! We have a few releases on this front.

108,123 views • 2 months ago

OMG! Fugu Ultra is ridiculously good at these 3D renders.

43,780 views • 26 days ago

Introducing the ralph-research plugin. I just adopted the ralph-loop for implementing papers. Mind-blown by how well this works already. The entire plugin was one-shotted by Claude Code, but it can already code AI paper concepts and run experiments in a self-improving loop. Wild!

221,360 views • 6 months ago

Great essay by Tobi. Building an AI-native company? Go read it now. I couldn't resist visualizing it with my artifact generator. Biggest takeaway for me: "The risk isn't that AI does the work. It's that nobody learns from it."

77,563 views • 2 months ago

Obsessed with our new /learn skill. It's my favorite way of learning and researching topics. The agent creates a learning plan and a learning hub (artifact) that adapts to each learner's needs and progress.

29,227 views • 24 days ago

Loop engineering is great until something breaks. Here is how I improve the reliability of my agentic loops. I use human-in-the-loop (HITL). It's easy and extremely effective. Anyone can build this. My setup: I recorded a quick demo of how it all works. I shared recently that I now use more voice agents to build and communicate with agents. I also use them to verify. I hate the idea of being tied down to my computer or in a Slack channel to communicate with my agents. Here is what I have done to streamline communication with my agents. All my Claude and Codex agent sessions now use the Dial MCP server. It has a bunch of tools and provisions my agents with their own number that can place calls as native tools, with voice, SMS, and iMessage behind one interface. As my loops/automations work on PRs and new features, my agents escalate decisions to me via a short phone call. This is extremely useful when I am on the road or away from my desk. If you want to try this with Claude Code or Codex, paste this into your agent and get started right away: "Get yourself a Dial phone number and call me. Say hello and that setup is working, then hang up. Follow Natan Voitenkov and team are building something special here. Go check them out. Give your agent a phone number now: ($5 free credit)

15,295 views • 11 days ago
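
The human-in-the-loop pattern from the post above, where a loop escalates decisions to a person and waits for an answer, can be sketched as a simple approval gate. This is a minimal illustration with assumptions: `Decision`, `needs_human`, and `run_loop` are hypothetical names, and `ask_human` is a plain callback standing in for whatever channel is used (in the post, a phone call placed through the Dial MCP server, whose actual tool names are not shown here).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    summary: str       # what the agent wants to do
    needs_human: bool  # hypothetical flag set by the loop's own checks


def run_loop(
    decisions: list[Decision], ask_human: Callable[[str], bool]
) -> list[str]:
    """Apply low-risk decisions automatically and escalate the rest.

    The loop stops at the first rejection so later steps never run on
    top of a change the human turned down.
    """
    log = []
    for decision in decisions:
        if not decision.needs_human:
            log.append(f"auto: {decision.summary}")
            continue
        approved = ask_human(decision.summary)
        status = "approved" if approved else "rejected"
        log.append(f"{status}: {decision.summary}")
        if not approved:
            break
    return log
```

Passing the escalation channel in as a callback keeps the loop testable with a lambda and lets the same gate work over voice, SMS, or a chat message.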

Kimi K2 Thinking is a bigger deal than I thought! I just ran a quick eval on a deep agent I built for customer support. It's on par with GPT-5; no other LLM has reached this level of agentic, orchestration, and reasoning capabilities. Huge for agentic and reasoning tasks.

228,508 views • 8 months ago