
elvis
@omarsar0 • 305,984 subscribers
Building self-improving AI @dair_ai • Prev: Meta AI | PhD • Learn about AI Agents for FREE here: https://t.co/P5SA9u54xO

LLM Wikis + HTML Artifacts are insanely powerful. You should seriously consider using them in your workflows. LLM Wikis capture all the important information that lets you and your agents do meaningful work. HTML artifacts present that information in interesting ways that allow you to take important actions along with your agents. My HTML artifacts sit on top of my LLM wikis. They are dynamic and are easily extended as needs arise. I have hooked my artifacts up to talk to my agents, and similarly, the agents can talk to the artifacts. This has allowed me to build powerful artifacts that reduce my inbox to zero, keep me updated on any topic of interest, prototype quickly, do deep research, design/trigger new experiments, generate figures to improve understanding, schedule research, search relevant information, discover topics, and so much more. What you see in the clip is not a website. It's a simple interactive HTML artifact. HTML artifacts are useful for designers, engineers, researchers, students, and anyone working with agents. Lastly, HTML doesn't replace Markdown. The two work much better together.
elvis • 244,834 views • 26 days ago

Just released my new /lesson-generator skill. Use it with your agent to learn anything:
- generate lessons/courses on any topic
- include nano-banana images with my /image-generator skill
- present the course as an HTML artifact
And it's also available to use in our academy.
elvis • 53,393 views • 11 days ago

This is just mindblowing stuff! I couldn't resist replicating this workflow to generate 3D biological structures. In a few minutes, I designed an artifact specifically built to generate these for any topic. Stack:
- HTML Artifact to view diagrams
- Gemini Nano Pro for concept generation
- Tripo for generative 3D
- Codex for assembling everything
AI will exponentially accelerate learning and democratize high-quality education. Stay tuned! We have a few releases on this front.
elvis • 106,045 views • 24 days ago

This is insane! I just used the new Claude Code Playground plugin to level up my Nano Banana Image generator skill. My skill has a self-improving loop, but with the playground skill, I can also pass precise annotations to nano banana as it improves the images. It's so good!
elvis • 286,068 views • 4 months ago

The best way to learn AI is to build with agents. To help with that, we've launched hands-on labs and a new series on Agentic Engineering. First topic: Agent Skills. Next in the pipeline: planning, context engineering, multi-agent systems, long-running agents, and more. Go build!
elvis • 29,596 views • 12 days ago

arXiv Papers → LLM Artifacts. This is how I keep up with AI research now. It's like having access to the most personalized arXiv feed. Automations run every day to curate papers based on a set of rules and insights. Curated papers are indexed and power the artifacts. Agents convert papers to LLM wikis (based on Andrej Karpathy's idea), which means insights are indexed, easily searchable, and reusable. I feel like LLM Artifacts are the natural evolution of LLM Wikis. It's about making that knowledge actionable. Artifacts are customizable via agents. Artifacts can interact with agents and are dynamic in nature. Anything can be injected into the artifact as needed (insights, components, suggested experiments, action items, etc.). I can take action on artifact items with my agent orchestrator (an Electron app). So I can ask questions about any paper and automate experiments in the background right from within the artifact. This is more than a visual. It's not a single prompt. It's several proactive agents coordinating to surface interesting facts, knowledge, and insights that I can act on as a researcher. Agents are not just for generating useful artifacts; they are also useful for continuing to learn and staying on the cutting edge of knowledge. Stay tuned for more.
elvis • 58,154 views • 28 days ago

o3-mini-high (left) vs. deepseek-r1 (right): results from the first try. deepseek-r1 is cracked... wtf!
elvis • 719,763 views • 1 year ago

Introducing the ralph-research plugin. I just adopted the ralph-loop for implementing papers. Mindblown by how well this works already. The entire plugin was one-shotted by Claude Code, but it can already code AI paper concepts and run experiments in a self-improving loop. Wild!
elvis • 221,031 views • 4 months ago

I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mindblown by how well it works out of the box. A few notes: I spent a few hours building an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro on Fireworks AI inference. This is the first time I feel like there is an open-weight model that can reason at the level of Claude and Codex. And it does this in a cost-effective way with support for 1M context length. To be clear, I am using DeepSeek-V4-Pro inside of Pi without any special configuration. It's exciting that there is a model that can just be plugged into a basic harness like Pi, and it just works. I've never seen that before. Most models require lots of configuration and setup. DeepSeek-V4-Pro is clearly good at agentic coding (probably the best among the open-weight models), but the model is also great on knowledge-intensive tasks where reasoning matters. The agent pulled agentic engineering best practices from different company docs (Anthropic, OpenAI, Google, Stripe, Meta, Modal, DeepSeek, Mistral, Cohere), searched and digested Reddit and HN threads, summarized arXiv papers, and surfaced trending GitHub repos. Then it distilled everything into actionable tips across categories. I love the wiki it built. The quality is really good. Here is a snapshot of what the wiki looks like: DeepSeek-V4-Pro handled the task without breaking stride: multi-step research queries, code generation for scaffolding, and context-heavy reasoning across disparate sources. For coding specifically, this is the first open-weight model that genuinely feels like a Codex or Claude Code experience. It is comparable in capability and in actual multi-turn agentic work. What made the loop feel so responsive was Fireworks' inference speed (the fastest in the market) and the fact that they actually validate models at the systems level before shipping. No corrupted reasoning traces. Just fast, reliable iteration.
The hybrid CSA and HCA attention design cuts KV cache to just 10% and inference FLOPs by nearly 4x at 1M-token context. This is what makes the agent loop actually fast and cheap enough to run in practice. For devs who've been watching open-weight models close the gap but haven't found one that actually delivers in practice, this is the closest I've seen. Try it here:
elvis • 55,948 views • 1 month ago

Kimi K2 Thinking is a bigger deal than I thought! I just ran a quick eval on a deep agent I built for customer support. It's on par with GPT-5; no other LLM has reached this level of agentic, orchestration, and reasoning capabilities. Huge for agentic and reasoning tasks.
elvis • 228,508 views • 6 months ago

Claude Code subagents are probably the easiest way to build complex, custom agentic systems. Not just for code, but for anything you can imagine. Watch me build a multi-agent deep research system with Claude Code subagents. Chain subagents with /commands for reliability.
elvis • 189,943 views • 10 months ago

YT Podcast → LLM Artifact. This is now my favorite way to consume podcasts: knowledge artifacts generated by agents. The agent (Opus 4.7) spots important insights, does deep analysis, and generates thought-provoking observations that really get me curious to research further. All the research goes into a self-improving wiki for later use by any of my agents. I am using ElevenLabs Scribe for diarization. Skills and scripts guide the artifact generation. Artifacts are just plain HTML + JS (e.g., chart.js). You can go further on anything by selecting text and components (charts) and doing deeper research, as I show in the clip. I expect more people to use agents in this way.
elvis • 30,921 views • 1 month ago
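
For readers wondering what "just plain HTML + JS" could look like in practice, here is a minimal sketch, not the author's actual skill or script. It assumes an agent has already produced a list of insights and some chart data. The function name `render_artifact`, its parameters, and the "Mentions" dataset label are all hypothetical. The Chart.js script is loaded from the public jsDelivr CDN.

```python
import html
import json


def render_artifact(title, insights, chart_labels, chart_values):
    """Return one standalone HTML page: a list of insights plus a bar chart.

    Hypothetical helper for illustration; the post does not describe
    how its artifacts are actually generated.
    """
    # Escape text so insight strings cannot inject markup into the page.
    items = "\n".join(f"    <li>{html.escape(text)}</li>" for text in insights)
    # json.dumps yields a valid JS literal; escaping "</" keeps a stray
    # "</script>" inside the data from closing the script tag early.
    data = json.dumps({"labels": chart_labels, "values": chart_values})
    data = data.replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{html.escape(title)}</title>
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
  <h1>{html.escape(title)}</h1>
  <ul>
{items}
  </ul>
  <canvas id="chart"></canvas>
  <script>
    const data = {data};
    new Chart(document.getElementById("chart"), {{
      type: "bar",
      data: {{
        labels: data.labels,
        datasets: [{{ label: "Mentions", data: data.values }}]
      }}
    }});
  </script>
</body>
</html>
"""
```

Writing the returned string to a file such as `artifact.html` and opening it in a browser is enough to view it. There is no build step or server, which is presumably part of the appeal of plain HTML artifacts.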

Claude 3.7 Sonnet has serious competition! Gemini 2.5 Pro is a legit good model for code.
- code quality is really good
- 1M token context
- native multimodality
- long code generation
- understands large codebases
I used it with Windsurf to generate an AI search agent app:
elvis • 211,476 views • 1 year ago

Finally got a chance to play around with Andrej Karpathy's LLM Council. I built it as a plugin inside of Claude Code. Hooked it up with OpenRouter for models. The AskUserQuestion tool came in handy to select the council and chairman. This is my first test, but I agree with Karpathy that the concept of LLM ensembles can be used beyond models that offer perspectives on interesting questions. I feel like this could have really cool applications in agentic coding. More on that soon. I built this as a plugin, so next I will be exploring other use cases around agentic coding, like evaluation, tool building, designing, and research. If there is enough interest, I will clean it up and push it out as an open plugin.
elvis • 79,529 views • 4 months ago
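
The council-and-chairman idea in the post above can be sketched in a few lines. This is an illustration under assumptions, not the plugin's code. It assumes a three-stage flow (members answer, members rank the anonymized answers, a chairman synthesizes). Models are represented as plain callables, the name `run_council` is hypothetical, and the prompt wording is invented. In a real setup each callable would wrap an API call, for example through OpenRouter.

```python
from collections import defaultdict
from typing import Callable, Dict

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def run_council(question: str, members: Dict[str, Model], chairman: Model) -> str:
    # Stage 1: every council member answers independently.
    answers = {name: ask(question) for name, ask in members.items()}

    # Stage 2: members rank the answers, shown under anonymous labels A, B, ...
    labels = {name: chr(ord("A") + i) for i, name in enumerate(answers)}
    listing = "\n".join(f"{labels[n]}: {a}" for n, a in answers.items())
    scores = defaultdict(int)
    for ask in members.values():
        reply = ask(
            "Rank these answers best to worst, labels only, comma-separated.\n"
            f"Question: {question}\n{listing}"
        )
        for position, label in enumerate(tok.strip() for tok in reply.split(",")):
            if label in labels.values():
                # Borda-style points: earlier position earns more.
                scores[label] += len(labels) - position

    # Stage 3: the chairman sees all answers plus the aggregate ranking.
    ordered = sorted(labels.values(), key=lambda label: -scores[label])
    return chairman(
        f"Question: {question}\nAnswers:\n{listing}\n"
        f"Peer ranking (best first): {', '.join(ordered)}\n"
        "Write the final answer."
    )
```

Keeping models behind a simple callable makes it easy to swap the council members or the chairman, and to test the flow with stub functions before spending tokens.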