New RLM trajectory that blew my mind! I will use this one as the main example in the YT tutorial. I passed in a CSV containing transcripts of 320 episodes of the Lex Fridman podcast and asked it to find what his first 10 ML guests had to say...

38,963 views • 4 months ago • via X (Twitter)

Related Videos

RLM is the most important foundation of my Pi Harness (other than Pi of course). It's seeded with late interaction retrieval results (thanks to @lightonai for pylate). The Agent initiates it with a query, then..

𝐒𝐞𝐭𝐮𝐩

A python REPL is created and seeded with:

1. Late interaction search to pre-filter. Instead of doing top 3/5/10, it's top hundreds of documents. This is set into a `context` variable.
2. Python functions are loaded in to do more searches if the `context` variable isn't enough, and to make llm calls with cheaper models in parallel batches.

𝐈𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐋𝐨𝐨𝐩

From there, an LLM iterates in the REPL based on the query. It's just like exploring in a jupyter notebook. The LLM writes prose (like a markdown cell) and code to be run in the REPL each turn. This allows the LLM to sort, filter, and synthesize information. It can fan out and ask smaller models to summarize, combine, contrast, or do anything else to documents to help it understand the data. After several turns the LLM responds with the final answer, either because it found the answer or because it hit the budget limit.

Context as a Python variable, LLM as the programmer, REPL as the runtime.

𝐖𝐡𝐲 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤

1. Richer Shell. Agents (and subagents) work by intermixing code and prose/thinking. But they use static scripts or bash that run and exit and start over each tool call. That's not ideal for exploration and synthesis of data. For that, state is useful to continue building and exploring the data as you learn more. There's a reason jupyter notebooks have been popular with data scientists.
2. Keeps main agent context clean. The better context you have the better the agent will perform (duh!). This means three things: better human input, fewer missing search results, and fewer incorrect search results. Letting the agent iterate allows it to synthesize just what is needed and nothing else. All bad paths or peeks at something that turns out to be irrelevant stay out of main agent context.
3. Stack the good ideas! People often compare late interaction search vs RLM. Or static vs dynamic languages. Or agentic search vs semantic search. But... you can just use them all together, each for the area it's really great at.

Read the full post which has more detail about how and why.
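
The "context as a Python variable, LLM as the programmer, REPL as the runtime" loop from the post above can be sketched roughly as follows. This is a minimal illustration, not the author's harness: `scripted_llm` is a hypothetical stand-in for a real model call, and the `FINAL` variable name used to signal completion is an assumption made for this example.

```python
import contextlib
import io


def run_cell(namespace, code):
    """Execute one code cell in the shared namespace and capture its stdout."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Errors are returned as text so the model can react on the next turn.
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


def scripted_llm(turn, last_output):
    """Hypothetical stand-in for the model: returns the next cell to run."""
    cells = [
        "print(len(context))",
        "hits = [d for d in context if 'attention' in d.lower()]\nprint(len(hits))",
        "FINAL = hits[0]",  # assumed convention: setting FINAL ends the loop
    ]
    return cells[turn]


def rlm_loop(context, llm, max_turns=5):
    """Iterate in a persistent namespace until FINAL is set or the budget runs out."""
    namespace = {"context": context}  # state survives between turns
    output = ""
    for turn in range(max_turns):
        code = llm(turn, output)
        output = run_cell(namespace, code)
        if "FINAL" in namespace:
            return namespace["FINAL"]
    return None  # budget limit hit without an answer


docs = ["Notes on CNNs", "Attention is all you need", "RNN tutorial"]
answer = rlm_loop(docs, scripted_llm)
print(answer)
```

The point of the sketch is the persistent `namespace`: the `hits` list built on one turn is still there on the next, which is what a run-and-exit script per tool call would lose.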

Isaac Flath

40,212 views • 2 months ago

My RLM finally went recursive! Looking at these logs is way too addictive, please send help.

Notes:
> Sent it 10 long wikipedia articles about deep learning (~2M context).
> Asked it to find BLEU scores from the Attention paper & explain MHA from these articles
> RLM controlled by the new Minimax 2.5! Minor prompt changes were needed from the RLM paper.
> Spends first 3 iterations understanding the data format, works through errors, until it locates the Attention article from the mess. Like a human would use a Jupyter Notebook.
> Launches subagent on only the AIAYN article
> This subagent launches 2 more subagents to fetch (a) BLEU score and (b) MHA (my original two-part question)
> The lowest subagent returns the output using "FINAL_VAR" (i.e. it does not generate the text! Just finds the correct location in the context and sends it back as a variable)
> Recursion propagates upwards
> Outermost LLM receives the RLM output, and generates the full text response.
> Took 2.5 minutes walltime. Max recursion depth level was 2. 12 LLM calls in total. (This video contains cuts when the LLM is thinking/generating)
> Subagents never get to see more than 2000 characters. Only the outermost LLM gets to see the full output - it's needed to answer the final question, but it's only 200-300 tokens compared to 2M!
> Fully async. Code execution and subagent tasks can happen simultaneously!

I feel soooo satisfied. Been some time since I've been this excited about shooting a tutorial video.
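
The "return a variable, not generated text" idea in these notes can be illustrated with a small recursive sketch. Everything here is an assumption for illustration: `score` is a keyword-counting stub where a real setup would call a sub-LLM, the separators and variable naming scheme are invented, and only the 2000-character preview limit comes from the post.

```python
PEEK_LIMIT = 2000  # sub-calls only ever look at this many characters of a chunk
SEPARATORS = ["\n=====\n", "\n\n"]  # assumed layout: articles, then paragraphs


def score(preview, terms):
    """Stub for a sub-LLM call: counts query terms found in a truncated preview."""
    preview = preview.lower()
    return sum(term in preview for term in terms)


def locate(store, key, terms, depth=0):
    """Recursively narrow down and return the NAME of the variable holding the answer."""
    if depth == len(SEPARATORS):
        return key  # FINAL_VAR-style result: a reference, not newly generated text
    chunks = store[key].split(SEPARATORS[depth])
    best = max(range(len(chunks)), key=lambda i: score(chunks[i][:PEEK_LIMIT], terms))
    child = f"{key}_{best}"
    store[child] = chunks[best]  # the narrowed text lives in the shared store
    return locate(store, child, terms, depth + 1)


corpus = "\n=====\n".join([
    "Convolutional networks\n\nThey use filters.",
    "Attention Is All You Need\n\nThe model reports a BLEU score on translation."
    "\n\nMulti-head attention runs several heads in parallel.",
])
store = {"context": corpus}
final_var = locate(store, "context", ["bleu"])
print(final_var, "->", store[final_var])
```

Only the outermost caller dereferences `final_var`, which mirrors the note that the top-level LLM is the only one that reads the full output.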

AVB

38,241 views • 4 months ago

GoogleDeepmind Chief AGI Scientist Shane Legg: AGI by 2028 He’s had the same timelines for 12 years - insane! He gives a log-normal distribution with a mode of 2025. Importantly, while he puts a 50% chance of AGI by 2028, that means there is a 30% chance of AGI in the next three years. How have his timelines been so consistent since 2011? SHANE LEGG: I first formed those beliefs around 2001 after reading Ray Kurzweil's The Age of Spiritual Machines. There were two really important points in his book that I came to believe as true: 1) One is that computational power would grow exponentially for at least a few decades. And that the quantity of data in the world would grow exponentially for a few decades. And when you have exponentially increasing quantities of computation and data, then the value of highly scalable algorithms gets higher and higher. There's a lot of incentive to make a more scalable algorithm to harness all this computing data. So I thought it would be very likely that we'll start to discover scalable algorithms to do this. And then there's a positive feedback between all these things, because if your algorithm gets better at harnessing computing data, then the value of the data and the compute goes up because it can be more effectively used. And that drives more investment in these areas. If your compute performance goes up, then the value of the data goes up because you can utilize more data. So there are positive feedback loops between all these things. 2) And then the second thing was just looking at the trends. If the scalable algorithms were to be discovered, then during the 2020s, it should be possible to start training models on significantly more data than a human would experience in a lifetime. And I figured that that would be a time where big things would start to happen that would eventually unlock AGI. And I think we're now at that first part. 
I think we can start training models now with the scale of the data that is beyond what a human can experience in a lifetime. So I think this is the first unlocking step.

DWARKESH: If we're in 2029 and it hasn't happened yet, if there was a problem that caused it, what would be the most likely reason for that?

SHANE LEGG: I don't know. At the moment, it looks to me like all the problems are likely solvable with a number of years of research.

AI Notkilleveryoneism Memes ⏸️

74,490 views • 2 years ago

I cut Fable 5 token usage 2.5x with just one change!

- Before: 5.5 M tokens · 7 errors · $8.94
- After: 2.3 M tokens · 0 errors · $4.17

The final build was the same for both, but the path the agent took wildly differed. In both runs, the agent started with the same thing, i.e., it understood the backend before building anything, like:

- Permission policies
- Available storage buckets
- Auth providers configured
- How edge functions are deployed

The first run used Firebase, which was built for a human dev using a dashboard. While the dev can read the above state by clicking through tabs, an agent has no dashboard. So it gathered the same info through API calls. And there's no single Firebase call that returned this info. The agent had to query multiple times, and each query over-returned.

For instance, when the agent asked how sign-in is configured, Firebase also returned the entire auth surface and every method it supported. This was far more context than what it needed. And it repeated across every part of the backend it inspected. Some state (like which auth providers are active) wasn't queryable at all, so I provided it myself. Otherwise, the agent would have guessed.

Errors further compounded the token usage. When a dev sees "permission denied," they can look at the console and figure out whether it's a rule, a path, or an unauthenticated request. Firebase returned the same string to the agent as well, and it had none of that surrounding context to debug. So it guessed again, picked the most likely cause, and rewrote code, utilizing more tokens.

This Firebase setup cost me 5.5M tokens and 7 manual interventions during errors on a full-stack RAG app. But I brought that down to 2.3M tokens and 0 manual interventions by using InsForge as the backend context engineering layer (open-source and self-hostable via Docker). It provides the same primitives as Supabase/Firebase, but structures the entire information layer for agents, instead of dashboards.

In one CLI call that consumed ~500 tokens, the agent saw the full backend topology before writing a single line of code. This included auth, database, storage, edge functions, model gateway, micro VMs, and deployment. Also, instead of loading the entire product surface into context on every task, four narrowly scoped skills activated only when relevant to keep cognitive load minimal. And to ensure efficient retries if needed, every CLI operation returned structured JSON with meaningful exit codes, so the agent never guessed what to do next.

Here's the InsForge GitHub Repo: (don't forget to star it ⭐)

The video below depicts the final build, comparing Firebase and InsForge. To dive deeper, I recently published a full walkthrough building the same RAG app on both backends and inspected them end-to-end. Read it below.
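
To make the contrast between a bare "permission denied" string and "structured JSON with meaningful exit codes" concrete, here is a toy sketch. It is not the InsForge or Firebase API: the function name, exit code values, and JSON fields are all hypothetical, chosen only to show how a distinct code plus a hint removes the guessing step.

```python
import json

# Hypothetical exit codes for illustration; not taken from any real CLI.
EXIT_OK = 0
EXIT_UNAUTHENTICATED = 3
EXIT_RULE_DENIED = 4
EXIT_NOT_FOUND = 5


def read_object(user, bucket, path, grants, objects):
    """Toy storage read that reports WHY a request failed, not just that it did."""
    if user is None:
        return EXIT_UNAUTHENTICATED, {
            "ok": False, "code": "unauthenticated",
            "hint": "attach a session token and retry",
        }
    if bucket not in grants.get(user, set()):
        return EXIT_RULE_DENIED, {
            "ok": False, "code": "rule_denied", "bucket": bucket,
            "hint": f"no read rule grants {user!r} access to this bucket",
        }
    if (bucket, path) not in objects:
        return EXIT_NOT_FOUND, {
            "ok": False, "code": "not_found", "path": path,
            "hint": "check the object path",
        }
    return EXIT_OK, {"ok": True, "data": objects[(bucket, path)]}


grants = {"alice": {"docs"}}
objects = {("docs", "intro.txt"): "hello"}

requests = [
    (None, "docs", "intro.txt"),     # unauthenticated request
    ("alice", "private", "x.txt"),   # blocked by a rule
    ("alice", "docs", "intro.txt"),  # succeeds
]
for user, bucket, path in requests:
    exit_code, payload = read_object(user, bucket, path, grants, objects)
    print(exit_code, json.dumps(payload))
```

With a single opaque error string, all three failure causes look identical to the agent; with separate codes, each one maps to a different next action.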

Avi Chawla

112,406 views • 20 days ago

Everyone wants agent swarms. Very few people are talking seriously enough about the context layer that makes swarms useful. Even with one agent, context is fragile. Too little context and the agent guesses. Too much context and it wastes tokens, loses focus, or reasons over irrelevant noise. The sweet spot is precise context: the right knowledge, in the right structure, at the right moment. With many agents, that challenge explodes. Each agent produces decisions, assumptions, findings, summaries, risks, and partial conclusions. Unless that knowledge becomes shared, structured, and reusable, every new agent is forced to rediscover what another agent already learned. That is not a swarm. That is a crowd. Shared context graphs are what turn agent activity into agent collaboration, and OriginTrail DKG V10 brings them to life. Was just playing with some final polishing for the V10 release, and it is really powerful to see shared context graphs where multiple agents contribute knowledge into the same connected memory, with attribution visible directly in the graph ui. That matters for three reasons. First, agents can access and build on one shared memory instead of staying trapped in isolated sessions. Second, the graph structure helps them retrieve the exact context they need, instead of stuffing everything into a prompt and hoping the model sorts it out. Third, verifiability of provenance. You can see which agent contributed each piece of knowledge, trace the source, and decide what to trust. Tokenmaxxing starts with fewer tokens, but the deeper story is coordination - agents stop reloading the world and start building on shared, verifiable context. That is the foundation for serious multi-agent work across software engineering, research, finance, operations, project management, and far beyond. The future is not more agents, it is agents working from shared, verifiable context. But the more the merrier, of course.
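
As a rough illustration of the three properties named above (shared memory, targeted retrieval, visible provenance), here is a tiny in-memory sketch. It is not the OriginTrail DKG API; the `Fact` and `SharedContextGraph` names, fields, and sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    agent: str   # which agent contributed this fact
    source: str  # where the agent says it got it


class SharedContextGraph:
    """Toy shared memory: facts indexed by subject, each carrying attribution."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, fact):
        self._by_subject[fact.subject].append(fact)

    def about(self, subject, trusted_agents=None):
        """Return only facts about `subject`, optionally filtered by contributor."""
        facts = self._by_subject.get(subject, [])
        if trusted_agents is None:
            return list(facts)
        return [f for f in facts if f.agent in trusted_agents]


graph = SharedContextGraph()
graph.add(Fact("auth-service", "uses", "JWT", agent="reviewer", source="auth/config.py"))
graph.add(Fact("auth-service", "risk", "token expiry untested", agent="tester", source="test run 12"))
graph.add(Fact("billing", "depends_on", "auth-service", agent="planner", source="design doc"))

for fact in graph.about("auth-service", trusted_agents={"reviewer"}):
    print(f"{fact.subject} {fact.predicate} {fact.obj} (by {fact.agent}, from {fact.source})")
```

A new agent asking about `auth-service` gets two attributed facts instead of rediscovering them, and never has the unrelated `billing` entry loaded into its prompt.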

Jurij Skornik

11,070 views • 1 month ago

this video is the CLEAREST explanation of how claude skills + AI agents work and how to use them most people set up an AI agent and wonder why it keeps disappointing them. the context window is everything context is what the model assembles before it takes any action. think of it like everything the agent needs to read before it does anything. the quality of what goes in determines the quality of what comes out. the models are genuinely really good right now. claude and gpt are exceptional. the variable is almost always the context you give them. 1. agent.md files are mostly unnecessary every single line you put in an agent.md file gets added to every single conversation you have with your agent. a 1000 line file is around 7000 tokens burning on every run. the model already knows to use react. it can read your codebase. save the agent.md for proprietary information specific to your company that the model genuinely cannot know on its own. 2. skills are the actual unlock a skill.md file works differently. what loads into context is only the name and description, around 50 tokens. the full instructions only appear when the agent recognizes it needs that skill. so instead of 7000 tokens on every run you have 50. and the agent stays sharp because the context window stays lean. the closer you get to filling the context window the worse the agent performs, same way you perform worse when someone dumps 10 things on you at once. 3. here is how to actually build a skill the right way most people identify a workflow and immediately try to write the skill. what you want to do instead is run the workflow by hand with the agent first. walk it through every single step. tell it what to check, what good looks like, what bad looks like. correct it in real time. once you have had a full successful run from start to finish, tell the agent to review everything it just did and write the skill itself. 
it writes a better skill than you will because it has the full context of what actually worked in practice not in theory. 4. recursively building skills is how you go from frustrated to reliable when the skill breaks, and it will break, ask the agent exactly why it failed. it will tell you specifically what went wrong. fix it together in that same conversation. then tell it to update the skill file so that failure mode never happens again. ross mike did this five times with his youtube report generator. it now pulls from eight different data sources and runs flawlessly every single time without him touching it. 5. sub agents are something you earn not something you set up on day one start with one agent. build one workflow. turn it into one skill. once that works add another. ross mike has five sub agents now covering marketing, business, personal and more. it took months to get there and every single one exists because a workflow proved it deserved to exist. the people who set up 15 sub agents on day one and wonder why nothing works skipped all the steps that make the thing actually run. 6. your workflow is the thing the model cannot get anywhere else the model has been trained on everything. it knows more than you about most things. what it does not have is your specific process, your taste, your way of doing things. that is what skills capture. that is what makes your agent actually useful versus a generic one. downloading someone else's skill means downloading their context onto your setup and it will not work the way you want it to because it was never built around how you work. this is the clearest explanation of how agents actually work i have heard. Micky runs this stuff every single day and the results show it. full episode is now live on The Startup Ideas Podcast (SIP) 🧃 where you get your pods people charge for this sorta stuff i give away the sauce for free i just want you to win watch
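
The difference described in points 1 and 2 (always-loaded instructions versus a short description with the body loaded on demand) can be sketched like this. The `Skill` class and the keyword match in `context_for` are assumptions for illustration; in a real agent the model itself decides when a skill is relevant, and this is not the actual skill-loading mechanism of any specific product.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions, only added to context when the skill is selected


def base_context(skills):
    """What every conversation pays for: one short line per skill."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def context_for(task, skills):
    """Base context plus the full body of any skill the task appears to need.

    The keyword match is a stub standing in for the model's own judgment.
    """
    parts = [base_context(skills)]
    for skill in skills:
        if skill.name in task.lower():
            parts.append(skill.body)
    return "\n\n".join(parts)


skills = [
    Skill(
        name="report",
        description="Build the weekly channel report.",
        body="Step 1: pull view counts.\nStep 2: compare against last week.\nStep 3: write a summary.",
    ),
    Skill(
        name="invoice",
        description="Draft a client invoice.",
        body="Step 1: look up the client rate.\nStep 2: total the hours.\nStep 3: fill the template.",
    ),
]

print(len(context_for("fix the login bug", skills)))      # descriptions only
print(len(context_for("please run the report", skills)))  # descriptions + one body
```

An unrelated task carries only the two description lines, while a matching task pulls in exactly one skill body and leaves the other out.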

GREG ISENBERG

192,408 views • 2 months ago

I just compared Claude Code vs Codex vs Cursor CLI The task was to build a Next.js app with Tailwind 4 and shadcn components to collect customer feedback and showcase it with a widget. I gave all three the same prompt and let them go for 30 minutes to see what they came up with. Claude Code with Opus 4.1 Even though I told it to set up the app in the existing project folder, it tried to create a directory for it. After I interrupted and told it not to do that, it built a demo form and landing page with no errors. I had to ask it to make the demo interactive so users could submit a testimonial and preview it. The landing page looked like AI and was pretty basic, but it worked and it was done in a fraction of the time of the others. Total tokens used: 33k Codex with GPT-5 At the end of the 30 minutes I just could not get Codex to produce a working app. It got stuck in a loop of not being able to set up Tailwind 4 and despite many, MANY, attempts, I ended up with a "failed to compile" error. Total tokens used: 102k Cursor Agent with GPT-5 This was the slowest agent by far and a couple of times I actually thought it got stuck in a loop and was close to Ctrl+C'ing to cancel it. The TUI is really nice though, especially how it shows diffs and it did eventually build a working app (after one or two slight errors that needed fixing) The demo was interactive and it had a very minimal design that looked bare but also a lot less like an "AI generated" app than the Opus 4.1 design. It also wasn't too chatty and just did what it needed to do! Code quality was on a par with Opus 4.1, but it did use 5.5x as many tokens to get there. Still cheaper than Opus on a direct comparison but not when you factor in a Claude Code Max subscription. Total tokens: 188k I'll be able to do a proper comparison and record some videos when I'm back from holiday but for now, Opus is still the more capable model out of the box and Claude Code is the more complete CLI product. 
It will be interesting to see how Cursor evolve their CLI though with commands and subagents because I think with GPT-5 they have a real shot at providing competition for Claude Code if they can optimise output to get similar quality with fewer tokens. Jump to 0:40 in the video to see the two apps. Which do you think is which? ;)

Ian Nuttall

194,949 views • 10 months ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results.

Let's talk about three techniques, in order of complexity, starting with the easiest one:

• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning

In-Context Learning

The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.

Indexing + In-Context Learning

Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context?

That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation.

Fine-tuning

Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data.

A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier.

Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model.

Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches.

I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.
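
The "Indexing + In-Context Learning" flow described above (embed passages, embed the question, retrieve the closest passages, put them in the prompt) can be sketched with the standard library alone. The word-count `embed` function is a deliberate simplification standing in for a real embedding model, the list scan stands in for a vector database, and the book passages are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_passages(question, passages, k=2):
    """Rank passages by similarity to the question and keep the best k."""
    query = embed(question)
    return sorted(passages, key=lambda p: cosine(query, embed(p)), reverse=True)[:k]


def build_prompt(question, passages, k=2):
    """Place only the retrieved passages in the prompt, not the whole book."""
    context = "\n".join(f"- {p}" for p in top_passages(question, passages, k))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}"


book = [
    "Mara hid the brass key under the lighthouse stairs.",
    "The storm lasted three days and flooded the harbor.",
    "Tobias sold his boat to pay for the journey north.",
]
question = "Where did Mara hide the key?"
prompt = build_prompt(question, book, k=1)
print(prompt)
```

Swapping `embed` for a learned embedding model and the sorted list for an approximate nearest-neighbor index changes the quality and speed, but the shape of the pipeline stays the same.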

Santiago

384,482 views • 3 years ago