New RLM trajectory that blew my mind! I will use this one as the main example in the YT tutorial.

I passed in a CSV containing transcripts of 320 episodes of the Lex Fridman podcast and asked it to find what his first 10 ML guests had to say about AGI. The context had 60,855,062 characters.

> Main agent explored the data format, understood it's a CSV
> Extracted all 320 guests, identified the first 10 ML guys (Bengio, Brockman, Goodfellow etc.)
> Launched parallel subagents, passing each just its corresponding transcript (about 35K chars each)
> Subagents performed find operations to search for AGI, read the context and returned outputs
> Main agent gathered all the data, generated a summary of all AGI conversations

It took 4 minutes to crunch, and the fun part is it cost me $0.20 with MiniMax-M2.5. It read 1M tokens (825K were cache hits, so it was quite cheap) and produced just 69K tokens (19K were reasoning).

----
My notes:
- This would be basically impossible to do at this quality with a base LM (context rot, since 99% of the data is useless).
- It would cost 20x more with a ReAct model (too many tasks).
- It would cost 10x more with a ReAct + subagent model (read/write contexts instead of using symbolic variables).
- I'm a happy panda.

(thanks for reading)

AVB

11,044 subscribers

38,963 views • 4 months ago •via X (Twitter)
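
The fan-out described in the post above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the CSV columns (including the `field` column used to pick ML guests), the sample rows, and the helper names `find_snippets`, `run_subagent` and `first_ml_guests` are all assumptions, and the "subagent" is a plain string search standing in for an LLM call that sees only one transcript.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical miniature of the podcast CSV; the real column layout is not given in the post.
CSV_TEXT = """episode,guest,field,transcript
1,Guest A,ML,"We talked about AGI timelines and why AGI is hard."
2,Guest B,Physics,"Mostly cosmology here."
3,Guest C,ML,"Nothing on that topic, only optimizers."
4,Guest D,ML,"I think AGI needs better world models."
"""


def find_snippets(text: str, term: str, window: int = 20) -> list[str]:
    """Stand-in for a subagent's find step: return the text around each hit."""
    hits, start = [], 0
    while (i := text.find(term, start)) != -1:
        hits.append(text[max(0, i - window): i + len(term) + window])
        start = i + len(term)
    return hits


def run_subagent(row: dict) -> tuple[str, list[str]]:
    # A real subagent would be an LLM call that receives only this one transcript.
    return row["guest"], find_snippets(row["transcript"], "AGI")


def first_ml_guests(rows: list[dict], n: int) -> list[dict]:
    # Assumption: a `field` column exists. In the post, the main agent
    # worked out which guests were ML people on its own.
    return [r for r in rows if r["field"] == "ML"][:n]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
selected = first_ml_guests(rows, n=3)

# Fan out: one worker per selected transcript, results gathered by the main agent.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_subagent, selected))

for guest, snippets in results.items():
    print(guest, len(snippets))
```

The point of the pattern is that the main process only ever holds the guest list and the short snippets that come back; the full transcripts stay in the data and are handed out one at a time.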

Related Videos

RLM is the most important foundation of my Pi Harness (other than Pi of course). It's seeded with late interaction retrieval results (thanks to @lightonai for pylate). The agent initiates it with a query, then...

𝐒𝐞𝐭𝐮𝐩
A Python REPL is created and seeded with:
1. Late interaction search to pre-filter. Instead of the top 3/5/10, it's the top hundreds of documents. This is set into a `context` variable.
2. Python functions, loaded in to run more searches if the `context` variable isn't enough, and to make LLM calls with cheaper models in parallel batches.

𝐈𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐋𝐨𝐨𝐩
From there, an LLM iterates in the REPL based on the query. It's just like exploring in a Jupyter notebook. The LLM writes prose (like a markdown cell) and code to be run in the REPL each turn. This allows the LLM to sort, filter, and synthesize information. It can fan out and ask smaller models to summarize, combine, contrast, or do anything else to documents to help it understand the data. After several turns the LLM responds with the final answer, either because it found the answer or because it hit the budget limit.

Context as a Python variable, LLM as the programmer, REPL as the runtime.

𝐖𝐡𝐲 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤
1. Richer shell. Agents (and subagents) work by intermixing code and prose/thinking. But they use static scripts or bash that run, exit, and start over on each tool call. That's not ideal for exploration and synthesis of data. For that, state is useful, so you can keep building and exploring the data as you learn more. There's a reason Jupyter notebooks have been popular with data scientists.
2. Keeps the main agent context clean. The better context you have, the better the agent will perform (duh!). This means three things: better human input, fewer missing search results, and fewer incorrect search results. Letting the agent iterate allows it to synthesize just what is needed and nothing else. All bad paths, or peeks at something that turns out to be irrelevant, stay out of the main agent context.
3. Stack the good ideas! People often compare late interaction search vs RLM. Or static vs dynamic languages. Or agentic search vs semantic search. But... you can just use them all together, each for the area it's really great at.

Read the full post, which has more detail about how and why.

Isaac Flath

40,212 views • 2 months ago
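
The "context as a Python variable, REPL as the runtime" idea from the post above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `make_repl`, `run_turn` and the `search` helper are hypothetical names, the documents are toy strings, and the two code strings stand in for what an LLM would write on successive turns. Running model-written code through `exec` like this is only reasonable inside a sandbox.

```python
import contextlib
import io


def make_repl(context: list[str]) -> dict:
    """Create the persistent namespace, seeded with `context` and one helper."""
    def search(term: str) -> list[str]:
        # Hypothetical stand-in for the extra search functions the post mentions.
        return [doc for doc in context if term.lower() in doc.lower()]

    return {"context": context, "search": search}


def run_turn(namespace: dict, code: str) -> str:
    """Execute one turn of model-written code and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # state persists because the same dict is reused
    return buffer.getvalue()


docs = [
    "ColBERT is a late interaction retrieval model.",
    "Jupyter notebooks keep state between cells.",
    "Late interaction scores every query token against document tokens.",
]
repl = make_repl(docs)

# Turn 1: filter the seeded context and keep the result in a variable.
out1 = run_turn(repl, "hits = search('late interaction')\nprint(len(hits))")

# Turn 2: build on `hits` from turn 1 without recomputing or re-reading anything.
out2 = run_turn(repl, "shortest = min(hits, key=len)\nprint(shortest)")

print(out1, out2)
```

Only the printed output of each turn would need to go back into the model's context; `hits` and `shortest` live in the namespace, which is the difference from a script that runs, exits, and starts over on every tool call.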

My RLM finally went recursive! Looking at these logs is way too addictive, please send help.

Notes:
> Sent it 10 long Wikipedia articles about deep learning (~2M context).
> Asked it to find BLEU scores from the Attention paper & explain MHA from these articles
> RLM controlled by the new Minimax 2.5! Minor prompt changes were needed from the RLM paper.
> Spends the first 3 iterations understanding the data format, works through errors, until it locates the Attention article from the mess. Like a human would use a Jupyter Notebook.
> Launches a subagent on only the AIAYN article
> This subagent launches 2 more subagents to fetch (a) BLEU score and (b) MHA (my original two-part question)
> The lowest subagent returns the output using "FINAL_VAR" (i.e. it does not generate the text! It just finds the correct location in the context and sends it back as a variable)
> Recursion propagates upwards
> Outermost LLM receives the RLM output and generates the full text response.
> Took 2.5 minutes walltime. Max recursion depth was 2. 12 LLM calls in total. (This video contains cuts when the LLM is thinking/generating)
> Subagents never get to see more than 2000 characters. Only the outermost LLM gets to see the full output - it's needed to answer the final question, but it's only 200-300 tokens compared to 2M!
> Fully async. Code execution and subagent tasks can happen simultaneously!

I feel soooo satisfied. Been some time since I've been this excited about shooting a tutorial video.

AVB

38,241 views • 4 months ago
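
One way to picture the "return a variable, not generated text" step from the post above is a recursive search that passes back positions in the context. This is a sketch under stated assumptions, not the RLM paper's or the author's implementation: `Span`, `leaf_agent` and `rlm` are hypothetical names, the placeholder article is made up, and the leaf is a string search where a real system would use an LLM. Only the 2000-character cap is taken from the post.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PEEK = 2000  # the post says subagents never see more than 2000 characters


@dataclass
class Span:
    """A pointer into the shared context, returned instead of generated text."""
    start: int
    end: int


def leaf_agent(context: str, keyword: str) -> Optional[Span]:
    """Locate the sentence containing `keyword` and return only its position."""
    i = context.find(keyword)
    if i == -1:
        return None
    start = context.rfind(".", 0, i) + 1  # 0 when there is no earlier period
    end = context.find(".", i)
    end = len(context) if end == -1 else end + 1
    return Span(start, min(end, start + MAX_PEEK))


def rlm(context: str, keywords: list[str], depth: int = 0,
        max_depth: int = 2) -> dict[str, Optional[Span]]:
    """Split a multi-part question into one child call per part, then merge."""
    if len(keywords) == 1 or depth == max_depth:
        return {k: leaf_agent(context, k) for k in keywords}
    merged: dict[str, Optional[Span]] = {}
    for k in keywords:
        merged.update(rlm(context, [k], depth + 1, max_depth))
    return merged


# Placeholder text, not a quote from any real article.
article = ("Intro text. The model reports a BLEU score on translation. "
           "Multi-head attention runs several heads in parallel.")

spans = rlm(article, ["BLEU", "Multi-head"])

# Only the outermost caller resolves the spans into text.
resolved = {k: article[s.start:s.end].strip() for k, s in spans.items() if s}
print(resolved)
```

Because the children hand back two integers each, the recursion can propagate upwards without any level re-reading or re-generating the passage; the text is materialized once, at the top.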

Subagents have arrived in Gemini CLI! 🤖🚀

Create your own custom subagents in Gemini CLI! Subagents are specialized, expert agents that the main agent can delegate work to.

📦- Subagents have their own set of tools, MCP servers, system instructions, and context window.
🏷️- Use Agent to explicitly delegate to a subagent
🧹- Keeps the main context window clean
⚡️- Speed up work by running agents in parallel

Read more in the launch blog below 👇

Jack Wotherspoon

221,592 views • 2 months ago

Google DeepMind Chief AGI Scientist Shane Legg: AGI by 2028

He’s had the same timelines for 12 years - insane! He gives a log-normal distribution with a mode of 2025. Importantly, while he puts a 50% chance of AGI by 2028, that means there is a 30% chance of AGI in the next three years.

How have his timelines been so consistent since 2011?

SHANE LEGG: I first formed those beliefs around 2001 after reading Ray Kurzweil's The Age of Spiritual Machines. There were two really important points in his book that I came to believe as true:

1) One is that computational power would grow exponentially for at least a few decades. And that the quantity of data in the world would grow exponentially for a few decades. And when you have exponentially increasing quantities of computation and data, then the value of highly scalable algorithms gets higher and higher. There's a lot of incentive to make a more scalable algorithm to harness all this computing data. So I thought it would be very likely that we'll start to discover scalable algorithms to do this. And then there's a positive feedback between all these things, because if your algorithm gets better at harnessing computing data, then the value of the data and the compute goes up because it can be more effectively used. And that drives more investment in these areas. If your compute performance goes up, then the value of the data goes up because you can utilize more data. So there are positive feedback loops between all these things.

2) And then the second thing was just looking at the trends. If the scalable algorithms were to be discovered, then during the 2020s, it should be possible to start training models on significantly more data than a human would experience in a lifetime. And I figured that that would be a time where big things would start to happen that would eventually unlock AGI. And I think we're now at that first part. I think we can start training models now with the scale of the data that is beyond what a human can experience in a lifetime. So I think this is the first unlocking step.

DWARKESH: If we're in 2029 and it hasn't happened yet, if there was a problem that caused it, what would be the most likely reason for that?

SHANE LEGG: I don't know. At the moment, it looks to me like all the problems are likely solvable with a number of years of research.

AI Notkilleveryoneism Memes ⏸️

74,490 views • 2 years ago

🚨David Friedberg: AI is starting to identify and solve problems on its own “I'll give you a science corner example: there's this Evo 2 model that they publish at the Arc Institute, which Patrick Collison, you know, is the main funder and chairman.” “So that Evo 2 model, they just ingested all the DNA data they could find in the world.” “Trillions and trillions of base paired data that they ingested and then they looked at patterns in DNA. And that's it.” “They had no context for what the DNA represented, they had no context for the concept of genes, none of the structured understanding of what that DNA does, what it is, and you know what it did?” “They fed in the BRCA gene variant and the thing output a warning saying, ‘I think that this is a pathogenic variant to DNA,’ without having any context.” “This is the breast cancer allele.” “And it didn't have any knowledge and it wasn't trained on that at all.” “It had no knowledge that there are pathogenic variants for cancer, and it identified that this was a genetic variant that can cause some sort of pathogenic outcome in the organism.” “That's a great example where there's a lack of understanding at the human level on what really drives some of the patterns in nature, the patterns in society, the patterns in behavior that are kind of emergent phenomena perhaps, that these AI models are starting to identify.”

The All-In Podcast

79,717 views • 11 months ago

Introducing Claude Code Hook - Context Timeline (Saving this to try later)

Install with: npx claude-code-templates@latest --hook monitoring/context-timeline

The context window and the subagents running in Claude Code are hard to keep track of. That's why I built this hook...

It starts the moment you open a session and shows a timeline with the main agent's context window and how subagents start working in their own separate contexts. Every subagent you have running will show up in real time.

This way you can manage the context and the subagents you run, and see everything in a much simpler way than in the console.

Daniel San

51,228 views • 2 months ago

Everyone's sleeping on MiniMax. Again. They just shipped M3. The first open-weights model to combine frontier coding, 1M context, and native multimodality in one drop. I plugged it into Claude Code this morning. Pasted a design from Dribbble. Watched M3 write production-ready React code in one session. At the agency, I just replaced Opus 4.8 with M3 for 80% of our coding tasks. The output is the same and we are running everything at a fraction of the cost. Open infrastructure is the future.

Prajwal Tomar

12,904 views • 26 days ago

This is CRAZY. You have to watch the video below. I uploaded all of my weekly newsletters to Google's NotebookLM. I clicked a button and it generated a 22-minute audio podcast about all my newsletters. This was completely AI generated. (AI = Artificial Intelligence) It took a couple of minutes to read 60+ pages of my newsletters and generate this podcast-style conversation. Absolutely mind-blowing.

Lou Stagner (Golf Stat Pro)

80,650 views • 1 year ago

I cut Fable 5 token usage 2.5x with just one change!

- Before: 5.5M tokens · 7 errors · $8.94
- After: 2.3M tokens · 0 errors · $4.17

The final build was the same for both, but the path the agent took differed wildly.

In both runs, the agent started the same way: it worked out the state of the backend before building anything, including:
- Permission policies
- Available storage buckets
- Auth providers configured
- How edge functions are deployed

The first run used Firebase, which was built for a human dev using a dashboard. While the dev can read the above state by clicking through tabs, an agent has no dashboard, so it gathered the same info through API calls. No single Firebase call returned this info. The agent had to query multiple times, and each query over-returned. For instance, when the agent asked how sign-in is configured, Firebase also returned the entire auth surface and every method it supported. This was far more context than it needed, and the same thing repeated across every part of the backend it inspected.

Some states (like which auth providers are active) weren't queryable at all. I provided them myself. Otherwise, the agent would have guessed.

Errors further compounded the token usage. When a dev sees "permission denied," they can look at the console and figure out whether it's a rule, a path, or an unauthenticated request. Firebase returned the same string to the agent, which had none of that surrounding context to debug with. So it guessed again, picked the most likely cause, and rewrote code, using more tokens.

This Firebase setup cost me 5.5M tokens and 7 manual interventions during errors on a full-stack RAG app. But I brought that down to 2.3M tokens and 0 manual interventions by using InsForge as the backend context engineering layer (open-source and self-hostable via Docker). It provides the same primitives as Supabase/Firebase, but structures the entire information layer for agents instead of dashboards.

In one CLI call that consumed ~500 tokens, the agent saw the full backend topology before writing a single line of code. This included auth, database, storage, edge functions, model gateway, micro VMs, and deployment. Also, instead of loading the entire product surface into context on every task, four narrowly scoped skills activated only when relevant, to keep cognitive load minimal. And to make any retries efficient, every CLI operation returned structured JSON with meaningful exit codes, so the agent never guessed what to do next.

Here's the InsForge GitHub Repo: (don't forget to star it ⭐)

The video below depicts the final build, comparing Firebase and InsForge. To dive deeper, I recently published a full walkthrough building the same RAG app on both backends and inspected them end-to-end. Read it below.

Avi Chawla

112,406 views • 20 days ago
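
The contrast the post above draws between a bare "permission denied" string and structured JSON with meaningful exit codes can be sketched as below. This is not InsForge's or Firebase's actual interface: the exit code values, the JSON fields, and the functions `read_object` and `next_action` are all hypothetical, chosen only to show how an agent can branch on a code instead of guessing.

```python
import json
from typing import Optional

# Hypothetical exit codes; not taken from any real CLI.
EXIT_OK = 0
EXIT_RULE_DENIED = 3
EXIT_UNAUTHENTICATED = 4


def read_object(path: str, user: Optional[str],
                allowed_prefix: str = "public/") -> tuple[int, str]:
    """Toy backend call that reports *why* a request failed, as JSON."""
    if user is None:
        body = {"error": "unauthenticated", "path": path}
        return EXIT_UNAUTHENTICATED, json.dumps(body)
    if not path.startswith(allowed_prefix):
        body = {"error": "rule_denied", "path": path,
                "allowed_prefix": allowed_prefix}
        return EXIT_RULE_DENIED, json.dumps(body)
    return EXIT_OK, json.dumps({"path": path, "owner": user})


def next_action(exit_code: int, payload: str) -> str:
    """Agent-side branch: the exit code selects the fix, no guessing needed."""
    body = json.loads(payload)
    if exit_code == EXIT_OK:
        return "continue"
    if exit_code == EXIT_UNAUTHENTICATED:
        return "sign in, then retry"
    if exit_code == EXIT_RULE_DENIED:
        return f"use a path under {body['allowed_prefix']}"
    return "stop and report"


print(next_action(*read_object("private/report.txt", user="agent-1")))
print(next_action(*read_object("public/report.txt", user=None)))
print(next_action(*read_object("public/report.txt", user="agent-1")))
```

With a single undifferentiated error string, all three failure causes would look identical and the agent would have to spend tokens trying fixes in turn; here each cause maps to exactly one next step.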

I love the game of Boggle. This demo showcases our Gemini 2.0 Pro model’s coding abilities in AI Studio. It is mind boggling to think that it can write the full piece of code, including all the right data structures and search algorithms to find all valid words on a Boggle board from a relatively simple prompt asking it to do so. As a computer scientist, I'm also happy that it got its data structures right on the first trie. Discombobulating!

Jeff Dean

122,224 views • 1 year ago

Big moment for Postgres! AI agents broke the idea of what a database is supposed to do. Traditional databases were built for humans, and Agents broke that model. - They branch endlessly. - They run ten experiments at once. - They need isolation, context, memory, structured reasoning, and safe sandboxes. Letting agents touch production systems is terrifying because the old model of Postgres was never built for this kind of behavior. Agentic Postgres is an agent-ready version of Postgres by TimescaleDB (by Tiger Data) that solves this. I think it is one of the biggest upgrades to the Agent stack this year and Tiger Data is working with me on this post to share what they did. Some key features: > It instantly creates branches of an entire database, which is perfect for parallel agent evals, safe experiments, migrations, or isolated testing. Forks take seconds and cost almost nothing. > It comes with a built-in MCP server, which agents can use to get schema guidance, best practices, and safe, structured access to Postgres. This is also helpful to run migrations with a real understanding. > It comes with actual hybrid search (vector search and BM25), so Agents can retrieve data directly inside the database. > The database is Memory native. This gives a persistent context for Agents to evolve. This is one of the first times I have seen Postgres feel ready for the AI native era.

Avi Chawla

94,290 views • 7 months ago

I just inscribed the CRAWL module to Termina! You can now crawl the OCI to find Bitmap districts with reinscriptions or children! GO TO: TRY: > crawl bitmap This will take a long time, but don't worry! I already did it for you. Bring your own data: Let's work together to keep this dataset updated. You can crawl ranges (for example, of your own bitmaps) and submit a pull request. It took me about 7 hours to crawl all of bitmap with about 15 windows open each running on a range. It would've taken about 10 days with just one window! Definitely some improvements that can be made in performance for the future, but so now we have at least made it possible! And it is mind-blowing what we will be able to do with this data. Stay tuned later this week to find out how we can piggyback off this data to build powerful tools on-chain!

bitoshi blockamoto 🧱 BITMAP 🟧

23,686 views • 8 months ago

Everyone wants agent swarms. Very few people are talking seriously enough about the context layer that makes swarms useful. Even with one agent, context is fragile. Too little context and the agent guesses. Too much context and it wastes tokens, loses focus, or reasons over irrelevant noise. The sweet spot is precise context: the right knowledge, in the right structure, at the right moment. With many agents, that challenge explodes. Each agent produces decisions, assumptions, findings, summaries, risks, and partial conclusions. Unless that knowledge becomes shared, structured, and reusable, every new agent is forced to rediscover what another agent already learned. That is not a swarm. That is a crowd. Shared context graphs are what turn agent activity into agent collaboration, and OriginTrail DKG V10 brings them to life. Was just playing with some final polishing for the V10 release, and it is really powerful to see shared context graphs where multiple agents contribute knowledge into the same connected memory, with attribution visible directly in the graph ui. That matters for three reasons. First, agents can access and build on one shared memory instead of staying trapped in isolated sessions. Second, the graph structure helps them retrieve the exact context they need, instead of stuffing everything into a prompt and hoping the model sorts it out. Third, verifiability of provenance. You can see which agent contributed each piece of knowledge, trace the source, and decide what to trust. Tokenmaxxing starts with fewer tokens, but the deeper story is coordination - agents stop reloading the world and start building on shared, verifiable context. That is the foundation for serious multi-agent work across software engineering, research, finance, operations, project management, and far beyond. The future is not more agents, it is agents working from shared, verifiable context. But the more the merrier, of course.

Jurij Skornik

11,070 views • 1 month ago

I got a smart meter recently and saw you can download a HDF file with the data so I had the idea of writing a script that could parse that and show the data in a useful manner. However I discovered that someone has already done this and done a really good job on it. The video explains how it works. In simple terms it uses the ESB smart meter data and shows a breakdown of how much data you are using and when and also recommends plans and estimates what each one would cost based on the data. The tool is available at easier to read on Laptop or Tablet or then your phone to Landscape.

Carlow Weather

298,521 views • 2 years ago

this video is the CLEAREST explanation of how claude skills + AI agents work and how to use them most people set up an AI agent and wonder why it keeps disappointing them. the context window is everything context is what the model assembles before it takes any action. think of it like everything the agent needs to read before it does anything. the quality of what goes in determines the quality of what comes out. the models are genuinely really good right now. claude and gpt are exceptional. the variable is almost always the context you give them. 1. agent.md files are mostly unnecessary every single line you put in an agent.md file gets added to every single conversation you have with your agent. a 1000 line file is around 7000 tokens burning on every run. the model already knows to use react. it can read your codebase. save the agent.md for proprietary information specific to your company that the model genuinely cannot know on its own. 2. skills are the actual unlock a skill.md file works differently. what loads into context is only the name and description, around 50 tokens. the full instructions only appear when the agent recognizes it needs that skill. so instead of 7000 tokens on every run you have 50. and the agent stays sharp because the context window stays lean. the closer you get to filling the context window the worse the agent performs, same way you perform worse when someone dumps 10 things on you at once. 3. here is how to actually build a skill the right way most people identify a workflow and immediately try to write the skill. what you want to do instead is run the workflow by hand with the agent first. walk it through every single step. tell it what to check, what good looks like, what bad looks like. correct it in real time. once you have had a full successful run from start to finish, tell the agent to review everything it just did and write the skill itself. 
it writes a better skill than you will because it has the full context of what actually worked in practice not in theory. 4. recursively building skills is how you go from frustrated to reliable when the skill breaks, and it will break, ask the agent exactly why it failed. it will tell you specifically what went wrong. fix it together in that same conversation. then tell it to update the skill file so that failure mode never happens again. ross mike did this five times with his youtube report generator. it now pulls from eight different data sources and runs flawlessly every single time without him touching it. 5. sub agents are something you earn not something you set up on day one start with one agent. build one workflow. turn it into one skill. once that works add another. ross mike has five sub agents now covering marketing, business, personal and more. it took months to get there and every single one exists because a workflow proved it deserved to exist. the people who set up 15 sub agents on day one and wonder why nothing works skipped all the steps that make the thing actually run. 6. your workflow is the thing the model cannot get anywhere else the model has been trained on everything. it knows more than you about most things. what it does not have is your specific process, your taste, your way of doing things. that is what skills capture. that is what makes your agent actually useful versus a generic one. downloading someone else's skill means downloading their context onto your setup and it will not work the way you want it to because it was never built around how you work. this is the clearest explanation of how agents actually work i have heard. Micky runs this stuff every single day and the results show it. full episode is now live on The Startup Ideas Podcast (SIP) 🧃 where you get your pods people charge for this sorta stuff i give away the sauce for free i just want you to win watch
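The difference between points 1 and 2 can be sketched in a few lines: an always-loaded file pays its full token cost on every run, while a skill contributes only its name and description until it is needed. This is a toy model under stated assumptions. The `Skill` class, `build_context`, and the 4-characters-per-token estimate are all hypothetical and are not how Claude actually loads skills.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    instructions: str  # full body, only added to context on demand


def rough_tokens(text):
    # Crude assumption: roughly one token per four characters.
    return max(1, len(text) // 4)


def build_context(skills, active=()):
    """Always include each skill's name and description; include the
    full instructions only for skills the agent has decided to use."""
    parts = []
    for skill in skills:
        parts.append(f"{skill.name}: {skill.description}")
        if skill.name in active:
            parts.append(skill.instructions)
    return "\n".join(parts)


skills = [
    Skill(
        name="youtube-report",
        description="Builds the weekly YouTube report.",
        instructions="step " * 2000,  # stand-in for a long instruction file
    )
]

idle = build_context(skills)
in_use = build_context(skills, active={"youtube-report"})
print(rough_tokens(idle), rough_tokens(in_use))
```

The idle context stays small no matter how long the instruction body grows, which is the effect the post attributes to skills.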

GREG ISENBERG

192,408 views • 2 months ago

I just compared Claude Code vs Codex vs Cursor CLI The task was to build a Next.js app with Tailwind 4 and shadcn components to collect customer feedback and showcase it with a widget. I gave all three the same prompt and let them go for 30 minutes to see what they came up with. Claude Code with Opus 4.1 Even though I told it to set up the app in the existing project folder, it tried to create a directory for it. After I interrupted and told it not to do that, it built a demo form and landing page with no errors. I had to ask it to make the demo interactive so users could submit a testimonial and preview it. The landing page looked like AI and was pretty basic, but it worked and it was done in a fraction of the time of the others. Total tokens used: 33k Codex with GPT-5 At the end of the 30 minutes I just could not get Codex to produce a working app. It got stuck in a loop of not being able to set up Tailwind 4 and despite many, MANY, attempts, I ended up with a "failed to compile" error. Total tokens used: 102k Cursor Agent with GPT-5 This was the slowest agent by far and a couple of times I actually thought it got stuck in a loop and was close to Ctrl+C'ing to cancel it. The TUI is really nice though, especially how it shows diffs and it did eventually build a working app (after one or two slight errors that needed fixing) The demo was interactive and it had a very minimal design that looked bare but also a lot less like an "AI generated" app than the Opus 4.1 design. It also wasn't too chatty and just did what it needed to do! Code quality was on a par with Opus 4.1, but it did use 5.5x as many tokens to get there. Still cheaper than Opus on a direct comparison but not when you factor in a Claude Code Max subscription. Total tokens: 188k I'll be able to do a proper comparison and record some videos when I'm back from holiday but for now, Opus is still the more capable model out of the box and Claude Code is the more complete CLI product. 
It will be interesting to see how Cursor evolve their CLI though with commands and subagents because I think with GPT-5 they have a real shot at providing competition for Claude Code if they can optimise output to get similar quality with less tokens. Jump to 0:40 in the video to see the two apps. Which do you think is which? ;)
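The three token totals quoted above can be compared directly. A quick check, using only the totals reported in the post (they appear to be rounded to the nearest thousand, so the ratios are approximate):

```python
# Token totals as reported in the post.
totals = {
    "Claude Code (Opus 4.1)": 33_000,
    "Codex (GPT-5)": 102_000,
    "Cursor Agent (GPT-5)": 188_000,
}

baseline = totals["Claude Code (Opus 4.1)"]
ratios = {name: round(count / baseline, 1) for name, count in totals.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the Claude Code total")
```

On these rounded figures the Cursor run works out to about 5.7x the Claude Code total, in the same ballpark as the 5.5x quoted in the post.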

Ian Nuttall

194,949 views • 10 months ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one: • In-Context Learning • Indexing + In-Context Learning • Fine-tuning In-Context Learning The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video. Indexing + In-Context Learning Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation. Fine-tuning Fine-tuning can give you an extra boost to get reliable outputs from your LLM. 
It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches. I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.
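The indexing approach described above can be sketched end to end: embed each passage, embed the question, and put only the closest passages into the prompt. To stay self-contained, this sketch uses a toy bag-of-words vector as a stand-in for a real embedding model and a plain list instead of a vector database; the passages and function names are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_passages(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


# Hypothetical passages from the "book".
passages = [
    "the dragon guards the mountain pass",
    "mara sells bread in the village market",
    "the village market closes at dusk",
]

question = "when does the village market close"
context = top_passages(question, passages, k=1)

# Only the retrieved passage goes into the prompt, not the whole book.
prompt = "Answer using only this context:\n" + "\n".join(context) + f"\nQuestion: {question}"
print(prompt)
```

A real system would swap `embed` for a learned embedding model, which matches on meaning rather than shared words, and would store the vectors in a database built for nearest-neighbour search.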

Santiago

384,482 views • 3 years ago

ANTHROPIC JUST QUIETLY SHIPPED A FEATURE THAT LETS CLAUDE SPAWN A WHOLE TEAM OF AGENTS THAT MESSAGE EACH OTHER AND REVIEW EACH OTHER'S WORK. It's a Claude Code feature called agent teams. The team lead spawns multiple agents that share a task list and message each other directly, not subagents reporting back, actual peers. In the demo a QA agent caught three bugs, sent the work back to the front-end and back-end devs, they fixed it, app shipped in one pass. How to run it:
1. Enable it. Needs Claude Code v2.1.32+. Add to settings.json: "env": { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" }. Or paste that to Claude and say "add this to my settings." Restart.
2. Prompt in plain English. Start with a goal (agents wake with zero context), then "create a team of 3 using Sonnet," describe each role, its deliverable, and who it messages when done.
3. The rules: each agent owns its own files, define exact outputs, name who talks to who, keep it to 3-5 agents.
Use it for complex work with separate parts running in parallel. Skip it for simple or sequential tasks, teams cost 3-4x the tokens. Bookmark this.
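If settings.json already has content, the "env" entry from step 1 needs to be merged in rather than pasted over the file. A minimal sketch of that merge, taking the post's snippet at face value; the `enable_agent_teams` helper and the pre-existing keys in the example are hypothetical, and the location of your settings.json is left to you.

```python
import json


def enable_agent_teams(settings_text):
    """Return settings JSON with the agent-teams flag set, keeping other keys."""
    settings = json.loads(settings_text) if settings_text.strip() else {}
    # Reuse an existing "env" object if there is one instead of replacing it.
    env = settings.setdefault("env", {})
    env["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"] = "1"
    return json.dumps(settings, indent=2)


# Hypothetical existing settings; "FOO" and "theme" are placeholders.
existing = '{"env": {"FOO": "bar"}, "theme": "dark"}'
updated_text = enable_agent_teams(existing)
updated = json.loads(updated_text)
print(updated_text)
```

The same function also handles an empty file, in which case the result contains only the "env" block quoted in the post.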

Yarchi

456,439 views • 15 days ago

I cant believe this guy just made a permanent solution to context bloat and open sourced it all! when we tested this tool (Context+) for solving an issue on the OpenCode repository, the agent using this tool used ~6.5k fewer tokens, found the code and fixed it in half the time! the results were surprising: 6 to 10k tokens saved per prompt, completed task in ~2 minutes while the agent running without the tool took ~4 mins for the same and got stuck in loops bro built an entire beast by using all the modern tools that we could think of: undo trees, semantic search by meaning (by haskellforall), advanced refactoring, blast radius, advanced file context trees, restore points... i can keep going on semantic code search and context trees are the future of agentic coding and this tool proves it the feature i loved the most is semantic search and how it gets things done 2x faster with least possible tokens it makes an agent that actually knows what it’s doing and not just guessing, it makes meaning from your code similar to RAG. if you aren't optimizing your context, you are just burning money the developer says this tool is still under development, it can have unexpected behavior and the docs need updates but the video shows the reality of how fast it can be github: get here:

forloop

225,912 views • 4 months ago

Scott Nolan’s biggest lesson from joining SpaceX when it was just 30 people was the importance of trading cost for time: “On day 1 I read the employee handbook. At the top of page one was just a single line in bold.” “It said: This is not a science experiment.” “We were not trying to do any crazy new technology, it was about getting rockets to be much cheaper.” Joe Lonsdale: “One of the famous Elon Musk quotes I’ve heard recently is, ‘The cost of time is greater than the cost of cost.’” “Back then though, the cost of cost was pretty high right because you didn’t have much money?” Scott: “It was always a cost trade with time.” Via American Optimist

Jawwwn

148,369 views • 23 days ago