MIT HANDED ITS DEEP LEARNING COURSE TO A FRONTIER-LAB ENGINEER FOR 68 MINUTES BECAUSE 90% OF PEOPLE SHIPPING AI CODE CAN'T EXPLAIN HOW THE MODEL ACTUALLY WORKS

This is Maxime Labonne. He runs post-training at Liquid AI and wrote the LLM Engineer's Handbook. MIT gave him the room to...

235,980 views • 1 month ago • via X (Twitter)

Related videos

this video is the CLEAREST explanation of how claude skills + AI agents work and how to use them

most people set up an AI agent and wonder why it keeps disappointing them.

the context window is everything

context is what the model assembles before it takes any action. think of it like everything the agent needs to read before it does anything. the quality of what goes in determines the quality of what comes out. the models are genuinely really good right now. claude and gpt are exceptional. the variable is almost always the context you give them.

1. agent.md files are mostly unnecessary

every single line you put in an agent.md file gets added to every single conversation you have with your agent. a 1000 line file is around 7000 tokens burning on every run. the model already knows to use react. it can read your codebase. save the agent.md for proprietary information specific to your company that the model genuinely cannot know on its own.

2. skills are the actual unlock

a skill.md file works differently. what loads into context is only the name and description, around 50 tokens. the full instructions only appear when the agent recognizes it needs that skill. so instead of 7000 tokens on every run you have 50. and the agent stays sharp because the context window stays lean. the closer you get to filling the context window the worse the agent performs, same way you perform worse when someone dumps 10 things on you at once.

3. here is how to actually build a skill the right way

most people identify a workflow and immediately try to write the skill. what you want to do instead is run the workflow by hand with the agent first. walk it through every single step. tell it what to check, what good looks like, what bad looks like. correct it in real time. once you have had a full successful run from start to finish, tell the agent to review everything it just did and write the skill itself. it writes a better skill than you will because it has the full context of what actually worked in practice, not in theory.

4. recursively building skills is how you go from frustrated to reliable

when the skill breaks, and it will break, ask the agent exactly why it failed. it will tell you specifically what went wrong. fix it together in that same conversation. then tell it to update the skill file so that failure mode never happens again. ross mike did this five times with his youtube report generator. it now pulls from eight different data sources and runs flawlessly every single time without him touching it.

5. sub agents are something you earn, not something you set up on day one

start with one agent. build one workflow. turn it into one skill. once that works add another. ross mike has five sub agents now covering marketing, business, personal and more. it took months to get there and every single one exists because a workflow proved it deserved to exist. the people who set up 15 sub agents on day one and wonder why nothing works skipped all the steps that make the thing actually run.

6. your workflow is the thing the model cannot get anywhere else

the model has been trained on everything. it knows more than you about most things. what it does not have is your specific process, your taste, your way of doing things. that is what skills capture. that is what makes your agent actually useful versus a generic one. downloading someone else's skill means downloading their context onto your setup and it will not work the way you want it to because it was never built around how you work.

this is the clearest explanation of how agents actually work i have heard. Micky runs this stuff every single day and the results show it.

full episode is now live on The Startup Ideas Podcast (SIP) 🧃 where you get your pods

people charge for this sorta stuff

i give away the sauce for free

i just want you to win

GREG ISENBERG

192,483 views • 2 months ago
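
A rough sketch of the token arithmetic in the post above: an always-loaded agent.md pays its full cost on every run, while a skill pays about 50 tokens for its name and description on every run and the full body only on runs that actually use it. The 7,000 and 50 token figures come from the post. The `Skill` class, the skill name, the body size, and the 10-in-100 usage rate are hypothetical values chosen for illustration, not how any particular harness accounts for context.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body_tokens: int          # full instructions, paid only on runs that use the skill (assumed size)
    header_tokens: int = 50   # name + description, figure quoted in the post


def always_loaded_cost(file_tokens: int, runs: int) -> int:
    """Tokens spent when a file is injected into every single run."""
    return file_tokens * runs


def skill_cost(skill: Skill, runs: int, runs_using_skill: int) -> int:
    """Header is paid on every run; the body only when the skill is triggered."""
    return skill.header_tokens * runs + skill.body_tokens * runs_using_skill


# Hypothetical skill whose full body is as large as the 1000-line agent.md.
report = Skill("youtube-report", "Build the weekly YouTube report", body_tokens=7000)

agent_md_total = always_loaded_cost(7000, runs=100)
skill_total = skill_cost(report, runs=100, runs_using_skill=10)

print(agent_md_total)  # 700000
print(skill_total)     # 75000
```

Under these assumed numbers the skill costs roughly a ninth of the always-loaded file. The gap narrows as the share of runs that trigger the skill grows.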

AI AGENTS 101 (58 minute free masterclass)

send this to anyone who wants to understand ai agents, claude skills, md files, how to get the most out of AI etc in plain english:

1. chat vs agents. chat models answer questions in a back and forth while agents take a goal, figure out the steps, and deliver a result
2. agents don't stop after one response. they keep running until the task is actually finished. no babysitting required
3. everything runs on a loop. they gather context, decide what to do, take an action, then repeat until done
4. the loop is the system. they look at files, tools, and the internet. decide the next step. execute and then feed that back into the next step. over and over until completion
5. the model is just one piece. gpt, claude, gemini are the reasoning layer. the key is model + loop + tools + context
6. mcp is how agents use tools. it connects things like browser, code, apis, and your internal software. once connected, the agent decides when to use them to get the job done
7. context beats prompt all day. you don't need to write perfect prompts. load your agent with context about your business, style, and goals and then simple instructions work
8. claude.md or agents.md is the onboarding doc. it tells the agent who it is, how to behave, what it knows, and what tools it can use. this gets loaded every time before it starts
9. memory.md is how it improves. agents don't remember by default. this file stores preferences, corrections, and patterns. you tell the agent to update it, and it gets better over time
10. skills + harnesses make it usable. skills are reusable tasks like writing, research, analysis. the harness is the environment like claude code or openclaw that runs everything. basically, different interfaces, same system underneath

this episode with remy on The Startup Ideas Podcast (SIP) 🧃 was one of the clearest ways of understanding a lot of the core concepts of ai agents

could be the best beginners course for ai agents

58 mins. all free. no advertisers.

i just want to see you build cool stuff. im rooting for you.

send to a friend

GREG ISENBERG

375,319 views • 3 months ago
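
The loop described in points 3 and 4 of the post above (gather context, decide, act, feed the result back, repeat until done) can be sketched in a few lines. This is a toy under stated assumptions: `toy_decide` is a hard-coded stand-in for the model, and the `TOOLS` table, the file name `notes.txt`, and the step budget are all hypothetical. A real harness would call an LLM and real tools (for example over MCP) at those two points.

```python
from __future__ import annotations

from typing import Callable


def toy_decide(goal: str, context: list[str]) -> tuple[str, str]:
    """Stand-in for the model: choose the next action from what is already in context."""
    if not any(c.startswith("files:") for c in context):
        return ("list_files", "")
    if not any(c.startswith("read:") for c in context):
        return ("read_file", "notes.txt")
    return ("finish", "summary of notes.txt")


# Hypothetical tools; each takes one string argument and returns an observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "list_files": lambda _arg: "files: notes.txt",
    "read_file": lambda path: f"read: contents of {path}",
}


def run_agent(goal: str, decide=toy_decide, max_steps: int = 10) -> tuple[str, list[str]]:
    context = [f"goal: {goal}"]                  # gather context
    for _ in range(max_steps):
        action, arg = decide(goal, context)      # decide the next step
        if action == "finish":
            return arg, context                  # task is done, stop looping
        observation = TOOLS[action](arg)         # take the action
        context.append(observation)              # feed the result into the next step
    raise RuntimeError("step budget exhausted before the agent finished")


result, trace = run_agent("summarize my notes")
print(result)
for line in trace:
    print(line)
```

The step budget is a design choice of this sketch: without it, a loop that "keeps running until the task is finished" has no stopping condition when the model never decides to finish.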

The creator of High Bandwidth Memory said something that reframes the entire AI investment thesis: AI equals memory (Save this).

Most people still think about AI hardware through a training lens. During training, the bottleneck is raw compute: GPUs stay near 100% utilization crunching through billions of gradient updates. Inference is a completely different problem. When a model generates a response, it produces tokens one at a time, and at every single step essentially all of the model's weights have to be read from memory into the processor to generate just one token. The GPU cores sit there, waiting for data to arrive. This is what engineers mean when they say inference is memory bound: the bottleneck is not how many calculations you can do per second but how fast you can move data from memory to the chip. Adding more GPUs does not fix a memory bandwidth problem. It just gives you more processors starving for the same data.

Modern LLMs use a KV cache, a data structure that stores the attention keys and values for the tokens already processed so the model does not have to recompute them from scratch on each step. The KV cache is what gives a model its memory of the conversation. It grows with every token, and for long documents or deep reasoning chains it can dwarf the model weights themselves in memory consumption. This means memory directly determines how long a context the model can hold, how many users you can serve simultaneously, how fast it responds and how cheaply you can run it. A memory constrained model is not just slower but qualitatively worse: it forgets earlier parts of the conversation, truncates context and hallucinates more because it literally cannot hold the relevant information long enough to use it.

The world now spends more on inference than training, and every ChatGPT query, every Claude document analysis, every API call is an inference workload. Inference economics (cost per token, latency, context length, concurrent users) are memory problems first and compute problems second. The companies that control memory bandwidth and supply are not suppliers to the AI trade; they are the AI trade.

Long Micron! Follow me, Melvin, for more AI, semis and the next big market themes.

Melvin

47,148 views • 5 days ago
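
A back-of-the-envelope sketch of the two claims in the post above: that the KV cache can outgrow the weights, and that decode speed is capped by memory bandwidth. Every number here is an assumption picked for illustration (a 7-billion-parameter model in 16-bit precision, 32 layers, 32 KV heads of dimension 128, a 32,768-token context, and 1 TB/s of memory bandwidth), not a measurement of any specific model or chip. The cache formula is the standard one for plain multi-head attention, with no grouped-query attention or cache quantization.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Keys and values (hence the factor 2) for every layer, head, and cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def min_seconds_per_token(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound at batch size 1: each new token needs one full pass over the weights."""
    return weight_bytes / bandwidth_bytes_per_s


# Assumed model shape and hardware (illustrative only).
weights = 7_000_000_000 * 2                      # 7B parameters at 2 bytes each
kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_768)
seconds = min_seconds_per_token(weights, 1e12)   # 1 TB/s memory bandwidth

print(f"weights:  {weights / GIB:.2f} GiB")
print(f"KV cache: {kv / GIB:.2f} GiB for one 32,768-token sequence")
print(f"decode floor: {seconds * 1000:.1f} ms per token, so at most {1 / seconds:.0f} tokens/s")
```

With these assumed values the cache for a single long sequence (16 GiB) is already larger than the weights (about 13 GiB), and the bandwidth floor works out to 14 ms per token regardless of how much compute sits idle.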