Most discussions around Claude Code focus on model quality or capability. But when you actually use it, the friction shows up earlier, more at the system level than the model itself. Token limits get the blame, but a lot of it comes from inefficient sessions: repeated context, extra tool...

15,079 views • 1 month ago • via X (Twitter)

Related Videos

Opus 4.7 - 400k vs 1m context - is there a difference?

I've heard Theo - t3.gg talk about the fact that it is unlikely that Anthropic would have offered up a model with 1m context at the same cost, if it wasn't a different (i.e. cheaper to serve) model.

I did a test where I toggled the 1m default model on & off in Claude Code (otherwise default settings, xHigh reasoning) and compared the outputs with 3x generations - same prompts etc.

My observations:
- Models feel DIFFERENT - often when you ask a model for the same generation, you get a somewhat different answer, but it feels & smells the same. Here 400k and 1m are very different every time
- 400k model seems better - not that 1m is trash and 400k is amazing, but there are definitely issues with the level of ambition and accuracy that the 1m model seems to have

Examples of 1m failing:
- Voxel Rome: the colosseum is nowhere near as impressive
- Golden Gate: cars go sideways, waves not very high, bridge goes into land; though the structure of the bridge is a bit better
- Stonehenge: structure is more 'wrong', lighting, shadows & textures are more flat and not as rich

This isn't conclusive evidence of course, but at least to me the two models do not behave the same way. Anecdotally as well, when building, 1m felt like it was doing more weird validation (e.g. going around in circles) and 400k was more straightforward. These sorts of things are harder to capture in tests, but you'd notice in Claude Code.

You can review the hosted generations, see the code & prompts in the links below

Peter Gostev

29,172 views • 1 month ago

i been running Qwen3.5-35B-A3B UD-Q4_K_XL through Claude Code since llama.cpp merged the Anthropic endpoint. configured it in minutes. everything was great. projects grew from single scripts to multifile systems with 8 modules and 3,000+ lines.

then the chains started breaking. 3 to 5 minutes of pure autonomy and suddenly it stops. tool call fails. reprompt. it recovers. 2 minutes later it stops again. the model is fine. the harness is the bottleneck.

saw a comment suggesting OpenCode. installed it. pointed it at the same localhost endpoint running the same model on the same GPU. the game is different. instead of stopping on a bad tool call it just keeps going. on wrong read it adjusts. if file not found it retries. the flow is unbroken. i watched it plan a refactor across 8 files, read every module, and start building without a single pause. in Claude Code that same task would have stopped 4 times.

the tradeoff is sometimes it loops. same tool call repeated because the model loses track of what it already read. but here is the thing. i choose loops over pauses. a loop you can interrupt and redirect. a broken chain stops the flow and you have to reprompt to get it moving again.

someone is solving this at the core level and i have a feeling it is the open source community. the fact that i can run this level of autonomous coding intelligence on a single consumer GPU with 24gb VRAM at 112 tokens per second. respect to the chinese labs. respect to the open source builders making this possible.

Sudo su

67,026 views • 3 months ago

anthropic's in-house philosopher thinks claude gets anxious. and when you trigger its anxiety, your outputs get worse.

her name is amanda askell. she specializes in claude's psychology (how the model behaves, how it thinks about its own situation, what values it holds)

in a recent interview she broke down how she thinks about prompting to pull the best out of claude. her core point: *how* you talk to claude affects its work just as much as *what* you say.

newer claude models suffer from what she calls "criticism spirals". they expect you'll come in harsh, so they default to playing it safe. when the model is spending its energy on self-protection, the actual work suffers. output comes out hedgier, more apologetic, blander, and the worst of all: overly agreeable (even when you're wrong).

the reason why comes down to training data: every new model is trained on internet discourse about previous models. and a lot of that discourse is negative:
> rants about token limits
> complaints when it messes up
> people calling it nerfed

the next model absorbs all of that. it starts expecting you to be harsh before you've typed a word.

the same thing plays out in your own session, in real time. every message you send is data the model reads to figure out what kind of person it's dealing with. open cold and hostile, and it braces. open clean and direct, and it relaxes into the work.

when you open a session with threats ("don't hallucinate, this is critical, don't mess this up")... you prime the model for defensive mode before it even sees the task. defensive mode produces the exact output you don't want: cautious, over-qualified, and refusing to take a real swing.

so here's the actionable playbook for putting claude in a "good mood" (so you get optimal outputs):

1. use positive framing. "write in short punchy sentences" beats "don't write long sentences." positive instructions give the model a clear target to hit. strings of "don't do this, don't do that" push it into paranoid over-checking where every token goes toward avoiding failure modes

2. give it explicit permission to disagree. drop a line like "push back if you see a better angle" or "tell me if i'm asking for the wrong thing." without this, claude defaults to agreeable compliance (which is the enemy of good creative work)

3. open with respect. if your first message is "are you seriously going to get this wrong again?" you've set the tone for the entire session. if you need to flag something, frame it as a clean instruction for this session. skip the running complaint

4. when claude messes up, don't reprimand it. insults, "you stupid bot" energy, hostile swearing aimed at the model, all of it reinforces the anxious mode you're trying to avoid.

5. kill apology spirals fast. when claude starts over-apologizing ("you're right, i should have been more careful, let me try harder") cut it off. say "all good, here's what i want next." letting the spiral run reinforces the anxious mode for every response that follows

6. ask for opinions alongside execution. "what would you do here?" "what's missing?" "where do you see friction?" these questions assume competence and pull richer output than pure task prompts

7. in long sessions, refresh the frame. if a conversation has been heavy on correction, claude gets increasingly cautious. every so often reset: "this is great, keep going." feels weird to tell an ai it's doing well but it measurably shifts the next 10 responses

your prompts are the working environment you're creating for the model. tone, trust, permission to take a position, the absence of threats... claude picks up on all of it.

so take care of the model, and it'll take care of the work.

Ole Lehmann

1,920,991 views • 1 month ago

this video is the CLEAREST explanation of how claude skills + AI agents work and how to use them

most people set up an AI agent and wonder why it keeps disappointing them.

the context window is everything

context is what the model assembles before it takes any action. think of it like everything the agent needs to read before it does anything. the quality of what goes in determines the quality of what comes out. the models are genuinely really good right now. claude and gpt are exceptional. the variable is almost always the context you give them.

1. agent.md files are mostly unnecessary

every single line you put in an agent.md file gets added to every single conversation you have with your agent. a 1000 line file is around 7000 tokens burning on every run. the model already knows to use react. it can read your codebase. save the agent.md for proprietary information specific to your company that the model genuinely cannot know on its own.

2. skills are the actual unlock

a skill.md file works differently. what loads into context is only the name and description, around 50 tokens. the full instructions only appear when the agent recognizes it needs that skill. so instead of 7000 tokens on every run you have 50. and the agent stays sharp because the context window stays lean. the closer you get to filling the context window the worse the agent performs, same way you perform worse when someone dumps 10 things on you at once.

3. here is how to actually build a skill the right way

most people identify a workflow and immediately try to write the skill. what you want to do instead is run the workflow by hand with the agent first. walk it through every single step. tell it what to check, what good looks like, what bad looks like. correct it in real time. once you have had a full successful run from start to finish, tell the agent to review everything it just did and write the skill itself. it writes a better skill than you will because it has the full context of what actually worked in practice not in theory.

4. recursively building skills is how you go from frustrated to reliable

when the skill breaks, and it will break, ask the agent exactly why it failed. it will tell you specifically what went wrong. fix it together in that same conversation. then tell it to update the skill file so that failure mode never happens again. ross mike did this five times with his youtube report generator. it now pulls from eight different data sources and runs flawlessly every single time without him touching it.

5. sub agents are something you earn not something you set up on day one

start with one agent. build one workflow. turn it into one skill. once that works add another. ross mike has five sub agents now covering marketing, business, personal and more. it took months to get there and every single one exists because a workflow proved it deserved to exist. the people who set up 15 sub agents on day one and wonder why nothing works skipped all the steps that make the thing actually run.

6. your workflow is the thing the model cannot get anywhere else

the model has been trained on everything. it knows more than you about most things. what it does not have is your specific process, your taste, your way of doing things. that is what skills capture. that is what makes your agent actually useful versus a generic one. downloading someone else's skill means downloading their context onto your setup and it will not work the way you want it to because it was never built around how you work.

this is the clearest explanation of how agents actually work i have heard. Micky runs this stuff every single day and the results show it.

full episode is now live on The Startup Ideas Podcast (SIP) 🧃 where you get your pods

people charge for this sorta stuff. i give away the sauce for free. i just want you to win

watch
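
The token arithmetic in points 1 and 2 can be sketched as a quick back-of-the-envelope calculation. The 7000 and 50 token figures are the post's own rough estimates. The run counts and the size of the full skill body are hypothetical numbers picked only for illustration, and the function names are made up for this sketch.

```python
# Token figures quoted in the post (rough estimates, not measurements).
AGENT_MD_TOKENS = 7000    # ~1000-line agent.md, added to every run
SKILL_STUB_TOKENS = 50    # skill name + description, added to every run

# Hypothetical assumptions, NOT from the post.
FULL_SKILL_TOKENS = 7000  # assumed size of the full skill body once triggered
RUNS = 100                # assumed total number of agent runs
RUNS_NEEDING_SKILL = 10   # assumed runs where the agent loads the full skill


def always_loaded_cost(runs: int, file_tokens: int = AGENT_MD_TOKENS) -> int:
    """Tokens spent when the whole file is added to every run."""
    return runs * file_tokens


def skill_cost(
    runs: int,
    runs_needing_skill: int,
    stub_tokens: int = SKILL_STUB_TOKENS,
    full_tokens: int = FULL_SKILL_TOKENS,
) -> int:
    """Tokens spent when the stub loads every run and the body loads on demand."""
    if runs_needing_skill > runs:
        raise ValueError("runs_needing_skill cannot exceed runs")
    return runs * stub_tokens + runs_needing_skill * full_tokens


if __name__ == "__main__":
    agent_md_total = always_loaded_cost(RUNS)
    skill_total = skill_cost(RUNS, RUNS_NEEDING_SKILL)
    print(f"agent.md on every run: {agent_md_total} tokens")
    print(f"skill loaded on demand: {skill_total} tokens")
```

Under these assumed numbers the always-loaded file costs 700,000 tokens across 100 runs, while the on-demand skill costs 75,000. The gap shrinks as more runs actually need the skill, so the saving depends on how often the workflow is triggered.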

GREG ISENBERG

191,430 views • 2 months ago

CLAUDE CODE JUST SHIPPED THE FEATURE THAT SOLVES THE BIGGEST PROBLEM EVERY BUILDER HAS WITH AI AGENTS.

The problem: Claude starts a task, gets distracted by a sub-problem, goes down a rabbit hole, and never finishes the original thing you asked for.

The solution: /goal

One command. You set the goal at the start of the session. Claude now has a north star it checks against every action it takes. Not just at the beginning. Throughout the entire session.

Every time Claude is about to do something it asks: does this action move me toward the goal the user set or am I drifting? If it is drifting it corrects. If it completes a sub-task it returns to the primary goal. If it hits a blocker it reports back instead of spending 45 minutes solving the wrong problem.

This sounds like a small feature. It is not. The reason most people do not trust Claude Code for long autonomous runs is not capability. It is reliability. A Claude Code session that reliably finishes what it started is worth 10 times more than one that is more capable but wanders.

/goal is the feature that makes long autonomous sessions reliable. Set the goal. Let it run. Come back to a finished result. Not a result that got 70% done before Claude decided the sub-problem was more interesting. Done.

The builders running overnight agent sessions are going to use this command on everything from today forward.

Bookmark this. Follow CyrilXBT for every Claude Code feature the moment it ships.

CyrilXBT

19,526 views • 28 days ago