AI coding agents hit a wall when codebases get massive. Even with 2M-token context windows, a 10M-line codebase needs roughly 100M tokens. The real bottleneck isn't just ingesting code; it's getting models to actually pay attention to all that context effectively.

976,161 views • 1 year ago • via X (Twitter)
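The post's figures imply about 10 tokens per line of code (10M lines, 100M tokens). A back-of-the-envelope sketch of that arithmetic, assuming that ratio; real ratios vary with the language, the tokenizer, and code style, and the helper names are hypothetical:

```python
# Back-of-the-envelope token math for the numbers in the post.
# ASSUMPTION: ~10 tokens per line, the ratio implied by "10M lines -> 100M tokens".
# Real ratios depend on the language, the tokenizer, and code style.
TOKENS_PER_LINE = 10


def tokens_needed(lines_of_code: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Estimate how many tokens it takes to feed an entire codebase to a model."""
    return lines_of_code * tokens_per_line


def windows_needed(total_tokens: int, window_tokens: int) -> int:
    """Number of full context windows required (ceiling division)."""
    return -(-total_tokens // window_tokens)


if __name__ == "__main__":
    total = tokens_needed(10_000_000)           # a 10M-line codebase
    windows = windows_needed(total, 2_000_000)  # with a 2M-token window
    print(f"{total:,} tokens, or {windows} full 2M-token windows")
```

Under that assumption the codebase is about 50 times larger than a single 2M-token window.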

11 comments

Garry Tan • 1 year ago

Full video

The Rundown AI • 1 year ago

If you're not learning AI in 2025, you're falling behind. Join 1,000,000+ early adopters and learn AI in just 5 minutes a day (for free).

Züri Bar Yochay • 1 year ago

No coding task needs the entire 10M-line repo in scope. You just need the files you’re touching and their dependency chain, maybe 3-5 layers deep. Big codebases naturally break into mostly independent islands, so a modest context window already covers almost everything that matters.
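Scoping a task to "the files you're touching and their dependency chain, 3-5 layers deep" amounts to a depth-limited breadth-first search over an import graph. A minimal sketch, assuming the graph is already available as a dict mapping each file to the files it imports; the file names and the `dependency_scope` helper are hypothetical:

```python
from collections import deque

# HYPOTHETICAL import graph: file -> files it imports. In practice this would
# come from a language-specific tool; the file names here are made up.
GRAPH = {
    "api/handlers.py": ["core/service.py", "util/log.py"],
    "core/service.py": ["core/models.py", "db/session.py"],
    "db/session.py": ["db/driver.py"],
    "db/driver.py": ["vendor/socket_pool.py"],
    "billing/invoice.py": ["core/models.py"],
}


def dependency_scope(graph: dict[str, list[str]], touched: list[str], max_depth: int = 3) -> set[str]:
    """Files being edited plus their transitive dependencies, up to max_depth hops away.

    Breadth-first search reaches each file at its shortest distance from the
    touched set, so the depth cut-off is applied correctly.
    """
    scope = set(touched)
    queue = deque((path, 0) for path in touched)
    while queue:
        path, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth limit
        for dep in graph.get(path, []):
            if dep not in scope:
                scope.add(dep)
                queue.append((dep, depth + 1))
    return scope


if __name__ == "__main__":
    for f in sorted(dependency_scope(GRAPH, ["api/handlers.py"], max_depth=3)):
        print(f)
```

With `max_depth=3` the scope is six files: `vendor/socket_pool.py` (four hops away) and the unrelated `billing/invoice.py` island are left out.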

Gavriel Cohen • 1 year ago

Why could you possibly need to have 10M lines of code in context? What kind of task would require simultaneously considering every line of the codebase? I don’t think I can hold more than 30 lines in context when I’m coding, but I can work on a project with millions of lines by navigating, selectively focusing, and using abstractions.

Yacine Mahdid • 1 year ago

Long context is pretty hard; last month I did a review of where the methods stand right now. Long story short: the bottleneck is self-attention, which isn’t easy to linearize without performance degradation, and long-context training data is scarce.
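Why self-attention is the bottleneck, as a quick count: standard softmax attention scores every token against every other token, so the work per head and per layer grows with the square of the sequence length. A minimal sketch that only counts query-key pairs, ignoring head dimension, number of heads and layers, and kernel tricks such as tiling (which save memory but still compute every pair); `attention_scores` is a hypothetical helper:

```python
# Standard (softmax) self-attention scores every query token against every key
# token, so each head in each layer computes seq_len * seq_len scores.
# Constant factors (head dimension, number of heads/layers) are ignored here.
def attention_scores(seq_len: int) -> int:
    """Query-key pairs scored per head, per layer."""
    return seq_len * seq_len


if __name__ == "__main__":
    for n in (128_000, 2_000_000, 100_000_000):
        print(f"{n:>11,} tokens -> {attention_scores(n):.1e} query-key scores")
    # Doubling the context quadruples the pairwise work.
    print(attention_scores(4_000_000) // attention_scores(2_000_000))
```

"Linearizing" attention means replacing this all-pairs computation with something whose cost grows roughly linearly in sequence length, which is where the quality trade-off mentioned above comes in.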

kohl • 1 year ago

AFAIK Claude Code and Codex don’t use or need indexing, nor do humans. Instead: Git history, tests, documentation, environment access, vision, and intent. The metadata … the why … is crucial. This is why labs are in the best position to win agentic SWE.

Lachlan Phillips exo/acc 👾 • 1 year ago

Collapsible code? It seems half of this issue is the linear nature of documents. Code should natively have a symbolic representation. You should be able to compress most of your codebase and only expand the relevant functions during specific queries.
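One way to read the collapsible-code idea, sketched for Python source only: parse the file, keep every function signature (and docstring), and expand bodies only for the functions a query needs. This uses the standard library `ast` module (Python 3.9+ for `ast.unparse`); `collapse_source` and the sample code are hypothetical, and a real tool would need multi-language support and a smarter way to decide what to expand:

```python
import ast

# Sketch of "collapsible code" for Python source: keep every function's
# signature (and docstring), but replace its body with `...` unless the
# function's name is in `expand`. Requires Python 3.9+ for ast.unparse.


def collapse_source(source: str, expand: set[str]) -> str:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name not in expand:
            doc = ast.get_docstring(node)
            body = [ast.Expr(ast.Constant(doc))] if doc else []
            body.append(ast.Expr(ast.Constant(...)))  # placeholder body: `...`
            node.body = body
    ast.fix_missing_locations(tree)  # give the new nodes line info
    return ast.unparse(tree)


# HYPOTHETICAL sample module used only to demonstrate the idea.
SAMPLE = '''
def load(path):
    """Read a config file."""
    with open(path) as f:
        return f.read()

def validate(cfg):
    if not cfg:
        raise ValueError("empty config")
    return True
'''

if __name__ == "__main__":
    # Only `validate` is expanded; `load` collapses to its signature and docstring.
    print(collapse_source(SAMPLE, expand={"validate"}))
```

The collapsed view keeps enough of the interface (names, parameters, docstrings) for a model to navigate, while the token cost of unexpanded bodies drops to a single `...` each.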

Dr. Bobby Gomez-Reino • 1 year ago

That is likely not the only way to approach it. Humans don't hold 10M lines of code in their attention to solve programming tasks.

geoff • 1 year ago

I have some ideas on how to fix this. They might be a little radical, but it kinda makes sense: if you can constrain the search space, then all of a sudden a big repo isn’t so big. I'm speaking as someone who researched this space with a repo measured in the hundreds of billions of tokens. I don’t know; to be proven shortly.

Moritz Wallawitsch • 1 year ago

Do human SWEs have a 10M-line context window?? Working memory is the real bottleneck.

TheOneCoder • 1 year ago

We don't need larger context; sure, it helps! But what we need is better tools. I treat my agents as gifted junior/mid-level engineers and assign them work accordingly, and I am getting really good results: small tasks, clear scope. My goal right now is to build an architect agent that orchestrates many different agents into a solution by giving each one a small piece of the pie with a clear goal, so you have multiple experts, each owning its part of the pie. Proper coding practices and modularized code will allow you to scale past 10 million lines!
