
AI coding agents hit a wall when codebases get massive. Even with 2M token context windows, a 10M line codebase needs 100M tokens. The real bottleneck isn't just ingesting code - it's getting models to actually pay attention to all that context effectively.
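The arithmetic behind those figures can be sketched in a few lines. The ratio of 10 tokens per line is only what the post's own numbers imply (100M tokens for 10M lines), not a measured value, and the function names are hypothetical.

```python
import math

def tokens_needed(lines: int, tokens_per_line: float) -> int:
    """Estimate total tokens for a codebase of the given size."""
    return round(lines * tokens_per_line)

def windows_needed(total_tokens: int, window_tokens: int) -> int:
    """Number of full context windows required to hold every token once."""
    return math.ceil(total_tokens / window_tokens)

# Assumption: 10 tokens per line, the ratio implied by the post.
total = tokens_needed(10_000_000, 10)
windows = windows_needed(total, 2_000_000)
print(total, windows)
```

Under that assumption, the whole codebase is about 50 times larger than a single 2M-token window.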

976,161 views • 11 months ago • via X (Twitter)

11 Comments

Garry Tan • 11 months ago

Full video

The Rundown AI • 1 year ago

If you're not learning AI in 2025, you're falling behind. Join 1,000,000+ early adopters reading and learn AI in just 5 minutes a day (for free).

Züri Bar Yochay • 11 months ago

No coding task needs the entire 10M-line repo in scope. You just need the files you’re touching and their dependency chain, maybe 3-5 layers deep. Big codebases naturally break into mostly independent islands, so a modest context window already covers almost everything that matters.
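One way to read this comment is as a depth-limited breadth-first walk over a dependency graph. The sketch below assumes the graph is already available as a plain dictionary. The file names and the `files_in_scope` helper are hypothetical, and building the graph for a real repo is out of scope here.

```python
from collections import deque

def files_in_scope(deps: dict[str, list[str]], touched: list[str],
                   max_depth: int = 3) -> set[str]:
    """Return the touched files plus dependencies up to max_depth layers away."""
    seen = set(touched)
    queue = deque((path, 0) for path in touched)
    while queue:
        path, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for dep in deps.get(path, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, depth + 1))
    return seen

# Hypothetical toy graph with two mostly independent "islands".
deps = {
    "billing/api.py": ["billing/models.py", "common/db.py"],
    "billing/models.py": ["common/db.py"],
    "common/db.py": ["common/config.py"],
    "search/index.py": ["search/tokenizer.py"],
}
scope = files_in_scope(deps, ["billing/api.py"], max_depth=2)
print(sorted(scope))
```

In this toy graph, touching `billing/api.py` never pulls in anything from the `search/` island, which is the point the comment makes about independent parts of a large codebase.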

Gavriel Cohen • 11 months ago

Why could you possibly need to have 10M lines of code in context? What kind of task would require simultaneously considering every line of the codebase? I don’t think I can hold more than 30 lines in context when I’m coding but I can work on a project with millions of lines by navigating, selectively focusing and using abstractions.

Yacine Mahdid • 11 months ago

Long context is pretty hard. I did a review last month of where the methods are right now. Long story short, the bottleneck is self-attention, which isn’t easy to linearize without performance degradation, plus scarce long-context training data.
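To make the self-attention point concrete: standard (full) self-attention compares every token with every other token, so the score matrix for one head in one layer has n × n entries. The sketch below only counts those entries. It ignores heads, layers, and any kernel-level optimizations, and the 2M and 100M figures are taken from the post above.

```python
def attention_scores(n_tokens: int) -> int:
    """Entries in the n x n score matrix of full self-attention (one head, one layer)."""
    return n_tokens * n_tokens

window = attention_scores(2_000_000)      # a 2M-token context window
codebase = attention_scores(100_000_000)  # the 100M tokens cited in the post
ratio = codebase // window
print(ratio)
```

A context 50 times longer means 2,500 times as many pairwise scores, which is why linear-cost alternatives are attractive despite the quality trade-offs the comment mentions.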

kohl • 11 months ago

AFAIK Claude Code and Codex don’t use or need indexing, nor do humans. Git history, tests, documentation, environment access, vision and intent. The metadata … the why … is crucial. This is why labs are in the best position to win agentic SWE.

Lachlan Phillips exo/acc 👾 • 11 months ago

Collapsible code? Seems that half of this issue is the linear nature of documents. Code should be able to be represented symbolically natively. You should be able to compress most of your codebase and only expand relevant functions during specific queries.
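A rough illustration of the "collapse, then expand on demand" idea, using Python's standard `ast` module on Python source. This is a minimal sketch rather than a description of how any existing tool works. The `Collapser` class and `collapse` function are hypothetical names, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

class Collapser(ast.NodeTransformer):
    """Replace function bodies with `...` unless the function is in `expand`."""

    def __init__(self, expand):
        self.expand = set(expand)

    def visit_FunctionDef(self, node):
        if node.name not in self.expand:
            # Keep the signature, drop the body.
            node.body = [ast.Expr(value=ast.Constant(value=...))]
        else:
            self.generic_visit(node)  # still collapse nested helpers
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

def collapse(source: str, expand=()) -> str:
    """Return `source` with every function collapsed except those named in `expand`."""
    tree = Collapser(expand).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

source = "def a(x):\n    return x + 1\n\ndef b(y):\n    return a(y) * 2\n"
collapsed = collapse(source, expand={"b"})
print(collapsed)
```

Here only `b` keeps its body, while `a` is reduced to its signature, so a query about `b` still sees which helpers exist without paying for their full text.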

Dr. Bobby Gomez-Reino • 11 months ago

That is likely not the only way to approach it. Humans don't pay attention to 10M lines of code to solve programming tasks.

geoff • 11 months ago

I have some ideas for how to fix this. They might be a little radical, but it kind of makes sense: if you can constrain the search space, then all of a sudden a big repo isn’t so big. Speaking as someone who researched this space with a repo measured in the hundreds of billions of tokens. I don’t know. To be proven shortly.

Moritz Wallawitsch • 11 months ago

Do human SWEs have a 10M line context window?? Working memory is the real bottleneck.

TheOneCoder • 11 months ago

We don't need larger context, though sure, it helps! What we need is better tools. I treat my agents as gifted junior/mid-level engineers and assign them work accordingly, and I am getting really good results: small tasks, clear scope. My goal right now is to build an architect agent that orchestrates many different agents into a solution by giving each a small piece of the pie, so that you have multiple experts, each in its own part of the pie. Proper coding practices and modularized code will allow you to scale past 10 million lines!
