Loading video...

Video Failed to Load

Go Home

Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds. When researchers at frontier labs look at random samples from...

507,774 views • 2 months ago •via X (Twitter)

0 Comments

No comments available

Comments from the original post will appear here

Related Videos

The most overlooked part of the SpaceX IPO thesis is the model and most people are completely missing it (Save this) Everyone has been focused on the Anthropic compute deal and the Colossus revenue because those are numbers you can put in a spreadsheet. Six months ago, xAI was competing reasonably well on model performance but was not clearly on the frontier. Then SpaceX exercised its option to acquire Cursor for $60 billion, the largest startup acquisition in history just days after completing the largest IPO in history at $75 billion. Cursor is a team of 700 to 800 people, was on track to exit 2026 at up to $10 billion in revenue, had millions of professional developers using it daily, and had already built a team with the genuine potential to compete at the frontier, the one thing holding them back was compute. SpaceX just gave them the largest GPU cluster in the world to work with. Grok 4.3, a 1.5 trillion parameter model, is currently training with Cursor's proprietary coding data being injected directly into pre-training, not just fine tuning which is a fundamentally more powerful integration than anything the market is currently modeling. The prior version, Grok 4, was already on the Pareto frontier as of 10 to 12 days ago, the most intelligent 500 billion parameter model in the world, sitting alongside Google Gemini, Anthropic, and OpenAI as one of only four systems at the true frontier. Composer 2.5, the previous Cursor model was Pareto dominant in coding tasks just before the acquisition closed, meaning SpaceX inherited a model that was already best-in-class in the highest-value AI use case in the market. The AWS parallel is the one everyone keeps missing. Bezos built data center capacity for Black Friday, sat on idle infrastructure the rest of the year, and monetized it into what was at the time the most profitable technology business in history and investors hated it in 2009 and 2010 because he was burning free cash flow on capacity that had no obvious revenue yet. SpaceX is in exactly that position, it built Colossus for xAI's own training needs, is monetizing excess capacity to Anthropic at $1.25 billion per month across 220,000 Nvidia GPUs, and has reportedly secured up to 20% of Nvidia's early Vera Rubin allocation, giving it the most powerful and scarcest GPU infrastructure in the world during the critical window when those chips are hardest to get. The $60 billion Cursor acquisition closed at a moment when SpaceX had essentially unlimited compute, a team already at the frontier, and a product with deep enterprise distribution, three things no other model lab had simultaneously when it was at this stage. The market is pricing the compute business conservatively and ignoring the model call option entirely, and coding is the fastest path to AGI, once you are on the Pareto frontier with that compute, revenue scales fast. Anthropic went from negligible revenue to $30 billion annualized in under 18 months and that is the existence proof. Bullish on SpaceXAI and Elon Musk

Milk Road AI

69,173 views • 14 days ago

China just released an open source AI model that matches the best closed models from OpenAI and Anthropic. Gavin Baker explained exactly how they did it and the answer should concern every American AI lab. The model is called GLM 5.2. It was built by Z. AI. You get 744 billion parameters, 1 million token context window and its MIT license, meaning anyone can download it, fork it, build a company on it, with no restrictions and no Dario. It scored 51 points on the artificial analysis intelligence index. The highest score any open weight model has ever achieved. It beat GPT 5.5 on the frontier software engineering benchmark. It trails Claude Opus 4.8 by less than one percentage point. And it costs 85% less to run than GPT 5.5 for comparable performance. Gavin Baker said on the All-In podcast that this model has challenged some of his beliefs. Then he explained how China built it. The method is called distillation. Just think of tens of thousands of phones and computers running simultaneously, all hitting the frontier model APIs through masked accounts, asking specific questions, and harvesting what happens inside the model when it answers. Every reasoning step, every token. The entire thinking process gets recorded and fed back into the Chinese model during training. It is a cheat sheet. It is the answer key to the exam. And here is the part that should worry everyone. Sacks said it plainly. China was already nine months behind American models. But now that GLM 5.2 is good enough to run its own reinforcement learning, it can improve itself without needing to distill from American models anymore. The cheat sheet let them get close enough to start writing their own answers. Sacks said we are six months behind on the model and 24 months behind on silicon and they are only a few months behind in total. The Z. AI founder told Elon Musk directly that open weight fable-level capability will be here before Q1 2027. Every restriction Anthropic lobbied for, every self-imposed safety guardrail, every month of delay in releasing American frontier models accelerated this. The Chinese labs were not under those restrictions. They were not going to wait. The composable model future Gavin described, where every enterprise runs a frontier model alongside their own fine-tuned open weight model, is coming regardless of what American labs do next. The question is just whether the open weight half of that stack is American or Chinese. Right now it is Chinese. WATCH THE FULL PODCAST ON The All-In Podcast

Ihtesham Ali

83,056 views • 3 days ago

Sam Altman just handed every startup founder a one-question autopsy. Altman: “If you’re building something on GPT-4 that a reasonable observer would say we’re going to steamroll you.” Not might. Not could. Going to. He said it with the calm of someone describing weather. Because to him it is weather. The model improves. Whatever was built on the old version’s weaknesses gets washed away. That is not strategy. That is erosion. And most founders are building on the erosion line. They find a gap in the current model. They wrap a product around it. They raise money. They hire. They scale. Then OpenAI releases the next version and the gap closes and the product has no reason to exist anymore. Altman: “When we just do our fundamental job, which is make the model better with every crank, then you get the ‘OpenAI killed my startup’ meme.” He is telling you directly. They are not hunting you. They are not even thinking about you. They are just improving the model. You happen to be standing where the improvement lands. That is the part founders refuse to hear. OpenAI does not need to compete with you. It just needs to keep doing exactly what it was already doing and your entire company disappears as a side effect. You are not a competitor. You are a temporary symptom of incomplete intelligence. The moment the intelligence completes you become nothing. Then Brad Lightcap delivered the cleanest diagnostic ever spoken in venture capital. Lightcap: “Ask if a 100x improvement in the model is something they’re excited about.” One question. The entire investment thesis reduced to a single binary. Does the next model make your company more powerful or does it make your company pointless. There is no middle ground. Lightcap: “We know the companies that come to us saying, ‘We want the next model. When is it coming out? I want to be the first to try it.’” These companies built something that feeds on intelligence. The smarter the model gets the more their product can do. They are not threatened by progress. They are starving for it. Then there are the companies Lightcap never hears from. The ones who go quiet when a new model drops. The ones who read the release notes like a death sentence. The ones privately praying the next generation takes longer because every improvement shrinks the ground beneath them. If you are hoping the model stays roughly where it is you have already told the market everything it needs to know about your company. You are not building on intelligence. You are building on the absence of it. Altman: “95% of the world should be betting on the latter category.” The latter category is simple. Assume the model keeps getting better at the pace it has been getting better. Build for that world. Not the world where GPT-4 is the ceiling. The world where GPT-4 is the floor and the ceiling has not been built yet. Then Altman told a story that should be framed on the wall of every startup in the country. A medical AI company came to him that morning. They were not complaining about the model. They were not worried about being replaced. They were demanding it improve faster. Altman: “Here’s how many people are dying every day you delay.” That is what alignment with the trajectory looks like. A company so deeply built on intelligence improving that every day the model stays the same is a day someone dies who did not have to. They are not building on a flaw. They are building on a future that has not arrived fast enough. That is the difference. The wrapper startup patches what the model cannot do today. The real company builds what the model will unlock tomorrow. One is running from the train. The other is laying the track. Altman told you the train is not slowing down. Lightcap told you exactly how to know which side you are on. One question. Does a 100x smarter model make you more valuable or erase you. If you had to pause before answering you already did.

Dustin

39,109 views • 2 months ago

I am stocked to announce that I won the OpenAI Developers Codex x Mollie Hacka Worldwide Hackathon in Paris. 60+ builders, every one of us working solo, one day to ship. I built mine around a single question: who gets to own intelligence? The default answer is scary. You hand your data to a handful of labs, they train the model, they own it, and you rent back a thin slice of what your own data made possible. That is the bargain on the table today. I do not accept it. So I built Lensemble: a Tapestry like distributed training platform for JEPA based World Models. What does it enable: World Models that a community improves together, keeps sovereign, and co-owns. Two bets sit underneath it. First, the paradigm. Language models predict the next token. Powerful for text, a dead end for the physical world. A robot does not need to autocomplete sentences, it needs to predict what happens next in the world. That is what JEPA does: it learns by predicting representations instead of pixels or tokens. I am convinced world models are the most underrated paradigm in AI right now, and the closest thing we have to a ChatGPT moment for robotics. Second, the politics. Your raw trajectories never leave your machine. Each participant trains locally against a shared protocol and ships only an update, never the data. A federated round folds those updates into one shared world model, a LeWorldModel based model, and the gain is measured, not claimed: a 12k-parameter adapter on a frozen backbone, held-out prediction error down about 12 percent, the model measurably less surprised by the world. Then the upside is split by contribution weight, so the people who improved the model own a share of what it earns. This is the thesis behind Project Tapestry, the AI Alliance and Yann LeCun's push for federated, sovereign frontier AI, carried into world models and robotics. Call it Tapestry for the physical world. All of it built solo, in a single day, with Codex as my pair the whole way. Thank you to OpenAI Codex and Mollie for backing builders who ship real things, and to Boris and the organizing crew for the room and the standard you set. Intelligence the world improves, and the world owns. That is the future I want for my kids, and the one I will keep building.

abdel

16,727 views • 9 days ago