Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds.

When researchers at frontier labs look at random samples from their training corpus, they see stock ticker symbols, broken HTML, forum spam, autogenerated gibberish. Not Wikipedia. Not the Wall Street Journal. The actual pretraining dataset is mostly noise, and the model is burning parameters to vaguely remember all of it.

One estimate pegs Llama 3's information compression at 0.07 bits per token. Well-structured English carries around 1.5 bits per token of real information. The trillion-parameter model is holding a roughly 5% resolution image of the internet it trained on. So when a lab ships a 1.8 trillion parameter model, the overwhelming majority of those weights are handling rough memorization. They are compression overhead for a noisy training set, taking up capacity that could be doing reasoning instead.

Karpathy's proposal is to separate the two. Build a cognitive core: a small model that contains only the algorithms for reasoning and problem-solving, stripped of encyclopedic memorization. Pair it with external memory the model queries when it needs a fact. A 1 billion parameter reasoner plus retrieval beats a 1.8 trillion parameter model trying to do both.

The data already supports this direction. GPT-4o runs at roughly 200 billion parameters and outperforms the original 1.8 trillion GPT-4. Inference costs for GPT-3.5 level performance fell 280x between 2022 and 2024, driven almost entirely by smaller, cleaner, better-architected models. The trend line is pointing where Karpathy says it should.

The real implication for anyone tracking the AI trade: data quality is the actual constraint. The companies winning the next phase will be the ones who figured out what to train on, and what to throw away.

Aakash Gupta

280,192 subscribers

507,774 views • 2 months ago •via X (Twitter)

Science & Technology
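
The ratios quoted in the post above are easy to check. Here is a minimal Python sketch that uses only the figures as the post states them (they are the post's claims, not verified measurements). The constant and function names are illustrative, not from any source.

```python
# Figures exactly as quoted in the post; none are independently verified here.
FRONTIER_PARAMS = 1.8e12       # claimed frontier model size
CORE_PARAMS = 1e9              # proposed "cognitive core" size
MODEL_BITS_PER_TOKEN = 0.07    # quoted estimate for Llama 3
ENGLISH_BITS_PER_TOKEN = 1.5   # quoted figure for well-structured English


def compression_ratio(large_params: float, small_params: float) -> float:
    """How many times smaller the small model is than the large one."""
    return large_params / small_params


def retained_fraction(stored_bits: float, source_bits: float) -> float:
    """Fraction of the per-token information the model is said to retain."""
    return stored_bits / source_bits


ratio = compression_ratio(FRONTIER_PARAMS, CORE_PARAMS)
fraction = retained_fraction(MODEL_BITS_PER_TOKEN, ENGLISH_BITS_PER_TOKEN)
print(f"compression claim: {ratio:,.0f}x")
print(f"retained information: {fraction:.1%}")
```

The second ratio comes out a little under 5%, which appears to be where the post's "roughly 5% resolution" figure comes from.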

Related Videos

ELON MUSK: "Grok 5 will be the largest model, a 6 trillion parameter model, whereas Grok 3 and 4 are based on a 3 trillion parameter model. Moreover, the 6 trillion parameters will have a much higher intelligence density per gigabyte. It's really going to feel sentient."

DogeDesigner

322,011 views • 7 months ago

You'd think the race to AGI would mean training the biggest possible model. But parameter scaling had stalled for a long time after GPT-4's trillion+ parameters, and only now are models getting bigger again. What gives? Partially it’s RL scaling, as Dylan Patel explains. A 5T parameter model takes 5x longer to generate RL rollouts than a 1T model. Even if the bigger model is 2x more sample-efficient, the smaller model finishes RL faster, gets deployed to research sooner, and starts helping build the next model before the big one is even done training.

Dwarkesh Patel

65,123 views • 3 months ago
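
The rollout argument in the post above reduces to one multiplication. This is a rough sketch, assuming (as the post does) that rollout time scales linearly with parameter count, and reading "2x more sample-efficient" as needing half as many rollouts. Both are simplifications, and the function name is hypothetical.

```python
def relative_rl_time(param_ratio: float, sample_efficiency_gain: float) -> float:
    """Wall-clock RL time of the larger model relative to the smaller one.

    param_ratio: larger model size / smaller model size (5 for 5T vs 1T).
    sample_efficiency_gain: how many times fewer rollouts the larger model
        needs (2 means half as many). This reading is an assumption.
    """
    rollout_slowdown = param_ratio               # each rollout takes this much longer
    rollouts_needed = 1 / sample_efficiency_gain # fraction of rollouts required
    return rollout_slowdown * rollouts_needed


# Figures from the post: 5T vs 1T parameters, 2x sample efficiency.
print(relative_rl_time(5, 2))
```

Under these assumptions the 5T model still needs 2.5 times the wall-clock time, which is the gap the post says lets the smaller model ship first.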

🇺🇸 ZUCK: META AI IS THE BIGGEST, BEST, IN THE WORLD!! Has anyone ever used it?... “Meta AI now has nearly 600 million monthly actives and as promised is on track to be the most used AI assistant in the world by the end of the year. Llama 3.3 is a new 70 billion parameter text model that performs about as well as our 405 billion parameter model but now it is easier and more efficient to run. So that is the last Llama 3.0 release. The next stop is Llama 4.0.” Source: Instagram

Mario Nawfal

482,099 views • 1 year ago

NVIDIA CEO: GROK 5 IS A 7 TRILLION PARAMETER RACE AGAINST TIME Jensen Huang is dialing in on the real challenge: not making bigger models, but training them fast without draining power or budgets. Grok 5 is right in the middle of that race. “The next frontier model. Elon already mentioned that the next version of Grok, Grok 5 I believe, is 7 trillion parameters. This one is 10, and the green represents Blackwell. In the case of Rubin, notice that the throughput is much higher, so it only takes one fourth as many of these systems to train the model within the one-month timeframe we have given here.” Source: Rohan Paul

Mario Nawfal

76,354 views • 5 months ago

TWO BOXES THE SIZE OF A MAC MINI JUST RAN A 235 BILLION PARAMETER MODEL ON A DESK It is two NVIDIA DGX Spark units linked by a single cable. A year ago a model this size meant renting a GPU cluster by the hour. Now it sits next to your monitor for around $8,000. Here is the twist most people miss. Linking them does not create one shared 256GB memory pool. The model is split across both boxes, and that is the only reason a 235B model fits at all. It answers at roughly 10 tokens per second, and both chips sit at just 74 degrees while sipping around 50 watts. Every token stays on the desk. Nothing touches a cloud, and nothing leaves the room. The ceiling for what you can run at home just jumped from 70B to 235B. Bookmark this & Watch it run ↓

slash1s

100,849 views • 16 days ago
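
The "split across both boxes" point in the post above is a memory budget question. The sketch below assumes 128 GB per unit (implied by the post's "256GB" for two units) and tries a few weight precisions, since the post does not say which quantization was used. The function names are illustrative, and the estimate covers weights only.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def per_device_gb(params: float, bits_per_weight: float, devices: int) -> float:
    """Share of the weights each device holds under an even split."""
    return weights_gb(params, bits_per_weight) / devices


PARAMS = 235e9           # model size from the post
DEVICES = 2              # two linked units, per the post
DEVICE_MEMORY_GB = 128   # assumption: 256 GB total divided across two units

for bits in (16, 8, 4):  # assumed precisions; the post does not state one
    total = weights_gb(PARAMS, bits)
    share = per_device_gb(PARAMS, bits, DEVICES)
    fits = share < DEVICE_MEMORY_GB
    print(f"{bits:>2}-bit: {total:6.1f} GB total, {share:6.1f} GB per box, fits={fits}")
```

Under these assumptions, 16-bit weights would not fit even across both boxes, and 8-bit weights fit only when split. KV cache, activations, and operating system memory are ignored, so real headroom is smaller than the sketch suggests.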

APPLE RESEARCH SCIENTIST JUST SHOWED HOW 4 MAC STUDIOS RUN A TRILLION PARAMETER MODEL LOCALLY ZERO COSTS
13:18 she shows the main thing - connect 4 Mac Studios and you get 1TB of shared memory - exactly enough to run a trillion parameter model right on your desk
Apple's library - and four machines start working as one cluster
tensor parallelism: every machine holds part of every layer - all process the same token simultaneously - speed increases 3x compared to a single device
fine-tuning: one Mac Studio processes 180 tokens per second, four together process 600
and not a single byte of data leaves the room
one command in the terminal - and a trillion parameter model answers from your desk and runs 24/7
data centers took years to build to run models like this - Apple did it with four Thunderbolt cables

Noisy

47,399 views • 7 days ago
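
The throughput figures in the post above (180 tokens per second on one machine, 600 on four) can be turned into a speedup and a parallel efficiency. This is a small sketch using only those two quoted numbers. The function names are illustrative.

```python
def speedup(single_tps: float, cluster_tps: float) -> float:
    """Cluster throughput relative to a single machine."""
    return cluster_tps / single_tps


def parallel_efficiency(single_tps: float, cluster_tps: float, machines: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(single_tps, cluster_tps) / machines


# Figures quoted in the post: 180 tok/s on one Mac Studio, 600 tok/s on four.
s = speedup(180, 600)
e = parallel_efficiency(180, 600, 4)
print(f"speedup: {s:.2f}x, efficiency: {e:.0%}")
```

Four machines delivering a little over 3x is consistent with the post's "3x" figure: each added machine contributes less than a full machine's worth of throughput, which is typical when devices must synchronize on every layer.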

A crypto project actually trained a 72B parameter AI model from scratch using decentralized GPU compute. Not fine-tuned, not a wrapper: trained from zero. The model benchmarks competitively against Meta's LLaMA 3 on reasoning tasks, and the entire training run cost a fraction of what centralized labs spend. If decentralized compute can produce frontier-class models, the moat around OpenAI and Anthropic is thinner than people think.

VirtualBacon

40,543 views • 2 months ago

i'm running a 397 billion parameter model on an amd ai max box that sits on my desk and pulls less power than a gaming laptop. the model is Nex-N2-Pro, 397B-A17B, the open weight release people are putting next to gpt-5.5 on coding. i have it quantized to IQ1_M, 1.75 bits per weight, 90gb of weights loaded into the 128gb of unified memory on amd's strix halo igpu. watch the gpu in this recording. it spikes, it sustains, it does not fall over. that is the part the spec sheets never show you, not just that a 400b model loads, but that an integrated graphics chip holds the load and generates token after token, stable, no crash, no thermal cliff. and it is not a slideshow. roughly 18 tokens a second, faster than you can read. a frontier scale model producing usable output, fully local. no datacenter, no rented h100s, no api key, no permission. three years ago a model this size meant a server room and a budget to match. tonight it is a quiet box on my desk. this is the accessible tier almost nobody benchmarks honestly, and it is further along than the timeline thinks. the full breakdown is coming, rocm vs vulkan on this chip, and this little amd box head to head against the nvidia equivalent. stay tuned.

Sudo su

31,780 views • 15 days ago
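
The quantization figures in the post above can be cross-checked against each other. The sketch below takes the quoted numbers (397B parameters, 1.75 bits per weight, 90 GB loaded, 128 GB unified memory) and computes the nominal weight size and the bits per weight that 90 GB would imply. The function names are illustrative, and 1 GB is taken as 1e9 bytes.

```python
PARAMS = 397e9            # total parameters, from the post
NOMINAL_BPW = 1.75        # bits per weight quoted for the quantization
LOADED_GB = 90            # weights loaded, as quoted
UNIFIED_MEMORY_GB = 128   # unified memory, as quoted


def nominal_size_gb(params: float, bits_per_weight: float) -> float:
    """Weight size in GB if every parameter used exactly bits_per_weight."""
    return params * bits_per_weight / 8 / 1e9


def implied_bpw(size_gb: float, params: float) -> float:
    """Average bits per weight implied by a given loaded size."""
    return size_gb * 1e9 * 8 / params


print(f"nominal size: {nominal_size_gb(PARAMS, NOMINAL_BPW):.1f} GB")
print(f"implied bits per weight at {LOADED_GB} GB: {implied_bpw(LOADED_GB, PARAMS):.2f}")
print(f"memory left for cache and system: {UNIFIED_MEMORY_GB - LOADED_GB} GB")
```

The nominal size lands a few GB below the quoted 90 GB. That gap could come from rounding in the post or from some tensors being stored at higher precision, but that is a guess, not something the post states.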

The most overlooked part of the SpaceX IPO thesis is the model, and most people are completely missing it (Save this). Everyone has been focused on the Anthropic compute deal and the Colossus revenue because those are numbers you can put in a spreadsheet. Six months ago, xAI was competing reasonably well on model performance but was not clearly on the frontier. Then SpaceX exercised its option to acquire Cursor for $60 billion, the largest startup acquisition in history, just days after completing the largest IPO in history at $75 billion. Cursor is a team of 700 to 800 people, was on track to exit 2026 at up to $10 billion in revenue, had millions of professional developers using it daily, and had already built a team with the genuine potential to compete at the frontier. The one thing holding them back was compute. SpaceX just gave them the largest GPU cluster in the world to work with. Grok 4.3, a 1.5 trillion parameter model, is currently training with Cursor's proprietary coding data being injected directly into pre-training, not just fine-tuning, which is a fundamentally more powerful integration than anything the market is currently modeling. The prior version, Grok 4, was already on the Pareto frontier as of 10 to 12 days ago, the most intelligent 500 billion parameter model in the world, sitting alongside Google Gemini, Anthropic, and OpenAI as one of only four systems at the true frontier. Composer 2.5, the previous Cursor model, was Pareto dominant in coding tasks just before the acquisition closed, meaning SpaceX inherited a model that was already best-in-class in the highest-value AI use case in the market. The AWS parallel is the one everyone keeps missing.
Bezos built data center capacity for Black Friday, sat on idle infrastructure the rest of the year, and monetized it into what was at the time the most profitable technology business in history and investors hated it in 2009 and 2010 because he was burning free cash flow on capacity that had no obvious revenue yet. SpaceX is in exactly that position, it built Colossus for xAI's own training needs, is monetizing excess capacity to Anthropic at $1.25 billion per month across 220,000 Nvidia GPUs, and has reportedly secured up to 20% of Nvidia's early Vera Rubin allocation, giving it the most powerful and scarcest GPU infrastructure in the world during the critical window when those chips are hardest to get. The $60 billion Cursor acquisition closed at a moment when SpaceX had essentially unlimited compute, a team already at the frontier, and a product with deep enterprise distribution, three things no other model lab had simultaneously when it was at this stage. The market is pricing the compute business conservatively and ignoring the model call option entirely, and coding is the fastest path to AGI, once you are on the Pareto frontier with that compute, revenue scales fast. Anthropic went from negligible revenue to $30 billion annualized in under 18 months and that is the existence proof. Bullish on SpaceXAI and Elon Musk

Milk Road AI

69,173 views • 14 days ago

You can smell a big model. Not the parameter count. Not the benchmark score. It's that feeling when something is actually reasoning. Not just pattern matching. We call it "big model smell."

Arena.ai

111,988 views • 3 months ago

China just released an open source AI model that matches the best closed models from OpenAI and Anthropic. Gavin Baker explained exactly how they did it and the answer should concern every American AI lab. The model is called GLM 5.2. It was built by Z. AI. You get 744 billion parameters, a 1 million token context window, and an MIT license, meaning anyone can download it, fork it, build a company on it, with no restrictions and no Dario. It scored 51 points on the Artificial Analysis Intelligence Index, the highest score any open weight model has ever achieved. It beat GPT 5.5 on the frontier software engineering benchmark. It trails Claude Opus 4.8 by less than one percentage point. And it costs 85% less to run than GPT 5.5 for comparable performance. Gavin Baker said on the All-In podcast that this model has challenged some of his beliefs. Then he explained how China built it. The method is called distillation. Just think of tens of thousands of phones and computers running simultaneously, all hitting the frontier model APIs through masked accounts, asking specific questions, and harvesting what happens inside the model when it answers. Every reasoning step, every token. The entire thinking process gets recorded and fed back into the Chinese model during training. It is a cheat sheet. It is the answer key to the exam. And here is the part that should worry everyone. Sacks said it plainly. China was already nine months behind American models. But now that GLM 5.2 is good enough to run its own reinforcement learning, it can improve itself without needing to distill from American models anymore. The cheat sheet let them get close enough to start writing their own answers. Sacks said we are six months behind on the model and 24 months behind on silicon and they are only a few months behind in total. The Z. AI founder told Elon Musk directly that open weight fable-level capability will be here before Q1 2027.
Every restriction Anthropic lobbied for, every self-imposed safety guardrail, every month of delay in releasing American frontier models accelerated this. The Chinese labs were not under those restrictions. They were not going to wait. The composable model future Gavin described, where every enterprise runs a frontier model alongside their own fine-tuned open weight model, is coming regardless of what American labs do next. The question is just whether the open weight half of that stack is American or Chinese. Right now it is Chinese. WATCH THE FULL PODCAST ON The All-In Podcast

Ihtesham Ali

83,056 views • 3 days ago

Jensen Huang said Grok 5 will be a 7 trillion parameter model. On time to train: with the training window fixed at 1 month, Nvidia's new Vera Rubin GPU system needs 1/4 the number of systems compared with Blackwell to train the same frontier model. “Factory throughput” improves by about 10x over Blackwell, and Blackwell itself was about 10x over Hopper, i.e. overall Rubin is roughly 100x Hopper in factory throughput per watt, which matters hugely, because a 1 GW, $50B data center is power-limited and revenue scales with throughput per watt. --- From Nvidia YT channel

Rohan Paul

165,770 views • 5 months ago

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.

Sapient Intelligence

511,026 views • 1 month ago

one half of this keynote sells you the cloud forever, the other half shows the chip that lets you stop renting it
this is the other half, AMD's CEO holding the chip in her hand
00:00 - Lisa Su introduces the Ryzen AI Halo, a system built for local AI
00:29 - the line that matters, it runs models up to 200 billion parameters locally, not connected to anything
00:42 - a 200 billion parameter model, the tier of the top paid AI plans, on a desktop that fits in your hand
so the cloud wants $200 a month, forever, for access you never own
this is the box that runs the same class of model with nothing leaving the room
that is the whole point of my breakdown, the $200 a month was never the intelligence, it was the meter
and the meter just became optional
most people will see a spec demo
the part that matters is what it lets you stop paying for
full breakdown below

John Doe

25,960 views • 16 days ago

Sam Altman just handed every startup founder a one-question autopsy. Altman: “If you’re building something on GPT-4 that a reasonable observer would say we’re going to steamroll you.” Not might. Not could. Going to. He said it with the calm of someone describing weather. Because to him it is weather. The model improves. Whatever was built on the old version’s weaknesses gets washed away. That is not strategy. That is erosion. And most founders are building on the erosion line. They find a gap in the current model. They wrap a product around it. They raise money. They hire. They scale. Then OpenAI releases the next version and the gap closes and the product has no reason to exist anymore. Altman: “When we just do our fundamental job, which is make the model better with every crank, then you get the ‘OpenAI killed my startup’ meme.” He is telling you directly. They are not hunting you. They are not even thinking about you. They are just improving the model. You happen to be standing where the improvement lands. That is the part founders refuse to hear. OpenAI does not need to compete with you. It just needs to keep doing exactly what it was already doing and your entire company disappears as a side effect. You are not a competitor. You are a temporary symptom of incomplete intelligence. The moment the intelligence completes you become nothing. Then Brad Lightcap delivered the cleanest diagnostic ever spoken in venture capital. Lightcap: “Ask if a 100x improvement in the model is something they’re excited about.” One question. The entire investment thesis reduced to a single binary. Does the next model make your company more powerful or does it make your company pointless. There is no middle ground. Lightcap: “We know the companies that come to us saying, ‘We want the next model. When is it coming out? I want to be the first to try it.’” These companies built something that feeds on intelligence. The smarter the model gets the more their product can do. 
They are not threatened by progress. They are starving for it. Then there are the companies Lightcap never hears from. The ones who go quiet when a new model drops. The ones who read the release notes like a death sentence. The ones privately praying the next generation takes longer because every improvement shrinks the ground beneath them. If you are hoping the model stays roughly where it is you have already told the market everything it needs to know about your company. You are not building on intelligence. You are building on the absence of it. Altman: “95% of the world should be betting on the latter category.” The latter category is simple. Assume the model keeps getting better at the pace it has been getting better. Build for that world. Not the world where GPT-4 is the ceiling. The world where GPT-4 is the floor and the ceiling has not been built yet. Then Altman told a story that should be framed on the wall of every startup in the country. A medical AI company came to him that morning. They were not complaining about the model. They were not worried about being replaced. They were demanding it improve faster. Altman: “Here’s how many people are dying every day you delay.” That is what alignment with the trajectory looks like. A company so deeply built on intelligence improving that every day the model stays the same is a day someone dies who did not have to. They are not building on a flaw. They are building on a future that has not arrived fast enough. That is the difference. The wrapper startup patches what the model cannot do today. The real company builds what the model will unlock tomorrow. One is running from the train. The other is laying the track. Altman told you the train is not slowing down. Lightcap told you exactly how to know which side you are on. One question. Does a 100x smarter model make you more valuable or erase you. If you had to pause before answering you already did.

Dustin

39,109 views • 2 months ago

I am stoked to announce that I won the OpenAI Developers Codex x Mollie Hacka Worldwide Hackathon in Paris. 60+ builders, every one of us working solo, one day to ship.

I built mine around a single question: who gets to own intelligence? The default answer is scary. You hand your data to a handful of labs, they train the model, they own it, and you rent back a thin slice of what your own data made possible. That is the bargain on the table today. I do not accept it.

So I built Lensemble: a Tapestry-like distributed training platform for JEPA-based world models. What it enables: world models that a community improves together, keeps sovereign, and co-owns. Two bets sit underneath it.

First, the paradigm. Language models predict the next token. Powerful for text, a dead end for the physical world. A robot does not need to autocomplete sentences; it needs to predict what happens next in the world. That is what JEPA does: it learns by predicting representations instead of pixels or tokens. I am convinced world models are the most underrated paradigm in AI right now, and the closest thing we have to a ChatGPT moment for robotics.

Second, the politics. Your raw trajectories never leave your machine. Each participant trains locally against a shared protocol and ships only an update, never the data. A federated round folds those updates into one shared world model, a LeWorldModel-based model, and the gain is measured, not claimed: a 12k-parameter adapter on a frozen backbone, held-out prediction error down about 12 percent, the model measurably less surprised by the world. Then the upside is split by contribution weight, so the people who improved the model own a share of what it earns.

This is the thesis behind Project Tapestry, the AI Alliance and Yann LeCun's push for federated, sovereign frontier AI, carried into world models and robotics. Call it Tapestry for the physical world. All of it built solo, in a single day, with Codex as my pair the whole way.

Thank you to OpenAI Codex and Mollie for backing builders who ship real things, and to Boris and the organizing crew for the room and the standard you set. Intelligence the world improves, and the world owns. That is the future I want for my kids, and the one I will keep building.
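
The federated round described in this post can be illustrated with a minimal sketch. This is not Lensemble's actual code, which the post does not show. It assumes each participant sends an adapter delta plus a scalar contribution weight, and that the round is a plain weighted average in the style of federated averaging. The names `Update`, `federated_round`, and `ownership_shares` are hypothetical, and how the contribution weight is measured is left open because the post does not say.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Update:
    """One participant's contribution to a round (hypothetical structure)."""
    participant: str
    delta: List[float]  # adapter weight delta computed locally; raw data never included
    weight: float       # assumed contribution weight, e.g. local sample count


def federated_round(adapter: List[float], updates: List[Update]) -> List[float]:
    """Fold local deltas into the shared adapter as a weighted average."""
    total = sum(u.weight for u in updates)
    if total <= 0:
        raise ValueError("total contribution weight must be positive")
    new_adapter = list(adapter)
    for u in updates:
        if len(u.delta) != len(adapter):
            raise ValueError(f"delta from {u.participant} has the wrong size")
        share = u.weight / total
        for i, d in enumerate(u.delta):
            new_adapter[i] += share * d
    return new_adapter


def ownership_shares(updates: List[Update]) -> Dict[str, float]:
    """Split the upside in proportion to each participant's contribution weight."""
    total = sum(u.weight for u in updates)
    if total <= 0:
        raise ValueError("total contribution weight must be positive")
    return {u.participant: u.weight / total for u in updates}


if __name__ == "__main__":
    round_updates = [
        Update("alice", [0.4, 0.0], weight=3.0),
        Update("bob", [0.0, 0.8], weight=1.0),
    ]
    print(federated_round([1.0, 1.0], round_updates))
    print(ownership_shares(round_updates))
```

Only the deltas and weights cross the network in this sketch, which matches the post's claim that raw trajectories stay local. A real system would also need to validate updates and measure held-out error before accepting a round.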

abdel

16,727 views • 9 days ago

"One of the biggest misconceptions" Cerebras CFO Bob Komin pushes back on the small-models narrative. "We serve all models, and there is no limit to the size of the models that we can serve. Today, we're serving trillion parameter models. We're serving trillion parameter models that are internal for OpenAI today. We are currently running OpenAI 5.4 and 5.5 with them."

Deirdre Bosa

83,925 views • 1 month ago

GROK 5. the first 7 trillion parameter model

🍓🍓🍓

39,839 views • 5 months ago

Blender AI Real-Time Motion Capture Plugin — connect a 1080P camera or upload videos. It runs locally with a 1-billion-parameter model and requires 8GB of VRAM for real-time processing. It supports both real-time capture and video uploads. The full-parameter version currently supports NVIDIA CUDA and requires DX11 or higher.

CYANPUPPETS

53,996 views • 3 months ago

Blender AI Real-Time Motion Capture Plugin — connect a 1080P camera or upload videos. It runs locally with a 1-billion-parameter model and requires 8GB of VRAM for real-time processing. It supports both real-time capture and video uploads. The full-parameter version currently supports NVIDIA CUDA and requires DX11 or higher.

CYANPUPPETS

26,492 views • 4 months ago