
Ben Pouladian
@benitoz • 20,652 subscribers
The world isn’t short of oil. It’s short of tokens. Hardware, software, models. System-level AI infrastructure analysis. $NVDA since 2016. DMs open ↓

Today Dario admits that Anthropic only planned for 10x growth but got hit with 80x instead. Internally it was called a “success disaster.” Their compute is effectively off by a factor of 8x or more. Now do the outages, rate limits, and nerfed performance make sense? We need more compute!
Ben Pouladian • 922,024 views • 1 month ago
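The 8x figure in the post above is just the ratio of the two growth multiples. A minimal sketch of that arithmetic follows. It assumes compute need scales linearly with growth, which the post implies but does not state, and the function names are illustrative only.

```python
# Growth multiples quoted in the post above.
PLANNED_GROWTH = 10  # growth the plan assumed (10x)
ACTUAL_GROWTH = 80   # growth that actually arrived (80x)


def shortfall_factor(planned: float, actual: float) -> float:
    """How many times larger actual demand is than the plan."""
    return actual / planned


def covered_fraction(planned: float, actual: float) -> float:
    """Share of actual demand the plan covers.

    Assumption: compute need scales linearly with growth.
    """
    return min(1.0, planned / actual)


factor = shortfall_factor(PLANNED_GROWTH, ACTUAL_GROWTH)
covered = covered_fraction(PLANNED_GROWTH, ACTUAL_GROWTH)

print(f"shortfall factor: {factor:.0f}x")
print(f"share of demand the plan covers: {covered:.1%}")
```

Under that linear assumption, capacity sized for 10x serves one eighth of 80x demand, which is the gap the post ties to outages and rate limits.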

To source the 131,072-GPU AI “supercluster,” Larry Ellison appealed directly to Jensen Huang during a dinner at Nobu joined by Elon Musk. “I would describe the dinner as me and Elon begging Jensen for GPUs. Please take our money. We need you to take more of our money. Please!”
Ben Pouladian • 2,047,603 views • 1 year ago

I read a lot of Peter Lynch. Met him once. The one rule I carry into tech investing is the most boring one he ever wrote: know what you own, down to the physics if the position demands it. For me that has meant living inside NVIDIA's stack for years and pulling apart the alternatives next to it: Trainium, the TPU, every serious accelerator someone is willing to tape out against Jensen. I was also an early investor in Mellanox, the networking company NVIDIA bought to own the switched fabric the entire scale-up era now runs on. So when the conversation turns to networking as the real moat, this is not theory to me. It is a position I watched become the thesis. You do not understand what you own until you understand what could take it.
Gavin Baker at The Sohn Idea Contest just gave the most physically grounded read on AI infrastructure I have heard this cycle, and it is a Lynch lesson in disguise. The reframe that matters: the last terrestrial mega data center may already be on someone's drawing board. Everything else follows from two constraints, watts and wafers, and Gavin walks both down to first principles. That is the work. Most people are pricing the narrative. Lynch would have asked what the thing actually is.
1. TSMC is the global rate limiter
Jensen reportedly visits every quarter asking to double or triple leading-edge capacity. TSMC expands at roughly 5 percent. A handful of disciplined operators in Taiwan are the physical governor on the entire AI buildout. This is the part the bubble crowd misses. The constraint is not demand and it is not capital. It is one fab's deliberate refusal to overbuild. That stretches the cycle longer and smoother instead of bubble and bust. It reads like the mid 1990s capacity cycle, not a standard 25 year memory peak where a 60 to 70 percent price spike would be your signal to cut the weed and walk. I have held NVIDIA since 2016 for exactly this reason. Owning it meant understanding it. The thesis was never the chip. It was the chokepoint.
2. The most underestimated silicon is Trainium
Consensus is still pricing a one-horse race. Gavin's sharpest non-NVIDIA call is AWS Trainium, specifically Trainium 3 ramping in the back half of 2026. Here is the part that took me a while to internalize from studying these architectures side by side. As frontier models go fully Mixture of Experts, inference stops being a matmul problem and becomes a networking problem. You need a switched scale-up fabric, not just fast chips. Today two organizations on earth have a working one: NVIDIA and Amazon. NVIDIA's came from Mellanox, which is the whole reason I sized that position the way I did years ago. The bet was always that networking would decide this, not raw flops. The TPU is formidable in its own lane, but the scale-up fabric is the moat people are not modeling, and it is why I track every accelerator, not just the one I own.
3. The neocloud moat is operational, not arbitrage
The lazy take is that CoreWeave and Crusoe are just renting hyperscaler slack. Gavin's counter is that running dense GPU clusters is like driving an F1 car. Looks easy until you try it. Top-tier neoclouds run 2 to 3x the hardware utilization per hour of lower-tier providers. That is an execution and inventory moat, and it compounds.
4. The structural short nobody is pricing
Watts and wafers eventually force the buildout off the planet. Gavin expects orbital data infrastructure to prove technical and economic viability within roughly two years and take meaningful share by the end of the decade. Space solves power with unattenuated solar and solves cooling with massive radiators in the satellite's own shadow. Picture dense single-rack nodes stitched together with lasers into a virtual hyperscale cluster in orbit. The unpriced risk is everything that over-expanded to serve a terrestrial buildout: cooling, power, and industrial equipment names sized for a curve that may bend down within seven years.
The whole interview is a lesson in pattern recognition over narrative. Lynch built a career on retail investors knowing their companies better than Wall Street did. The same edge exists in AI infrastructure right now. It just requires you to understand watts and wafers instead of same-store sales. If you are not modeling the physical boundaries of the stack through the lens of history, you are not underwriting the position. You are following it.
Ben Pouladian • 93,402 views • 1 month ago
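The gap between a request to double or triple capacity and an expansion rate of roughly 5 percent can be made concrete with compound growth. A minimal sketch follows. The post does not say what period the 5 percent covers, so treating it as an annual rate is an assumption, and the function names are illustrative only.

```python
import math

# Assumption: "roughly 5 percent" is read as a per-year expansion rate.
# The post does not state the period.
ANNUAL_EXPANSION = 0.05


def periods_to_multiply(rate: float, multiple: float) -> float:
    """Periods of compound growth at `rate` needed to reach `multiple` x capacity."""
    return math.log(multiple) / math.log(1.0 + rate)


def capacity_after(rate: float, periods: float) -> float:
    """Capacity relative to today after compounding `rate` for `periods`."""
    return (1.0 + rate) ** periods


years_to_double = periods_to_multiply(ANNUAL_EXPANSION, 2.0)
years_to_triple = periods_to_multiply(ANNUAL_EXPANSION, 3.0)

print(f"years to double at 5%: {years_to_double:.1f}")
print(f"years to triple at 5%: {years_to_triple:.1f}")
print(f"capacity after 5 years: {capacity_after(ANNUAL_EXPANSION, 5):.2f}x")
```

If the annual reading holds, doubling takes a bit over 14 years of compounding, which is the sense in which a slow, deliberate expansion rate stretches the cycle out.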

CoreWeave just locked in 10,000+ H100s two quarters early, and customers are re-contracting within 5% of original pricing. Meanwhile, H100 and H200 spot prices are climbing again into 2025. This isn’t an AI bubble; it’s compute scarcity. $NVDA $CRWV Cassandra Unchained
Ben Pouladian • 399,177 views • 7 months ago

Jensen just said he was reading a text from DeepMind’s Demis confirming it: pre-training + post-training scaling laws are fully intact, and Gemini 3 is the proof. A huge jump in quality, exactly what Blackwell and Rubin were built for. This next curve is going to be wild. $NVDA
Ben Pouladian • 327,245 views • 7 months ago

Bloom Energy just dropped a bombshell on their Q4 call. Their fuel cells now power absorption chillers using waste heat, cutting data center electricity usage by 20%+. All of it is 800 VDC. No grid. No HFCs. Free cooling. Power + cooling from one platform. $BE 🔊 Listen to this 👇
Ben Pouladian • 208,442 views • 4 months ago

Intel CFO just admitted they have $11.6B in inventory but can’t ship what customers need 🤡
Stacy Rasgon with the kill shot: “You have your own factories. How does that happen?”
CFO: “Six months ago units weren’t expected to increase… we directionally weren’t managing supply to that expectation”
Translation: They got caught flat-footed on the agentic AI demand surge.
Meanwhile FundaAI called this EXACT thesis 2 weeks ago: a server CPU structural shortage driven by:
→ Hyperscaler demand up 15-20%
→ TSMC only meeting 80% of wafer demand
→ Agentic AI creating CPU explosions (agents invoke tools 10-100x faster than humans)
Intel’s saving grace? They own their fabs. In a shortage, that’s the moat. $INTC is down AH, but this is actually the bull case playing out: pricing power in a seller’s market.
Ben Pouladian • 204,838 views • 5 months ago

Rene Haas just confirmed the Vera CPU thesis on yesterday’s Arm Q4 call. He didn’t mean to.
His framing: GPUs are reticle-limited. CPUs are not. The ratio shift is happening in core count, not chip count.
His exact words: “256 Vera CPU chips, 88 cores per chip, a 200-kilowatt liquid-cooled rack designed to sit in a data center adjacent to a Vera Rubin system”
That is not a host CPU. That is dedicated agentic orchestration.
Two days ago NVIDIA’s own engineers published the receipt. They traced a real 33-minute Claude Code session:
283 inference requests
58 main-agent turns coordinating 225 sub-agent invocations
Context grew from 15K to 156K tokens before compaction dropped it to 20K
Main agent alone processed ~3.5 million input tokens in the first 40 turns
Anthropic’s own number: agentic systems consume up to 15x more tokens than chat. Coding agents sustain 95 to 98 percent prompt cache hit rates. Without caching, costs would be 6x higher.
This is what’s happening between GPU calls. File reads. Tool invocations. Sub-agent spawns. Compaction. KV cache management. None of it runs on the GPU.
That’s why 12,000 GPUs need 400,000 CPU cores. The 33-to-1 ratio isn’t a forecast. It’s a measurement.
NVIDIA states it in the blog directly: this won’t be resolved by adding more compute FLOPs and memory capacity.
Translation: the GPU-only path is exhausted. The agentic chapter requires a platform, not a chip.
Their seven-chip answer:
Vera Rubin NVL72: capacity and prefill
Vera CPU: tool execution, KV cache offload
Groq 3 LPX: SRAM-first decode, low-jitter generation
NVLink 6, ConnectX-9, BlueField-4, Spectrum-X: fabric
Result they claim: 400+ tokens per second per user on trillion-parameter MoE at 400K context.
Vera spec: 88 Olympus cores, 176 threads, 1.8 TB/s NVLink-C2C, 1.2 TB/s LPDDR5X, 227 billion transistors. A 256-CPU rack delivers 45,056 threads and 400 TB of memory.
One detail nobody is talking about. The blog’s second author was previously Head of Agents at Groq. The third was previously at Groq Inc and Intel. NVIDIA didn’t license the LPX architecture. They absorbed the team that built it.
Haas isn’t pitching a competing thesis. He’s confirming this one from the other side of the table. Arm data center royalties doubled year-on-year. He expects them to double again.
Things feel slow right now because we’re between platforms. The speedup ships in H2 2026. The architectural argument is over. Deployment is the only variable left.
I cover this in The Quiet Architect and The Fourth Piece. $ARM $NVDA
Ben Pouladian • 62,918 views • 1 month ago
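The ratios and rack totals quoted in the post above can be cross-checked with a few lines of arithmetic. A minimal sketch follows. It uses only figures stated in the post; the variable and function names are illustrative, and the per-CPU memory figure is derived here by simple division, not quoted from any spec.

```python
# Figures quoted in the post above. Names are illustrative only.
GPUS = 12_000
CPU_CORES = 400_000

CPUS_PER_RACK = 256
CORES_PER_CPU = 88
THREADS_PER_CPU = 176
RACK_MEMORY_TB = 400

MAIN_AGENT_TURNS = 58
SUB_AGENT_INVOCATIONS = 225


def cores_per_gpu(cores: int, gpus: int) -> float:
    """CPU cores provisioned per GPU."""
    return cores / gpus


def rack_totals(cpus: int, cores: int, threads: int, memory_tb: float) -> dict:
    """Aggregate per-rack figures from per-CPU figures."""
    return {
        "cores": cpus * cores,
        "threads": cpus * threads,
        # Derived value, not a quoted spec: rack memory split evenly per CPU.
        "memory_tb_per_cpu": memory_tb / cpus,
    }


ratio = cores_per_gpu(CPU_CORES, GPUS)
rack = rack_totals(CPUS_PER_RACK, CORES_PER_CPU, THREADS_PER_CPU, RACK_MEMORY_TB)
total_requests = MAIN_AGENT_TURNS + SUB_AGENT_INVOCATIONS

print(f"CPU cores per GPU: {ratio:.1f}")
print(f"rack cores: {rack['cores']:,}")
print(f"rack threads: {rack['threads']:,}")
print(f"memory per CPU (derived): {rack['memory_tb_per_cpu']} TB")
print(f"inference requests in the traced session: {total_requests}")
```

The numbers are internally consistent: 400,000 cores over 12,000 GPUs is about 33 to 1, 256 CPUs at 176 threads each gives the quoted 45,056 threads, and 58 main-agent turns plus 225 sub-agent invocations account for the 283 inference requests.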

Michael Burry keeps yelling “AI bubble.” Yet both CoreWeave and Nebius just said they’re sold out of H100s and pre-sold Blackwells before data centers even open. Two of the biggest AI clouds can’t keep GPUs in stock. That’s not a bubble, that’s a compute famine. $NVDA $NBIS
Ben Pouladian • 215,315 views • 7 months ago

Lumentum CEO Michael Hurlston in Japan this morning: sold out through 2027, still racing to add capacity. The optical interconnect bottleneck is real. Hyperscaler demand has no ceiling in sight. Optics in the rack to ship 2H27, and he still sees copper. Laser go pew pew. $LITE
Ben Pouladian • 82,137 views • 2 months ago

The $100T industrial world doesn’t flip from x86 just because GPUs are faster. It flips because NVIDIA has spent 15+ years building CUDA libraries for every physics, simulation, and engineering domain. That’s the real differentiation. TPUs don’t have that ecosystem. $NVDA does.
Ben Pouladian • 168,842 views • 7 months ago

Lumentum CEO just confirmed at MS TMT: NVIDIA saw the scarcity of indium phosphide supply and locked it up. Agreements with both Lumentum AND Coherent, the two major InP suppliers globally. The optical interconnect bottleneck play hiding in plain sight. $LITE $COHR $NVDA
Ben Pouladian • 81,271 views • 4 months ago