Semi Doped

@semidoped • 7,734 subscribers

Podcast and daily takes from @vikramskr and @austinsemis

Videos

Vik and Val Bercovici (WEKA) map where AI inference memory is headed. Every 100x cut in KV cache gets swallowed by ~10,000x more usage, so demand climbs.
- NVLink beats the board: 128 lanes vs 32 PCIe
- WEKA serves NAND-backed storage faster than DRAM over network
- DeepSeek's cache reads run ~87x cheaper, China only
- CXL needs a dedicated bus, WEKA pools NAND over RDMA/NVLink instead
- Val's call: SaaS giants and Neoclouds must merge
Chapters:
0:00 Intro and memory prices
3:00 Model routing and offloading
5:10 Network faster than motherboard
13:06 Four-tier memory hierarchy
19:40 KV cache and Jevons paradox
23:50 DeepSeek cache read pricing
31:40 Sliding window attention
34:49 NAND tiers, SLC vs QLC
43:57 High bandwidth flash use cases
49:59 CXL versus NVLink
53:44 AMD Mex and small models
1:03:14 SaaS, Neoclouds and tokenomics
Get more of Austin and Vik daily, free!
Austin Lyons, Vikram Sekar

59,992 views • 18 days ago

New episode: Advanced packaging for AI chips, from wire bonds to TSMC CoWoS and Intel EMIB. Packaging is no longer an afterthought. It is the chip, and Intel's EMIB challenges TSMC's CoWoS.
- Three CoWoS flavors: silicon, organic RDL, local bridges
- EMIB embeds tiny bridges into the substrate, no interposer
- EMIB-T and EMIB-M add through-silicon vias and power capacitors
- Google is booking 3M TPUs on EMIB via MediaTek by 2028
- Package sizes keep climbing: 5.5x reticle today, 40x ahead
Chapters:
(0:00) "There Is No Chip Without the Packaging"
(0:28) Intro and SpaceX IPO Day
(5:15) What We're Covering: CoWoS, EMIB, Google
(7:40) Simple Packaging: Wire Bonds to Flip Chip
(17:07) What Makes Packaging "Advanced"
(33:44) CoWoS: Three Flavors Explained
(45:30) EMIB: Intel's Embedded Bridge Approach
(52:47) EMIB-T and EMIB-M
(57:31) CoWoS vs. EMIB Trade-offs
(1:02:18) Google's 3M TPU EMIB Order
Austin Lyons, Vikram Sekar

66,710 views • 1 month ago

New episode: Austin and Vik break down Qualcomm's Investor Day - HBC, Alphawave, Modular, more
• 2/3 of revenue from data center, auto, and IoT by 2029, not handsets
• HBC stacks LPDDR on the logic die, not beside it: 2,000 lanes become 100,000
• Disaggregation lets latecomers slot a decode rack beside Nvidia
• Alphawave is like QCOM's Mellanox; Modular (Chris Lattner) is its CUDA
• C1000 CPU wins Meta, but ships ~2 yrs into a CPU shortage
Chapters:
0:00 Communications? That's just the start
4:08 Inside $QCOM investor day
9:16 Can Qualcomm build a data center business?
13:09 Disaggregated inference opens the door
17:57 High Bandwidth Compute: memory on the XPU
30:29 "No advanced packaging" just moves the problem
36:20 The roadmap, Alphawave, and Modular
46:00 The C1000 CPU and the agentic shortage
50:40 Cars as token generators, the $1T robotics bet
57:32 The memory market: MOAR
Get more of Austin and Vik daily, free!
Austin Lyons, Vikram Sekar

45,764 views • 29 days ago

Memory is eating the world
When consumers feel the heat, you know it's a problem. While big-3 memory companies are enjoying sweet-sweet profit margins…the rest of the electronics market is cracking under pricing pressure.
- GoPro is more like GoingUnder
- Apple products 10-30% pricier
- Even music synths are more expensive
Is it price gouging at this point? Is this revenge for all the memory cyclicality? When will this end?
Chapters:
0:00 Memory crisis hits
1:20 AI impacting consumers
3:00 AI causing inflation
6:48 Consumer demand drop?
8:49 AI demand inelastic
10:56 Long-term memory outlook
11:02 GoPro's memory woes
12:24 Apple's pricing power
18:20 Apple seeks CXMT DRAM
21:38 Shrinkflation for phones
23:16 Korea's memory investment
26:10 Micron's killing profits
33:00 Why AI needs so much DRAM
40:27 Future of AI training
44:34 Cost-optimizing inference
Get more of Austin and Vik daily, free!
Austin Lyons, Vikram Sekar
$MU

33,812 views • 25 days ago

A masterclass on Google's TPU v8 Networking.
Two TPU chips? Pssh. We already knew workload-specific silicon was here. But two scale-up networking topologies? That's the actual Google TPU news. Workload-specific interconnects. Think about that.
New Semi Doped with Vikram Sekar and Austin Lyons. Copper? Yep. Optics? Yep.
What we cover:
- TPU splits in two: 8t training, 8i inference.
- Virgo: 47 Pb/s scale-out fabric, 100% OCS.
- Boardfly scale-up: copper PCB + AECs inside racks, OCS between groups. 16 hops → 7.
- Training uses 3D torus (Rubik's Cube).
- Inference doesn't. Workload-specific topologies now.
- Dragonfly traces to a 2008 paper by Kim, Dally, Scott, Abts. Abts went on to build Groq's interconnect before Nvidia.
Chapters:
0:00 Intro
0:21 Two TPUs for two workloads
2:31 HBM, SRAM, and Axion CPUs
7:22 Why networking is the new bottleneck
17:14 Virgo: rebuilding scale-out on optics
25:24 3D torus Rubik's Cube scale-up for training
34:50 Boardfly: scale-up for MoE inference
42:07 Workload-specific everything
$GOOGL

92,064 views • 3 months ago

Whiplash week. Optics? Copper? Both?
Mon AM: Nvidia bets $4B on optics.
Mon PM: Credo posts 200% YoY growth on copper.
Wed PM: Hock Tan claims 400G/lane works over copper, potentially pushing CPO past 2030.
48 hours of whiplash. Optics? Copper? The answer is both. The question is when.
This week and Vikram Sekar unpack:
- Nvidia locking up laser supply
- Credo’s blowout quarter and the reliability thesis
- Broadcom’s copper bombshell
- A 4D chess theory on why Hock Tan downplays optics when Broadcom is a CPO company
Chapters
(00:00) - Newsletter Plugs: Groq LPUs & Broadcom’s Laser Business
(03:15) - Dynamo & the Rise of Workload-Specific Hardware
(08:04) - Austin’s Broadcom Laser Deep Dive
(09:53) - The Week’s Whiplash: Optics Monday, Copper Wednesday
(17:50) - Why Nvidia Invested $4B: Geopolitics, Supply & the HBM Playbook
(24:15) - CPO Lasers & Optical Circuit Switches
(26:16) - Credo Earnings: 200% YoY Growth & the Copper Bull Case
(31:09) - Reliability, AECs & Oracle’s GPU Cluster Problem
(35:48) - Credo’s Optics Play: Micro-LED Active Cables & the CPO Timing Risk
(38:45) - Broadcom Earnings: Hock Tan’s Copper Bombshell
(43:34) - Customer-Owned Tooling: Hock Tan Says “Good Luck”
(44:25) - Vik’s 4D Chess Theory: Why Hock Tan Talks Up Copper
(47:03) - Wrap-Up: It’s Both — The Real Question Is Timing
$AVGO $CRDO $NVDA

33,695 views • 4 months ago

New interview: Reiner Pope, co-founder/CEO of MatX
A counterintuitive throughput insight: “Low latency means small batch sizes. That is just Little’s law. Memory occupancy in HBM is proportional to batch size. So you can actually fit longer contexts than you could if the latency were larger. Low latency is not just a usability win, it improves throughput.”
We get into:
• The hybrid SRAM + HBM bet, and why pipeline parallelism finally works
• Why sparse MoE drives MatX to “the most interconnect of any announced product”
• Why frontier labs are willing to bet on an AI ASIC startup
• Memory-bandwidth-efficient attention, numerics, and what MatX publishes (and what it does not)
• Why 95% of model-side news is noise for chip design
• The biggest challenges ahead
00:00 “We left Google one week before ChatGPT”
00:24 Intro: who is MatX
01:17 Origin story: leaving Google for LLM chips
02:21 GPT-3 and the “too expensive” problem
04:25 Why buy hardware that is not a GPU
05:52 Overcoming the CUDA moat
08:46 Early investors
09:35 The name MatX
09:59 The chip: matrix multiply + hybrid SRAM/HBM
12:11 Why pipeline parallelism finally works
14:22 Reading papers and Google going dark
15:20 Research agenda: attention and numerics
17:06 Five specs and meeting customers where they are
19:24 Why frontier labs are the natural first customer
20:32 Workloads: training, prefill, decode
22:18 Little’s law and the throughput case for low latency
24:29 Interconnect and MoE topology
26:35 Inside the team: 100 people, full stack
28:32 Agentic AI: 95% noise for hardware
30:35 KV cache sizing in an agentic world
32:11 How MatX uses AI for chip design (Verilog + BlueSpec)
34:23 Go to market: proving credibility under NDA
35:12 Porting effort for frontier labs
36:34 Biggest skepticism: manufacturing at gigawatt scale
37:32 Hiring plug
Vikram Sekar

19,439 views • 3 months ago
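
The Little's law remark quoted in the MatX interview description can be illustrated with a small sketch. It assumes a fixed request throughput, treats the number of in-flight requests as the batch size, and assumes KV cache memory scales linearly with batch size times context length. All names and numbers here (128 requests/s, 64 GiB of HBM for KV cache, 128 KiB of KV cache per token) are hypothetical and are not taken from the episode or from MatX.

```python
def in_flight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's law: average items in the system L = arrival rate * time in system."""
    return throughput_rps * latency_s


def max_context_tokens(hbm_budget_bytes: int, kv_bytes_per_token: int, batch_size: float) -> int:
    """Longest per-request context that fits if KV cache grows as batch * tokens."""
    return int(hbm_budget_bytes // (kv_bytes_per_token * batch_size))


if __name__ == "__main__":
    HBM_BUDGET = 64 * 2**30   # hypothetical: 64 GiB reserved for KV cache
    KV_PER_TOKEN = 128 * 2**10  # hypothetical: 128 KiB of KV cache per token
    THROUGHPUT = 128.0        # hypothetical: requests completed per second

    for latency in (2.0, 0.5):
        batch = in_flight_requests(THROUGHPUT, latency)
        ctx = max_context_tokens(HBM_BUDGET, KV_PER_TOKEN, batch)
        print(f"latency={latency}s -> batch={batch:.0f}, max context={ctx} tokens")
```

Under these assumptions, cutting latency from 2 s to 0.5 s at the same throughput shrinks the in-flight batch from 256 to 64 requests, so each request can hold a context four times longer in the same memory budget.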

The optical networking supercycle is here!
In this podcast, and Vikram Sekar go through all the tech jargon and explain what everything means. In just about 45 mins, you will know everything required to keep up with the next revolution in AI.
Chapters
00:00 Introduction to AI and CPU Bottlenecks
03:00 The Rise of Silicon Photonics
06:01 Understanding Optical Networking and Data Centers
08:49 Scale Across: Connecting Data Centers
11:56 Scale Out: Optimizing Data Center Connectivity
14:53 Scale Up: The Future of GPU Connectivity
23:32 The Shift from Copper to Optical Connections
26:13 Challenges and Reliability of Lasers
30:47 Understanding Co-Packaged Optics
34:17 Market Dynamics: Demand and Supply of Lasers
40:46 Emerging Technologies: Optical Circuit Switches

21,842 views • 5 months ago

This week we bring Fintwit’s favorite game to the podcast. “Would you buy this optical company?”
The game goes like this. First explain:
- What the company does
- How it relates to the optics industry
- What is their strength/moat
- What is the risk/downside
Then ask: Would you buy it? Why/Why not?
Not financial advice. Just a silly, fun game. Educational. Better be serious when investing money. Pod bros don’t cut it. DYDD.
Chapters
(00:01) - Intro
(06:59) - AXT $AXTI
(13:38) - Tower Semiconductor $TSEM
(23:58) - GlobalFoundries $GFS
(32:43) - Lumentum $LITE
(39:38) - Coherent $COHR
(47:09) - Fabrinet $FN
(54:07) - Corning $GLW
Vikram Sekar

13,093 views • 5 months ago

Context memory essentially unlocks Agentic AI
Much needed for Opus 4.6's "multi-agent swarms"
In this SemiDoped pod, Vikram Sekar talks to Val Bercovici from Weka about context storage.
- How token warehouses save inference costs
- A new networking tier? Context Storage Network!
- High Bandwidth Flash for context?
- Weka's Augmented Memory Grid for context storage
- Where this is all headed
The convo is info packed. Don't miss out on it!
b/acc, context platform engineer
Chapters
(00:00) Introduction to Weka and AI Storage Solutions
(05:18) The Evolution of Context Memory in AI
(09:30) Understanding Memory Hierarchies and Their Impact
(16:24) Latency Challenges in Modern Storage Solutions
(21:32) The Role of Networking in AI Storage Efficiency
(29:42) Dynamic Resource Utilization in AI Networks
(30:04) Introducing the Context Memory Network
(31:13) High Bandwidth Flash: A Game Changer
(32:54) Weka’s Neural Mesh and Storage Solutions
(35:01) Axon: Transforming GPU Storage into Memory
(39:00) Augmented Memory Grid Explained
(42:00) Pooling DRAM and CXL Innovations
(46:02) Token Warehouses and Inference Economics
(52:10) The Future of Storage Innovations

12,796 views • 5 months ago
