$AMD $5 Trillion MC Is Inevitable Long Term👑 This thread will focus more on Inference!
2026 EPYC "Venice" on $TSM 2nm is set to cut large GW-scale inference cost by 40% more than the prior Turin gen.
Context: EPYC Turin achieves ~$0.001 per million tokens for batch inference vs $0.02-$0.12/million tokens as... I wrote the thread below. Venice is going to lower the cost to $0.0005-$0.0006/million tokens. OpenAI spent roughly $20B on inference and training, 80-90% of which was for inference per analysts. AKA renting compute is expensive AF!
In this thread, I want to focus on why most analysts and investors are underestimating the role of EPYC "Venice" and future gens in overall data center revenue. And $TSM ramping up 2nm supply early is confirmation that AMD will be a major buyer long term. I will also link my threads on the Gap between AMD Analysts & Reality and the 2nm Ramp so you have a more comprehensive view of what I'm writing here.
Before I go into detail, this is my 2026 projection:
AI GPUs: $35-$50B
EPYC Data Center: $15B-$17B
Client Segment: $12-$13B
Gaming: $6B
Embedded: $4B-$5B
Total Revenue: $70-$100B
Non-GAAP net income: $18B-$25B
Non-GAAP EPS: $10.97-$15.40
Forward P/E 55x-70x = $603-$1,078
Analysts covering AMD are projecting $0 revenue for MI450 and sluggish EPYC growth. Meaning, all analysts are either full of 💩 or Sexist, you decide! Analysts are also projecting 0% growth for AMD's "Secret Weapon" chip, even though $MSFT said we are in a significant Windows refresh and upgrade cycle. Do you think TSMC would allocate more 2nm supply to $AMD at $0 MI450 revenue and sluggish EPYC?
1. EPYC is going to be the leader in lowest-cost inference! The current Turin cost saving is 95% vs $NVDA, or 98-99% on inference cost when you factor in renting inference compute from Amazon Web Services, Microsoft Azure, or $NVDA Neocloud pets. For 2nm, TSMC claimed 10-15% higher performance at iso-power, 25-30% lower power at iso-speed, and ~15% higher transistor density compared to 3nm.
This reduces operational expenses (energy, cooling) while increasing throughput per chip. EPYC Turin achieves ~$0.001 per million tokens for batch inference (via vLLM on models like Llama 3 70B), driven by high core counts and low hardware costs. EPYC Venice offers ~1.7x overall performance and up to 70% more compute capability per core, with up to 256 cores (512 threads). Enhanced vector/AI instructions and open-source firmware (openSIL) optimize it for inference workloads. AMD incorporates AI Engines (now part of AMD's XDNA) for on-chip acceleration, improving efficiency for low-latency and edge inference. This reduces reliance on discrete GPUs, lowering system complexity and TCO. Venice SKUs are projected at $3,000-$15,000 ($5,000 for the 256-core flagship), far below NVIDIA Rubin ($50,000-$90,000) or AMD's own MI450 GPUs ($40,000-$50,000). High memory bandwidth (up to 1.6 TB/s) supports efficient batch inference. Venice is designed exactly for large customers that want to lower inference cost, while MI450 Helios is for customers that want training at the lowest TCO and TDP, as well as a lower upfront cost at 1GW scale (full build $35-$40B vs $55B-$80B for $NVDA).
2. Real World Example: OpenAI's 2025 inference spend reached ~$20B, escalating to even higher total compute rental (mostly inference) amid token volume growth (from video generation). By 2026, with usage doubling (consistent with industry trends: token demand grows 2-5x YoY), assume OpenAI processes ~1,800 billion million-tokens annually:
~$NVDA Blackwell at $0.02-$0.12/million tokens: $36B (most optimized rate)
~Rubin, projected at $0.01/million tokens: $18B annual inference cost
~$AMD Venice at $0.0005/million tokens: $0.9B annual inference cost
=> Massive savings for OpenAI or anyone paying 80-90% of their annual bill for inference compute. In short, for all current AI players, paying this much rent instead of owning is unsustainable over the medium to long term.
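The point-2 figures are straight multiplication: annual volume times price per million tokens. A minimal Python sketch, taking the thread's volume (~1.8 × 10^12 million-token units per year) and per-million-token prices as given assumptions rather than verified data, reproduces the $36B, $18B and $0.9B bills, plus the $17.1B and $53.1B savings quoted for Rubin at $0.01 and $0.03. The dictionary labels and function name are hypothetical.

```python
# Annual inference bill = (million-token units per year) x ($ per million tokens).
# All inputs are the thread's assumptions, not measured or reported figures.

ANNUAL_MILLION_TOKEN_UNITS = 1.8e12  # "~1,800 billion million-tokens"

# Hypothetical labels; prices are the per-million-token figures quoted in the thread.
PRICES_PER_MILLION = {
    "NVDA Blackwell (most optimized)": 0.02,
    "NVDA Rubin (projected)": 0.01,
    "NVDA Rubin (if it only hits $0.03)": 0.03,
    "AMD Venice (projected)": 0.0005,
}


def annual_bill_billions(price_per_million: float,
                         units: float = ANNUAL_MILLION_TOKEN_UNITS) -> float:
    """Annual inference cost in billions of dollars at a given $/1M tokens."""
    return units * price_per_million / 1e9


if __name__ == "__main__":
    venice = annual_bill_billions(PRICES_PER_MILLION["AMD Venice (projected)"])
    for label, price in PRICES_PER_MILLION.items():
        bill = annual_bill_billions(price)
        # Savings are measured against the Venice bill at the same volume.
        print(f"{label:36s} ${bill:5.1f}B/yr  (savings vs Venice: ${bill - venice:4.1f}B)")
```

Keeping volume as a default parameter makes it easy to rerun the comparison at a different token count, since the savings scale linearly with volume.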
Rubin excels in low-latency decode (if the Groq integration from the $20B deal lands in 2027-2028), but Venice dominates batch (80% of inference by 2030). Actual savings depend on deployment scale (OpenAI's 6GW AMD plans), electricity rates, and software maturity. If Rubin only hits $0.03/million tokens, savings swell to $53.1B vs. $17.1B at $0.01.
3. Will running inference on Venice and future gens slow down response generation in 2026 and beyond? Human perception of "fast enough" for chat, agents, search augmentation, summarization, and coding assistance is roughly
Meaning, EPYC may generate $100B a year in data center revenue. Hence $MSFT, $AMZN, $META, $GOOGL, OpenAI, xAI, and 42+ countries are leaning toward AMD for inference, because the cost savings are MASSIVE!
4. Regular users (you, me, people using ChatGPT, Claude, Gemini, Grok, Perplexity...) are extremely unlikely to notice any slowdown, and in many cases might even experience slightly faster or more consistent response times if the industry heavily shifts toward AMD EPYC for inference.
What actually happens when companies save massively on inference? When OpenAI, Anthropic, Gemini, Grok, Meta... save billions on the batch/enterprise/RAG layer using EPYC Venice, they typically do one or more of these things with the savings, none of which makes your chat slower, while enhancing their bottom line (profit):
~Keep prices the same → make more profit
~Lower subscription prices / increase free tier limits
~Train bigger & better models more frequently
~Offer longer context windows
~Add more reasoning steps / tool calls / agents per query
~Improve multimodal capabilities
~Build more data centers / reduce throttling during peaks
In practice, the consumer experience usually gets better, not worse, when inference becomes dramatically cheaper. A prime example is $META, which leans heavily on AMD and is currently AMD's largest customer. Another is Grok: from Grok 2 to Grok 3, xAI heavily used AMD for inference savings, and most Grok users reported snappier responses, not slower ones.
5. What does this mean for potential revenue? Note that TSMC is massively ramping 2nm supply for $AMD, for both MI450 and EPYC.
EPYC conservative projection:
FY2025: $10.5B (best est.)
FY2026: $16B
FY2027: $29B
FY2028: $49B
FY2029: $75B
FY2030: $100B
Large customers: $META, OpenAI, $MSFT, $AMZN, $GOOGL, xAI (Apple?)
Smaller customers: $DELL, $HPE, $SMCI, and 42+ countries.
The roadmap to $5 trillion is very much inevitable, as the inference cost of renting or owning $NVDA is too high. $NVDA will still dominate training market share, where the MI family is likely to take 15-20%, but the TAM is also expanding rapidly. Most institutions are projecting a $2-$3 trillion TAM by 2030, $NVDA said $4 trillion, and Dr. Lisa Su said $1 trillion+ by 2030. So you decide how big the TAM is.
If you enjoy this kind of analysis, slap the Like/Repost and Bookmark to please the X algo, as it is Free.99! If you want to support my work further, consider subscribing to see more in-depth analysis! Alright, that is it. Not Financial Advice!
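The conservative EPYC path in point 5 implies specific year-over-year growth rates that are worth seeing explicitly. The sketch below treats the author's FY2025-FY2030 figures as inputs (they are projections, not reported results); the dictionary name and both helper functions are hypothetical.

```python
# Implied growth rates behind the thread's "conservative" EPYC revenue path.
# Figures ($B) are the author's projections, not reported results.
EPYC_REVENUE_B = {
    2025: 10.5,
    2026: 16.0,
    2027: 29.0,
    2028: 49.0,
    2029: 75.0,
    2030: 100.0,
}


def yoy_growth(series: dict[int, float]) -> dict[int, float]:
    """Year-over-year growth rate for each year after the first."""
    years = sorted(series)
    return {y: series[y] / series[prev] - 1 for prev, y in zip(years, years[1:])}


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    for year, g in yoy_growth(EPYC_REVENUE_B).items():
        print(f"FY{year}: {g:+.0%}")
    print(f"FY2025->FY2030 CAGR: {cagr(10.5, 100.0, 5):.1%}")
```

Going from $10.5B to $100B in five years works out to a compound rate of roughly 57% a year, with the steepest single step (about +81%) falling in FY2027.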

Mike

25,141 subscribers

102,223 views • 7 months ago • via X (Twitter)

Similar Videos

$AMD is ready to break $1 Trillion MC | $TSM 2nm🧵
TLDR FY2026 (excluding China AI revenue):
AI GPUs: $35-$50B
EPYC Data Center: $15B-$17B
Client Segment: $12-$13B
Gaming: $6B
Embedded: $4B-$5B
Total Revenue: $70-$100B
Non-GAAP net income: $18B-$25B
Non-GAAP EPS: $10.97-$15.40
Forward P/E 55x-70x = $603-$1,078
The semiconductor industry is at a pivotal juncture, with advanced process nodes like TSMC's 2nm technology becoming the battleground for leadership in artificial intelligence and high-performance computing (HPC). Amid this landscape, AMD stands poised to secure early production and higher allocation of its Venice (EPYC) and MI450 (Instinct GPU) on TSMC's 2nm process. This strategic advantage is not merely a product of timing but the culmination of a robust partnership, market demand, technical superiority, and geopolitical dynamics.
The AI and HPC markets are experiencing unprecedented growth, with inference workloads projected to constitute 80-90% of AI compute by 2030. AMD's EPYC processors and Instinct GPUs are uniquely positioned to capitalize on this trend, particularly given demand from hyperscalers such as OpenAI, $META, $MSFT, $AMZN, and $ORCL. $TSM starting 2nm mass production in Taiwan positions AMD to meet FY2026 revenue of $70B to $100B; non-GAAP net income of $18B to $25B highlights the scale of this opportunity, in stark contrast with the analyst revenue consensus of $39-$45B. This discrepancy arises from analysts' failure to account for major orders, notably from OpenAI (today SoftBank secured OpenAI a massive cash balance of $55-$62B). OpenAI is raising $100B, so this left $77B from UAE, Saudi, $MSFT, and others. $AMD is on track to receive a higher allocation of EPYC Venice and MI450 in 2026.
AMD's acquisition of Xilinx has significantly strengthened its position in AI inference, particularly through adaptive computing technologies like FPGA-based AI Engines.
The upcoming Zen 6 "Venice" generation (on TSMC 2nm, launching with MI450 in 2026) promises a ~1.7× performance uplift, enhanced vector/AI capabilities, greater thread density, and open firmware innovations, positioning EPYC to maintain its inference leadership while powering massive hybrid AI superclusters. TSMC's Fab 22 in Kaohsiung, Taiwan, is now the epicenter of 2nm mass production, an early strategic move to meet soaring demand from $AMD and $AAPL. Early production slots are typically reserved for customers with the highest revenue potential and strategic importance. AMD's early tape-out of Venice and the MI450's role as the first AMD GPU on 2nm place AMD at the forefront of this allocation. The 2nm process offers 10-15% higher performance or 25-30% lower power use compared to 3nm (per TSMC's claims), a critical advantage for AI and HPC applications. Moreover, TSMC's recent 20% yield improvement in Versal production, as mentioned in related discussions, indicates efficient scaling. Higher yields translate to more good chips per wafer, reducing costs and increasing allocation for key customers like AMD. This efficiency is particularly important given the aggressive timelines of customers like OpenAI, who require rapid scaling to meet their computational needs. The reopening of the China market adds another layer of demand pressure. Vendors and hyperscalers are begging for allocation of AMD's MI308X, MI300X, and MI355X, and 2nm capacity will be critical to meet this need. TSMC's early production of 2nm ensures AMD can capitalize on this opportunity, securing higher allocation to fulfill these orders. Dr. Lisa Su's emphasis on disciplined supply chain planning for multiple gigawatt-scale customers, such as OpenAI, demonstrates AMD's readiness to scale. TSMC's confidence in AMD's ability to absorb this capacity is evident in the early 2nm production allocation. This discipline is particularly important in a market where demand outstrips supply by 10-12x.
TSMC's competitors, such as Samsung and Intel, are still in the early stages of their 2nm and equivalent processes. Samsung's 2nm GAA transistors and Intel's 18A process are not yet in mass production, giving TSMC and AMD a first-mover advantage. Nvidia's acquisition of Groq Inc. is a defensive move to diversify into inference, but it does not immediately address the 2nm gap. AMD EPYC Venice and future gens already lead on lowest inference cost, and the MI450 has a TCO of $0.65 to $1.00 per million inference tokens, significantly lower than Nvidia's Rubin (H2 2026) at $0.70 to $1.20 and Broadcom's XPU (2027-2029) at $0.70 to $1.30. Additionally, the MI450's TDP is estimated at 1000-1800W, compared to Nvidia's 2300-3600W (Ultra), reducing operational costs and energy consumption (TSMC 2nm vs TSMC 3nm). The MI450 features 432GB of HBM4 memory and 19.6 TB/s bandwidth, surpassing Nvidia's Rubin (288GB HBM4, 16 TB/s) and Broadcom's XPU (192/256GB HBM4, 7 TB/s est.). This enhanced memory and bandwidth capacity is essential for handling the complex, data-intensive workloads of large language models and other AI applications. AMD's full-stack vision, combining EPYC hosts with Instinct accelerators, offers the lowest total cost of ownership (TCO) and thermal design power (TDP). This synergy is unbeatable for both training and inference, further justifying TSMC's prioritization. The 2nm process amplifies these advantages, ensuring AMD can maintain its competitive edge over rivals like Nvidia, whose Rubin GPUs are still on N3P (a 3nm derivative). Today, as TSMC begins 2nm mass production in Taiwan, it has just secured $AMD's path to joining the top 10 largest companies in the world. AMD and Apple are to receive the highest allocation. The long-standing partnership with TSMC, massive demand from hyperscalers, technical advantages of 2nm, and disciplined supply chain planning all point to AMD's favored position.
The 2nm process's early mass production at Fab 22, combined with AMD's revenue potential and competitive edge, justifies TSMC's prioritization. This allocation is critical for AMD to meet aggressive demand, capture market share, and solidify its position as a leader in AI and HPC, especially in the inference-dominated future. Dr. Lisa Su: "We will multiple customers/hyperscalers at GW scale" Not Financial Advice!
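The TLDR's price range is simply EPS × forward P/E, pairing the low EPS with the low multiple and the high EPS with the high multiple. A minimal sketch using only the thread's own figures; dividing net income by EPS also gives the implied diluted share count (about 1.62-1.64B at both ends), a useful sanity check that the EPS and net-income ranges are internally consistent. Function names are hypothetical.

```python
# Price-target arithmetic from the thread's TLDR: price = EPS x forward P/E.
# All inputs are the author's FY2026 projections, used here only as given numbers.


def price_target(eps: float, forward_pe: float) -> float:
    """Implied share price for a given EPS and forward P/E multiple."""
    return eps * forward_pe


def implied_shares_billions(net_income_b: float, eps: float) -> float:
    """Diluted share count (billions) implied by net income ($B) and EPS."""
    return net_income_b / eps


if __name__ == "__main__":
    low = price_target(10.97, 55)   # low EPS at the low multiple
    high = price_target(15.40, 70)  # high EPS at the high multiple
    print(f"Price range: ${low:,.0f} - ${high:,.0f}")
    print(f"Implied shares (low case):  {implied_shares_billions(18, 10.97):.2f}B")
    print(f"Implied shares (high case): {implied_shares_billions(25, 15.40):.2f}B")
```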

Mike

43,219 views • 7 months ago

$AMD $MSFT Partnership is MASSIVE in 2026 🚀 If you were excited about my thread on the long-time $AMD $AMZN AWS partnership, you will be even more excited about what Microsoft is going to do with 2026 AMD EPYC "Venice".
Historical Context: The relationship between AMD and Microsoft began in the early 2000s, with Microsoft initially focusing on Intel's x86 architecture for its Windows operating system and server products. However, AMD's entry into the server market with its Opteron processors in 2003 marked the beginning of a competitive dynamic that eventually led to collaboration. The partnership intensified with the launch of 3rd Generation EPYC "Milan" in 2021, powering new Azure VM families such as HBv3. By 2025, Microsoft had integrated 5th Generation EPYC "Turin" into new compute-optimized instances, reflecting a strategic shift toward AMD for cost and performance benefits. This "Secret Weapon" breakthrough will mark another inflection point for the AMD-Microsoft Azure relationship, probably a more aggressive one than the EPYC "Milan" moment in 2021. We can call it the EPYC "Venice" moment of 2026.
1. Technical performance of AMD EPYC "Venice" (2026)
AMD's 6th Gen EPYC "Venice" processors, slated for 2026, introduce a new chiplet design breakthrough: a revolutionary chiplet interconnect fabric that redefines server scalability for AI. This isn't just faster silicon; it's a paradigm shift for Microsoft Azure, enabling hyper-efficient, rack-scale AI inference that slashes costs and latency while boosting throughput.
~Up to 256 Zen 6 cores, a 70% performance increase over "Turin," optimized for AI and HPC.
~Memory and Bandwidth: 1.6 TB/s per socket, doubling "Turin's" capability, with support for MR-DIMM/MCR-DIMM.
~Efficiency: 1,500-1,700W power draw, a 50% reduction, aligning with Microsoft's sustainability initiatives.
~Interconnect: PCIe 6.0 and a new chiplet fabric for rack-scale AI, reducing latency and enhancing scalability.
2. Why $MSFT will raise $AMD EPYC share to 50%+ in 2026.
AMD EPYC share: ~30-35% of Azure's x86 CPU-based business, while Intel Xeon's share is ~65%. Microsoft's Azure has been progressively integrating AMD EPYC, with "Venice" expected to expand this footprint:
A. Dominance of AI Inference Workloads
~AI inference constitutes 80% of AI workloads in cloud environments, with latency-sensitive applications like chatbots, recommendation engines, and fraud detection requiring sub-second response times.
~"Venice's" 35x inference performance uplift directly addresses these requirements, outperforming Intel's offerings and custom Arm solutions in multi-threaded scenarios.
B. Cost Efficiency and Operational Savings
~Azure's 2025 capex of $118B is under pressure to deliver returns. "Venice" can reduce operational expenses by $20-30B annually due to its power efficiency and performance gains, improving Azure's margins to 35-40%.
~The cost per inference operation is significantly lower with "Venice," estimated at 24-31% less than Intel-based alternatives, enhancing Azure's competitiveness against AWS and GCP.
C. Scalability for Enterprise AI
~"Venice" supports rack-scale AI deployments, enabling Azure to scale AI services for enterprise customers. For example, a 1,000-node cluster can process 700,000+ tokens per second, crucial for large-scale AI applications like personalized marketing and predictive analytics.
~This scalability is particularly important as Azure aims to capture the $100B+ AI opportunity by 2026, as stated by Microsoft CEO Satya Nadella.
D. Reduction of Nvidia Dependency
~While Nvidia ($NVDA) dominates AI accelerators, AMD's integrated EPYC-GPU solutions (MI450 with "Venice") offer a balanced approach, reducing Azure's reliance on Nvidia's high-cost GPUs.
~"Venice" enables hybrid inference models, where CPU-based inference handles 80% of workloads and GPU acceleration is reserved for training and complex tasks, optimizing resource allocation.
3. Financial Implication:
~AMD's revenue from Azure could reach $15-18B annually by 2026, part of a total revenue projection of $70-100B.
~Profit margins could improve to 55-60%, boosting net income to $20-25B, supported by scale economies and reduced production costs.
Intel could respond with more aggressive discounts, but this breakthrough is the product of a decade of $AMD R&D rethinking chiplet design, a completely new approach. "Venice's" lead in AI inference and efficiency is challenging to match.
Broader Industry: Other hyperscalers (Amazon Web Services, GCP) and enterprises will follow Azure's lead, standardizing on EPYC technology and pressuring Intel further. This could lead to a broader industry shift toward AMD, enhancing its ecosystem and bargaining power.
Conclusion: The strategic adoption of AMD's 6th Generation EPYC "Venice" processors by Microsoft Azure in 2026 marks a pivotal moment in the evolution of cloud computing, particularly for AI inference capabilities. "Venice's" groundbreaking chiplet design, offering a 35x performance uplift for AI inference tasks, a 50% reduction in power consumption, and unparalleled scalability, positions Azure to leapfrog its competitors in the race for AI dominance. This technical superiority, combined with significant cost savings (potentially $20-30B annually in operational expenses), aligns perfectly with Microsoft's ambitions to capture the $100B+ AI revenue opportunity by 2026. The shift to 50% x86 market share for AMD within Azure is not merely a technical transition but a strategic realignment that redefines the competitive landscape. Historically, Microsoft's partnership with AMD has evolved from niche deployments to a core component of Azure's infrastructure, and "Venice" accelerates this trend. The 30-35% AMD EPYC share in 2025 is expected to double, driven by new AMD-based VM families, which will dominate AI-intensive and HPC workloads.
This migration is incentivized by "Venice's" efficiency gains, reducing dependency on Intel and Nvidia, and enhancing Azure's sustainability profile. Not Financial Advice!
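Section C's example (a 1,000-node cluster processing 700,000+ tokens per second) is a rate; converting it into per-node and per-day volumes makes it easier to compare with the token-volume figures used elsewhere in these threads. A minimal sketch, assuming (hypothetically) that the throughput is spread evenly across all nodes and sustained around the clock; the function names are illustrative.

```python
# Converts a cluster-level tokens/second figure into per-node and per-day volumes.
# Assumes throughput is evenly spread across nodes and sustained 24/7 (hypothetical).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_node_tps(cluster_tps: float, nodes: int) -> float:
    """Average tokens per second handled by each node."""
    return cluster_tps / nodes


def tokens_per_day(tps: float) -> float:
    """Tokens processed in one day at a constant rate."""
    return tps * SECONDS_PER_DAY


if __name__ == "__main__":
    cluster_tps, nodes = 700_000, 1_000  # figures from the thread
    print(f"Per node: {per_node_tps(cluster_tps, nodes):,.0f} tokens/s")
    daily = tokens_per_day(cluster_tps)
    print(f"Cluster: {daily / 1e9:,.2f}B tokens/day "
          f"({daily / 1e6:,.0f} million-token units/day)")
```

Under those assumptions each node averages 700 tokens/s and the cluster handles about 60.5 billion tokens a day.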

Mike

141,018 views • 9 months ago

$AMD $AMZN partnership will 🚀 in 2026 🔥 The Amazon/AMD partnership is hidden among hot headlines from OpenAI, $NVDA, and $ORCL...
TLDR: Unlike other hyperscalers, Amazon refused to bid up overpriced $NVDA chips and decided to work closely with $AMD. Amazon is expected to spend up to $10-$20B a year on the 2026 EPYC breakthrough gen and future gens. Dr. Su confirmed "we have plenty for other large customers". For its 2026 EPYC "Venice" processors, AMD is using a multi-node manufacturing strategy: the CPU core complex dies (CCDs) are built on TSMC's 2nm-class node (N2), while the I/O die (IOD) uses the N3P (3nm) process.
Context: Andy Jassy's Amazon Web Services has been working with AMD on EPYC processors since November 2018. With this "secret weapon" breakthrough (patented), this long-time partnership has expanded to the new 2026 EPYC gen. AMD's 6th Gen EPYC "Venice" processors, slated for 2026, introduce a new chiplet design breakthrough: a revolutionary chiplet interconnect fabric that redefines server scalability for AI. This isn't just faster silicon; it's a paradigm shift for AWS, enabling hyper-efficient, rack-scale AI inference that slashes costs and latency while boosting throughput. AMD stands to benefit from AWS's $100B+ AI opportunity, along with $ORCL, $MSFT, $GOOGL, $META, Saudi, UAE, 38+ countries, and startups.
In early October, Amazon/AWS announced the new EC2 M8a instances as their latest-generation, general-purpose compute instances, now powered by AMD EPYC 9005 "Turin" processors. Amazon announced the M8a as having up to 30% higher performance and up to 19% better price performance over M7a. In my testing of both at 32 vCPUs, the new AMD EPYC Turin instance provided 1.59x the performance of the prior-generation EPYC Genoa instance!
How will this impact AWS AI Inference?
~Cost Efficiency: Inference is 80%+ of AI workloads and latency-sensitive (e.g., chatbots need <1s responses). The "secret weapon" enables 35x better inference perf (per AMD's CDNA roadmap tie-in), cutting AWS's energy use by 50%+ in clusters. With $118B in 2025 capex, this could save $20–$30B annually in OPEX, boosting margins to 35%-40%.
~Scalability for Agentic AI: Supports "Helios" rack-scale platforms (up to 128 GPUs + EPYC hosts), delivering 3.58x FP6 perf for distributed inference. AWS can run 700K+ more tokens/sec in 1,000-node clusters (via EPYC 9575F boosts), enabling real-time apps like personalized search or fraud detection at enterprise scale.
~Adoption Catalysts: Early partners like Oracle signal broad uptake; AWS's existing AMD instances (G4ad with Radeon GPUs) pave the way. By 2026, EPYC could power 40%+ of AWS AI infra, outpacing Nvidia's GPU lock-in via open standards (ROCm 8 software).
Lastly, Amazon’s trajectory toward a $320 stock price is not a speculative leap but a grounded projection rooted in its unmatched fundamentals and strategic AI leadership. With Amazon Web Services poised to surpass $100 billion in annual revenue by 2026, driven by explosive AI inference demand, Amazon is redefining cloud computing’s future. The adoption of AMD’s 2026 EPYC processors with "Secret" architecture is a game-changer, slashing costs by up to 50% and boosting inference throughput 3x, enabling AWS to dominate enterprise AI workloads with unmatched efficiency. This technological edge, combined with Amazon’s e-commerce dominance and high-margin advertising growth, supports a valuation rerating to 22x EV/EBITDA, which is still a discount to historical highs. Trading at $222, $AMZN is undervalued given its 15–20% revenue CAGR and 25%+ EPS growth through 2030.

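Amazon's two M8a figures (up to 30% higher performance and up to 19% better price performance than M7a) can be read together if price-performance is taken as performance divided by price. This is a minimal sketch under an explicit assumption the thread does not make: that both "up to" maxima apply to the same workload. AWS quotes them separately, so the result only illustrates how the two numbers relate; it is not a published price. The function name is hypothetical.

```python
# Relates AWS's two "up to" M8a vs M7a figures, assuming
# price-performance = performance / price and that both maxima
# hold for the same workload (an assumption, not an AWS statement).


def implied_price_ratio(perf_ratio: float, price_perf_ratio: float) -> float:
    """New/old price ratio implied by performance and price-performance ratios."""
    if price_perf_ratio <= 0:
        raise ValueError("price-performance ratio must be positive")
    return perf_ratio / price_perf_ratio


if __name__ == "__main__":
    ratio = implied_price_ratio(1.30, 1.19)  # "up to 30%" perf, "up to 19%" price-perf
    print(f"Implied M8a/M7a price ratio: {ratio:.3f} ({ratio - 1:+.1%})")
```

Under that assumption, M8a would be priced roughly 9% above M7a; if the two maxima come from different benchmarks, the ratio says nothing definite.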

Mike

511,082 views • 9 months ago
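The Amazon thread's price-target and savings claims reduce to simple percentage arithmetic. The sketch below is a minimal check under stated assumptions: the inputs are the figures quoted in the thread ($222 share price, $320 target, $118B 2025 capex, $20–$30B claimed annual savings), treated as the author's projections rather than verified data, and the helper names `pct_change` and `share_pct` are hypothetical.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def share_pct(part: float, whole: float) -> float:
    """part expressed as a percentage of whole."""
    return part / whole * 100.0


# Figures quoted in the thread (the author's projections, not verified data)
PRICE_NOW = 222.0                 # $AMZN share price cited
PRICE_TARGET = 320.0              # stock price target cited
CAPEX_2025_B = 118.0              # 2025 capex, $B
CLAIMED_SAVINGS_B = (20.0, 30.0)  # claimed annual OPEX savings, $B

upside = pct_change(PRICE_NOW, PRICE_TARGET)
savings_vs_capex = tuple(share_pct(s, CAPEX_2025_B) for s in CLAIMED_SAVINGS_B)

if __name__ == "__main__":
    print(f"Implied upside to the $320 target: {upside:.1f}%")
    lo, hi = savings_vs_capex
    print(f"Claimed savings as a share of 2025 capex: {lo:.1f}% to {hi:.1f}%")
```

With these inputs the target implies roughly 44% upside, and the claimed savings equal roughly 17% to 25% of the cited capex figure.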

$AMD's heading to $5T MC LT | Lowest $/M tokens 🧵

The real reason why institutions are FOMOing into AMD while other semi stocks ($NVDA, $AVGO) are underperforming.

Not Financial Advice! DYOR!

Under Dr. Lisa Su's leadership, AMD has transformed from a distant challenger into a formidable force in AI infrastructure, delivering the industry's most compelling TCO story for high-volume inference. Her clear vision (open ecosystems, aggressive annual roadmaps, rack-scale innovation, and a relentless focus on tokens-per-dollar) has positioned AMD's Helios racks as the go-to solution for hyperscalers and AI natives struggling with exploding token costs, collapsing the cost to $0.0003-$0.0005/M tokens. I will link various threads on this analysis, covering the supply chain and wafer ratio, if you are interested in understanding the full picture.

In the last 3-4 months, explosive agentic AI demand has significantly increased inference demand for agentic AI models running 5-10 agents. If you listen to CNBC or Bloomberg, you know enterprises and companies are complaining about the cost of tokens and how it has started to spike far beyond what makes sense. Most data centers today run on $NVDA chips, where the cost is far too high for training or inference.

1. Token cost

Here are some quick comps, so you understand why $META, OpenAI, Anthropic, $MSFT, $AMZN, Softbank, $GOOGL and many small to medium AI natives are buying as many AMD CPUs and GPUs as they can get; AMD chips are pretty much sold out for the next 3-5 years.

Inference (Cost per Million Tokens)
~$NVDA B200 / HGX: ~$0.02–$0.08 on optimized workloads (FP4/MXFP4, speculative decoding). A significant improvement over Hopper, but still premium-priced. GB200 NVL72 rack-scale: $0.05–$0.25+
~$AMD Helios Racks: $0.0003-$0.0005 per M tokens, dramatically lower than NVIDIA equivalents in owned infra. MI355X node-level: up to 40% more tokens per dollar vs. competing solutions (B200), driven by higher memory capacity (up to 288GB+ HBM), strong bandwidth, and lower acquisition costs.

Training
~$NVDA Rubin Rack is estimated at $0.7-$1.2/M tokens
~$AMD Helios Rack is estimated at $0.65-$1.0/M tokens

2. Why Hyperscalers and AI Natives Are Choosing AMD

Token consumption (especially agentic) is outpacing even NVIDIA's efficiency gains, making diversification mandatory for economic viability. Massive deals reflect this reality: $META, OpenAI, $MSFT, Softbank, $AMZN, Oracle, LumaAI, G42...

Dr. Lisa Su's Vision in Action: Since taking the helm, Su has driven AMD's turnaround with disciplined execution, an annual GPU cadence (MI300 → MI350 → MI400), full-stack software (ROCm 7), open ecosystems (UALink, OCP designs), and customer-centric rack-scale solutions like Helios. Her emphasis on "tokens per dollar" and TCO has turned AMD into the pragmatic choice for sustainable AI scaling.

Power/Energy Efficiency:
~Helios rack-level power is estimated at 120kW-140kW, with 50% more HBM4, where inference and training cost matter
~Rubin rack-level power is estimated at 160kW-230kW
AMD Helios shines in owned TCO, memory density, and energy flexibility at hyperscale.

Cost to build a 1GW data center:
~1GW Helios Rack full build is estimated at $30-$35B
~1GW Rubin Rack full build is estimated at $45-$55B

3. Superior CPUs to pair with GPUs at massive 5-10-20GW scale

Agentic AI (autonomous, multi-step workflows with orchestration, tool use, parallel agents, data movement, and enterprise integration) has dramatically increased the importance of strong host CPUs alongside GPUs. This shifts the CPU-to-GPU ratio higher, toward 1:1 to 5:1, and makes balanced systems critical as enterprises test more than 5-10 agents.

AMD EPYC Venice excels:
~Leadership core density (up to 256 Zen 6 cores per socket) for running many agents in parallel, orchestration layers, and high-throughput control-plane tasks.
~Superior performance-per-core and power efficiency (up to 2.1x higher perf/core and 2.26x better SPECpower vs. NVIDIA Grace in benchmarks).
~Tight integration in Helios: one Venice CPU + multiple MI450 GPUs per node, enabling efficient data feeding to GPUs ("zero-copy"), parallel execution, and full rack utilization for complex agentic loops.

Hyperscalers (Meta, Microsoft, Amazon, Google, Softbank) and AI natives (OpenAI, Anthropic...) are adopting high-core-count EPYC at scale specifically for these agentic demands, as CPUs now handle a larger share of non-model work (orchestration, policy enforcement, tool calls). This complements AMD's lower-cost GPUs for overall TCO wins.

Conclusion: NVIDIA's Vera CPU (the host CPU of the Vera Rubin platform) cannot compete with the two-year-old EPYC Turin, while AMD under Dr. Lisa Su has engineered the lowest cost per million tokens, highly competitive energy efficiency, and superior CPU orchestration for agentic AI at scale with Helios. Dr. Su has championed this shift since at least 2023, foreseeing the rise of agentic workflows that demand far more orchestration, parallel agents, and balanced compute well before the industry fully embraced it. Her long-term vision of AI moving from simple prompts to always-on, multi-agent systems has driven AMD's investments in high-core-count EPYC CPUs and integrated rack-scale solutions, perfectly positioning the company for today's realities.

Hyperscalers and AI natives effectively have no choice but to buy more AMD systems for agentic AI, given AMD's leadership in economical, power-aware, high-volume internal and agentic use. However, because supply is far behind demand, a multi-vendor reality, along with in-house chips, will drive faster industry progress, lower overall costs, and better sustainability.

Not Financial Advice! DYOR!

Video source: Microsoft Build 2026

Mike

145,778 views • 1 month ago
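The cost comparisons in the "Lowest $/M tokens" thread above can be turned into concrete numbers. This is a minimal sketch, assuming the thread's quoted $/M-token ranges and 1GW build-cost ranges as inputs (estimates, not measurements) and a hypothetical workload of 1 trillion tokens per month, which is not a figure from the thread. The names `PriceRange`, `times_cheaper` and `midpoint` are illustrative helpers, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRange:
    """A low/high range in dollars per million tokens."""
    low: float
    high: float

    def cost(self, tokens: float) -> tuple[float, float]:
        """Dollar cost range for a given number of tokens."""
        millions = tokens / 1_000_000
        return (self.low * millions, self.high * millions)


# Inference $/M-token ranges as quoted in the thread (estimates, not measurements)
INFERENCE = {
    "NVDA B200 / HGX": PriceRange(0.02, 0.08),
    "NVDA GB200 NVL72": PriceRange(0.05, 0.25),
    "AMD Helios": PriceRange(0.0003, 0.0005),
}

# 1GW full-build cost ranges as quoted in the thread, in $B
BUILD_COST_1GW_B = {"AMD Helios": (30.0, 35.0), "NVDA Rubin": (45.0, 55.0)}


def times_cheaper(a: PriceRange, b: PriceRange) -> tuple[float, float]:
    """How many times cheaper b is than a, pairing low with low and high with high."""
    return (a.low / b.low, a.high / b.high)


def midpoint(bounds: tuple[float, float]) -> float:
    """Midpoint of a (low, high) range."""
    return (bounds[0] + bounds[1]) / 2


# Hypothetical workload: 1 trillion tokens per month (an assumption, not from the thread)
MONTHLY_TOKENS = 1e12

ratio = times_cheaper(INFERENCE["NVDA B200 / HGX"], INFERENCE["AMD Helios"])
build_saving_pct = (1 - midpoint(BUILD_COST_1GW_B["AMD Helios"])
                    / midpoint(BUILD_COST_1GW_B["NVDA Rubin"])) * 100

if __name__ == "__main__":
    for name, rng in INFERENCE.items():
        lo, hi = rng.cost(MONTHLY_TOKENS)
        print(f"{name}: ${lo:,.0f} to ${hi:,.0f} per month")
    print(f"B200 vs Helios: {ratio[0]:.0f}x to {ratio[1]:.0f}x cheaper")
    print(f"1GW build, midpoint vs midpoint: {build_saving_pct:.0f}% lower for Helios")
```

Pairing low-with-low and high-with-high is one choice among several; comparing the extremes instead would widen the range. Comparing midpoints of the build-cost ranges gives a 35% lower Helios build cost on the thread's own estimates.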

$AMD | The FOMO to buy AMD Chips is NOW 🧵

Not Financial Advice! DYOR! Research Purpose Only!

The Inference Queen is the biggest winner in agentic AI: all other CPUs are struggling to compete with the two-year-old EPYC Turin, and EPYC Venice is in the mass-production phase. AMD stresses deployability today on standard x86 platforms (no proprietary architectures required), full software compatibility, and open standards. This positions Venice + Helios as a practical, high-density alternative to competing solutions while underscoring that agentic AI shifts the balance toward CPU-rich racks alongside GPUs and, most importantly, lowers the cost of tokens to accelerate adoption and innovation.

Context: The Wall Street Journal published an article yesterday reporting that OpenAI is considering drastically lowering token prices to win more customers from Anthropic. The narrative "they" are pushing to exacerbate the current AI selloff won't last long. It reflects a fundamental misunderstanding of what is going on, which I have been discussing for months and years. Followers and subscribers have known for years that this day would come, when token cost becomes the central discussion among enterprises, because there is no such thing as an unlimited budget or "Tokenmaxxing" when they use $NVDA chips or hyperscalers' in-house chips. I will link various threads if you are interested in the full picture, from the supply chain to TSMC's recent rapid 2nm expansion to 12 fabs total by 2027/2028.

Hyperscalers and AI natives effectively have no choice but to buy more AMD systems for agentic AI, given AMD's leadership in economical, power-aware, high-volume internal and agentic use. However, because supply is far behind demand, a multi-vendor reality, along with in-house chips, will drive faster industry progress, lower overall costs, and better sustainability. NVIDIA's Vera CPU (the host CPU of the Vera Rubin platform) cannot compete with the two-year-old EPYC Turin, while AMD under Dr. Lisa Su has engineered the lowest cost per million tokens, highly competitive energy efficiency, and superior CPU orchestration for agentic AI at scale with Helios. Dr. Su has championed this shift since at least 2023, foreseeing the rise of agentic workflows that demand far more orchestration, parallel agents, and balanced compute well before the industry fully embraced it. Her long-term vision of AI moving from simple prompts to always-on, multi-agent systems has driven AMD's investments in high-core-count EPYC CPUs and integrated rack-scale solutions, perfectly positioning the company for today's realities.

The OpenAI-AMD 1GW Helios deployment (starting H2 2026) represents a pivotal vertical integration move that directly supercharges inference economics. This isn't incremental; it's a structural shift toward ownership of massive, optimized rack-scale capacity, enabling the lowest token costs and triggering the enterprise adoption flywheel.

We need to be honest: $AMD is the only company that made a big bet on inference from the day ChatGPT became a sensation, while $NVDA and others were betting big on training. At the end of the day, token bills from Anthropic have to obey economics: as the bills rise, companies have to get more out of them to justify the cost. There cannot be an unlimited inference budget; the spend has to show up in efficiency, profitability and operating leverage.

1. Tokenomics

Once you understand this, you will understand why Citi said Anthropic is likely to sign a deal with $AMD, along with hyperscalers, AI labs, and sovereign AI efforts like Softbank's 5GW in France and many other countries. However, OpenAI and $META now want faster deployment, and as AMD shareholders they have prioritized allocation. Anthropic and the hyperscalers just cannot compete when Helios racks lower token cost to $0.0003–$0.0005 per million tokens at GW scale.

Cost to build a 1GW data center:
~1GW Helios Rack full build is estimated at $30-$35B
~1GW Rubin Rack full build is estimated at $45-$55B

Inference (Cost per Million Tokens)
~$NVDA B200 / HGX: ~$0.02–$0.08 on optimized workloads (FP4/MXFP4, speculative decoding). A significant improvement over Hopper, but still premium-priced. GB200 NVL72 rack-scale: $0.05–$0.25+
~$AMD Helios Racks: $0.0003-$0.0005 per M tokens, dramatically lower than NVIDIA equivalents in owned infra. MI355X node-level: up to 40% more tokens per dollar vs. competing solutions (B200), driven by higher memory capacity (up to 288GB+ HBM), strong bandwidth, and lower acquisition costs.

Training
~$NVDA Rubin Rack is estimated at $0.7-$1.2/M tokens
~$AMD Helios Rack is estimated at $0.65-$1.0/M tokens

Now, OpenAI, $META and the hyperscalers can lower inference cost even further with the $AMD EPYC Venice "dense rack", or Agentic AI Rack. AMD published a detailed technical blog emphasizing that the future of agentic AI (autonomous, multi-step AI systems requiring heavy orchestration, databases, caching, APIs, and control planes) demands massive CPU-dense rack-scale infrastructure, not just GPUs. The blog prominently positions AMD's upcoming 6th Gen EPYC "Venice" processors as the key enabler for next-generation dense racks, delivering leadership throughput under real-world power, cooling, and density constraints.
~EPYC Venice (Zen 6 architecture, up to 256 cores / 512 threads per socket) is projected to deliver exceptional rack-level performance. In AMD's modeled 100 kW rack comparisons, Venice-powered systems are expected to achieve ~3.30x the throughput of NVIDIA's Vera (88-core Olympus) baseline across a broad mix of agentic-supporting workloads.
~This builds on the current-generation 5th Gen EPYC "Turin" (up to 192 cores), which already delivers ~2.37x rack throughput vs. Vera and ~1.6x vs. Intel's Xeon 6980P (128 cores).
~Liquid-cooled Turin deployments already support >27,000 CPU cores per rack today. Venice is architected to push this beyond 36,000 cores in the same rack class, dramatically increasing concurrent agent capacity and overall infrastructure efficiency.

2. Ownership vs. renting compute from hyperscalers matters to OpenAI; only owning $AMD chips can meaningfully lower token cost for enterprises.
~Eliminates cloud overhead: no provider margins, utilization buffers, or egress fees. Direct control over power contracts, cooling, scheduling, and orchestration at dedicated facilities.
~Helios optimizations at GW scale: rack-level density (1.4+ exaFLOPS FP8 per rack), high HBM4 bandwidth, EPYC orchestration for agentic workloads, and superior TCO/TDP. AMD's long-standing focus on tokens per dollar/watt shines here, with 20-40%+ efficiency edges in inference-heavy scenarios.
~At 1GW+ optimized deployment, inference hits $0.0003–$0.0005 per million tokens (community/analyst models tied to Helios metrics). This is dramatically lower than typical rented/cloud equivalents, especially for high-volume output tokens in agentic flows.

High token bills today: enterprises running heavy agentic/coding/analysis workloads can face $50-100M+/month at current API rates (flagship models at $5-30+/M output tokens, scaled to massive volumes). After Helios cost compression, the same volume will drop to $10-15M/month (or better) as lower underlying costs are passed through as pricing flexibility, volume tiers, caching, or batch discounts. ROI thresholds collapse. More companies greenlight pilots → production → massive scaling. Agentic AI (autonomous workflows) multiplies token demand exponentially, but affordability removes the friction. OpenAI gains flexibility: unlike more cloud-dependent rivals (Anthropic), it can lower effective pricing, offer aggressive enterprise bundles, or absorb volume without margin destruction, directly tackling "high token bill" complaints while maintaining profitability as usage explodes.

3. Agentic AI Models Shift the CPU:GPU Ratio to 1:1 and Toward 3-5:1 with Explosively Token-Hungry Workloads

Agentic AI (autonomous, multi-step agents with planning, tool use, iteration, and self-correction) is fundamentally more compute- and token-intensive than conversational or single-turn generative AI. Autonomous, multi-step workflows with orchestration, tool use, parallel agents, data movement, and enterprise integration have dramatically increased the importance of strong host CPUs alongside GPUs. This shifts the CPU-to-GPU ratio higher, toward 1:1 to 5:1, and makes balanced systems critical as enterprises test more than 5-10 agents.

AMD EPYC Venice excels:
~Leadership core density (up to 256 Zen 6 cores per socket) for running many agents in parallel, orchestration layers, and high-throughput control-plane tasks.
~Superior performance-per-core and power efficiency (up to 2.1x higher perf/core and 2.26x better SPECpower vs. NVIDIA Grace in benchmarks).
~Tight integration in Helios: one Venice CPU + multiple MI450 GPUs per node, enabling efficient data feeding to GPUs ("zero-copy"), parallel execution, and full rack utilization for complex agentic loops.

Hyperscalers (Meta, Microsoft, Amazon, Google, Softbank) and AI natives (OpenAI, Anthropic...) are adopting high-core-count EPYC at scale specifically for these agentic demands, as CPUs now handle a larger share of non-model work (orchestration, policy enforcement, tool calls). This complements AMD's lower-cost GPUs for overall TCO wins.
~Agents often generate 10–100x+ more tokens per task due to iterative reasoning chains, multiple tool calls, verification loops, and long-context orchestration.
~Goldman Sachs forecasts token consumption multiplying 24x by 2030 (to 120 quadrillion tokens/month), largely driven by agentic adoption in consumer and enterprise.
~Enterprise data shows agent-pattern workloads growing at 680% annualized rates, projected to surpass conversational AI in token volume by Q3 2026.
~Daily enterprise agent token consumption is already in the billions, with complex workflows (coding, workflows, analysis) amplifying this dramatically.

4. Competitive Edge: Winning Customers from Anthropic

Anthropic's Claude models (especially Opus/Sonnet) excel in complex reasoning and agentic coding, commanding premium positioning. However, their higher underlying costs (heavier reliance on third-party cloud, with its margins) limit pricing flexibility compared to OpenAI's owned Helios capacity. Anthropic is on track to generate $10.9 billion in Q2 revenue and expects to achieve its first-ever quarterly adjusted operating profit of $559 million. However, sustaining full-year profitability remains challenging due to immense computing and model training costs. The truth is, Anthropic has no choice but to buy as many $AMD chips as possible if it wants to compete with OpenAI or get investors' attention. A ~5% adjusted operating margin ($559M on $10.9B of revenue) is just pathetic.

Current pricing dynamics (2026):
~OpenAI already undercuts on many tiers (flagship output tokens significantly cheaper than equivalent Claude Opus).
~Nano/mini models offer 5–10x advantages for volume work.
~Anthropic holds edges in long-context flat pricing and certain reasoning quality.

After Helios rack ownership, at $0.0003–$0.0005/M effective costs, OpenAI gains massive headroom to:
~Aggressively discount high-volume agentic tiers or bundles.
~Offer "unlimited" enterprise plans or usage-based models that Anthropic struggles to match without margin erosion.
~Target cost-sensitive, high-throughput agent deployments (dev tools, automation platforms) where token bills explode.

Enterprises facing millions of dollars in monthly agentic bills will migrate to the provider delivering better economics at scale. OpenAI's combination of strong models (o-series reasoning) and the lowest TCO positions it to erode Anthropic's enterprise share, especially as agentic AI becomes the dominant token consumer.
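The Anthropic margin and the monthly token-bill figures quoted in this thread can be checked with a few lines of arithmetic. A minimal sketch, assuming the thread's inputs ($10.9B Q2 revenue, $559M adjusted operating profit, $50-100M monthly bills today, $10-15M after Helios, $5-30/M output-token flagship pricing) as given; the helper names are hypothetical, and the implied-volume calculation ignores input tokens, caching and discounts.

```python
def pct(part: float, whole: float) -> float:
    """part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole * 100.0


def implied_tokens_m(monthly_bill: float, price_per_m: float) -> float:
    """Millions of tokens per month implied by a bill at a given $/M-token price."""
    return monthly_bill / price_per_m


def reduction_pct(before: float, after: float) -> float:
    """Percentage drop from before to after."""
    return pct(before - after, before)


# Figures quoted in the thread, in dollars
Q2_REVENUE = 10.9e9
Q2_ADJ_OP_PROFIT = 559e6
BILL_NOW = (50e6, 100e6)      # monthly agentic bill today
BILL_AFTER = (10e6, 15e6)     # claimed post-Helios monthly bill
FLAGSHIP_PRICE = (5.0, 30.0)  # $/M output tokens

margin = pct(Q2_ADJ_OP_PROFIT, Q2_REVENUE)

# Widest volume range: smallest bill at the highest price,
# largest bill at the lowest price
volume_m = (implied_tokens_m(BILL_NOW[0], FLAGSHIP_PRICE[1]),
            implied_tokens_m(BILL_NOW[1], FLAGSHIP_PRICE[0]))

# Smallest and largest bill reduction, depending on which ends are paired
drop = (reduction_pct(BILL_NOW[0], BILL_AFTER[1]),
        reduction_pct(BILL_NOW[1], BILL_AFTER[0]))

if __name__ == "__main__":
    print(f"Adjusted operating margin: {margin:.1f}%")
    print(f"Implied volume: {volume_m[0] / 1e6:.2f}T to {volume_m[1] / 1e6:.0f}T tokens/month")
    print(f"Claimed bill reduction: {drop[0]:.0f}% to {drop[1]:.0f}%")
```

On these inputs the margin is about 5.1%, the bills imply roughly 1.7 to 20 trillion output tokens per month, and the claimed drop from $50-100M to $10-15M is a 70% to 90% reduction.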
Cheaper tokens expand the total addressable market dramatically. This feeds the data/model improvement loop, justifying further capex. AMD benefits from proven scale pulling in more customers (Meta, Oracle, Microsoft, Amazon, Softbank, TensorWave, LumaAI ... already aligned on Helios).

Conclusion: Dr. Lisa Su has been laser-focused on inference economics since at least 2022–2023, repeatedly emphasizing that the real battleground for AI scalability would be TCO, power efficiency (TDP), and ultimately tokens per dollar and per watt, not just raw training FLOPS. While many viewed inference as a secondary, commoditized workload, Dr. Su architected AMD's roadmap around rack-scale systems optimized for high-volume, sustained inference that would dominate as models matured and usage exploded. Helios represents the culmination of that multi-year bet: a fully integrated, open platform designed precisely for the economics of massive token throughput.

This deep, strategic partnership with OpenAI, starting with the 1GW Helios deployment in H2 2026 and scaling to 6GW, is the embodiment of that shared vision. Both companies foresaw a future where agentic AI models become extraordinarily token-hungry: autonomous agents executing complex, iterative workflows with planning, tool use, verification loops, and long-context reasoning. These workloads can consume 100x+ more tokens per task than traditional chat or single-turn generation, driving exponential demand as capabilities improve and enterprises deploy them at scale.

By owning and optimizing this massive Helios capacity at GW scale, OpenAI achieves inference costs as low as $0.0003–$0.0005 per million tokens. This structural cost advantage allows OpenAI to absorb the coming token explosion profitably, dramatically lower effective pricing for enterprises, and win high-volume agentic workloads from higher-cost competitors like Anthropic. What was once a prohibitive monthly token bill becomes an affordable accelerator for productivity and innovation.

The OpenAI-AMD alliance validates Dr. Su's prescient strategy and turns the agentic flywheel into reality: collapsing inference costs → explosive token consumption → richer data and better models → greater demand. This partnership doesn't just address today's economics; it positions both leaders at the center of the infrastructure buildout that will power AI's next decade. By delivering the lowest inference economics at scale, OpenAI not only solves enterprise bill pain but gains a decisive weapon to win share from higher-cost rivals like Anthropic. And that is why OpenAI and $META will deploy the EPYC Dense Rack.

Not Financial Advice! DYOR! Research Purpose Only!
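The dense-rack figures quoted earlier in this thread (>27,000 Turin cores and 36,000+ Venice cores per rack, at 192 and 256 cores per socket, plus AMD's modeled ~2.37x and ~3.30x rack throughput vs. Vera) can be cross-checked with simple arithmetic. A minimal sketch, assuming every socket in the rack is a top-core-count part and that both throughput multiples share the same Vera baseline and workload mix; the helper `sockets_for` is hypothetical, and the per-rack core counts are lower bounds as stated in the thread.

```python
import math


def sockets_for(cores_per_rack: int, cores_per_socket: int) -> int:
    """Minimum number of full sockets needed to reach a per-rack core count."""
    return math.ceil(cores_per_rack / cores_per_socket)


# Figures quoted in the thread
TURIN_CORES_PER_SOCKET = 192
VENICE_CORES_PER_SOCKET = 256
TURIN_CORES_PER_RACK = 27_000   # ">27,000" in the thread, so a lower bound
VENICE_CORES_PER_RACK = 36_000  # "beyond 36,000" in the thread, so a lower bound
TURIN_VS_VERA = 2.37            # modeled rack throughput multiple vs. Vera
VENICE_VS_VERA = 3.30

turin_sockets = sockets_for(TURIN_CORES_PER_RACK, TURIN_CORES_PER_SOCKET)
venice_sockets = sockets_for(VENICE_CORES_PER_RACK, VENICE_CORES_PER_SOCKET)
core_gain = VENICE_CORES_PER_RACK / TURIN_CORES_PER_RACK

# Implied Venice-vs-Turin gain, valid only if both multiples use the same baseline
throughput_gain = VENICE_VS_VERA / TURIN_VS_VERA

if __name__ == "__main__":
    print(f"Turin sockets per rack: {turin_sockets}")
    print(f"Venice sockets per rack: {venice_sockets}")
    print(f"Core-count gain per rack: {core_gain:.2f}x")
    print(f"Implied throughput gain vs. Turin: {throughput_gain:.2f}x")
```

Both core counts work out to the same ~141 sockets per rack, so the 27,000 to 36,000 jump (about 1.33x) comes entirely from the 192 to 256 cores-per-socket increase, while the modeled throughput multiples imply roughly 1.39x for Venice over Turin.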

$AMD| The FOMO to buy AMD Chips is NOW 🧵 Not Financial Advice! DYOR! Research Purpose Only! The Inference Queen is the biggest winner in Agentic AI where all other CPUs are struggling to compete with a 2yr old EPYC Turin and EPYC Venice is in mass production phase. AMD stresses deployability today on standard x86 platforms (no proprietary architectures required), full software compatibility, and open standards. This positions Venice + Helios as a practical, high-density alternative to competing solutions while underscoring that agentic AI shifts the balance toward CPU-rich racks alongside GPUs, and most importantly, lowering the cost of token to accelerate adoption and innovation. Context: The Wall Street Journal yesterday came out with an article that OpenAI is condiering drasstically lowering the token prices to win more customers from Anthropic. The narrative "they" are trying to exacerbate the current AI selloff won't last long. This is a fundamental misunderstanding of what is going on, or what I already discussed for months and years. Followers and Subscribers already knew this for years, that this day would come, where token cost will bcome the central discussion among enterprises as there is no such thing as unlimited budget or Tokenmaxxing when they use $NVDA chips or In-house Hyperscalers chips. I will link various threads if you are interested in understanding the full picture from supply chain to recent TSMC Rapid 2nm expansion up to 12 Fabs total by 2027/2028. Hyperscalers and AI natives effectively have no choice but to buy more AMD system for Agentic AI as leadership in economical, power-aware, high-volume internal + agentic use. However, due to supply constraints where Supply is far behind Demand, this makes multi-vendor reality along with in-house chips drive faster industry progress, lower overall costs, and better sustainability. NVIDIA’s Vera Rubin cannot compete with a 2 years old EPYC Turin, but AMD under Dr. 
Lisa Su has engineered the lowest cost-per-million-tokens, highly competitive energy-efficient solutions, and superior CPU orchestration for agentic AI at scale with Helios. Dr. Su has championed this shift since at least 2023, foreseeing the rise of agentic workflows that demand far more orchestration, parallel agents, and balanced compute well before the industry fully embraced it. Her long-term vision of AI moving from simple prompts to always on, multi-agent systems has driven AMD’s investments in high-core EPYC CPUs and integrated rack-scale solutions, perfectly positioning the company for today’s realities. The OpenAI-AMD 1GW Helios deployment (starting H2 2026) represents a pivotal vertical integration move that directly supercharges the inference economics. This isn't incremental; it's a structural shift toward ownership of massive, optimized rack-scale capacity, enabling the lowest token costs and triggering the enterprise adoption flywheel. We need to be honest, $AMD is the only company that made a big bet on Inference since the day Chatgpt became sensational where $NVDA and others were betting big on Training. At the end of the day, Token bill from Anthropic has to obey economics. Meaning the bills rise, companies have to get more out of it to justify the cost. It cannot be an unlimited inference budget, and it has to show up on efficiency, profitability and operating leverage. 1. Tokenomics After you understand this, you will understand why Citi cited Anthropic is likely to sign a deal with $AMD along with Hyperscalers, AI Labs, Sovereign AI like Softbank 5GW in France and many other countries. However, OpenAI and $META are now wanting faster deployment, and they are AMD shareholders now, they have prioritized allocation. Anthropic and Hyperscalers just cannot compete when Helios Rack lower token cost to$0.0003–$0.0005 per million tokens at GW scale. 
Cost to build 1GW data center 1GW Helios Rack full build is estimated $30-$35B 1GW Rubin Rack full build is estimated $45-$55B Inference (Cost per Million Tokens) ~$NVDA B200 / HGX: ~$0.02–$0.08 on optimized workloads (FP4/MXFP4, speculative decoding). Significant improvement over Hopper but still premium-priced. GB200 NVL72 rack-scale: $0.05–$0.25+ ~$AMD Helios Racks: $0.0003-$0.0005 per M tokens, dramatically lower than NVIDIA equivalents in owned infra. MI355X node-level: Up to 40% more tokens per dollar vs. competing solutions ( B200), driven by higher memory capacity (up to 288GB+ HBM), strong bandwidth, and lower acquisition costs. Training ~$NVDA Rubin Rack is estimated $0.7-$1.2/M Tokens ~$AMD Helios Rack is estimated $0.65-$1.0/M Tokens Now, OpenAI, META and Hyperscalers can lower Inference cost even further with $AMD EPYC Venice "dense rack" or Agentic AI Rack. AMD published a detailed technical blog emphasizing that the future of agentic AI autonomous, multi-step AI systems requiring heavy orchestration, databases, caching, APIs, and control planes demands massive CPU-dense rack-scale infrastructure, not just GPUs. The catalyst prominently positions their upcoming 6th Gen EPYC "Venice" processors as the key enabler for next-generation dense racks, delivering leadership throughput under real-world power, cooling, and density constraints. ~EPYC Venice (Zen 6 architecture, up to 256 cores / 512 threads per socket) is projected to deliver exceptional rack-level performance. In AMD’s modeled 100 kW rack comparisons, Venice-powered systems are expected to achieve ~3.30x the throughput of NVIDIA’s Vera (88-core Olympus) baseline across a broad mix of agentic-supporting workloads. ~This builds on current-generation 5th Gen EPYC "Turin" (up to 192 cores), which already delivers ~2.37x rack throughput vs. Vera and ~1.6x vs. Intel’s Xeon 6980P (128 cores). ~ Liquid-cooled Turin deployments already support >27,000 CPU cores per rack today. 
Venice is architected to push this beyond 36,000 cores in the same rack class, dramatically increasing concurrent agent capacity and overall infrastructure efficiency.

2. Ownership vs. renting compute from hyperscalers matters to OpenAI, and only owning $AMD chips can meaningfully lower token cost for enterprises.
~Eliminates cloud overhead: no provider margins, utilization buffers, or egress fees. Direct control over power contracts, cooling, scheduling, and orchestration at dedicated facilities.
~Helios optimizations at GW scale: rack-level density (1.4+ exaFLOPS FP8 per rack), high HBM4 bandwidth, EPYC orchestration for agentic workloads, and superior TCO/TDP. AMD's long-standing focus on tokens per dollar/watt shines here, with 20-40%+ efficiency edges in inference-heavy scenarios.
~At 1GW+ optimized deployment, inference hits $0.0003–$0.0005 per million tokens (community/analyst models tied to Helios metrics). This is dramatically lower than typical rented/cloud equivalents, especially for high-volume output tokens in agentic flows.

High token bills today: enterprises running heavy agentic/coding/analysis workloads can face $50-100M+/month at current API rates (flagship models at $5-30+/M output tokens, scaled to massive volumes). After Helios-driven cost compression, the same volume will drop to $10-15M/month (or better), with lower underlying costs passed through as pricing flexibility, volume tiers, caching, or batch discounts. ROI thresholds collapse. More companies greenlight pilots → production → massive scaling. Agentic AI (autonomous workflows) multiplies token demand exponentially, but affordability removes the friction. OpenAI gains flexibility: unlike more cloud-dependent rivals (Anthropic), it can lower effective pricing, offer aggressive enterprise bundles, or absorb volume without margin destruction, directly tackling "high token bill" complaints while maintaining profitability as usage explodes.

3. Agentic AI Models Shifted the CPU:GPU Ratio to 1:1 and toward 3-5:1 with Explosively Token-Hungry Workloads
Agentic AI (autonomous, multi-step agents with planning, tool use, iteration, and self-correction) is fundamentally more compute- and token-intensive than conversational or single-turn generative AI. Its autonomous, multi-step workflows with orchestration, tool use, parallel agents, data movement, and enterprise integration have dramatically increased the importance of strong host CPUs alongside GPUs. This pushes the CPU-to-GPU ratio higher, toward 1:1 to 5:1 as enterprises test more than 5-10 agents, and makes balanced systems critical.

AMD EPYC Venice excels here:
~Leadership core density (up to 256 Zen 6 cores per socket) for running many agents in parallel, orchestration layers, and high-throughput control-plane tasks.
~Superior performance per core and power efficiency (up to 2.1x higher perf/core and 2.26x better SPECpower vs. NVIDIA Grace in benchmarks).
~Tight integration in Helios: one Venice CPU + multiple MI450 GPUs per node, enabling efficient data feeding to GPUs ("zero-copy"), parallel execution, and full rack utilization for complex agentic loops.

Hyperscalers (Meta, Microsoft, Amazon, Google, Softbank) and AI natives (OpenAI, Anthropic...) are adopting high-core EPYC at scale specifically for these agentic demands, as CPUs now handle a larger share of non-model work (orchestration, policy enforcement, tool calls). This complements AMD’s lower-cost GPUs for overall TCO wins.
~Agents often generate 10–100x+ more tokens per task due to iterative reasoning chains, multiple tool calls, verification loops, and long-context orchestration.
~Goldman Sachs forecasts token consumption multiplying 24x by 2030 (to 120 quadrillion tokens/month), largely driven by agentic adoption in consumer and enterprise.
~Enterprise data shows agent-pattern workloads growing at 680% annualized rates, projected to surpass conversational AI in token volume by Q3 2026.
~Daily enterprise agent token consumption is already in the billions, with complex tasks (coding, workflows, analysis) amplifying this dramatically.

4. Competitive Edge: Winning Customers from Anthropic
Anthropic’s Claude models (especially Opus/Sonnet) excel in complex reasoning and agentic coding, commanding premium positioning. However, their higher underlying costs (heavier reliance on third-party cloud with margins) limit pricing flexibility compared to OpenAI’s owned Helios capacity. Anthropic is on track to generate $10.9 billion in Q2 revenue and expects to achieve its first-ever quarterly adjusted operating profit of $559 million. However, sustaining full-year profitability remains challenging due to immense computing and model training costs. The truth is, Anthropic has no choice but to buy as many $AMD chips as possible if it wants to compete with OpenAI or get investors' attention. This ~5% adjusted operating profit to revenue ratio ($559M / $10.9B) is just pathetic.

Current pricing dynamics (2026): OpenAI already undercuts on many tiers (flagship output tokens significantly cheaper than equivalent Claude Opus). Nano/mini models offer 5–10x advantages for volume work. Anthropic holds edges in long-context flat pricing and certain reasoning quality.

OpenAI after Helios rack ownership: at $0.0003–$0.0005/M effective cost, OpenAI gains massive headroom to:
~Aggressively discount high-volume agentic tiers or bundles.
~Offer “unlimited” enterprise plans or usage-based models that Anthropic struggles to match without margin erosion.
~Target cost-sensitive, high-throughput agent deployments (dev tools, automation platforms) where token bills explode.

Enterprises facing millions of dollars in monthly agentic bills will migrate to the provider delivering better economics at scale. OpenAI’s combination of strong models (o-series reasoning) + lowest TCO positions it to erode Anthropic’s enterprise share, especially as agentic becomes the dominant token consumer.
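The monthly-bill figures in sections 2 and 4 come down to one relation: bill = (tokens / 1M) x price per million tokens. A minimal sketch, assuming an illustrative $100M/month bill at a $5/M blended price (both endpoints of the ranges quoted above); the helper names are hypothetical:

```python
# Hypothetical helpers for token-bill arithmetic; prices are USD per 1M tokens.
def implied_tokens(monthly_bill_usd: float, price_per_m_usd: float) -> float:
    """Monthly token volume implied by a bill at a given price per million tokens."""
    return monthly_bill_usd / price_per_m_usd * 1_000_000


def monthly_bill(tokens: float, price_per_m_usd: float) -> float:
    """Monthly bill for a token volume at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_m_usd


# Assumed scenario: a $100M/month bill at a $5/M blended output price.
tokens = implied_tokens(100e6, 5.0)  # 2e13 tokens (20 trillion) per month

# Blended price implied if the same volume costs the thread's $10M-$15M/month.
price_after = (10e6 / (tokens / 1e6), 15e6 / (tokens / 1e6))

# Underlying compute cost at the thread's claimed $0.0003-$0.0005 per M tokens.
compute_cost = (monthly_bill(tokens, 0.0003), monthly_bill(tokens, 0.0005))

print(f"volume: {tokens:.1e} tokens/month")
print(f"implied blended price: ${price_after[0]:.2f}-${price_after[1]:.2f} per M")
print(f"compute cost: ${compute_cost[0]:,.0f}-${compute_cost[1]:,.0f} per month")
```

How much of the gap between the last two lines is real margin depends on what the per-token cost figure includes (capex amortization, power, staffing), which the thread does not break down.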
Cheaper tokens expand the total addressable market dramatically. This feeds the data/model improvement loop, justifying further capex. AMD benefits from proven scale pulling in more customers (Meta, Oracle, Microsoft, Amazon, Softbank, TensorWave, LumaAI ... already aligned on Helios).

Conclusion: Dr. Lisa Su has been laser-focused on inference economics since at least 2022–2023, repeatedly emphasizing that the real battleground for AI scalability would be TCO, power efficiency (TDP), and ultimately tokens per dollar and per watt, not just raw training FLOPS. While many viewed inference as a secondary, commoditized workload, Dr. Su architected AMD’s roadmap around rack-scale systems optimized for high-volume, sustained inference that would dominate as models matured and usage exploded. Helios represents the culmination of that multi-year bet: a fully integrated, open platform designed precisely for the economics of massive token throughput. This deep, strategic partnership with OpenAI, starting with the 1GW Helios deployment in H2 2026 and scaling to 6GW, is the embodiment of that shared vision. Both companies foresaw a future where agentic AI models become extraordinarily token-hungry: autonomous agents executing complex, iterative workflows with planning, tool use, verification loops, and long-context reasoning. These workloads can consume 100x+ more tokens per task than traditional chat or single-turn generation, driving exponential demand as capabilities improve and enterprises deploy them at scale. By owning and optimizing this massive Helios capacity at GW scale, OpenAI achieves inference costs as low as $0.0003–$0.0005 per million tokens. This structural cost advantage allows OpenAI to absorb the coming token explosion profitably, dramatically lower effective pricing for enterprises, and win high-volume agentic workloads from higher-cost competitors like Anthropic.
What was once a prohibitive monthly token bill becomes an affordable accelerator for productivity and innovation. The OpenAI-AMD alliance validates Dr. Su’s prescient strategy and turns the agentic flywheel into reality: collapsing inference costs → explosive token consumption → richer data and better models → accelerating demand. This partnership doesn’t just address today’s economics; it positions both leaders at the center of the infrastructure buildout that will power AI’s next decade. By delivering the lowest inference economics at scale, OpenAI not only solves enterprise bill pain but gains a decisive weapon to win share from higher-cost rivals like Anthropic. And that is why OpenAI and $META will deploy EPYC dense racks. Not Financial Advice! DYOR! Research Purpose Only!

Mike

84,951 views • 1 month ago

$AMD $620/share is too conservative for 2026 🧵
Some quick facts before I dive into this super long thread:
~$META allocated 42% of its GPUs to $AMD and 58% to $NVDA
~OpenAI allocated 6GW (38%) to $AMD and 10GW to $NVDA
My $620 PT (below) for the end of 2026 assumed only 10-15% market share. I believe $AMD is going to have much, much higher market share than I projected.

The AI accelerator market is exploding, projected to reach $500 billion by 2028 (and now heading toward $1 trillion), driven by insatiable demand for training and inference compute in large language models (LLMs), recommendation systems, and autonomous systems. Nvidia ($NVDA) has long held a stranglehold, commanding over 90% market share through its CUDA ecosystem and superior rack-scale solutions. However, AMD is mounting a formidable challenge, leveraging cost advantages, open-source software momentum, and hyperscaler partnerships to erode Nvidia's moat. Recent deals, such as Meta's ($META) allocation of 42% of its GPU capacity to AMD and OpenAI's commitment to 6GW of AMD compute (versus 10GW for Nvidia), signal a tipping point. At the forefront is AMD's Instinct MI450 series, a next-generation AI GPU slated for an H2 2026 launch, which promises "no-excuses" leadership in training, inference, and distributed workloads. This analysis dissects how AMD will capture more market share and why hyperscalers like $META, xAI, Oracle, and others are poised to become voracious buyers of the MI450.

AMD's AI GPU revenue has surged from negligible levels in 2022 to an estimated $4-5 billion in 2025, capturing ~6% of the data center GPU market. This growth stems from the Instinct MI300X, which offers 192GB of HBM3 memory and competitive FP8/FP16 performance at 20-30% lower cost than Nvidia's H100. Hyperscalers, facing NVIDIA's overcharging, have turned to AMD for diversification.
Meta, for instance, plans 600,000 H100-equivalent GPUs by end-2024, with ~42% (or 250,000+ units) sourced from AMD's MI300 series for inference tasks like image editing and AI assistants. Similarly, OpenAI's recent multi-year deal commits to 6GW of AMD compute (equivalent to ~300,000-400,000 MI450 GPUs), starting with 1GW in 2026, explicitly to counterbalance its 10GW Nvidia allocation. These aren't one-offs. Microsoft Azure, Amazon AWS, and Oracle Cloud Infrastructure (OCI) have integrated MI300X for AI workloads, with Oracle deploying 30,000 MI355X units in zettascale clusters. xAI, Elon Musk's AI venture, ran 30% of Grok-1's production traffic on MI300X GPUs and has confirmed ongoing purchases. Collectively, these partners represent over $400 billion in projected AI infrastructure spend through 2028, with AMD targeting up to 40% market share. For subscribers, I wrote a specific thread on how AMD's "secret weapon" is going to change the game in 2026 with improved designs across all its products (yes, AMD has a patent on it).

Software is the linchpin. AMD's ROCm platform, once derided as "half-baked," now supports day-zero integration for Llama-4, DeepSeek V3, and GPT-OSS models, closing the CUDA gap. Benchmarks show MI355X (the MI450 precursor) outperforming Nvidia's B200 in inference by 1.5-2x on memory-bound tasks, at 25-35% lower TCO. For training, MI450's rack-scale IF128 configuration (128 GPUs, 1.4 PB/s intra-rack bandwidth) rivals Nvidia's VR200 NVL144, enabling clusters like xAI's Colossus (scaling to 1M GPUs).

My thread below projected:
Estimated conservative FY25 revenue: $34B-$36B
Estimated conservative FY26 revenue: $55B-$62B
Below is why $AMD revenue is going to be much higher after the OpenAI deal.

1. OpenAI 1GW in 2026. With high demand for MI355X at $30k+ per unit, MI450 is likely to sell in the $45k-$55k range. We can safely calculate that 1GW would require roughly 400,000 MI450 GPUs, or ~$20B of revenue in 2026 alone from OpenAI.
That would mean $AMD would hit $56B from just one partnership (OpenAI) in 2026.

2. $META, the biggest spender on AI infrastructure right now: Daddy Zuckerberg bought 250,000+ MI300s and is buying MI355X for recommendation engines and Llama training. It is very unlikely Daddy Zuck slows down on AMD chips, given their inference superiority over NVDA chips. Most likely we will see at least 300,000-400,000 MI355X ordered from now through the end of H1 2026, and another 300,000-500,000 MI450 in H2 2026, or ~$20B from Meta in H2 alone, excluding H1.

3. xAI: Musk confirmed "AMD GPUs work very well" for Grok's small/medium models, with 30% of Grok-1 on MI300X. xAI's Colossus (200K+ GPUs, targeting 1M) and Oracle partnership (via OCI's MI355X cluster) position it for MI450 trials in H1 2026. With $6B funding and Grok integration into Oracle services, xAI could allocate 10-20% ($10B-$15B) to MI450 for distributed inference. We haven't heard the details from Daddy Elon Musk yet, but he is most likely not going to spend less than OpenAI or Sam Altman.

4. Oracle ($ORCL): A multi-billion-dollar MI355X deal powers OCI's AI superclusters, with $500B+ in remaining performance obligations. Larry Ellison's zettascale ambitions and xAI/OpenAI integrations make Oracle an MI450 anchor tenant, with a projected 50-100k units ($15B+ spend) for enterprise AI platforms. $ORCL is likely to spend more on the new "secret weapon" given its AI inference capability and cost advantage for the $500B backlog.

5. Others (Microsoft, Amazon, Saudi + other countries): Microsoft (Azure MI300X for training) and Amazon ($148B 15-year spend) are testing MI450 via Stargate ($500B with Oracle/SoftBank). Emerging buyers like G42 (5GW UAE campus), Crusoe, and Hot Aisle add 5-10GW of demand. These could potentially add $15B-$30B in 2026 alone. We also need to factor in the $TSM supply constraint ($NVDA is TSMC's favorite), so $AMD market cap/growth is being tamed by TSMC.
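Items 1 and 2 are units-times-price arithmetic. A minimal sketch that reproduces them, assuming the thread's estimated $45k-$55k MI450 ASP (an estimate, not a published price) and reading the $56B figure as the $34B-$36B FY25 base plus ~$20B from OpenAI (that reading is an assumption):

```python
def revenue_range(units_low: int, units_high: int, asp_low: int, asp_high: int):
    """Return (low, high) revenue in USD for unit and ASP ranges."""
    return units_low * asp_low, units_high * asp_high


# Item 1: OpenAI 1GW ~ 400,000 MI450 at the assumed $45k-$55k ASP.
openai = revenue_range(400_000, 400_000, 45_000, 55_000)

# Item 2: Meta 300,000-500,000 MI450 at the same assumed ASP.
meta = revenue_range(300_000, 500_000, 45_000, 55_000)

# Assumed reading of "$56B": FY25 base of $34B-$36B plus ~$20B from OpenAI.
base_fy25 = (34e9, 36e9)
with_openai = (base_fy25[0] + 20e9, base_fy25[1] + 20e9)

for name, (lo, hi) in [("OpenAI", openai), ("Meta", meta), ("Base + OpenAI", with_openai)]:
    print(f"{name}: ${lo / 1e9:.1f}B-${hi / 1e9:.1f}B")
```

The Meta midpoint (400,000 units at $50k) lands on the ~$20B quoted in item 2; the full range is wide because both units and ASP vary.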
So what are you saying, Mike? Well, $AMD 2026 revenue could hit $90-$100B by the end of 2026, or nearly 185% growth YoY. So what does that mean for valuation? I have no idea how Mr. Market is going to value AMD in 2026 with triple-digit growth. My conservative $620 was my best projection until today's OpenAI partnership. As one of the biggest AMD bulls, I will leave it to "smart money" and other investors to do the price discovery while I'm chilling and writing DDs daily. Lastly, AMD's MI450 isn't hype; it's a calibrated strike at Nvidia's vulnerabilities, amplified by hyperscaler bets like Meta's 42% allocation and OpenAI's 6GW lifeline. By prioritizing inference efficiency, rack-scale innovation, and open ecosystems, AMD will siphon 10-15% share in 2026, scaling to 20%+ as TCO trumps CUDA loyalty. Meta, xAI, Oracle et al. aren't passive; they're active co-designers, betting billions on MI450 to fuel AGI pursuits without Nvidia's premium. For investors, this is AMD's inflection, per Dr. Lisa Su. Not Financial Advice!

Mike

711,006 views • 9 months ago

$AMD is easily a $1,200 stock IMO | CPU TAM 🧵 Not Financial Advice! DYOR!
In this thread, I want to discuss the actual data center CPU TAM for 2026 alone, where many are giving different ranges that I don't agree with. I will use math to explain in detail why I disagree with these research firms and financial analysts. This thread should not be treated as financial advice; I'm just explaining my research and thought process so we can have a discussion.

In 2024/2025, I gave out a $620 PT for FY2026 and said even that was too conservative for AMD's potential. At the time, it was early, and many were laughing that the PT was unrealistic and that the AI world runs on GPUs only. Today, most of these folks are laughing with me. That is OK; I don't offer financial advice, and I do not need everyone to agree with me. I respect other opinions. If you enjoy this kind of thread, slap the like/repost/bookmark. If you want to support my work further and gain more in-depth analysis, consider subscribing!

In early 2026, hyperscalers, enterprises, and OEMs are scrambling as Intel and AMD server CPUs are largely sold out for the year, with prices jumping 10–20% and lead times stretching from weeks to months (or longer for certain SKUs). What was once a GPU-dominated story has flipped: the shift to explosive agentic AI, with its multi-step reasoning loops, tool calling, multi-agent orchestration, real-time data movement, and reinforcement learning, is dramatically shifting CPU:GPU ratios from the old training-era 1:4–8 all the way to 1:1 to 5:1, or even CPU-heavy configurations. CEOs across NVIDIA, AMD, Intel, Google, Meta, Microsoft, and other public companies have been sounding the alarm on CNBC, Bloomberg, and earnings calls. CPUs are “cool again,” and in many agentic deployments they are becoming the new bottleneck alongside (or even ahead of) GPUs and custom ASICs.
In 2025, roughly 12-15m AI GPUs and AI ASICs shipped, and shipments are expected to reach 15-20m units in 2026, suggesting training demand is not going away. The actual TAM is structural, multiplicative demand that has already forced AMD to double its long-term server CPU TAM forecast to >$120 billion by 2030 (>35% CAGR), with Dr. Lisa Su noting Q2 2026 server CPU sales are expected to surge 70%+ year-over-year and demand is “far exceeding expectations.” At the same time, AMD’s secured 30–40% share of TSMC’s initial 2nm capacity (behind only Apple’s >50%) positions it to ramp Zen 6-based EPYC Venice exactly when this agentic wave hits hardest, but even that aggressive five-fab 2nm expansion (with plans scaling toward 11 total advanced facilities) cannot instantly close the gap in the near term. Supply constraints on wafers, advanced packaging, and power are compounding the squeeze, just as hyperscalers forward-buy and lock in long-term deals.

1. The actual potential TAM
Various sources and institutions put the CPU TAM anywhere from $50B to $160B-$200B toward 2030, and I disagree: supply is severely behind demand by at least 2-3 years, or even longer by some estimates. The actual TAM will probably be 15-20m CPUs for FY2026. The typical average selling price from low to high end is $5,000 to $15,000, but due to rising memory costs and other inflationary pressures on semis, it is more logical to think $7,000-$17,000.
A. CPU:GPU ratio at 1:1: a basic calculation at mid-range = $12,000 x 15-20m CPUs = $180-$240B TAM
B. CPU:GPU ratio at 5:1: $12,000 x 75m-100m CPUs = $900B-$1.2T TAM
Of course, TSMC cannot supply even 20% of this massive inflection TAM in 2026. But should we size TAM by demand or by supply? Hence we are seeing a massive 2nm ramp from TSMC for $AMD. IMO, conservatively, I would take 15-20% off the 1:1 case, or $135-$192B TAM for 2026 alone. I'm not even talking about 2030.
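Both scenarios reduce to CPU units = GPU units x CPUs per GPU, and TAM = CPU units x ASP. A minimal sketch using the thread's 15m-20m AI GPU/ASIC shipments and $12,000 mid-range ASP; the helper names are illustrative. Note that a flat 20% trim on the 1:1 case gives $144B-$192B, while the $135B low end quoted above corresponds to a 25% trim on $180B:

```python
ASP_USD = 12_000  # mid-range of the thread's $7k-$17k ASP band


def cpu_units(gpu_units: float, cpus_per_gpu: float) -> float:
    """CPUs needed for a GPU fleet at a given CPU:GPU ratio."""
    return gpu_units * cpus_per_gpu


def tam_usd(cpus: float, asp_usd: float = ASP_USD) -> float:
    """Demand-side TAM = CPU units x average selling price."""
    return cpus * asp_usd


def haircut(value: float, pct: float) -> float:
    """Apply a flat percentage reduction to a TAM figure."""
    return value * (1 - pct)


# Scenario A (1:1) and B (5:1) on 15m-20m AI GPUs/ASICs.
a_low, a_high = tam_usd(cpu_units(15e6, 1)), tam_usd(cpu_units(20e6, 1))
b_low, b_high = tam_usd(cpu_units(15e6, 5)), tam_usd(cpu_units(20e6, 5))

print(f"1:1 -> ${a_low / 1e9:.0f}B-${a_high / 1e9:.0f}B")
print(f"5:1 -> ${b_low / 1e9:.0f}B-${b_high / 1e9:.0f}B")
print(f"1:1, 20% off both ends: ${haircut(a_low, 0.20) / 1e9:.0f}B-${haircut(a_high, 0.20) / 1e9:.0f}B")
print(f"1:1, 25% off the low end: ${haircut(a_low, 0.25) / 1e9:.0f}B")
```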
We are just months into this, so it is impossible to estimate a CAGR at the moment, and this assumes only 1-5 agents running tasks. I wrote a thread on 24/7 autonomous agents, where companies could use 50-250 agents to run tasks for them around the clock. That would require a different structural CPU:GPU ratio to bring down the cost of tokens as well as handle the orchestration bottleneck. GPUs would be useless, sitting idle waiting for CPUs, due to the highly CPU-intensive nature of these workloads. The cost per million tokens must come down more rapidly for these 50-250 autonomous agents to work; otherwise the token cost would be enormous. The Helios rack is estimated to bring inference cost down to $0.0003-$0.0005/M tokens with 18 EPYC Venice CPUs along with 72 MI455X GPUs and other chips + components. A heavier, CPU-dense rack would bring inference cost down further. EPYC Verano (2027, 7th Gen, AI-optimized) is expected to drive inference costs meaningfully lower than the Venice baseline, likely to the $0.00002–$0.00025 per million tokens range (or even sub-$0.00015 in highly optimized agentic/batch workloads). Verano has higher core counts than Venice, LPDDR5X SOCAMM2 memory support, more AI optimization, and next-gen rack density & efficiency.

2. $AMD secured at least 30-40% of TSMC 2nm capacity and memory from Samsung through 2028-2030.
Two 2nm fabs are entering the ramping phase toward 60-65k wafers per month, and 5 dedicated 2nm fabs are entering mass production/ramp in 2026. I will link sub-threads below if you are interested in the full detail. Apple is reported to have secured 50%+ of 2nm capacity for iPhone 18 and Mac chips, and AMD secured at least 30-40%, while $NVDA $AVGO $ARM $AMZN $GOOGL and others are on 3nm. This broader, aggressive ramp from TSMC, targeting up to 11 fabs, is meant to address $AMD's massive growth ahead, while $ARM faces massive CPU supply constraints as it has to compete with other mega-cap players for 3nm allocation.
And $INTC is also facing supply constraints for data center CPUs and PCs, per management, with lead times extended to longer than 12 weeks. Dr. Su is aiming for 50%+ market share, and I believe it is achievable in 2026 or 2027 as AMD has the strongest CPU offerings. Dr. Su did not want to take advantage of the shortage; she said during the Q1 earnings call that AMD is prioritizing units shipped while guiding margin to inch toward 60%. If Jensen were in charge, I'm sure margin would be 70-75% in this kind of severe CPU shortage. But that is not how Dr. Su has operated for more than a decade. She wants the most market share. So we will see it in revenue growth, and as TSMC ramps faster and faster, AMD's operating and FCF margins will massively improve vs. the prior decade, a significantly higher margin profile than before.

3. How I came up with $1,200 within 12-18 months
At $1,200/share, that would be around a $2 trillion MC. I expect FY2027 revenue to be $124-$149B, with data center revenue dominating overall revenue.
AI GPUs: I will stick to the lowest end to show you that I'm conservative, at $18B for each GW vs. $30B+ for $NVDA Rubin (Helios racks most likely land in the $20B+ range due to rising memory prices). We know the deals with OpenAI and Meta are around 12GW, and additional multi-GW customers were hinted at and will be revealed as we get to the July 22-23, 2026 Advancing AI event. For now I will conservatively add a bit more to this model (3-6GW Helios rack range).
EPYC: Venice is reported to be priced at $15,000-$20,000. However, large customers will likely enjoy discounted pricing around $10k-$12k. I expect AMD to be able to ramp 7m EPYC Venice for the entire 2026 and 3-4m EPYC Verano (higher price than Venice). I take an average selling price of $10,000 to be on the conservative side, then take off another 30% of units to be even more conservative on the projection. I like to be conservative.
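The unit haircut can be checked in a few lines. This sketch takes the thread's own inputs (7m Venice, 3m-4m Verano, a 30% trim, and a $10,000 ASP); the constants are the thread's assumptions, not AMD guidance, and the helper name is illustrative:

```python
# Inputs taken from the thread (its own assumptions, not AMD guidance).
VENICE_UNITS = 7_000_000
VERANO_UNITS = (3_000_000, 4_000_000)  # low and high case
HAIRCUT = 0.30                         # extra 30% trim for conservatism
ASP_USD = 10_000                       # conservative blended ASP


def trimmed_units(venice: int, verano: int, haircut: float) -> int:
    """Total EPYC units after applying a flat percentage haircut."""
    return round((venice + verano) * (1 - haircut))


units = [trimmed_units(VENICE_UNITS, v, HAIRCUT) for v in VERANO_UNITS]
revenue = [u * ASP_USD for u in units]
monthly = [u / 12 for u in units]

for u, r, m in zip(units, revenue, monthly):
    print(f"{u / 1e6:.1f}m CPUs -> ${r / 1e9:.0f}B, ~{m:,.0f} units/month")
```

The low case (3m Verano) lands on the ~7m units and $70B used in the model; the high case would add about 0.7m units and $7B.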
After the 30% trim, that comes to ~7m EPYC CPUs (Venice + Verano) for FY2027, or ~583,000 units per month, or 15,000 additional 2nm wafers per month, which is completely reasonable for the current TSMC ramp, and I may be too conservative here. EPYC Verano and the MI500 series will also be on 2nm.
AI GPUs: 3GW x $18B = $54B
EPYC CPUs: $10k x 7m CPUs = $70B
Data center revenue alone = $124B
Other segments: probably in the $20-$25B range for FY2027
FY2027 revenue = $124-$149B
At 7m EPYC CPUs for the entire 2027, that would be more than 50% market share when compared against availability on the supply side, not total demand. It is possible that TSMC could significantly ramp even more capacity in 2027, so we will see.

Metric: Q1 2026 / FY2027
Gross Margin: 55-56% / 60-62%
Operating Margin: 25-26% / 32-35%
Net Income Margin: ~22% / 26-30%
FCF Margin: 25% / 28-30%

At $124-$149B revenue in FY2027:
Net income would be $32-$44B
EPS would be $20-$27 (GAAP)
Non-GAAP EPS would be $25-$31
At $1,200 a share, or a $2T valuation, that would be:
13.4-16x price to sales (P/S)
38-48x P/E
At this kind of growth in an AI supercycle, I think that is a very reasonable valuation. If we use today's $406/share, or $661B MC:
2027 P/S = 4.4x-5.3x
2027 P/E = 13x-16x
Is AMD today expensive or cheap to you? The above is already very conservative: I trimmed 20-30% of doable units, meaning there could be upside if TSMC is able to ramp meaningfully as planned.

Conclusion: A $1,200 per share valuation for AMD in FY2027 is, IMO, not expensive at all; it is, in fact, conservative when viewed against the structural explosion in agentic AI demand we have mapped out. With server CPU TAM potentially scaling into the $100–$200B+ range at just a 1:1 CPU:GPU ratio for 2026 alone, and AMD positioned to capture 50%+ share thanks to its 2nm TSMC allocation advantage and full-stack leadership, the company could realistically deliver $124–$149B in total revenue and $25–$31+ non-GAAP EPS. At those levels, $1,200 implies a 2027 P/E of 38x-48x.
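The valuation multiples follow directly from the revenue and non-GAAP EPS ranges. A minimal sketch using P/S = market cap / revenue and P/E = share price / EPS on the thread's FY2027 figures (projections, not reported results):

```python
REV_LOW, REV_HIGH = 124e9, 149e9  # thread's FY2027 revenue range
EPS_LOW, EPS_HIGH = 25.0, 31.0    # thread's FY2027 non-GAAP EPS range


def multiples(price: float, market_cap: float):
    """Return ((P/S low, P/S high), (P/E low, P/E high)) on FY2027 figures."""
    ps = (market_cap / REV_HIGH, market_cap / REV_LOW)
    pe = (price / EPS_HIGH, price / EPS_LOW)
    return ps, pe


scenarios = {"$1,200 target": (1_200, 2.0e12), "$406 today": (406, 661e9)}
results = {}
for label, (price, cap) in scenarios.items():
    (ps_lo, ps_hi), (pe_lo, pe_hi) = multiples(price, cap)
    results[label] = (round(ps_lo, 1), round(ps_hi, 1), round(pe_lo, 1), round(pe_hi, 1))
    print(f"{label}: P/S {ps_lo:.1f}x-{ps_hi:.1f}x, P/E {pe_lo:.1f}x-{pe_hi:.1f}x")
```

The two market caps imply slightly different share counts (about 1.67B at $1,200 and $2T, about 1.63B at $406 and $661B), so "$2T" is a rounded figure.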
Such multiples are entirely reasonable for a company that will have become the clear Inference Queen (and, in many workloads, the preferred) AI infrastructure provider, with operating margins expanding above 30% and tens of billions in high-margin rack-scale AI revenue. Dr. Lisa Su was right, presciently so, about the agentic AI inflection, all the way back to her early 2022–2023 commentary on the coming shift from pure training to inference- and orchestration-heavy workloads. While the broader market only fully woke up to this in 2026, when she doubled AMD’s long-term server CPU TAM forecast to >$120B by 2030 (with >35% CAGR), Dr. Su and her team have consistently positioned the company at the center of the CPU renaissance. The explosive demand we are seeing today (sold-out lines, rising ASPs, and hyperscalers forward-buying entire gigawatts of Helios-class systems) is exactly the outcome she forecast years ago. Not Financial Advice! DYOR!

Mike

301,322 views • 2 months ago

$AMD $5 Trillion is Inevitable LT | Agentic AI 🧵 Agentic AI is the new $5 Trillion TAM 🚨🚨🚨
This thread will do a comp with $INTC, quantify this massive agentic AI demand spike, and show how it is forcing Jensen to rush a CPU design. The global agentic AI market is estimated to be a $3-$5 trillion TAM by 2030 (McKinsey).
Quantifying agentic AI demand for AMD involves assessing the broader market growth for agentic systems, their unique computational requirements (particularly for CPUs in orchestration and reasoning tasks), and AMD's strong positioning through products like EPYC processors and partnerships. AMD EPYC Venice is the superior choice in 2026-2027 for most agentic AI workloads.
Agentic AI refers to autonomous AI agents that perform multi-step tasks involving sequential logic, tool integration, and decision-making, workloads that rely heavily on CPUs for orchestration, memory management, and context switching rather than just GPU-parallelized training or batch inference. Agentic AI is often cited as 40-100x more "hungry" than traditional AI due to its continuous, 24/7 operation and complex workflows. This stems from factors like chain-of-thought reasoning (multiple LLM calls per query), API/tool interactions, memory management, and orchestration loops, which can generate 10-100x more tokens and require real-time responsiveness. For example, a single agentic query might trigger 5-20 model inferences, making it 10-20x more compute-intensive than simple chatbots, and the always-on nature compounds this to 40-100x overall. Nvidia's CEO has highlighted this as driving "easily 100x more computation" for inference in agentic/reasoning setups.
AMD's EPYC Venice (6th Gen EPYC, codenamed "Venice") and Intel's Xeon 7 Diamond Rapids represent the pinnacle of server CPU technology in 2026, both targeting high-performance data center workloads like AI inference, agentic AI orchestration, cloud computing, and HPC.
Venice builds on AMD's Zen 6 architecture, emphasizing core density and efficiency, while Diamond Rapids leverages Intel's Panther Cove P-cores for balanced performance. Both chips adopt similar advancements like 16-channel DDR5 memory and PCIe Gen 6, but they differ in core counts, process nodes, and overall design philosophy. Intel has faced acute supply constraints across its Xeon lineup, including legacy nodes (Intel 7/3) and the ramping 18A process for next-gen parts, and its shortage is expected to continue with lead times of up to 6 months or longer.

1. AMD EPYC Venice vs Intel Xeon 7 Diamond Rapids

Architecture
AMD: Zen 6 chiplet design with 8 CCDs and dual IODs
Intel: Panther Cove P-cores; multi-die architecture with 4 compute tiles

Core/Thread Count
AMD: Up to 256 cores / 512 threads (Zen 6c variant)
Intel: Up to 192 cores / 192 threads

Process Node
AMD: TSMC N2 (2nm)
Intel: Intel 18A (1.8nm-class); in-house fab

Memory Support
AMD: 16-channel DDR5; up to 1.6 TB/s bandwidth
Intel: 16-channel DDR5; up to 1.6 TB/s bandwidth

I/O and Connectivity
AMD: PCIe Gen 6 (up to 128 lanes); twice the prior generation's CPU-to-GPU bandwidth
Intel: PCIe Gen 6 (up to 128 lanes); LGA 9324 socket

Power (TDP)
AMD: Starting at 400-500W, potentially lower thanks to TSMC 2nm efficiency gains
Intel: Starting at 400-500W, as it targets competitive efficiency

Performance Projections
AMD: Up to 70% uplift vs. 5th Gen Turin (1.7x in multi-threaded/AI tasks)
Intel: ~40% faster than Granite Rapids (Xeon 6, 128-core), but lags AMD in per-core performance and trails Venice by 40-50% core-for-core

Target Workloads
AMD: AI inference/orchestration, HPC, cloud virtualization; partnerships
Intel: Hyperscale AI, general enterprise; custom silicon

Pricing
AMD: Estimated $10k-$20k for top SKUs
Intel: Estimated $8k-$18k

Availability
AMD: Significant ramp in H2 2026 thanks to higher allocation from TSMC
Intel: H1-H2 2026, delayed but trying to catch up

Overall:
~Venice's 256 cores give it a 33% edge over Diamond Rapids' 192, making it superior for massively parallel tasks like AI training/inference or virtualization.
~The debate rages on over whether TSMC N2 or Intel 18A is "better," but AMD's mature chiplet approach reaches a higher core count per socket (8 CCDs × 32 cores vs. Intel's 4 tiles × 48 cores). Venice's redesign reduces latency, aiding Agentic AI, where CPUs handle orchestration.
~Early projections show Venice widening AMD's lead, matching or exceeding Diamond Rapids' performance with fewer watts in multi-threaded benchmarks. Intel's no-SMT design (to prioritize AI) handicaps it against AMD's 512 threads, though Clearwater Forest (E-core) could compete in density-focused niches.
~Power & Cooling: Both reach 400-500W or more, demanding liquid cooling.
~AMD has been taking market share and is now above 40%.

AMD EPYC Venice emerges as the superior choice in 2026 for most server workloads. Its higher core/thread count (256/512 vs. 192/192), stronger per-core performance, and architecture optimized for AI-driven tasks (agentic orchestration with GPU integration) provide decisive advantages in throughput, scalability, and efficiency. Projections indicate Venice delivering 1.7x the performance of the prior generation while widening the gap over Intel (40-70% leads in multi-threaded benchmarks). AMD's fabless model with TSMC ensures reliable scaling, and its open ecosystem (ROCm) appeals to AI adopters. Intel's Diamond Rapids is competitive in single-threaded enterprise apps and custom hyperscale designs (NVLink), with potential fab advantages for supply/security. However, without SMT and with lower core density, it falls short in core-for-core battles; this exposes Intel to another generation of AMD dominance unless 18A yields surprising efficiency gains.
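The headline ratios in this comparison can be recomputed directly from the spec rows. A small sketch using only the top-SKU numbers listed above (which are themselves projections, not measurements); the `CpuSpec` type is a hypothetical helper for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSpec:
    name: str
    cores: int
    threads: int
    dies: int  # compute dies: CCDs for AMD, compute tiles for Intel

    @property
    def cores_per_die(self) -> float:
        return self.cores / self.dies


# Top-SKU figures as listed in the comparison above (projected).
venice = CpuSpec("EPYC Venice (Zen 6c)", cores=256, threads=512, dies=8)
diamond = CpuSpec("Xeon 7 Diamond Rapids", cores=192, threads=192, dies=4)

core_edge = venice.cores / diamond.cores - 1     # fractional core advantage
thread_ratio = venice.threads / diamond.threads  # thread-count multiple

print(f"{venice.name}: {venice.cores_per_die:.0f} cores per die")
print(f"{diamond.name}: {diamond.cores_per_die:.0f} cores per die")
print(f"core edge: {core_edge:.0%}, thread ratio: {thread_ratio:.2f}x")
```

This confirms the 33% core edge and shows a 2.67x thread-count gap from SMT; note that Intel packs more cores per die (48 vs. 32), so AMD's advantage is in total cores per socket rather than per die.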
For data centers prioritizing raw compute (AI, HPC), Venice wins; for Intel-centric ecosystems or specialized I/O, Diamond Rapids holds its ground. Real benchmarks after launch will confirm, but the logic points to AMD pulling ahead.

2. Market Size, Potential Revenue and Supply

The global Agentic AI market is projected to reach $3-$5 trillion by 2030 according to McKinsey, with consensus pointing to a 40-50% CAGR driven by small-to-large enterprise demand. I also wrote a full thread for subscribers on how and why Agentic AI is so explosive that AMD will blow past all analyst estimates. Link below if you are interested.

AMD's data center segment hit a record $5.4B in Q4 2025 (up 39% YoY), with EPYC shipments ramping on agentic demand. With 2GW of deployments in H2 2026, AMD's AI data center revenue comes to $40-$50B+ even on the most conservative projection, which puts FY2026 total revenue at $77-$94B. On top of that, the massive Agentic AI demand spike could grow EPYC revenue 3x to 4x over the next few years, potentially surpassing MI-series GPU demand as enterprises prioritize CPU-dense rack setups. This massive TAM is pushing $NVDA's Jensen to rush a CPU design and to strike a deal with Groq, an inference-chip (LPU) maker. Note that this has emerged in just weeks, a sign that we are still very early in this AI supercycle: the pace of adoption is insane, and productivity will clearly skyrocket. Why? Because a 24/7 smart AI agent working for you or your business is extremely compelling, and Agentic AI is estimated to be 40-100x more inference-hungry! Many experts have already said this kind of inference demand is impossible to project. AI CapEx is expected to ramp even further through 2027, 2028, 2029 and 2030 as global Agentic AI scales to a $3-$5 trillion TAM by 2030.

The nature of Agentic AI is driving a higher CPU/GPU ratio, with CPUs handling 50-90% of agentic workflows. For example, the current Helios rack has 18 compute trays with 72 GPUs + 18 CPUs.
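The current rack works out to 1 CPU per 4 GPUs (18/72). The sketch below recomputes that and shows how swapping in the alternative tray types the thread goes on to describe would shift the ratio. The standard tray (4 GPUs + 1 CPU) is inferred from 72 GPUs + 18 CPUs across 18 trays; the CPU count of a CPU-only tray and the mixed layout are hypothetical, chosen only for illustration, not published Helios configurations.

```python
from collections import Counter

# (GPUs, CPUs) per tray. "standard" is inferred from the thread's rack totals;
# "balanced" is the 2 GPU + 2 CPU variant; "cpu_only" CPU count is hypothetical.
TRAY_TYPES = {
    "standard": (4, 1),
    "balanced": (2, 2),
    "cpu_only": (0, 2),
}


def rack_totals(trays: Counter) -> tuple[int, int]:
    """Total (GPUs, CPUs) for a rack described as {tray_type: count}."""
    gpus = sum(TRAY_TYPES[t][0] * n for t, n in trays.items())
    cpus = sum(TRAY_TYPES[t][1] * n for t, n in trays.items())
    return gpus, cpus


def cpu_per_gpu(trays: Counter) -> float:
    gpus, cpus = rack_totals(trays)
    return cpus / gpus if gpus else float("inf")


current = Counter(standard=18)                    # today's 18-tray rack
mixed = Counter(standard=6, balanced=8, cpu_only=4)  # still 18 trays

print(rack_totals(current), f"{cpu_per_gpu(current):.2f} CPU per GPU")
print(rack_totals(mixed), f"{cpu_per_gpu(mixed):.2f} CPU per GPU")
```

Keeping the tray count fixed at 18 and only changing tray types moves this hypothetical rack from 0.25 to 0.75 CPUs per GPU, which is the kind of flexibility the open rack design is meant to allow.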
The beauty of the long-term $META and $AMD partnership is that the racks are fully flexible: the CPU ratio can be raised, or made equal, to serve different needs. Helios trays can easily be swapped to 2 GPUs + 2 CPUs, or even to CPU-only trays for dedicated orchestration/head nodes. That is the beauty of this open rack-scale design: flexibility and evolvability. If Agentic AI demand pushes much higher, AMD should be able to adjust tray variants without abandoning the Helios rack.

We can't talk about massive Agentic AI demand without talking about the supply side, i.e. TSMC. TSMC, AMD's primary foundry for advanced nodes (Zen 6/Venice on N2/2nm), is addressing AI-driven shortages through massive expansions. TSMC is accelerating its domestic fab construction, with industry sources indicating that as many as ten fabs could be under construction or preparing to begin operations across Taiwan's major science parks in 2026.

TSMC CapEx: $52-56B in 2026 (up 37% YoY), with $45B already approved for new/upgraded capacity; 70-80% goes to advanced processes (2nm/A16) and 10-20% to packaging (CoWoS quadrupling to 120-140K wafers/month by late 2026).

In addition, Taiwanese companies (led by TSMC) have committed to at least $250B in direct investments in US-based advanced semiconductor, AI, and energy production/innovation capacity, and Taiwan provides $250B in government credit guarantees to facilitate additional investment and build a full US semiconductor ecosystem (including industrial parks). TSMC completed a second land purchase in Arizona (January 2026) for gigafab scaling, with an additional $100B+ (potentially four more modules) to expand further and qualify for tariff exemptions. With 12GW secured from OpenAI and $META, plus massive Agentic AI demand, AMD should get priority access to 20-30% more wafers on TSMC's advanced nodes, as TSMC has multi-year agreements with AMD for AI chips.

Dr. C. C. Wei, CEO of TSMC, quote: "I spend a lot of time in the last three or four months talking to my customer and then customers. Customer. I want to make sure that my customers demand are real. I talk to those cloud service providers, all of them. Their answer is. I'm quite satisfied with their answer. Actually they show me the evidence that the AI really help their business. So they grow their business successfully and he or she in their financial return. So I also double check their financial status. They are very rich."

Amid shortages, the US buildout lets AMD ramp production of Instinct GPUs and EPYC CPUs without the constraints hitting competitors like Intel. By diversifying away from Taiwan (85% of advanced-node capacity today), the agreement mitigates supply disruptions and ensures stable flows for AMD's chips. Scaling production and securing supply will matter most for AMD's growth over the next 5-10 years. Growth could be 80-100% YoY or higher, or it could be closer to 60%; the aggressive TSMC supply ramp supports the higher end.

Conclusion: AMD stands at a pivotal inflection point in 2026. The explosive rise of Agentic AI, which demands 40-100x more inference compute through 24/7, multi-step orchestration, positions the company to potentially triple its EPYC CPU revenue to $45-60B+ by 2028 while scaling Instinct GPUs to tens of billions annually by 2027. Agentic AI demand could push AI CapEx closer to $1 trillion in 2027, far higher than most estimates. Dr. Lisa Su, AMD's visionary CEO, is masterfully securing supply to harness this massive demand by prioritizing operational execution and deep TSMC collaboration, ensuring readiness for the second-half 2026 AI ramp. Dr. Su has explicitly called out surging EPYC demand for agentic tasks, where CPUs power head nodes and traditional workloads alongside GPUs, while guiding for data center dominance through proactive capacity planning, partnerships like Nutanix ($150M investment for open agentic platforms), and supplying tens of millions of CPUs to OpenAI, $META, $ORCL, $AMZN, $MSFT, $GOOGL and others. Her strategy includes multi-year TSMC agreements for advanced nodes (N2 for Venice CPUs and future Instincts), diversification beyond Taiwan to mitigate risks, and innovations like the MI455X GPU unveiled at CES 2026, which she touted as enabling "the next trillion-dollar market opportunity" in physical AI. Dr. Su's forward-looking vision, which predicts AI reaching 5 billion users, emphasizes "AI everywhere," backed by hardware like Ryzen AI chips, all while she declares demand is "going through the roof" and commits to scaling without bottlenecks.

TSMC's aggressive ramp-up, fueled by $52-56B in 2026 capex (up 37% YoY) and 10+ new fabs across Taiwan, the US (the Arizona cluster is expanding to 6+ modules with $165B+ of investment), Japan, and Europe, provides profound reassurance about AMD's supply stability. The January 2026 US-Taiwan agreement, committing $250B in investments plus credit guarantees for US reshoring, accelerates this by granting tariff relief (15% rates with 1.5-2.5x exemptions) tied to capacity buildouts, enabling TSMC to potentially double output over the decade to meet AI wafer demand. This translates to 20-30% higher wafer allocations on key nodes, sidestepping Intel-like shortages and empowering Dr. Su's team to meet hyperscaler demand without disruption. Ultimately, this synergy cements AMD's leadership in the agentic era, promising sustained growth, $5T+ valuations at scale, and a resilient path forward as AI reshapes the world. This is NOT Financial Advice! Video source: AMD CES 2026
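Two of the growth figures in this thread can be cross-checked with simple arithmetic: the "$52-56B, up 37% YoY" capex guide implies a 2025 base, and "triple to 4x" EPYC revenue implies an annual growth rate. A sketch; the two-year 2026 → 2028 horizon is an assumption read from "by 2028", not stated explicitly, and the helper names are hypothetical:

```python
def prior_year_value(current: float, yoy_growth: float) -> float:
    """Back out last year's value from this year's value and YoY growth."""
    return current / (1 + yoy_growth)


def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate that turns 1.0 into `multiple`."""
    return multiple ** (1 / years) - 1


# TSMC 2026 capex guide of $52-56B, "up 37% YoY" (figures from the thread).
for capex in (52, 56):
    base = prior_year_value(capex, 0.37)
    print(f"2026 capex ${capex}B -> implied 2025 base ${base:.1f}B")

# 3x ("triple") to 4x EPYC revenue, assuming a 2-year horizon (2026 -> 2028).
for multiple in (3, 4):
    print(f"{multiple}x over 2 years -> {cagr(multiple, 2):.0%} per year")
```

The implied 2025 capex base is roughly $38-41B, and tripling to quadrupling over two years requires about 73-100% growth per year, which brackets the 80-100% YoY scenario mentioned above.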


Mike

44,460 views • 5 months ago

$AMD Massive Rotation from $NVDA $INTC🧵 Not Financial Advice! DYOR!

5-10 minutes before the bell today, the last trading day of May 2026, there was a massive rotation out of $INTC and $NVDA into $AMD. This morning I wrote this thread on $TSM saying energy efficiency is now the TOP priority, and why AMD is the biggest winner. Of course I had no influence on this rebalancing; I was just pointing out why Dr. Su saw this coming years ago (check the picture to understand more). I have been talking about Agentic AI for 3-4 years now. OpenClaw broke the 1:4 CPU:GPU ratio narrative in late January and February 2026, pushing it to 1:1 and up to 5:1. I will link various threads where you can see the full picture, from the supply chain to TSMC's expansion and the different wafer ratios for EPYC Venice and MI455X.

Energy efficiency is a structural, long-term driver behind the institutional rotation from $NVDA and $INTC into $AMD (with spillover strength in $AVGO for complementary networking/custom silicon). This isn't just short-term rebalancing; it's a massive bet on the shift from AI training (performance at any cost) to inference, deployment, and embodied/agentic systems (where total cost of ownership, power draw, and scalability dominate). That is precisely what I have been writing about $AMD for years now, in probably more than 5,000 threads. This is institutional FOMO to own $AMD. Keep in mind that AMD is the least-owned semiconductor stock among its peers.

AI infrastructure is moving beyond massive training clusters to widespread inference for Agentic AI (running models 24/7) and embodied AI (robots, autonomous agents, edge devices). These workloads prioritize:
~Tokens-per-watt and performance-per-watt
~Lower total power consumption for data centers facing grid constraints
~Better economics at scale (cost-per-token, TCO)
~Thermal and power efficiency for on-device/robotics use
Hyperscalers are now thinking more about margin, profitability, and $/M tokens. AMD is at $516/share.
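Tokens-per-watt and $/M tokens are linked by a simple identity: at a fixed electricity price, energy cost per token falls in proportion to tokens per joule. A minimal sketch that counts electricity only (no hardware, cooling or facility overhead); the throughput, power draw and $/kWh below are hypothetical inputs, not measured figures for any AMD or Nvidia system:

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: tokens generated per joule (1 W = 1 J/s)."""
    return tokens_per_second / watts


def energy_cost_per_million_tokens(tokens_per_second: float, watts: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost only; ignores hardware, cooling overhead and staffing."""
    kwh_per_hour = watts / 1000
    million_tokens_per_hour = tokens_per_second * 3600 / 1e6
    return kwh_per_hour * usd_per_kwh / million_tokens_per_hour


# Hypothetical server: 1,000 tokens/s at 1 kW, electricity at $0.10/kWh.
print(tokens_per_joule(1000, 1000))                                 # 1.0
print(round(energy_cost_per_million_tokens(1000, 1000, 0.10), 4))  # 0.0278
# Doubling tokens-per-watt halves the energy cost per million tokens.
print(round(energy_cost_per_million_tokens(2000, 1000, 0.10), 4))  # 0.0139
```

In a full TCO model the hardware amortization term usually sits alongside this energy term, but the energy term is the part that scales directly with performance-per-watt.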
AMD's forward PEG ratio is still 35/100+ = 0.35, i.e. very cheap IMO for the growth and potential.

A. Why did institutions rotate out of $NVDA? Because Agentic AI is going to be dominated by CPUs for years to come, moving violently to 5:1, 10:1 or even 20:1 CPU:GPU ratios as enterprises demand more than 10-20 agents to run tasks. That does not mean training is going away; inference is just going to grow much faster.

B. Why did institutions rotate out of $INTC? Because AMD's x86 unit share is only 30-31%, but its revenue share is already 46.2% according to Mercury Research. Dr. Su wants 50-60% market share, which would mean 60-70%+ revenue share in a CPU TAM that is already $200B in 2026 and projected to reach $500B by 2030.

C. Why $AMD? Because AMD has secured meaningful 2nm capacity, advanced packaging and memory through 2027-2028. TSMC is expanding 2 primary 2nm fabs toward 60-65k WPM (wafers per month) each and speeding up 5 2nm fabs in Taiwan, for a total of up to 12 2nm fabs through 2027/2028. 2nm capacity is expected to reach 140k+ WPM toward the end of 2026 and 220-240k WPM by the end of 2027. Apple has secured 35-45k WPM, and AMD does not have to worry about allocation competition from $AVGO (for $META and $GOOGL) until late 2027 (this may change).

D. Agentic AI will evolve into 24/7 autonomous agents, which will become the foundational layer for robotics, or physical AI. Agentic AI (autonomous systems that plan, reason, use tools, self-correct, pursue long-horizon goals, and adapt) provides the high-level cognitive architecture. It turns raw perception and low-level control into useful, general-purpose behavior in the physical world. Physical AI (or embodied AI) refers to AI that senses, understands, and acts directly in the real world through robots, actuators, and sensors. Agentic capabilities are what make this scalable and useful beyond narrow, scripted tasks: from reactive/programmed machines → to proactive, goal-oriented autonomous agents. How does this work?
The autonomous agent layer is the brain:
~Vision-Language-Action models or robotics foundation models.
~Agentic loops: planning, chain-of-thought reasoning, reflection, tool use (simulators, APIs), multi-step task decomposition.
~Persistent 24/7 operation with memory, world modeling, and continuous learning.
Institutions may not have liked $AMD from 2022 to 2025, but they cannot stop this evolution; it is inevitable, and it is part of my main thesis for AMD reaching a $5 Trillion market cap long term.

Conclusion: Institutions are rotating capital toward AMD not merely for tactical rebalancing, but because Dr. Lisa Su and her team anticipated this exact inflection years in advance and have been methodically engineering AMD's platform to dominate it. Dr. Su has long championed Agentic AI as the high-level cognitive foundation for physical AI and robotics. As far back as her 2023/2024 CES keynotes and earlier strategic commentary, she described physical AI (including humanoid robotics and edge autonomy) as "the next big thing," a natural extension of agentic workflows moving from digital reasoning to real-world action. She emphasized that enabling persistent, 24/7 autonomous agents requires a full-stack approach: high-performance CPUs for orchestration and motion control, dedicated accelerators for real-time vision and multimodal inference, and open software ecosystems for rapid development. This vision aligns precisely with the structural drivers discussed above. As AI shifts from training to massive-scale inference and embodiment, energy efficiency, total cost of ownership, and heterogeneous compute become first-order advantages. AMD's Instinct MI350/MI355 series, Ryzen AI Embedded processors, and EPYC platforms deliver superior performance-per-watt and balanced CPU + GPU + NPU integration, ideal for power-constrained robots that must run sophisticated agentic reasoning loops without excessive heat or battery drain. Dr. Su has repeatedly highlighted the rising importance of CPUs in agentic systems (moving toward 1:1 or even CPU-heavy ratios with GPUs), positioning AMD's strengths in orchestration, memory handling, and efficiency as critical for the next phase of growth. AMD is engineered for the deployment realities of embodied agents: scalable, efficient, and deployable at the edge and in physical systems. The institutional flows out of NVDA and INTC into AMD reflect recognition of this prepared leadership. Dr. Su didn't just see the future of Agentic AI powering robotics; she has spent years building the silicon, software, and partnerships to make it practical and economically viable. This rotation signals confidence that the companies best positioned for the physical, always-on intelligence layer will capture the highest-volume opportunities in the coming decade. Not Financial Advice! DYOR!
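Point B's share figures hide a useful number: if AMD holds 30-31% of x86 units but 46.2% of revenue, its blended ASP must be roughly twice Intel's. A sketch, assuming a two-vendor x86 market (AMD + Intel only), the midpoint 30.5% unit share, and an ASP ratio that stays fixed as share changes; it also reproduces the forward PEG arithmetic (forward P/E divided by expected growth in percent). Function names are hypothetical helpers for illustration:

```python
def forward_peg(forward_pe: float, growth_pct: float) -> float:
    """Forward P/E divided by expected EPS growth in percent."""
    return forward_pe / growth_pct


def implied_asp_ratio(unit_share: float, revenue_share: float) -> float:
    """AMD ASP / Intel ASP implied by shares in a two-vendor x86 market."""
    revenue_odds = revenue_share / (1 - revenue_share)
    unit_odds = unit_share / (1 - unit_share)
    return revenue_odds / unit_odds


def revenue_share(unit_share: float, asp_ratio: float) -> float:
    """Revenue share implied by a unit share at a fixed AMD/Intel ASP ratio."""
    amd = unit_share * asp_ratio
    return amd / (amd + (1 - unit_share))


print(forward_peg(35, 100))             # 0.35
k = implied_asp_ratio(0.305, 0.462)     # midpoint of the 30-31% unit share
print(f"AMD ASP is ~{k:.2f}x Intel's")
for units in (0.50, 0.60):
    print(f"{units:.0%} unit share -> {revenue_share(units, k):.0%} revenue share")
```

At today's implied ASP ratio (about 1.96x), 50-60% unit share maps to roughly 66-75% revenue share, consistent with the 60-70%+ figure in point B; if AMD's ASP premium narrowed as it gained share, the revenue share would come in lower.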

Mike

104,109 views • 2 months ago

Micron is going to $4,000, and once you understand what inference actually is, the number stops sounding crazy (save this). Dylan Patel just said that by 2030, OpenAI and Anthropic alone will need over 100 gigawatts of compute combined, and by 2040 we may not even be measuring AI infrastructure in gigawatts anymore. We may be talking about terawatts. Every single one of those gigawatts needs memory to function; without it, the compute is worthless. Most people heard that and thought about Nvidia, but they should be thinking about Micron. Every AI model generating a response goes through two phases. The first is prefill, which processes your prompt and is compute-heavy. The second is decode, which generates the response one token at a time, and that phase is almost entirely memory-bound, not compute-bound. During decode, the GPU's processing units sit idle more than 95% of the time, waiting for data to arrive from memory. Google confirmed this in a research paper: decode-phase bottlenecks are dominated by memory bandwidth and capacity, not raw compute. The GPU is not the bottleneck; the memory feeding the GPU is. This matters because inference is now where all the money lives. A model is trained once, but inference happens billions of times a day: every ChatGPT response, every Claude output, every agentic workflow running in the background. Every one of those token streams is a billing event tied directly to memory performance. Adding more GPUs does not fix this, because GPUs in inference are already underutilized, sitting idle while they wait on memory. Adding memory bandwidth and capacity is what directly reduces token cost, reduces latency, and lets the same cluster serve dramatically more users simultaneously. Longer context windows compound the problem: a model running a 1 million token context window requires dramatically more memory per session than one with a 10,000 token window, and every new model generation pushes context longer.
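The prefill/decode split and the context-window point can be made concrete with a back-of-the-envelope roofline estimate. This is a simplified sketch under stated assumptions, not a model of any real deployment: decode is treated as streaming every weight once per step (about 2 FLOPs per parameter per sequence), KV-cache reads are ignored in the utilization bound, and the accelerator figures (1 PFLOP/s, 4 TB/s) and model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical placeholders, not specs of any real product.

```python
"""Back-of-the-envelope look at why LLM decode is memory-bound.

Every hardware and model number used below is a hypothetical placeholder.
"""


def decode_arithmetic_intensity(batch_size: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weight traffic in one decode step.

    Each sequence in the batch spends ~2 FLOPs (multiply + add) per parameter,
    while the weights are read from memory once and shared by the whole batch.
    """
    return (2.0 * batch_size) / bytes_per_param


def machine_balance(peak_flops: float, mem_bandwidth: float) -> float:
    """FLOPs the chip can execute for every byte it can read from memory."""
    return peak_flops / mem_bandwidth


def compute_utilization(batch_size: int, bytes_per_param: float,
                        peak_flops: float, mem_bandwidth: float) -> float:
    """Upper bound on the fraction of peak compute usable during decode.

    Simplification: KV-cache reads are ignored, so the bound is optimistic.
    """
    intensity = decode_arithmetic_intensity(batch_size, bytes_per_param)
    return min(1.0, intensity / machine_balance(peak_flops, mem_bandwidth))


def kv_cache_bytes(context_tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Per-session KV cache: one key and one value vector per layer, KV head, token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context_tokens


if __name__ == "__main__":
    PEAK_FLOPS = 1e15  # hypothetical: 1 PFLOP/s of 16-bit compute
    MEM_BW = 4e12      # hypothetical: 4 TB/s of memory bandwidth
    for batch in (1, 8, 64):
        util = compute_utilization(batch, 2.0, PEAK_FLOPS, MEM_BW)
        print(f"batch={batch:>3}: compute busy at most {util:.1%} of the time")

    # Hypothetical 70B-class model shape with grouped-query attention.
    shape = dict(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)
    for ctx in (10_000, 1_000_000):
        gb = kv_cache_bytes(ctx, **shape) / 1e9
        print(f"context={ctx:>9,} tokens: KV cache ~{gb:,.1f} GB per session")
```

With these placeholder numbers, a single-sequence decode step keeps the compute units busy at most about 0.4% of the time. Batching raises that bound, but every extra session brings its own KV cache, and going from a 10,000 to a 1,000,000 token context multiplies that cache 100x (about 3.3 GB versus 327.7 GB here).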
The market treats memory as a downstream beneficiary of Nvidia orders. The correct framework is the opposite: Micron is the upstream constraint on how much value every Nvidia GPU can actually generate at inference scale. Micron guided Q4 to $50 billion in revenue, has HBM4 ramping at twice the pace of the prior generation, and CEO Sanjay Mehrotra has said supply will not catch up with demand before the end of 2027. At 8x forward earnings on $112 of projected FY2027 EPS, Micron is the most undervalued infrastructure company in the entire AI stack. Inference is memory. Memory is Micron, and the inference ramp has barely started. Milk Road Pro members are already up massively on this position, and we're just getting started. If you want the full breakdown of what we're buying and why, come join us for just a dollar using the link below!
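The valuation sentence pairs a multiple with an EPS estimate, and the arithmetic linking them to the $4,000 headline is simply price = forward P/E × EPS. A minimal sketch, treating the post's $112 FY2027 EPS and 8x multiple as given inputs rather than verified estimates:

```python
"""Relating a forward P/E multiple, an EPS estimate, and a share price.

The $112 FY2027 EPS and the 8x multiple are the post's own figures; they are
used as inputs here, not as verified estimates.
"""


def implied_price(forward_pe: float, eps: float) -> float:
    """Share price implied by applying a forward P/E multiple to projected EPS."""
    return forward_pe * eps


def implied_forward_pe(price: float, eps: float) -> float:
    """Forward P/E implied by a given share price and projected EPS."""
    return price / eps


if __name__ == "__main__":
    eps_fy2027 = 112.0
    print(f"8x on ${eps_fy2027:.0f} EPS -> ${implied_price(8, eps_fy2027):,.0f} per share")
    print(f"$4,000 target -> {implied_forward_pe(4000, eps_fy2027):.1f}x forward earnings")
```

On these inputs, 8x corresponds to roughly $896 per share, so the $4,000 target works out to about 35.7x the same FY2027 EPS estimate.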

Milk Road AI

128,678 views • 1 month ago

$AMD Strategic Price Positioning Long🧵 AMD is increasingly the most hated semi stock, yet it is the one that can rival $NVDA's dominance in GPUs and software (CUDA vs. ROCm). $AMD is also the most under-owned among all funds in 2025, according to Bank of America! From what I have learned over years as an investor in Dr. Lisa Su's AMD, analysts and the market consistently underestimate her leadership. $AMD is capable of raising prices and of making high-quality hardware backed by software. Dr. Su's (and AMD's) choice to adopt a lower-price strategy to gain market share is a deliberate, multifaceted approach rooted in competitive positioning, market dynamics, and long-term growth objectives. As an investor, expect it to take time to see margins improve, as it did in CPUs and embedded. 1. Penetration Pricing to Challenge Dominant Competitors AMD has historically positioned itself as a cost-effective alternative to dominant players like Intel in CPUs and Nvidia in GPUs. By setting prices lower than competitors, AMD aims to attract customers and quickly gain market share. This is a classic penetration pricing strategy, where the goal is to capture a significant portion of the market by offering high-performance products at a lower price point. ~CPU Market Example: When AMD launched its Ryzen processors in 2017, it priced them competitively against Intel's Core processors, emphasizing a better price-to-performance ratio. Ryzen CPUs offered higher core counts and multi-core performance at lower prices, appealing to cost-conscious consumers, gamers, and professionals. This strategy helped AMD increase its CPU market share to 16.6% by early 2025, narrowing the gap with Intel. ~GPU Market Context: In the GPU market, where Nvidia holds an 88% share compared to AMD's 12%, AMD has been criticized for not launching GPUs at low enough prices to compete effectively.
However, posts on X and articles suggest AMD is shifting its GPU strategy to focus on mainstream, cost-effective products rather than high-end enthusiast segments, aiming to regain market share through competitive pricing. 2. Appealing to Cost-Conscious Market Segments AMD targets price-sensitive customers, including gamers, small businesses, and enterprises looking for high-performance computing at a lower cost. This is particularly effective in segments where performance is critical but budgets are constrained. ~Value Proposition: AMD’s Ryzen and EPYC processors, as well as Radeon GPUs, are designed to deliver performance comparable to or better than competitors in specific workloads (e.g., multi-core processing or AI compute) at a lower price. For example, Ryzen processors have been noted for their superior multi-core performance compared to Intel CPUs at similar or lower price points, making them attractive for tasks like video editing or gaming. ~AI and Data Center: In the AI and data center markets, AMD’s cost-effective Instinct MI300X GPUs and EPYC CPUs target enterprises seeking affordable alternatives to Nvidia’s expensive AI ecosystem. This strategy taps into an underserved market segment that Nvidia’s broad, premium-priced AI solutions may not fully address. 3. Building Scale and Developer Support AMD’s leadership, including Jack Huynh, has emphasized the importance of scale: gaining a larger market share to attract developer support and optimize software ecosystems. A lower price strategy helps AMD achieve this by increasing adoption among consumers and enterprises. ~Gaming GPUs: By focusing on mainstream GPUs with competitive pricing (e.g., targeting an 80% addressable market rather than the high-end 10%), AMD aims to build a larger user base.
This scale encourages developers to optimize games for AMD’s technologies, such as FSR 3 (FidelityFX Super Resolution) and Anti-Lag 2, improving the ecosystem and competitiveness against Nvidia’s DLSS and Reflex. ~Open Ecosystem in AI: AMD’s open-source ROCm platform contrasts with Nvidia’s proprietary CUDA, appealing to developers who prefer flexibility. Lower-priced hardware makes it easier for developers to adopt AMD’s solutions, fostering a broader AI software ecosystem. 4. Historical Context and Brand Positioning Since its founding in 1969, AMD has positioned itself as a challenger brand, often acting as a “second source” supplier to Intel. This role required competitive pricing to gain a foothold in markets dominated by established players. Over time, AMD has built a reputation for quality and affordability, reinforced by products like the Am9080 (a reverse-engineered Intel 8080) and modern Ryzen and EPYC lines. This historical strategy of undercutting competitors’ prices while delivering comparable performance continues to define AMD’s approach. 5. Countering Competitor Dominance AMD operates in highly competitive markets where Intel and Nvidia have significant advantages in brand recognition, market share, and ecosystems. A lower price strategy is a pragmatic way to disrupt this: ~Intel in CPUs: Intel’s historical dominance in the CPU market (servers, desktops, and laptops) has been challenged by AMD’s Ryzen and EPYC processors, which offer better value. For instance, AMD’s EPYC CPUs have driven a 122% year-over-year revenue increase in the data center segment, partly due to their cost-effectiveness, and AMD has captured as much as 94% of CPU sales at some retailers. ~Nvidia in GPUs: Nvidia’s 88% GPU market share and premium pricing (e.g., high-end GPUs like the RTX 4090) leave room for AMD to compete in the mid-to-low range.
However, AMD’s failure to launch GPUs at sufficiently low prices (e.g., the RX 7900 XT launching at $900 rather than its current $680) has limited its success, prompting a strategic shift toward more aggressive pricing with its RDNA 4 GPUs. 6. Market Share as a Long-Term Investment AMD’s lower price strategy is not just about immediate sales but also about long-term market positioning. By capturing market share, AMD can: ~Increase Brand Loyalty: Affordable, high-performance products build customer loyalty, especially among gamers and small businesses, creating a foundation for future sales. ~Drive Revenue Growth: Market share gains in CPUs (e.g., 16.6% in 2025) and data centers (e.g., $3.5 billion in Q3 revenue) translate into higher revenue, even if margins are initially lower. ~Influence Industry Standards: Greater market presence allows AMD to influence hardware and software standards, such as pushing for open-source AI frameworks or gaming optimizations, reducing reliance on competitors’ proprietary systems. 7. Challenges and Risks While effective, AMD’s lower price strategy carries risks: ~Profitability Concerns: Lower prices can compress profit margins, and some analysts note that AMD’s high stock valuation expects future profitability that may be delayed if pricing remains aggressive. ~Perception of Quality: Persistently low prices risk positioning AMD as a “budget” brand, potentially undermining its ability to compete in premium segments. ~Competitor Response: Intel and Nvidia can counter with price cuts or superior features, as seen with Nvidia’s feature-rich GPUs. AMD must balance price with innovation to avoid being outmaneuvered. 8. Strategic Shift in GPUs Recent reports indicate AMD is adjusting its GPU strategy to prioritize market share over competing in the high-end enthusiast segment.
For the Radeon RX 9000 series (RDNA 4), AMD is focusing on mainstream GPUs priced competitively to appeal to a broader audience, rather than chasing Nvidia’s high-end dominance. This shift aligns with AMD’s broader goal of achieving 40–50% market share by targeting the “80%” of the market that prioritizes affordability over premium features. Lastly, AMD’s lower price strategy is a calculated move to disrupt Intel’s and Nvidia’s dominance, capture market share, and build scale for long-term growth. By offering high-performance CPUs and GPUs at competitive prices, AMD appeals to cost-conscious consumers and enterprises, particularly in the CPU and AI markets, where it has seen significant gains (e.g., 16.6% CPU market share and $3.5 billion in data center revenue). Recent price increases on the MI350 and MI355, with more expected on the MI400, signal #AI chip leadership and pricing power, which will result in significant top- and bottom-line growth.
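The trade-off running through points 1, 6, and 7 (better price-to-performance and more units sold, at the cost of a lower per-unit margin) can be illustrated with a few lines of arithmetic. This is a toy model with made-up numbers, not AMD, Intel, or Nvidia data; the `Chip` class is a hypothetical helper, and the 1,600-unit volume for the lower-priced chip is an assumption standing in for market-share gains.

```python
"""Penetration-pricing arithmetic: price/performance versus margin.

All prices, benchmark scores, unit costs, and volumes are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    price: float      # selling price in USD (hypothetical)
    perf: float       # benchmark score, higher is better (hypothetical)
    unit_cost: float  # cost to build one unit in USD (hypothetical)

    def perf_per_dollar(self) -> float:
        """Benchmark points delivered per dollar of selling price."""
        return self.perf / self.price

    def gross_margin(self) -> float:
        """Share of each sale's price left after unit cost."""
        return (self.price - self.unit_cost) / self.price

    def gross_profit(self, units: int) -> float:
        """Total gross profit across a given number of units sold."""
        return units * (self.price - self.unit_cost)


if __name__ == "__main__":
    incumbent = Chip("incumbent", price=500.0, perf=100.0, unit_cost=200.0)
    challenger = Chip("challenger", price=400.0, perf=110.0, unit_cost=200.0)
    for chip, units in ((incumbent, 1_000), (challenger, 1_600)):
        print(f"{chip.name:>10}: {chip.perf_per_dollar():.3f} pts/$, "
              f"margin {chip.gross_margin():.0%}, "
              f"gross profit ${chip.gross_profit(units):,.0f} on {units:,} units")
```

In this toy case the lower-priced chip's margin falls from 60% to 50%, yet the higher assumed volume lifts total gross profit from $300,000 to $320,000. Whether that holds in practice depends entirely on how volume responds to price, which the sketch simply assumes.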

Mike

38,006 views • 10 months ago

$AMD $NVDA & the AMD Bear SemiAnalysis 🧵 Here are some facts: $META allocated 42% of its AI GPUs to $AMD, and OpenAI allocated 6GW (38%) to $AMD. 1. Model-Specific Bias: The Llama 3.3 70B graph favored NVIDIA due to TRT-LLM optimizations, highlighting throughput and latency, where Blackwell excels. In contrast, the GPT-OSS 120B chart shifts focus to cost and interactivity, where MI355X shines. This selective model choice clearly suggests SemiAnalysis tailors benchmarks to reinforce narratives: NVIDIA’s dominance in speed (Llama 3.3) and AMD’s niche in cost (GPT-OSS). GPT-OSS 120B, with its sparse attention mechanisms (similar to DeepSeek-V3.2-Exp), showcases AMD’s CDNA 4 architecture, while Llama 3.3’s dense attention favors NVIDIA’s Tensor Cores. SemiAnalysis’ decision to emphasize Llama 3.3 initially could reflect its AMD bear stance. 2. The Way Data Is Presented: The Llama 3.3 graph focused on raw performance metrics (throughput vs. latency), downplaying cost, where AMD holds an edge. The GPT-OSS chart, buried in follow-up posts, reveals AMD’s strength but receives less prominence, suggesting a curated narrative. Labeling variability (e.g., B200 with/without TRT) and the lack of uniform scaling across graphs indicate potential cherry-picking of configurations to favor NVIDIA’s optimized setups. 3. Historical Context: SemiAnalysis’ past critiques of AMD’s R&D and ROCm (from May 2025) align with a bearish outlook. Their own hype around NVIDIA’s 15x ROI contrasts with muted coverage of AMD’s cost advantages, reinforcing the bias. Despite AMD’s participation in InferenceMAX, the benchmark’s framing (e.g., prioritizing Blackwell’s ROI) reflects SemiAnalysis’ market predictions rather than balanced analysis. Lastly, AMD’s Instinct MI355X proves superior in inference and cost per million tokens for the GPT-OSS 120B model, offering a 25% cost advantage over NVIDIA’s H200 at moderate-to-high interactivity levels.
This efficiency, driven by AMD’s memory bandwidth and FP4 support, makes it a better choice for cost-sensitive, multi-user deployments over a three-year horizon. However, SemiAnalysis’ sole focus (in its presentation graph) on Llama 3.3, where NVIDIA excels, demonstrates a pattern of cherry-picking models and data to favor NVIDIA, consistent with its historical AMD-bearish stance. This selective presentation risks misleading stakeholders by overshadowing AMD’s economic strengths. My personal take: I would trust Dr. Lisa Su’s, Greg Brockman’s, and Sam Altman’s view of AMD, and the fact that they allocated 6GW to AMD, over SemiAnalysis. At the end of the day, large customers pay when it works. $META allocated 42% of its AI GPUs to $AMD for a reason. And the "secret weapon" will cut energy consumption by 20-50%, meaning that at 6GW, OpenAI would be able to deploy 25-50% more MI450 with a much better cost advantage and higher memory bandwidth: the queen of Inference! Oh, and ROCm 8 is expected to be on par with CUDA in 2026.
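The two quantitative claims in this thread, a lower cost per million tokens and "more MI450 at 6GW", both reduce to simple ratios. A minimal sketch, assuming one accelerator rented at a flat hourly rate with steady throughput, and a fixed total power budget; the dollar and token figures are hypothetical placeholders chosen to reproduce a 25% cost gap, not measured MI355X, H200, or InferenceMAX numbers.

```python
"""Cost-per-million-tokens and fixed-power-budget arithmetic.

GPU-hour prices and throughputs below are hypothetical placeholders,
not InferenceMAX results.
"""


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD to generate one million tokens on one accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def cost_advantage(cost_a: float, cost_b: float) -> float:
    """Fraction by which option A is cheaper than option B (0.25 = 25% cheaper)."""
    return 1 - cost_a / cost_b


def extra_units_at_fixed_power(power_reduction: float) -> float:
    """Extra units that fit in the same power budget when per-unit power drops.

    Unit count scales as 1 / (1 - reduction), so a 20% cut gives 25% more units.
    """
    return 1 / (1 - power_reduction) - 1


if __name__ == "__main__":
    a = cost_per_million_tokens(hourly_cost_usd=2.70, tokens_per_second=1000)
    b = cost_per_million_tokens(hourly_cost_usd=3.60, tokens_per_second=1000)
    print(f"A: ${a:.2f}/M tokens, B: ${b:.2f}/M tokens, "
          f"A is {cost_advantage(a, b):.0%} cheaper")
    for cut in (0.20, 1 / 3, 0.50):
        print(f"{cut:.0%} less power per unit -> "
              f"{extra_units_at_fixed_power(cut):.0%} more units")
```

Under the fixed-power formula, a 20% energy saving yields 25% more units and a one-third saving yields 50% more, while a full 50% saving would double the unit count.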

Mike

104,194 views • 9 months ago

$AMD| $META is using $GOOGL to negotiate 🧵 The Ironwood pod is 5.1–10x more expensive annually ($148.3 million ÷ $14.87–$29.04 million) and 5.1–10x more expensive monthly ($12.36 million ÷ $1.24–$2.42 million) than renting 15 MI450 racks for equivalent compute. The rapidly evolving landscape of artificial intelligence infrastructure presents a complex interplay of technological innovation, market dynamics, and strategic maneuvering among major players. Recent leaked information suggesting that Meta Platforms ($META) might work with Google's Tensor Processing Unit (TPU) in 2027 has sparked speculation about its true intent. This leak is likely a strategic move by Meta to negotiate more favorable terms with AMD, leveraging the competitive dynamics of the AI hardware market to optimize its substantial investment in AI infrastructure. By examining the key elements of this scenario (Meta's investment strategy, the comparative advantages of AMD's MI450 and Google's Ironwood TPU, and the broader market context), we can discern the potential beneficiaries and the strategic implications of this information. Meta's aggressive pursuit of AI capabilities is underscored by its planned expenditure of $66-72 billion on AI infrastructure in 2025, with expectations to escalate significantly in 2026. This investment is part of a broader strategy to build "titan clusters" like Prometheus, which are projected to reach 1 gigawatt of compute power by 2026. Such a scale of investment reflects Meta's recognition of the critical role that AI will play in its future growth, particularly in enhancing its social media platforms and developing new AI-driven applications. However, the financial burden of this infrastructure buildout necessitates a careful consideration of cost-effectiveness and scalability, which brings us to the leaked information about potential collaboration with Google's Ironwood TPU.
Google's Ironwood TPU, introduced as the seventh-generation ASIC optimized for inference, represents a high-cost, cloud-locked solution priced at $445 million per pod (9,216 chips) over three years. This model, while offering significant performance gains and power efficiency, is tailored for pod-scale deployment and integrated with Google's cloud services, limiting flexibility and increasing costs for customers. In contrast, AMD's MI450 GPU, priced at $30,000–$40,000 per unit, is backed by a modular, open ROCm ecosystem and delivers comparable compute capacity at a fraction of the cost. Renting 15 MI450 racks could achieve a similar 42+ exaFLOPS of inference compute at 5–10x lower cost than renting a single Ironwood pod, underscoring AMD's competitive edge in total cost of ownership (TCO). The leaked information about Meta's potential TPU deployment in 2027 can therefore be interpreted as a negotiating tactic rather than a definitive shift in strategy. By signaling interest in Google's solution, Meta may be attempting to pressure AMD into offering more favorable terms and prices for 5-10GW of capacity. This tactic aligns with Meta's broader goal to finance most of its AI spend internally while exploring partnerships that can reduce costs and enhance flexibility. MI450's TCO advantage and AMD's partnerships with major players like OpenAI, Microsoft, and Meta itself suggest that AMD is a critical component of Meta's AI infrastructure strategy. The threat of working with Google's TPU could prompt AMD to reassess its pricing, provide additional support, or offer incentives to retain Meta as a customer, thereby securing or expanding its market share. From a logical standpoint, Meta stands to benefit the most from this strategy. As a major buyer in a high-stakes market projected to surpass $1 trillion in annual spending by 2030, Meta's negotiating power is significant.
The leaked information could lead to substantial cost savings on its $66-72 billion investment, enhancing its financial flexibility and allowing for further investment in AI capabilities. Moreover, this tactic reinforces Meta's position as a leader in the AI infrastructure race, potentially attracting more external financing for its data center projects and strengthening its competitive stance against other hyperscalers like Amazon and Microsoft. AMD could also benefit from this scenario. The negotiation pressure might lead to small short-term concessions, but it could also solidify a long-term partnership with Meta, ensuring continued demand for MI450 and other AI hardware solutions. Meta's initial 42% allocation to AMD's MI300X, together with AMD's partnerships with Oracle, Dell, and HP, indicates a deep integration of AMD's technology into Meta's infrastructure, which could be leveraged to maintain this relationship. For AMD, retaining Meta as a large key customer is crucial to capturing a larger share of the rapidly growing data center infrastructure market, driven by the insatiable demand for AI compute power. Google, on the other hand, faces a more limited benefit from this leaked information. While securing Meta as a customer would reinforce its position in the AI hardware market, the high cost and ecosystem lock-in of the Ironwood TPU might deter Meta from fully committing to this solution. The leaked information could prompt Google to reconsider its pricing or ecosystem strategy to remain competitive, but the immediate impact is likely to be minimal compared to the potential gains for Meta and AMD. Investors and market analysts also stand to benefit from this information, as it provides insights into the competitive dynamics of the AI hardware market. Portfolio adjustments based on anticipated shifts in market share and profitability could create opportunities for those who correctly anticipate outcomes.
The negotiation dynamic might introduce volatility, but it also highlights the strategic importance of cost-effective solutions in the AI infrastructure space. Lastly, the leaked information about Meta potentially working with Google's TPU in 2027 is likely a strategic move to negotiate with AMD, leveraging the competitive landscape to optimize its AI infrastructure investment. Meta, as the primary negotiator, stands to gain the most by securing better terms from AMD, reducing costs, and enhancing its financial flexibility. AMD, while initially at risk, could benefit from retaining a key customer and solidifying its market position. Google faces limited immediate benefits but may need to adapt its strategy to remain competitive. This scenario underscores the complex interplay of technology, market dynamics, and strategic maneuvering in the AI hardware market, where cost-effectiveness and scalability are paramount. As the data center infrastructure market continues to grow, the outcomes of such negotiations will shape the future of AI development and deployment.
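The headline ratios at the top of this thread follow from spreading the three-year pod price into annual and monthly figures and dividing by the rack estimates. The sketch below only checks that the thread's numbers are internally consistent: it treats the $445M pod price and the $14.87M–$29.04M annual rack estimates as given inputs (not verified prices) and assumes costs are spread evenly over time.

```python
"""Reproducing the thread's Ironwood-pod vs. 15x MI450-rack cost ratios.

Inputs are the thread's own figures; they are not verified prices.
"""

IRONWOOD_POD_3YR_USD = 445e6                  # thread: $445M per 9,216-chip pod over 3 years
MI450_RACKS_ANNUAL_USD = (14.87e6, 29.04e6)   # thread: low/high annual cost for 15 racks


def annualize(three_year_total: float) -> float:
    """Spread a three-year contract evenly over three years."""
    return three_year_total / 3


def monthly(annual_cost: float) -> float:
    """Spread an annual cost evenly over twelve months."""
    return annual_cost / 12


if __name__ == "__main__":
    pod_year = annualize(IRONWOOD_POD_3YR_USD)
    print(f"Ironwood pod: ${pod_year / 1e6:.1f}M/yr, ${monthly(pod_year) / 1e6:.2f}M/mo")
    for rack_year in MI450_RACKS_ANNUAL_USD:
        print(f"15 MI450 racks at ${rack_year / 1e6:.2f}M/yr "
              f"(${monthly(rack_year) / 1e6:.2f}M/mo): pod costs "
              f"{pod_year / rack_year:.1f}x as much")
```

With these inputs the pod works out to about $148.3M per year ($12.36M per month), roughly 5.1x the high rack estimate and 10x the low one, which matches the thread's stated range.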

$AMD| $META is using $GOOGL to negotiate 🧵 The Ironwood pod is 5.1–10x more expensive annually ($148.3 million ÷ $14.87–$29.04 million) and 5.1–10x more expensive monthly ($12.36 million ÷ $1.24–$2.42 million) than renting 15 MI450 racks for equivalent compute. The rapidly evolving landscape of artificial intelligence infrastructure presents a complex interplay of technological innovation, market dynamics, and strategic maneuvering among major players. Recent leaked information suggesting that Meta Platforms ($META) might work with Google's Tensor Processing Unit (TPU) in 2027 has sparked speculation about its true intent. This leak is likely a strategic move by Meta to negotiate more favorable terms with AMD , leveraging the competitive dynamics of the AI hardware market to optimize its substantial investment in AI infrastructure. By examining the key elements of this scenario Meta's investment strategy, the comparative advantages of AMD's MI450 and Google's Ironwood TPU, and the broader market context; we can discern the potential beneficiaries and the strategic implications of this information. Meta's aggressive pursuit of AI capabilities is underscored by its planned expenditure of $66-72 billion on AI infrastructure in 2025, with expectations to escalate significantly in 2026. This investment is part of a broader strategy to build "titan clusters" like Prometheus, which are projected to reach 1 gigawatt of compute power by 2026. Such a scale of investment reflects Meta's recognition of the critical role that AI will play in its future growth, particularly in enhancing its social media platforms and developing new AI-driven applications. However, the financial burden of this infrastructure buildout necessitates a careful consideration of cost-effectiveness and scalability, which brings us to the leaked information about potential collaboration with Google's Ironwood TPU. 
Google's Ironwood TPU, introduced as the seventh-generation ASIC optimized for TensorFlow-based inference, represents a high-cost, cloud-locked solution priced at $445 million per pod (9,216 chips) over three years. This model, while offering significant performance gains and power efficiency, is tailored for pod-scale deployment and integrated with Google's cloud services, limiting flexibility and increasing costs for customers. In contrast, AMD's MI450 GPU, priced at $30,000–$40,000 per unit, provides a modular, open ROCm ecosystem that delivers comparable compute capacity at a fraction of the cost. Renting 15 MI450 racks could achieve similar 42+ exaFLOPS inference compute at 5–10x lower cost than renting a single Ironwood pod, underscoring AMD's competitive edge in terms of total cost of ownership (TCO). The leaked information about Meta's potential TPU deployment in 2027, therefore, can be interpreted as a negotiating tactic rather than a definitive shift in strategy. By signaling interest in Google's solution, Meta may be attempting to pressure AMD into offering more favorable terms/prices for 5-10GW. This tactic aligns with Meta's broader goal to finance most of its AI spend internally while exploring partnerships that can reduce costs and enhance flexibility. The post's emphasis on MI450's TCO advantage and its partnerships with major players like OpenAI, Microsoft, and Meta itself suggests that AMD is a critical component of Meta's AI infrastructure strategy. The threat of working with Google's TPU could prompt AMD to reassess its pricing, provide additional support, or offer incentives to retain Meta as a customer, thereby securing or expanding its market share. From a logical standpoint, Meta stands to benefit the most from this strategy. As a major buyer in a high-stakes market projected to surpass $1 trillion in annual spending by 2030, Meta's negotiating power is significant. 
The leaked information could lead to substantial cost savings on its $66-72 billion investment, enhancing its financial flexibility and allowing for further investment in AI capabilities. Moreover, this tactic reinforces Meta's position as a leader in the AI infrastructure race, potentially attracting more external financing for its data center projects and strengthening its competitive stance against other hyperscalers like Amazon and Microsoft. AMD could also benefit from this scenario. The negotiation pressure might lead to small short-term concessions, but it could also solidify long-term partnerships with Meta, ensuring continued demand for MI450 and other AI hardware solutions. Meta's initial 42% allocation to AMD MI300X and its partnerships with Oracle, Dell, and HP indicate a deep integration of AMD's technology into Meta's infrastructure, which could be leveraged to maintain this relationship. For AMD, retaining Meta as a large key customer is crucial to capturing a larger share of the rapidly growing data center infrastructure market, driven by the insatiable demand for AI compute power. Google, on the other hand, faces a more limited benefit from this leaked information. While securing Meta as a customer would reinforce its position in the AI hardware market, the high cost and ecosystem lock-in of the Ironwood TPU might deter Meta from fully committing to this solution. The leaked information could prompt Google to reconsider its pricing or ecosystem strategy to remain competitive, but the immediate impact is likely to be minimal compared to the potential gains for Meta and AMD. Investors and market analysts also stand to benefit from this information, as it provides insights into the competitive dynamics of the AI hardware market. Adjustments in portfolios based on anticipated shifts in market share and profitability could lead to opportunities for those who correctly anticipate outcomes.
The negotiation dynamic might introduce volatility, but it also highlights the strategic importance of cost-effective solutions in the AI infrastructure space. Lastly, the leaked information about Meta potentially working with Google's TPU in 2027 is likely a strategic move to negotiate with AMD, leveraging the competitive landscape to optimize its AI infrastructure investment. Meta, as the primary negotiator, stands to gain the most by securing better terms from AMD, reducing costs, and enhancing its financial flexibility. AMD, while initially at risk, could benefit from retaining a key customer and solidifying its market position. Google faces limited immediate benefits but may need to adapt its strategy to remain competitive. This scenario underscores the complex interplay of technology, market dynamics, and strategic maneuvering in the AI hardware market, where cost-effectiveness and scalability are paramount. As the data center infrastructure market continues to grow, the outcomes of such negotiations will shape the future of AI development and deployment.

Mike

182,193 views • 8 months ago

Etched is deploying two new technologies in chip design: low-voltage inference and cluster-scale memory. CEO Gavin Uberti says they'll make their chips much more power-efficient and way, way faster than today's leading GPUs. He breaks it down: "We looked at a lot of early research directions, and we realized the key things that models need are way more compute and way faster memory." "If you think about inference, there are two key parts: prefill and decode. For prefill, it's a compute-bound problem. You need to have more FLOPS, more operations per second on each of your chips." "On our GPU, the bottleneck's actually thermals. You can't really run a GPU at more than around 50% of what it could theoretically do, or it'll melt." "So we're using a new technology today called low-voltage inference to try to solve this problem. You bring the voltage of the chip down dramatically, which allows us to have way, way better efficiency in terms of how much power is drawn per unit of math, and thus fit way way more flops onto the chip..." "For decode, it's all about bandwidth. Not just bandwidth on a chip, but bandwidth across your cluster. That's why we have this technology we call cluster-scale memory. It reduces the amount of time it takes to communicate from one chip to another dramatically." "As a result we can go use all of our HBM, HBM bandwidth, SRAM, SRAM bandwidth, and our scale-up domain as a single coherent pool. And that means if you're a user, you can go get much faster tokens per second, while still keeping your costs low."
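The prefill/decode split Uberti describes is commonly explained with a roofline model: a workload is compute-bound when its arithmetic intensity (FLOPs per byte read from memory) exceeds the accelerator's ratio of peak FLOP/s to memory bandwidth, and memory-bound otherwise. The Python sketch below is a simplified illustration, not Etched's method. It assumes a hypothetical accelerator with 1 PFLOP/s of compute and 4 TB/s of memory bandwidth, 8-bit weights, and roughly 2 FLOPs per weight per token, and it ignores attention and KV-cache traffic.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical accelerator; the numbers used below are illustrative only."""
    peak_flops: float     # peak compute, FLOP/s
    mem_bandwidth: float  # memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) at which the compute and bandwidth limits meet.
        return self.peak_flops / self.mem_bandwidth


def weight_arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    # Each weight is read from memory once per forward pass and takes part in
    # one multiply-add (~2 FLOPs) for every token processed in that pass.
    return 2.0 * tokens_per_pass / bytes_per_param


def regime(acc: Accelerator, tokens_per_pass: int, bytes_per_param: float = 1.0) -> str:
    ai = weight_arithmetic_intensity(tokens_per_pass, bytes_per_param)
    return "compute-bound" if ai >= acc.ridge_point else "memory-bound"


# Hypothetical chip: ridge point = 1e15 / 4e12 = 250 FLOP/byte.
acc = Accelerator(peak_flops=1e15, mem_bandwidth=4e12)

if __name__ == "__main__":
    print("prefill, 2048-token prompt:", regime(acc, 2048))
    print("decode, batch 1:", regime(acc, 1))
    print("decode, batch 64:", regime(acc, 64))
```

In this simplified model a prefill pass over a long prompt reuses every weight thousands of times and lands well above the ridge point, while decode processes one token per sequence per pass; batching raises decode intensity only linearly, so even batch 64 stays memory-bound. That is the regime where memory and interconnect bandwidth, rather than FLOPS, set tokens per second.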

TBPN

20,404 views • 1 month ago

$MU $SNDK $LITE $VRT NVIDIA and Groq: 2nd and 3rd Order Strategic Infrastructure Effects and Market Implications Public reporting indicates NVIDIA has agreed to acquire Groq for approximately $20 billion in cash, while excluding Groq’s nascent cloud business from the transaction perimeter. The reported carve-out materially constrains the immediate, direct linkage from the acquisition to incremental, NVIDIA-controlled data center capacity build-out because GroqCloud appears to be the principal channel through which Groq hardware is currently monetized at scale as a service. The infrastructure-market implications therefore depend primarily on post-close product strategy: whether NVIDIA (1) commercializes Groq silicon as a distinct inference product line and drives broad deployment through OEM/ODM channels and partners, (2) uses the acquisition mainly to absorb IP and talent while de-emphasizing standalone Groq hardware volumes, or (3) uses Groq technology to reshape NVIDIA’s own inference systems and networking roadmaps. The dominant transmission mechanism into memory, networking, and facility infrastructure markets is the degree to which NVIDIA shifts incremental inference deployments away from GPU architectures that are tightly coupled to external high-bandwidth memory (HBM) and toward Groq’s current architecture, which emphasizes large on-chip SRAM, deterministic compiler-scheduled execution, and direct chip-to-chip connectivity. Independent and company-published materials describe Groq’s current-generation approach as having no external memory, keeping weights and KV cache on-chip during processing, and requiring model sharding across multiple chips due to limited on-chip SRAM per device.
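The sharding requirement follows from simple capacity arithmetic: if weights must stay resident in on-chip SRAM, the minimum chip count is model size divided by SRAM per chip. The Python sketch below assumes roughly 230 MB of SRAM per chip (the figure commonly cited for first-generation GroqChip; treat it as an assumption), compares it with a generic 80 GB HBM accelerator, and uses a hypothetical 70-billion-parameter model. It ignores activations and any SRAM reserved for other uses unless passed in explicitly, so it is a capacity floor rather than a deployment plan; all names are illustrative.

```python
import math


def min_chips_for_weights(
    params: float,
    bytes_per_param: float,
    memory_bytes_per_chip: float,
    kv_cache_bytes: float = 0.0,
    usable_fraction: float = 1.0,
) -> int:
    """Lower bound on chips needed to keep weights (plus optional KV cache) resident.

    usable_fraction reserves part of each chip's memory for other data.
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    total_bytes = params * bytes_per_param + kv_cache_bytes
    return math.ceil(total_bytes / (memory_bytes_per_chip * usable_fraction))


SRAM_PER_CHIP = 230e6  # ~230 MB on-chip SRAM (assumed first-gen GroqChip figure)
HBM_PER_GPU = 80e9     # 80 GB HBM, a common GPU capacity, for comparison
MODEL_PARAMS = 70e9    # hypothetical 70B-parameter model

if __name__ == "__main__":
    for label, bpp in (("8-bit", 1.0), ("16-bit", 2.0)):
        sram = min_chips_for_weights(MODEL_PARAMS, bpp, SRAM_PER_CHIP)
        hbm = min_chips_for_weights(MODEL_PARAMS, bpp, HBM_PER_GPU)
        print(f"{label} weights: >= {sram} SRAM-only chips vs >= {hbm} HBM accelerators")
```

Passing a realistic `kv_cache_bytes` or a `usable_fraction` below 1.0 only raises the count, so the printed figures are best read as floors.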
That architectural choice is directionally HBM-negative on a per-accelerator basis and ambiguous for DRAM, NAND, networking, power, and cooling on a per-token basis because the design can reduce memory wall losses and tail-latency overhead while potentially increasing the number of chips and interconnect endpoints required to serve large models and long-context workloads. HBM implications are the most mechanically straightforward but should be framed as second-derivative rather than absolute. If Groq-class inference silicon meaningfully displaces NVIDIA GPU-based inference deployments, incremental HBM bit demand tied to inference growth could be reduced relative to a GPU-only baseline because Groq’s current approach does not appear to attach HBM stacks to each accelerator. However, current market structure suggests HBM remains supply-constrained and is being pulled by multiple vectors including continued GPU training scale and high-capacity inference configurations, with leading suppliers signaling tight conditions extending beyond 2026. In that environment, reduced inference-driven HBM intensity could primarily reallocate scarce HBM supply toward higher-end training and premium inference GPUs rather than creating an outright volume collapse, preserving high utilization of HBM capacity while potentially affecting the slope of pricing power and capacity expansion urgency over a multi-year horizon. The key downside scenario for the HBM complex would be a durable architectural bifurcation where “good-enough” inference shifts disproportionately to HBM-less ASICs across a broad swath of deployments (latency-sensitive, batch-1, cost-per-token optimized), while training remains GPU-HBM dominated; such a split would reduce the portion of future inference compute that naturally monetizes through HBM content and could compress the incremental HBM-per-AI-dollar ratio. 
The key upside/neutral scenario for HBM is that the supply chain remains fully allocated regardless, with NVIDIA using any “freed” HBM to ship more high-end GPUs into training and long-context inference, especially as roadmaps increase HBM per GPU, sustaining robust aggregate bit demand even if inference becomes more heterogeneous. Conventional DRAM implications split into two channels: (1) DRAM wafer capacity diversion into HBM and (2) DDR content per server in AI clusters. Supplier commentary indicates that AI-driven memory demand is supporting elevated DRAM markets more broadly, and HBM production is resource-intensive versus conventional DRAM, tightening supply for DDR products in parallel. A meaningful NVIDIA pivot to an inference architecture that reduces HBM dependence could, at the margin, ease the most acute HBM-driven bottlenecks and allow memory manufacturers more flexibility in balancing DRAM mix, which could be modestly DDR-positive on the supply side (less crowding-out) even if it is DDR-neutral or slightly negative on the demand side (if per-node CPU/DDR requirements decline due to more efficient accelerator utilization). The dominant practical outcome is likely that DDR demand remains supported by broad AI server proliferation and increasing memory footprints at the system level (CPUs, networking stacks, caching layers, retrieval-augmented pipelines), while HBM remains the premium profit pool; therefore, any HBM displacement that increases total server volumes could indirectly keep DDR demand resilient even if DDR per accelerator is not rising materially. NAND flash implications are comparatively indirect and volume-driven rather than architecture-driven. Inference clusters require SSD capacity for model storage, container images, logging, and increasingly for fast local retrieval indices and embedding stores, but the storage footprint per unit of compute is typically smaller than in training pipelines that stage large datasets and checkpoints.
If NVIDIA uses Groq to lower inference cost and latency enough to expand the total number of inference deployment locations (regional colocation, enterprise on-prem, sovereign footprints), aggregate SSD attach could rise through geographic fragmentation and replication of model artifacts across more sites, even if per-site storage is modest. The NAND effect is therefore likely to be demand-broadening and mix-positive (datacenter SSDs) but not a primary swing factor versus the macro AI capex cycle and consumer/device cycles. Hard disk drive (HDD) markets should see negligible direct sensitivity because nearline HDD demand is driven by bulk storage and cloud archiving economics, while inference acceleration choices primarily reshape compute and network layers; any HDD benefit would be a tertiary function of overall data center square footage expansion rather than a direct consequence of Groq silicon displacing GPUs. Optical networking implications require separating (1) intra-cluster back-end fabrics that connect accelerators and (2) front-end / data center interconnect (DCI) that connects sites and regions. Groq’s own positioning and third-party reporting suggest scaling beyond a single node or rack relies on high-bandwidth fabrics and, in some described configurations, optical interconnect scaling across hundreds of chips. If NVIDIA commercializes Groq at scale, two offsetting forces emerge: lower cost-per-token and improved latency could expand inference throughput and drive more east-west traffic, increasing demand for high-speed switching and optics; conversely, if Groq delivers materially higher utilization and tokens per unit of network bandwidth for certain workloads, the network required per served token could decline. Public NVIDIA materials already indicate an aggressive photonics roadmap aimed at scaling AI factories, including co-packaged optics (CPO) switches and explicit collaboration with Coherent and Lumentum in the silicon photonics supply chain.
That linkage is important because it suggests that, independent of Groq, NVIDIA is already pushing optics integration deeper into the switch package to reduce power and increase resiliency; Groq increases the strategic incentive to reduce network power and latency if inference becomes even more distributed and latency-sensitive. For Lumentum and Coherent specifically, the net implication is less about “more optics versus fewer optics” and more about a shift in optics form factor and value capture. Co-packaged optics can reduce reliance on pluggable transceivers in some switch architectures while increasing demand for integrated photonic engines, lasers, fiber attach, packaging processes, and component-level supply. NVIDIA’s own announcements explicitly position Coherent and Lumentum as collaborators in creating the integrated silicon/optics process and supply chain for photonics switches. If Groq accelerates the transition to very large-scale fabrics (more endpoints, higher port speeds, tighter power envelopes), that tends to pull forward CPO adoption and amplifies demand for the underlying photonics components even if the conventional pluggable module TAM is structurally pressured over time. If Groq instead pushes inference toward smaller, more localized pods (closer to users, more regional colocation), that can be optics-positive for DCI and metro connectivity because more sites must be interconnected at high bandwidth with low latency, favoring coherent optics and high-speed interconnect between facilities. The principal risk for optics suppliers is timing and margin structure: a faster move to NVIDIA-driven integrated photonics could concentrate bargaining power and compress margins for commoditized transceiver modules while favoring suppliers with differentiated lasers, integration capability, and qualification depth in NVIDIA’s CPO ecosystem. 
AEC and copper interconnect implications hinge on whether Groq deployment increases the density of short-reach links inside racks and rows. High-speed copper remains structurally advantaged at very short distances on cost, power, and serviceability, but reaches become constrained as lane speeds and aggregate bandwidth rise, creating a role for active electrical cables (AECs), retimers, and signal-conditioning silicon. Credo explicitly positions its AEC products as enabling reliable lossless 800G connectivity for AI clusters, and the company has highlighted participation at NVIDIA GTC with content focused on extending PCIe/CXL using AECs, indicating relevance to next-generation system topologies that require longer reach and higher signal integrity than passive copper can deliver. If NVIDIA turns Groq into a widely deployed inference card or chassis product, the likely near-term effect is AEC-positive because (1) more inference throughput tends to increase top-of-rack connectivity requirements, (2) distributing inference across more racks and sites increases short-reach links per unit of delivered service, and (3) PCIe-attached accelerator architectures tend to require robust signal conditioning as systems move to PCIe 6.x and beyond. Groq workshop materials explicitly reference GroqCard and GroqNode form factors, reinforcing that PCIe-attached deployment has been central to Groq’s current packaging strategy. The main countervailing risk is that Groq’s deterministic chip-to-chip fabric could be implemented primarily through backplanes and direct board-level connectivity that reduces the need for merchant AECs inside the box; in that case, incremental AEC demand would concentrate more in rack-to-switch and node-to-fabric links rather than within-chassis chip fabrics. Astera Labs implications are connectivity-architecture sensitive and, on balance, skew positive if NVIDIA increases heterogeneity and disaggregation in AI systems. 
NVIDIA has publicly positioned NVLink Fusion as a pathway for partners to build semi-custom AI infrastructure and has explicitly identified Astera Labs as a partner in that ecosystem, with Astera describing NVLink-related solutions expanding its connectivity platform across PCIe, CXL, and Ethernet plus fleet observability software. A Groq acquisition increases the probability that NVIDIA offers a broader menu of accelerators (training GPUs, inference-focused ASICs) and therefore increases the importance of scalable, high-reliability connectivity, retiming, switching, and telemetry across mixed topologies. If Groq silicon remains PCIe-attached in many deployments, PCIe 6.x retimers/switches and active cable modules become more central, aligning with Astera’s core portfolio. If NVIDIA instead integrates Groq concepts into scale-up fabrics (NVLink-like domains) or uses Groq to expand into inference “appliances” that must be rapidly deployed in colocation environments, the need for standard-compliant, serviceable connectivity with strong RAS/telemetry increases, again aligning with Astera’s positioning. Power equipment and cooling implications for Vertiv and adjacent suppliers should be viewed through the lens of rack power density, cooling modality (air vs liquid), and site deployment model (hyperscale campuses vs distributed colocation/enterprise). Groq claims its LPU and rack designs are “air-cooled by design” and require no complex cooling and power infrastructure, and third-party reporting has described Groq’s approach as relying on parallelism across many lower-power units rather than extreme per-chip performance. If NVIDIA scales Groq as a mainstream inference platform, the mix of data center cooling spend could shift modestly away from the highest-density liquid-cooled racks toward more air-cooled or hybrid deployments, particularly for inference pods placed in existing facilities that cannot easily retrofit for very high rack heat flux. 
That would be a mix headwind for suppliers most levered exclusively to high-end liquid cooling attachments per rack, but it is not necessarily a volume headwind for Vertiv given the company’s broad exposure to both power and cooling infrastructure and the likelihood that total AI deployment locations expand. Vertiv’s own industry commentary emphasizes that AI racks require higher power-density UPS, batteries, power distribution equipment, and switchgear capable of handling rapid load transients, and that hybrid cooling systems will evolve across deployment environments. Those statements align with a world where inference growth increases the count of powered racks and raises the operational complexity of power delivery even if per-rack density is lower than the most extreme training clusters. The most material infrastructure impact may occur outside the rack and upstream of the data hall: grid interconnects, substations, transformers, switchgear, generators, and utility-scale generation additions. Recent regulatory actions in the U.S. highlight that projected data center demand is already driving large planned increases in electricity generation capacity, underscoring that power availability is a binding constraint. In that context, an inference architecture that lowers joules per token could reduce the power required per unit of inference delivered, but it can also accelerate demand by lowering cost and improving latency, increasing the total volume of inference served (a classic rebound effect). The net outcome is likely continued, elevated demand for power infrastructure even if efficiency improves, with the key swing factor being whether AI capex remains on a multi-year growth trajectory or enters a digestion phase. Other data center infrastructure implications include server/ODM mix, facility design standardization, and networking architecture choices. 
If NVIDIA positions Groq-based inference as a broadly distributable “standard server + accelerator” solution rather than as an integrated, liquid-cooled rack like GB200 NVL72, spend could shift toward more conventional air-cooled server designs, higher unit volumes of mainstream racks, and faster deployment in colocation footprints, increasing demand for modular power rooms, busways, and rapidly deployable cooling solutions. If NVIDIA instead integrates Groq into its “AI factory” paradigm, the primary effect is likely acceleration of dense back-end fabric build-outs and a faster push toward photonics switching, increasing demand for fiber plant, connectors, and integrated optics supply chains while potentially compressing the lifecycle of transitional architectures based on pluggable optics and mid-reach copper. NVIDIA’s stated roadmap toward co-packaged optics and silicon photonics switches is already oriented toward scaling to very large GPU counts; adding a high-end inference ASIC increases the strategic importance of power-efficient, low-latency fabrics because inference economics become increasingly sensitive to network overhead as compute cost declines. Across the covered segments, the most defensible base case is limited near-term dislocation and a medium-term increase in uncertainty around memory intensity per unit of inference growth. HBM faces the clearest relative risk from an HBM-less inference platform, but supply tightness and GPU training roadmaps reduce the probability of an absolute demand shock over the next 12–24 months. Optical, AEC/copper, and power/cooling are more likely to remain volume-supported because they scale with endpoint count, deployment fragmentation, and total data center footprint, and those tend to rise when inference becomes cheaper and more widely deployed. 
The highest-conviction second-order effect is a shift in infrastructure mix: incrementally more distributed inference deployments (favoring colocation power/cooling standardization, DCI optics, and serviceable short-reach interconnect) and a gradual migration from pluggable optics toward integrated photonics in back-end fabrics (favoring suppliers positioned in the CPO ecosystem).
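The rebound effect noted in the power-infrastructure discussion can be made concrete with a constant-elasticity demand model. The Python sketch below assumes, purely for illustration, that cost per token falls in proportion to joules per token and that inference demand scales as price^(-elasticity); the elasticity values are hypothetical, not estimates for any real market.

```python
def total_energy_ratio(efficiency_gain: float, price_elasticity: float) -> float:
    """New / old total inference energy after joules-per-token falls by `efficiency_gain`x.

    Assumes price per token falls by the same factor and demand follows
    Q proportional to price ** (-price_elasticity).
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    demand_ratio = efficiency_gain ** price_elasticity  # more tokens served at a lower price
    energy_per_token_ratio = 1.0 / efficiency_gain      # fewer joules per token
    return demand_ratio * energy_per_token_ratio


if __name__ == "__main__":
    for elasticity in (0.5, 1.0, 1.5):
        ratio = total_energy_ratio(2.0, elasticity)
        print(f"2x efficiency, elasticity {elasticity}: total energy x{ratio:.2f}")
```

In closed form the ratio is g^(e-1) for efficiency gain g and elasticity e: above an elasticity of 1, a more efficient inference platform increases total power draw; below 1, total energy still falls, but by less than the per-token gain. This is why efficiency gains alone need not ease the power-infrastructure constraint.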

That would be a mix headwind for suppliers most levered exclusively to high-end liquid cooling attachments per rack, but it is not necessarily a volume headwind for Vertiv given the company’s broad exposure to both power and cooling infrastructure and the likelihood that total AI deployment locations expand. Vertiv’s own industry commentary emphasizes that AI racks require higher power-density UPS, batteries, power distribution equipment, and switchgear capable of handling rapid load transients, and that hybrid cooling systems will evolve across deployment environments. Those statements align with a world where inference growth increases the count of powered racks and raises the operational complexity of power delivery even if per-rack density is lower than the most extreme training clusters. The most material infrastructure impact may occur outside the rack and upstream of the data hall: grid interconnects, substations, transformers, switchgear, generators, and utility-scale generation additions. Recent regulatory actions in the U.S. highlight that projected data center demand is already driving large planned increases in electricity generation capacity, underscoring that power availability is a binding constraint. In that context, an inference architecture that lowers joules per token could reduce the power required per unit of inference delivered, but it can also accelerate demand by lowering cost and improving latency, increasing the total volume of inference served (a classic rebound effect). The net outcome is likely continued, elevated demand for power infrastructure even if efficiency improves, with the key swing factor being whether AI capex remains on a multi-year growth trajectory or enters a digestion phase. Other data center infrastructure implications include server/ODM mix, facility design standardization, and networking architecture choices. 
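The rebound argument in the paragraph above reduces to one multiplication: total energy equals tokens served times joules per token. This is a minimal sketch with hypothetical numbers. Neither the 2x efficiency gain nor the 3x volume increase comes from the post. It shows how demand growth can outrun an efficiency gain:

```python
def total_inference_energy_j(tokens_served: float, joules_per_token: float) -> float:
    """Energy for a period: volume of inference served times energy per token."""
    return tokens_served * joules_per_token


# Hypothetical baseline (not from the post): 1e12 tokens served at 0.5 J/token.
baseline = total_inference_energy_j(1e12, 0.5)

# Hypothetical scenario: efficiency doubles (J/token halves), but cheaper,
# faster inference triples the number of tokens served.
after = total_inference_energy_j(3e12, 0.25)

# A ratio above 1 means total power demand still grows despite the efficiency gain.
print(f"energy multiple: {after / baseline:.2f}x")
```

Total energy falls only if inference volume grows by less than the efficiency gain (here, less than 2x). That is why a lower joules-per-token architecture can coexist with rising demand for power infrastructure.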
If NVIDIA positions Groq-based inference as a broadly distributable “standard server + accelerator” solution rather than as an integrated, liquid-cooled rack like GB200 NVL72, spend could shift toward more conventional air-cooled server designs, higher unit volumes of mainstream racks, and faster deployment in colocation footprints, increasing demand for modular power rooms, busways, and rapidly deployable cooling solutions. If NVIDIA instead integrates Groq into its “AI factory” paradigm, the primary effect is likely acceleration of dense back-end fabric build-outs and a faster push toward photonics switching, increasing demand for fiber plant, connectors, and integrated optics supply chains while potentially compressing the lifecycle of transitional architectures based on pluggable optics and mid-reach copper. NVIDIA’s stated roadmap toward co-packaged optics and silicon photonics switches is already oriented toward scaling to very large GPU counts; adding a high-end inference ASIC increases the strategic importance of power-efficient, low-latency fabrics because inference economics become increasingly sensitive to network overhead as compute cost declines. Across the covered segments, the most defensible base case is limited near-term dislocation and a medium-term increase in uncertainty around memory intensity per unit of inference growth. HBM faces the clearest relative risk from an HBM-less inference platform, but supply tightness and GPU training roadmaps reduce the probability of an absolute demand shock over the next 12–24 months. Optical, AEC/copper, and power/cooling are more likely to remain volume-supported because they scale with endpoint count, deployment fragmentation, and total data center footprint, and those tend to rise when inference becomes cheaper and more widely deployed. 
The highest-conviction second-order effect is a shift in infrastructure mix: incrementally more distributed inference deployments (favoring colocation power/cooling standardization, DCI optics, and serviceable short-reach interconnect) and a gradual migration from pluggable optics toward integrated photonics in back-end fabrics (favoring suppliers positioned in the CPO ecosystem).

TheValueist

76,170 views • 7 months ago

Morgan Stanley just raised their 2027 AI capex forecast to $1.1 trillion, and that number still doesn't include SpaceX or many of the other AI companies (Save this). When you factor those in, the real 2027 figure is probably closer to $1.5 trillion, while combined AI lab inference revenue is tracking toward $300 billion in 2027. On its surface that ratio sounds alarming: spending $1.5 trillion in capex to generate $300 billion in revenue. But the framing collapses the moment you examine two things the bears consistently ignore: gross margins and the revenue trajectory. Gross margins on inference revenue are running at 60 to 70 percent. That means the $300 billion in inference revenue generates $180 to $210 billion in gross profit, and that number compounds rapidly as utilization scales on infrastructure that is already built and paid for. The capex is not being deployed against today's revenue but against a revenue trajectory that has shown no signs of decelerating. To understand how aggressive that trajectory actually is, consider that Morgan Stanley's $1.1 trillion hyperscaler forecast is nearly double what analysts projected for the same year just twelve months ago. They also described the demand as inelastic, meaning it is not slowing down regardless of rising costs, tighter financing conditions, or geopolitical risk. The AI industry ended 2025 tracking well over $200 billion in combined inference revenue, and the growth rate since then has continued to accelerate rather than flatten. Anthropic alone scaled from negligible revenue to a $30 billion annualized run rate in approximately 18 months, while OpenAI is tracking toward $280 billion in annual revenue by 2030, up from $13 billion in 2025. There is also a structural reality in the capex number that the bears never account for. Roughly 35 percent of total AI spending goes toward training, building the next model generation, which does not generate revenue in the current period.
That means only about 65 percent of the $1.5 trillion in capex is actually deployed against the inference infrastructure that earns revenue today. When you apply the 60 to 70 percent gross margin to the revenue that sits on top of that 65 percent figure, the economics look substantially better than the headline capex-to-revenue ratio implies. Every CEO closest to this buildout has consistently underestimated it: Jensen Huang projected $1 trillion in AI capex two years ago and was called delusional. Dario Amodei said in early 2026 that AI revenues would reach the low hundreds of billions by 2028 and trillions before 2030. Given where Anthropic's own revenue trajectory is today, he is likely revising those numbers upward. The pattern here is consistent: every time someone models the revenue ceiling, the actual number breaks through it faster than expected. Come join Milk Road Pro for our full breakdown. It covers the real unit economics of the AI inference buildout, how the capex-to-revenue ratio evolves over the next three years, and our entire AI thesis. Link below!
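The arithmetic in this post can be made explicit. This is a minimal sketch that uses only the post's own inputs: $1.5 trillion of capex, a 35% training share, $300 billion of inference revenue and a 60 to 70% gross margin. It treats one year's capex and one year's revenue as directly comparable. That is a simplification, because capex is depreciated over several years:

```python
def capex_coverage(capex_usd: float, training_share: float,
                   inference_revenue_usd: float, gross_margin: float) -> dict:
    """Compare AI capex with inference revenue, netting out the training share."""
    inference_capex = capex_usd * (1 - training_share)
    return {
        "headline_capex_to_revenue": capex_usd / inference_revenue_usd,
        "inference_capex": inference_capex,
        "inference_capex_to_revenue": inference_capex / inference_revenue_usd,
        "gross_profit": inference_revenue_usd * gross_margin,
    }


# Inputs as stated in the post: ~$1.5T capex, ~35% training, $300B revenue.
for margin in (0.60, 0.70):
    r = capex_coverage(1.5e12, 0.35, 300e9, margin)
    print(
        f"margin {margin:.0%}: headline {r['headline_capex_to_revenue']:.2f}x, "
        f"inference capex/revenue {r['inference_capex_to_revenue']:.2f}x, "
        f"gross profit ${r['gross_profit'] / 1e9:.0f}B"
    )
```

With these inputs the headline ratio is 5.0x. Stripping out training leaves about $975 billion of inference capex, or roughly 3.25x revenue, against $180 to $210 billion of gross profit.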

Milk Road AI

21,141 views • 1 month ago

$AMD Valuation at $70-$100B Revenue in 2026🧵 As of December 4, 2025, AMD's stock trades at approximately $220, with a market cap of $355 billion. Revised Valuation with $70B Revenue. Earnings Per Share (EPS): Assuming a 40% operating margin (consistent with historical trends, probably higher), $70 billion in revenue translates to $28 billion in operating income. After taxes and interest, net income could be $20 billion, or $12.50 EPS. Forward P/E: At 50x-60x (a premium due to growth), the stock price could reach $625-$750. EV/EBITDA: With $28 billion EBITDA at 40x, EV is $1.12 trillion. Subtracting $5 billion net debt, equity value is $1.115 trillion, or $697 per share. Revised Valuation with $100B Revenue. EPS: $100 billion revenue at a 45% margin yields $45 billion operating income, $35 billion net income, or $22 EPS. Forward P/E: At 50x-70x, the stock price could reach $1,100-$1,540. EV/EBITDA: $40 billion EBITDA at 45x EV/EBITDA yields $1.8 trillion EV. Subtracting $5 billion net debt, equity value is $1.795 trillion, or $1,122 per share. The market's willingness to assign a high P/E multiple to AMD will rest on the expectation that these partnerships translate into substantial revenue and earnings growth. The P/E ratio for the semiconductor industry is approximately 58.57, a significant increase from previous years because of AI capex growth, and we are only in the 2nd year of a 10-year cycle. Hence, if $AMD grows to $70B-$100B revenue in 2026, a 50x-70x P/E is justified. AMD's existing partnerships with OpenAI, $META, $MSFT, $AMZN, $GOOGL, $DELL, $HPE, $SMCI, xAI, Oracle and Vultr form a solid foundation for revenue growth. So do its new collaborations with more than 40 countries internationally, such as Saudi Arabia and the UAE. The OpenAI deal alone could contribute $25 billion to $28 billion (2026). Meta's expanded allocation, Oracle's increased orders and the rest add substantial upside of 1M+ GPUs (FY2026).
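The valuation steps above chain a few multiplications, so they can be recomputed directly. This is a minimal sketch. The share count is an explicit assumption, because the post never states it. The net debt and multiples are the post's own inputs:

```python
# Assumption: ~1.6B diluted shares, implied by the post's own figures
# ($20B net income / $12.50 EPS); $355B / $220 gives a similar ~1.61B.
SHARES = 1_600_000_000
NET_DEBT = 5_000_000_000  # net debt figure used in the post


def pe_price_range(net_income: float, pe_low: float, pe_high: float,
                   shares: int = SHARES) -> tuple:
    """Return (EPS, low price, high price) for a forward P/E range."""
    eps = net_income / shares
    return eps, eps * pe_low, eps * pe_high


def ev_ebitda_price(ebitda: float, multiple: float,
                    net_debt: float = NET_DEBT, shares: int = SHARES) -> float:
    """Per-share equity value: (EBITDA x multiple - net debt) / shares."""
    enterprise_value = ebitda * multiple
    return (enterprise_value - net_debt) / shares


# $70B revenue scenario: $20B net income, 50x-60x P/E; $28B EBITDA at 40x.
print(pe_price_range(20_000_000_000, 50, 60))
print(ev_ebitda_price(28_000_000_000, 40))
# $100B revenue scenario: $35B net income, 50x-70x P/E; $40B EBITDA at 45x.
print(pe_price_range(35_000_000_000, 50, 70))
print(ev_ebitda_price(40_000_000_000, 45))
```

The $70B case yields $12.50 EPS and $625 to $750 per share at 50x to 60x. Unrounded, the $100B case gives EPS of about $21.88 and a range of roughly $1,094 to $1,531. The post's $1,100 to $1,540 comes from rounding EPS to $22 first. The EV/EBITDA outputs (about $697 and $1,122 per share) match the post.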
Technological Leadership: The MI450 GPU, with its superior inference and training capabilities, positions AMD to disrupt Nvidia's market dominance. Benchmarks show 1.5-2x performance advantages at 35-50% lower TCO, making it an attractive choice for hyperscalers. The ROCm platform's maturity, supporting day-zero integration for major AI models, closes the software gap with CUDA, enhancing AMD's competitiveness. In conclusion, AMD's combination of strategic partnerships, technological leadership and favorable market dynamics positions it to achieve $70 billion to $100 billion in revenue by 2026. This growth is not merely aspirational but grounded in real demand signals and execution capabilities. While risks remain, the upside potential is significant. That makes AMD the best AI name in this AI supercycle, trading at an extremely cheap valuation. Not Financial Advice!

Mike

187,491 views • 7 months ago

Nebius will be a trillion-dollar company (Save this). The neocloud market (purpose-built AI cloud infrastructure, separate from legacy hyperscalers) generated roughly $25 billion in revenue in 2025, up 223% year over year. Synergy Research projects it will approach $400 billion by 2031, compounding at 58% annually. That is one of the fastest sustained growth rates ever recorded for an infrastructure category of this scale. The CEO's explanation for why they win is worth understanding in detail. GPU compute is scarce; that part everyone knows. But Nebius is not simply renting GPUs by the hour and marking them up, which is what most neocloud imitators do. They have built their own physical capacity for inference and optimized the full technology stack from the software layer all the way down to the rack hardware. They also recently acquired a company called Agen specifically to push inference latency even lower and throughput even higher. The CEO frames the core problem directly: in 2026, every product you build is powered by tokens (AI intelligence). You can get those tokens from OpenAI or Anthropic via a simple API call. But the moment you want to run open-source models, specialized vertical models, or anything other than the two dominant frontier labs, you run into a wall. You can download the weights from Hugging Face and assemble the pieces. But getting those workloads to run at scale, at the economics you need, with the reliability your product requires, is an extraordinarily complex engineering challenge that most companies cannot staff or afford to solve in-house. That is the problem Nebius is solving, and that is why their inference product, Token Factory, exists. The financial results are among the most dramatic growth numbers reported by any public company this year. In Q1 2026, Nebius posted $399 million in revenue, a 684% increase from the same quarter a year earlier.
In the span of twelve months, the company swung from a $104 million net loss to $621 million in net income. Cash from operations went from negative $184 million to positive $2.26 billion in the same period. This is not growth funded by burning investor capital; it is growth that now generates its own fuel. For the full year 2026, Nebius is guiding for an annualized revenue run rate of $7 billion to $9 billion, with pipeline creation tracking to surpass $4 billion. The contracted backlog sits at $49 billion. It is anchored by a $27 billion agreement with Meta and a deal worth up to $19.4 billion with Microsoft, and Jensen Huang endorsed Nebius publicly at NVIDIA's GTC conference in 2026. The current market cap is approximately $56 billion. Consider a company with $7 to $9 billion in annualized revenue, growing at 684%, turning cash-flow positive and sitting on $49 billion in contracted backlog. If it operates in a market compounding at 58% annually toward $400 billion, it has a credible path to 20x from its current valuation if execution holds. That is the trillion-dollar case. It does not require any heroic assumptions; it only requires Nebius to keep doing what it is already demonstrably doing. Milk Road Pro called this one early. Our analysts added Nebius to the portfolio when it was still flying under the radar, and we are sitting on a massive gain on that position right now. If you want to see what else we are building conviction on before the rest of the market catches up, come join us at Milk Road Pro using the link below!
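The growth figures in this post can be cross-checked with two standard formulas. The first is compound annual growth rate, (end / start)^(1/years) - 1. The second backs a prior value out of a percentage increase. The inputs below are the post's own numbers. The code only tests their internal consistency, not whether they are accurate:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def prior_period_value(current_value: float, growth_pct: float) -> float:
    """Back out the year-ago value from a reported percentage increase."""
    return current_value / (1 + growth_pct / 100)


# Figures as quoted in the post.
market_cagr = cagr(25e9, 400e9, years=2031 - 2025)
q1_2025_revenue = prior_period_value(399e6, 684)
implied_market_cap = 56e9 * 20  # the "20x from its current valuation" case

print(f"implied neocloud CAGR 2025-2031: {market_cagr:.1%}")
print(f"implied Q1 2025 revenue: ${q1_2025_revenue / 1e6:.1f}M")
print(f"20x the ~$56B market cap: ${implied_market_cap / 1e12:.2f}T")
```

Growing from $25 billion to $400 billion over six years implies about 58.7% per year, consistent with the ~58% cited. A 684% increase to $399 million implies a year-ago quarter of roughly $51 million. Twenty times a $56 billion market cap is about $1.12 trillion.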

Milk Road AI

28,622 views • 2 months ago

$NVDA $GFS NVIDIA’s reported agreement to acquire Groq for $20B in cash (per CNBC, amplified via Reuters and other wire coverage) represents a materially different strategic posture than NVIDIA’s prior M&A pattern, given both the headline size (largest reported NVIDIA acquisition to date) and the unusual carve-out that Groq’s early-stage cloud business would not be included. Public reporting indicates the information originated from Alex Davis, CEO of Disruptive (lead investor in Groq’s latest financing), and that neither NVIDIA nor Groq had issued an immediate confirmation at the time of publication. The same reporting frames the transaction as coming together quickly, only months after Groq raised $750M at a ~$6.9B valuation, and highlights Groq’s positioning as a high-performance inference chip vendor founded by ex-Google TPU engineers. Groq is best understood as a vertically integrated inference acceleration company whose core asset is an application-specific processor optimized for deterministic, low-latency execution of transformer-style workloads, paired with a compiler-led software stack and a distribution layer (GroqCloud) designed to reduce developer friction via OpenAI-compatible APIs and integrations. Groq brands its architecture as a Language Processing Unit (LPU) and consistently emphasizes that the design target is inference, not training. The company’s own architecture description centers on 1-core execution, large on-chip SRAM used as primary storage (explicitly not cache), a custom compiler that statically schedules compute and communication, and direct chip-to-chip connectivity intended to coordinate multi-chip execution without relying on conventional caching hierarchies or dynamic runtime scheduling. The technical premise is a deliberate inversion of the conventional GPU approach. 
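The idea of a compiler that statically schedules compute and communication can be illustrated with a toy as-soon-as-possible (ASAP) scheduler. Every operation, including a chip-to-chip transfer, is assumed to have a fixed latency known at compile time. Start cycles can therefore be computed once and are identical on every run. This is a generic textbook technique, not Groq's compiler, and the op names and cycle latencies are hypothetical. Unlike a real compiler, the sketch ignores contention for functional units and links:

```python
from graphlib import TopologicalSorter


def asap_schedule(ops: dict) -> dict:
    """Compute a static schedule for a dataflow graph.

    ops maps an op name to (latency_in_cycles, list_of_predecessor_names).
    Returns op name -> (start_cycle, end_cycle). Because every latency is
    fixed and known ahead of time, the schedule is fully determined before
    anything executes.
    """
    graph = {name: set(preds) for name, (_, preds) in ops.items()}
    schedule = {}
    for name in TopologicalSorter(graph).static_order():
        latency, preds = ops[name]
        # An op starts as soon as its last input is available.
        start = max((schedule[p][1] for p in preds), default=0)
        schedule[name] = (start, start + latency)
    return schedule


# Hypothetical two-chip graph: chip 0 computes, then ships activations to chip 1.
ops = {
    "load_act_chip0": (1, []),
    "matmul_chip0": (4, ["load_act_chip0"]),
    "send_chip0_to_chip1": (2, ["matmul_chip0"]),
    "load_w_chip1": (3, []),
    "matmul_chip1": (4, ["send_chip0_to_chip1", "load_w_chip1"]),
    "softmax_chip1": (3, ["matmul_chip1"]),
}

schedule = asap_schedule(ops)
for name, (start, end) in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} cycles {start:>2}-{end:>2}")
```

Because the transfer is scheduled like any other op, the compiler knows exactly when chip 1 can start its matmul (cycle 7). Chip 1's weight load overlaps with chip 0's work, and no runtime arbitration is needed.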
GPUs deliver throughput via massively parallel, multi-core execution with dynamic scheduling, complex memory hierarchies, and heavy reliance on off-chip HBM bandwidth and sophisticated runtime/kernel optimization. Groq instead argues that inference bottlenecks are driven by latency variance (tail latency), synchronization overhead, and memory access unpredictability inherent in dynamically scheduled, cache-heavy architectures, particularly when workloads are latency sensitive and batch sizes cannot be inflated. Groq’s solution is to move “control” into the compiler: the full execution graph and inter-chip communication schedule are computed ahead of time down to clock-cycle granularity, with deterministic execution designed to reduce run-to-run variance. In Groq’s framing, the removal of caches, reorder buffers, speculative execution overhead, and other sources of contention enables predictable latency and high utilization without per-model kernel engineering typical of GPU tuning cycles. A critical nuance is that Groq’s determinism is not merely a software claim; it is tightly coupled to architectural constraints and system design choices that trade flexibility for predictability. Third-party technical commentary indicates Groq’s chip uses a fully deterministic VLIW-style approach with minimal buffering, no external memory, and heavy dependence on sharding models across many chips because on-chip SRAM capacity is limited. SemiAnalysis describes a ~725 mm² die on GlobalFoundries 14nm with ~230MB of SRAM and notes that “no useful models” fit on a single chip, forcing multi-chip partitioning for modern LLMs and driving a system-level design where networking and compilation are first-class scheduling problems rather than ancillary infrastructure. This is consistent with Groq’s own messaging that tensor parallelism across chips is a primary design goal, enabled by large on-chip SRAM and compile-time coordination of compute plus interconnect.
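The point that no useful models fit on a single chip follows from simple capacity arithmetic. This is a minimal sketch with four assumptions. It reads the ~230MB SRAM figure quoted above as 230,000,000 bytes and treats the model as exactly 70 billion parameters. It uses 2 bytes per parameter for BF16 and 1 byte for FP8. It counts weights only, so the result is a lower bound:

```python
def min_chips_for_weights(num_params: int, bytes_per_param: int,
                          sram_bytes_per_chip: int = 230_000_000) -> int:
    """Lower bound on chips needed just to hold the weights in on-chip SRAM.

    Ignores KV cache, activations, and any SRAM reserved for other uses, so
    real deployments need more chips than this.
    """
    total_bytes = num_params * bytes_per_param
    return -(-total_bytes // sram_bytes_per_chip)  # ceiling division on ints


LLAMA2_70B = 70_000_000_000  # assumption: treated as exactly 70B parameters
for label, nbytes in (("BF16", 2), ("FP8", 1)):
    print(label, min_chips_for_weights(LLAMA2_70B, nbytes))
```

Under these assumptions a 70B-parameter model needs at least 609 chips at BF16, or 305 at FP8, before any KV cache is stored. This is why partitioning, interconnect and compile-time scheduling dominate the system design.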
The on-chip SRAM emphasis is central to Groq’s latency story and also its most constraining trade-off. Groq claims on-chip SRAM bandwidth “upwards of 80 TB/s” and contrasts that with off-chip HBM bandwidth “about 8 TB/s,” asserting a potential 10x advantage from bandwidth plus reduced trips across chip-to-memory boundaries. While these comparisons are marketing-oriented and depend on workload specifics, the architectural implication is clear: Groq prioritizes ultra-fast local weight/activation access and then scales capacity by adding chips, not by attaching large off-chip memory pools. This design can reduce latency for sequential inference layers and minimize unpredictable stalls, but it pushes complexity into partitioning strategy, interconnect topology, and compiler scheduling, and it increases the number of chips needed for very large parameter counts and large KV-cache footprints. Groq also highlights numeric formats and compiler-driven precision management as a performance lever. In its 2025 technical blog, Groq describes “TruePoint numerics,” including 100-bit intermediate accumulation and selective quantization choices (FP32 for attention-sensitive operations, block floating point for MoE weights, FP8 storage in error-tolerant layers), and claims 2-4x speedups versus BF16 without measurable accuracy degradation on benchmarks such as MMLU and HumanEval. Even if the absolute uplift is workload dependent, the strategic point is that Groq is pursuing performance via end-to-end co-design: precision policy is not just hardware capability (FP8/BF16) but compiler-enforced mapping of precision to error sensitivity, which can matter materially for inference cost-per-token if it reduces memory traffic and boosts throughput without forcing aggressive, accuracy-damaging quantization. Independent performance datapoints indicate Groq has been credible on latency-oriented inference speed, at least for certain regimes. 
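Block floating point, one of the formats named in the TruePoint description, shares one exponent across a block of values. Each value then stores only a small integer mantissa, which is where the memory-traffic savings come from. What follows is a generic sketch, not Groq's encoding, which the post does not describe. The 8-bit signed mantissa, the 8-bit shared exponent and the 32-value block are illustrative assumptions:

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to one shared exponent plus signed integer mantissas.

    The shared exponent comes from the largest magnitude in the block, so
    every value in the block uses the same step size.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    _, shared_exp = math.frexp(max_abs)  # guarantees max_abs < 2**shared_exp
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    q_max = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest step and clip to the signed mantissa range.
    mantissas = [max(-q_max - 1, min(q_max, round(x / step))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * step for m in mantissas]


def bfp_bits(block_len, mantissa_bits=8, exponent_bits=8):
    """Storage for one block: one shared exponent plus one mantissa per value."""
    return exponent_bits + block_len * mantissa_bits


weights = [0.5, -0.25, 0.1, 0.0]
shared_exp, mantissas = bfp_quantize(weights)
print(shared_exp, mantissas, bfp_dequantize(shared_exp, mantissas))
print(bfp_bits(32), "bits per 32-value block vs", 32 * 32, "bits in FP32")
```

The trade-off is that small values in a block with one large outlier lose precision. That is one reason a compiler would pick the format per tensor based on error sensitivity, rather than apply it everywhere.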
EE Times reported in 2023 that Groq demonstrated Llama-2 70B inference at ~240 tokens/s per user on a cloud-based dev system described as 10 racks and 640 chips, using the company’s 1st-gen silicon introduced several years earlier. Separate Groq commentary around independent benchmarking cites results showing ~241 tokens/s throughput and ~0.8s time to receive 100 output tokens for a Llama-2 70B API configuration, positioning the platform as a step-change in “available speed” for certain interactive use cases. These figures do not settle total cost-of-ownership versus GPUs or hyperscaler ASICs, but they establish that Groq’s system-level architecture can deliver strong single-user throughput and latency on large models when properly partitioned and scheduled. GroqCloud is the commercial wrapper that packages this hardware/software stack as “tokens-as-a-service,” aiming to make Groq adoption feel like switching API endpoints rather than adopting new silicon. Groq’s documentation states its API is designed to be “mostly compatible” with OpenAI client libraries, and its pricing page provides model-specific token rates, published speeds (tokens/s), prompt caching discounts, and batch processing discounts. For example, pricing lists inputs as low as $0.05 per 1M tokens and outputs as low as $0.08 per 1M tokens for certain smaller LLM configurations, with higher prices for larger models and long-context or MoE variants; it also advertises prompt caching with a 50% discount on cached input tokens for certain models and a batch API offering 50% lower cost for asynchronous processing windows. These mechanics are economically important because they demonstrate Groq’s go-to-market is not simply “sell chips,” but “sell predictable unit economics per token,” with tooling (batch, caching) that directly targets inference cost drivers (reused prompts, throughput smoothing, and asynchronous workloads). 
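To see how the discount mechanics compound, here is a hypothetical cost model using the low-end rates quoted above ($0.05 input / $0.08 output per 1M tokens) and the 50% cached-input and 50% batch discounts. Two assumptions are flagged in the code: cached tokens are billed at half the input rate, and the batch discount applies to the whole bill and stacks with caching. Groq's pricing page is the authority on which models and combinations actually qualify.

```python
def workload_cost_usd(
    input_tokens,
    output_tokens,
    input_rate_per_m,
    output_rate_per_m,
    cached_fraction=0.0,
    cache_discount=0.5,
    use_batch=False,
    batch_discount=0.5,
):
    """Hypothetical cost model for a token-priced inference API (rates in USD per 1M tokens)."""
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # ASSUMPTION: cached input tokens are billed at (1 - cache_discount) of the input rate.
    input_cost = (uncached + cached * (1 - cache_discount)) * input_rate_per_m / 1e6
    output_cost = output_tokens * output_rate_per_m / 1e6
    total = input_cost + output_cost
    if use_batch:
        # ASSUMPTION: the batch discount applies to the whole bill and stacks with caching.
        total *= 1 - batch_discount
    return total


# 10M input tokens and 2M output tokens at the low-end rates quoted above.
base = workload_cost_usd(10_000_000, 2_000_000, 0.05, 0.08)
cached = workload_cost_usd(10_000_000, 2_000_000, 0.05, 0.08, cached_fraction=0.6)
batched = workload_cost_usd(10_000_000, 2_000_000, 0.05, 0.08, cached_fraction=0.6, use_batch=True)
print(f"base ${base:.2f}, 60% cached ${cached:.2f}, cached + batch ${batched:.3f}")
```

Under these assumptions a workload with heavily reused prompts that can tolerate asynchronous turnaround costs well under half the list price, which is the "unit economics per token" lever described above.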
The cloud footprint and distribution partnerships indicate Groq has been building an inference-native “edge within the cloud” strategy rather than competing head-on with hyperscalers on breadth of services. A 2025 Groq newsroom release describes a European deployment in Helsinki with Equinix, positioned as latency reduction and data governance for European customers, and explicitly references Equinix Fabric enabling private connectivity to GroqCloud over public, private, or sovereign infrastructure. The same release enumerates additional capacity in the U.S. (Equinix, DataBank), Canada (Bell Canada), and Saudi Arabia (HUMAIN), and states these sites collectively served more than 20M tokens/s across Groq’s global network at that time. That supply-side metric matters because it provides a directional sense that Groq is scaling capacity as a network, not merely as a chip vendor. Customer disclosure is inherently limited because Groq is private and many enterprise deployments are not public, but Groq’s marketing materials and partnerships provide signals about demand vectors. The company’s public website displays logos of large consumer and enterprise brands (e.g., Dropbox, Vercel, Chevron, Volkswagen, Canva, Robinhood, Riot Games, Workday, Ramp) and includes a published customer quote claiming a 7.41x chat speed increase and an 89% cost reduction after moving to GroqCloud, followed by a tripling of token consumption. While marketing claims should be treated as case-specific and not generalized, they indicate that Groq is targeting both AI-native developers (who measure success by latency and cost-per-token) and enterprise buyers (who care about predictable performance and governance). Supplier and dependency mapping for Groq spans 3 layers: silicon production, system integration, and cloud infrastructure. 
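The customer quote pairs an 89% cost reduction with a tripling of token consumption, and the net effect on spend is worth working out. A minimal sketch, assuming the 89% refers to cost per token (the quote could also mean the total bill) and that the 3x usage was billed at the new unit price:

```python
def relative_spend(unit_cost_reduction, volume_multiplier):
    """Spend after the switch divided by spend before it."""
    return (1 - unit_cost_reduction) * volume_multiplier


def break_even_volume(unit_cost_reduction):
    """Usage multiplier at which total spend returns to its pre-switch level."""
    return 1 / (1 - unit_cost_reduction)


print(f"spend after vs before: {relative_spend(0.89, 3):.2f}x")
print(f"break-even usage multiplier: {break_even_volume(0.89):.1f}x")
```

Under that reading the customer ends up spending roughly a third of what it did before, and usage would need to grow about 9x for spend to recover. Cheaper tokens expand consumption, but not automatically revenue per account.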
On silicon, third-party analysis indicates GlobalFoundries 14nm for the 1st-gen Groq chip, implying a supply chain less constrained by the most capacity-tight leading-edge nodes and advanced packaging bottlenecks that dominate high-end GPU supply (HBM stacks, CoWoS-type packaging constraints). If accurate, this is strategically meaningful because it suggests Groq capacity expansion could be gated more by conventional wafer supply, board assembly, and data center power than by the same HBM/advanced packaging scarcity that has constrained top-tier GPU ramp cycles. On systems and cloud, Groq’s own releases identify colocation and connectivity partners (Equinix, DataBank, Bell Canada) and a Middle East partner (HUMAIN), implying dependencies on data center real estate, power availability, and network connectivity, alongside procurement of standard server components, NICs/switching, racks, and cooling infrastructure. The Groq design narrative also emphasizes air cooling and reduced need for complex power/cooling infrastructure, which—if realized in deployments—can widen the set of feasible hosting locations and lower deployment friction relative to liquid-cooled, very high power density GPU racks. Against that backdrop, the strategic rationale for NVIDIA acquiring Groq can be framed as a set of overlapping objectives: inference silicon optionality, architectural hedging, competitive defense, and supply chain diversification, with the carve-out of GroqCloud signaling a preference to avoid direct cloud competition and to focus on IP and product portfolio control rather than operating a capital-intensive token-serving business. The deal, if confirmed, would occur at a valuation step-up of ~190% versus Groq’s reported ~$6.9B private valuation in the September $750M round, reinforcing that any acquisition logic would be predominantly strategic rather than a conventional financial multiple arbitrage. The most compelling strategic driver is inference. 
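The ~190% figure is a step-up, not a multiple: $20B is about 2.9x the $6.9B September valuation. A one-line check, with both inputs taken from the reporting cited in this post:

```python
def step_up(new_valuation, prior_valuation):
    """Return (multiple, percentage step-up) of a new valuation over a prior one."""
    multiple = new_valuation / prior_valuation
    return multiple, (multiple - 1) * 100


multiple, pct = step_up(20.0, 6.9)  # USD billions: reported deal value vs. September round
print(f"{multiple:.2f}x the prior valuation, a ~{pct:.0f}% step-up")
```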
Training has historically been the center of gravity for cutting-edge GPU demand, but inference volume is structurally larger and more distributed as deployments scale, with economics dominated by cost-per-token, latency guarantees, and utilization under spiky demand. Inference workloads also create a strategic vulnerability for NVIDIA: hyperscalers and large platforms can justify bespoke ASICs (TPU, Trainium/Inferentia, Maia-class efforts) because inference is stable, repeatable, and can amortize software investment at massive scale. Groq’s core proposition—deterministic, compiler-scheduled inference with predictable latency—aligns directly with the segment where GPU generality is least valued and where “good enough” programmability plus superior unit economics can win share. Acquiring Groq would allow NVIDIA to own a credible inference-native architecture rather than relying solely on GPUs and software optimization to defend that segment. Competitive defense logic is also plausible. Groq occupies a specific competitive wedge: low-latency, high-throughput interactive inference, delivered via a simple API abstraction that reduces switching cost. That wedge directly pressures GPU inference margins in the long run because it makes inference price/performance comparisons more transparent at the token level, and it targets a developer persona that historically defaulted to CUDA-first ecosystems. Even if NVIDIA’s current-generation systems can achieve very high tokens/s per user with extensive optimization, the strategic risk is that competing architectures normalize the idea that inference is best served by special-purpose silicon with a simpler programming model, weakening CUDA lock-in at the application layer. 
NVIDIA has demonstrated that Blackwell-era systems can exceed 1,000 tokens/s per user in benchmarked configurations, but that performance leadership does not automatically translate to the lowest cost-per-token across the full range of batch sizes, latency targets, and deployment environments. Groq’s existence as a credible alternative architecture forces NVIDIA to keep defending inference economics rather than only raw performance leadership. The “technology acquisition” rationale is unusually strong in this specific case because Groq’s differentiator is not a single block of silicon IP but an end-to-end methodology: compiler-led static scheduling, deterministic networking, and a system architecture designed around tensor-parallel inference rather than throughput-maximizing batch inference. NVIDIA’s stack already leans heavily on compilers and software optimization (TensorRT, Triton, CUDA Graphs, kernel fusion, speculative decoding), but GPUs remain dynamically scheduled devices with complex memory hierarchies and stochastic latency behaviors under contention. Groq’s approach provides an alternate design point: treating the entire inference execution (compute plus communication) as a statically schedulable program. In principle, that IP could be valuable even if Groq silicon itself is not adopted at massive scale, because it can inform how NVIDIA builds future inference-optimized products, compilers, and networking fabrics, especially as distributed inference with large models makes communication a first-order performance determinant. Supply chain diversification is a non-obvious but potentially important driver. If Groq’s mainstream product generation is truly based on a mature process node and avoids HBM, then the scaling constraints look different from those of state-of-the-art GPUs. NVIDIA’s ability to meet incremental demand has been tightly coupled to advanced packaging and HBM supply, and those constraints can remain binding even when wafer supply is available. 
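A toy illustration of what "treating inference as a statically schedulable program" means: if every operation's cost in cycles is known in advance, a compiler can fix the start and end cycle of every compute and communication step before anything runs. The sketch below is a generic ahead-of-time list scheduler over a hypothetical two-chip tensor-parallel layer; the op names, cycle counts, and one-op-at-a-time resource model are illustrative assumptions, not Groq's compiler or ISA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    unit: str        # resource that runs the op: a chip's compute or a chip-to-chip link
    cycles: int      # ASSUMPTION: cost is known exactly at compile time
    deps: tuple = ()


def compile_schedule(ops):
    """Ahead-of-time list scheduling; ops must be listed in dependency (topological) order.

    Each op starts when its inputs are ready and its unit is free. Nothing is
    decided at run time, so the same program always yields the same timeline.
    """
    finish = {}
    unit_free = {}
    timeline = []
    for op in ops:
        ready = max((finish[d] for d in op.deps), default=0)
        start = max(ready, unit_free.get(op.unit, 0))
        end = start + op.cycles
        finish[op.name] = end
        unit_free[op.unit] = end
        timeline.append((start, end, op.unit, op.name))
    return timeline, max(finish.values())


# Hypothetical layer split across two chips, with a partial-sum exchange over links.
LAYER = [
    Op("matmul_shard0", "chip0", 100),
    Op("matmul_shard1", "chip1", 100),
    Op("send_1_to_0", "link1->0", 20, ("matmul_shard1",)),
    Op("reduce", "chip0", 10, ("matmul_shard0", "send_1_to_0")),
    Op("send_0_to_1", "link0->1", 20, ("reduce",)),
    Op("next_shard0", "chip0", 100, ("reduce",)),
    Op("next_shard1", "chip1", 100, ("send_0_to_1",)),
]

timeline, makespan = compile_schedule(LAYER)
for start, end, unit, name in timeline:
    print(f"cycles {start:>3}-{end:<3} {unit:<9} {name}")
print("makespan known before execution:", makespan)
```

Note that the link transfers appear in the same timeline as the matmuls: communication is scheduled by the compiler rather than negotiated at run time, which is the property the post describes as deterministic networking. Real workloads would need far larger graphs and a cost model validated against hardware.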
An inference ASIC architecture that relies primarily on on-chip SRAM and scales by adding chips—while not costless—could reduce dependence on HBM availability and advanced packaging capacity, enabling NVIDIA to ship “inference capacity” in higher absolute volumes or into geographies and customer segments where the highest-end GPUs are economically or logistically difficult to deploy. This could be particularly relevant for latency-sensitive inference deployed in regional colocation footprints rather than centralized hyperscale campuses. The carve-out of GroqCloud, if accurate, is itself a strategic signal about NVIDIA’s priorities. Operating a token-serving cloud at scale is capital intensive, structurally lower margin than silicon IP rents, and creates channel conflict with hyperscalers and CSP partners who are core NVIDIA customers. NVIDIA has generally positioned its cloud offerings through partnerships rather than as a direct hyperscale competitor. Excluding GroqCloud would preserve neutrality with CSPs and avoid inheriting multi-region data residency obligations and partner contracts, while still allowing NVIDIA to acquire Groq’s silicon, compiler technology, and engineering talent. At the same time, excluding GroqCloud would also mean NVIDIA would not automatically acquire the commercial proof-point of Groq’s unit economics or the customer contracts that validate product-market fit at scale, increasing the importance of diligence on whether Groq’s cloud pricing is structurally profitable or partially subsidized by fundraising. There is also a “preemptive acquisition” angle. The reporting identifies recent investors in Groq’s latest round including large financial institutions and strategic/industry players. In that context, Groq represents an asset that could plausibly have been acquired by a competitor (AMD/Intel) or by a hyperscaler seeking to accelerate inference independence. 
NVIDIA acquiring Groq could be a defensive move to prevent a credible inference-native architecture from being weaponized by a rival with deep distribution. Even if GroqCloud is carved out, controlling the silicon roadmap and compiler IP would meaningfully constrain Groq’s ability to evolve into a standalone competitor, unless the carved-out entity retains long-term rights to the hardware and software stack. However, the strategic case is not one-sided; there are meaningful risks and potential contradictions that would need to be reconciled for the transaction to be value-accretive on a multi-year horizon. First, Groq’s architecture appears to rely on scaling out chip count to achieve capacity, which introduces system cost, networking complexity, and physical footprint considerations. The absence of external memory and limited on-chip SRAM imply that very large models require substantial chip parallelism, and the economics then depend heavily on chip cost, yield, power efficiency, and interconnect overhead. SemiAnalysis explicitly frames Groq as trading space for time and raises questions about token economics and whether publicly advertised pricing reflects fully loaded costs or market share capture. Second, integration risk is non-trivial. Groq’s compiler-led deterministic model is philosophically and practically different from CUDA’s dominant programming and execution model. A poorly executed integration could create internal product confusion, dilute engineering focus, or alienate developers if the combined stack fragments. Third, there is cannibalization risk. If Groq-class inference silicon undercuts GPU inference economics, NVIDIA could face internal margin trade-offs, even if the goal is to defend share against hyperscaler ASICs. Cannibalization can still be rational if it prevents larger share loss, but it would require crisp portfolio segmentation and go-to-market discipline. 
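The "space for time" trade-off can be sized with simple arithmetic. A minimal sketch, assuming ~230 MB of usable SRAM per chip (the SemiAnalysis figure quoted in this post, taken as 230 x 10^6 bytes) and counting weights only. KV cache, activations, and any replication for throughput would push the count higher, so treat the result as a floor.

```python
import math


def min_chips_for_weights(params, bytes_per_param, sram_bytes_per_chip=230e6):
    """Floor on chip count needed just to hold model weights in on-chip SRAM."""
    return math.ceil(params * bytes_per_param / sram_bytes_per_chip)


for label, params in (("8B", 8e9), ("70B", 70e9)):
    for fmt, nbytes in (("BF16", 2), ("FP8", 1)):
        print(f"{label:>4} {fmt:<5} >= {min_chips_for_weights(params, nbytes)} chips")
```

Even an 8B model at BF16 needs on the order of 70 chips by this measure, and a 70B model needs roughly 300 to 600 depending on precision, which is why chip cost, yield, and interconnect overhead dominate the economics described above.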
The presence of NVIDIA’s own rapidly improving inference performance complicates the “need” for Groq but does not eliminate the “option value.” NVIDIA has demonstrated benchmark-leading tokens/s per user on Blackwell-based systems, suggesting that raw interactive throughput is not necessarily the limiting factor for NVIDIA’s product line. The more enduring strategic question is unit economics and architectural control: whether future inference demand is better monetized through general-purpose GPUs plus software optimization, or whether a bifurcated product portfolio (training GPUs plus inference-native ASICs) becomes necessary to defend total AI compute wallet share as hyperscaler ASIC penetration increases. Acquiring Groq could be a decisive move to ensure NVIDIA participates in both regimes rather than betting exclusively on GPUs to win inference forever. What is “special” about Groq’s technology relative to a typical accelerator roadmap is the tight coupling of determinism, compilation, and networking into a single scheduling problem. The LPU narrative emphasizes deterministic compute and networking, static scheduling, and direct chip-to-chip coordination that allows hundreds of chips to behave like a single scheduled resource. The architecture also explicitly targets tensor-parallel, latency-optimized distribution rather than pure data-parallel throughput scaling, which matters for real-time applications where a single response must arrive quickly rather than many requests being processed in bulk. The implication is that Groq is optimized for the time-to-first-token and steady token streaming behavior that defines user experience in interactive LLMs, and it attempts to achieve that without relying on large batch sizes that can degrade latency. 
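Why time-to-first-token (TTFT) and streaming rate are separate levers: end-to-end latency for an n-token reply is roughly TTFT plus n divided by the decode rate. As a hedged worked example, if the ~241 tokens/s figure cited earlier is read as the steady decode rate and the ~0.8 s figure as total time for 100 tokens (including network and prefill), the implied TTFT is about 0.385 s. Both readings are assumptions about how those benchmarks were defined.

```python
def end_to_end_s(ttft_s, n_tokens, decode_tokens_per_s):
    """Approximate reply latency: time to first token plus steady streaming time."""
    return ttft_s + n_tokens / decode_tokens_per_s


def implied_ttft_s(total_s, n_tokens, decode_tokens_per_s):
    """Back out time to first token from a measured total and a decode rate."""
    return total_s - n_tokens / decode_tokens_per_s


ttft = implied_ttft_s(0.8, 100, 241)
print(f"implied TTFT ~{ttft:.3f} s")
for n in (100, 500, 2000):
    print(f"{n:>5} tokens -> ~{end_to_end_s(ttft, n, 241):.2f} s")
```

TTFT dominates short interactive turns, while the decode term dominates long replies. Inflating batch size tends to raise aggregate throughput at the cost of both terms for each individual user, which is the trade-off this architecture is designed to avoid.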
From a portfolio manager’s perspective, the most important interpretation is that an NVIDIA-Groq combination would likely be less about “NVIDIA needs more inference speed” and more about controlling the architectural trajectory of inference acceleration and removing a fast-improving, developer-friendly competitor from the market. The carve-out of GroqCloud would reinforce that the transaction is aimed at IP, talent, and product optionality, not acquiring a cloud revenue stream. The valuation step-up implied by $20B versus $6.9B would therefore be justified only if the acquired assets materially reduce long-term competitive risk (hyperscaler ASIC displacement, inference margin compression) or enable new monetization vectors (inference ASIC product line, supply chain de-bottlenecking, improved software determinism) that would be difficult to achieve on a comparable timeline via internal R&D.

$NVDA $GFS NVIDIA’s reported agreement to acquire Groq for $20B in cash (per CNBC, amplified via Reuters and other wire coverage) represents a materially different strategic posture from NVIDIA’s prior M&A pattern, given both the headline size (largest reported NVIDIA acquisition to date) and the unusual carve-out that Groq’s early-stage cloud business would not be included. Public reporting indicates the information originated from Alex Davis, CEO of Disruptive (lead investor in Groq’s latest financing), and that neither NVIDIA nor Groq had issued an immediate confirmation at the time of publication. The same reporting frames the transaction as coming together quickly, only months after Groq raised $750M at a ~$6.9B valuation, and highlights Groq’s positioning as a high-performance inference chip vendor founded by ex-Google TPU engineers. Groq is best understood as a vertically integrated inference acceleration company whose core asset is an application-specific processor optimized for deterministic, low-latency execution of transformer-style workloads, paired with a compiler-led software stack and a distribution layer (GroqCloud) designed to reduce developer friction via OpenAI-compatible APIs and integrations. Groq brands its architecture as a Language Processing Unit (LPU) and consistently emphasizes that the design target is inference, not training. The company’s own architecture description centers on single-core execution, large on-chip SRAM used as primary storage (explicitly not cache), a custom compiler that statically schedules compute and communication, and direct chip-to-chip connectivity intended to coordinate multi-chip execution without relying on conventional caching hierarchies or dynamic runtime scheduling. The technical premise is a deliberate inversion of the conventional GPU approach. 
GPUs deliver throughput via massively parallel, multi-core execution with dynamic scheduling, complex memory hierarchies, and heavy reliance on off-chip HBM bandwidth and sophisticated runtime/kernel optimization. Groq instead argues that inference bottlenecks are driven by latency variance (tail latency), synchronization overhead, and memory access unpredictability inherent in dynamically scheduled, cache-heavy architectures, particularly when workloads are latency sensitive and batch sizes cannot be inflated. Groq’s solution is to move “control” into the compiler: the full execution graph and inter-chip communication schedule are computed ahead of time down to clock-cycle granularity, with deterministic execution designed to reduce run-to-run variance. In Groq’s framing, the removal of caches, reorder buffers, speculative execution overhead, and other sources of contention enables predictable latency and high utilization without per-model kernel engineering typical of GPU tuning cycles. A critical nuance is that Groq’s determinism is not merely a software claim; it is tightly coupled to architectural constraints and system design choices that trade flexibility for predictability. Third-party technical commentary indicates Groq’s chip uses a fully deterministic VLIW-style approach with minimal buffering, no external memory, and heavy dependence on sharding models across many chips because on-chip SRAM capacity is limited. SemiAnalysis describes a ~725 mm^2 die on GlobalFoundries 14nm with ~230MB of SRAM and notes that “no useful models” fit on a single chip, forcing multi-chip partitioning for modern LLMs and driving a system-level design where networking and compilation are first-class scheduling problems rather than ancillary infrastructure. This is consistent with Groq’s own messaging that tensor parallelism across chips is a primary design goal, enabled by large on-chip SRAM and compile-time coordination of compute plus interconnect. 
The on-chip SRAM emphasis is central to Groq’s latency story and also its most constraining trade-off. Groq claims on-chip SRAM bandwidth “upwards of 80 TB/s” and contrasts that with off-chip HBM bandwidth “about 8 TB/s,” asserting a potential 10x advantage from bandwidth plus reduced trips across chip-to-memory boundaries. While these comparisons are marketing-oriented and depend on workload specifics, the architectural implication is clear: Groq prioritizes ultra-fast local weight/activation access and then scales capacity by adding chips, not by attaching large off-chip memory pools. This design can reduce latency for sequential inference layers and minimize unpredictable stalls, but it pushes complexity into partitioning strategy, interconnect topology, and compiler scheduling, and it increases the number of chips needed for very large parameter counts and large KV-cache footprints. Groq also highlights numeric formats and compiler-driven precision management as a performance lever. In its 2025 technical blog, Groq describes “TruePoint numerics,” including 100-bit intermediate accumulation and selective quantization choices (FP32 for attention-sensitive operations, block floating point for MoE weights, FP8 storage in error-tolerant layers), and claims 2-4x speedups versus BF16 without measurable accuracy degradation on benchmarks such as MMLU and HumanEval. Even if the absolute uplift is workload dependent, the strategic point is that Groq is pursuing performance via end-to-end co-design: precision policy is not just hardware capability (FP8/BF16) but compiler-enforced mapping of precision to error sensitivity, which can matter materially for inference cost-per-token if it reduces memory traffic and boosts throughput without forcing aggressive, accuracy-damaging quantization. Independent performance datapoints indicate Groq has been credible on latency-oriented inference speed, at least for certain regimes. 
EE Times reported in 2023 that Groq demonstrated Llama-2 70B inference at ~240 tokens/s per user on a cloud-based dev system described as 10 racks and 64 chips, using the company’s 1st-gen silicon introduced several years earlier. Separate Groq commentary around independent benchmarking cites results showing ~241 tokens/s throughput and ~0.8s time to receive 100 output tokens for a Llama-2 70B API configuration, positioning the platform as a step-change in “available speed” for certain interactive use cases. These figures do not settle total cost-of-ownership versus GPUs or hyperscaler ASICs, but they establish that Groq’s system-level architecture can deliver strong single-user throughput and latency on large models when properly partitioned and scheduled. GroqCloud is the commercial wrapper that packages this hardware/software stack as “tokens-as-a-service,” aiming to make Groq adoption feel like switching API endpoints rather than adopting new silicon. Groq’s documentation states its API is designed to be “mostly compatible” with OpenAI client libraries, and its pricing page provides model-specific token rates, published speeds (tokens/s), prompt caching discounts, and batch processing discounts. For example, pricing lists inputs as low as $0.05 per 1M tokens and outputs as low as $0.08 per 1M tokens for certain smaller LLM configurations, with higher prices for larger models and long-context or MoE variants; it also advertises prompt caching with a 50% discount on cached input tokens for certain models and a batch API offering 50% lower cost for asynchronous processing windows. These mechanics are economically important because they demonstrate Groq’s go-to-market is not simply “sell chips,” but “sell predictable unit economics per token,” with tooling (batch, caching) that directly targets inference cost drivers (reused prompts, throughput smoothing, and asynchronous workloads). 
The cloud footprint and distribution partnerships indicate Groq has been building an inference-native “edge within the cloud” strategy rather than competing head-on with hyperscalers on breadth of services. A 2025 Groq newsroom release describes a European deployment in Helsinki with Equinix, positioned as latency reduction and data governance for European customers, and explicitly references Equinix Fabric enabling private connectivity to GroqCloud over public, private, or sovereign infrastructure. The same release enumerates additional capacity in the U.S. (Equinix, DataBank), Canada (Bell Canada), and Saudi Arabia (HUMAIN), and states these sites collectively served more than 20M tokens/s across Groq’s global network at that time. That supply-side metric matters because it provides a directional sense that Groq is scaling capacity as a network, not merely as a chip vendor. Customer disclosure is inherently limited because Groq is private and many enterprise deployments are not public, but Groq’s marketing materials and partnerships provide signals about demand vectors. The company’s public website displays logos of large consumer and enterprise brands (e.g., Dropbox, Vercel, Chevron, Volkswagen, Canva, Robinhood, Riot Games, Workday, Ramp) and includes a published customer quote claiming a 7.41x chat speed increase and an 89% cost reduction after moving to GroqCloud, followed by a tripling of token consumption. While marketing claims should be treated as case-specific and not generalized, they indicate that Groq is targeting both AI-native developers (who measure success by latency and cost-per-token) and enterprise buyers (who care about predictable performance and governance). Supplier and dependency mapping for Groq spans 3 layers: silicon production, system integration, and cloud infrastructure. 
On silicon, third-party analysis indicates GlobalFoundries 14nm for the 1st-gen Groq chip, implying a supply chain less constrained by the most capacity-tight leading-edge nodes and advanced packaging bottlenecks that dominate high-end GPU supply (HBM stacks, CoWoS-type packaging constraints). If accurate, this is strategically meaningful because it suggests Groq capacity expansion could be gated more by conventional wafer supply, board assembly, and data center power than by the same HBM/advanced packaging scarcity that has constrained top-tier GPU ramp cycles. On systems and cloud, Groq’s own releases identify colocation and connectivity partners (Equinix, DataBank, Bell Canada) and a Middle East partner (HUMAIN), implying dependencies on data center real estate, power availability, and network connectivity, alongside procurement of standard server components, NICs/switching, racks, and cooling infrastructure. The Groq design narrative also emphasizes air cooling and reduced need for complex power/cooling infrastructure, which—if realized in deployments—can widen the set of feasible hosting locations and lower deployment friction relative to liquid-cooled, very high power density GPU racks. Against that backdrop, the strategic rationale for NVIDIA acquiring Groq can be framed as a set of overlapping objectives: inference silicon optionality, architectural hedging, competitive defense, and supply chain diversification, with the carve-out of GroqCloud signaling a preference to avoid direct cloud competition and to focus on IP and product portfolio control rather than operating a capital-intensive token-serving business. The deal, if confirmed, would occur at a valuation step-up of ~190% versus Groq’s reported ~$6.9B private valuation in the September $750M round, reinforcing that any acquisition logic would be predominantly strategic rather than a conventional financial multiple arbitrage. The most compelling strategic driver is inference. 
Training has historically been the center of gravity for cutting-edge GPU demand, but inference volume is structurally larger and more distributed as deployments scale, with economics dominated by cost-per-token, latency guarantees, and utilization under spiky demand. Inference workloads also create a strategic vulnerability for NVIDIA: hyperscalers and large platforms can justify bespoke ASICs (TPU, Trainium/Inferentia, Maia-class efforts) because inference is stable, repeatable, and can amortize software investment at massive scale. Groq’s core proposition—deterministic, compiler-scheduled inference with predictable latency—aligns directly with the segment where GPU generality is least valued and where “good enough” programmability plus superior unit economics can win share. Acquiring Groq would allow NVIDIA to own a credible inference-native architecture rather than relying solely on GPUs and software optimization to defend that segment. Competitive defense logic is also plausible. Groq occupies a specific competitive wedge: low-latency, high-throughput interactive inference, delivered via a simple API abstraction that reduces switching cost. That wedge directly pressures GPU inference margins in the long run because it makes inference price/performance comparisons more transparent at the token level, and it targets a developer persona that historically defaulted to CUDA-first ecosystems. Even if NVIDIA’s current-generation systems can achieve very high tokens/s per user with extensive optimization, the strategic risk is that competing architectures normalize the idea that inference is best served by special-purpose silicon with a simpler programming model, weakening CUDA lock-in at the application layer. 
NVIDIA has actively demonstrated that Blackwell-era systems can exceed 1,000 tokens/s per user in benchmarked configurations, but that performance leadership does not automatically translate to lowest cost-per-token across the full range of batch sizes, latency targets, and deployment environments. Groq’s existence as a credible alternative architecture forces NVIDIA to keep defending inference economics rather than only raw performance leadership. The “technology acquisition” rationale is unusually strong in this specific case because Groq’s differentiator is not a single block of silicon IP but an end-to-end methodology: compiler-led static scheduling, deterministic networking, and a system architecture designed around tensor-parallel inference rather than throughput-maximizing batch inference. NVIDIA’s stack is already compiler-heavy (TensorRT, Triton, CUDA graphs, kernel fusion, speculative decoding techniques), but GPUs remain dynamically scheduled devices with complex memory hierarchies and stochastic latency behaviors under contention. Groq’s approach provides an alternate design point: treating the entire inference execution (compute plus communication) as a statically schedulable program. In principle, that IP could be valuable even if Groq silicon itself is not adopted at massive scale, because it can inform how NVIDIA builds future inference-optimized products, compilers, and networking fabrics, especially as distributed inference with large models makes communication a first-order performance determinant. Supply chain diversification is a non-obvious but potentially important driver. If Groq’s mainstream product generation is truly based on a mature process node and avoids HBM, then the scaling constraints look different than those of state-of-the-art GPUs. NVIDIA’s ability to meet incremental demand has been tightly coupled to advanced packaging and HBM supply, and those constraints can remain binding even when wafer supply is available. 
An inference ASIC architecture that relies primarily on on-chip SRAM and scales by adding chips—while not costless—could reduce dependence on HBM availability and advanced packaging capacity, enabling NVIDIA to ship “inference capacity” in higher absolute volumes or into geographies and customer segments where the highest-end GPUs are economically or logistically difficult to deploy. This could be particularly relevant for latency-sensitive inference deployed in regional colocation footprints rather than centralized hyperscale campuses. The carve-out of GroqCloud, if accurate, is itself a strategic signal about NVIDIA’s priorities. Operating a token-serving cloud at scale is capital intensive, structurally lower margin than silicon IP rents, and creates channel conflict with hyperscalers and CSP partners who are core NVIDIA customers. NVIDIA has generally positioned its cloud offerings through partnerships rather than as a direct hyperscale competitor. Excluding GroqCloud would preserve neutrality with CSPs and avoid inheriting multi-region data residency obligations and partner contracts, while still allowing NVIDIA to acquire Groq’s silicon, compiler technology, and engineering talent. At the same time, excluding GroqCloud would also mean NVIDIA would not automatically acquire the commercial proof-point of Groq’s unit economics or the customer contracts that validate product-market fit at scale, increasing the importance of diligence on whether Groq’s cloud pricing is structurally profitable or partially subsidized by fundraising. There is also a “preemptive acquisition” angle. The reporting identifies recent investors in Groq’s latest round including large financial institutions and strategic/industry players. In that context, Groq represents an asset that could plausibly have been acquired by a competitor (AMD/Intel) or by a hyperscaler seeking to accelerate inference independence. 
NVIDIA acquiring Groq could be a defensive move to prevent a credible inference-native architecture from being weaponized by a rival with deep distribution. Even if GroqCloud is carved out, controlling the silicon roadmap and compiler IP would meaningfully constrain Groq’s ability to evolve into a standalone competitor, unless the carved-out entity retains long-term rights to the hardware and software stack. However, the strategic case is not one-sided; there are meaningful risks and potential contradictions that would need to be reconciled for the transaction to be value-accretive on a multi-year horizon. First, Groq’s architecture appears to rely on scaling out chip count to achieve capacity, which introduces system cost, networking complexity, and physical footprint considerations. The absence of external memory and the limited on-chip SRAM imply that very large models require substantial chip parallelism, and the economics then depend heavily on chip cost, yield, power efficiency, and interconnect overhead. SemiAnalysis explicitly frames Groq as trading space for time and raises questions about token economics, including whether publicly advertised pricing reflects fully loaded costs or a bid for market share. Second, integration risk is non-trivial. Groq’s compiler-led deterministic model is philosophically and practically different from CUDA’s dominant programming and execution model. A poorly executed integration could create internal product confusion, dilute engineering focus, or alienate developers if the combined stack fragments. Third, there is cannibalization risk. If Groq-class inference silicon undercuts GPU inference economics, NVIDIA could face internal margin trade-offs, even if the goal is to defend share against hyperscaler ASICs. Cannibalization can still be rational if it prevents larger share loss, but it would require crisp portfolio segmentation and go-to-market discipline.
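The chip-count point can be checked with back-of-envelope arithmetic. The Python sketch below estimates how many SRAM-only chips are needed just to hold a model's weights. The 230 MB per-chip figure is an assumption (the number commonly cited for Groq's first-generation chip), and the 70B-parameter model is hypothetical. The estimate ignores KV cache, activations, and replication, all of which push the real count higher.

```python
import math


def chips_to_hold_weights(params: float, bytes_per_param: float,
                          sram_bytes_per_chip: float) -> int:
    """Minimum chips whose combined on-chip SRAM can store all weights (weights only)."""
    weight_bytes = params * bytes_per_param
    return math.ceil(weight_bytes / sram_bytes_per_chip)


SRAM_PER_CHIP = 230e6  # assumed ~230 MB of SRAM per chip (illustrative)
MODEL_PARAMS = 70e9    # hypothetical 70B-parameter model

chips_int8 = chips_to_hold_weights(MODEL_PARAMS, 1, SRAM_PER_CHIP)  # 8-bit weights
chips_fp16 = chips_to_hold_weights(MODEL_PARAMS, 2, SRAM_PER_CHIP)  # 16-bit weights

if __name__ == "__main__":
    print(f"8-bit weights:  {chips_int8} chips")
    print(f"16-bit weights: {chips_fp16} chips")
```

Even at 8-bit precision this lands in the hundreds of chips (305 under these assumptions). That is why the economics hinge on per-chip cost, yield, power, and interconnect overhead.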
The presence of NVIDIA’s own rapidly improving inference performance complicates the “need” for Groq but does not eliminate the “option value.” NVIDIA has demonstrated benchmark-leading tokens/s per user on Blackwell-based systems, suggesting that raw interactive throughput is not necessarily the limiting factor for NVIDIA’s product line. The more enduring strategic question is unit economics and architectural control: whether future inference demand is better monetized through general-purpose GPUs plus software optimization, or whether a bifurcated product portfolio (training GPUs plus inference-native ASICs) becomes necessary to defend total AI compute wallet share as hyperscaler ASIC penetration increases. Acquiring Groq could be a decisive move to ensure NVIDIA participates in both regimes rather than betting exclusively on GPUs to win inference forever. What is “special” about Groq’s technology relative to a typical accelerator roadmap is the tight coupling of determinism, compilation, and networking into a single scheduling problem. The LPU narrative emphasizes deterministic compute and networking, static scheduling, and direct chip-to-chip coordination that allows hundreds of chips to behave like a single scheduled resource. The architecture also explicitly targets tensor-parallel, latency-optimized distribution rather than pure data-parallel throughput scaling, which matters for real-time applications where a single response must arrive quickly rather than many requests being processed in bulk. The implication is that Groq is optimized for the time-to-first-token and steady token streaming behavior that defines user experience in interactive LLMs, and it attempts to achieve that without relying on large batch sizes that can degrade latency.
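The latency-versus-batching trade-off can be illustrated with a simplified roofline model of a GPU decode step. The sketch assumes each step reads all weights once (shared by the whole batch) and performs about 2 FLOPs per parameter per sequence. The bandwidth, peak-FLOPs, and model-size values are illustrative assumptions. KV-cache traffic, which grows with batch size and context length and makes large batches costlier still, is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class DecodeModel:
    params: float           # model parameters
    bytes_per_param: float  # weight precision in bytes
    mem_bw: float           # memory bandwidth, bytes/s
    peak_flops: float       # achievable compute, FLOP/s

    def step_time(self, batch: int) -> float:
        """Seconds per decode step: the slower of weight streaming and compute."""
        memory_time = self.params * self.bytes_per_param / self.mem_bw
        compute_time = batch * 2 * self.params / self.peak_flops
        return max(memory_time, compute_time)

    def per_user_tokens_s(self, batch: int) -> float:
        # Each sequence in the batch gets one token per step.
        return 1.0 / self.step_time(batch)

    def aggregate_tokens_s(self, batch: int) -> float:
        return batch / self.step_time(batch)


# Illustrative assumptions: 70B params at 8-bit, ~3 TB/s, 1 PFLOP/s.
gpu = DecodeModel(params=70e9, bytes_per_param=1, mem_bw=3e12, peak_flops=1e15)

if __name__ == "__main__":
    for b in (1, 64, 512):
        print(f"batch {b:>3}: {gpu.per_user_tokens_s(b):6.1f} tok/s per user, "
              f"{gpu.aggregate_tokens_s(b):8.1f} tok/s aggregate")
```

Under these assumptions, batching up to about 166 sequences raises aggregate throughput linearly because each step is memory-bound. Past that point, each user's token rate falls (from about 43 to about 14 tokens/s between batch 64 and 512). That tension is what a latency-first design tries to avoid by not relying on large batches.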
From a portfolio manager’s perspective, the most important interpretation is that an NVIDIA-Groq combination would likely be less about “NVIDIA needs more inference speed” and more about controlling the architectural trajectory of inference acceleration and removing a fast-improving, developer-friendly competitor from the market. The carve-out of GroqCloud would reinforce that the transaction is aimed at IP, talent, and product optionality, not acquiring a cloud revenue stream. The roughly 2.9x valuation step-up implied by $20B versus $6.9B would therefore be justified only if the acquired assets materially reduce long-term competitive risk (hyperscaler ASIC displacement, inference margin compression) or enable new monetization vectors (inference ASIC product line, supply chain de-bottlenecking, improved software determinism) that would be difficult to achieve on a comparable timeline via internal R&D.

TheValueist

102,145 views • 7 months ago

Cerebras just IPO’d and the stock has already run up over 100% (save this). For the semiconductor industry’s entire 70-year history, every company on earth has followed the same process: take a dinner-plate-sized silicon wafer, put hundreds of tiny chips onto it, and dice it up like a pizza. Nvidia does it this way, AMD does it this way, Intel has done it this way for six decades, and everyone who tried to break that convention failed. Then Cerebras asked the most annoyingly obvious question in the industry’s history: what if you just didn’t cut it? The result is the Wafer Scale Engine, a single chip 56 times larger than Nvidia’s H100, and it fundamentally changes the physics of how AI inference works. The reason this matters is not the size; it’s the bandwidth. Every time an AI model generates a single token, it has to reach into memory, pull the weights, multiply them, and produce a prediction. When you’re running millions of concurrent sessions, the bottleneck is not raw processing power but how fast data moves between memory and compute. Nvidia’s H100 moves data at roughly 3 terabytes per second, while Cerebras’ WSE-3 moves data at 21 petabytes per second, roughly 7,000 times faster, because memory and compute live on the same enormous piece of silicon and data barely has to travel at all. That gap is exactly why OpenAI went from 150 tokens per second on traditional GPUs to 2,000 tokens per second on Cerebras hardware, and why AWS integrated Cerebras into Bedrock to deliver roughly 5x more inference capacity in the same physical footprint. The macro setup is making the trade even more urgent. South Korean DRAM export prices recently jumped 35%, flash memory surged 47%, and SSD pricing spiked nearly 140%. Every one of those increases hits Nvidia-based infrastructure directly, because the H100 requires 80GB of HBM, the most expensive and most contested memory in the AI supply chain.
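The bandwidth argument amounts to a memory-bound decode ceiling: if every generated token requires reading all weights once, single-stream tokens/s cannot exceed bandwidth divided by weight bytes. The Python sketch below uses the post's bandwidth and SRAM figures. The 8B-parameter, 16-bit model is a hypothetical example. The sketch also shows that the reported 150 to 2,000 tokens/s jump is about 13x, far below the ~7,000x bandwidth ratio. Bandwidth therefore sets a ceiling rather than predicting the delivered speedup, and other factors (compute, interconnect, software) likely bind first.

```python
H100_BW = 3e12    # bytes/s, "roughly 3 terabytes per second" (figure from the post)
WSE3_BW = 21e15   # bytes/s, "21 petabytes per second" (figure from the post)
WSE3_SRAM = 44e9  # bytes of on-wafer SRAM (figure from the post)


def decode_ceiling(bandwidth: float, weight_bytes: float) -> float:
    """Upper bound on single-stream tokens/s when each token reads every weight once."""
    return bandwidth / weight_bytes


weights = 8e9 * 2  # hypothetical 8B-parameter model stored at 16-bit (16 GB)

bandwidth_ratio = WSE3_BW / H100_BW  # the post's ~7,000x headline ratio
observed_speedup = 2000 / 150        # OpenAI tokens/s figures quoted in the post
h100_ceiling = decode_ceiling(H100_BW, weights)
fits_in_wafer_sram = weights <= WSE3_SRAM

if __name__ == "__main__":
    print(f"bandwidth ratio:    {bandwidth_ratio:,.0f}x")
    print(f"observed speedup:   {observed_speedup:.1f}x")
    print(f"H100 ceiling:       {h100_ceiling:.1f} tok/s for a 16 GB model")
    print(f"fits in 44 GB SRAM: {fits_in_wafer_sram}")
```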
Cerebras’ WSE-3 uses no external HBM, instead baking 44GB of SRAM directly into the wafer. As memory pricing goes parabolic, that means every CFO evaluating AI infrastructure is suddenly looking much more seriously at the architecture that sidesteps that cost entirely. The demand is already showing up in the backlog. Cerebras ended 2025 with $24.6 billion in remaining performance obligations against just over $500 million in annual revenue, a number that implies years of contracted growth already sitting on the books. The IPO was 20x oversubscribed, the price range was raised twice before listing, and shares opened 89% above their listing price on a $5.55 billion raise that made it the largest semiconductor IPO in history. The risks are real and worth naming. 86% of 2025 revenue came from two entities with UAE ties, U.S. revenue actually fell 34% to $187 million, and the $20 billion OpenAI contract is conditional: if Cerebras misses delivery milestones, OpenAI can terminate and trigger repayment demands on a $1 billion loan facility. And yet the market is valuing Cerebras at roughly 91x trailing revenue, richer than Nvidia, AMD, and Arm combined. What investors are betting on is not that Cerebras beats Nvidia; it is that the inference supercycle is large enough to support an entirely different architecture optimized for a different workload, and that the $24.6 billion contracted backlog converts into diversified revenue before the market starts asking harder questions. CEO Andrew Feldman said this took a decade of late nights to get right and that everyone who tried to copy it failed. Given that the entire inference economy now runs through exactly the bottleneck Cerebras was built to eliminate, the market is starting to believe him.
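The backlog and concentration figures can be put in proportion with simple arithmetic. This Python sketch uses only numbers from the post. Treating annual revenue as exactly $500 million (the post says "just over") makes the backlog ratio a slight upper bound. The implied valuation is rough because the trailing-revenue base is approximate.

```python
RPO = 24.6e9             # remaining performance obligations at end of 2025
ANNUAL_REVENUE = 500e6   # "just over $500 million"; rounded down (assumption)
US_REVENUE = 187e6       # U.S. revenue after the decline
US_DECLINE = 0.34        # 34% year-over-year drop
TOP_TWO_SHARE = 0.86     # share of 2025 revenue from two UAE-linked entities
TRAILING_MULTIPLE = 91   # price-to-trailing-revenue multiple

# Years needed to burn down the backlog at the current revenue run rate
# (a slight upper bound, since revenue was rounded down).
backlog_years = RPO / ANNUAL_REVENUE
# Prior-year U.S. revenue implied by a 34% decline to $187M.
prior_us_revenue = US_REVENUE / (1 - US_DECLINE)
# Revenue attributable to the two largest customers.
top_two_revenue = TOP_TWO_SHARE * ANNUAL_REVENUE
# Rough market value implied by the trailing multiple.
implied_market_value = TRAILING_MULTIPLE * ANNUAL_REVENUE

if __name__ == "__main__":
    print(f"backlog / revenue:      {backlog_years:.1f} years")
    print(f"implied prior U.S. rev: ${prior_us_revenue / 1e6:.0f}M")
    print(f"top-two customer rev:   ${top_two_revenue / 1e6:.0f}M")
    print(f"implied market value:   ${implied_market_value / 1e9:.1f}B")
```

Roughly 49 years of current revenue sitting in backlog is the "years of contracted growth" the post refers to. It only turns into reported revenue if deliveries ramp far beyond today's run rate.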

Milk Road AI

30,441 views • 2 months ago