SemiAnalysis

@SemiAnalysis_ • 140,217 subscribers

Shorts

Jensen giving out Pocky sticks in Asia, just like how some neoclouds feel NVIDIA hands out brownie points to neoclouds that market or sell NVIDIA software no AI-native end user wants, like MIG, NIMs, and NVAIE. Some neoclouds also feel that NVIDIA gives them brownie points for buying NVIDIA BlueField-3 DPUs, even though they are nowhere near having the skill set to use them properly.

302,452 views

TSMC 2nm process data leak investigation 🧵 On August 5th (Taipei time), a trade secret leak case involving TSMC's 2nm technology was uncovered. 9 engineers were suspected of leaking confidential information, and 3 of them were detained by the Intellectual Property and Commercial Court.

351,136 views

Wide Expert Parallelism increases the total memory bandwidth available to an MoE deployment. The MoE expert weights are distributed across many GPUs, so each GPU only needs to load a tiny fraction of the weights. This translates to higher throughput per GPU, increasing perf per dollar and perf per watt.

30,205 views
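Below is a rough sketch of the wide EP bandwidth argument. It assumes a hypothetical MoE whose expert weights are sharded evenly across the expert-parallel (EP) group. The 600 GB of expert weights and the 8 TB/s of HBM bandwidth per GPU are illustrative assumptions, not figures for any specific model or GPU. The sketch also ignores attention weights, KV cache reads, and all-to-all communication cost.

```python
# Back-of-the-envelope wide EP model.
# ASSUMPTIONS: uniformly sized experts, even sharding across the EP group,
# hypothetical weight size and bandwidth; communication cost is ignored.

def wide_ep_stats(total_expert_bytes: float, ep_degree: int, hbm_bw_per_gpu: float):
    """Return (expert bytes per GPU, aggregate HBM bandwidth in bytes/s,
    seconds to stream every expert weight once, all GPUs reading in parallel)."""
    per_gpu_bytes = total_expert_bytes / ep_degree
    aggregate_bw = hbm_bw_per_gpu * ep_degree
    # Each GPU only streams its own shard, so wall time is set by one shard.
    stream_time_s = per_gpu_bytes / hbm_bw_per_gpu
    return per_gpu_bytes, aggregate_bw, stream_time_s


TOTAL_EXPERT_BYTES = 600e9  # hypothetical: 600 GB of expert weights
HBM_BW = 8e12               # hypothetical: 8 TB/s of HBM bandwidth per GPU

for ep in (8, 72):
    per_gpu, agg, t = wide_ep_stats(TOTAL_EXPERT_BYTES, ep, HBM_BW)
    print(f"EP={ep:>2}: {per_gpu / 1e9:6.2f} GB/GPU, "
          f"{agg / 1e12:4.0f} TB/s aggregate, {t * 1e3:5.2f} ms to read all experts")
```

Going from EP=8 to EP=72 cuts the expert bytes each GPU must stream by 9x under these assumptions. That reduction is the mechanism behind the higher per-GPU throughput claimed above. In a real deployment, the extra all-to-all traffic eats into part of this gain.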

hi i'm semiman

51,930 views

Congrats to Dylan's cousin, Dwarkesh Patel, on reaching 1 million subscribers on YouTube! Dylan & Dwarkesh Patel's family comes from Gujarat, but they grew up in America for most of their lives, mainly in Econo Lodge motels.

112,857 views

A Day in the Life of Dwarkesh Patel's Cousin

51,310 views

This is how OpenAI comes back and beats Anthropic

41,939 views

AMD's software quality has massively improved since AMD's DC GPU division went into hardcore mode back in January 2025. It isn't just us saying this; many of AMD's Instinct GPU customers are saying it too. Great work to Anush Elangovan's team of amazing engineers.🥳

65,099 views

AMD claims that all of its software is open source, yet reality does not match this claim. For example, AMD's rocprof-trace-decoder remains completely closed source despite months of repeated requests from ML community members such as George Hotz, a daily AMD GPU end user. On the same June Spotify podcast, Tobias Macey & AMD's Anush Elangovan (who has a new Twitter pfp) said that "open source allows for innovation to go at the pace at which the people using [AMD] want to, right, and it's not limited by the ability of what we put out in closed source form". We agree with an open-source-first approach, and we agree that AMD should not limit ML community members like George Hotz by continuing to keep rocprof-trace-decoder closed source. When will AMD open source rocprof-trace-decoder?

53,783 views

Something NVIDIA & Google do better than anyone else is software-hardware-system co-design: not just optimizing hardware for current model architectures, but predicting future ones. Back in early 2022, when NVIDIA started the design process for NVL72, MoE (Mixture of Experts) models were not yet the standard, and dense models still dominated the frontier. However, NVIDIA's strong software-hardware co-design culture enabled it to make a calculated bet that MoEs were the future, and it built NVL72 specifically for the best MoE performance per TCO (Total Cost of Ownership). Furthermore, back in 2022, disaggregated prefill and wide expert parallelism (wideEP) MoE inference optimizations hadn't been invented yet, but it turns out that these optimizations work best on large-scale systems like NVL72. While most other AI chip companies' in-house AI labs focus on training small 5B models that mainly use data parallelism, NVIDIA's and Google's in-house AI labs continuously push the boundaries of model architecture and training recipes, such as NVFP4 training. Just like Super Idol & IShowSpeed, there must be a strong partnership between software engineers and hardware engineers to deliver the best systems that maximize performance per TCO.

51,021 views

Racks on racks on racks Racks on Racks, Buildin' Stacks Stackin' GPUs Power Whipping Liquid Flex Drip Check

68,561 views

In late 2023, AMD made its best acquisition to date: NodAI, led by CEO Anush Elangovan. At the time, AMD had a 0% chance of challenging CUDA: while AMD was strong in hardware, it didn't understand software. Since the NodAI acquisition, Anush has driven AMD's AI software strategy and helped reshape the org around the importance of software and software-hardware co-design. As a result, AMD now has a non-zero chance of breaking the CUDA moat. Had NVIDIA acquired NodAI instead, AMD would almost certainly still be stuck at a 0% chance.

40,493 views

A lot of AMD's most cracked engineers who work on the latest disagg inferencing & the MoRI first-principles collective library are in China celebrating major Chinese holidays. Should we put human Anush Elangovan in a Chinese dance battle versus AI?

21,470 views

MI355 disaggregated serving is competitive with B200 disaggregated serving for FP8, but when composing all the optimizations that frontier labs use together, like wide expert parallelism + disagg + FP4 + KV cache offloading, AMD is still jestermaxxing & lackluster. King Anush Elangovan needs to focus on composability of inference optimizations in the ROCm stack in order to defeat the CUDA moat.

22,416 views

Since RCCL is a fork of NCCL, RCCL is basically a copy-paste carbon copy of NCCL, except that it takes months for new features added to NCCL to reach RCCL. This is clearly not optimal for AMD. Building on a competitor's platform means AMD will never be better than NVIDIA, or even reach parity (at iso-time), due to the delays and engineering burden of syncing with upstream. AMD is working on a moonshot project: MORI-CCL aims to be a first-principles, from-scratch rebuild of AMD's collective library software so that it no longer depends on its competitor's software. Ironically, MORI-CCL currently doesn't support AMD's own Pensando NICs, yet it supports NVIDIA ConnectX-7 NICs with AMD GPUs. Support for AMD's Pensando NICs is coming after support for NVIDIA ConnectX-7 NICs.

33,512 views

NVIDIA's CUDA Moat is becoming a Copper Moat with scale-up networking. Bigger models demand more from the scale-up network, and we❤️ Copper

38,731 views

We have been cookin with Seedance-2 🤤

19,502 views

Peter Griffin explains Elon Musk's first-principles move: when Tennessee pushed back on gas turbines for his AI datacenter, Elon just built a gigawatt of gas turbines across the border in Mississippi. Now Colossus-2 will be the first to 1GW.

26,590 views

Choose your fighter 😈

15,389 views

AI Accelerator Chips

26,198 views

Videos

A year ago, the big three were OpenAI, Anthropic, and Google. Things have changed. Moonshot's Kimi K3 sits above Gemini on every composite benchmark, and it goes open source in 10 days. New episode: what K3 reveals about frontier margins, model sizes, and who's actually still in the game. 00:11 Is Kimi K3 the Third Best Model? 04:04 Why Delay the Weights? 05:30 2.8T Parameters and Serving Constraints 06:48 Frontier Margins and the 3x Price Hike 11:10 New Architecture, What Comes Next 14:09 Will Open Source Catch Closed? 19:51 Built for Chinese Accelerators 22:57 The Harness Is the Product 28:49 We're Still Early

100,699 views • 1 day ago
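The "2.8T Parameters and Serving Constraints" chapter can be made concrete with a weights-only memory estimate. The bytes-per-parameter values are the standard sizes for BF16, FP8, and FP4. The FP4 figure ignores the block-scale metadata that real FP4 formats carry. The 192 GB of HBM per GPU is a hypothetical round number. KV cache, activations, and headroom are all excluded, so a real deployment would need more GPUs than this sketch reports.

```python
import math

# Weights-only footprint for a 2.8T-parameter model.
# NOT included: KV cache, activations, runtime overhead, memory headroom.
PARAMS = 2.8e12

# Bytes per parameter; FP4 ignores the block-scale metadata real formats add.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4": 0.5}

HBM_GB_PER_GPU = 192  # hypothetical per-GPU HBM capacity, for illustration only


def weight_footprint_tb(params: float, bytes_per_param: float) -> float:
    """Total weight size in terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


def min_gpus_for_weights(params: float, bytes_per_param: float, hbm_gb: float) -> int:
    """Smallest GPU count whose combined HBM holds the weights, with zero headroom."""
    return math.ceil(params * bytes_per_param / (hbm_gb * 1e9))


for fmt, bpp in BYTES_PER_PARAM.items():
    tb = weight_footprint_tb(PARAMS, bpp)
    gpus = min_gpus_for_weights(PARAMS, bpp, HBM_GB_PER_GPU)
    print(f"{fmt}: {tb:.1f} TB of weights -> at least {gpus} GPUs just to hold them")
```

Under these assumptions, the weights alone span several GPUs even at FP4, before any KV cache is added for long contexts. This is one reason parameter count constrains serving.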

God bless America the land of the free and home of the brave 🫡 250

122,661 views • 15 days ago

Jim Cramer "There is a company that I regard as the gospel, SemiAnalysis. I don't think people realize that SemiAnalysis is the arbiter. They're like God in the semis, and when they do, when they bless something. They are the most honest guys I've come across"

513,175 views • 3 months ago

to be clear, NVIDIA is NOT a car

282,792 views • 3 months ago

DeepSeek V4 ships with two variants, allowing it to hit 1M context. Kimbo @ ICML26 breaks down the attention changes and the Mega MOE speeding up compute.

39,693 views • 13 days ago

STEVE JOBS COMES BACK ALIVE TO ANNOUNCE HIS GREATEST PRODUCT YET

88,748 views • 1 month ago

After studying 300 Leetcode Hards, solving every Jane Street puzzle from the Dwarkesh ads, and watching one Horace He lecture, he finally landed the $400k annualized Jane Street internship. Unfortunately, during onboarding his manager said “this diff is negative alpha,” so Jane Street deployed an AI model to translate all feedback into HR-safe speech in real time.

137,241 views • 2 months ago

If vLLM vs SGLang can compute to one conclusion, it's gotta be that competition is beneficial. Cam Quilici explains why InferenceX showcases in isolation.

25,906 views • 11 days ago

MYTHOS IS HERE, MYTHOS IS HERE!!

56,128 views • 29 days ago

Running a single deep coding model at max context on Cerebras requires 24 systems ($24M Capex) just to support 256 concurrent users. At that scale, $100M gets you way more memory bandwidth in standard GB300 racks.

93,448 views • 1 month ago
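The Cerebras unit economics can be broken down directly from the caption's own figures: 24 systems, $24M capex, and 256 concurrent users. The budget helper assumes capex scales linearly with concurrent users, which is a simplification. No GB300 figures are modeled here.

```python
# Figures taken from the caption: 24 systems, $24M capex, 256 concurrent users.
SYSTEMS = 24
CAPEX_USD = 24_000_000
CONCURRENT_USERS = 256

capex_per_system = CAPEX_USD / SYSTEMS         # implied capex per Cerebras system
users_per_system = CONCURRENT_USERS / SYSTEMS  # concurrent users each system serves
capex_per_user = CAPEX_USD / CONCURRENT_USERS  # capex per concurrent user


def users_for_budget(budget_usd: float) -> int:
    """Concurrent users a budget buys, ASSUMING capex scales linearly
    with users at the caption's rate (a simplification)."""
    return int(budget_usd // capex_per_user)


print(f"${capex_per_system:,.0f} per system")                # $1,000,000 per system
print(f"{users_per_system:.2f} concurrent users per system")  # 10.67 concurrent users per system
print(f"${capex_per_user:,.0f} per concurrent user")         # $93,750 per concurrent user
print(users_for_budget(100_000_000))                          # 1066
```

At the caption's rate, $100M of Cerebras capacity serves roughly 1,066 concurrent users for this workload. That number is the baseline the GB300 comparison is implicitly made against.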

Ex-OpenAI tech lead Justin Lebar joins SemiAnalysis as a Visiting Fellow to burn $10,000 in 3 hours finding dozens of AMDGPU LLVM, x86 LLVM, and NVPTX bugs 00:00 - Intro & Justin’s background 00:59 - How compiler fuzzing works 01:56 - Why we did this project 02:48 - The gap in GPU vs. CPU compiler testing 04:13 - The major AMD & x86 bugs we found 05:38 - Using LLMs to read code & find vulnerabilities 07:56 - The impact of UltraCode mode 12:18 - Doing this without AI (Time & manual limits) 15:03 - The future of AI in software development 16:17 - What’s next + key takeaways for devs

73,240 views • 1 month ago

Cerebras represents a whole NVL72 rack on a single wafer. By routing around defects and staying on-die, they bypass the networking power bottleneck that traditional GPU clusters face.

82,080 views • 1 month ago

It has become extremely trendy among some SF AI researchers to donate to shrimp welfare. They estimate that they help improve the welfare of 1,500 shrimps per year for every $1 donated. Why do they donate to shrimps? They claim that it is the most cost-effective way of reducing suffering of sentient beings. Note that the Shrimp Welfare non-profit does not actually prevent shrimps from being killed but instead promotes the use of electrical stunning as a more humane slaughtering method that aligns with the goal of reducing shrimp suffering.

265,476 views • 8 months ago

Anthropic may have built themselves into an innovator's dilemma with Claude's CLI focus while the real AI agent revolution needs something much bigger.

77,682 views • 2 months ago

The next-gen Cerebras CS4 is staying on 5nm. Why? Because going to 3nm doesn't magically fix the fact that SRAM scaling has completely flattened.

55,749 views • 1 month ago

Long-term memory agreements have historically been the sign of the top. Micron prints incredible earnings, stock sells off. Everyone rotates out. But the prepayment terms and pricing floors this cycle look nothing like prior rounds. Our Core Research team breaks down why they're still bullish.

96,287 views • 3 months ago

Between colleagues attending dates with robots and IShowSpeed getting mogged by a rizzbot, man is one step away from humanoid kind. EPISODE 16 LIVE NOW!

22,393 views • 25 days ago

can i get a KA-CHOW!

65,661 views • 3 months ago

"I don't need to change the world overnight. I'm gonna change the world over the next 50 years" - Jensen Huang, former Denny’s dishwasher

179,735 views • 10 months ago

Peter Griffin decodes the full AI Stack. AI Factory: power & water in, intelligence out.

197,561 views • 11 months ago