SemiAnalysis

@SemiAnalysis_ • 93,698 subscribers

Shorts

hi i'm semiman

51,782 views

Jensen giving out Pocky sticks in Asia. Just like how some neoclouds feel that NVIDIA gives brownie points to neoclouds that are marketing or selling NVIDIA software that no AI-native end user wants, like MIG, NIMS, and NVAIE. Some neoclouds feel that NVIDIA gives brownie points to them for buying NVIDIA BlueField-3 DPUs even though they are nowhere near having the skillset to properly use them.

302,452 views

TSMC 2nm process data leak investigation🧵 Taipei time August 5th, a trade secret leak case involving TSMC's 2nm technology was uncovered. 9 engineers were suspected of leaking confidential information, and 3 of them were detained by the Intellectual Property and Commercial Court.

351,136 views

A Day in the Life of Dwarkesh Patel's Cousin

50,137 views

Congrats to Dylan's cousin, Dwarkesh Patel, on reaching 1 million subscribers on YouTube! Dylan & Dwarkesh Patel's family come from Gujarat, but they spent most of their lives in America, growing up mainly in Econo Lodge motels.

112,546 views

This is how OpenAI comes back and beats Anthropic

41,939 views

AMD's software quality has massively improved since AMD's DC GPU division went hardcore mode back in January 2025. It isn't just us saying this; many of AMD's Instinct GPU customers are saying it too. Great work by Anush Elangovan's team of amazing engineers. 🥳

65,099 views

AMD claims that all their software is open source, yet reality does not match this claim. For example, AMD's rocprof-trace-decoder is still completely closed source despite months of repeated requests from ML community members such as George Hotz, who is a daily AMD GPU end user. On the same June Spotify podcast, Tobias Macey & AMD's Anush Elangovan (who has a new Twitter pfp) said that "open source allows for innovation to go at the pace at which the people using [AMD] want to, right, and it's not limited by the ability of what we put out in closed source form". We agree with an open-source-first approach, and we agree that AMD should not limit ML community members like George Hotz by continuing to keep rocprof-trace-decoder closed source. When will AMD open source rocprof-trace-decoder?

53,783 views

A lot of AMD's most cracked engineers who work on the latest disagg inferencing & MoRI first-principles collective library are in China celebrating major Chinese holidays. Should we get human Anush Elangovan in a Chinese dance battle versus AI?

21,470 views

Something NVIDIA & Google do better than anyone else is software-hardware-system co-design, and not just optimizing hardware for current model architectures, but predicting future ones. Back in early 2022, when NVIDIA started the design process for NVL72, MoE (Mixture of Experts) models were not yet the standard, and dense models were still dominant for frontier models. However, NVIDIA's strong software-hardware co-design culture enabled them to make a calculated bet that MoEs were the future, and they built NVL72 specifically for best MoE performance per TCO (Total Cost of Ownership). Furthermore, back in 2022, disaggregated prefill and wide expert parallelism (wideEP) MoE inference optimizations hadn't been invented yet, but it turns out that these MoE inference optimizations work best on large-scale systems like NVL72. While most other AI chip companies' in-house AI labs focus on training small 5B models that mainly use data parallelism, NVIDIA and Google's in-house AI labs continuously push the boundaries of model architecture and training recipes, such as NVFP4 training. Just like Super Idol & IShowSpeed, there must be a strong partnership between software engineers and hardware engineers to deliver the best systems that maximize performance per TCO.

51,021 views

Racks on racks on racks Racks on Racks, Buildin' Stacks Stackin' GPUs Power Whipping Liquid Flex Drip Check

68,561 views

In late 2023, AMD made its best acquisition to date: Nod.ai, led by CEO Anush Elangovan. At the time, AMD had a 0% chance of challenging CUDA: while AMD was strong in hardware, it didn't understand software. Since the Nod.ai acquisition, Anush has driven AMD's AI software strategy and helped reshape the org around the importance of software and software-hardware co-design. As a result, AMD now has a non-zero chance of breaking the CUDA moat. Had NVIDIA acquired Nod.ai instead, AMD would almost certainly still be stuck at a 0% chance.

39,741 views

MI355 disaggregated serving is competitive with B200 disaggregated serving for FP8, but when composing all the optimizations that frontier labs use together, like wide expert parallelism + disagg + FP4 + KV cache offloading, AMD is still jestermaxxing & lackluster. King Anush Elangovan needs to focus on composability of inference optimizations for the ROCm stack in order to defeat the CUDA moat.

22,416 views

Since RCCL is a fork of NCCL, RCCL is basically a copy+paste carbon copy of NCCL, except that it takes months for new features added to NCCL to reach RCCL. It is clearly not optimal for AMD to build on its competitor's platform, as that means AMD will never be better than NVIDIA or even reach parity (iso-time) due to the delays and engineering burden of syncing with upstream. AMD is working on a moonshot project: MORI-CCL aims to be a first-principles, from-scratch rebuild of AMD's collective library software so that it does not depend on its competitor's software. Ironically enough, MORI-CCL currently doesn't support AMD's Pensando NICs, yet it supports NVIDIA ConnectX-7 NICs with AMD GPUs. Support for AMD's Pensando NICs is coming after support for NVIDIA ConnectX-7 NICs.

33,430 views

We have been cookin with Seedance-2 🤤

19,502 views

NVIDIA's CUDA Moat is becoming a Copper Moat with scale-up networking. Bigger models demand more from the scale-up network, and we ❤️ Copper

38,731 views

Choose your fighter 😈

15,389 views

Peter Griffin explains Elon Musk's first-principles move: when Tennessee pushed back on gas turbines for his AI datacenter, Elon just built a gigawatt of gas turbines across the border in Mississippi. Now Colossus-2 will be the first to 1GW.

26,590 views

AI Accelerator Chips

26,198 views

"Speed is the Moat" - Anush Elangovan. Last year, AMD software was NP-hard to use. After Anush & Lisa 10x'd the sense of urgency, the software is way better.

17,072 views

Videos

to be clear, NVIDIA is NOT a car

SemiAnalysis

280,900 views • 1 month ago

Technical breakdown of tokenizer improvements from GPT 4.6 to 4.7

SemiAnalysis

37,784 views • 18 days ago

can i get a KA-CHOW!

SemiAnalysis

65,289 views • 1 month ago