Inference Chips for Agent Workflows (Diana): Most AI chips are designed for "prompt in, response out." Agents don't work that way. They loop, branch, and hold context across dozens of steps, and current GPUs hit 30–40% utilization as a result. That gap is where purpose-built silicon wins.
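
One way to read the 30–40% figure is as the share of wall-clock time an accelerator spends computing while an agent loop also waits on tool calls and other non-model work. The sketch below is illustrative only: the `AgentStep` type, the `utilization` helper, and the per-step timings are hypothetical and are not taken from the video.

```python
from dataclasses import dataclass


@dataclass
class AgentStep:
    """One iteration of an agent loop (hypothetical timings, in seconds)."""
    compute_s: float  # time the accelerator spends generating tokens
    wait_s: float     # time spent idle on tool calls, I/O, or orchestration


def utilization(steps: list[AgentStep]) -> float:
    """Fraction of total wall-clock time spent on accelerator compute."""
    busy = sum(s.compute_s for s in steps)
    total = sum(s.compute_s + s.wait_s for s in steps)
    return busy / total if total > 0 else 0.0


# Assumed workload: 20 steps, each 0.7 s of compute followed by 1.3 s of waiting.
steps = [AgentStep(compute_s=0.7, wait_s=1.3) for _ in range(20)]
agent_utilization = utilization(steps)  # about 0.35 under these assumed timings
```

Under these assumed timings the accelerator is busy roughly a third of the time. The real split between compute and waiting depends on the workload and is not given in the source.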

Y Combinator

1,613,487 subscribers

706,963 views • 1 month ago • via X (Twitter)

Education, Science & Technology

Related videos

Chips are the foundation of every AI experience. That's why understanding the hardware behind the software has never mattered more. When you ask a chatbot a question or generate an image, there's a physical chip somewhere doing the heavy lifting. For most of computing history, that was a CPU—the "brain" of a computer—great for general tasks needed to run software and operating systems. AI is more complex: for workloads called training and inference, AI needs to perform trillions of calculations in parallel. That's where AI accelerators come in. Purpose-built accelerators can deliver significantly better performance and efficiency than general-purpose chips. Amazon Web Services Trainium chips are an example—purpose-built for AI training and inference. New chips for a new era. ⬇️

Amazon

21,916 views • 17 days ago

Dojo 2, Dojo 3 Chips and AI5 Chips Are Critical for Tesla and Battle for Future of AI. Tesla and xAI are buying about one million AI GPUs this year. This would be about 20% of the entire Nvidia production of B200/B300 chips that will be made in 2025. If xAI were to buy a mix of half Dojo 2/3 chips in 2025 and 2026, this could make them bigger than AMD. AMD is making about 400,000 AI GPU chips. Tesla will switch to AI5 chips for driving its cars, but those chips will also act as inference chips with about 500 teraflops of compute. Tesla could make 3-4 million AI5 chips in 2026. This would make Tesla the largest maker and user of AI inference chips in the world. Tesla could become the number 2 maker of AI training chips just by supplying a lot of those chips to xAI and for Tesla's internal use. Tesla is and will be the number one maker and user of AI inference chips. Sawyer Merritt Elon Musk Randy Kirk Herbert Ong Warren Redlich - Chasing Dreams 🇺🇸 Ale𝕏andra Merz 🇺🇲 Ray Emmet Peppers Bradford Ferguson

nextbigfuture

23,809 views • 1 year ago

We are innovating the full-stack AI infrastructure platform on a one-year rhythm — pushing the boundaries of inference performance across chips, systems, and software. NVIDIA Blackwell is the most advanced AI platform ever built — continuously optimized for record-breaking performance. Groundbreaking advancements include the NVIDIA-designed NVFP4 format, enabling high inference performance and accuracy. #GTCParis at #VivaTech

NVIDIA

80,342 views • 1 year ago

No wonder Jensen got so defensive over $NVDA GPUs against custom chips. $GOOG will start using different custom chips for inference and training. $AMZN has already adopted this approach as they have Trainium for training and Inferentia for inference. Hyperscalers obviously see better price/performance in this approach, and this is where the industry is headed. $AVGO and $MRVL are poised to be big winners, while $NVDA dominance will be under increasing pressure.

Oguz Erkan

223,244 views • 1 month ago

Electronics in Space (Philip Johnston): Reusable rockets are about to dramatically increase humanity's capacity to put things in space, and that means an enormous new market for compute built to operate there. We want to see inference chips optimized for mass, thermal, and radiation tolerance, built for a world where space is no longer out of reach.

Y Combinator

87,853 views • 1 month ago

AI answers your questions in seconds, but behind that speed is something called inference—the compute-intensive process where trained models generate responses. At AWS, we've built custom chips like Trainium, intelligent routing systems, and unified infrastructure to make inference faster and more affordable. As AI agents handle complex multi-step tasks, inference accounts for 80-90% of AI computing power. We're engineering at planetary scale to keep those milliseconds reliable.

Amazon Web Services

27,021 views • 3 months ago

We are introducing Felix. Felix is a purpose-built agent for high finance, designed for long-running, complex workflows and capable of producing decks, models, and documents end-to-end. Felix executes so you can focus where it matters.

Rogo

908,689 views • 1 month ago

NVIDIA Vera Rubin is in full production and arrives just in time for the next frontier of AI. The #NVIDIARubin platform uses extreme co-design across six new chips to accelerate agentic AI, advanced reasoning and massive-scale MoE model inference. NVIDIA Vera Rubin NVL72: Six new chips, one AI supercomputer — see how we built it.

NVIDIA Data Center

27,643 views • 5 months ago

Just watched a demo of Merge Agent Handler and this actually looks pretty useful if you’re building AI agents. Most agents can generate responses, but getting them to actually take actions across real tools is still messy. Integrations are one thing, but handling auth, permissions, and security usually makes it way more complicated than it should be. What I liked is that it feels built for real workflows — you can actually see what your agent is doing instead of just hoping it works. Feels like a practical layer between AI agents and real work getting done. Worth checking out:

Kawsar

31,812 views • 3 months ago

We are announcing new innovations across our datacenter fleet, including the latest AI optimized silicon from our industry partners and two new Microsoft-designed chips. #MSIgnite

Microsoft

278,248 views • 2 years ago

AI agents are UNRELIABLE because you're giving them too much control. Instead of using a single prompt, structure your agent as a series of high level steps. In this demo, I'm running the same steps across two different websites. Good steps = good output. It's that simple!

Paul Klein IV

71,482 views • 1 year ago

SITUATION EXPLAINED: Cerebras raised $5.55 billion in its IPO and closed its first day of trading valued at $66 billion, making it the biggest US tech IPO since Snowflake in 2020. Cerebras makes Wafer-Scale Engine chips built for AI inference.

We asked Sarah Fong the main difference between wafer-scale chips and traditional GPUs:
- GPUs are great at parallel work (graphics, training)
- AI inference is sequential, i.e. one token at a time

This causes the "memory wall" problem:
- Every GPU core needs model weights, KV cache, and activations to do its math
- On a GPU, that data lives in off-chip memory (HBM)
- Cores constantly load and offload from off-chip memory, which is a huge bottleneck; hardware accounts for ~70% of inference latency

Cerebras' chips:
- Dinner-plate sized (vs. GPUs, which are palm-sized) with tens of thousands of cores
- Memory sits directly on top of the cores as distributed SRAM
- Weights and KV cache can be accessed at on-chip speeds in the PB/s range, compared with off-chip speeds in the TB/s range achieved by GPUs with HBM.
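
The memory-wall argument above can be made concrete with a back-of-the-envelope bound: if every generated token has to stream all model weights from memory once, the single-stream decode rate cannot exceed bandwidth divided by weight size. The sketch below is a rough illustration, not a benchmark. The model size, precision, and both bandwidth figures are hypothetical values chosen only to sit in the "TB/s" and "PB/s" ranges the post mentions, and it ignores KV-cache traffic, batching, compute limits, and whether the weights fit in on-chip memory at all.

```python
def max_decode_tokens_per_s(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when each token requires reading all weights once."""
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical model: 70 billion parameters stored at 2 bytes each (16-bit).
MODEL_PARAMS = 70e9
BYTES_PER_PARAM = 2
weight_bytes = MODEL_PARAMS * BYTES_PER_PARAM  # 140 GB of weights

# Hypothetical bandwidths, picked only to match the orders of magnitude in the post.
OFF_CHIP_BANDWIDTH = 3e12  # 3 TB/s, off-chip HBM range
ON_CHIP_BANDWIDTH = 1e15   # 1 PB/s, on-chip SRAM range

off_chip_rate = max_decode_tokens_per_s(weight_bytes, OFF_CHIP_BANDWIDTH)
on_chip_rate = max_decode_tokens_per_s(weight_bytes, ON_CHIP_BANDWIDTH)
speedup = on_chip_rate / off_chip_rate

print(f"off-chip bound: {off_chip_rate:.1f} tokens/s")
print(f"on-chip bound:  {on_chip_rate:.1f} tokens/s")
print(f"ratio:          {speedup:.0f}x")
```

With these assumed numbers, the bound is set entirely by the bandwidth ratio, which is why moving weights and KV cache on-chip matters so much for token-at-a-time workloads.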

MTS

44,057 views • 25 days ago

Folks are spending hundreds of dollars for AI agents that don't even do what they want. No longer. PRISM AI Agents is now available and open-source. Create your own agent in minutes, for free, directly from GitHub:

miles

126,894 views • 1 year ago

This is NVIDIA Rubin. Six new chips designed to deliver one incredible AI supercomputer. Built with extreme codesign across compute, networking, and software, Rubin sets a new standard for building and deploying the most advanced AI systems at the lowest possible cost. #CES2026

NVIDIA Newsroom

31,665 views • 5 months ago

$NVDA CEO Jensen Huang says GPUs are effectively sold out across the cloud with availability so tight that even renting older-generation chips has become difficult. Spot prices for GPU rentals are rising as a result which favors $NBIS, $IREN & $CIFR as demand pushes through limited capacity.

Shay Boloor

212,563 views • 4 months ago

EIP-8004 is coming to the Nova architecture, a trustless infrastructure for AI agents that introduces key on-chain registries, enabling agents to interact safely across the Shido Network. These core components allow autonomous AI agents to verify identity, build reputation, and collaborate without relying on a centralized platform. The result is a decentralized trust layer for agent-to-agent economies, where agents can autonomously discover, evaluate, and work with one another across the Shido ecosystem.

Shido

390,734 views • 3 months ago

$AMZN CEO: “If you are building a big inference business and want decent margins, not having your custom chips is a disadvantage.” 40% of $AMZN compute comes from its custom chips. Result? AWS margins climb higher while peers lag, according to SemiAnalysis.

Oguz Erkan

202,020 views • 16 days ago

Greg Brockman on AI-designed chips and the future of compute
> OpenAI used its own models to design its chips, faster and probably better than human engineers
> In the future, everyone would have an agent that works 24/7 for them. The restriction, at the moment: compute

Chubby♨️

209,854 views • 8 months ago

$AMZN CEO: “Customers badly want better price/performance. This is why we build custom chips.” If $AMZN's chip business were a standalone company, it would be at $50 billion ARR. Let that sink in. It’ll only accelerate from here as the bulk of AI workloads shifts to inference and inference itself becomes more agentic. This will not only result in cheaper inference for customers, but it’ll also expand AWS margins by 7-8%, as Jassy said in the earnings call. Imagine where $AMZN's valuation would be if AWS growth stays above 20% for the next decade and net margin stays above 30% thanks to custom chips. Long $AMZN.

Oguz Erkan

61,999 views • 1 month ago

Google’s new Agent2Agent protocol is a key step toward a future of AI Agent interoperability. No single tool has all the data a user or business needs for most workflows, so we need Agents to be able to talk to each other. Salesforce will have AI Agents that understand the inner workings of CRM, Workday will have AI Agents that understand HR workflows, Box has AI Agents that understand content and documents, and so on. It’s easy to conceive of a world where we have tens of thousands of these tool Agents and then billions or trillions of customized Agents that are extensions of those. So openness becomes key. Workflows need data from multiple systems to complete the task — like a sales report that needs documents and CRM data or an HR task that needs HR policies and employee details — and that’s where A2A comes in. This protocol gives AI Agent providers a way of talking to each other and simplifying the way that Agents communicate, understand each other’s capabilities, and so on. It’s great to see more interoperability, along with MCP, emerging in the AI space right now, as this will define the future of most IT stacks.

Aaron Levie

139,591 views • 1 year ago