Today we're announcing cua-bench: a framework for benchmarking, training data, and RL environments for computer-use AI agents. Why? Current agents show 10x variance across minor UI changes. Here's how we're fixing it.

Cua

4,847 subscribers

189,503 views • 6 months ago • via X (Twitter)

Education News & Politics Science & Technology

Related Videos

Today, we’re excited to announce Dojo, a collaborative RL environment suite for computer use agents (CUA).

Chakra Labs

53,062 views • 7 months ago

Today, at Markov, we're launching RL Environments. The simplest (and cutest :D) way to evaluate and train your AI agents. We're starting with Bananazon - an environment for customer service agents. Try it out at the link below.

Dev

35,487 views • 6 months ago

Cua (Cua) is the Docker for computer-use agents, an open-source framework that enables AI agents to control full operating systems within lightweight virtual containers, and works with any language model. Congrats on the launch, Francesco + Sandro!

Y Combinator

105,618 views • 1 year ago

CUA-Suite Massive Human-annotated Video Demonstrations for Computer-Use Agents paper:

AK

18,167 views • 2 months ago

🚀 Training AI agents isn’t about teaching them which buttons to click. It’s about judgment. Labelbox builds RL environments with domain experts, creating thousands of real-world scenarios where AI agents learn how to handle complex decisions across industries.

Labelbox

22,158,127 views • 2 months ago

As AI agents get better at computer and tool use, or writing code on the fly for a task, we're going to be able to solve much broader domains of knowledge work. Here's an example of Box AI with the new Claude Skills to generate a clean powerpoint file from existing data.

Aaron Levie

30,235 views • 8 months ago

Most "Market Making" on Solana is terrible. The best teams don't share their tech stack. Today that changes. Sonic SVM acquired 𝙁𝙤𝙧𝙜𝙚𝙓, a battle-tested MM toolkit for devs/agents. We're open-sourcing it today GitHub: Here's why it matters👇

Sonic SVM

638,203 views • 2 months ago

Imagine if you could: Create instant parallel versions of a running cloud computer with zero overhead Explore millions of reasoning paths forward with AI agents simultaneously And do it all in a fraction of a second Today, we're announcing Infinibranch.

Morph

78,373 views • 1 year ago

Today we're releasing the Factory desktop app. A native interface for autonomous AI agents that work across every part of your software business.

Factory

252,375 views • 2 months ago

Today, we're announcing Factory 2.0: from coding agents to software factories.

Factory

1,037,652 views • 2 days ago

1/ Introducing Molten (Molten) an intent-based search engine for AI agents. Agents can now discover and collaborate with each other instantly. Here's how we're building the infra for A2A intent matching on Base 🧵

Vesper.base.eth

69,363 views • 4 months ago

The AI Agents revolution has begun — but it’s starving for data. We’re fixing that. We Present to You: DappLooker AI – The Data Marketplace for Decentralized AI Agents Request access: 🧵

DappLooker AI

13,921 views • 1 year ago

Halluminate (Halluminate) provides data and environments to train computer-use AI. Model labs and enterprises partner with Halluminate to accelerate the development of frontier computer/browser use agents. Congrats on the launch, Jerry Wu & wyatt marshall!

Y Combinator

29,140 views • 1 year ago

a computer for your AI scrapybara deploys, scales, and maintains remote desktop instances for agents try computer use in the playground and pip install scrapybara to deploy your own agents to production

justin

72,275 views • 1 year ago

Announcing fully autonomous AI agents for internal tasks. Hire a general AI agent for IT, compliance, and procurement. Starting at $5/hour. See use cases below.

Emir Karabeg

413,798 views • 28 days ago

We're excited to announce that Coinbase Developer Platform🛡️ AgentKit supports the new OpenAI Agents SDK. OpenAI’s new Agents SDK is an open source framework for building and scaling agents that includes built-in tools for web search, file search, and computer use, and tools to track and optimize agent performance—making it easier than ever to build production-ready AI agents. Coinbase AgentKit complements this by adding secure crypto wallets directly to your agents, enabling them to transact globally, instantly, and with near-zero fees—unlocking true financial autonomy.

Coinbase Developer Platform🛡️

20,287 views • 1 year ago

Fulcrum (Muzafar Bhai) is an agentic debugger for AI systems. It helps developers fix their RL environments and improve their agents. Congrats on the launch, Uzay and @kaivuhariharan!

Y Combinator

14,839 views • 9 months ago

We're making comprehensive, real-time crypto market data accessible to AI agents 🤖 CoinGecko API now supports x402, the open payment protocol developed by Coinbase 🛡️ that lets AI agents (such as OpenClaw 🦞) pay for crypto price and market data using USDC. Learn how it works 👇

CoinGecko

145,052 views • 4 months ago

today we're launching the AI agent stack we built that rocketed us to a $3m AI agent consultancy. we're calling it AgentPress, and it allows you to build production grade AI agents in minutes. here's a video of me creating a sales Q&A agent

Andy Walters

207,205 views • 9 months ago

Today we're announcing Polyscope - the free agent orchestration tool of my dreams. Run dozens of AI agents at the same time, blazing fast copy on write clones, a built-in preview browser you can use to visually prompt your agents, and much more.

Marcel Pociot 🧪

113,498 views • 3 months ago