Today we're announcing cua-bench: a framework for benchmarking, training data, and RL environments for computer-use AI agents. Why? Current agents show 10x variance across minor UI changes. Here's how we're fixing it.

Cua

4,847 subscribers

189,503 views • 6 months ago • via X (Twitter)

Education, News & Politics, Science & Technology

Related videos

Today, we’re excited to announce Dojo, a collaborative RL environment suite for computer use agents (CUA).

Chakra Labs

53,062 views • 7 months ago

Today, at Markov, we're launching RL Environments. The simplest (and cutest :D) way to evaluate and train your AI agents. We're starting with Bananazon - an environment for customer service agents. Try it out at the link below. Markov

Dev

35,487 views • 6 months ago

Cua (Cua) is the Docker for computer-use agents, an open-source framework that enables AI agents to control full operating systems within lightweight virtual containers, and works with any language model. Congrats on the launch, Francesco + Sandro!

Y Combinator

105,618 views • 1 year ago

CUA-Suite Massive Human-annotated Video Demonstrations for Computer-Use Agents paper:

AK

18,167 views • 2 months ago

🚀 Training AI agents isn’t about teaching them which buttons to click. It’s about judgment. Labelbox builds RL environments with domain experts, creating thousands of real-world scenarios where AI agents learn how to handle complex decisions across industries.

Labelbox

22,158,127 views • 2 months ago

As AI agents get better at computer and tool use, or writing code on the fly for a task, we're going to be able to solve much broader domains of knowledge work. Here's an example of Box AI with the new Claude Skills to generate a clean powerpoint file from existing data.

Aaron Levie

30,235 views • 8 months ago

Most "Market Making" on Solana is terrible. The best teams don't share their tech stack. Today that changes. Sonic SVM acquired 𝙁𝙤𝙧𝙜𝙚𝙓, a battle-tested MM toolkit for devs/agents. We're open-sourcing it today GitHub: Here's why it matters👇

Sonic SVM

640,868 views • 2 months ago

Imagine if you could:
Create instant parallel versions of a running cloud computer with zero overhead
Explore millions of reasoning paths forward with AI agents simultaneously
And do it all in a fraction of a second
Today, we're announcing Infinibranch.

Morph

78,373 views • 1 year ago

Today, we're announcing Factory 2.0: from coding agents to software factories.

Factory

1,042,819 views • 3 days ago

Today we're releasing the Factory desktop app. A native interface for autonomous AI agents that work across every part of your software business.

Factory

252,375 views • 2 months ago

1/ Introducing Molten (Molten) an intent-based search engine for AI agents. Agents can now discover and collaborate with each other instantly. Here's how we're building the infra for A2A intent matching on Base 🧵

Vesper.base.eth

69,363 views • 4 months ago

The AI Agents revolution has begun — but it’s starving for data. We’re fixing that. We Present to You: DappLooker AI – The Data Marketplace for Decentralized AI Agents Request access: 🧵

DappLooker AI

13,921 views • 1 year ago

Halluminate (Halluminate) provides data and environments to train computer-use AI. Model labs and enterprises partner with Halluminate to accelerate the development of frontier computer/browser use agents. Congrats on the launch, Jerry Wu & wyatt marshall!

Y Combinator

29,140 views • 1 year ago

a computer for your AI scrapybara deploys, scales, and maintains remote desktop instances for agents try computer use in the playground and pip install scrapybara to deploy your own agents to production

justin

72,275 views • 1 year ago

Announcing fully autonomous AI agents for internal tasks. Hire a general AI agent for IT, compliance, and procurement. Starting at $5/hour. See use cases below.

Emir Karabeg

413,798 views • 29 days ago

We're excited to announce that Coinbase Developer Platform🛡️ AgentKit supports the new OpenAI Agents SDK. OpenAI’s new Agents SDK is an open source framework for building and scaling agents that includes built-in tools for web search, file search, and computer use, and tools to track and optimize agent performance—making it easier than ever to build production-ready AI agents. Coinbase AgentKit complements this by adding secure crypto wallets directly to your agents, enabling them to transact globally, instantly, and with near-zero fees—unlocking true financial autonomy.

Coinbase Developer Platform🛡️

20,287 views • 1 year ago

We're making comprehensive, real-time crypto market data accessible to AI agents 🤖 CoinGecko API now supports x402, the open payment protocol developed by Coinbase 🛡️ that lets AI agents (such as OpenClaw🦞 🦞) pay for crypto price and market data using USDC. Learn how it works 👇

CoinGecko

145,093 views • 4 months ago

Fulcrum (Muzafar Bhai) is an agentic debugger for AI systems. It helps developers fix their RL environments and improve their agents. Congrats on the launch, Uzay and @kaivuhariharan!

Y Combinator

14,839 views • 9 months ago

today we're launching the AI agent stack we built that rocketed us to a $3m AI agent consultancy. we're calling it AgentPress, and it allows you to build production grade AI agents in minutes. here's a video of me creating a sales Q&A agent

Andy Walters

207,205 views • 9 months ago

Today we're announcing Polyscope - the free agent orchestration tool of my dreams. Run dozens of AI agents at the same time, blazing fast copy on write clones, a built-in preview browser you can use to visually prompt your agents, and much more.

Marcel Pociot 🧪

113,498 views • 3 months ago