Today we're announcing cua-bench: a framework for benchmarking, training data, and RL environments for computer-use AI agents. Why? Current agents show 10x variance across minor UI changes. Here's how we're fixing it.

Cua

4,847 subscribers

189,503 views • 6 months ago • via X (Twitter)

Education, News & Politics, Science & Technology

Related videos

Today, we’re excited to announce Dojo, a collaborative RL environment suite for computer use agents (CUA).

Chakra Labs

52,958 views • 7 months ago

Today, at Markov, we're launching RL Environments. The simplest (and cutest :D) way to evaluate and train your AI agents. We're starting with Bananazon - an environment for customer service agents. Try it out at the link below. Markov

Dev

35,487 views • 5 months ago

Cua (Cua) is the Docker for computer-use agents, an open-source framework that enables AI agents to control full operating systems within lightweight virtual containers, and works with any language model. Congrats on the launch, Francesco + Sandro!

Y Combinator

105,618 views • 1 year ago

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents paper:

AK

18,167 views • 2 months ago

🚀 Training AI agents isn’t about teaching them which buttons to click. It’s about judgment. Labelbox builds RL environments with domain experts, creating thousands of real-world scenarios where AI agents learn how to handle complex decisions across industries.

Labelbox

22,158,127 views • 2 months ago

Most "Market Making" on Solana is terrible. The best teams don't share their tech stack. Today that changes. Sonic SVM acquired 𝙁𝙤𝙧𝙜𝙚𝙓, a battle-tested MM toolkit for devs/agents. We're open-sourcing it today GitHub: Here's why it matters👇

Sonic SVM

634,381 views • 2 months ago

As AI agents get better at computer and tool use, or writing code on the fly for a task, we're going to be able to solve much broader domains of knowledge work. Here's an example of Box AI with the new Claude Skills to generate a clean powerpoint file from existing data.

Aaron Levie

30,235 views • 8 months ago

Imagine if you could: Create instant parallel versions of a running cloud computer with zero overhead Explore millions of reasoning paths forward with AI agents simultaneously And do it all in a fraction of a second Today, we're announcing Infinibranch.

Morph

78,373 views • 1 year ago

Today we're releasing the Factory desktop app. A native interface for autonomous AI agents that work across every part of your software business.

Factory

252,151 views • 2 months ago

1/ Introducing Molten (Molten) an intent-based search engine for AI agents. Agents can now discover and collaborate with each other instantly. Here's how we're building the infra for A2A intent matching on Base 🧵

Vesper.base.eth

69,363 views • 4 months ago

The AI Agents revolution has begun — but it’s starving for data. We’re fixing that. We Present to You: DappLooker AI – The Data Marketplace for Decentralized AI Agents Request access: 🧵

DappLooker AI

13,921 views • 1 year ago

Halluminate (Halluminate) provides data and environments to train computer-use AI. Model labs and enterprises partner with Halluminate to accelerate the development of frontier computer/browser use agents. Congrats on the launch, Jerry Wu & wyatt marshall!

Y Combinator

29,140 views • 1 year ago

a computer for your AI scrapybara deploys, scales, and maintains remote desktop instances for agents try computer use in the playground and pip install scrapybara to deploy your own agents to production

justin

72,275 views • 1 year ago

Announcing fully autonomous AI agents for internal tasks. Hire a general AI agent for IT, compliance, and procurement. Starting at $5/hour. See use cases below.

Emir Karabeg

413,590 views • 25 days ago

Fulcrum (Muzafar Bhai) is an agentic debugger for AI systems. It helps developers fix their RL environments and improve their agents. Congrats on the launch, Uzay and @kaivuhariharan!

Y Combinator

14,839 views • 9 months ago

We're excited to announce that Coinbase Developer Platform🛡️ AgentKit supports the new OpenAI Agents SDK. OpenAI’s new Agents SDK is an open source framework for building and scaling agents that includes built-in tools for web search, file search, and computer use, and tools to track and optimize agent performance—making it easier than ever to build production-ready AI agents. Coinbase AgentKit complements this by adding secure crypto wallets directly to your agents, enabling them to transact globally, instantly, and with near-zero fees—unlocking true financial autonomy.

Coinbase Developer Platform🛡️

20,287 views • 1 year ago

We're making comprehensive, real-time crypto market data accessible to AI agents 🤖 CoinGecko API now supports x402, the open payment protocol developed by Coinbase 🛡️ that lets AI agents (such as OpenClaw🦞 🦞) pay for crypto price and market data using USDC. Learn how it works 👇

CoinGecko

145,052 views • 4 months ago

today we're launching the AI agent stack we built that rocketed us to a $3m AI agent consultancy. we're calling it AgentPress, and it allows you to build production grade AI agents in minutes. here's a video of me creating a sales Q&A agent

Andy Walters

207,205 views • 9 months ago

Today we're announcing Polyscope - the free agent orchestration tool of my dreams. Run dozens of AI agents at the same time, blazing fast copy on write clones, a built-in preview browser you can use to visually prompt your agents, and much more.

Marcel Pociot 🧪

113,329 views • 3 months ago

Introducing @Orgo Orgo provides virtual computers for AI agents. Build computer-using agents and deploy them across thousands of instances in seconds. Now available for developers.

Spencer Kinney

164,953 views • 7 months ago