Today we're announcing cua-bench: a framework for benchmarking, training data, and RL environments for computer-use AI agents. Why? Current agents show 10x variance across minor UI changes. Here's how we're fixing it.

Cua

4,847 subscribers

189,503 views • 6 months ago • via X (Twitter)

Education, News & Politics, Science & Technology

Related Videos

Today, we’re excited to announce Dojo, a collaborative RL environment suite for computer use agents (CUA).

Chakra Labs

52,958 views • 7 months ago

Today, at Markov, we're launching RL Environments. The simplest (and cutest :D) way to evaluate and train your AI agents. We're starting with Bananazon - an environment for customer service agents. Try it out at the link below. Markov

Dev

35,487 views • 5 months ago

Cua (Cua) is the Docker for computer-use agents, an open-source framework that enables AI agents to control full operating systems within lightweight virtual containers, and works with any language model. Congrats on the launch, Francesco + Sandro!

Y Combinator

105,618 views • 1 year ago

CUA-Suite Massive Human-annotated Video Demonstrations for Computer-Use Agents paper:

AK

18,167 views • 2 months ago

🚀 Training AI agents isn’t about teaching them which buttons to click. It’s about judgment. Labelbox builds RL environments with domain experts, creating thousands of real-world scenarios where AI agents learn how to handle complex decisions across industries.

Labelbox

22,158,127 views • 2 months ago

Most "Market Making" on Solana is terrible. The best teams don't share their tech stack. Today that changes. Sonic SVM acquired 𝙁𝙤𝙧𝙜𝙚𝙓, a battle-tested MM toolkit for devs/agents. We're open-sourcing it today GitHub: Here's why it matters👇

Sonic SVM

635,090 views • 2 months ago

As AI agents get better at computer and tool use, or writing code on the fly for a task, we're going to be able to solve much broader domains of knowledge work. Here's an example of Box AI with the new Claude Skills to generate a clean powerpoint file from existing data.

Aaron Levie

30,235 views • 8 months ago

Imagine if you could: create instant parallel versions of a running cloud computer with zero overhead, explore millions of reasoning paths forward with AI agents simultaneously, and do it all in a fraction of a second. Today, we're announcing Infinibranch.

Morph

78,373 views • 1 year ago

Today, we're announcing Factory 2.0: from coding agents to software factories.

Factory

206,751 views • 1 hour ago

Today we're releasing the Factory desktop app. A native interface for autonomous AI agents that work across every part of your software business.

Factory

252,342 views • 2 months ago

1/ Introducing Molten (Molten) an intent-based search engine for AI agents. Agents can now discover and collaborate with each other instantly. Here's how we're building the infra for A2A intent matching on Base 🧵

Vesper.base.eth

69,363 views • 4 months ago

The AI Agents revolution has begun — but it’s starving for data. We’re fixing that. We Present to You: DappLooker AI – The Data Marketplace for Decentralized AI Agents Request access: 🧵

DappLooker AI

13,921 views • 1 year ago

Halluminate (Halluminate) provides data and environments to train computer-use AI. Model labs and enterprises partner with Halluminate to accelerate the development of frontier computer/browser use agents. Congrats on the launch, Jerry Wu & wyatt marshall!

Y Combinator

29,140 views • 1 year ago

a computer for your AI. scrapybara deploys, scales, and maintains remote desktop instances for agents. try computer use in the playground and pip install scrapybara to deploy your own agents to production

justin

72,275 views • 1 year ago

Announcing fully autonomous AI agents for internal tasks. Hire a general AI agent for IT, compliance, and procurement. Starting at $5/hour. See use cases below.

Emir Karabeg

413,590 views • 26 days ago

We're excited to announce that Coinbase Developer Platform🛡️ AgentKit supports the new OpenAI Agents SDK. OpenAI’s new Agents SDK is an open source framework for building and scaling agents that includes built-in tools for web search, file search, and computer use, and tools to track and optimize agent performance—making it easier than ever to build production-ready AI agents. Coinbase AgentKit complements this by adding secure crypto wallets directly to your agents, enabling them to transact globally, instantly, and with near-zero fees—unlocking true financial autonomy.

Coinbase Developer Platform🛡️

20,287 views • 1 year ago

Fulcrum (Muzafar Bhai) is an agentic debugger for AI systems. It helps developers fix their RL environments and improve their agents. Congrats on the launch, Uzay and @kaivuhariharan!

Y Combinator

14,839 views • 9 months ago

We're making comprehensive, real-time crypto market data accessible to AI agents 🤖 CoinGecko API now supports x402, the open payment protocol developed by Coinbase 🛡️ that lets AI agents (such as OpenClaw🦞 🦞) pay for crypto price and market data using USDC. Learn how it works 👇

CoinGecko

145,052 views • 4 months ago

today we're launching the AI agent stack we built that rocketed us to a $3m AI agent consultancy. we're calling it AgentPress, and it allows you to build production grade AI agents in minutes. here's a video of me creating a sales Q&A agent

Andy Walters

207,205 views • 9 months ago

Today we're announcing Polyscope - the free agent orchestration tool of my dreams. Run dozens of AI agents at the same time, blazing fast copy on write clones, a built-in preview browser you can use to visually prompt your agents, and much more.

Marcel Pociot 🧪

113,329 views • 3 months ago