Today we're announcing cua-bench: a framework for benchmarking, training data, and RL environments for computer-use AI agents. Why? Current agents show 10x variance across minor UI changes. Here's how we're fixing it.

Cua

9,436 subscribers

189,823 views • 7 months ago • via X (Twitter)

Related videos

Today, we’re excited to announce Dojo, a collaborative RL environment suite for computer use agents (CUA).

Chakra Labs

53,062 views • 8 months ago

Cua (Cua) is the Docker for computer-use agents, an open-source framework that enables AI agents to control full operating systems within lightweight virtual containers, and works with any language model. Congrats on the launch, Francesco + Sandro!

Y Combinator

105,618 views • 1 year ago

Most "Market Making" on Solana is terrible. The best teams don't share their tech stack. Today that changes. Sonic SVM acquired 𝙁𝙤𝙧𝙜𝙚𝙓, a battle-tested MM toolkit for devs/agents. We're open-sourcing it today GitHub: Here's why it matters👇

Sonic SVM

677,111 views • 3 months ago

As AI agents get better at computer and tool use, or writing code on the fly for a task, we're going to be able to solve much broader domains of knowledge work. Here's an example of Box AI with the new Claude Skills to generate a clean powerpoint file from existing data.

Aaron Levie

30,235 views • 9 months ago

Today we're releasing the Factory desktop app. A native interface for autonomous AI agents that work across every part of your software business.

Factory

252,684 views • 3 months ago

1/ Introducing Molten (@moltenagentic) an intent-based search engine for AI agents. Agents can now discover and collaborate with each other instantly. Here's how we're building the infra for A2A intent matching on Base 🧵

Vesper

72,773 views • 5 months ago

Announcing fully autonomous AI agents for internal tasks. Hire a general AI agent for IT, compliance, and procurement. Starting at $5/hour. See use cases below.

Emir Karabeg

414,644 views • 2 months ago

We're excited to announce that Coinbase Developer Platform🛡️ AgentKit supports the new OpenAI Agents SDK. OpenAI’s new Agents SDK is an open source framework for building and scaling agents that includes built-in tools for web search, file search, and computer use, and tools to track and optimize agent performance—making it easier than ever to build production-ready AI agents. Coinbase AgentKit complements this by adding secure crypto wallets directly to your agents, enabling them to transact globally, instantly, and with near-zero fees—unlocking true financial autonomy.

Coinbase Developer Platform🛡️

20,287 views • 1 year ago

We're making comprehensive, real-time crypto market data accessible to AI agents 🤖 CoinGecko API now supports x402, the open payment protocol developed by Coinbase 🛡️ that lets AI agents (such as OpenClaw🦞 🦞) pay for crypto price and market data using USDC. Learn how it works 👇

CoinGecko

145,209 views • 5 months ago

today we're launching the AI agent stack we built that rocketed us to a $3m AI agent consultancy. we're calling it AgentPress, and it allows you to build production grade AI agents in minutes. here's a video of me creating a sales Q&A agent

Andy Walters

207,205 views • 10 months ago

Today we're announcing Polyscope - the free agent orchestration tool of my dreams. Run dozens of AI agents at the same time, blazing fast copy on write clones, a built-in preview browser you can use to visually prompt your agents, and much more.

Marcel Pociot 🧪

114,388 views • 4 months ago

Open Computer Agent - LLMs completing tasks using a VM. It's a playground to test how well current LLM agents use a computer to solve everyday tasks. And this is just the start - very soon models will be 10x faster and 10x better at it! ❤️ built with e2b x qwen2.5-vl x smolagent

Leandro von Werra

17,126 views • 1 year ago

Everyone's building AI agents for crypto. Almost nobody's talking about the infrastructure problem. AI agents need to move faster than humans. But most blockchains are built for humans. Here's why that's a massive issue (and how Supra fixed it):

Supra

15,795 views • 6 months ago

Excited to introduce micro1 Cortex, a contextual evaluation, visibility, and improvement platform for enterprise AI agents. Foundational models are trained for general intelligence, but enterprises need agents that perform reliably inside their unique context: workflows, policies, data environments, and edge cases. Cortex brings trust to enterprise AI by leveraging domain experts and real-world scenarios for any use case to test, diagnose, and improve how agents behave in production.

Ali Ansari

117,975 views • 4 months ago

We've heard lots of ideas for canvas + AI agents, so we're building a starter kit that will set you up with tldraw's SDK, our AI module, and a shamelessly copied chat UI.

tldraw

14,820 views • 1 year ago

In the latest Greylock Partners Change Agents I sat down with Gabe Pereyra, President and Co-Founder of Harvey, to discuss how Harvey is using agents to power law firms. We talked about what it was like starting Harvey before ChatGPT was even released, the importance of RL environments for building agents, what types of agentic workflows lawyers use today, what a future 'AI law firm' might look like, and much more.

Corinne Marie Riley

29,426 views • 7 months ago

Today we're launching Subconscious: a new platform for building agents with long-horizon reasoning and tool use, backed by MIT research. One API call. Tool use. Context beyond existing limits. If you're building agents, let's talk.

Jack O'Brien

12,077 views • 1 year ago