What if we could have trustworthy agents that don't just write code, but also do research, understand multimodal content, and perform many practically useful tasks? Today at OpenHands, we released a new agent that gets SOTA or competitive performance on 8 diverse tasks.

OpenHands

10,619 subscribers

19,047 views • 1 year ago • via X (Twitter)

Science & Technology, Education

Comments: 10

All Hands AI • 1 year ago

What new sorts of things does this allow you to do? Here is an example: we asked the agent to do research about the OpenHands library, and create a promotional web site, grounded in citations so we were sure that the content was correct. It built this for us in one shot.

All Hands AI • 1 year ago

How did we verify the agent's accuracy? We built VersaBench, a benchmark that tests 5 capabilities:
- Improving codebases (in 9 programming languages)
- Building apps from scratch
- Writing tests and fixing broken code
- Researching new info
- General business-relevant tasks

All Hands AI • 1 year ago

What are the results? We achieved state-of-the-art performance on 5 of the 8 benchmarks we tested, and competitive performance on all the others. We believe this is the first time an agent has been demonstrated to be so broadly capable.

All Hands AI • 1 year ago

How did we achieve this? Building on the strong base of OpenHands, which already reached over 70% accuracy on SWE-Bench, we added a small number of targeted tools:
- Research with the @tavilyai search engine
- Multimodal browsing with set-of-marks
- Multimodal file access

All Hands AI • 1 year ago

The paper about this versatile agent, OpenHands-Versa, was led by @Aditya_Soni_8 at CMU, and you can read much more about the methodology:
- His summary:
- The paper:
- Our blog:

All Hands AI • 1 year ago

How can you access this new agent? It's available by default in the most recent version of OpenHands:
- OpenHands Cloud:
- Downloading OpenHands 0.41.0 or higher:

We can't wait for your feedback on it!

SecBriefs | Making Cybersecurity Simple • 1 year ago

Thinking about a career in cybersecurity but worried about your technical background? Don't be! 💡 Many roles value your problem-solving, analytical, & communication skills. 🕵️‍♀️ "Cybersecurity Dictionary for Everyone" is a good start, available on Amazon:

David J. Alba • 1 year ago

Congrats on the release! We need to make some noise so VersaBench becomes the standard across labs for evaluating new models. Excited to try it 🫶

Abeansits • 1 year ago

amazing release! congrats to the team

Christopher • 1 year ago

Hi again, sorry I’ve been digging into the new Versa Agent as planned but I can’t seem to find a link to the Benchmark. Has it been released publicly and I’m just missing it somehow? Thank you.
