What if we could have *trustworthy* agents that don't just write code, but also do research, understand multimodal content, and perform many practically useful tasks? Today at OpenHands, we released a new agent that gets SOTA or competitive performance on 8 diverse tasks.
19,047 views • 1 year ago • via X (Twitter)
Comments: 10

What new sorts of things does this allow you to do? Here is an example: we asked the agent to research the OpenHands library and create a promotional website, grounded in citations so we could be sure the content was correct. It built this for us in one shot.

How did we verify the agent's accuracy? We built VersaBench, a benchmark that tests 5 capabilities:
- Improving codebases (in 9 programming languages)
- Building apps from scratch
- Writing tests and fixing broken code
- Researching new info
- General business-relevant tasks

What are the results? We achieved state-of-the-art performance on 5 of the 8 benchmarks we tested, and competitive performance on the rest. We believe this is the first time an agent has been demonstrated to be so broadly capable.

How did we achieve this? Building on the strong base of OpenHands, which already reached over 70% accuracy on SWE-Bench, we added a small number of targeted tools:
- Research with the @tavilyai search engine
- Multimodal browsing with set-of-marks
- Multimodal file access
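For readers unfamiliar with set-of-marks: the idea is to tag each interactive element on a page screenshot with a numbered mark, so a multimodal model can refer to "[2]" instead of guessing pixel coordinates. The thread does not describe how OpenHands-Versa implements this, so the following is only a minimal sketch of the bookkeeping half of the technique, under stated assumptions: the `Element` schema, the `assign_marks`/`legend`/`resolve` helpers, and the `click [N]` action format are all hypothetical and are not the OpenHands API. Drawing the marks onto the screenshot itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical schema: an interactive element with a pixel bounding box."""
    tag: str
    text: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom)


def assign_marks(elements: list[Element]) -> dict[int, Element]:
    """Number elements 1..N in reading order (top-to-bottom, then left-to-right)."""
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return {i: el for i, el in enumerate(ordered, start=1)}


def legend(marks: dict[int, Element]) -> str:
    """Text legend that would accompany the marked screenshot in the prompt."""
    return "\n".join(f"[{i}] <{el.tag}> {el.text}" for i, el in marks.items())


def resolve(marks: dict[int, Element], action: str) -> tuple[int, int]:
    """Map a model action such as 'click [2]' to the centre of that element."""
    mark = int(action.split("[", 1)[1].split("]", 1)[0])
    left, top, right, bottom = marks[mark].box
    return ((left + right) // 2, (top + bottom) // 2)


# Example page with three hypothetical elements.
elements = [
    Element("button", "Sign up", (400, 20, 480, 50)),
    Element("a", "Docs", (10, 20, 60, 40)),
    Element("input", "Search", (100, 200, 300, 230)),
]
marks = assign_marks(elements)
print(legend(marks))
print(resolve(marks, "click [2]"))
```

Keeping the mark-to-element mapping on the agent side means the model only ever emits a small integer, and the harness, not the model, is responsible for turning it into a concrete click location.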

The paper about this versatile agent, OpenHands-Versa, was led by @Aditya_Soni_8 at CMU, and you can read much more about the methodology here:
- His summary:
- The paper:
- Our blog:

How can you access this new agent? It's available by default in the most recent version of OpenHands:
- OpenHands Cloud:
- Downloading OpenHands 0.41.0 or higher:

We can't wait for your feedback on it!


Congrats on the release! We need to make some noise so VersaBench becomes the standard across labs for evaluating new models. Excited to try it 🫶
amazing release! congrats to the team
Hi again, sorry: I’ve been digging into the new Versa agent as planned, but I can’t seem to find a link to the benchmark. Has it been released publicly and I’m just missing it somehow? Thank you.

