What if we could have *trustworthy* agents that don't just write code, but also do research, understand multimodal content, and perform many practically useful tasks? Today at OpenHands, we released a new agent that gets SOTA or competitive performance on 8 diverse tasks.

19,047 views • 1 year ago • via X (Twitter)

Comments: 10

All Hands AI • 1 year ago

What new sorts of things does this allow you to do? Here is an example: we asked the agent to research the OpenHands library and create a promotional website, grounded in citations so we could be sure the content was correct. It built this for us in one shot.

All Hands AI • 1 year ago

How did we verify the agent's accuracy? We built VersaBench, a benchmark that tests 5 capabilities:
- Improving codebases (in 9 programming languages)
- Building apps from scratch
- Writing tests and fixing broken code
- Researching new info
- General business-relevant tasks

All Hands AI • 1 year ago

What are the results? We achieved state-of-the-art performance on 5 of the 8 benchmarks we tested, and competitive performance on all the others. We believe this is the first time an agent has been demonstrated to be so broadly capable.

All Hands AI • 1 year ago

How did we achieve this? Building on the strong base of OpenHands, which already achieved over 70% accuracy on SWE-Bench, we added a small number of targeted tools:
- Research w/ the @tavilyai search engine
- Multimodal browsing with set-of-marks
- Multimodal file access

All Hands AI • 1 year ago

The paper about this versatile agent, OpenHands-Versa, was led by @Aditya_Soni_8 at CMU, and you can read much more about the methodology:
- His summary:
- The paper:
- Our blog:

All Hands AI • 1 year ago

How can you access this new agent? It's available by default in the most recent version of OpenHands:
- OpenHands Cloud:
- Downloading OpenHands 0.41.0 or higher:
We can't wait for your feedback on it!

SecBriefs | Making Cybersecurity Simple • 1 year ago

Thinking about a career in cybersecurity but worried about your technical background? Don't be! 💡 Many roles value your problem-solving, analytical, & communication skills. 🕵️‍♀️ "Cybersecurity Dictionary for Everyone" is a good start, available on Amazon:

David J. Alba • 1 year ago

Congrats on the release! We need to make some noise so VersaBench becomes the standard across labs for evaluating new models. Excited to try it 🫶

Abeansits • 1 year ago

amazing release! congrats to the team

Christopher • 1 year ago

Hi again, sorry I’ve been digging into the new Versa Agent as planned but I can’t seem to find a link to the Benchmark. Has it been released publicly and I’m just missing it somehow? Thank you.
