What if we could have *trustworthy* agents that don't just write code, but also do research, understand multimodal content, and perform many practically useful tasks? Today at OpenHands, we released a new agent that gets SOTA or competitive performance on 8 diverse tasks.

19,052 views • 1 year ago • via X (Twitter)

10 Comments

All Hands AI · 1 year ago

What new sorts of things does this allow you to do? Here is an example: we asked the agent to research the OpenHands library and create a promotional website grounded in citations, so we could be sure the content was correct. It built this for us in one shot.

All Hands AI · 1 year ago

How did we verify the agent's accuracy? We built VersaBench, a benchmark that tests 5 capabilities:
- Improving codebases (in 9 programming languages)
- Building apps from scratch
- Writing tests and fixing broken code
- Researching new info
- General business-relevant tasks

All Hands AI · 1 year ago

What are the results? We achieved state-of-the-art performance on 5 of the 8 benchmarks we tested, and competitive performance on all the others. We believe this is the first time an agent has been demonstrated to be so broadly capable.

All Hands AI · 1 year ago

How did we achieve this? Building on the strong base of OpenHands, which already achieved over 70% accuracy on SWE-Bench, we added a small number of targeted tools:
- Research with the @tavilyai search engine
- Multimodal browsing with set-of-marks
- Multimodal file access

All Hands AI · 1 year ago

The paper about this versatile agent, OpenHands-Versa, was led by @Aditya_Soni_8 at CMU, and you can read much more about the methodology:
- His summary:
- The paper:
- Our blog:

All Hands AI · 1 year ago

How can you access this new agent? It's available by default in the most recent version of OpenHands:
- OpenHands Cloud:
- Downloading OpenHands 0.41.0 or higher:

We can't wait for your feedback on it!

SecBriefs | Making Cybersecurity Simple · 1 year ago

Thinking about a career in cybersecurity but worried about your technical background? Don't be! 💡 Many roles value your problem-solving, analytical, & communication skills. 🕵️‍♀️ "Cybersecurity Dictionary for Everyone" is a good start, available on Amazon:

David J. Alba · 1 year ago

Congrats on the release! We need to make some noise so VersaBench becomes the standard across labs for evaluating new models. Excited to try it 🫶

Abeansits · 1 year ago

amazing release! congrats to the team

Christopher · 1 year ago

Hi again, sorry I’ve been digging into the new Versa Agent as planned but I can’t seem to find a link to the Benchmark. Has it been released publicly and I’m just missing it somehow? Thank you.
