What if we could have *trustworthy* agents that don't just write code, but also do research, understand multimodal content, and perform many practically useful tasks? Today at OpenHands, we released a new agent that gets SOTA or competitive performance on 8 diverse tasks.

19,052 views • 1 year ago • via X (Twitter)

10 comments

All Hands AI • 1 year ago

What new sorts of things does this allow you to do? Here is an example: we asked the agent to do research about the OpenHands library, and create a promotional web site, grounded in citations so we were sure that the content was correct. It built this for us in one shot.

All Hands AI • 1 year ago

How did we verify the agent's accuracy? We built VersaBench, a benchmark that tests 5 capabilities:
- Improving codebases (in 9 programming languages)
- Building apps from scratch
- Writing tests and fixing broken code
- Researching new info
- General business-relevant tasks

All Hands AI • 1 year ago

What are the results? We achieved state-of-the-art performance on 5 of the 8 benchmarks we tested, and competitive performance on all the others. We believe this is the first time an agent has been demonstrated to be so broadly capable.

All Hands AI • 1 year ago

How did we achieve this? Building on the strong base of OpenHands, which already reached over 70% accuracy on SWE-Bench, we added a small number of targeted tools:
- Research w/ the @tavilyai search engine
- Multimodal browsing with set-of-marks
- Multimodal file access

All Hands AI • 1 year ago

The paper about this versatile agent, OpenHands-Versa, was led by @Aditya_Soni_8 at CMU, and you can read much more about the methodology:
- His summary:
- The paper:
- Our blog:

All Hands AI • 1 year ago

How can you access this new agent? It's available by default in the most recent version of OpenHands:
- OpenHands Cloud:
- Downloading OpenHands 0.41.0 or higher:

We can't wait for your feedback on it!

SecBriefs | Making Cybersecurity Simple • 1 year ago

Thinking about a career in cybersecurity but worried about your technical background? Don't be! 💡 Many roles value your problem-solving, analytical, & communication skills. 🕵️‍♀️ "Cybersecurity Dictionary for Everyone" is a good start, available on Amazon:

David J. Alba • 1 year ago

Congrats on the release! We need to make some noise so VersaBench becomes the standard across labs for evaluating new models. Excited to try it 🫶

Abeansits • 1 year ago

amazing release! congrats to the team

Christopher • 1 year ago

Hi again, sorry. I’ve been digging into the new Versa agent as planned, but I can’t seem to find a link to the benchmark. Has it been released publicly and I’m just missing it somehow? Thank you.
