PatronusAI

@PatronusAI · 1,935 subscribers

Simulation research and infrastructure for human-aligned AGI https://t.co/8X6bVgvCHd

Shorts

1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents!

🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces; it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset, which contains human-annotated errors from GAIA and SWE-Bench.

🦾 Here’s what Percival can do for you:
- Automatically suggest prompt fixes for your agent
- Catch 20+ types of agent failures spanning tool use, planning and coordination, and domain-specific errors
- Reduce manual debugging time from hours to < 1 minute

23,988 views

1/ Introducing Glider - the smallest model to beat GPT-4o-mini on eval tasks ⚡🚀
- Open source, open weights, open code
- Explainable evaluations by nature
- Trained on 183 criteria and 685 domains

Try it out for free at 🔥

14,842 views

Videos
