Swapna Kumar Panda

@swapnakpanda · 226,485 subscribers

| Tech Writer, Educator | Python, Java, JavaScript, SQL | DSA, Development | Free Resources, AI Tools | Other Version: @therealswapna | Building @JabardastDEV |

Shorts

Unpopular opinion: Most agent evals are theatre. You run them once before deployment. Each one takes 800ms+ because another LLM is judging your LLM. The most annoying part - nothing tells you where in the chain things went wrong. I wasted a lot of time in this loop. Then I came across Future AGI, which brings 5 different tools under one umbrella. Best part - they open sourced the entire platform, and the eval layer is noticeably different. It is multimodal - it works on text, image, audio, and PDF. It is not an LLM-as-judge adding latency but an agent with memory and tools. The biggest win is the learned classifiers, trained on actual production failure patterns, which run evals at low cost. It also evaluates the full reasoning chain, not just the final response.

49,557 views

Videos
