正在加载视频...

视频加载失败

Droids have native multi-agent orchestration built-in. Our desktop app let's you easily manage these sub-agents as they tackle complex tasks. Monitor each agent as it works, and interrupt or inject context when needed.

11,781 次观看 • 2 个月前 •via X (Twitter)

0 条评论

暂无评论

原始帖子的评论将显示在这里

相关视频

Microsoft presents Windows Agent Arena Evaluating Multi-Modal OS Agents at Scale discuss: Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g. text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on order of magnitude of days) given the multi-step sequential nature of tasks. To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks. We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage. Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes. To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi. Our agent achieves a success rate of 19.5% in the Windows domain, compared to 74.5% performance of an unassisted human. Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web. We offer extensive quantitative and qualitative analysis of Navi's performance, and provide insights into the opportunities for future research in agent development and data generation using Windows Agent Arena.

AK

19,684 次观看 • 1 年前

🧃 Introducing stereOS: a Linux based operating system hardened and purpose built for AI agents. It's clear that agents need an ACTUAL operating system (not what people are calling an "OS") to witness the full breadth and depth of their capabilities while mitigating the blast radius of autonomous, untrusted actors. But there are so many problems with AI sandboxes today: * Going out to the apple store and buying a mac mini will never scale and is way too expensive (obviously) * Running in Docker is too restrictive (agents can't stand up their own container infrastructure, no sub virtualization, docker-in-docker is very broken) * Firecracker strips all the hardware so GPU PCIe passthrough, secure boot, FIPs, etc. is out of the question. * Native VMs are too fat and the overhead of 1 agent per VM is too much. stereOS takes a different approach: it's a full NixOS system that you boot and then kick off agent sandboxes inside with gVisor + /nix/store namespace mounting. Each agent gets their own kernel and the /nix/store is read only by nature. Even if the agent was somehow able to escape the gVisor virtual kernel, they'd land on the NixOS system as the "agent" user! Not your actual hardware!! If you want to take a defense-in-depth approach, we support "native" agents that run at the system level kicked off by our `agentd` utility. These agents, on their own, can manage and kick off other sub agents using the internal sandboxing mechanisms. Today, we're open sourcing all of this: * stereOS: our purpose built Linux OS - * masterblaster: client utility to launch, manage, and orchestrate agents - * stereosd: the stereOS system control plane daemon - * agentd: the stereOS system agent management daemon - Give it a try, throw us a star, and let me know what you think 🧃⭐️

John McBride

150,178 次观看 • 3 个月前