OpenAI has introduced ChatGPT Agent, which handles complex multi-step tasks from research to automation. Genspark goes further in some areas: in addition to user-friendly office tools (Slides, Docs, Sheets, AI Secretary, AI Drive), Genspark scores with dynamic tool orchestration and an intelligent feedback loop, a clear added...

176,267 views • 10 months ago • via X (Twitter)

Related Videos

OpenAI's AgentKit will be so insane: you can build every step of an agent on one platform. Visual agent builders like this make iterating on and launching agents far more efficient. AgentKit sits on top of the Responses API and unifies tools that were previously scattered across SDKs and custom orchestration. It lets developers create agent workflows visually, connect data sources securely, and measure performance automatically without hand-coding every layer.

The core of AgentKit is the Agent Builder, a drag-and-drop canvas where each node represents an action, guardrail, or decision branch. Developers can link these nodes into multi-agent workflows, preview results instantly, and version each setup. Inline evaluation lets developers see how changes affect output before deploying.

The Connector Registry is a single admin panel that manages how data and tools connect across the OpenAI ecosystem. It centralizes integrations such as Google Drive, SharePoint, Dropbox, and Microsoft Teams, so large organizations can securely govern agents' data access and the flow of data between agents from one global console.

ChatKit provides a ready-to-use chat interface for embedding agents inside apps or websites. It handles streaming, message threads, and model-reasoning displays automatically, and developers can skin the interface to match their product without writing custom front-end code.

Under the hood, all these blocks use the same execution core, which runs agent reasoning through OpenAI's APIs. Workflows in Agent Builder compile down to structured instructions for the Responses API, which handles model calls, tool use, and context passing. The Connector Registry handles authentication and routing for external tools, while Evals and RFT (reinforcement fine-tuning) provide feedback loops that improve agents over time. This integration means developers no longer need to build orchestration logic, model evaluation pipelines, or safety layers separately.

Everything runs natively within OpenAI's control plane, with managed security, automatic versioning, and built-in testing. In short, AgentKit standardizes the entire life cycle of an AI agent, from visual design to deployment and performance tuning, inside a single unified system.
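
To make the node idea concrete, here is a minimal sketch of a workflow graph whose nodes play the three roles the post names (action, guardrail, decision branch). It is an illustration under stated assumptions, not the AgentKit API: the `Node` class, the `execute` walker, and the example nodes (`classify`, `pii_guard`, `route`, and the two agents) are all hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical model of an Agent Builder-style workflow; this is NOT the AgentKit API.
State = Dict[str, str]


@dataclass
class Node:
    name: str
    kind: str  # "action", "guardrail", or "branch" (the node roles named in the post)
    run: Callable[[State], Optional[str]]  # mutates state, returns next node name or None to stop


def execute(nodes: Dict[str, Node], start: str, state: State, max_steps: int = 20) -> List[str]:
    """Follow edges from `start` until a node returns None; return the visited node names."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None and len(trace) < max_steps:
        node = nodes[current]
        trace.append(node.name)
        current = node.run(state)
    return trace


def classify(state: State) -> Optional[str]:  # action: tag the request with a topic
    state["topic"] = "billing" if "invoice" in state["input"].lower() else "general"
    return "pii_guard"


def pii_guard(state: State) -> Optional[str]:
    # guardrail: halt if the input has 12+ digits (a crude stand-in for card numbers)
    if sum(ch.isdigit() for ch in state["input"]) >= 12:
        state["output"] = "Blocked by guardrail."
        return None
    return "route"


def route(state: State) -> Optional[str]:  # decision branch: pick a specialist node
    return "billing_agent" if state["topic"] == "billing" else "general_agent"


def billing_agent(state: State) -> Optional[str]:  # action: terminal node
    state["output"] = "Forwarded to billing."
    return None


def general_agent(state: State) -> Optional[str]:  # action: terminal node
    state["output"] = "Answered directly."
    return None


workflow = {
    n.name: n
    for n in [
        Node("classify", "action", classify),
        Node("pii_guard", "guardrail", pii_guard),
        Node("route", "branch", route),
        Node("billing_agent", "action", billing_agent),
        Node("general_agent", "action", general_agent),
    ]
}

state: State = {"input": "Where is my invoice?"}
trace = execute(workflow, "classify", state)
print(trace, state["output"])
```

Treating the guardrail as an ordinary node means it can stop a run at a specific point in the graph, and the returned trace is the kind of record an inline evaluation step could inspect.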

Rohan Paul

178,460 views • 8 months ago

Boom! Grok Tasks Make It One Of The Most POWERFUL Real-Time AI Systems In The World. My Guide: How to Use Grok Tasks With Hidden Tools For Powerful Daily Output.

Grok Tasks are customizable AI workflows that integrate a variety of tools to streamline daily activities, from research and analysis to creative planning and problem-solving. I have been using them for quite some time, and because of the vital heartbeat of news and first-person data on X, Grok is the most powerful AI platform available. By combining Tasks with tools like web search, X platform interactions, code execution, and media viewers, you can build efficient, automated processes. A task works by giving Grok a clear description of what you want to achieve; Grok then calls the necessary tools in sequence or in parallel to deliver results.

Here's a step-by-step guide to creating and using Grok Tasks:

Step 1: Define Your Task
Start by clearly outlining the daily activity or goal. Consider what inputs you have (e.g., a URL, a query, or an attachment) and what output you need (e.g., a summary, calculation, or visual analysis). Break it down into subtasks to identify which tools you need. For example, if your task involves researching current events, note that you'll need search and browsing capabilities.

Step 2: Review Available Tools
Familiarize yourself with the tools Grok can access. Here's a quick overview:
- Code Execution: Run Python code for calculations, data processing, or simulations using libraries like numpy, pandas, or sympy.
- Browse Page: Fetch and summarize content from any website URL with custom instructions.
- Web Search: Run general internet searches, with optional operators like site:.
- Web Search With Snippets: Get quick, detailed excerpts from search results for fact-checking.
- X Keyword Search: Advanced search for X posts using operators like from:, since:, or filter:.
- X Semantic Search: Find semantically related X posts based on a query, with filters for dates or users.
- X User Search: Locate X users by name or handle.
- X Thread Fetch: Retrieve a full X post thread, including context like replies and parent posts.
- View Image: Analyze an image from a URL or conversation ID.
- View X Video: Extract frames and subtitles from an X-hosted video.
- Search PDF Attachment: Query a PDF file for relevant pages using keyword or regex modes.
- Browse PDF Attachment: View specific pages of a PDF with text and screenshots.
Select tools that align with your task. Aim for a mix that covers data gathering, processing, and visualization.

Step 3: Craft Your Prompt
Write a detailed prompt to Grok describing the task. Include:
- The overall goal.
- Specific steps or subtasks.
- References to tools if you want to guide the process (e.g., "Use web_search to find sources, then code_execution to analyze data").
- Any constraints, like dates or limits.
Example prompt: "Create a Grok Task for my morning routine: Search recent X posts about tech news using x_keyword_search, fetch a key thread with x_thread_fetch, and summarize with browse_page on linked articles."

Step 4: Submit and Interact
Send your prompt to Grok. It will process the task by calling tools as needed, often in parallel for efficiency. Review the output and refine it with follow-up prompts if required (e.g., "Expand on that using view_image for visuals"). Iterate to fine-tune the workflow for reuse.

Step 5: Save and Reuse
Once the workflow is refined, save the prompt as a template for future use. You can adapt it for similar tasks, making Grok Tasks a habitual part of your day.

Finding Grok Tasks
To discover existing Grok Tasks or find inspiration for new ones, search X with tools like x_keyword_search or x_semantic_search (e.g., query: "Grok Tasks examples" with mode: Latest). Browse community-shared threads via x_thread_fetch, or use web_search to find tutorials on xAI features. You can also prompt Grok directly: "Show me popular Grok Tasks for productivity."
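
Steps 1, 3, and 5 together amount to keeping a structured prompt you can refill each day. The sketch below shows one way to store such a template locally, assuming you paste the rendered text into Grok yourself. Grok exposes no such Python API; `Subtask`, `TaskTemplate`, `render`, and the `since` date are hypothetical, and only the tool names come from the guide above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical helper for keeping Grok Task prompts as reusable templates.
# Grok has no such Python API; this only builds the prompt text you would send.


@dataclass
class Subtask:
    description: str  # may contain {placeholders}
    tool: str         # tool name as written in the guide, e.g. "x_keyword_search"


@dataclass
class TaskTemplate:
    goal: str
    subtasks: List[Subtask]
    constraints: List[str] = field(default_factory=list)

    def render(self, **params: str) -> str:
        """Fill placeholders and lay out goal, numbered steps with tools, and constraints."""
        lines = [f"Goal: {self.goal.format(**params)}", "Steps:"]
        for i, st in enumerate(self.subtasks, start=1):
            lines.append(f"{i}. {st.description.format(**params)} (use {st.tool})")
        if self.constraints:
            lines.append("Constraints: " + "; ".join(c.format(**params) for c in self.constraints))
        return "\n".join(lines)


# The guide's morning-routine example, saved as a reusable template.
morning = TaskTemplate(
    goal="Morning briefing on {topic}",
    subtasks=[
        Subtask("Search recent X posts about {topic}", "x_keyword_search"),
        Subtask("Fetch a key thread from the results", "x_thread_fetch"),
        Subtask("Summarize the linked articles", "browse_page"),
    ],
    constraints=["Only posts since {since}"],
)

prompt = morning.render(topic="tech news", since="2025-01-01")
print(prompt)
```

Swapping `topic` or the constraint values reuses the same tool sequence for a different daily briefing, which is the adaptation Step 5 describes.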

Brian Roemmele

152,242 views • 5 months ago

Microsoft presents Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. However, measuring agent performance in realistic environments remains a challenge because (i) most benchmarks are limited to specific modalities or domains (e.g., text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on the order of days) given the multi-step, sequential nature of the tasks. To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS), where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks. We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage. Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes. To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi. Our agent achieves a success rate of 19.5% in the Windows domain, compared with 74.5% for an unassisted human. Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web. We offer extensive quantitative and qualitative analysis of Navi's performance and provide insights into opportunities for future research in agent development and data generation using Windows Agent Arena.
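
The drop from days to minutes comes from running independent tasks concurrently. A rough way to see the arithmetic is to shard the task list across workers and take the longest shard as the wall-clock time. This is a generic round-robin sketch, not the paper's Azure harness: the worker count and the 5-minute per-task duration are hypothetical, and only the 150 tasks and the two success rates come from the abstract.

```python
from typing import List, Sequence


def shard(task_ids: Sequence[int], workers: int) -> List[List[int]]:
    """Assign tasks to `workers` shards round-robin, so shard sizes differ by at most one."""
    shards: List[List[int]] = [[] for _ in range(workers)]
    for i, task in enumerate(task_ids):
        shards[i % workers].append(task)
    return shards


def wall_clock_minutes(shards: List[List[int]], minutes_per_task: float) -> float:
    """Estimated wall-clock time when shards run concurrently and every task takes the same time."""
    return max(len(s) for s in shards) * minutes_per_task


tasks = list(range(150))  # 150 tasks, the lower bound of the "150+" in the abstract
MINUTES_PER_TASK = 5.0    # hypothetical average; not a figure from the paper

serial = wall_clock_minutes(shard(tasks, 1), MINUTES_PER_TASK)     # one worker
parallel = wall_clock_minutes(shard(tasks, 50), MINUTES_PER_TASK)  # 50 hypothetical VMs

# Navi's success rate relative to the unassisted human baseline from the abstract.
relative = round(19.5 / 74.5, 3)
print(serial, parallel, relative)
```

Under these assumptions the run shrinks from 750 minutes on one worker to 15 minutes on 50, and Navi reaches roughly a quarter (about 0.262) of human performance.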

AK

19,684 views • 1 year ago