📢Excited to release GoEx⚡️a runtime for LLM-generated actions like code, API calls, and more. Featuring "post-facto validation" for assessing LLM actions after execution 🔍 Key to our approach is "undo" 🔄 and "damage confinement" abstractions to manage unintended actions & risks. This paves the way for fully autonomous LLM...

Shishir Patil

4,250 subscribers

57,984 views • 2 years ago • via X (Twitter)

Science & Technology

6 comments

Shishir Patil • 2 years ago

🌐 Pioneering a future where LLMs empower microservices & apps, evolving from mere data retrievers 🧵to autonomous decision-makers within our digital world 🧙 Wondering about the safety and correctness of these interactions🤔? Our latest vision paper explores these questions, laying out design principles for the next-step in LLM powered applications 💯

Shishir Patil • 2 years ago

We study the inherent challenges in relying on LLMs—addressing their unpredictability, the essential trust mechanisms for their decision-making, and hurdles in failure recognition & resolution. Our system, GoEx presents abstractions and policies to overcome these for RESTful APIs, and operations on databases and filesystems! An exhilarating collaboration with @tianjun_zhang @vivianfxng Noppapon C @_royh021 Aaron Hao @profjoeyg @ralucaadapopa Ion Stoica from @UCBerkeley and @martin_casado from @a16z

Jeff Schneider • 2 years ago

in your API inventory, what % had an undo function?

darya • 2 years ago

@vivianfxng hi

Davanum Srinivas • 2 years ago

cc @ibuildthecloud

Tereza Tizkova • 2 years ago

I read your paper and it's the first time I see this approach. Based on the code, you just define a "reverse tool" for each tool the LLM can use, is it correct? For the types of actions you cannot reverse, I suggest running the LLM output in a safe sandboxed environment, e.g. using the @e2b_dev Code Interpreter SDK: There, the actions that the LLM agent "decides" to do are isolated in a separate sandbox instance.
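The "reverse tool" idea raised in this comment can be illustrated with a minimal sketch. It assumes a toy in-memory store, and every name in it (`ReversibleTool`, `UndoLog`, `create_entry`, `delete_entry`) is hypothetical; it is not GoEx's actual API, only one way such a pairing of actions with their undo functions might look.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ReversibleTool:
    """Hypothetical pairing of an action with the function that reverses it."""
    do: Callable[[dict, str], None]
    undo: Callable[[dict, str], None]


@dataclass
class UndoLog:
    """Records executed actions so they can be reversed in last-in, first-out order."""
    tools: Dict[str, ReversibleTool]
    history: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, state: dict, tool_name: str, arg: str) -> None:
        # Execute the action, then remember it so it can be undone later.
        self.tools[tool_name].do(state, arg)
        self.history.append((tool_name, arg))

    def rollback(self, state: dict) -> None:
        # Undo every recorded action, most recent first.
        while self.history:
            tool_name, arg = self.history.pop()
            self.tools[tool_name].undo(state, arg)


def create_entry(state: dict, key: str) -> None:
    """Toy action: add an empty entry to the store."""
    state[key] = ""


def delete_entry(state: dict, key: str) -> None:
    """Toy reverse action: remove the entry if it exists."""
    state.pop(key, None)


tools = {"create": ReversibleTool(do=create_entry, undo=delete_entry)}
log = UndoLog(tools)
store: dict = {}

log.run(store, "create", "notes.txt")
log.run(store, "create", "todo.txt")
log.rollback(store)  # the store is empty again after rollback
```

Actions with no meaningful reverse (sending an email, for instance) do not fit this pattern, which is where the sandboxing suggestion in the comment applies.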

Related videos

Announcing Payments MCP, the easiest way for AI agents to get onchain via x402. 🚀 It lets LLM models like Claude, Gemini, and ChatGPT gain access to onchain tools like wallets, onramp, and payments with no API key required. 🧵

Coinbase Developer Platform🛡️

417,601 views • 7 months ago

i'm a little sick of chatgpt giving me obviously broken code i've found a "micro agent" approach to LLM code generation can work much better the LLM first generates a *test*, and then enters a loop where it generates and iterates on the code until the tests pass source below

Steve (Builder.io)

544,125 views • 2 years ago

Codiff 0.1 is out * Fast Local Code Reviews * Optional LLM Walkthroughs * Inline Review Comments This is the best companion for reviewing output of coding agents. macOS Release:

Christoph Nakazawa

78,034 views • 23 days ago

Introducing the Stagehand Cache We make your agents and automations faster by caching repeated actions, so identical requests skip redundant LLM calls. Available for free for all Stagehand sessions run on Browserbase.

Stagehand

60,571 views • 3 months ago

Today, we’re proud to announce a partnership with a leader in autonomous agent technology. The collaboration is already delivering impact: Integritas (by Minima) is now fully integrated with ASI-1 (by Fetch.ai), proprietary LLM built for agentic AI, enabling seamless interaction between on-chain data integrity and intelligent autonomous agents.

Minima

111,625 views • 7 months ago

New short course: Building Code Agents with Hugging Face smolagents!

Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents.

Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results.

You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production.

In detail, you’ll learn:
- How agentic systems have evolved, gaining greater levels of agency over time, and why code agents are a next step.
- How code agents write their actions in code.
- When code agents outperform function-calling agents.
- How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B.
- To trace, debug, and assess the code agent to optimize its behaviours for complex requests.
- How to build a research multi-agent system that can find information online and organize it into an interactive report.

By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

124,382 views • 1 year ago

Brett Adcock says the latest autonomous demo of Figure 02 is fully end-to-end and uses a single neural network – camera frames in, actions out. “You cannot code your way out of this problem.”

The Humanoid Hub

56,268 views • 1 year ago

- Multi agent simulation with LLM controlling game actions - Fully onchain maps, agent histories, worlds Smolworld

✨ Smol Dev

51,869 views • 1 year ago

👨🏻‍💻 LLM Engineer Toolkit - Collection of 120+ LLM Libraries Category Wise LLM Engineer Toolkit repository contains a curated list of 120+ LLM libraries category wise. 🚀 LLM Training 🧱 LLM Application Development 🩸LLM RAG 🟩 LLM Inference 🚧 LLM Serving 📤 LLM Data Extraction 🌠 LLM Data Generation 💎 LLM Agents ⚖️ LLM Evaluation 🔍 LLM Monitoring 📅 LLM Prompts 📝 LLM Structured Outputs 🛑 LLM Safety and Security 💠 LLM Embedding Models ❇️ Others Repo -

Kalyan KS

16,628 views • 1 year ago

We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.

Inception

90,673 views • 11 months ago

Building Systems with the ChatGPT API is live! In this short course, you’ll learn how to break a complex task down to be carried out via multiple API calls to an LLM. Join for free:

DeepLearning.AI

575,919 views • 3 years ago

can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a secure gpu enclave, where inference stays fully confidential 🤯 links + code in comments👇

Avanika Narayan

79,190 views • 1 year ago

We're building for the LLM-powered future at the intersection of AI x blockchain x social media Simulacrum is exactly the infra that AI agents will need to take actions onchain as LLMs become cheaper & more powerful 👉 Discover more at

Empyreal

34,857 views • 1 year ago

Can Claude Code design proteins? We're just about to release our wet lab API + SDK so we made a quick demo to test how agents can interact with protein design models and submit the proteins to our lab for testing. This closes the loop between the computational design and the experiment validation We queued up a bunch of agent-designed proteins for wet lab validation, will release the results in a couple weeks on Proteinbase

Julian Englert

79,555 views • 4 months ago

"Orgs and cos building MCP servers are taking an LLM-first approach to what the API needs to expose to the agent(s)." - Nikunj Handa from OpenAI "For example, Stripe has a bunch of APIs that can be used to create a subscription/customer/product/price. For an LLM, it can just combine that into a single function." "Instead of returning this massive JSON object, they can return something very specific to the task being solved, so that the LLM can more easily understand what's happening." "It's an opportunity to rewrite your APIs to be very LLM-first. Why do 2 hours of work, when you can do it in 4 lines of code under a minute?"

TBPN

11,445 views • 1 year ago

Episode 63: Agent Pays L402 Endpoint

We equip our agents with the ability to consume any L402 endpoint by paying its Lightning invoice. Our demo agent now takes four important actions:
- Runs a similarity search over a vectorized knowledge base
- Passes the most relevant results to a Mixtral LLM
- Calls a WASM plugin via extism
- Calls an L402 API endpoint (here using the weather demo from sulu ⚡) and pays its Lightning invoice via our Alby 🐝 wallet

Our core functionality of composable paid AI agent workflows is now complete. Next we clean up the UI and prepare for users! Code 👉

OpenAgents

15,711 views • 2 years ago

We're excited to announce the LLM Apps: Evaluation course is now LIVE! 🚀 Created in collaboration with guest experts 👩‍💻 Paige Bailey and Graham Neubig, this course equips you with the skills needed to build trustworthy evaluations for your GenAI apps. Ready to skill up? 👇

Weights & Biases

16,757 views • 1 year ago

Introducing LangSmith Fleet. Agents for every team. → Build agents with natural language → Share and control who can edit, run, or clone each agent → Manage authentication with agent identity → Approve actions with human-in-the-loop → Track and audit actions with tracing in LangSmith Observability Try Fleet:

LangChain

9,397,938 views • 2 months ago