📢Excited to release GoEx⚡️a runtime for LLM-generated actions like code, API calls, and more. Featuring "post-facto validation" for assessing LLM actions after execution 🔍 Key to our approach is "undo" 🔄 and "damage confinement" abstractions to manage unintended actions & risks. This paves the way for fully autonomous LLM...

Shishir Patil

4,252 subscribers

57,988 views • 2 years ago • via X (Twitter)

Science & Technology

6 Comments

Shishir Patil • 2 years ago

🌐 Pioneering a future where LLMs empower microservices & apps, evolving from mere data retrievers 🧵 to autonomous decision-makers within our digital world 🧙 Wondering about the safety and correctness of these interactions 🤔? Our latest vision paper explores these questions, laying out design principles for the next step in LLM-powered applications 💯

Shishir Patil • 2 years ago

We study the inherent challenges in relying on LLMs: their unpredictability, the trust mechanisms their decision-making requires, and the hurdles in recognizing and resolving failures. Our system, GoEx, presents abstractions and policies to overcome these for RESTful APIs, and operations on databases and filesystems! An exhilarating collaboration with @tianjun_zhang @vivianfxng Noppapon C @_royh021 Aaron Hao @profjoeyg @ralucaadapopa Ion Stoica from @UCBerkeley and @martin_casado from @a16z
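For the database case, the "post-facto validation" and "undo" ideas from the post can be sketched with an ordinary transaction: run the LLM-generated statement, inspect the resulting state, and only then commit or roll back. This is a minimal illustration and not GoEx's actual API; the function names, the `accounts` table, and the validation rule are all assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Create a hypothetical in-memory database for the illustration."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # transaction boundaries below are fully explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.execute("INSERT INTO accounts VALUES ('bob', 50)")
    return conn


def no_negative_balances(conn):
    """Hypothetical post-facto check: inspect the state AFTER the action ran."""
    (bad_rows,) = conn.execute(
        "SELECT COUNT(*) FROM accounts WHERE balance < 0"
    ).fetchone()
    return bad_rows == 0


def execute_with_undo(conn, sql, validate):
    """Run `sql` in a transaction; keep it only if `validate` accepts the result."""
    conn.execute("BEGIN")
    try:
        conn.execute(sql)
        accepted = validate(conn)
    except Exception:
        conn.execute("ROLLBACK")  # undo on any execution error
        raise
    conn.execute("COMMIT" if accepted else "ROLLBACK")
    return accepted


def balance(conn, name):
    return conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()[0]
```

Here the validator is a plain function, but the same hook could hand the decision to a human reviewer. The rollback is what makes it safe to judge the action after it has already run.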

Jeff Schneider • 2 years ago

in your API inventory, what % had an undo function?

darya • 2 years ago

@vivianfxng hi

Davanum Srinivas • 2 years ago

cc @ibuildthecloud

Tereza Tizkova • 2 years ago

I read your paper, and it's the first time I've seen this approach. Based on the code, you just define a "reverse tool" for each tool the LLM can use, is that correct? For the types of actions you cannot reverse, I suggest running the LLM output in a safe sandboxed environment, e.g. using the @e2b_dev Code Interpreter SDK. There, the actions that the LLM agent "decides" to do are isolated in a separate sandbox instance.
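The "reverse tool" pairing described in this comment could look roughly like the sketch below. It is an illustration of the idea only, not code from the GoEx repository: the `Tool` dataclass, the registry, and the two file tools are hypothetical names. A tool without a reverse is refused here, which is the point where a sandbox, as the comment suggests, would take over.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    run: Callable[[str], None]
    # None marks an action with no known reverse (hypothetical convention).
    reverse: Optional[Callable[[str], None]] = None


def create_file(path: str) -> None:
    # Mode "x" fails if the file already exists, so undo never deletes old data.
    with open(path, "x"):
        pass


def delete_file(path: str) -> None:
    os.remove(path)


TOOLS = {
    "create_file": Tool("create_file", create_file, reverse=delete_file),
    "delete_file": Tool("delete_file", delete_file),  # contents are lost: no reverse
}


def execute(tool_name: str, arg: str, undo_log: list) -> None:
    """Run a tool and record how to reverse it."""
    tool = TOOLS[tool_name]
    if tool.reverse is None:
        raise PermissionError(f"{tool_name} has no reverse tool; run it in a sandbox")
    tool.run(arg)
    undo_log.append((tool.reverse, arg))


def undo_all(undo_log: list) -> None:
    """Reverse recorded actions, most recent first."""
    while undo_log:
        reverse, arg = undo_log.pop()
        reverse(arg)
```

Undoing in last-in, first-out order matters once actions depend on each other, for example a file created inside a directory that was itself created by an earlier action.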

Similar Videos

Today, we’re proud to announce a partnership with a leader in autonomous agent technology. The collaboration is already delivering impact: Integritas (by Minima) is now fully integrated with ASI-1 (by Fetch.ai), proprietary LLM built for agentic AI, enabling seamless interaction between on-chain data integrity and intelligent autonomous agents.

Minima

111,625 views • 9 months ago

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents.
Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results.
You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production. In detail, you’ll learn:
- How agentic systems have evolved, gaining greater levels of agency over time, and why code agents are a next step.
- How code agents write their actions in code.
- When code agents outperform function-calling agents.
- How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B.
- To trace, debug, and assess the code agent to optimize its behaviours for complex requests.
- How to build a research multi-agent system that can find information online and organize it into an interactive report.
By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

127,724 views • 1 year ago

Can Claude Code design proteins? We're just about to release our wet lab API + SDK so we made a quick demo to test how agents can interact with protein design models and submit the proteins to our lab for testing. This closes the loop between the computational design and the experiment validation We queued up a bunch of agent-designed proteins for wet lab validation, will release the results in a couple weeks on Proteinbase

Julian Englert

81,258 views • 6 months ago

"Orgs and cos building MCP servers are taking an LLM-first approach to what the API needs to expose to the agent(s)." - Nikunj Handa from OpenAI "For example, Stripe has a bunch of APIs that can be used to create a subscription/customer/product/price. For an LLM, it can just combine that into a single function." "Instead of returning this massive JSON object, they can return something very specific to the task being solved, so that the LLM can more easily understand what's happening." "It's an opportunity to rewrite your APIs to be very LLM-first. Why do 2 hours of work, when you can do it in 4 lines of code under a minute?"

TBPN

11,445 views • 1 year ago

Episode 63: Agent Pays L402 Endpoint
We equip our agents with the ability to consume any L402 endpoint by paying its Lightning invoice. Our demo agent now takes four important actions:
- Runs a similarity search over a vectorized knowledge base
- Passes the most relevant results to a Mixtral LLM
- Calls a WASM plugin via extism
- Calls an L402 API endpoint (here using the weather demo from sulu ⚡) and pays its Lightning invoice via our Alby 🐝 wallet
Our core functionality of composable paid AI agent workflows is now complete. Next we clean up the UI and prepare for users! Code 👉

OpenAgents

15,711 views • 2 years ago

We are excited to announce the 1st version of our multimodal assistant, Yasa-1, a language assistant with visual and auditory sensors that can take actions via code execution 🪄. Yasa-1 can understand text, images, videos, sounds & more! 🚀 Check out more details below👇

Reka

814,187 views • 2 years ago

🚀 Introducing Cover-Agent 🧪 An open-source tool that includes a reimplementation of Meta's TestGen-LLM for automatically enhancing test suites. Manager: "We must improve old test suites for better code coverage. Can you handle it?" Me: "Sure, my favorite task... (Not!) 🤷‍♂️" Meta's team had the idea of using LLMs to enhance test suites—sounds great, right? But they didn't release their code We did. We'd love your feedback as we work on improving this baseline In this video, I'll review TestGen-LLM, share insights, and introduce you to Cover-Agent

Itamar Friedman

139,228 views • 2 years ago

AI is insecure by default. The open secret is that most AI apps launch with serious flaws. It takes MORE time to secure these apps than it does to build them. I learned this firsthand from shipping LLM agents to 200M users at Discord and supporting LLM evals for hundreds of companies. It doesn’t have to be this way. Today, we’re launching promptfoo, an open-source company that helps find and fix vulnerabilities in AI powered apps before they ship, and announcing a $5M seed round led by a16z.

Ian Webster

28,779 views • 2 years ago

We are excited to share a research preview of our generative agent. The agent is being trained to solve the hardest tasks in 3D and beyond, using only keyboard and mouse actions. Join the waitlist: Our agent app runs on Windows or Mac, either locally or with one-click setup for a Windows VM. It’s still early days, but this paves the way for production-level workflows for the first time ever. Blog:

Common Sense Machines

155,199 views • 1 year ago

We're excited to announce our partnership with Coinbase 🛡️ to bring Tavily to x402, the open protocol for internet-native agentic payments. With x402, agents can discover and use Tavily web search at runtime without an API key. Agents use a Base wallet to pay per-request and get instant results. The next billion agents will discover, pay for, and use online services fully autonomously. We’re live on and we’re just getting started. More in the comments.

Tavily

59,986 views • 1 month ago

RLM is the most important foundation of my Pi Harness (other than Pi of course). It's seeded with late interaction retrieval results (thanks to @lightonai for pylate). The Agent initiates it with a query, then..
𝐒𝐞𝐭𝐮𝐩
A Python REPL is created and seeded with:
1. Late interaction search to pre-filter. Instead of doing top 3/5/10, it's top hundreds of documents. This is set into a `context` variable.
2. Python functions are loaded in to do more searches if the `context` variable isn't enough, and to make LLM calls with cheaper models in parallel batches.
𝐈𝐭𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐋𝐨𝐨𝐩
From there, an LLM iterates in the REPL based on the query. It's just like exploring in a Jupyter notebook. The LLM writes prose (like a markdown cell) and code to be run in the REPL each turn. This allows the LLM to sort, filter, and synthesize information. It can fan out and ask smaller models to summarize, combine, contrast, or do anything else to documents to help it understand the data. After several turns the LLM responds with the final answer, either because it found the answer or because it hit the budget limit. Context as a Python variable, LLM as the programmer, REPL as the runtime.
𝐖𝐡𝐲 𝐃𝐨𝐞𝐬 𝐓𝐡𝐢𝐬 𝐖𝐨𝐫𝐤
1. Richer Shell. Agents (and subagents) work by intermixing code and prose/thinking. But they use static scripts or bash that run and exit and start over each tool call. That's not ideal for exploration and synthesis of data. For that, state is useful to continue building and exploring the data as you learn more. There's a reason Jupyter notebooks have been popular with data scientists.
2. Keeps main agent context clean. The better context you have, the better the agent will perform (duh!). This means three things: better human input, fewer missing search results, and fewer incorrect search results. Letting the agent iterate allows it to synthesize just what is needed and nothing else. All bad paths or peeks at something that turns out to be irrelevant stay out of main agent context.
3. Stack the good ideas! People often compare late interaction search vs RLM. Or static vs dynamic languages. Or agentic search vs semantic search. But... you can just use them all together for what they're each good at. Use them all for the area they're really great for.
Read the full post which has more detail about how and why.
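The "context as a Python variable, REPL as the runtime" idea in this post can be illustrated with a namespace that persists across turns. This is only a sketch under assumptions: `PersistentRepl` is a hypothetical name, the hard-coded code strings stand in for what an LLM would write each turn, and the post's actual harness is not shown in the source. Running model-written code with `exec` is unsafe outside a sandbox.

```python
import contextlib
import io


class PersistentRepl:
    """Keeps one namespace alive so each turn can build on earlier turns."""

    def __init__(self, context):
        # Seed the REPL with pre-filtered search results, as the post describes.
        self.namespace = {"context": context}

    def run(self, code: str) -> str:
        """Execute one turn of code and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)  # unsafe for untrusted code; sandbox in practice
        return buffer.getvalue()


def demo():
    repl = PersistentRepl(
        context=["doc about undo", "doc about sandboxing", "doc about payments"]
    )
    # Turn 1: filter the seeded documents; `hits` stays defined afterwards.
    repl.run("hits = [d for d in context if 'undo' in d or 'sandbox' in d]")
    # Turn 2: reuse state from turn 1 instead of starting over.
    return repl.run("print(len(hits))")
```

In contrast, a tool that launches a fresh script for each call would have to recompute `hits` on every turn, which is the "run and exit and start over" cost the post points out.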

Isaac Flath

40,212 views • 2 months ago

Small prototype with AI + generative sketching workflow. Sketches are written as usual in code, and prompts can be used to augment/modify the artwork's inputs and parameters. 🤖 This is using OpenAI API with GPT 3.5, already showing a surprisingly good grasp of color for an LLM.

Matt DesLauriers

22,322 views • 3 years ago

Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement learning. On the complex NetHack game, Motif solves previously unsolved tasks without needing any expert demonstrations. Surprisingly, Motif's reward leads to a better game score than the one obtained by using the score itself as a reward. Given access to an event captioning mechanism, a few properties make Motif a general method:
• it is entirely based on open models
• the LLM doesn't need direct access to the environment dynamics (e.g., its source code)
• the LLM doesn't need to understand observation and action spaces
The best part? You can start using Motif right now, even on a small compute budget: the whole pipeline can take less than two GPU-days. Feel free to read our paper and try our code out. Paper: Code: Blog post: Work co-led by Martin Klissarov and myself, with Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, and Mikael Henaff. Learn more in the thread 🧵

Pierluca D'Oro

311,883 views • 2 years ago

Onyx just hit #1 on GitHub trending. Open source AI platform: self-hostable, works with every major LLM provider, and ships with:
- Agentic RAG
- Deep research mode
- Custom agents
- Web search
- Code execution
- Voice mode
- Image generation
- 50+ connectors out of the box
This is what a self-hosted AI stack is starting to look like.

0xMarioNawfal

75,559 views • 3 months ago

Can LLMs invent better ways to train LLMs? At Sakana AI, we’re pioneering AI-driven methods to automate AI research and discovery. We’re excited to release DiscoPOP: a new SOTA preference optimization algorithm that was discovered and written by an LLM! Our method leverages LLMs to propose and implement new preference optimization algorithms. We then train models with those algorithms and evaluate their performance, providing feedback to the LLM. By repeating this process for multiple generations in an evolutionary loop, the LLM discovers many highly-performant and novel preference optimization objectives! Paper: GitHub: Model: We proudly collaborated with the University of Oxford (Foerster Lab for AI Research (now part of BOLD)) and Cambridge University (Mihaela van der Schaar) on this groundbreaking project. Looking ahead, we envision a future where AI-driven research reduces the need for extensive human intervention and computational resources. This will accelerate scientific discoveries and innovation, pushing the boundaries of what AI can achieve.

Sakana AI

555,922 views • 2 years ago

Say hello to the fusion of visual programming, code, and AI. Retool Workflows is the fastest way to automate critical business processes. Explore, join, and transform your data with real-time feedback and AI-powered actions. Deploy and manage automations with developer-first control. Used by companies like RE/MAX, OpenAI, and EquipmentShare—Workflows is now available for every team. Try it today for free:

Retool

11,236 views • 2 years ago

New short course on Reinforcement Learning from Human Feedback! RLHF is one of the key techniques that led to the rise of modern LLMs. It is used to align LLMs with human preferences, to make them more honest, helpful and harmless, by (i) learning a reward function that mimics human preferences, as expressed in human-provided labels, then, (ii) tuning an LLM to generate outputs that receive a high reward. In this course, taught by Nikita Namjoshi, Developer Advocate for GenAI at Google Cloud, you'll learn the details of how RLHF works, including how to apply it to tune an LLM for your own applications. You'll also use an open source library to tune a base LLM to align with human preferences expressed in a training set, and evaluate the tuned model by comparing its responses before and after RLHF-tuning. Please sign up here!

Andrew Ng

205,537 views • 2 years ago

We’re excited to announce our partnership with Use Morpheus as they develop a specialized AI model on OpenLedger. Leveraging our decentralized infrastructure, Morpheus will create a fine-tuned Web3 LLM focused on secure code generation and autonomous workflows. This collaboration will push the boundaries of intelligent contract creation, enabling decentralized applications to scale smarter, faster, and more securely. The next wave of AI-powered smart contracts is here, stay tuned for more updates! Hear adamzhao.eth , founder of Use Morpheus, break it down in the video.

OpenLedger

17,229 views • 1 year ago

💡Divergent thinking💡 is a hallmark of human creativity and problem-solving 🤖 Can LLMs also do divergent reasoning to generate diverse solutions 🤔? Introducing Flow-of-Reasoning (FoR) 🌊, a data-efficient way of training an LLM policy to generate diverse, high-quality reasoning trajectories. Unlike existing RL (like PPO) and planning (like MCTS) methods, which find the max-reward trajectory (akin to convergent thinking), FoR connects LLM reasoning with the #GFlowNet formulation and enables LLMs to find trajectories in proportion to the reward distribution. 🎬 The demo video illustrates how FoR learns and infers multiple solutions to a ♠️Game24 puzzle. 🎯 Inferring diverse solutions could be useful for robustness, data augmentation, and enhanced model generalization. Project page: Paper: GitHub:

Lianhui Qin

50,447 views • 2 years ago