Does off-policy value-based RL scale? In LLMs, larger scale predictably improves performance. Value-based RL learns from arbitrary data and is sample-efficient, but folk wisdom says it doesn't scale 🧵⬇️ We show predictability for scaling value-based RL!

Oleg Rybkin

1,356 subscribers

23,979 views • 1 year ago • via X (Twitter)

Related videos

🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary, dynamic environments. But, how to scale value-based RL with more data/compute is unclear... Not anymore: presenting scaling laws for value-based RL 🧵⬇️

Aviral Kumar

37,301 views • 1 year ago

Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for continuous control 🦾 Initialized with 20~50 demonstrations, it learns to solve real-world robotic tasks within 10 mins of training, without any pre-training and shaped rewards! (1/4)

Younggyo Seo

16,413 views • 2 years ago

New work: The Value Axis 🎯 How do LLMs choose which path to take mid-task? We find they internally track the chance of reaching their goal along a linear axis, akin to a value function in RL. We show it modulates confidence in math & coding and can be reshaped with DPO and SFT.

Nick Jiang

28,039 views • 1 month ago

New research from Databricks: LLMs Can Learn to Reason via Off-Policy RL Optimal Advantage-based Policy Optimization with Lagged Inference policy (OAPL) shows you don’t need strict on-policy training to improve reasoning. It matches or beats Group Relative Policy Optimization (GRPO), stays stable with large policy lag, and uses ~3× fewer training generations. For Databricks customers, it’s a simpler, practical, and equally powerful approach to RL that Databricks is pioneering internally — and bringing directly to Databricks customers, so enterprises can improve agents using the same methods we use for our in-house agents, without complex infrastructure changes.

Databricks AI Research

12,539 views • 5 months ago

Crypto can’t scale without solving value transfer. Not just between blockchains but across fiat, stablecoins, and bank networks. We’re building it in Kima! From CBDCs to DEXs, from TradFi to DeFi this is how money moves in the bridgeless Kima era. Join us. 💥

Kima Network

25,026 views • 1 year ago

This figure from HIL-SERL is one of the clearest visualisations of how RL learns differently from imitation learning. The difference comes down to this: imitation learning treats each (state, action) pair as independent. A correction at timestep 20 teaches nothing about timestep 19 or 21. RL propagates reward backward through time. One successful insertion updates the value estimate of every state along the trajectory. So RL builds a full map of "which states lead to success"; imitation learning just memorizes individual snapshots. Setup: a robot inserting a RAM stick into a motherboard slot. Each dot is an end-effector position (Y = lateral, Z = height). Starting position is randomized. Left to right = training progressing. Top row (RL): the policy builds a funnel. Broad at the top, narrowing into the target. It systematically fills in the state space, learning which paths lead to success from many different starting positions. Bottom row (imitation learning / HG-DAgger, same human data): sparse, diffuse, no funnel. The policy only learns near states the human demonstrated. Both have access to the same data, including human corrections, but a completely different structure emerges.
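
The contrast between backward reward propagation and per-sample imitation can be illustrated with a toy sketch. This is not the HIL-SERL implementation: the function names, the discount factor of 0.9, and the integer timesteps used as states are all assumptions made for illustration. It only shows that a single sparse success reward assigns a value to every state on the trajectory, while an imitation-style correction touches one state.

```python
def propagate_reward(states, final_reward, gamma=0.9):
    """Walk one successful trajectory backward and assign each visited
    state its discounted return (hypothetical helper, gamma assumed)."""
    values = {}
    ret = final_reward
    for state in reversed(states):
        values[state] = ret
        ret = gamma * ret  # one step earlier is worth gamma times as much
    return values


def imitation_update(policy, state, action):
    """Record a human correction for a single state only
    (hypothetical stand-in for a supervised imitation update)."""
    updated = dict(policy)
    updated[state] = action
    return updated


# Timesteps 18..21 stand in for states; success is observed at the end.
trajectory = [18, 19, 20, 21]
values = propagate_reward(trajectory, final_reward=1.0)
policy = imitation_update({}, state=20, action="move_left")

print("states with a value estimate:", sorted(values))
print("states covered by the correction:", sorted(policy))
```

In this sketch the reward observed at timestep 21 reaches timesteps 18 to 20 as well, whereas the correction at timestep 20 says nothing about 19 or 21.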

Dominique Paul

24,433 views • 5 months ago

GPT-5.5 by Reasoning Effort: I've asked it in Codex to create a physics-based visualisation of RL cycles for different sized models (70b, 1t, 10t), to demonstrate how the amount of RL you can do differs by model size. My assessment of each: - Low: weird slop - Medium: kinda cooked - High: sort of tried but ultimately incoherent - Extra High: elite - really nice idea and well executed Obviously this is just one shot, but worth trying different reasoning levels for the new models, medium seems to be pretty good for GPT-5.5 and it was really bad for many previous GPT models.

Peter Gostev (SF: 22-26 June)

209,258 views • 3 months ago

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with compute supercharged by physics simulation on GPU. However, these methods use zeroth-order gradients and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are juicing the simulation on GPU as it is, so why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in our ICLR 2022 paper on Short Horizon Actor Critic (SHAC). Naive gradient-based methods get stuck in local minima and have exploding/vanishing gradients. SHAC solved this problem with truncated rollouts and model-based value estimation, where the model is a differentiable simulator. This boosted sample efficiency and wall-clock time immensely, especially in high-dimensional systems such as humanoids. Yet, given enough compute, PPO often caught up. Our follow-up paper on Adaptive Horizon Actor Critic (AHAC) at ICML 2024 discovers the cause and provides a fix. We find that even when given ground-truth dynamics, not all gradients are useful due to sample error. First-order model-based reinforcement learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC to dynamically adapt its roll-out horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO.
This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, and Eric Heiden, with ample assistance from the Warp team at NVIDIA Robotics (Miles Macklin).
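
The distinction between zeroth-order and first-order gradients can be made concrete with a toy sketch. This is not PPO, SHAC, or AHAC: the quadratic `objective`, the random-perturbation estimator, and all parameter values are assumptions chosen for illustration. A zeroth-order method only evaluates the objective at sampled points, while a first-order method reads the gradient directly, as a differentiable simulator would allow.

```python
import random


def objective(x):
    """Hypothetical smooth 'return' to maximize (stands in for a rollout)."""
    return -sum(xi * xi for xi in x)


def first_order_grad(x):
    """Analytic gradient of the toy objective."""
    return [-2.0 * xi for xi in x]


def zeroth_order_grad(x, sigma=0.1, samples=2000, seed=0):
    """Estimate the gradient from objective evaluations only, using
    antithetic Gaussian perturbations (assumed estimator, not PPO)."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(samples):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        plus = objective([xi + sigma * e for xi, e in zip(x, eps)])
        minus = objective([xi - sigma * e for xi, e in zip(x, eps)])
        scale = (plus - minus) / (2.0 * sigma)
        for i, e in enumerate(eps):
            grad[i] += scale * e / samples
    return grad


x = [1.0, -2.0]
exact = first_order_grad(x)
estimate = zeroth_order_grad(x)
print("first-order gradient:", exact)
print("zeroth-order estimate:", estimate)
```

The estimate needs thousands of objective evaluations and is still noisy, while the first-order gradient is exact here. The toy objective is smooth, so it does not exhibit the bias from stiff contact that the post describes.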

Animesh Garg

52,300 views • 2 years ago

We’re excited to announce our integration with SKALE, a high-performance, zero-gas blockchain purpose-built for speed, scale, and security. This partnership strengthens our infrastructure as we continue building transparent, trust-based systems for decentralized science. We’re excited about what this unlocks for researchers, contributors, and the future of data integrity in DeSci. 👀 Look out for more on how we’re using SKALE in the AxonDAO ecosystem.

AxonDAO

33,926 views • 1 year ago

High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution? Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most. 1/7🧵
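
The quadratic scaling mentioned here can be checked with simple arithmetic. The sketch below is not Foveated Diffusion's method. It assumes a hypothetical patch size of 16 pixels, ignores any latent-space downsampling, and counts query-key pairs as a rough proxy for full self-attention cost.

```python
def attention_pairs(height, width, patch=16):
    """Return (token count, query-key pair count) for full self-attention
    over non-overlapping patches (patch size is an assumed value)."""
    tokens = (height // patch) * (width // patch)
    return tokens, tokens * tokens


for side in (512, 1024, 2048):
    tokens, pairs = attention_pairs(side, side)
    print(f"{side}x{side}: {tokens} tokens, {pairs} attention pairs")
```

Doubling the side length quadruples the token count and multiplies the pair count by 16, which is why spending full resolution on every region becomes expensive.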

Gordon Wetzstein

164,096 views • 4 months ago

Robots struggle with strict action rules…memory and symbols help them learn fast. [Project + Full video link ⬇️] Robots struggle when tasks require specific steps in a fixed order. What if memory helped them think symbolically and learn faster? Solving tasks like unlocking a door then opening it is hard for deep RL. But by learning constraint relationships and storing them in memory, robots can solve these tasks much faster; with fewer trials and less training. Why it works ✅ Learns symbolic rules about action constraints ✅ Uses memory to transfer what it learned across tasks ✅ Handles real-world exploration with just 30 minutes of data ✅ Needs 10x fewer episodes than deep RL approaches This memory-based method shows a promising path forward for robots learning structured, real-world tasks. Full video: Paper: Thank you, Mrinal Verghese for sharing this amazing work! 🙏

Ilir Aliu - eu/acc

10,241 views • 1 year ago

This one sentence from Mark Zuckerberg proves he's serious about ending the censorship on his platforms. "We're going to move our trust and safety and content moderation teams out of California. And our US-based content review is going to be based in Texas." This means Silicon Valley liberals will no longer have their thumb on the scale and pick and choose what we, the peasants, get to post & view. I still don't forgive Zuck for rigging the 2020 election, but I think he means what he says.

George

431,534 views • 1 year ago

Exploring the Future of Legal Data Infrastructure: Iagon, in partnership with Cloud Court, is pleased to announce that Ford Motor Company will serve in an advisory capacity for this exploratory project, which seeks to evaluate the use of the Cardano blockchain and Iagon's decentralized cloud storage technology as a potential solution for the secure storage and management of legal documents and data. As a major corporation with sophisticated legal operations, Ford brings valuable perspective to this exploratory initiative based on their experience managing complex legal data infrastructures at scale. Ford is interested in exploring whether blockchain-based distributed storage could address persistent challenges in legal data infrastructure. In particular, Ford sees merit in exploring how blockchain technology might deliver economically efficient storage and audit solutions for legal data management. More insights 👉

Iagon 🧑‍🚀💽

225,793 views • 1 year ago

March 18, 2025 marked the public launch of OptimAI. In one year, it has evolved from a lightweight node layer into a decentralized intelligence infrastructure powering real-time data, compute, and reinforcement for agentic systems. Not just nodes. Not just data. A continuously learning, network-driven intelligence layer. This is infrastructure for a new class of software: autonomous agents that persist, adapt, and operate across environments. Year one established the network. Year two is where it compounds into coordination and value flow. Personal agents. Reinforcement at network scale. Emerging primitives for AgentFi. New layers coming online. 2026 won’t just be about scale, it’s where the network starts to operate. Keep building!

OptimAI Network

34,082 views • 4 months ago

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs discuss: Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT data with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long context LLMs already possess the potential for a larger output window: all you need is data with extended output during model alignment to unlock this capability.

AK

50,995 views • 1 year ago

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers paper page: Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, and therefore challenging the deployment in practical applications. This study addresses this challenge by breaking down the text-based video editing process into two separate stages. In the first stage, we leverage an existing text-to-image diffusion model to simultaneously edit a few keyframes without additional fine-tuning. In the second stage, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the keyframes, benefiting from structural guidance provided by intermediate frames. Our comprehensive set of experiments illustrates the efficacy and efficiency of MaskINT when compared to other diffusion-based methodologies. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.

AK

25,449 views • 2 years ago

3. Mojeek Mojeek is a unique search engine that stands apart by offering its own independent search index, rather than relying on data from other engines. With a strong focus on privacy, Mojeek doesn’t track users, collect personal information, or target ads based on search history. It’s a great option for users who value both transparency and autonomy in their search experience, providing an alternative to mainstream search engines while still delivering relevant, unbiased results from its growing index of the web.

Mario Nawfal

25,772 views • 1 year ago

From the archives! 2023: Dan was asked about his thoughts on Dutton being in Victoria. What Dan says is true. Dutton did start a racist campaign that wasn’t based on reported crime data. The so-called crime that he was alleging was happening in Melb disappeared after Dan won in 2018. Funny that! It’s pretty funny how Dan says it. Because it was true. Peter Dutton and the Liberal Party started a racist campaign leading up to the 2018 Victorian state election, that “African gangs” were taking over Melbourne. That wasn’t based on reported crime data. Look it up if you will. It was based on the Liberal Party trying to scare voters into voting for the Liberal Party, because they’re tougher on crime (apparently) so let’s use a racist slogan not based on fact to try and scare people. It was nothing but a dead set scare campaign. It didn’t work! But after the 2018 state election, all the commentary on this disappeared. It actually did. Remember, the Liberal Party lies to win elections. They don’t do any of this in good faith. They do it for themselves, and only themselves.

Dan Fangirl 🤓

22,573 views • 1 year ago

Extracting structured outputs with LLMs is easy. But doing large-scale extraction with precise citations and bounding boxes back to the source documents is way harder. With our latest release in LlamaExtract, we extract citation bounding boxes along with every single key and value within a document. You can see this in the UI. Hover over any k:v pair and you’ll be able to see the corresponding highlights in the source doc. If you’re a human reviewing a million docs (resumes, IDs, invoices, claims, contracts), this will help you 5x your ability to verify values and make sure things are correct. Check out these new extraction upgrades in LlamaCloud:

Jerry Liu

23,044 views • 5 months ago

The newest version of our Almanac preprint is out, and just in time for our demo at the Stanford AIMI Symposium 2023! Almanac is a retrieval-augmented LLM that provides up-to-date and verifiable answers to medical queries. Link: We benchmark our approach on a novel dataset of clinical scenarios (n = 130) evaluated by a panel of 5 board-certified & resident physicians, and demonstrate significant increases in factuality (mean of 18% at p-value < 0.05) across all specialties. More interestingly, because the retrieved data acts as a single source of truth, we find retrieval-based LLMs to be more robust to prompt injection and manipulation! Future work will involve expanding the scope of our dataset to more specialties and multimodal settings. #Medtwitter #MedEd

Cyril Zakka, MD

18,128 views • 3 years ago