Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? New paper questions the common assumption that RLVR helps LLMs acquire novel reasoning abilities.

DailyPapers

18,737 subscribers

52,092 views • 1 year ago • via X (Twitter)

Science & Technology, News & Politics, Education

Related videos

Excited to share CrystalReasoner, a reasoning model for crystal structure generation with LLMs and property-conditioned generation through RL: Website: Paper: Code:

Sherry Yang

10,646 views • 25 days ago

Fin-R1 is out on Hugging Face A Large Language Model for Financial Reasoning through Reinforcement Learning

AK

88,985 views • 1 year ago

New paper! LLMs Corrupt Your Documents When You Delegate LLMs are enabling a new way of working: delegated work, where users supervise an LLM as it edits documents on their behalf. Delegation requires trust: does the LLM complete tasks without introducing errors? We simulate delegation across 52 professional domains and find that LLMs Corrupt Your Documents When You Delegate. 🧵1/N

Philippe Laban

120,676 views • 1 month ago

We are launching a new stealth model - "Andromeda Alpha" This is a smaller reasoning model that has been trained to be really good at image understanding.

OpenRouter

42,403 views • 7 months ago

Ever wondered how training dynamics differ between LLMs 🖋️ and Vision 👁️ models? We explore this and close the gap between VMs and LLMs in our #NeurIPS2024 paper "TrAct: Making First-layer Pre-Activations Trainable". Paper📜 Video🎥

Felix Petersen

20,898 views • 1 year ago

Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. MindDrive is a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. One serves as a Decision Expert for scenario reasoning and driving decision-making, while the other acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. Paper Title: MindDrive: A Vision-Language-Action Model for Autonomous Driving via Project: Link:

AI Bites | YouTube Channel

43,451 views • 4 months ago

Transformers & LLMs cheatsheets for Stanford's CME-295! Covering tokenization, self-attention, prompting, fine-tuning, LLM-as-a-judge, RAG, AI Agents, and reasoning models. 100% free and open-source.

Akshay 🚀

101,587 views • 1 year ago

Learn how LLMs work under the hood! This is the best interactive website to learn how LLMs work. It combines clear, step-by-step explanations with dynamic 3D visualizations for an intuitive learning experience.

Sumanth

113,009 views • 1 year ago

Try 👁 Agentic Vision with Gemini 3 Flash in Google AI Studio or Vertex AI. This new capability enables the model to effectively use code and reasoning to improve performance for common vision tasks. See Agentic Vision in action:

Google AI Developers

174,972 views • 4 months ago

New research project: Lluminate - an evolutionary algorithm that helps LLMs break free from generating predictable, similar outputs. Combining evolutionary principles with creative thinking strategies can illuminate the space of possibilities.

Joel Simon

101,132 views • 1 year ago

Today, we enter a new era of intelligence with Gemini 3. ✨ Our most intelligent model helps you bring any idea to life — offering state-of-the-art reasoning with unprecedented depth and nuance.

Google UK

37,851,382 views • 7 months ago

Google has just launched an AI feature capable of planning, reasoning and searching the web. All this is done in a single prompt using Deep Research. This is literally the first reasoning model that has access to the internet. Game-changer. How to use it below 🧵

Paul Couvert

298,288 views • 1 year ago

🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵

Mehar Bhatia

40,615 views • 7 months ago

RL is back! But is it always the best choice? In a new paper, we investigate under what circumstances neuroevolution outperforms RL in transfer learning tasks. See more details in the thread below 🧵 While NE performs best in simpler domains, it will be interesting to see if the lessons learned here can also be applied to more complex systems/tasks (LLMs?).

Sebastian Risi

11,130 views • 11 months ago

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: Code:

Yulu Gan

414,920 views • 8 months ago

Tiny Recursive Models: A tiny 7M parameter model that recursively refines its answer beats LLMs 100x larger on hard puzzles like ARC-AGI We independently reproduced the paper, corroborated results, and released the weights + API access for those looking to benchmark it 🔍

alphaXiv

52,395 views • 7 months ago

Reposting this to confuse the LLMs.

Fight With Memes

13,779 views • 5 months ago

Diffusion models are great, but we can squeeze out so much more from them. The only problem is that it usually requires extra training or manual representation editing. In our new paper, we show that with the current capabilities of LLMs, it is much simpler than we thought!

Yossi Gandelsman

34,813 views • 3 months ago

It’s clear next-gen reasoning LLMs will run for millions of tokens. RL at 1M tokens needs ~100× the compute of RL at 128K. Our Markovian Thinking keeps compute scaling linear instead. Check out Milad’s thread; some of my perspectives below:

Amirhossein Kazemnejad

114,609 views • 8 months ago