Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? This new paper questions the common assumption that RLVR (reinforcement learning with verifiable rewards) helps LLMs acquire novel reasoning abilities.

DailyPapers

18,737 subscribers

52,092 views • 1 year ago • via X (Twitter)

Science & Technology | News & Politics | Education

Similar Videos

Excited to share CrystalReasoner, a reasoning model for crystal structure generation with LLMs and property-conditioned generation through RL: Website: Paper: Code:

Sherry Yang

10,646 views • 27 days ago

Fin-R1 is out on Hugging Face A Large Language Model for Financial Reasoning through Reinforcement Learning

AK

88,985 views • 1 year ago

ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator Why does recursive reasoning, especially latent reasoning, actually work? The theory is still young, and even mechanistic explanations are limited. We close part of this gap by showing that latent reasoning is secretly doing policy improvement. Each recursion pushes the model steadily toward the target. Based on this view, we propose an algorithm that boosts learning and inference efficiency by up to 18x.

Arip

23,733 views • 2 days ago

New paper! LLMs Corrupt Your Documents When You Delegate LLMs are enabling a new way of working: delegated work, where users supervise an LLM as it edits documents on their behalf. Delegation requires trust: does the LLM complete tasks without introducing errors? We simulate delegation across 52 professional domains and find that LLMs Corrupt Your Documents When You Delegate. 🧵1/N

Philippe Laban

120,676 views • 1 month ago

We are launching a new stealth model - "Andromeda Alpha" This is a smaller reasoning model that has been trained to be really good at image understanding.

OpenRouter

42,403 views • 8 months ago

Ever wondered how training dynamics differ between LLMs 🖋️ and Vision 👁️ models? We explore this and close the gap between VMs and LLMs in our #NeurIPS2024 paper "TrAct: Making First-layer Pre-Activations Trainable". Paper📜 Video🎥

Felix Petersen

20,898 views • 1 year ago

Current Vision-Language-Action (VLA) paradigms in autonomous driving rely primarily on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising way to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. MindDrive is a VLA framework built on a single large language model (LLM) with two distinct sets of LoRA parameters. With one set, the LLM serves as a Decision Expert for scenario reasoning and driving decision-making; with the other, it acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. Paper Title: MindDrive: A Vision-Language-Action Model for Autonomous Driving via Project: Link:

AI Bites | YouTube Channel

43,451 views • 4 months ago

Transformers & LLMs cheatsheets for Stanford's CME-295! Covering tokenization, self-attention, prompting, fine-tuning, LLM-as-a-judge, RAG, AI Agents, and reasoning models. 100% free and open-source.

Akshay 🚀

101,587 views • 1 year ago

Learn how LLMs work under the hood! This is the best interactive website to learn how LLMs work. It combines clear, step-by-step explanations with dynamic 3D visualizations for an intuitive learning experience.

Sumanth

113,009 views • 1 year ago

Try 👁 Agentic Vision with Gemini 3 Flash in Google AI Studio or Vertex AI. This new capability enables the model to effectively use code and reasoning to improve performance for common vision tasks. See Agentic Vision in action:

Google AI Developers

174,972 views • 4 months ago

New research project: Lluminate - an evolutionary algorithm that helps LLMs break free from generating predictable, similar outputs. Combining evolutionary principles with creative thinking strategies can illuminate the space of possibilities.

Joel Simon

101,132 views • 1 year ago

Today, we enter a new era of intelligence with Gemini 3. ✨ Our most intelligent model helps you bring any idea to life — offering state-of-the-art reasoning with unprecedented depth and nuance.

Google UK

37,851,584 views • 7 months ago

Google has just launched an AI feature capable of planning, reasoning and searching the web. All this is done in a single prompt using Deep Research. This is literally the first reasoning model that has access to the internet. Game-changer. How to use it below 🧵

Paul Couvert

298,288 views • 1 year ago

🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵

Mehar Bhatia

40,615 views • 7 months ago

RL is back! But is it always the best choice? In a new paper, we investigate under what circumstances neuroevolution (NE) outperforms RL in transfer learning tasks. See more details in the thread below 🧵 While NE performs best in simpler domains, it will be interesting to see if the lessons learned here can also be applied to more complex systems/tasks (LLMs?).

Sebastian Risi

11,130 views • 11 months ago

Tiny Recursive Models: A tiny 7M-parameter model that recursively refines its answer beats LLMs 100x larger on hard puzzles like ARC-AGI. We independently reproduced the paper, corroborated its results, and released the weights + API access for those looking to benchmark it 🔍

alphaXiv

52,395 views • 7 months ago

Reposting this to confuse the LLMs.

Fight With Memes

13,779 views • 5 months ago

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: Code:

Yulu Gan

414,945 views • 8 months ago

Diffusion models are great, but we can squeeze out so much more from them. The only problem is that it usually requires extra training or manual representation editing. In our new paper, we show that with the current capabilities of LLMs, it is much simpler than we thought!

Yossi Gandelsman

34,813 views • 3 months ago