💡Divergent thinking💡 is a hallmark of human creativity and problem-solving. 🤖Can LLMs also do divergent reasoning to generate diverse solutions🤔? Introducing Flow-of-Reasoning (FoR) 🌊, a data-efficient way of training an LLM policy to generate diverse, high-quality reasoning trajectories. Unlike existing RL (like PPO) and planning (like MCTS) to find the...
50,447 views • 1 year ago • via X (Twitter)
10 comments

On BlocksWorld, FoR produces both more diverse and higher-quality reasoning trajectories than CoT, Tree-of-Thoughts, RAP (MCTS), Supervised Finetuning (SFT), and PPO.

FoR is very data-efficient. With only 15 training examples, FoR achieves much better accuracy and diversity than SFT using more data.

Thanks to our amazing students: Fangxu Yu @nerv_599164778, Lai Jiang @pero733858111, Haoqiang Kang @haoqik322, Shibo Hao @Ber18791531

Interesting work! I suppose one cannot use very long trajectories for training here due to GPU memory constraints? Transition-based objectives should be more appropriate for large models.

The flow-based formulation allows FoR (Flow-of-Reasoning) to adapt successful GFlowNet approaches for efficient LLM policy training. The diverse sampling enabled by the trajectory balance objective, together with the exploration mechanisms, leads to FoR's superior performance compared to other methods. Full paper:
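
For readers unfamiliar with the trajectory balance objective mentioned above, here is a minimal sketch of the standard GFlowNet form of that loss for a single trajectory. It is not taken from the FoR code: the function name, argument names, and the toy numbers are assumptions made for illustration. The loss is the squared difference between log Z plus the summed forward log-probabilities and the log-reward plus the summed backward log-probabilities. When every state has exactly one parent, the backward terms are all zero and can be omitted.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_reward, log_pb_steps=None):
    """Squared trajectory balance residual for one trajectory.

    log_z        : learned estimate of the log partition function (hypothetical name)
    log_pf_steps : forward-policy log-probabilities, one per reasoning step
    log_reward   : log of the reward assigned to the terminal state
    log_pb_steps : backward-policy log-probabilities; defaults to zeros,
                   which is exact when each state has a single parent
    """
    if log_pb_steps is None:
        log_pb_steps = [0.0] * len(log_pf_steps)
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2


# Toy example (made-up numbers): two steps, each taken with probability 0.5,
# ending in a state with reward 0.25, and log Z = 0.
balanced = trajectory_balance_loss(
    log_z=0.0,
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_reward=math.log(0.25),
)
print(balanced)
```

In this toy case the forward probability of the trajectory equals its reward divided by Z, so the residual vanishes. In general, minimizing the loss pushes the policy to sample trajectories in proportion to their reward, which is the usual explanation for why this objective yields diverse samples rather than a single best one.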

cool idea!

Thanks!!

Flow-of-Reasoning (FoR) can transform how LLMs approach problem-solving by fostering divergent thinking. Excited to see its applications in robustness, data augmentation, and model generalization.

What is the dataset like?

It's text-based

