
Guowei Xu
@Kevin_GuoweiXu • 2,830 subscribers
Undergraduate student at Yao Class (Tsinghua University), interested in Language Models and Reinforcement Learning
Videos

🚀 How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produce a correct answer?

Best-of-N (e.g., GRPO) and tree search share two limitations:
🔻 Verification signals are sparse
🔻 Candidates stay within the model's own distribution

We introduce BES (Bidirectional Evolutionary Search), a search framework that couples forward candidate evolution with backward goal decomposition.
✅ Works for both post-training and inference.
Guowei Xu • 239,945 views • 12 days ago