Video wird geladen...
Video konnte nicht geladen werden
🚀 How should LLMs sample on hard reasoning problems during post-training and inference where direct rollouts rarely produce a correct answer? Best-of-N (e.g., GRPO) and tree search share two limitations: 🔻 Verification signals are sparse 🔻 Candidates stay within the model's own distribution We introduce BES: Bidirectional Evolutionary Search... show more
241,225 Aufrufe • vor 22 Tagen •via X (Twitter)
0 Kommentare
Keine Kommentare verfügbar
Kommentare vom Original-Post werden hier angezeigt


