Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By...
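
To make "exploring in parameter space" concrete, here is a minimal sketch of a generic Evolution Strategies loop on a toy problem. It is not the framework proposed in the post: the function names (`evolution_strategies`, `toy_reward`), the hyperparameters, and the quadratic reward are all assumptions chosen for illustration, and a three-dimensional vector stands in for model weights. The idea it shows is that noise is added to the parameters themselves, each perturbed copy is scored, and the parameters move toward the perturbations that scored well.

```python
import numpy as np


def evolution_strategies(objective, theta0, sigma=0.1, lr=0.01,
                         population=50, iterations=200, seed=0):
    """Generic ES loop (illustrative sketch, not the post's method)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iterations):
        # Explore in parameter space: sample Gaussian perturbations of theta.
        eps = rng.standard_normal((population, theta.size))
        # Score each perturbed parameter vector with the reward function.
        rewards = np.array([objective(theta + sigma * e) for e in eps])
        # Standardize rewards so the update scale does not depend on reward units.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Move theta along the reward-weighted average of the perturbations.
        theta += lr / (population * sigma) * (eps.T @ advantages)
    return theta


def toy_reward(theta):
    """Hypothetical reward: highest (zero) when theta equals a fixed target."""
    target = np.array([1.0, -2.0, 0.5])
    return -float(np.sum((theta - target) ** 2))


theta_final = evolution_strategies(toy_reward, np.zeros(3))
print("final parameters:", theta_final)
print("final reward:", toy_reward(theta_final))
```

Note that the loop only ever calls `objective` on whole parameter vectors and never differentiates it, which is what separates this style of search from action-space methods such as PPO and GRPO.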
414,920 views • 8 months ago • via X (Twitter)

