
Yulu Gan

@yule_gan3,704 subscribers

PhD student @MITEECS @MIT_CSAIL @MIT_CBMM / ex @PKU1898 @MSFTResearch | (M)LLM Reasoning, Neuroevolution, Emergence of Intelligence, Understanding Intelligence

Shorts

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: Code:
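To make "exploring in parameter space" concrete, here is a minimal sketch of a basic Evolution Strategies loop on a toy problem. It is not the paper's implementation: the names `es_step`, `toy_reward`, and `run_es`, the 3-dimensional parameter vector, and the hyperparameters are all assumptions chosen for illustration. For an LLM, `theta` would stand for the model weights and the reward would come from scoring generated outputs.

```python
import random

def es_step(theta, reward_fn, rng, pop_size=50, sigma=0.1, lr=0.05):
    """One ES update: perturb the parameters, score each perturbation, recombine."""
    # Sample Gaussian noise directly in parameter space, one vector per population member.
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop_size)]
    # Score every perturbed parameter vector. Only reward values are used, no gradients.
    rewards = [reward_fn([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
    # Standardize rewards so the update does not depend on the reward scale.
    mean = sum(rewards) / pop_size
    std = (sum((r - mean) ** 2 for r in rewards) / pop_size) ** 0.5 or 1.0
    weights = [(r - mean) / std for r in rewards]
    # Move toward perturbations that scored above average.
    # (Assumption: the usual 1/sigma factor is folded into lr.)
    return [
        t + lr / pop_size * sum(w * eps[i] for w, eps in zip(weights, noises))
        for i, t in enumerate(theta)
    ]

def toy_reward(theta, target=(1.0, -2.0, 0.5)):
    """Hypothetical reward: negative squared distance to a fixed target vector."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def run_es(steps=300, seed=0):
    """Run the ES loop from a zero initialization and return the final parameters."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        theta = es_step(theta, toy_reward, rng)
    return theta
```

Calling `run_es()` should return a parameter vector close to the target `(1.0, -2.0, 0.5)`. The point of the sketch is that the loop only ever calls `reward_fn` on perturbed parameters, so the reward does not need to be differentiable.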

414,920 views