Mehar Bhatia

@bhatia_mehar · 1,339 subscribers

PhD Candidate at @Mila_Quebec @mcgillu 👩‍🎓| Prev: @UBC Vancouver | Studying societal impacts of AI, AI alignment and safety

Shorts

🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵

39,935 views