#alignmentworkshop

RepE: Representations are weights & activations. Engineering is reading, probing & control—like brain scans for AI. Andy Zou shows how top-down representational engineering improves AI honesty and jailbreak robustness. #AlignmentWorkshop

415,329 views • 1 year ago

"If you literally catch your AI trying to escape, you have to stop deploying it." Buck Shlegeris shares strategies for managing misaligned AI, including trusted monitoring and collusion-busting techniques to limit catastrophic risks as capabilities grow. #AlignmentWorkshop

194,863 views • 1 year ago

Vienna #AlignmentWorkshop: 129 researchers tackled #AISafety from interpretability & robustness to governance. Keynote by Jan Leike + talks by Victoria Krakovna, David Krueger, Gillian Hadfield, Robert Trager, Neel Nanda, David Bau, Helen Toner, Mary Phuong, and more. Blog recap & videos. 👇

46,317 views • 1 year ago