Ziqian Zhong

@fjzzq2002 · 1,648 subscribers

Intern @TransluceAI | AI interp & alignment @CarnegieMellon, prev @MIT @pika_labs

Shorts

🔭 We’re releasing Hodoscope: an open-source tool for unsupervised behavior discovery. It lets you visually explore and compare agent behaviors at scale. It helped us discover a novel reward hacking vulnerability in Commit0 - with just a couple minutes of human effort.

74,035 views