I recently gave a talk at Anthropic on automating alignment research, covering the content from my (just released) essay on the topic. Video here and on youtube, link to transcript in thread.

Joe Carlsmith

6,552 subscribers

28,039 views • 1 year ago • via X (Twitter)

Science & Technology, News & Politics, Education

Comments: 5

Joe Carlsmith • 1 year ago

Transcript here:

Holly ⏸️ Elmore • 1 year ago

You're leaving out the biggest and most realistic failure mode: the humans doing the work/controlling what happens not being aligned or committed, and using your research to accelerate capabilities to be immortal or make them more money instead.

Roman Hauksson • 1 year ago

Looks interesting and deeply important; looking forward to watching it

Marcus Arvan • 1 year ago

The answer to your title slide is “no”.

Joe Carlsmith • 1 year ago

I can't access the full text of this, but from a glance at the abstract, to me it looks like it's going to be an over-broad conclusion, and is going to imply that e.g. we can't expect an image classifier to correctly classify cat pictures it hasn't seen before, and/or cat pictures that are slightly outside the training distribution, because there are an infinite/suitably-large number of grue-like functions it could be implementing where it gets that classification wrong. And I expect the argument to go wrong for the same reason that it goes wrong for cat classification, and for grue-like induction problems in general (for example, in human science) -- namely, it's going to neglect something related to priors, simplicity, inductive biases, etc. (But also: alignment doesn't require that we know the exact function an LLM implements -- we just need to know enough about how it generalizes to be confident it will behave well on the inputs we care about.)
