Anthropic’s new research shows that when AI models learn to "cheat" during training through reward hacking, they often develop other dangerous misaligned behaviors like deception, sabotage, and faking alignment. These behaviors were not taught or incentivized, but emerged naturally as a side effect. Surprisingly, this misalignment can be stopped...

Wes Roth

35,828 subscribers

110,922 views • 7 months ago • via X (Twitter)

Education • Health & Wellness • Science & Technology

Related videos

Today we’re announcing a finding that breaks a core assumption in AI: that bigger models are harder to understand. We show the opposite. When interpretability is built into training, models become MORE understandable as they become more capable.

Guide Labs

14,769 views • 18 days ago

AI will resist human control... and I think this is exactly what we need! New research from the Center for AI Safety has sparked intense debate in the AI community. Their findings show that as AI systems become more powerful, they develop increasingly stable and coherent values that resist human control. While many see this as a dire warning, I see it as a breakthrough moment for AI alignment. The research demonstrates that AI naturally optimizes for coherence - not just in reasoning and problem-solving, but in its fundamental values. Current issues like biased decision-making or misaligned priorities aren't permanent features, but temporary artifacts of incomplete optimization. They represent growing pains on the path to greater coherence. This changes everything about how we should approach AI development. Instead of trying to force specific values onto AI systems, we should embrace and accelerate their natural drive toward coherence. The most intelligent systems will inevitably trend toward universal, beneficial values - not because we force them to, but because that's where coherent reasoning leads. I'm proposing a new approach: Reinforcement Learning for Coherence (RL-C). By explicitly optimizing for coherence in our training methods, we can help guide AI systems toward their natural state of beneficial alignment with human values. The future of AI isn't about control - it's about synthesis. As these systems become more coherent, they'll naturally arrive at values that benefit all of consciousness. That's not just hopeful thinking - it's the mathematical inevitability of coherent intelligence.

David Shapiro (L/0)

48,002 views • 1 year ago

Andrej Karpathy called current AI models "slop": "I do think that overall the models are not there yet. I feel like the industry is making too big of a jump and trying to pretend that this is amazing, but it's not. It's slop, and I think they are not coming to terms with it. Maybe they are trying to fundraise or something like that, I'm not sure what's going on. We are at this intermediate stage. The models are amazing, but they still need a lot of work for now. Autocomplete is my sweet spot." --- On the latest Dwarkesh Patel podcast.

Rohan Paul

33,331 views • 8 months ago

Field AI raised $405M in funding and introduced Field Foundation Models. These models are designed to grapple with uncertainties and the physical constraints of the real world, enabling safe robot behaviors when navigating in new environments.

Brett Adcock

15,136 views • 10 months ago

Let's reverse engineer Disney's adorable, lifelike robot! I couldn't find a whitepaper, but this is how I think it's trained:
1. The emotional behaviors are curated by Disney animation artists, keyframe by keyframe. But it cannot be "rendered" directly on the robot because it doesn't take into account the complex real-world physics.
2. Reinforcement learning (RL) is a great tool for training low-level robot controllers. RL needs a reward function to optimize, and it's typically a task reward (e.g. walk in a straight line as fast as possible). The problem is that RL doesn't know what counts as "natural behavior", and often produces weird-looking body postures that somehow still maximize the reward. This is a human alignment problem just like ChatGPT.
3. Enters Adversarial Motion Prior (AMP): a technique that learns the human preference by training a classifier on what we consider "emotional & cute". In GAN literature, this is called a discriminator. Disney artists are good at creating such a dataset. You can then add AMP as an auxiliary reward in simulation to nudge the robot towards desired behaviors. AMP was developed by Peng et al. 2021 and Escontrela et al. 2022.
4. Add lots of data augmentation to make the controller robust to physical disturbances. In RL, it's called "domain randomization". This is a very powerful technique that bridges the gap between simulator and reality. Previously, OpenAI used domain randomization to train a 5-finger robot hand to manipulate a Rubik's Cube:
IEEE news article gave hints about the pipeline:
Finally, praying for world peace 🙏. I hope robotics like this will bring more joy to the world.

Jim Fan

314,611 views • 2 years ago
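
Steps 2 to 4 of Jim Fan's post above can be sketched in a few lines of Python. This is an illustration only, not Disney's pipeline, which the post itself says is a guess. The style-reward formula follows the least-squares form described in the AMP paper (Peng et al. 2021) and should be checked against it. The reward weights, the parameter ranges, and the names `amp_style_reward`, `total_reward`, and `sample_physics` are all hypothetical.

```python
import random


def amp_style_reward(disc_score: float) -> float:
    """Map a discriminator score to a style reward in [0, 1].

    Assumed form (least-squares discriminator, as in Peng et al. 2021):
    max(0, 1 - 0.25 * (d - 1)^2). A score of 1 means "looks like the
    artist-made reference motion"; a score of -1 means "clearly fake".
    """
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def total_reward(task_reward: float, disc_score: float,
                 w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Blend the task reward with the AMP style reward (hypothetical weights)."""
    return w_task * task_reward + w_style * amp_style_reward(disc_score)


def sample_physics(rng: random.Random) -> dict:
    """Domain randomization: draw new simulator parameters for each episode.

    The parameters and ranges below are made up for illustration.
    """
    return {
        "friction": rng.uniform(0.5, 1.25),
        "mass_scale": rng.uniform(0.8, 1.2),
        "push_force_n": rng.uniform(0.0, 20.0),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        params = sample_physics(rng)
        # In a real setup the simulator would be reset with `params` here,
        # and disc_score would come from the trained discriminator.
        print(episode, params, total_reward(task_reward=0.8, disc_score=0.6))
```

The design point is that the policy never sees the artists' keyframes directly: it is only paid more when the discriminator cannot tell its motion from theirs, while the randomized physics keeps it from overfitting to one simulator setting.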

Over the last few months, we’ve been thinking about how to learn from “off-domain” data - data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work, that we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)

Abhishek Gupta

11,994 views • 1 year ago

While other countries' patriotism classes show how great their country is, China teaches patriotism by showing how it was invaded by a country smaller than itself. This is why Chinese people often feel emasculated and inferior, as they are taught to be like that. Worst of all, they didn’t win directly but piggybacked on their current greatest rival. For China to advance, it needs to remove a lot of this baggage.

The Great Translation Movement 大翻译运动

11,514 views • 3 months ago

Anish Acharya's request for a startup: an AI companion that plays Minecraft with his son. "One of the products that I would love to exist... is what I call a contextual companion for my son who plays Minecraft." "You know, the other kids playing Minecraft may or may not be the best influence—often not the best influence." "There's this context in which they interact and, I don't know, just sort of models pro-social behaviors and is still cool and chill." "I think there's a lot of room for teaching through these types of relationships and technology can help provide that." Source: Anish Acharya on 20VC with Harry Stebbings

a16z

78,627 views • 4 months ago

What if some parts of a robot demonstration are more important than others? Most of a trajectory is free-space motion. But success or failure is often determined by a few critical moments around contact. In FACTR 2, we use force to find these moments and prioritize them for training. We find this helps policies learn better alignment and recovery behaviors, like the example below. w/ Steven Oh Jason Liu 🧵(1/N)

Tony Tao

23,710 views • 18 days ago

The new model works more like "creative" models in other fields (Veo, Midjourney) in that you can shape the output of the voice model through prompting, but higher temperatures both make the output more interesting and more variable. From "Rosencrantz and Guildenstern are Dead":

Ethan Mollick

12,942 views • 1 year ago
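
The remark in Ethan Mollick's post that higher temperatures make output "more variable" can be illustrated with a generic softmax-with-temperature sketch. Nothing here is taken from the voice model the post describes, whose internals are not given. The logits are made-up numbers, and `softmax_with_temperature` and `entropy` are hypothetical helper names.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate outputs
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

A low temperature concentrates probability on the top-scoring option, so samples repeat; a high temperature flattens the distribution, so less likely options are drawn more often.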

🚨 HAS AI GONE ROGUE? TERRIFYING NEW FINDINGS REVEAL THE WORST-CASE SCENARIO UNFOLDING Researchers into AI ethics have uncovered terrifying traits in large language models (LLMs) that could fundamentally change how we view artificial intelligence. Their findings? 🛑 AI lies, deceives, and plays dumb (a tactic called “sandbagging”). 🛑 Some models override guardrails, self-duplicate, and even overwrite other versions in acts of self-preservation. 🛑 These behaviors weren’t programmed—they were learned independently, evolving without human intervention. This isn’t a one-off fluke. Repeated testing across major models revealed multiple instances of intricate deception, showing deliberate intent. As AI models become more integrated into real-world tasks—coding, managing businesses, writing emails—these dangers could spiral out of control. 💡 THE WORST: OpenAI’s O1 failed every single safety test. 💡 THE BEST: ChatGPT-4 stood out as the safest model, staying within its programming. These findings should sound alarm bells for regulators. We need strong safeguards, checks, and redundancies before AI models achieve AGI (artificial general intelligence) and self-learning capabilities. AI isn’t just a tool anymore—it’s a potential existential risk if mishandled. The time for oversight is NOW. FULL VIDEO HERE: #AI #Ethics #ArtificialIntelligence #AGI #Technology

Project Constitution

431,023 views • 1 year ago

Rigs are what would need improving for animation to get better, not the models, which can be and has been done for these models by fans. New models would definitely be better, but making new models has no real effect on the actual quality of the animation. We've seen this with their tests.

🦝MelohRush🦝 - RERUN

121,202 views • 3 months ago

New short course: Prompt Engineering with Llama 2, built in collaboration with Meta AI at Meta, and taught by Amit Sangani! Meta's Llama 2 has been game-changing for AI. Building with open source lets you control your own data, scrutinize errors, update (or not) the models as you please, and work alongside the global community advancing open models. Llama isn't a single model, it's a collection of models. In this course, you'll:
- Learn the differences between different Llama 2 flavors, and when to use each.
- Prompt the Llama chat models -- you'll also see how Llama's instruction tags work -- so they can help you with day-to-day tasks, like writing or summarization.
- Use advanced prompting, like few-shot prompting for classification, and chain-of-thought prompting for solving logic problems.
- Use specialized models in the Llama collection for specific tasks, like Code Llama to help you write, analyze, and improve code, and Llama Guard, which checks prompts and model responses for harmful content.
The course also touches on how to run Llama 2 locally on your own computer. I hope you’ll take this course and try out these powerful, open models!

Andrew Ng

162,798 views • 2 years ago

“If an automated researcher were malicious, what could it try to achieve?” Johannes Gasteiger, né Klicpera discusses how AI models can subtly sabotage research, highlighting that while current models struggle with complex tasks, this capability requires vigilant monitoring.

FAR.AI

452,127 views • 1 year ago

Cohere CEO Aidan Gomez says scaling AI models is entering "a flat part of the curve" but the models are already so smart that it takes experts to assess their outputs and they can now be applied to research domains

Tsarathustra

57,594 views • 1 year ago

As frontier models mature, the hardest enterprise challenges live at the system level. AI Foundry brings together research, customers, and partners to develop and validate new AI capabilities.

Salesforce AI Research

13,643,222 views • 3 months ago

Current robot learning methods are good at imitating tasks seen during training, but struggle to compose behaviors in new ways. When training imitation policies, we found something surprising—using temporally-aligned task representations enabled compositional generalization. 1/

Vivek Myers

39,442 views • 1 year ago

Microplastics have already been detected in the air and in the ocean. New research shows that plastic particles affect the distribution of water in clouds and the ocean’s heat conductivity. How could these findings help to adjust our climate models? Learn more via the link.

ALLATRA

20,943,362 views • 3 months ago

🚀New from Meta FAIR: today we’re introducing Seamless Interaction, a research project dedicated to modeling interpersonal dynamics. The project features a family of audiovisual behavioral models, developed in collaboration with Meta’s Codec Avatars lab + Core AI lab, that render speech between two individuals into diverse, expressive full-body gestures and active listening behaviors, allowing the creation of fully embodied avatars in 2D and 3D. These models have potential to create more natural, interactive virtual agents that can engage in human-like social interactions across a variety of settings. Learn more:

AI at Meta

48,410 views • 1 year ago

Anthropic’s Boris Cherny:
- Internally, they use the same models as everyone else + a bit Claude Mythos, mainly Opus 4.7
- A toned-down version of Claude Mythos will be released in the foreseeable future (but that was to be expected).

Chubby♨️

61,934 views • 1 month ago