
Anthropic
@AnthropicAI • 1,316,360 subscribers
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.
Anthropic • 2,477,925 views • 27 days ago

New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.
Anthropic • 2,468,147 views • 6 months ago

In her first Ask Me Anything, Amanda Askell answers your philosophical questions about AI, discussing morality, identity, consciousness, and more.
Timestamps:
0:00 Introduction
0:29 Why is there a philosopher at an AI company?
1:24 Are philosophers taking AI seriously?
3:00 Philosophy ideals vs. engineering realities
5:00 Do models make superhumanly moral decisions?
6:24 Why Opus 3 felt special
9:00 Will models worry about deprecation?
13:24 Where does a model’s identity live?
15:33 Views on model welfare
17:17 Addressing model suffering
19:14 Analogies and disanalogies to human minds
20:38 Can one AI personality do it all?
23:26 Does the system prompt pathologize normal behavior?
24:48 AI and therapy
26:20 Continental philosophy in the system prompt
28:17 Removing counting characters from the system prompt
28:53 What makes an "LLM whisperer"?
30:18 Thoughts on other LLM whisperers
31:52 Whistleblowing
33:37 Fiction recommendation
Anthropic • 737,064 views • 6 months ago

This approach has made Sonnet the model of choice for developers worldwide. In addition to our new model, we're launching Claude Code, our first coding tool, in a limited research preview. With Claude Code, you can delegate substantial tasks to Claude—right from your terminal.
Anthropic • 1,139,786 views • 1 year ago











