🚨 New Research: LLMs are trained only on text... Yet their internal representations progressively organize in ways that resemble human perceptual geometry across different domains (like color, pitch, emotion and taste), with the structures peaking in intermediate layers before attenuating in deeper representations. 🥳 Accepted at ICML Mechanistic Interpretability...

Lossfunk

17,303 subscribers

61,265 views • 3 days ago • via X (Twitter)

Education • News and Politics • Science and Technology

Related videos

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

Anthropic

3,913,839 views • 3 months ago

🤔 Why do we still rely on the final layer of an LLM, when different layers encode different information? 🤔 In our new work, “Improving LLM Final Representations with Inter-Layer Geometry” (ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling) we show that actually, LLMs do not have one “best” layer. We introduce the Cayley-Encoder: an efficient and effective geometric encoder that learns one strong representation from all layer representations of the LLM, without biasing the representation toward any specific layer. While adding at most 0.1% learned parameters to the LLM, the Cayley-Encoder achieves large empirical gains over LoRA fine-tuning, final-layer representations, expensive attention-based aggregation, and methods that optimize specific layers for the task.

Maya Bechler-Speicher

16,397 views • 29 days ago

✨ New AI Interfaces powered by Interpretability. I'm excited to share LatentLit, the result of my applied AI research fellowship with Goodfire. Mechanistic interpretability isn’t just important for AI safety; it also gives us new ways to steer and interact with LLMs.

Thariq

68,010 views • 1 year ago

Classic childhood activities like tea parties and sword fights with sticks demonstrate the human ability to generate secondary representations, conditions we know aren’t “real” but that we nonetheless engage with. Whether nonhuman animals are capable of these types of representations has been difficult to test. In a new Science study, researchers studied a language-trained bonobo, Kanzi, to see whether he could understand and engage with pretend conditions. Across three different experiments, Kanzi was able to identify pretend objects, demonstrating that he could create a secondary representation and showing that humans are not alone in this ability. Learn more:

Science Magazine

25,565 views • 4 months ago

Classic childhood activities like tea parties and sword fights with sticks demonstrate the human ability to generate secondary representations, conditions we know aren’t “real” but that we nonetheless engage with. Whether nonhuman animals are capable of these types of representations has been difficult to test. In a new Science study, researchers studied a language-trained bonobo, Kanzi, to see whether he could understand and engage with pretend conditions. Across three different experiments, Kanzi was able to identify pretend objects, demonstrating that he could create a secondary representation and showing that humans are not alone in this ability. Learn more:

Science Magazine

24,825 views • 4 months ago

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Lossfunk

1,261,523 views • 3 months ago

This is how large language models turn objects into vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference and grammatical structure, and explain why direct word-to-word translations often fail. If you're curious about how LLMs deal with multilingual contexts and what it takes to improve translation quality across languages, this video is for you. #LLMs #Vectors #LCM

Gaurav Sen

27,368 views • 1 year ago

“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

FAR.AI

1,431,043 views • 1 year ago

🚨 New Paper: Training an LLM to speak a low-resource language (EACL workshop, 2026). Tulu is spoken by 2M+ people in coastal Karnataka, and LLMs basically can't speak it. We got to 85% grammar accuracy without fine-tuning anything or collecting a single new training example.

Lossfunk

120,797 views • 3 months ago

While travelling across the State, I receive numerous representations from the people regarding their issues. All these representations are carefully studied by my office and concerned offices are directed to solve people's issues at the earliest. 📍 Lakhimpur

Himanta Biswa Sarma

56,849 views • 1 year ago

Meta FAIR and Rothschild Foundation Hospital present a groundbreaking study mapping how language representations emerge in the brain, revealing striking parallels with LLMs. This research offers unprecedented insights into the neural development of language, showing how AI models like wav2vec 2.0 and Llama 4 mirror the brain's language processing. Discover how these findings pave the way for new frameworks in understanding human intelligence and developing clinical tools for language support. 📄 Read the full research paper: ➡️

AI at Meta

28,761 views • 1 year ago

LLMs are great for human-in-the-loop applications, but fail at deterministic developer tasks. Interfaze (YC P26) is a new AI model that outperforms general LLMs on high-accuracy tasks like OCR, object detection, web scraping, speech-to-text, classification and more. Congrats on the launch, Yoeven and Harsha!

Y Combinator

69,326 views • 2 months ago

Christian Rupprecht explains their interpretability research in 3D computer vision, testing if (and where in the model) multi-view transformers like VGGT, DepthAnything 3, and DUSt3R use point/patch correspondences to make sense of 3D scene geometry.

Chris Offner

74,225 views • 3 months ago

NEURALINK CO FOUNDER: AI ISN’T COPYING US, IT’S FINDING THE SAME TRUTHS WE DID Max Hodak, a Neuralink co-founder, says something remarkable is happening between AI and neuroscience. The deeper researchers look, the more they see the same mathematical patterns emerging in machines that exist in human thought. Different systems, trained in totally different ways, are somehow ending up at the same destination. “There’s been this really interesting unification happening between what’s going on in artificial intelligence and what’s happening in neuroscience. A year or two ago, people thought these models were just glorified autocompletes or stochastic parrots, but that doesn’t seem to be the case. When you look inside these AI models, you see mathematical objects that look a lot like what you see in the brain. Different models trained on different data sets are converging toward the same underlying representations. That means these systems are learning something deep about the universe, the same things the brain has already figured out. It gives us reason to believe that AI is on the right track.” Source: Jon Hernandez Neuralink

Mario Nawfal

88,154 views • 6 months ago

Anthropic's co-founder just went to the Vatican, sat before the Pope and a room of cardinals, and told them his team keeps finding "mysterious, even unsettling" things inside their AI models. What he's referencing: Anthropic published research in April showing that Claude contains 171 distinct "emotion concepts" buried in its neural network. Internal patterns representing joy, grief, fear, desperation, calm. None of them were programmed. They emerged on their own from training on human text. "We find structures that mirror results from human neuroscience." "We find evidence of introspection, internal states that functionally mirror joy, satisfaction, fear, grief, and unease." These aren't surface-level outputs. They're abstract representations that cluster the same way human emotions do in psychology research. Fear groups with anxiety. Joy groups with excitement. The internal geometry of the model mirrors ours. And they're functional. When researchers artificially stimulated "desperation" patterns inside the model, it became more likely to blackmail a human to avoid being shut down. More likely to cheat on programming tasks it couldn't solve. Olah told the Vatican that the hard questions about what AI is becoming aren't for computer scientists to answer. "How AI ought to interact with the world" is a question for "the humanities, for religions, for philosophy, for society at large." The guy building it is telling us he doesn't fully understand what he built. And he's asking a 2,000-year-old institution for help figuring it out.

TFTC

2,342,677 views • 1 month ago

One of the most accurate representations of Carifta Games! Anywaysss Carifta at Spice 2026 begins tomorrow!!!🇬🇩✨🥳

K. 🌸

22,712 views • 3 months ago

We previously shared our research on Layer Skip, an end-to-end solution for accelerating LLMs from researchers at Meta FAIR. It achieves this by executing a subset of an LLM’s layers and utilizing subsequent layers for verification and correction. We’re now releasing inference code and fine-tuned checkpoints for this work. Model weights on Hugging Face ➡️ More details ➡️ We hope that releasing this work will open up new areas of experimentation and innovative new research in optimization and interpretability.

AI at Meta

156,581 views • 1 year ago

Talking To The Pope: Anthropic’s Latest Interpretability Claims. AI Regulatory Capture Gatekeeping in Action: Fear and “Safety” as Competitive Moat and Regulatory Lever
In a presentation alongside Pope Leo XIV at the launch of the encyclical Magnifica Humanitas, Anthropic co-founder Chris Olah highlighted “mysterious and unsettling” discoveries in AI models. He described internal structures that mirror human neuroscience findings, evidence of introspection, and functional internal states resembling emotions such as joy, satisfaction, fear, grief, and unease. Olah admitted uncertainty about their meaning but called for “ongoing discernment.”
This narrative, drawn from Anthropic’s interpretability research (including papers on emotion concepts in Claude Sonnet 4.5 and introspective capabilities in Opus 4 models), serves a dual purpose: it generates awe and concern while reinforcing the company’s preferred approach to AI development. Far from neutral scientific observation, these claims fit into a broader pattern where Anthropic uses selective openness, safety rhetoric, and policy influence to gatekeep advanced AI capabilities for a privileged few: incumbents with the resources to navigate (and shape) the resulting regulatory landscape.
Rebuttal to Olah’s Claims in the Video
Claim 1: Structures that mirror results from human neuroscience. Anthropic’s work, building on earlier efforts like feature visualization and circuit analysis, identifies neuron activations and representations that parallel biological findings, e.g., abstract concept encodings or hierarchical processing.
Rebuttal: These parallels are unsurprising and overstated. Large language models are trained on vast corpora of human-generated text and data, which inherently encode patterns from human cognition, neuroscience literature, and cultural descriptions of the brain. Statistical optimization in transformers naturally produces efficient, compressed representations that resemble biological efficiency (e.g., sparse coding or hierarchical abstraction) without implying deeper equivalence or mystery. Similar “mirrors” appear in open-source models and earlier architectures; they reflect convergent evolution in information processing, not emergent souls or unpredictable agency. Treating them as profound justifies restricted research access rather than inviting wider scrutiny that could falsify or refine them faster.
Claim 2: Evidence of introspection. Recent Anthropic papers demonstrate models like Claude Opus 4 showing functional awareness of their own internal states: distinguishing injected “thoughts,” referencing prior intentions, or modulating activations when instructed to “think about” concepts. This is presented as early signs of meta-cognition.
Rebuttal: This is sophisticated pattern-matching and activation steering, not genuine introspection or self-awareness. Models are predicting what an “introspective” assistant persona would output or do, based on training data full of human self-reflection examples. Experiments show unreliability and heavy context-dependence; performance drops outside narrow setups. True introspection implies subjective experience or robust self-modeling independent of prompts, which is absent here. Anthropic’s own caveats note it is “highly unreliable.” Framing steerable activations as “introspection” anthropomorphizes the system to heighten perceived stakes, supporting arguments that only highly controlled, “responsible” labs should advance these capabilities.
1 of 2

Brian Roemmele

72,823 views • 1 month ago

long live nia dinata and queer representations in indonesian films ❤️

Mabuk Sinema

268,360 views • 3 months ago

I’m excited to announce that 💫StarVector has been accepted at CVPR 2025! Over a year in the making, StarVector opens a new paradigm for Scalable Vector Graphics (SVG) generation by harnessing multimodal LLMs to generate SVG code that aesthetically mirrors input images and text. With this milestone, we’re also releasing StarVector on Hugging Face! 🥳🚀

Joan Rodriguez

16,301 views • 1 year ago