Today we introduce T-Free, a new paradigm in language processing. Tokenization is one of the core building blocks of large language models (LLMs), transforming natural language into numeric representations for further processing. (1/3) 🔗 #writtenbyalephalpha

Aleph Alpha

8,908 subscribers

18,121 views • 1 year ago • via X (Twitter)

Education Science & Technology

2 Comments

Aleph Alpha • 1 year ago

Our innovation, T-Free, offers a novel approach to tokenization, boosting tokenizer fertility across various languages, and reducing the size of the embedding layer by up to 75% compared to traditional tokenizers. Early experiments with T-Free show promising results and could unlock new possibilities in LLMs, including:
- Up to 50% reduction in training and inference costs
- Improved semantic encoding of language
- Enhanced performance in multilingual models
(2/3)

Aleph Alpha • 1 year ago

Read our full paper here: Dive into the source code of T-Free: Try out our interim research model checkpoints: (3/3)
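
As a rough illustration of the thread above, the sketch below shows one way a tokenizer-free embedding could be set up and where a saving of 75% in the embedding layer could come from. It is not Aleph Alpha's implementation. It assumes words are split into character trigrams and each trigram is hashed to a few rows of a small embedding table. The function names, the lowercasing step, the two hashes per trigram, and the sizes (32,000 vocabulary entries, 8,000 table rows, hidden size 1,024) are all hypothetical values chosen for the arithmetic.

```python
import hashlib


def trigrams(word: str) -> list[str]:
    """Split a word into overlapping character trigrams, padded with spaces."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def active_rows(word: str, table_rows: int, hashes_per_trigram: int = 2) -> set[int]:
    """Map a word to a sparse set of embedding-table rows (hypothetical scheme).

    Each trigram is hashed several times; every hash selects one row.
    A word's embedding would then be the sum of the selected rows.
    """
    rows = set()
    for tri in trigrams(word.lower()):  # lowercasing is an assumption
        for seed in range(hashes_per_trigram):
            digest = hashlib.sha256(f"{seed}:{tri}".encode("utf-8")).digest()
            rows.add(int.from_bytes(digest[:8], "big") % table_rows)
    return rows


def embedding_params(rows: int, hidden: int) -> int:
    """Number of parameters in an embedding table of shape (rows, hidden)."""
    return rows * hidden


# Assumed sizes, for illustration only.
CLASSIC_VOCAB = 32_000  # rows in a conventional subword embedding table
SPARSE_ROWS = 8_000     # rows in the hashed trigram table
HIDDEN = 1_024          # hidden dimension

reduction = 1 - embedding_params(SPARSE_ROWS, HIDDEN) / embedding_params(CLASSIC_VOCAB, HIDDEN)

if __name__ == "__main__":
    print(trigrams("hello"))
    print(sorted(active_rows("hello", SPARSE_ROWS)))
    print(f"embedding-layer reduction: {reduction:.0%}")
```

Under these assumed sizes the table shrinks from 32,000 to 8,000 rows, which is where the 75% figure in the arithmetic comes from; the real ratio depends on the sizes actually used.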

Related Videos

Meta FAIR and Rothschild Foundation Hospital present a groundbreaking study mapping how language representations emerge in the brain, revealing striking parallels with LLMs. This research offers unprecedented insights into the neural development of language, showing how AI models like wav2vec 2.0 and Llama 4 mirror the brain's language processing. Discover how these findings pave the way for new frameworks in understanding human intelligence and developing clinical tools for language support. 📄 Read the full research paper: ➡️

AI at Meta

28,761 views • 1 year ago

This is how large language models turn objects to vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference, grammatical structure, and why direct word-to-word translations often fail. If you're curious about how LLMs deal with multilingual contexts and what it takes to improve translation quality across languages, this video is for you. #LLMs #Vectors #LCM

Gaurav Sen

27,368 views • 1 year ago

LP-MusicCaps: LLM-Based Pseudo Music Captioning paper page: Automatic music captioning, which generates natural language descriptions for given music tracks, holds significant potential for enhancing the understanding and organization of large volumes of musical data. Despite its importance, researchers face challenges due to the costly and time-consuming collection process of existing music-language datasets, which are limited in size. To address this data scarcity issue, we propose the use of large language models (LLMs) to artificially generate the description sentences from large-scale tag datasets. This results in approximately 2.2M captions paired with 0.5M audio clips. We term it Large Language Model based Pseudo music caption dataset, shortly, LP-MusicCaps. We conduct a systemic evaluation of the large-scale music captioning dataset with various quantitative evaluation metrics used in the field of natural language processing as well as human evaluation. In addition, we trained a transformer-based music captioning model with the dataset and evaluated it under zero-shot and transfer-learning settings. The results demonstrate that our proposed approach outperforms the supervised baseline model.

AK

78,794 views • 3 years ago

How would Sam Blackshear explain what the Move Programming Language is to his mom? Sam Blackshear says that it's a language for programming with money. Sam saw developers forced to rebuild core financial behaviors from scratch with lots of room for error. That’s what sparked the creation of the Move language: to embed safe, reusable building blocks for money - right into the language itself. MystenLabs.sui Sui Walrus 🦭/acc

MR SHIFT 🦁

49,948 views • 11 months ago

🚀 Excited to announce the first release of a novel open source programming language and platform for language model interaction! Combining prompts, constraints & scripting, LMQL elevates the capabilities of large language models. 🧵1/6 A quick tour.

LMQL (Language Model Query Language)

198,966 views • 3 years ago

Google presents AudioPaLM: A Large Language Model That Can Speak and Listen paper page: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt.

AK

290,517 views • 3 years ago

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

Anthropic

3,926,559 views • 3 months ago

We believe an open approach is the right one for the development of today's AI models. Today, we’re releasing Llama 2, the next generation of Meta’s open source Large Language Model, available for free for research & commercial use. Details ➡️

AI at Meta

1,234,744 views • 3 years ago

Here's my conversation with Edward Gibson (Ted Gibson, Language Lab MIT), a linguist and psychologist at MIT, heading the MIT Language Lab. We talk all about the human language: syntax, grammar, structure, theories of language, evolution of language, how it reflects culture, and of course LLMs, both their amazing power and their limitations. It's here on X in full, and is up on YouTube, Spotify, and everywhere else. Links in comment. Timestamps: 0:00 - Introduction 1:13 - Human language 5:19 - Generalizations in language 11:06 - Dependency grammar 21:05 - Morphology 29:40 - Evolution of languages 33:00 - Noam Chomsky 1:17:06 - Thinking and language 1:30:36 - LLMs 1:43:35 - Center embedding 2:10:02 - Learning a new language 2:13:54 - Nature vs nurture 2:20:30 - Culture and language 2:34:58 - Universal language 2:39:21 - Language translation 2:42:36 - Animal communication

Lex Fridman

340,267 views • 2 years ago

MotionGPT: Human Motion as a Foreign Language paper page: Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.

AK

125,319 views • 3 years ago

Just dropped a 4 hour lecture on "Large Language Models": 0:00 Basics of language models 2:30 Word2vec 16:27 Transfer Learning 19:23 BERT 1:00:39 T5 1:31:14 GPT1-3 1:53:05 ChatGPT 2:20:03 LLMs as Deep RL 2:53:00 Policy Gradient 3:32:50 Train your own LLM

Soheil Feizi

217,255 views • 2 years ago

Yann LeCun argues that large language models (LLMs) cannot reach human-level or superintelligence just by scaling. He says the current LLM paradigm is hitting its limits. Many researchers are now exploring “agentic systems,” but building them on top of LLMs alone is flawed. LLMs can't plan actions well because they don’t truly understand or predict consequences. To get intelligent behavior, we need something fundamentally different.

Wes Roth

71,832 views • 5 months ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,708 views • 3 years ago

Announcing the newest releases from Meta FAIR. We’re releasing new groundbreaking models, benchmarks, and datasets that will transform the way researchers approach molecular property prediction, language processing, and neuroscience. 1️⃣ Open Molecules 2025 (OMol25): A dataset for molecular discovery with simulations of large atomic systems. 2️⃣ Universal Model for Atoms: A machine learning interatomic potential for modeling atom interactions across a wide range of materials and molecules. 3️⃣ Adjoint Sampling: A scalable algorithm for training generative models based on scalar rewards. 4️⃣ FAIR and the Rothschild Foundation Hospital partnered on a large-scale study that reveals striking parallels between language development in humans and LLMs. Read more ➡️

AI at Meta

899,553 views • 1 year ago

Netanyahu says Israel is transforming the Middle East and asserting itself as a regional power. This is the language of Zionism not the language of Judaism. Turning Judaism into a banner for political expansion only puts Jewish communities at risk.

Voice of Rabbis

117,409 views • 4 months ago

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

Anthropic

2,513,789 views • 2 months ago

"Just like maths...was the perfect description language for physics, I think that AI is potentially the perfect description language for biology." Demis Hassabis envisions a new era of "digital biology" where AI helps us understand life's complex information processing. His dream? "Virtual cells" for faster, more efficient scientific discovery.

vitrupo

28,821 views • 1 year ago

Language is the future for how we interact with robots. Today Wayve is sharing a first look at LINGO-1, a new vision-language-action AI model. To give you a glimpse of its capabilities, here is a video of me playing with LINGO-1 yesterday morning.

Alex Kendall

154,759 views • 2 years ago

Geoffrey Hinton says AI models understand in the same way that people do and the best model we have of how the human brain works is large language models

Tsarathustra

59,171 views • 1 year ago