Today we introduce T-Free, a new paradigm in language processing. Tokenization is one of the core building blocks of large language models (LLMs), transforming natural language into numeric representations for further processing. (1/3) 🔗 #writtenbyalephalpha

Aleph Alpha

8,895 subscribers

18,120 views • 1 year ago • via X (Twitter)

Education • Science & Technology
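
To make the first post's description concrete, here is a minimal sketch of conventional subword tokenization: a greedy longest-match tokenizer over a tiny, hypothetical vocabulary. `VOCAB`, `tokenize_word` and `encode` are illustrative names only; this is neither Aleph Alpha code nor T-Free. The sketch turns a sentence into integer ids and reports fertility, the average number of tokens per word that the follow-up post refers to. Lower fertility means fewer tokens per word.

```python
# Hypothetical toy vocabulary: token string -> integer id.
VOCAB = {"token": 0, "ization": 1, "is": 2, "a": 3, "core": 4,
         "build": 5, "ing": 6, "block": 7, "s": 8, "<unk>": 9}


def tokenize_word(word, vocab):
    """Greedy longest-match split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, shrinking until a vocab hit.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No vocabulary entry starts here: emit <unk> and skip one character.
            pieces.append("<unk>")
            start += 1
    return pieces


def encode(text, vocab):
    """Lower-case, split on whitespace, tokenize each word, map pieces to ids."""
    words = text.lower().split()
    pieces = [p for w in words for p in tokenize_word(w, vocab)]
    ids = [vocab[p] for p in pieces]
    fertility = len(pieces) / len(words)  # average tokens per word
    return ids, fertility


ids, fertility = encode("Tokenization is a core building block", VOCAB)
print(ids)                  # [0, 1, 2, 3, 4, 5, 6, 7]
print(round(fertility, 2))  # 1.33
```

Real tokenizers learn their vocabulary from data (for example with BPE) and often fall back to bytes rather than a single `<unk>` id; the greedy match here is only for illustration.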

Comments: 2

Aleph Alpha • 1 year ago

Our innovation, T-Free, offers a novel approach to tokenization, improving tokenizer fertility across various languages and reducing the size of the embedding layer by up to 75% compared to traditional tokenizers. Early experiments with T-Free show promising results and could unlock new possibilities in LLMs, including:
- Up to 50% reduction in training and inference costs
- Improved semantic encoding of language
- Enhanced performance in multilingual models
(2/3)

Aleph Alpha • 1 year ago

Read our full paper here: Dive into the source code of T-Free: Try out our interim research model checkpoints: (3/3)
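
The embedding-layer reduction claimed in the thread follows from a simple fact: a conventional embedding table has one row per vocabulary entry, so its parameter count is vocabulary size × embedding width. As we understand the T-Free paper, it instead embeds each word through a sparse pattern of rows selected by hashing the word's character trigrams, so the table size no longer tracks a learned vocabulary. The sketch below is our own simplified illustration of that general idea, not the released T-Free code. The table size `VOCAB_ROWS`, the width `DIM`, the hash (SHA-256 via `hashlib`) and `HASHES_PER_TRIGRAM` are all hypothetical choices.

```python
import hashlib
import random

VOCAB_ROWS = 1024        # hypothetical number of rows in the shared embedding table
DIM = 8                  # hypothetical embedding width
HASHES_PER_TRIGRAM = 2   # hypothetical number of rows each trigram activates

rng = random.Random(0)
# Randomly initialised embedding table of shape VOCAB_ROWS x DIM.
table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(VOCAB_ROWS)]


def trigrams(word):
    """Character trigrams of a word padded with one space on each side."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def rows_for(trigram):
    """Deterministically map one trigram to HASHES_PER_TRIGRAM table rows."""
    rows = []
    for k in range(HASHES_PER_TRIGRAM):
        digest = hashlib.sha256(f"{k}:{trigram}".encode("utf-8")).digest()
        rows.append(int.from_bytes(digest[:8], "little") % VOCAB_ROWS)
    return rows


def embed_word(word):
    """Sum each distinct row activated by the word's trigrams (a sparse multi-hot lookup)."""
    active = sorted({r for t in trigrams(word) for r in rows_for(t)})
    vec = [0.0] * DIM
    for r in active:
        for j in range(DIM):
            vec[j] += table[r][j]
    return vec, active


house_vec, house_rows = embed_word("house")
_, houses_rows = embed_word("houses")
print(trigrams("house"))                         # [' ho', 'hou', 'ous', 'use', 'se ']
print(len(set(house_rows) & set(houses_rows)) > 0)  # True: shared trigrams share rows
```

For scale, using hypothetical sizes rather than the paper's: at width 4,096, a 64,000-entry vocabulary needs 262,144,000 embedding parameters, while a 16,000-row table needs 65,536,000, which is 75% fewer.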

Related videos

Meta FAIR and Rothschild Foundation Hospital present a groundbreaking study mapping how language representations emerge in the brain, revealing striking parallels with LLMs. This research offers unprecedented insights into the neural development of language, showing how AI models like wav2vec 2.0 and Llama 4 mirror the brain's language processing. Discover how these findings pave the way for new frameworks in understanding human intelligence and developing clinical tools for language support. 📄 Read the full research paper: ➡️

AI at Meta

28,761 views • 1 year ago

Understanding LLMs by Building One 👉 We use large language models every day, but what actually happens inside them?

Danilo Poccia

14,181 views • 2 months ago

This is how large language models turn objects into vector representations. In this video, we explore how large language models (LLMs) convert objects into internal representations, especially when translating between languages like English and Hindi. Using real-world examples, we highlight the challenges of gender inference, grammatical structure, and why direct word-to-word translations often fail. If you're curious about how LLMs deal with multilingual contexts and what it takes to improve translation quality across languages, this video is for you. #LLMs #Vectors #LCM

Gaurav Sen

27,368 views • 1 year ago

Today, we're joined by Julie Kallini ✨, a PhD student in the Stanford NLP Group, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models, including inefficient compression rates for under-resourced languages, and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its inference efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages and of impossible-language training datasets, and the bias of language model architectures towards natural language. 🎧 / 🎥 Listen or watch the full episode on our page:

📖 CHAPTERS
00:00 - Introduction
4:28 - Issues of tokenization for LLMs
11:26 - Sub-word tokenization versus byte-level tokenization
16:28 - Inefficiencies of ByT5
17:08 - MrT5 architecture
22:05 - Language-specific compression rate
24:10 - Benchmarks
27:15 - Inference efficiency
28:50 - Applying MrT5 to other decoder models
31:15 - Future directions of MrT5
33:51 - Mission: Impossible Language Models paper
39:59 - Languages tested
45:13 - Language architectures biased toward natural languages vs impossible languages
48:19 - Future directions for Mission Impossible

The TWIML AI Podcast

11,758 views • 1 year ago
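
Byte-level models such as ByT5 and MrT5 skip a learned vocabulary and treat raw UTF-8 bytes as tokens, which is why the episode dwells on compression rates: characters outside ASCII take two to four bytes in UTF-8, so the same word becomes a longer token sequence. The sketch below counts byte tokens for one word in three scripts. `byte_tokens` and `bytes_per_char` are hypothetical helpers; ByT5 additionally reserves a few special ids, which we omit, and MrT5's dynamic token merging is not implemented here.

```python
def byte_tokens(text: str) -> list[int]:
    """Byte-level tokenization: each UTF-8 byte becomes one token id in 0..255."""
    return list(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Average byte tokens per Unicode code point (1.0 for pure ASCII)."""
    return len(byte_tokens(text)) / len(text)


samples = {
    "English": "language",
    "Russian": "язык",   # Cyrillic letters take 2 bytes each in UTF-8
    "Hindi": "भाषा",     # Devanagari code points take 3 bytes each in UTF-8
}

for name, word in samples.items():
    print(f"{name}: {len(word)} code points -> {len(byte_tokens(word))} byte tokens "
          f"({bytes_per_char(word):.1f} per code point)")
```

A model that spends two or three tokens per character on some scripts pays for it in sequence length and compute, which is the inefficiency that learned merging or deletion of byte tokens aims to reduce.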

Andrej Karpathy calls large language models the new computing paradigm:
CPU -> LLM
bytes -> tokens
RAM -> context window
This is the large language model OS (LMOS).

ℏεsam

343,282 views • 1 year ago

Large Language Models (LLMs) Explained Briefly: the best visual explanation of LLMs that I have seen. Source in the comment below.

Mohit Mishra

15,698 views • 1 year ago

LP-MusicCaps: LLM-Based Pseudo Music Captioning paper page: Automatic music captioning, which generates natural language descriptions for given music tracks, holds significant potential for enhancing the understanding and organization of large volumes of musical data. Despite its importance, researchers face challenges due to the costly and time-consuming collection process of existing music-language datasets, which are limited in size. To address this data scarcity issue, we propose the use of large language models (LLMs) to artificially generate the description sentences from large-scale tag datasets. This results in approximately 2.2M captions paired with 0.5M audio clips. We term it the Large Language Model based Pseudo music caption dataset, or LP-MusicCaps for short. We conduct a systematic evaluation of the large-scale music captioning dataset with various quantitative evaluation metrics used in the field of natural language processing as well as human evaluation. In addition, we trained a transformer-based music captioning model with the dataset and evaluated it under zero-shot and transfer-learning settings. The results demonstrate that our proposed approach outperforms the supervised baseline model.

AK

78,794 views • 2 years ago

New research from Meta FAIR: Large Concept Models (LCM) is a fundamentally different paradigm for language modeling that decouples reasoning from language representation, inspired by how humans can plan high-level thoughts to communicate.

AI at Meta

531,486 views • 1 year ago

🚀 Excited to announce the first release of a novel open source programming language and platform for language model interaction! Combining prompts, constraints & scripting, LMQL elevates the capabilities of large language models. 🧵1/6 A quick tour.

LMQL (Language Model Query Language)

198,966 views • 3 years ago

Google presents AudioPaLM: A Large Language Model That Can Speak and Listen paper page: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt.

AK

290,517 views • 3 years ago

TNT X-Space Series: Rwanda's Minister of ICT and Innovation, Paula Ingabire, discusses the development of Kinyarwanda Large Language Models (LLMs).

The New Times (Rwanda)

32,980 views • 1 year ago

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

Anthropic

3,907,947 views • 2 months ago

We believe an open approach is the right one for the development of today's AI models. Today, we’re releasing Llama 2, the next generation of Meta’s open source Large Language Model, available for free for research & commercial use. Details ➡️

AI at Meta

1,234,438 views • 2 years ago

4. Stanford CS224N: an introduction to natural language processing (NLP) and how it works.

Rowan Cheung

67,710 views • 3 years ago

🚨Patrick Bet-David talks about how language learning models are causing brain atrophy in the youth. "Language learning models have done so much damage to the youth brain and processing." (🎥: Tristan Dlabik Podcast)

PBD Podcast

11,525 views • 24 days ago

Here's my conversation with Edward Gibson (Ted Gibson), a linguist and psychologist at MIT who heads the MIT Language Lab. We talk all about human language: syntax, grammar, structure, theories of language, the evolution of language, how it reflects culture, and of course LLMs, both their amazing power and their limitations. It's here on X in full, and is up on YouTube, Spotify, and everywhere else. Links in comment.

Timestamps:
0:00 - Introduction
1:13 - Human language
5:19 - Generalizations in language
11:06 - Dependency grammar
21:05 - Morphology
29:40 - Evolution of languages
33:00 - Noam Chomsky
1:17:06 - Thinking and language
1:30:36 - LLMs
1:43:35 - Center embedding
2:10:02 - Learning a new language
2:13:54 - Nature vs nurture
2:20:30 - Culture and language
2:34:58 - Universal language
2:39:21 - Language translation
2:42:36 - Animal communication

Lex Fridman

340,169 views • 2 years ago

Just dropped a 4-hour lecture on "Large Language Models":
0:00 Basics of language models
2:30 Word2vec
16:27 Transfer Learning
19:23 BERT
1:00:39 T5
1:31:14 GPT-1 to GPT-3
1:53:05 ChatGPT
2:20:03 LLMs as Deep RL
2:53:00 Policy Gradient
3:32:50 Train your own LLM

Soheil Feizi

217,255 views • 2 years ago

MotionGPT: Human Motion as a Foreign Language paper page: Although pre-trained large language models continue to advance, building a unified model for language and other multimodal data, such as motion, remains challenging and unexplored so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ discrete vector quantization for human motion and convert 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performance on multiple motion tasks, including text-driven motion generation, motion captioning, motion prediction, and motion in-betweening.

AK

125,319 views • 3 years ago
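
The MotionGPT abstract says continuous 3D motion is turned into discrete "motion tokens" via vector quantization. The core lookup step can be sketched as a nearest-neighbour search against a codebook: each frame's feature vector is replaced by the index of its closest codebook entry. In MotionGPT the codebook and the encoder that produces the features are learned (a VQ-VAE-style setup, as we understand it); here the 2-D codebook, the frames, and the helper names `quantize` and `dequantize` are hypothetical.

```python
import math

# Hypothetical codebook: each entry is a 2-D "motion feature" prototype.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(frames, codebook):
    """Replace each continuous feature vector with the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        distances = [math.dist(frame, entry) for entry in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens, codebook):
    """Map token ids back to their codebook vectors (a lossy reconstruction)."""
    return [codebook[t] for t in tokens]


frames = [(0.1, -0.05), (0.9, 0.2), (0.2, 0.8), (1.1, 0.95)]
motion_tokens = quantize(frames, CODEBOOK)
print(motion_tokens)  # [0, 1, 2, 3]
print(dequantize(motion_tokens, CODEBOOK))
```

Once motion is a sequence of integer ids, it can sit alongside text tokens in one vocabulary, which is the step the abstract relies on when it treats human motion "as a specific language".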

Yann LeCun argues that large language models (LLMs) cannot reach human-level or superintelligence just by scaling. He says the current LLM paradigm is hitting its limits. Many researchers are now exploring “agentic systems,” but building them on top of LLMs alone is flawed. LLMs can't plan actions well because they don’t truly understand or predict consequences. To get intelligent behavior, we need something fundamentally different.

Wes Roth

71,816 views • 4 months ago

Exclusive insights into the process of transforming foreign telenovelas into productions in Ghanaian local language.

SIKAOFFICIAL🦍

231,070 views • 4 months ago