Today we introduce T-Free, a new paradigm in language processing. Tokenization is one of the core building blocks of large language models (LLMs), transforming natural language into numeric representations for further processing. (1/3) 🔗 #writtenbyalephalpha

18,096 views • 1 year ago • via X (Twitter)

2 comments

Aleph Alpha • 1 year ago

Our innovation, T-Free, offers a novel approach to tokenization, boosting tokenizer fertility across various languages, and reducing the size of the embedding layer by up to 75% compared to traditional tokenizers. Early experiments with T-Free show promising results and could unlock new possibilities in LLMs, including:
- Up to 50% reduction in training and inference costs
- Improved semantic encoding of language
- Enhanced performance in multilingual models
(2/3)

Aleph Alpha • 1 year ago

Read our full paper here: Dive into the source code of T-Free: Try out our interim research model checkpoints: (3/3)

Related videos

Today, we're joined by Julie Kallini ✨, PhD student at Stanford NLP Group, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.”

For the MrT5 paper, we explore the importance and failings of tokenization in large language models (including inefficient compression rates for under-resourced languages) and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its performance and efficiency.

For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, the creation of impossible language training datasets, and explore the bias of language model architectures towards natural language.

🎧 / 🎥 Listen or watch the full episode on our page:

📖 CHAPTERS
00:00 - Introduction
4:28 - Issues of tokenization for LLMs
11:26 - Sub-word tokenization versus byte level tokenization
16:28 - Inefficiencies of byte T5
17:08 - Mr. T5 architecture
22:05 - Language-specific compression rate
24:10 - Benchmarks
27:15 - Inference efficiency
28:50 - Applying MrT5 to other decoder models
31:15 - Future directions of MrT5
33:51 - Mission: Impossible Language Models paper
39:59 - Languages tested
45:13 - Language architectures biased toward natural languages vs impossible languages
48:19 - Future directions for Mission Impossible

The TWIML AI Podcast

11,758 views • 1 year ago

3D-LLM: Injecting the 3D World into Large Language Models

paper page:

Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,494 views • 2 years ago