Can data owners & LM developers collaborate to build a strong shared model while each retaining data control?

Introducing FlexOlmo💪, a mixture-of-experts LM enabling:
• Flexible training on your local data without sharing it
• Flexible inference to opt in/out your data anytime

At 37B parameters, FlexOlmo is competitive...

93,434 views • 11 months ago • via X (Twitter)

Comments: 10

Weijia Shi · 11 months ago

❓Why FlexOlmo?

The current "monolithic" pretraining paradigm centralizes all data during training & requires one-time decisions on data inclusion/exclusion. Once data is used for training, it's difficult to add or remove. This creates challenges:

For data owners:
• Required to share raw data for model training
• Loss of control once they give the data away

For LM developers:
• Valuable data remains locked behind closed doors
• No straightforward way to update models with new data without catastrophic forgetting

Weijia Shi · 11 months ago

💡FlexOlmo Recipe
1️⃣ Each data owner trains an expert locally using a shared anchor model
2️⃣ Expert modules from different data owners merge into a single MoE without joint training
3️⃣ At inference, you can control which expert modules, along with their data, serve particular users or queries.
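
The three steps above can be sketched as a toy in plain Python. This is only an illustration, not the FlexOlmo code: the names `Expert`, `MergedMoE`, `opt_out`, and `opt_in` are hypothetical, experts are stand-in functions rather than trained feed-forward modules, and mixing all active experts with a dense softmax is an assumption made for simplicity.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Vector = List[float]


@dataclass
class Expert:
    """One data owner's contribution (hypothetical structure)."""
    owner: str
    router_embedding: Vector              # learned independently per expert
    forward: Callable[[Vector], Vector]   # stand-in for a trained expert module


@dataclass
class MergedMoE:
    """Toy container that combines independently trained experts."""
    experts: Dict[str, Expert] = field(default_factory=dict)
    opted_out: Set[str] = field(default_factory=set)

    def add_expert(self, expert: Expert) -> None:
        # Step 2: merging is plain registration here, with no joint training.
        self.experts[expert.owner] = expert

    def opt_out(self, owner: str) -> None:
        # Step 3: exclude an owner's expert at inference time.
        self.opted_out.add(owner)

    def opt_in(self, owner: str) -> None:
        self.opted_out.discard(owner)

    def routing_weights(self, x: Vector) -> Dict[str, float]:
        active = [e for e in self.experts.values() if e.owner not in self.opted_out]
        if not active:
            raise ValueError("no active experts")
        # Logit = dot product of the input with each expert's router embedding.
        logits = [sum(a * b for a, b in zip(e.router_embedding, x)) for e in active]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return {e.owner: w / total for e, w in zip(active, exps)}

    def forward(self, x: Vector) -> Vector:
        # Weighted sum of the active experts' outputs.
        out = [0.0] * len(x)
        for owner, w in self.routing_weights(x).items():
            y = self.experts[owner].forward(x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


model = MergedMoE()
model.add_expert(Expert("public", [1.0, 0.0], lambda x: [2.0 * v for v in x]))
model.add_expert(Expert("news", [0.0, 1.0], lambda x: [v + 1.0 for v in x]))
```

Calling `model.opt_out("news")` removes that expert from routing without touching the others, and `model.opt_in("news")` restores it. That is the property step 3 describes.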

Weijia Shi · 11 months ago

📑Paper: ✍️Blog: 💻Code: 🤗Models:

Weijia Shi · 11 months ago

📊 FlexOlmo Performance
Evaluated across 31 tasks with models up to 37B parameters (20B active)

🔧 Training: Start with a 7B public model (pretrained on 1T tokens), then each data owner continues pretraining for 50B tokens on simulated closed data before combining experts.

Key results:
• 41% improvement brought by leveraging the closed data sources
• 10.1% better than existing model merging methods
• Even outperforms standard MoE with unrestricted data access

Weijia Shi · 11 months ago

⚔️ Data Extraction Attack
Can shared expert modules leak your private data? We tested training data extraction attacks on FlexOlmo to find out:
• FlexOlmo: 0.7% extraction rate
• Overfitted model (100 epochs) on the data: 60% extraction rate

FreeMind · 11 months ago

Can this method be applied to other models? Aren't routers all trained?

Weijia Shi · 11 months ago

It can be applied to other base models as well. The router is not jointly trained. Each expert is associated with a corresponding router embedding that is learned independently.
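
A small sketch of why independently learned router embeddings make this possible, assuming each routing logit is a dot product between a hidden state and that expert's embedding (the function `router_logits` and the expert names are hypothetical, and the numbers are made up for illustration). Adding a new expert only appends one embedding, so the logits of the existing experts do not change:

```python
from typing import Dict, List

Vector = List[float]


def router_logits(router_embeddings: Dict[str, Vector], hidden: Vector) -> Dict[str, float]:
    """Score each expert by the dot product of its embedding with the hidden state."""
    return {
        name: sum(a * b for a, b in zip(emb, hidden))
        for name, emb in router_embeddings.items()
    }


# Two experts whose router embeddings were learned separately (toy values).
embeddings = {"public": [0.5, -1.0], "math": [1.0, 2.0]}
hidden = [2.0, 1.0]
before = router_logits(embeddings, hidden)

# A third data owner contributes an expert later: only a new entry is added.
embeddings["code"] = [0.0, 3.0]
after = router_logits(embeddings, hidden)

# Scores of the original experts are untouched by the addition.
unchanged = all(after[name] == before[name] for name in before)
```

With a jointly trained router, by contrast, adding an expert would generally require retraining the routing weights for all experts.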

carlo · 11 months ago

@ShirleyYXWu wow, super impressive! gotta check the minimal hardware for running it 🤓

Jdjf · 11 months ago

Fix You’re a fucking AI bitch you’re the reason why our accounts are being suspended
