Loading video...

Video Failed to Load

Go Home

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data anytime At 37B parameters, FlexOlmo is competitive...

93,434 views • 11 months ago •via X (Twitter)

10 Comments

Weijia Shi's profile picture
Weijia Shi11 months ago

❓Why FlexOlmo? The current "monolithic" pretraining paradigm centralizes all data during training & requires one-time decisions on data inclusion/exclusion. Once data is used for training, it's difficult to add and remove. This creates challenges: For data owners: • Required to share raw data for model training • Loss of control once they give the data away For LM developers: • Valuable data remains locked behind closed doors • No straightforward way to update models with new data without catastrophic forgetting

Weijia Shi's profile picture
Weijia Shi11 months ago

💡FlexOlmo Recipe 1️⃣ Each data owner trains an expert locally using a shared anchor model 2️⃣ Expert modules from different data owners merge into a single MoE without joint training 3️⃣ At inference, you can control which expert modules along with their data serve particular users or queries.

Premium's profile picture
Premium11 months ago

Go ad-free on X with Premium+ It's the highest return on investment you can make.

Weijia Shi's profile picture
Weijia Shi11 months ago

📑Paper: ✍️Blog: 💻Code: 🤗Models:

Weijia Shi's profile picture
Weijia Shi11 months ago

📊 FlexOlmo Performance Evaluated across 31 tasks with models up to 37B parameters (20B active) 🔧 Training: Start with 7B public model (pretrained on 1T tokens), then each data owner continues pretraining for 50B tokens on simulated closed data before combining experts. Key results: • 41% improvement brought by leveraging the closed data sources • 10.1% better than existing model merging methods • Even outperforms standard MoE with unrestricted data access

Weijia Shi's profile picture
Weijia Shi11 months ago

⚔️ Data Extraction Attack Can shared expert modules leak your private data? We tested training data extraction attacks on FlexOlmo to find out: • FlexOlmo: 0.7% extraction rate • Overfitted model (100 epochs) on the data: 60% extraction rate

FreeMind's profile picture
FreeMind11 months ago

Can this method be applied to other models? Aren't routers all trained?

Weijia Shi's profile picture
Weijia Shi11 months ago

It can be applied to other base models as well. The router is not jointly trained. Each expert is associated with a corresponding router embedding that is learned independently.

carlo's profile picture
carlo11 months ago

@ShirleyYXWu wow, super impressive! gotta check the minimal hardware for running it 🤓

Jdjf's profile picture
Jdjf11 months ago

Fix You’re a fucking AI bitch you’re the reason why our accounts are being suspended

Related Videos

Perplexity CEO Aravind Srinivas on the biggest threat to the data center industry: It's not competition. It's not regulation. It's decentralisation. "The biggest threat to a data center is if the intelligence can be packed locally on a chip that's running on the device and then there's no need to inference all of it on like one centralized data center." He outlines how this could work in practice. Personalisation doesn't necessarily require on-device model training. Retrieval augmented generation, tool calls, and local data can already tailor AI to individual users. But the real unlock? Test time training. Aravind Srinivas describes a future where AI lives on your device, watches how you work and gradually automates your repetitive tasks. "Imagine we crack test time training where the AI watches tasks you repeatedly do on your local system, adapts to you over time and starts automating a lot of the things you do." The key insight: in this model, the intelligence belongs to you. It's your data, your device, your personalised AI brain. And if that future arrives, the economics of centralised infrastructure start to collapse. "That really disrupts the whole data center industry. It doesn't make sense to spend all this money, 500 billion, 5 trillion, whatever on building all the centralized data centers across the world that do a lot of the intelligence workloads for people." The companies spending trillions on centralised infrastructure may want to rethink where intelligence actually needs to live.

Big Brain AI

90,102 views • 3 months ago