
Can data owners & LM developers collaborate to build a strong shared model while each retains control over its data? Introducing FlexOlmo💪, a mixture-of-experts LM enabling:
• Flexible training on your local data without sharing it
• Flexible inference to opt your data in or out at any time
At 37B parameters, FlexOlmo is competitive...

93,434 views • 11 months ago • via X (Twitter)

10 Comments

Weijia Shi • 11 months ago

❓Why FlexOlmo? The current "monolithic" pretraining paradigm centralizes all data during training & requires one-time decisions about which data to include or exclude. Once data has been used for training, it is difficult to add new data or remove existing data. This creates challenges:
For data owners:
• They must share raw data for model training
• They lose control once they give the data away
For LM developers:
• Valuable data remains locked behind closed doors
• There is no straightforward way to update models with new data without catastrophic forgetting

Weijia Shi • 11 months ago

💡FlexOlmo Recipe
1️⃣ Each data owner trains an expert locally using a shared anchor model
2️⃣ The expert modules from different data owners are merged into a single MoE without joint training
3️⃣ At inference, you control which expert modules, and therefore whose data, serve particular users or queries
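
Steps 2 and 3 can be pictured with a toy, single-token, single-layer sketch. Assumptions that are not from the thread: each expert is a plain function standing in for a feed-forward block; the router scores experts by a dot product with a per-expert router embedding; routing keeps the top-k allowed experts and softmax-weights their outputs; and the anchor model's own FFN is treated as an always-available "public" expert. `Expert`, `MergedMoE`, `top_k`, and `allowed` are hypothetical names. Step 1 (local training) is not shown, so the sketch starts from already-trained experts.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class Expert:
    """Hypothetical container for one data owner's contribution (illustrative only)."""
    name: str
    ffn: Callable[[Vector], Vector]  # stands in for the expert's feed-forward block
    router_embedding: Vector         # this expert's row in the router (assumed dot-product routing)


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class MergedMoE:
    """Toy single-layer MoE assembled from independently trained experts."""

    def __init__(self, experts: List[Expert], top_k: int = 2):
        # Step 2: "merging" here only collects the experts; nothing is trained jointly.
        self.experts = {e.name: e for e in experts}
        self.top_k = top_k

    def forward(self, x: Vector, allowed: List[str]) -> Vector:
        # Step 3: only opted-in experts are routing candidates, so an opted-out
        # owner's expert is never scored and never contributes to the output.
        active = [self.experts[name] for name in allowed]
        if not active:
            raise ValueError("at least one expert must be allowed")
        scores = [_dot(x, e.router_embedding) for e in active]
        # Keep the top-k highest-scoring allowed experts.
        chosen = sorted(range(len(active)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        weights = _softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for w, i in zip(weights, chosen):
            y = active[i].ffn(x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    def scale(k: float) -> Callable[[Vector], Vector]:
        # Toy "FFN" that just rescales its input.
        return lambda v: [k * t for t in v]

    moe = MergedMoE(
        [
            Expert("public", scale(1.0), [0.5, 0.0]),
            Expert("math", scale(10.0), [2.0, 0.0]),
            Expert("news", scale(100.0), [3.0, 0.0]),
        ],
        top_k=1,
    )
    print(moe.forward([1.0, 0.0], ["public", "math", "news"]))  # [100.0, 0.0]: "news" wins routing
    print(moe.forward([1.0, 0.0], ["public", "math"]))          # [10.0, 0.0]: "news" opted out
```

Opting an owner out is just leaving its name out of `allowed`: its expert is never scored, so it cannot influence the result. A real MoE would make this choice per token at every MoE layer rather than once for a single vector.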

Weijia Shi • 11 months ago

📑Paper: ✍️Blog: 💻Code: 🤗Models:

Weijia Shi • 11 months ago

📊 FlexOlmo Performance
Evaluated across 31 tasks with models of up to 37B parameters (20B active)
🔧 Training: Start with a 7B public model (pretrained on 1T tokens); each data owner then continues pretraining for 50B tokens on simulated closed data before the experts are combined.
Key results:
• 41% improvement from leveraging the closed data sources
• 10.1% better than existing model-merging methods
• Even outperforms a standard MoE trained with unrestricted data access

Weijia Shi • 11 months ago

⚔️ Data Extraction Attack
Can shared expert modules leak your private data? We tested training-data extraction attacks on FlexOlmo to find out:
• FlexOlmo: 0.7% extraction rate
• Model overfitted on the data (100 epochs): 60% extraction rate

FreeMind • 11 months ago

Can this method be applied to other models? Aren't the routers all trained jointly?

Weijia Shi • 11 months ago

It can be applied to other base models as well. The router is not jointly trained. Each expert is associated with a corresponding router embedding that is learned independently.
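
A minimal sketch of why per-expert router embeddings avoid joint router training, assuming the router scores each expert by a dot product between a token's hidden state and that expert's embedding (a common MoE design; the thread does not spell out the scoring function). `router_logits`, `add_expert`, and `remove_expert` are hypothetical helpers. Because each expert owns one independent row, adding or removing an owner's expert leaves every other expert's logit unchanged.

```python
from typing import Dict, List

Vector = List[float]
RouterRows = Dict[str, Vector]  # expert name -> that expert's router embedding (hypothetical layout)


def router_logits(x: Vector, rows: RouterRows) -> Dict[str, float]:
    """Score each expert by the dot product of the hidden state with its embedding."""
    return {name: sum(a * b for a, b in zip(x, r)) for name, r in rows.items()}


def add_expert(rows: RouterRows, name: str, embedding: Vector) -> RouterRows:
    """Attach a newly contributed expert by adding its own row.

    Existing rows are copied unchanged, so no other expert's embedding is touched.
    """
    if name in rows:
        raise ValueError(f"expert {name!r} already present")
    return {**rows, name: list(embedding)}


def remove_expert(rows: RouterRows, name: str) -> RouterRows:
    """Opt an expert out by dropping its row; the remaining rows are untouched."""
    return {k: v for k, v in rows.items() if k != name}


if __name__ == "__main__":
    rows = {"public": [1.0, 0.0], "math": [0.0, 1.0]}
    x = [2.0, 3.0]
    print(router_logits(x, rows))  # {'public': 2.0, 'math': 3.0}
    rows = add_expert(rows, "code", [1.0, 1.0])
    print(router_logits(x, rows))  # {'public': 2.0, 'math': 3.0, 'code': 5.0}
```

Note that only the raw logits are unaffected: if the router normalizes them with a softmax, the resulting weights still shift whenever the set of experts changes.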

carlo • 11 months ago

@ShirleyYXWu wow, super impressive! gotta check the minimal hardware for running it 🤓

Similar Videos

Perplexity CEO Aravind Srinivas on the biggest threat to the data center industry: It's not competition. It's not regulation. It's decentralisation.

"The biggest threat to a data center is if the intelligence can be packed locally on a chip that's running on the device and then there's no need to inference all of it on like one centralized data center."

He outlines how this could work in practice. Personalisation doesn't necessarily require on-device model training. Retrieval augmented generation, tool calls, and local data can already tailor AI to individual users.

But the real unlock? Test time training. Aravind Srinivas describes a future where AI lives on your device, watches how you work and gradually automates your repetitive tasks.

"Imagine we crack test time training where the AI watches tasks you repeatedly do on your local system, adapts to you over time and starts automating a lot of the things you do."

The key insight: in this model, the intelligence belongs to you. It's your data, your device, your personalised AI brain. And if that future arrives, the economics of centralised infrastructure start to collapse.

"That really disrupts the whole data center industry. It doesn't make sense to spend all this money, 500 billion, 5 trillion, whatever on building all the centralized data centers across the world that do a lot of the intelligence workloads for people."

The companies spending trillions on centralised infrastructure may want to rethink where intelligence actually needs to live.

Big Brain AI

90,102 views • 3 months ago