Can data owners & LM developers collaborate to build a strong shared model while each retaining data control?

Introducing FlexOlmo💪, a mixture-of-experts LM enabling:
• Flexible training on your local data without sharing it
• Flexible inference to opt in/out your data anytime

At 37B parameters, FlexOlmo is competitive...

Weijia Shi

9,881 subscribers

93,434 views • 11 months ago • via X (Twitter)

10 Comments

Weijia Shi • 11 months ago

❓Why FlexOlmo?
The current "monolithic" pretraining paradigm centralizes all data during training & requires one-time decisions on data inclusion/exclusion. Once data has been used for training, it is difficult to add new data or remove existing data. This creates challenges:

For data owners:
• Required to share raw data for model training
• Loss of control once they give the data away

For LM developers:
• Valuable data remains locked behind closed doors
• No straightforward way to update models with new data without catastrophic forgetting

Weijia Shi • 11 months ago

💡FlexOlmo Recipe
1️⃣ Each data owner trains an expert locally using a shared anchor model
2️⃣ Expert modules from different data owners merge into a single MoE without joint training
3️⃣ At inference, you can control which expert modules along with their data serve particular users or queries.
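To make steps 2️⃣ and 3️⃣ concrete, here is a minimal toy sketch in plain Python. It is not the FlexOlmo code: the names `merge_experts` and `moe_forward`, the toy experts, and the fixed mixing weights are all hypothetical and exist only for illustration (in a real MoE the weights would come from a router, per token). The point it shows is that merging only collects independently trained experts, and that opting an expert out at inference is just leaving it out of the active set and renormalizing.

```python
from typing import Callable, Dict, List, Set

Vector = List[float]
Expert = Callable[[Vector], Vector]


def merge_experts(public: Expert, owner_experts: Dict[str, Expert]) -> Dict[str, Expert]:
    """Step 2 (toy): collect independently trained experts; no joint training happens here."""
    merged: Dict[str, Expert] = {"public": public}
    merged.update(owner_experts)
    return merged


def moe_forward(moe: Dict[str, Expert], weights: Dict[str, float],
                x: Vector, active: Set[str]) -> Vector:
    """Step 3 (toy): mix only the active experts, renormalizing their weights."""
    names = [name for name in moe if name in active]
    if not names:
        raise ValueError("at least one expert must be active")
    total = sum(weights[name] for name in names)
    out = [0.0] * len(x)
    for name in names:
        share = weights[name] / total
        out = [o + share * y for o, y in zip(out, moe[name](x))]
    return out


# Hypothetical stand-ins for experts trained by different data owners.
moe = merge_experts(
    public=lambda x: list(x),
    owner_experts={
        "news": lambda x: [2.0 * v for v in x],
        "code": lambda x: [v + 1.0 for v in x],
    },
)
# Assumption: fixed weights replace a learned router to keep the sketch short.
weights = {"public": 0.5, "news": 0.25, "code": 0.25}

everyone = moe_forward(moe, weights, [1.0, 2.0], {"public", "news", "code"})
without_news = moe_forward(moe, weights, [1.0, 2.0], {"public", "code"})
```

Here `without_news` is computed without ever calling the "news" expert, which is the opt-out behaviour described in step 3️⃣; nothing about the remaining experts has to be retrained.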

Weijia Shi • 11 months ago

📑Paper: ✍️Blog: 💻Code: 🤗Models:

Weijia Shi • 11 months ago

📊 FlexOlmo Performance
Evaluated across 31 tasks with models up to 37B parameters (20B active)

🔧 Training: Start with 7B public model (pretrained on 1T tokens), then each data owner continues pretraining for 50B tokens on simulated closed data before combining experts.

Key results:
• 41% improvement brought by leveraging the closed data sources
• 10.1% better than existing model merging methods
• Even outperforms standard MoE with unrestricted data access

Weijia Shi • 11 months ago

⚔️ Data Extraction Attack
Can shared expert modules leak your private data? We tested training data extraction attacks on FlexOlmo to find out:
• FlexOlmo: 0.7% extraction rate
• Overfitted model (100 epochs) on the data: 60% extraction rate

FreeMind • 11 months ago

Can this method be applied to other models? Aren't routers all trained?

Weijia Shi • 11 months ago

It can be applied to other base models as well. The router is not jointly trained. Each expert is associated with a corresponding router embedding that is learned independently.
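A rough sketch of what "one router embedding per expert, learned independently" can look like in code. The scoring rule is an assumption: the reply only says each expert has its own router embedding, so the dot product followed by a softmax below, the function name `route`, and the example vectors are hypothetical choices for illustration, not the actual FlexOlmo router.

```python
import math
from typing import Dict, List, Optional, Set


def route(hidden: List[float],
          router_embeddings: Dict[str, List[float]],
          active: Optional[Set[str]] = None) -> Dict[str, float]:
    """Score each (active) expert by dot product with its own embedding, then softmax.

    Assumption: dot product + softmax is an illustrative choice of scoring rule.
    """
    names = [n for n in router_embeddings if active is None or n in active]
    if not names:
        raise ValueError("at least one expert must be active")
    scores = [sum(h * e for h, e in zip(hidden, router_embeddings[n])) for n in names]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return {n: v / norm for n, v in zip(names, exps)}


# Each embedding is assumed to have been learned separately by its data owner.
router_embeddings = {"public": [1.0, 0.0], "news": [0.0, 1.0]}
hidden = [1.0, 0.0]
two_experts = route(hidden, router_embeddings)

# Adding an expert only adds its embedding; existing entries are untouched.
router_embeddings["code"] = [1.0, 1.0]
three_experts = route(hidden, router_embeddings)

# Opting "news" out restricts routing to the remaining experts.
news_opted_out = route(hidden, router_embeddings, active={"public", "code"})
```

Because each embedding is scored on its own, adding or dropping an expert changes only which entries take part in the softmax, which is consistent with the router not needing joint training.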

carlo • 11 months ago

@ShirleyYXWu wow, super impressive! gotta check the minimal hardware for running it 🤓
