EXO Labs

@exolabs • 52,012 subscribers

Frontier AI on local hardware. EXO 1.0 is now open-source (Apache 2.0): https://t.co/SGGGK784Qp

Shorts

Running DeepSeek-V3 on M4 Mac Mini AI Cluster. 671B MoE model distributed across 8 M4 Pro 64GB Mac Minis. Apple Silicon with unified memory is a great fit for MoE.

719,293 views
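A rough back-of-the-envelope check of why a 671B-parameter model can be spread over 8 machines with 64GB each. This is only a sketch: the listing does not say which weight precision was used, so the bit widths below are assumptions, and the estimate ignores the KV cache, activations, quantization metadata, and memory reserved by the OS.

```python
GB = 1e9  # decimal gigabytes, good enough for a rough estimate


def weight_footprint_gb(n_params, bits_per_weight):
    """Memory needed for the weights alone, in GB."""
    return n_params * bits_per_weight / 8 / GB


def fits(n_params, bits_per_weight, n_devices, mem_per_device_gb):
    """True if the weights alone fit in the cluster's combined memory."""
    total_gb = n_devices * mem_per_device_gb
    return weight_footprint_gb(n_params, bits_per_weight) <= total_gb


N_PARAMS = 671e9  # parameter count quoted in the listing

for bits in (16, 8, 4):  # assumed precisions, not stated in the source
    size = weight_footprint_gb(N_PARAMS, bits)
    ok = fits(N_PARAMS, bits, n_devices=8, mem_per_device_gb=64)
    print(f"{bits}-bit weights: {size:.1f} GB, fits in 8 x 64 GB: {ok}")
```

Under these assumptions only the 4-bit case (about 335.5 GB) fits in the 512 GB of combined memory.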

EXO 1.0 at Apple NeurIPS booth. 4 x 512GB M3 Ultra Mac Studios clustered all-to-all with RDMA over Thunderbolt 5. Runs DeepSeek v3.2 (8-bit) at 25 tok/sec with tensor parallelism.

186,329 views

Distributed training on M4 Mac Mini cluster. We implemented Google DeepMind's DiLoCo on Apple Silicon to train large models with 100-1000x less bandwidth than a DDP baseline. AI is entering a new era where a distributed network of consumer devices can train large models.

347,697 views
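To make the bandwidth claim concrete, here is a toy sketch of a DiLoCo-style loop: each worker takes many local steps, and only then do the workers exchange a "pseudo-gradient" that an outer optimizer applies. It is not EXO's implementation. The 1-D quadratic objective, the function names, and all hyperparameters are hypothetical, and plain SGD is swapped in for the inner optimizer to keep the example small (the DiLoCo paper describes AdamW for the inner loop and Nesterov momentum for the outer one).

```python
def local_sgd_steps(theta, target, steps, lr):
    """Inner loop: plain SGD on the toy loss 0.5 * (theta - target)**2."""
    for _ in range(steps):
        grad = theta - target
        theta -= lr * grad
    return theta


def diloco_style_train(targets, rounds, inner_steps,
                       inner_lr=0.1, outer_lr=0.7, momentum=0.9):
    """Each entry of `targets` stands in for one worker's data shard."""
    theta = 0.0      # shared (global) parameter
    velocity = 0.0   # outer momentum buffer
    sync_rounds = 0
    for _ in range(rounds):
        # Every worker starts from the shared parameter and trains alone.
        local = [local_sgd_steps(theta, t, inner_steps, inner_lr)
                 for t in targets]
        # The only communication: average the workers' parameter deltas.
        pseudo_grad = theta - sum(local) / len(local)
        sync_rounds += 1
        # Outer step with Nesterov-style momentum.
        velocity = momentum * velocity + pseudo_grad
        theta -= outer_lr * (pseudo_grad + momentum * velocity)
    return theta, sync_rounds


theta, sync_rounds = diloco_style_train([1.0, 2.0, 3.0],
                                        rounds=30, inner_steps=50)
per_step_sync_rounds = 30 * 50  # what syncing after every step would cost
print(theta, sync_rounds, per_step_sync_rounds // sync_rounds)
```

In this toy setup the number of synchronizations drops by a factor equal to `inner_steps`, which is the mechanism behind the reduced bandwidth; the 100-1000x figure in the listing is EXO's own number and is not reproduced here.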

Day 1: Benchmarks. We ran 1000+ LLM benchmarks on real consumer devices. Data includes single-device and multi-device clusters with Tokens-Per-Second (TPS) and Time-To-First-Token (TTFT). Setups tested: 3x M4 Mac Mini cluster, iPhone 15 + S24, RTX 4090 & more.

254,968 views
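For readers unfamiliar with the two metrics, the sketch below shows one common way to derive them from token arrival timestamps. The exact definitions used in EXO's benchmarks are not given in the listing, so the formulas, function names, and sample timestamps here are assumptions.

```python
def ttft(request_time, token_times):
    """Time-To-First-Token: delay between the request and the first token."""
    return token_times[0] - request_time


def decode_tps(token_times):
    """Tokens-Per-Second over the decode phase (after the first token)."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure decode speed")
    elapsed = token_times[-1] - token_times[0]
    return (len(token_times) - 1) / elapsed


# Hypothetical timestamps in seconds, not measured data.
request_time = 0.0
token_times = [0.5, 0.75, 1.0, 1.25, 1.5]

print("TTFT:", ttft(request_time, token_times), "s")
print("TPS:", decode_tps(token_times))
```

Some benchmarks instead divide all generated tokens by the total request time, which folds prefill latency into TPS; the two conventions give different numbers for the same run.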

First look at SPARTA, a distributed AI training algorithm that avoids synchronization by randomly exchanging sparse sets of parameters. It gives a 1,000x reduction in inter-GPU communication, enabling training of large models over low-bandwidth links without specialized infrastructure. SPARTA works on its own but can also be combined with sync-based low-communication training algorithms like DiLoCo for even better performance.

99,350 views
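The core idea of exchanging a random sparse subset of parameters can be sketched as follows. This is an illustration only, not the SPARTA algorithm as published: the function name is hypothetical, the exchange runs in a single process rather than asynchronously between GPUs, and the `fraction` value is an illustrative choice, not a figure from the source.

```python
import random


def sparse_exchange(workers, fraction, rng):
    """Average a random subset of parameter positions across all workers.

    `workers` is a list of equally sized parameter lists, modified in place.
    Returns the indices that were exchanged.
    """
    n_params = len(workers[0])
    k = max(1, int(n_params * fraction))
    indices = rng.sample(range(n_params), k)
    for i in indices:
        mean = sum(w[i] for w in workers) / len(workers)
        for w in workers:
            w[i] = mean
    return indices


rng = random.Random(0)
# 4 workers with 1000 parameters each; worker w starts with every value = w.
workers = [[float(w)] * 1000 for w in range(4)]

exchanged = sparse_exchange(workers, fraction=0.001, rng=rng)
comm_ratio = len(workers[0]) / len(exchanged)
print(f"exchanged {len(exchanged)} of {len(workers[0])} parameters "
      f"({comm_ratio:.0f}x less traffic than sending all of them)")
```

Only the sampled positions are brought into agreement on each call; the rest of each worker's parameters keep drifting independently until they are picked in a later exchange.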

“EXO, get me in the mood for the holidays” A home assistant you can trust. It’s 100% Open-Source and runs locally. AI that’s aligned with you, not a for-profit company.

65,515 views

But the KV cache is created for each transformer layer. By sending each layer’s KV cache after it’s computed, we overlap communication with computation. We stream the KV cache and hide the network delay. We achieve a 4x speedup in prefill & 3x in decode, with 0 network delay.

22,604 views
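A small timing model illustrates why sending each layer's KV cache as soon as it is computed can hide most of the network delay. It is a sketch under simplifying assumptions: one network link, transfers that cannot overlap each other, and made-up per-layer times. It does not reproduce the 4x and 3x figures quoted above.

```python
def blocking_time(compute, send):
    """Compute every layer first, then send the whole KV cache."""
    return sum(compute) + sum(send)


def streamed_time(compute, send):
    """Send each layer's KV cache while later layers are still computing."""
    compute_done = 0.0  # when the current layer finishes computing
    link_free = 0.0     # when the network link is next available
    for c, s in zip(compute, send):
        compute_done += c
        start = max(compute_done, link_free)  # wait for the data and the link
        link_free = start + s
    return link_free


# Hypothetical per-layer times in milliseconds for a 4-layer model.
compute = [10, 10, 10, 10]
send = [8, 8, 8, 8]

print("blocking:", blocking_time(compute, send), "ms")
print("streamed:", streamed_time(compute, send), "ms")
```

With these numbers the blocking schedule takes 72 ms and the streamed one 48 ms: only the last layer's transfer is exposed. If sending a layer takes longer than computing one, the link becomes the bottleneck and the delay is no longer fully hidden.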

Videos

LLM running on a Windows 98 PC. 26-year-old hardware with an Intel Pentium II CPU and 128MB RAM. Uses llama98.c, our custom pure C inference engine based on Andrej Karpathy's llama2.c. Code and DIY guide 👇

483,118 views • 1 year ago

What if we could connect all the dark compute across the globe to build the world's biggest AI data center? Most of the compute in the world is dark: phones, laptops, Teslas, PS5s, TVs. These devices have powerful GPUs but mostly sit idle. Today, EXO Labs is announcing the first step in activating all of this dark compute for AI workloads: evML, a distributed computing protocol that enforces honest behavior through hardware security and spot-checks with just 5% performance overhead. Version 0.0.1 of evML is compatible with Apple, Samsung and LG devices. Our preliminary report on evML is now available (link below).

137,913 views • 1 year ago
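The intuition behind spot-checks can be shown with a simple probability calculation. This is a generic sketch, not the evML protocol: it assumes each task is re-verified independently with some probability, and the check rate of 0.05 is an illustrative value that is not taken from the evML report (the 5% in the listing refers to performance overhead).

```python
def detection_probability(check_rate, cheated_tasks):
    """Chance that at least one cheated task is caught by random spot-checks.

    Assumes each task is independently re-verified with probability
    `check_rate`, and that any check of a cheated task exposes the cheat.
    """
    return 1.0 - (1.0 - check_rate) ** cheated_tasks


for n in (1, 10, 100):
    p = detection_probability(check_rate=0.05, cheated_tasks=n)
    print(f"cheating on {n:>3} tasks: caught with probability {p:.3f}")
```

Under these assumptions a low check rate still makes sustained cheating very likely to be detected, which is why spot-checking can be much cheaper than verifying every result.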

Unboxing the package from Apple.

76,544 views • 1 year ago

Giving an AI agent access to my iPhone. Your phone knows you better than anyone. What if your AI agent could go through your phone to truly understand you? The EXO Agent uses iPhone mirroring to look through your apps including YouTube/Netflix watch history, X likes and photos. With every swipe, it learns who you are. No logins or APIs. Just your phone, mirrored. Available in preview with the EXO Desktop App (link below).

82,558 views • 1 year ago