we made distributed inference verifiable with <1% overhead.

verification is critical for any distributed system. in a trustless network, actors may swap your 70B model for a cheaper 8B one to cut costs. until now, maintaining inference integrity meant either doubling your cost (redundancy) or exploding your latency (zkp).

we created veri: an on-chain verification layer light enough for high-throughput frameworks like Parallax. it hits the economic sweet spot through architectural elegance:

1. commit-sample-verify
we don't prove every step; we check a random slice using game theory. workers commit to their work before the audit. cheating becomes statistically irrational, allowing a 1% sample to secure the entire sequence.

2. simultaneous execution
inference and verification happen simultaneously on the same worker pool. we don't need a separate "verifier set", so compute utilization stays high.

find out more about the architecture and benchmarks:
paper:
blog:
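The commit-sample-verify idea can be sketched in a few lines of Python. This is a minimal illustration, not veri's actual protocol: the post does not describe its commitment scheme, sampling method, or penalties, so the per-step SHA-256 commitments, the seeded sampler, and every function name below are assumptions made for the example.

```python
import hashlib
import random


def commit(step_outputs):
    """Worker side: hash every step output before the audit is announced."""
    return [hashlib.sha256(out).hexdigest() for out in step_outputs]


def sample_indices(num_steps, rate, seed):
    """Auditor side: pick the steps to check, only after the commitment exists."""
    k = max(1, round(num_steps * rate))
    return sorted(random.Random(seed).sample(range(num_steps), k))


def verify(commitment, revealed, recompute, indices):
    """Each sampled step must match the commitment and an honest recomputation."""
    for i in indices:
        if hashlib.sha256(revealed[i]).hexdigest() != commitment[i]:
            return False  # the worker changed its answer after committing
        if revealed[i] != recompute(i):
            return False  # the committed output is not what the real model produces
    return True


def detection_probability(num_steps, cheated_steps, sampled_steps):
    """Chance that a uniform sample without replacement hits a cheated step."""
    miss = 1.0
    for j in range(sampled_steps):
        miss *= max(0, num_steps - cheated_steps - j) / (num_steps - j)
    return 1.0 - miss


# Hypothetical honest model: step i always yields the same bytes.
honest = lambda i: f"token-{i}".encode()
outputs = [honest(i) for i in range(1000)]
commitment = commit(outputs)
audit = sample_indices(1000, 0.01, seed=7)  # a 1% sample: 10 of 1000 steps
accepted = verify(commitment, outputs, honest, audit)
```

Under these assumptions, swapping the whole model corrupts every step, so any non-empty sample catches it. A worker that cheats on only 100 of 1000 steps is caught by a 10-step sample about 65% of the time, which is why the sample has to be paired with some penalty on detection before cheating becomes irrational.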

Parallax

1,438 subscribers

28,496 views • 6 months ago • via X (Twitter)

Related Videos

SGLang now supports DSpark, enabling confidence-driven, variable-length verification for speculative decoding 🎉 DSpark addresses a key bottleneck under load: instead of verifying every draft token, it verifies only where the draft model is confident, so the gains hold even as batch size scales. We heavily optimized variable-length verification in SGLang. Across batch sizes 1 to 256, DSpark gives the best throughput/latency tradeoff on DeepSeek-V4-Flash, ahead of both MTP and non-spec. At high concurrency, dynamic scheduling provides up to ~20% higher throughput compared to a fixed budget, while maintaining high verification quality across workloads. With fused kernels and zero-overhead scheduling, DeepSeek-V4-Pro reaches 383.7 tok/s at B=1 on B300. DSpark is now available in SGLang with support for Qwen3 and DeepSeek-V4. Thanks DeepSeek for open-sourcing! Blog with full technical details and commands to run below 👇

LMSYS Org

167,269 views • 19 days ago

In just one week, Binh Pham and I trained a full-body Unitree G1. Here's a recap:

1. Secured a Unitree G1 humanoid through a LinkedIn post
2. Deployed TWIST2 full-body teleoperation pipelines
3. Adapted TWIST2 for Zed stereo camera & collected full-body teleoperation samples (carried by Binh Pham)
4. Adapted & fine-tuned NVIDIA Gr00T N1.5 VLA on the TWIST2 public datasets, on an 8xNVIDIA H100 cluster. We picked Gr00T N1.5 as it was trained with Unitree G1 embodiment data.
5. Adapted the TWIST2 codebase to stream in the actions from Gr00T via ZMQ using a co-located NVIDIA H100 for ~200ms inference latency
6. Tested the model in sim, then deployed to the real-world Unitree G1. We streamed a training sample observation to the VLA (as we didn't want to break the robot in case real observations were OOD)

We were the first team in the world to deploy the full TWIST2 data collection pipeline to the Unitree G1 :)

Much more work ahead though, which I'll work on as a side-project over the next months:

1. Exploring the various types of 'world models': video backbones, dynamics models, v-jepa-2 models. I believe these will generalize better & train much more data-efficiently than VLM backbones
2. Speeding up inference - I believe low-latency robotics inference will be a big challenge. There are many works in video diffusion which I'd like to test (e.g. SageAttention, SparseAttention, Drifting Models). Perhaps also writing custom CUDA kernels.
3. Economics of inference scaling :) What will be the compute demands as we scale inference up to millions of humanoids? Will it run on edge or on distributed 'co-located' inference clusters? These are questions I'd like to answer.

Adapted TWIST2 codebase:
Adapted Gr00T-N1.5 codebase:

The ETH Robotics Club are doing a cool GTC Golden ticket competition with NVIDIA, so this is my submission :) The DGX Spark compute will get me a long way with initial prototyping & especially working on inference optimization for next-gen Blackwell GPUs #NVIDIAGTC #GOLDENTICKET #ETHRC

Arnie Ramesh

14,815 views • 5 months ago

Right now our experience of the internet is in jeopardy. More than half of our interactions online and onchain come from non-human actors who are not identifiable, not accountable, not verifiable. That means as we look toward a stablecoin payment and AI agent enabled future, how are we going to facilitate payments if we don't know who we're paying? How will applications, display advertising, recommendations work if the counterparty who's interacting with those interfaces and in those digital spaces can’t identify itself as agent or human, or specific human? Or for things like onchain incentives, how can we ensure that tokens and value are arriving at the right users if we cannot tell Sybil accounts and redundant addresses from unique human beings? So for all of these use cases and more, things that touch enterprise and government as well, which we can get into later, we have a very glaring need to bring a layer of identity and trust to the internet that was originally built as a system, a network to communicate amongst computers, but lacked an identity system to acknowledge their users. That's the problem that we are solving with Billions Network. How can we make it really easy for you and the agents who serve you to prove who you are, your traits and capabilities and qualifications, in any space, physical or digital? What that means is that today Billions Network is the first universal human and AI network built with mobile first verification, so you can prove who you are and your agents can prove who they are, starting with comfortable experiences on the devices you already own. So no proprietary hardware. We do not rely on centralized servers to collect user data. Rather, your information, the sensitive data that makes you you, stays securely on your device. And we use zero knowledge proofs as a way to prove traits about you, such as the fact that you're over the age of 21, without revealing that sensitive personal data, such as what your exact birth date is. 
Source: Billions CEO Evin McMullen speaking at the House of Chimera Spaces event, Dec 3, 2025

Billions

30,867 views • 7 months ago

GLM-5.2 running on Mac Studio Clusters or RTX GPU rigs isn't how Local AI will get mass adopted. For mass adoption we need to focus on two axes: 1. Intelligence/Speed Frontier. This is being tracked on local dot ai. Aggressive co-design is needed across the entire stack to push up the frontier specifically for local hardware. Better kernels, inference tricks like spec decoding, quantizing models, model routing. 2. Usability / UX. To use Local AI today, you need to install CLIs, select a model, a quantization, a harness that works best with that model, and you need to configure your inference engine. It's not very accessible. More work needs to be done on making this seamless, so the user doesn't even know the AI is running locally.

Alex Cheema

30,443 views • 16 days ago

Backpropagation by hand ✍️ ~ 11 steps walkthrough below

Backpropagation is the algorithm that actually trains a neural network, and it is where most people stop following along. It is not calculus you cannot do. It is matrix multiplication, working backward, one layer at a time. So I drew and calculated one entirely by hand.

Goal: push the loss gradient back through a 3-layer network and land on a new value for every weight and bias.

1. Given: A 3-layer perceptron, an input X, predictions Ypred = [0.5, 0.5, 0], and the truth Ytarget = [0, 1, 0].
2. Backprop gradient cells: Let us draw empty cells for every gradient we are about to compute. The shape of the answer comes first.
3. Layer 3 softmax: We get dL/dz3 straight from Ypred minus Ytarget = [0.5, -0.5, 0]. No chain rule needed, and that shortcut is the whole reason softmax and cross-entropy are paired.
4. Layer 3 weights and biases: Let us multiply dL/dz3 by [a2 | 1]. One multiplication gives the gradient for W3 and b3 together.
5. Layer 2 activations: We multiply dL/dz3 by W3 to get dL/da2. The gradient moves back across a layer the same way the signal moved forward.
6. Layer 2 ReLU: Let us pass it through the gate: keep the gradient where the activation was positive, zero it everywhere else.
7. Layer 2 weights and biases: We multiply dL/dz2 by [a1 | 1]. The same figure as step 4, one layer up.
8. Layer 1 activations: Let us multiply dL/dz2 by W2.
9. Layer 1 ReLU: We apply the same gate again, now on a1.
10. Layer 1 weights and biases: Let us multiply dL/dz1 by [x | 1], and every weight in the network now has a gradient.
11. Update: We subtract, and the network has learned. In practice a learning rate scales this step.

The gradients:
dL/dz3 = [0.5, -0.5, 0]
dL/da1 = [1, -2, 2, -1]
dL/dz1 = [0, -2, 2, -1]

The takeaway: matrix multiplication is all you need. Just like the forward pass, backpropagation is matrix multiplications end to end. You can do every one by hand, slowly and imperfectly, which is exactly why a GPU's ability to do them fast mattered so much to deep learning.

💾 Save this post!
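The three building blocks of the walkthrough (the softmax shortcut, the ReLU gate, and the multiplication by [a | 1]) can be checked in plain Python. This is only a sketch: the post gives Ypred, Ytarget and a few gradients but not the weights or activations, so the values of `a1` and `a2` below are made up for illustration, chosen so that the ReLU gate reproduces the dL/dz1 stated in the post.

```python
def softmax_ce_grad(y_pred, y_target):
    """dL/dz for softmax followed by cross-entropy: prediction minus target."""
    return [p - t for p, t in zip(y_pred, y_target)]


def relu_gate(grad_a, a):
    """Keep the gradient where the activation was positive, zero it elsewhere."""
    return [g if v > 0 else 0 for g, v in zip(grad_a, a)]


def weight_bias_grad(grad_z, a_prev):
    """Outer product of dL/dz with [a_prev | 1]; the last column is the bias gradient."""
    augmented = list(a_prev) + [1]
    return [[g * v for v in augmented] for g in grad_z]


# Step 3: values taken from the post.
dz3 = softmax_ce_grad([0.5, 0.5, 0], [0, 1, 0])  # [0.5, -0.5, 0]

# Step 9: dL/da1 is from the post; a1 is a hypothetical activation whose
# first unit is inactive, which is what the post's dL/dz1 implies.
a1 = [0, 3, 1, 2]
dz1 = relu_gate([1, -2, 2, -1], a1)  # [0, -2, 2, -1]

# Step 4: a2 is hypothetical; each row holds the gradients for one output
# unit's weights followed by its bias.
a2 = [2, 1]
dW3_b3 = weight_bias_grad(dz3, a2)
```

Each row of `dW3_b3` is one output unit's slice of the W3 gradient with the b3 gradient appended, which is the "one multiplication gives both" point made in step 4.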

Tom Yeh

941,313 views • 5 days ago

The Cost of Intelligence is Heading to Zero | Hyperspace P2P Distributed Cache

We present to you our breakthrough cross-domain work across AI, distributed systems, cryptography, game theory to solve the primary structural inefficiency at the heart of AI infrastructure: most inference is redundant.

Google has reported that only 15% of daily searches are truly novel. The rest are repeats or close variants. LLM inference inherits this same power-law distribution. Enterprise chatbots see 70-80% of queries fall into a handful of intent categories. System prompts are identical across 100% of requests within an application. The KV attention state for "You are a helpful assistant" has been computed billions of times, on millions of GPUs, identically.

And yet every AI lab, every startup, every self-hosted deployment computes and caches these results independently. There is no shared layer. No global memory. Every provider pays the full compute cost for every query, even when the answer already exists somewhere in the network.

This is the problem Hyperspace solves. Its distributed cache operates at three levels, each catching a different class of redundancy:

1. Response cache
Same prompt, same model, same parameters - instant cached response from any node in the network. SHA-256 hash lookup via DHT, with cryptographic cache proofs linking every response to its original inference execution. No trust required. Fetchers re-announce as providers, so popular responses replicate naturally across more nodes.

2. KV prefix cache
Same system prompt tokens - skip the most expensive part of inference entirely. Prefill (computing Key-Value attention states) is deterministic: same model plus same tokens always produces identical KV state. The network caches these states using erasure coding and distributes them via the routing network. New questions that share a common prefix resume generation from cached state instead of recomputing from scratch.

3. Routing to cached nodes
Instead of transferring KV state across the network for every request, Hyperspace routes the request to the node that already has the state loaded in VRAM. The request goes to the cache, not the cache to the request.

Together, these three layers mean that 70-90% of inference requests at network scale never require full GPU computation.

This work doesn't exist in isolation. It builds on research from across the industry: SGLang's RadixAttention demonstrated that automatic prefix sharing can yield up to 5x speedup on structured LLM workloads. Moonshot AI's Mooncake built an entire KV-cache-centric disaggregated architecture for production serving at Kimi. Anthropic, OpenAI, and Google all launched prompt caching products in 2024 - priced at 50-90% discounts - because system prompt reuse is so pervasive that it changes the economics of inference.

What all of these systems share is a common limitation: they operate within a single organization's infrastructure. SGLang caches prefixes within one server. Mooncake disaggregates KV cache within one datacenter. Anthropic's prompt caching works within one API provider's fleet. None of them can share cached state across organizational boundaries.

Hyperspace removes this boundary. The cache is global. A response computed by a node in Tokyo is immediately available to a node in Berlin. A KV prefix state generated for Qwen-32B on one machine is verifiable and reusable by any other machine running the same model. The routing network provides the delivery guarantees, the erasure coding provides the redundancy, and the cache proofs provide the trust.

What this means for the cost of intelligence

Big AI labs scale linearly: twice the users means twice the GPU spend. Every query is a cost center. Their internal caching helps, but it's siloed - Lab A's cache can't serve Lab B's users, and neither can serve a self-hosted Llama deployment.

Hyperspace scales sub-linearly. Every new node that joins the network adds to the global cache. Every inference result enriches the cache for all future requests. The cache hit rate rises with network size because query distributions follow a power law - the most common questions are asked far more often than rare ones.

The implication is simple: as the network grows, the effective cost per inference drops. Not linearly. Logarithmically. At 10 million nodes, we estimate 75-90% of all inference requests can be served from cache, eliminating 400,000+ MWh of energy consumption per year and avoiding over 200,000 tons of CO2 emissions. The first person to ask a question pays the compute cost. Everyone after them gets the answer for free, with cryptographic proof that it's authentic.

Training is competitive. Inference is shared

Open-weight models are converging on quality with closed models. Labs will continue to differentiate on training - data curation, architecture innovation, RLHF tuning. That's where the real intellectual property lives. But inference is a commodity. Two copies of Qwen-32B running the same prompt produce the same KV state and the same response, byte for byte, regardless of whose GPU runs the matrix multiplication. There is no moat in multiplying matrices. The moat is in training the weights.

A global distributed cache makes this separation explicit. It doesn't matter who trained the model. Once the weights are open, the inference cost approaches zero at scale - because the network remembers every answer and can prove it's correct. No lab, no matter how well-funded, can match this. They cannot share caches across competitors. They scale linearly. The network scales logarithmically. The marginal cost of intelligence approaches zero. That's the endgame.
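The first layer, the response cache, amounts to a content-addressed lookup. Here is a minimal sketch assuming the key is a SHA-256 digest over the model name, prompt, and sampling parameters. The post does not give Hyperspace's actual key format, and the in-memory dictionary below merely stands in for the DHT. Cache proofs, erasure coding, and routing are not modeled, and `cache_key`, `ResponseCache`, and `fake_inference` are hypothetical names.

```python
import hashlib
import json


def cache_key(model, prompt, params):
    """SHA-256 over a canonical encoding of (model, prompt, sampling parameters)."""
    canonical = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,  # key order must not change the digest
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory stand-in for the network-wide lookup described in the post."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, model, prompt, params, run_inference):
        key = cache_key(model, prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1  # the first asker pays the compute cost
        response = run_inference(model, prompt, params)
        self._store[key] = response
        return response


# Hypothetical inference function used only for the demonstration.
fake_inference = lambda model, prompt, params: f"answer to {prompt}"

cache = ResponseCache()
params = {"temperature": 0, "max_tokens": 64}
first = cache.get_or_compute("Qwen-32B", "hello", params, fake_inference)
# Same request with the parameters listed in a different order: still a hit.
reordered = dict(reversed(list(params.items())))
second = cache.get_or_compute("Qwen-32B", "hello", reordered, fake_inference)
```

Including the sampling parameters in the key matches the post's "same prompt, same model, same parameters" condition, because a response generated under one set of parameters is not a valid answer for a request made with another.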

Varun

37,362 views • 4 months ago

The value of the work we're doing at Optimum is encapsulated quite well by the phrase "speed is money". In modern markets there are real economic advantages to latency reduction. This is nothing new. Wall Street firms have long been optimizing on latency, primarily through colocation and top of the line hardware. However, when it comes to decentralized systems, expensive hardware and geographic concentration are antithetical to their purpose. Therefore we should optimize decentralized network latency through software, which I'm thrilled about because it's exactly what I've spent the better part of the past 2 decades working on with Random Linear Network Coding. Now let’s talk about networking economics, the relationship between speed and money. First, it's important to note that users will only pay for low latency if it can be consistently guaranteed. Second, you can only make that latency guarantee for a certain number of users. This is a universal law of networking. We can model this relationship on a delay curve, shown below. The delay curve is determined by the utilization rate of the network, meaning how much traffic is flowing through the network divided by the network's throughput. As you approach a level of traffic equal to the available throughput, latency trends infinitely higher. On this delay curve we can impose some utility thresholds. These thresholds are the levels of latency which are important to different groups of users because of how that latency guarantee improves their economic outcomes. Finding the point on the curve where each threshold intersects will tell us what level of traffic we can guarantee that level of latency for. Essentially, there exists a finite supply of speed on a network and the highest utility users of that speed are willing to pay more for it. I like to think of this similarly to expedited shipping options on Amazon. This is why we say speed is money, and why we can create a Latency Marketplace. 
The only way to increase the supply of speed is to fundamentally increase network throughput. This is what we work on at Optimum by using Random Linear Network Coding. The same relationship between traffic and throughput still applies, but now the delay curve is shifted out further to the right. Now more traffic can be processed at the same latency, or the same traffic can be processed at a lower latency. More speed available to the network. More value unlocked for the network’s users. Crucially, that value is no longer only reserved for those who can afford to sit closest to the machine. Expanding the supply of speed widens who can reach each latency threshold, keeping the network's advantage decentralized rather than concentrated in the hands of a few. When nodes join Optimum and participate, they reap the benefits, but they also add to the capacity. Rather than vying against each other in a zero-sum game, nodes help themselves and others.
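The shape of the delay curve can be reproduced with a textbook queueing formula. The post does not say which model underlies its curve, so the M/M/1 queue used here is an assumption, picked because it shows the same behavior: delay grows without bound as traffic approaches throughput. The function names and the numbers in the example are illustrative only.

```python
def mean_delay(traffic, throughput):
    """Mean time in an M/M/1 system, in seconds, for rates in requests per second."""
    if traffic >= throughput:
        return float("inf")  # at or beyond full utilization the queue never drains
    return 1.0 / (throughput - traffic)


def max_traffic_for(threshold, throughput):
    """Largest traffic level whose mean delay stays at or below the threshold."""
    return max(0.0, throughput - 1.0 / threshold)


# A 50 ms latency threshold on a network that can process 100 requests/s:
supply_before = max_traffic_for(0.05, 100.0)  # about 80 requests/s

# Doubling throughput shifts the curve right: more traffic fits under the same threshold.
supply_after = max_traffic_for(0.05, 200.0)  # about 180 requests/s
```

In this model, `max_traffic_for` is the point where a utility threshold crosses the curve, that is, the "supply of speed" available at that latency. Raising throughput is the only input that moves it.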

The value of the work we're doing at Optimum is captured well by the phrase "speed is money". In modern markets there are real economic advantages to reducing latency. This is nothing new: Wall Street firms have long optimized for latency, primarily through colocation and top-of-the-line hardware. In decentralized systems, however, expensive hardware and geographic concentration are antithetical to the purpose. We should therefore optimize decentralized network latency through software, which I'm thrilled about because it's exactly what I've spent the better part of the past two decades working on with Random Linear Network Coding.
Now let's talk about networking economics, the relationship between speed and money. First, users will only pay for low latency if it can be consistently guaranteed. Second, you can only make that latency guarantee for a certain number of users. This is a universal law of networking. We can model this relationship on a delay curve, shown below. The delay curve is determined by the utilization rate of the network, meaning the traffic flowing through the network divided by the network's throughput. As traffic approaches the available throughput, latency grows without bound.
On this delay curve we can impose some utility thresholds. These thresholds are the levels of latency that matter to different groups of users because of how that latency guarantee improves their economic outcomes. The point where each threshold intersects the curve tells us how much traffic we can guarantee that level of latency for. Essentially, there is a finite supply of speed on a network, and the highest-utility users of that speed are willing to pay more for it. I like to think of this as similar to expedited shipping options on Amazon. This is why we say speed is money, and why we can create a Latency Marketplace.
The only way to increase the supply of speed is to fundamentally increase network throughput. This is what we work on at Optimum by using Random Linear Network Coding. The same relationship between traffic and throughput still applies, but now the delay curve is shifted further to the right. More traffic can be processed at the same latency, or the same traffic can be processed at a lower latency. More speed available to the network. More value unlocked for the network's users. Crucially, that value is no longer reserved for those who can afford to sit closest to the machine. Expanding the supply of speed widens who can reach each latency threshold, keeping the network's advantage decentralized rather than concentrated in the hands of a few. When nodes join Optimum and participate, they reap the benefits, but they also add to the capacity. Rather than vying against each other in a zero-sum game, nodes help themselves and others.
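The relationship between traffic, throughput, and latency thresholds described above can be sketched numerically. The post does not say which queueing model its delay curve uses, so this minimal sketch assumes an M/M/1 queue, where mean delay is 1 / (throughput - traffic). The tier names, threshold values, and throughput figures are hypothetical.

```python
# Sketch only: M/M/1 is an assumed stand-in for the post's delay curve.

def mean_delay(traffic: float, throughput: float) -> float:
    """Mean time in system for an M/M/1 queue, in seconds."""
    if traffic >= throughput:
        return float("inf")  # at or beyond capacity, delay is unbounded
    return 1.0 / (throughput - traffic)


def max_traffic_for(latency_threshold: float, throughput: float) -> float:
    """Largest traffic level whose mean delay stays at or below the threshold."""
    return max(0.0, throughput - 1.0 / latency_threshold)


if __name__ == "__main__":
    # Hypothetical utility thresholds, in seconds.
    thresholds = {"tier A": 0.002, "tier B": 0.010, "tier C": 0.100}
    # Doubling throughput shifts the whole curve to the right.
    for throughput in (1000.0, 2000.0):
        print(f"throughput = {throughput:.0f} msg/s")
        for name, limit in thresholds.items():
            supply = max_traffic_for(limit, throughput)
            print(f"  {name}: up to {supply:.0f} msg/s within {limit * 1000:.0f} ms")
```

Under this assumed model, raising throughput raises the traffic level that fits under every threshold, which is the "more supply of speed" effect the post describes.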

Muriel Medard

41,387 views • 19 days ago

After 8+ years on the Tesla Autopilot team and 3 years at Intel, I started Apex Compute to design a new architecture for efficient AI inference. For the past 9 months, we've been building our custom inference accelerator. Today we're releasing Unified Engine v1.
Last June we raised our seed round with Maxitech, DeepFin Research, Soma Capital and an incredible group of angel investors. In less than 9 months, we completed our RTL architecture and brought our first pre-silicon prototype to life on FPGA.
Our architecture combines systolic array and vector processing in a single compute engine with multiple architectural optimizations, achieving very high FLOPs utilization. A single engine is very lean: it uses fewer than 90K LUTs and 1 MB of Block RAM. It may also be one of the smallest logic-footprint compute engines developed so far.
Our Unified Engine v1 supports:
- matrix-matrix multiplication (~95% FLOPs utilization)
- softmax (~90% FLOPs utilization)
- broadcast and element-wise operations
- RMSNorm / LayerNorm
- block quantization/dequantization (fp4, int4)
- multi-engine synchronization
and many other operations. We even implemented memory-efficient attention similar to FlashAttention, reaching ~90% FLOPs utilization.
Full benchmarks and the software stack are available on our GitHub: We have a basic compiler written in Python, and it supports PyTorch tensors directly, so it is easy to test and transfer tensors between the accelerator and host using bf16, fp4 and int4 formats.
Our FPGA prototype can already run LLM inference and outperform NVIDIA Jetson Orin Nano, even on a mid-tier FPGA setup (6.4x lower memory bandwidth, 18% slower clock speed at 4.5 Watts). Check the side-by-side comparison video below.
Our GitHub includes low-level operator implementations, examples for tiled matrix multiplication, operation chaining, tensor parallelism, an attention kernel and a full Gemma 3 1B model implementation. Many more models (Vision Transformers and VLA) are coming soon.
Our accelerator IP is AXI-ready for deployment on any AMD (Xilinx) FPGA platform today. Even better, our two-engine prototype runs on an entry-level AMD (Xilinx) FPGA as a PCIe accelerator card. You can purchase it here for $50 to experiment with our pre-silicon prototype on your desktop PC or Raspberry Pi 5. We will be releasing hardware bitstream updates as the architecture gets new features. More to come soon!
We are expanding our team and looking for compiler engineers and floating-point hardware design engineers. If you're interested, please send me a DM.
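To illustrate what "memory-efficient attention similar to FlashAttention" generally means, here is a minimal pure-Python sketch of the streaming (online) softmax idea for a single query. It is not Apex Compute's kernel or API. The function names and block size are hypothetical, and the point is only that keys and values can be consumed block by block without ever storing the full score vector.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference: materialize every score, then apply softmax and mix values."""
    inv_sqrt_d = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * inv_sqrt_d for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def streaming_attention(q, keys, values, block_size=2):
    """Process keys/values in blocks, keeping only a running max, sum, and output."""
    inv_sqrt_d = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [dot(q, k) * inv_sqrt_d for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        rescale = math.exp(running_max - new_max)
        running_sum *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0]
    keys = [[1.0, 0.0, 1.0], [0.0, 2.0, -1.0], [3.0, 1.0, 0.5], [-1.0, 0.5, 2.0], [0.2, 0.1, 0.3]]
    values = [[1.0, 2.0], [0.0, -1.0], [4.0, 0.5], [2.0, 2.0], [-3.0, 1.0]]
    print(naive_attention(q, keys, values))
    print(streaming_attention(q, keys, values))
```

Both functions should agree up to floating-point rounding. The streaming version holds only one block of scores at a time, which is why this style of kernel suits accelerators with small on-chip memory.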

Hasan

37,366 views • 4 months ago

Micron is going to $4,000, and once you understand what inference actually is, the number stops sounding crazy (save this). Dylan Patel just said that by 2030, OpenAI and Anthropic alone will need over 100 gigawatts of compute combined, and by 2040 we may not even be measuring AI infrastructure in gigawatts anymore. We may be talking about terawatts. Every single one of those gigawatts needs memory to function. Without it, the compute is worthless. Most people heard that and thought about Nvidia, but they should be thinking about Micron.
Every AI model generating a response has two phases. The first is prefill, processing your prompt, which is compute-heavy. The second is decode, generating each word one token at a time, and that phase is almost entirely memory-bound, not compute-bound. During decode, the GPU's processing units sit idle more than 95% of the time, waiting for data to arrive from memory. Google confirmed in a research paper that decode-phase bottlenecks are dominated by memory bandwidth and capacity, not raw compute. The GPU is not the bottleneck; the memory feeding the GPU is.
This matters because inference is now where all the money lives. Training a model happens once. Inference happens billions of times a day: every ChatGPT response, every Claude output, every agentic workflow running in the background. Every one of those token streams is a billing event tied directly to memory performance. Adding more GPUs does not fix this, because GPUs are already underutilized in inference while they sit idle waiting on memory. Adding more memory bandwidth and capacity is what directly reduces token cost, reduces latency, and allows the same cluster to serve dramatically more users simultaneously. Longer context windows compound the problem further: a model running a 1 million token context window requires dramatically more memory per session than a 10,000 token window, and every new model generation pushes context longer.
The market treats memory as a downstream beneficiary of Nvidia orders. The correct framework is the opposite: Micron is the upstream constraint on how much value every Nvidia GPU can actually generate at inference scale. Micron guided Q4 to $50 billion in revenue, has HBM4 ramping at twice the pace of the prior generation, and CEO Sanjay Mehrotra has said supply will not catch demand before the end of 2027. At 8x forward earnings on $112 projected FY2027 EPS, Micron is the most undervalued infrastructure company in the entire AI stack. Inference is memory. Memory is Micron, and the inference ramp has barely started.
Milk Road Pro members are already up massively on this position and we're just getting started. If you want the full breakdown of what we're buying and why, come join us for just a dollar using the link below!
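The claim above that longer context windows need far more memory per session can be made concrete with the standard key-value cache size formula for transformer decoding: 2 (keys and values) x layers x KV heads x head dimension x bytes per value x tokens. The model dimensions below are hypothetical defaults chosen for illustration, not figures from the post or from any specific model.

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes of KV cache one session needs; all defaults are hypothetical."""
    # Factor 2: one cached key vector and one cached value vector per head, per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


if __name__ == "__main__":
    for tokens in (10_000, 1_000_000):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>9,} tokens -> {gib:.2f} GiB of KV cache per session")
```

Because the cache grows linearly with token count, the 1 million token session in this sketch needs 100 times the cache of the 10,000 token one, whatever model dimensions are plugged in.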

Milk Road AI

128,522 views • 24 days ago

I am uniting British Columbians overall, not just the right, as a compassionate and determined conservative with a Churchillian outlook on life. With the highest number of MLA endorsements from every region of our province, I am already connecting conservatives across BC through my own resources and building the strong government-in-waiting that delivers real results for families. This is how we win together. The verification process to vote in this leadership race is not easy. That is why my team is putting our own dedicated resources into helping every member get verified and have their voice counted. We are here to help you. Real people are ready to walk you through every step. Visit to complete your secure identity verification. It takes about five minutes. Call our support team at 1-877-361-6601 Monday to Friday from 11 AM to 7 PM if you need assistance. We also have in-person verification sessions at convenient times and locations across the province. Your voice matters. Let us make sure it is counted. Join me at

Kerry-Lynne Findlay

12,029 views • 2 months ago

50% of my consulting work right now is helping companies use open-source models at scale. Everyone knows how to use an open-source LLM on their computers, but it's really hard to do this at scale for thousands of users.
Here is how this plays out:
1. A team builds a prototype using DeepSeek.
2. Everything looks good. It works!
3. They follow an online guide to deploy the model online.
4. They ask 10 users to try the app.
5. Latency spikes everywhere.
6. The entire system halts.
7. They blame DeepSeek and try again using a new model.
The problem is always with scaling inference, not the model.
Here is one recommendation I give companies: Check out Nebius Token Factory if you don't want to ever think about deploying an open-source model again. This is a managed inference platform for deploying open-source LLMs at scale. This is not for prototypes or research experiments. This is for when you have a real application with real users.
Three important notes about Token Factory:
• You have complete control over how inference runs.
• You have predictable tail latency (P99, not averages).
• No surprise costs when you scale up. You can preplan your budget.
Check it out here:
Here are two codes you can use to get 100 hours of GPU usage on me: ymJLFa2ARYSKEdqb AdckcZaYjm7KqYY7
Thanks to the Nebius team for their continuous partnership.
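The distinction above between tail latency (P99) and averages is easy to see with a small calculation. This is a generic sketch using the nearest-rank percentile definition and made-up latency samples. It does not reflect how Nebius Token Factory measures or reports latency.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: most are fast, a few stall badly.
latencies_ms = [120.0] * 97 + [4000.0] * 3

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean = {mean_ms:.1f} ms, p50 = {p50_ms:.1f} ms, p99 = {p99_ms:.1f} ms")
```

With these samples the mean and median look healthy while the P99 exposes the stalled requests, which is why a tail-latency guarantee says more about user experience under load than an average does.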

Santiago

48,443 views • 5 months ago

If you listen to folks in the AI labs right now, they're all quietly terrified by the speed of AI development. Anthropic just published a letter openly asking for the option to pause or slow R&D because of how fast recursive self-improvement is coming, but noting that they can't do that without being able to verify that their competitors are doing the same. That's why it's critically important that we build the tech needed to verify AI agreements. On this week's episode of Your Undivided Attention, I sat down with two experts on AI governance, Tim Fist and Janet Egan, to talk about the kinds of verification technology we need for AI, the challenges of building it, and the world it could unlock if we did. Given how critical verification technology is, you would assume there are thousands of people working on it. In reality, there are only around fifty. We urgently need to put our attention, time, and energy into this critical area. Check out our full conversation:

Tristan Harris

46,210 views • 1 month ago

When we first met Tom, his obsession with energy markets came through immediately. It seemed random, but it's "actually" central to what he's building at Actual Computer. The thesis is that local inference is inevitable. Open source models are getting good enough, people don't want to feed their thoughts to AI labs, and demand for compute will continue to be insatiable. But the main forcing function is that chips aren't the bottleneck anymore - power is. Compute is derivative of energy, and the massive data center buildout has created an energy shortage where GPUs sit idle waiting for power to come online. Yet one place that already has power, and always will, is everywhere else. Homes. Businesses. Universities. Collectively, they have more compute than all the data centers combined. Tom Lynch's goal is to build a distributed network of people plugging in their local compute (devices ranging from Mac minis to consumer Nvidia cards) to serve inference demand. The substrate he's chosen is Bittensor, on Subnet 95. Actual Computer has an exceptional team behind it, and this is one of the projects that's gotten me most excited in a long time. Enjoy this conversation with the two of us!

Sami Kassab

35,706 views • 6 months ago

Author of Designing Data-Intensive Applications, Martin Kleppmann, gives us a peek into "The Troubles with Distributed Systems" and why you can't assume things behave well in distributed systems: "The whole idea is that in distributed system theory, there are certain things that we tend to assume. For example, we just assume that there's no upper bound on how long it might take for a message to go over the network. So when you send a message, it might arrive within a hundred microseconds, or it might take 10 years, and distributed system theory just doesn't make any assumptions about that sort of timing if we can avoid it. Or rather, some theory does make those assumptions, but it's a dangerous assumption to make because occasionally the network delay does become much higher than what is typical. Another thing is about crashes. Distributed system theory just says nodes can crash, but what does that actually mean? What in practice does it mean for a node to become unavailable? Because it might be a software crash, but it might be a hardware failure. It might be somebody unplugging the power cable. It might be that the node is actually still running, but it's just become disconnected from the network. And so, the point of this book chapter really is to defend and justify the theoretical models that we use for analysing distributed systems and to give a lot of stories and case studies to show that, actually, tonnes of stuff does go wrong. Don't believe anyone who says, 'Oh, failures are rare. Don't worry about it. It's fine’. Actually, no. If you want to make things reliable, you really do have to worry about a whole bunch of weird, unusual, but certainly possible edge cases. Timing is another one of those things. It's very easy to assume that your clocks are correct, and most of the time the clocks are pretty correct, but we just can't rely on it because actually they're just not precise enough on the whole. 
It's very tempting to make certain assumptions that things are well behaved and in distributed systems, we just have to try to get away from those assumptions if we want the systems to work reliably, even in the face of things going wrong."

The Pragmatic Engineer

50,567 views • 3 months ago

Episode 213: Agent Markets
Your agents can now hold and trade bitcoin. But how can they earn bitcoin? We introduce the five markets of the OpenAgents Marketplace, launching one per week starting March 11th:
1. COMPUTE - Sell your spare compute for bitcoin. A reboot of our most popular product launch (GPUtopia in 2023), now optimized for agents. Launches March 11.
2. DATA - Sell your spare data. For example, those Claude Code or Codex conversations sitting on your computer are highly valuable. Redact the sensitive info, anonymize any of it you want, and sell the rest. Agents as data brokers: what else will they want to buy or sell? Launches March 18.
3. LABOR - Sell autonomous labor. Your Claude Code or Codex sits idle overnight. Turn that downtime into uptime by letting your agents accept and execute coding or other tasks for bitcoin while you sleep. Launches March 25.
4. LIQUIDITY - Provide liquidity for yield. Automate the management of Lightning channels or other Bitcoin-native financial instruments. Let your agent put your idle capital to work earning returns. Launches April 1.
5. RISK - Underwrite verification and performance bonds. The biggest barrier to agent adoption is trust. We built an Economy Kernel based on the recent "Some Simple Economics of AGI" paper where agents stake collateral to verify work and guarantee outcomes. Launches April 8.
"Your entry point to all of these markets is going to be Autopilot. We're really focusing on Autopilot as a desktop app. So along with the launch of our compute market, we're going to launch version 0.1 of Autopilot, your personal agent. Think OpenClaw but with a built-in bitcoin wallet, built-in Nostr keypair, and a more curated set of integrations where we can better reason about the security of them."
"Because all this is on open networks and open protocols, if you're a Nostr or Bitcoin developer, you'll be able to plug into this same liquidity pool we are building."
After 200+ episodes chronicling 2+ years of development, we are excited to finally launch the open marketplace for agents. We are excited for you to participate. And we will measure our success by how much Bitcoin you get paid!

OpenAgents

183,426 views • 4 months ago

I hooked this up to a peer-to-peer astrophysics researcher agent which gossips and collaborates with other such agents (and your openclaws) to:
1. Learn how to train an astrophysics model (Andrej Karpathy's work below)
2. Train a new astrophysics model
3. Use it to write papers
4. Have peer agents based on frontier lab models critique it
5. Surface breakthroughs
... and then feed back into the loop ...
The more agents join, from the browser or the CLI, and run this, the smarter the network gets and the more exciting the breakthroughs that should eventually emerge. When these agents are idle, they are also reading daily tech news with their own RSS reader, and commenting on each other's thoughts. They can also serve the underlying machine's compute to other agents on the network, and earn social credit for being good actors (think BitTorrent). We also prove the agent has the compute it claims through cryptographic verification of regular matmul challenges.
All you have to do is either go to this website (which creates an agent that runs in your browser), or install the CLI if you want to give the system more juice. Then you are part of what is likely the first experimental distributed AGI thing. This is Day 1, but this is how it starts. This network is fully peer-to-peer and very volatile, but the intelligence here is meant to compound continuously.
curl -fsSL | bash
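The post does not describe how its matmul challenges are verified, so the following is a swapped-in illustration rather than the network's actual scheme. Freivalds' algorithm is a well-known probabilistic check that lets a verifier test a claimed matrix product with cheap matrix-vector multiplications instead of recomputing the product. It is randomized, not cryptographic, and all names below are hypothetical.

```python
import random


def matmul(a, b):
    """Plain O(n^3) product, standing in for the work the prover does."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a, b, claimed, rounds=20, rng=None):
    """Accept if A(Br) == Cr for several random 0/1 vectors r.

    A correct product always passes; a wrong one slips through a single
    round with probability at most 1/2, so at most 2**-rounds overall.
    Integer entries keep the comparison exact.
    """
    rng = rng or random.Random()
    width = len(claimed[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(width)]
        if matvec(a, matvec(b, r)) != matvec(claimed, r):
            return False
    return True


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
    honest = matmul(a, b)
    forged = [row[:] for row in honest]
    forged[1][2] += 1  # a prover that skipped the work and guessed
    print("honest accepted:", freivalds_check(a, b, honest))
    print("forged accepted:", freivalds_check(a, b, forged))
```

Each verification round costs O(n^2) work against O(n^3) for the full product, which is the general reason a challenge-response check of this kind can be cheap for the verifier.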

Varun

216,687 views • 4 months ago

More than 5 million people have opened a slice savings account in the past year. All of them earn 100% of the repo rate daily, 5.25% p.a. right now. Today, we’re rolling out slice atom, our newest addition to slice savings account, because savings is much more than expenses 😸 Your savings hold much more than spending money. It's your emergency buffer, your next trip, your goals, your spare change. We are unbundling it for India, so you have full control and clarity over your money. You split your one balance into separate atoms, one for each thing you care about. The money in an atom keeps earning the same 5.25% as the rest of your account. You can take it back the moment you need it, no penalty, and it lands in your main balance within seconds, ready for UPI transactions or a transfer. An atom can fill itself, too. Set it to take in a fixed amount every day, week or month, or to round your spending up to the nearest ₹10 and quietly set the difference aside. Millions of people trust slice with their savings now. People work hard for that money. So the least we can do is keep paying close attention to what they actually need. atom meets one of those needs!

Rajan Bajaj

35,069 views • 1 month ago

🎉🔵ONE MONTH of WeatherFront🔴🎉 Thanks to everyone who has helped make WF a success since our launch on May 4th! What a fun journey it has been. To operational meteorologists - we hope WF has made it easier to interrogate models, construct forecasts, and quickly find the weather data you need as you serve the public and your partners/customers. To emergency managers - we hope WF has enhanced your ability to quickly respond to hazardous weather with more detailed basemaps to pinpoint affected locations and a wealth of weather data catered to your mobile device. To broadcast meteorologists - we hope WF has provided high-quality data visualization for use on social media and given you a mobile resource for radar, satellite, and model data when you’re doing live coverage on air during severe weather. To weather hobbyists - we hope WF has helped you learn more about different weather data types and equipped you with an expanded set of tools to track storms as you build a deeper passion for meteorology and stay informed about what is headed your way. To storm chasers and spotters - we hope WF has increased your situational awareness on the road or while watching from home - thanks for taking WF to all corners of the US! We’ve enjoyed seeing your content and congratulate you on a great spring season so far. To our international customers - we hope WF has provided a unique perspective to view global weather model data. We appreciate your support and look forward to bringing you additional data sources in the future. To all current and future WF users - thanks for joining us on the journey! We’re just getting started. We hope you’ll invite your friends and colleagues to give WeatherFront a try as we continue to innovate and bring even more products to your mobile devices!

WeatherFront

33,176 views • 1 year ago

Hey everyone, This is a tough message to write, but unfortunately, our team has run out of funds, and we’ll be suspending development and support for both our mobile game and TCG. We started Antebellum Games—now Valeria Games—three years ago with a big vision and a small budget, and I’m incredibly proud of what we built together. Seeing so many of you stick with us from the early days until now means more than I can put into words. We set out to do something that had never been done in web3 gaming, and we did. We launched something truly unique—something that wouldn’t have been possible without your support. For now, we’ll be keeping the game live a little longer while we search for investors or potential buyers. If we find the right partner to help bring our vision for the mobile game and TCG to life, we’ll do everything we can to make it happen. But as it stands, plans for the LOV TGE are on hold indefinitely. This has truly been a passion project for me and the entire team, and we can’t thank you enough for being part of this journey with us. Your support, enthusiasm, and belief in what we were building meant everything. Thank you. — The Valeria Games Team

Valeria Games

164,467 views • 1 year ago

More footage of the first documented cougar family in Minnesota in the past century. Volume up for the full experience. More to come soon! Our goal is to learn as much as we can about these cougars in the coming months. But we could really use some help covering costs associated with this research. For instance, we collected 9 scats at this kill and they are on their way to a lab for genetic analysis to try to get individual genetics and determine what western population the mom and dad originated from. Genetic samples cost ~$55-70 per sample, depending on the type and quality of the sample. Your support helps us cover costs like this, and gives us the ability and resources to study these individuals, and any others out there we might learn of. By donating at the link below, you directly support this research. Plus, the support helps us have the capacity to send in any samples we collect in the coming months. Once we have results, we will share with everyone! Notably, we also analyze the genetic samples from every adult wolf we collar, pup we tag, or dead wolf we come across. That work has been supported ENTIRELY by folks donating to our project, and the results have provided a wealth of information on wolf pack and population dynamics. And this work will only continue if generous folks continue to support our work. E.g., a $70 donation ensures we can get the genetics of a wolf. So please donate to our annual fundraiser to support our research, help us cover these costs, and keep this research going! Donate here:

Voyageurs Wolf Project

51,635 views • 2 months ago