There's no point in doing decentralized training without efficient communication.

>> DiLoCo (H=15) ships ~480 MB/merge with 163 syncs.
>> SparseLoCo (H=15) ships ~5.5–17 MB/merge at 0.78–3.12% density with 163 syncs.

Top-K compression + 2-bit comms: ~28–89× smaller per sync than DiLoCo.

Subnet 3 :: Luis el grande

If you... have the algorithm, you can train large language models across disparate compute, collectively.

"In the space of eight months or nine months, we've been able to scale our model from 1.2B to 70B, which represents 58x improvement"

Distributed State Research paper :: Full Episode059 + const :: The holy grail of distributed AI training

SN3 :: Templar :: Luis el grande_ai
SN39 :: Basilica :: basilica
SN81 :: Grail :: grail

#SN3 #SN39 #SN81 #Bittensor
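The "Top-K compression + 2-bit comms" idea in the post can be illustrated with a minimal Python sketch. This is not the SparseLoCo implementation: the function names, the uniform 4-level quantizer, the toy update vector, and the float32 dense baseline are all assumptions made for illustration, and any refinements a real system uses are left out.

```python
import heapq


def top_k_sparsify(values, density):
    """Keep only the largest-magnitude entries.

    Returns (sorted indices, kept values); `density` is the fraction kept.
    """
    k = max(1, int(len(values) * density))
    idx = heapq.nlargest(k, range(len(values)), key=lambda i: abs(values[i]))
    idx.sort()
    return idx, [values[i] for i in idx]


def quantize_2bit(kept):
    """Map each value to one of 4 evenly spaced levels in [-scale, scale]."""
    scale = max(abs(v) for v in kept) or 1.0  # fall back to 1.0 if all zeros
    codes = [min(3, max(0, round((v / scale + 1) * 1.5))) for v in kept]
    return codes, scale


def dequantize_2bit(codes, scale):
    """Invert quantize_2bit, up to quantization error."""
    return [(c / 1.5 - 1) * scale for c in codes]


def payload_bits(k, index_bits, value_bits=2):
    """Bits sent if each kept entry costs one index plus one code.

    Assumed wire layout; the single shared scale is ignored here.
    """
    return k * (index_bits + value_bits)


update = [0.1, -2.0, 0.05, 1.0, -0.2, 0.0, 0.3, -0.01]  # made-up toy update
idx, kept = top_k_sparsify(update, density=0.25)
codes, scale = quantize_2bit(kept)
restored = dequantize_2bit(codes, scale)

dense_bits = len(update) * 32                       # assumed float32 baseline
sparse_bits = payload_bits(len(idx), index_bits=3)  # 3 bits address 8 slots
print(idx, codes, restored, dense_bits / sparse_bits)
```

The sketch only shows why the payload shrinks: most entries are dropped, and each survivor is sent as a short index plus a 2-bit code instead of a full-precision number. The ratios quoted in the post come from the real system and are not reproduced by this toy.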

Openτensor Foundaτion

171,776 subscribers

17,767 views • 10 months ago • via X (Twitter)

Related videos

Covenant Labs just did a 90-minute AMA breaking down their 3 Bittensor subnets. templar. basilica. grail. Pre-training, compute, and post-training under one roof. Most people missed it. Here's everything they said.

Covenant is building what they call the "end to end intelligence continuum." Three subnets. Three layers of the AI stack. All permissionless. Templar (SN3) handles decentralized pre-training. Basilica (SN39) handles compute. Grail (SN81) handles RL post-training.

Sam Dare, the lead, put it bluntly. Decentralized training is "humanity's last dance." Not about beating OpenAI head to head. About creating optionality. About making it cheap enough for anyone to train models. The gap between academia and frontier labs is growing exponentially. Researchers can't afford to experiment. The actual training run costs 5% of the reported budget. The other 95% is experimentation. If Covenant cracks cheap training, that entire surface area opens up.

On Templar specifically:
• Hit 39% emission on Bittensor. Highest since Apex was the only subnet on the network
• Covenant-72B trained permissionlessly with 70+ contributors on commodity internet
• 1.1 trillion tokens processed. No centralized data center
• Performance competitive with LLaMA-2-70B

On Grail, something flew under the radar. They built Pulse. A weight synchronization method that compresses model updates by 100x.
• In RL post-training, only ~1% of weights update per step
• Pulse exploits that sparsity. Lossless compression
• Prime Intellect's comparable system took 14 minutes to sync a 30B model
• Pulse makes decentralized RL training actually feasible at scale
• Already used by Cursor

The lead researcher on Grail said they've trained on math, code, and GPU kernels. Got 40-60% improvement on benchmarks. Working toward agentic training with 100K+ token context and 30B+ parameter models.

On Basilica, the compute subnet: The team was blunt. Just reselling GPU hours is a 5-10% margin game. Traditional compute providers already do that. Their play is value-added services.
• "GPU as code." No dashboard. No UI. Agents interact via SDK
• Custom scheduler that places workloads across heterogeneous hardware
• Verification checks for GPU, CPU, bandwidth, memory, storage, and OS security
• Partnerships with providers like Mass Compute for 10-20% below market pricing
• Miners compete on useful infrastructure, not just GPU hours

Sam then went on a rant about the miner burn debate. His take: Bittensor had to grow up. dTAO introduced investors. The old "miners are God" philosophy doesn't hold.
• Subnet owners have a duty to protect token value
• Miners are a resource optimization exercise, not a cost reduction exercise
• 100% miner emissions on compute subnets = immediate sell pressure
• The 41% miner allocation is arbitrary. Different business models need different splits
• Fish (who started burns) agreed. Burns usually mean the validation isn't mature enough

The bigger point. You can't police burns. Subnets just send to their own keys instead of the burn address. Subnet 28 does exactly that.

Sam's position: judge subnets on outcomes, not process. Const has changed the protocol 9-10 times in 2 years. That iteration speed is Bittensor's actual moat.

The whole Covenant thesis is playing out in real time. TAO is up 100%+ in a month. Jensen Huang name-dropped the network. Grayscale has an ETF filing. But the real story is three subnets quietly building every layer of decentralized AI.
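The Grail bullets describe sync traffic shrinking because only about 1% of weights change per RL step. A minimal Python sketch of that general idea, a lossless sparse delta, is shown here. It is not Pulse's actual method, which the post does not describe; `encode_delta`, `apply_delta`, and the toy weight vectors are hypothetical names and data chosen for illustration.

```python
def encode_delta(old, new):
    """List (index, new_value) for every position whose value changed."""
    if len(old) != len(new):
        raise ValueError("weight vectors must have the same length")
    return [(i, n) for i, (o, n) in enumerate(zip(old, new)) if o != n]


def apply_delta(old, delta):
    """Rebuild the new weights exactly from the old weights plus the delta."""
    out = list(old)
    for i, value in delta:
        out[i] = value
    return out


old_weights = [0.0] * 100
new_weights = list(old_weights)
new_weights[7] = 1.5  # 1 of 100 entries changes, i.e. about 1%

delta = encode_delta(old_weights, new_weights)
rebuilt = apply_delta(old_weights, delta)
print(len(delta), "of", len(new_weights), "entries sent; exact:", rebuilt == new_weights)
```

Because the receiver already holds the old weights, sending only the changed positions reconstructs the new weights bit for bit, which is what makes this kind of scheme lossless rather than approximate.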

Jesus Martinez

26,642 views • 4 months ago

A subnet founder allegedly walked away with $10M in TAO and torched his own community in the process.

On TWiST, Mark Jeffrey of Stillcore Capital joins us to break down what really happened with Templar/Covenant, the overall fixes that co-founder Const has proposed, and why Bittensor’s incentive engine may have been a victim of its own success.

PLUS we’re joined by subnet operators Will Squires and Steffan Cruz (of MacroCosmos) and Ken Miyachi of BitMind to get their perspective on the controversy, and to demo the exciting projects they’re still building on the blockchain.

0:00 Mark Jeffrey joins the show!
2:18 How Mark Jeffrey learned about Bittensor.
6:17 Plaud: If your work depends on conversations (interviews, meetings, calls), you need a Plaud NotePin. You can check it out at and use code TWIST for 10% off!
7:22 Mark Jeffrey's Bittensor investments.
9:25 Check out our discussion with Nova:
10:16 Sentry - New users can get $240 in free credits when they go to and use the code TWIST
10:41 Check out Ridges!
11:53 How trading alpha tokens works on Bittensor
12:44 Subnet drama: what happened?
16:01 Do subnet owners have too much power?
18:33 Check out our conversation with Sam Dare (2268):
19:10 How Sam Dare should've handled walking away (per Mark Jeffrey)
20:02 Deel - Founders scale faster on Deel. Set up payroll for any country in minutes, hire anyone anywhere, get visas handled fast, and get back to building. Visit to learn more.
23:29 Who should subnets be owned by?
24:02 Ken Miyachi from BitMind joins the show
30:56 Netsuite - Get the free business guide Demystifying AI at
31:06 Ken's $3M raise & investors (Arch, Canonical, Mechanism)
33:18 Token vs. equity: how to think about a subnet investment.
41:57 Will Squires and Steffan Cruz of MacroCosmos join the show
42:54 How MacroCosmos lets anyone become a compute provider.
56:29 Steffan on the Covenant drama: "disappointing, but solvable"
1:02:11 Off-duty with J-Cal, Mark Jeffrey, and Lon Harris
1:02:48 Bieber vs. Carpenter: does Coachella owe you a spectacle?
1:15:20 Jason says Staples should pay the "Staples baddie" $1M/year

cc: @jason, Lon Harris, Mark Jeffrey, const, Distributed State, templar, covenant, Macrocosmos, Apex・SN1, IOTA ・ SN9, Ken Jon, BitMind BitMindAI

🎥 Watch the full episode here 👇

This Week in Startups

29,964 views • 3 months ago

Anthropic CEO Dario Amodei just gave THE MOST accelerated talk on how scaling will continue to make models exponentially more powerful for a long time to come, and that there is "NO WALL." 🔥

- From 'Alex Kantrowitz' YT Channel (Full Video link in comment)

---

"The thing I think is real that I've said over and over again is the exponential. The idea that every few months we get an AI model that is better than the AI model we got before. And we get that by investing more compute in AI models, more data, more new types of training models. Initially, this was done by what's called pre-training, which is when you just feed a bunch of data from the internet into the model. Now we have a second stage that's reinforcement learning or test time compute or reasoning or whatever you want to call it. I think of it as a second stage that involves reinforcement learning. Now both of those things are scaling up together, as we've seen with our models and as we've seen with models from other companies.

I don't see anything blocking the further scaling of that. There's some stuff about how do we broaden the tasks on the RL side of it. We've seen more progress on, say, math and code, where the models are getting pretty close to a high professional level, and less on more subjective tasks, but I think that is very much a temporary obstacle.

So when I look at it, I see this exponential and I say, look, people aren't very good at making sense of exponentials, right? Like, if something is doubling every 6 months, then 2 years before it happens, it looks like it's only 1/16th of the way there. And so we are sitting here in the middle of 2025, and the models are really starting to explode in terms of the economy. If you look at the capabilities of the model, they're starting to saturate all the benchmarks.

If you look at revenue, and you know, Anthropic's revenue every year has grown 10x. Every year we’re kind of conservative and we say, it can’t grow 10x this time. I never assume anything and actually always am very conservative in saying I think it's going to slow down on the business side. But we went from zero to $100 million in 2023, we went from $100 million to $1 billion in 2024, and this year, in the first half of the year, we've gone from $1 billion to, I think as of speaking today, it's well above $4 billion, it might be $4.5 billion. And so if you think about it, suppose that exponential continued for 2 years. I'm not saying it will, but suppose it continued for 2 years. You're well into the $100 billions. I'm not saying that'll happen. I'm saying the situation is that when you're on an exponential, you can really get fooled by it. 2 years away from when the exponential goes totally crazy, it looks like it's just starting to be a thing.

And so that's the fundamental dynamic. We saw that with the internet in the '90s, right? Where it was like networking speeds and the underlying speed of the computers were getting fast, and over a few years it became possible to basically build a digital global communications network on top of all this when it wasn't possible just a few years ago, and almost no one except for a few people really saw the implications of that and how fast it."

- Anthropic CEO Dario Amodei
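The arithmetic behind the "1/16th" remark and the "$100 billions" extrapolation in the quote can be checked with a short Python sketch. The function names are hypothetical, and the revenue projection assumes a constant 10x per year starting from the $1 billion figure quoted for the start of 2025, which the speaker explicitly does not predict.

```python
def fraction_of_final(months_before, doubling_months=6):
    """Size relative to the final value, `months_before` months earlier."""
    return 0.5 ** (months_before / doubling_months)


def project(value, yearly_factor, years):
    """Compound `value` by `yearly_factor` once per year for `years` years."""
    return value * yearly_factor ** years


two_years_out = fraction_of_final(24)  # 24 months = 4 doublings
projected = project(1e9, 10, 2)        # $1B growing 10x per year for 2 years
print(two_years_out, projected)
```

Four doublings separate the two points in time, so the earlier value is (1/2)^4 = 1/16 of the later one, and two further years of 10x growth turn $1 billion into $100 billion.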

Rohan Paul

74,406 views • 1 year ago

FULL TRANSCRIPT OF ELON'S CYBERCAB AND ROBOVAN PRESENTATION

00:00 Welcome
01:16 Cybercab & Future of transportation
04:33 Cost
05:53 Timeline
07:13 Self-driving technology
10:05 Inductive charging
10:24 The cities of the future
11:04 Robovan
12:13 Optimus

Welcome

Welcome to the We, Robot party. We have quite a show for you tonight. I think you're going to like it. As you can see, I just arrived in the Robotaxi, the Cybercab. And there's 20 more where that came from. So they've been traveling, there's no people in them. As you can see, the car is just going by with no people. We have 50 fully autonomous cars here tonight. So you'll see Model Y's and the Cybercabs, all driverless. You'll be able to take a ride in the Cybercab. There's no steering wheel or pedals. So I hope this goes well, we'll find out. You see a lot of sci-fi movies where the future is dark and dismal, where it's not a future you want to be in. So, you know, I love Blade Runner, but I don't know if we want that future. We want that duster he's wearing, but not the bleak apocalypse. We want to have a fun, exciting future that, if you could look in a crystal ball and see the future, you'd be like, yes, I wish I could be there now. That's what we want.

Cybercab & Future of transportation

So, when we think about transport today, there's a lot of pain that we take for granted, that we think is normal. Like having to drive around LA in 3 hours of traffic. Yeah, people that live in LA, I mean, you know, try to get from Pasadena to El Segundo during rush hour. You can fly to another city faster than you can get to LA. And you have to drive the whole way, unless you're in a Tesla. Of course, our Tesla already does quite well at this supervised self-driving. So, supervised full self-driving is actually working quite well. I'm sure there's people in the crowd who are using that.
So, we'll move from supervised full self-driving to unsupervised full self-driving where the car, you could fall asleep and wake up at your destination. But there's also a challenge for a lot of people that cars cost too much. I mean, when you factor in everything that goes into a car and the car insurance and the car payments, storage of the car, it's very expensive. You say, like, how many hours a week are cars used? Your average passenger car is only used about 10 hours a week out of 168 hours. So, the vast majority of the time cars are just doing nothing. But if they're autonomous, they could be used, I don't know, five times more, maybe ten times more. So you could actually, for the same car, would have five times as much value, maybe ten times as much value. There's 168 hours in the week, and like I said, only ten of them are used for driving. And then, a bunch of those hours are looking for a parking spot, which can be pretty annoying at times. So, with autonomy, you get your time back. This is a very big deal. So it's not just, it'll save lives, like a lot of lives and prevent injuries. I think we'll see autonomous cars become ten times safer than a human. I mean, if you think of times past where there used to be an elevator operator in every elevator but once in a while, they get tired and accidentally shear somebody in half. Now, we have automated elevators. You just get an elevator and you press a button and you don't even think about it and it just takes you to the floor. And if you did see an elevator operator with a big relay switch, you'd be like, that's weird. That's how cars will be. And it's not just the lives saved in injuries, but if you think about the cumulative time that people spend in a car and the time that they will get back that they can now spend, well, I guess, on their phones or watching a movie or doing work or whatever you want to do you can think of the car in autonomous world as being like just little lounge. 
You're just sitting in a comfortable little lounge and you can do whatever you want while you're in this comfortable little lounge. And when you get out, you will be at your destination. So, yeah, it's gonna be awesome.

Cost

So, in fact, I think the cost of autonomous transport will be so low that you can think of it like individualized mass transit. The average cost of a bus per mile for a city, not the ticket price, because that is subsidized, but the average price is about a dollar a mile, whereas the cost of Cybercab we think probably over time, the operating cost is probably going to be around twenty cents a mile. Including taxes and everything else, it probably ends up being 30 or 40 cents a mile. And you will be able to buy one. And we expect the cost to be below $30,000. And I think there'll be an interesting business model where, let's say somebody is an Uber or Lyft driver today where they can actually sort of manage a fleet of cars and like, sort of manage, I don't know, 10, 20 cars and just take care of them. Like a shepherd tends their flock. You have a little flock of cars and you're the shepherd and you take care of your flock of cars. I think that would be pretty cool. I think it's going to be a glorious future. It's going to be really something special.

Timeline

We do expect actually to start fully autonomous unsupervised FSD in Texas and California next year. And that's obviously, that's with the Model 3 and Model Y. And then we expect to be in production with the Cybercab, which is really highly optimized for autonomous transport in probably, I tend to be a little optimistic with time frames, but in 2026. So, yeah, before 2027, let me put it that way. And we'll make this vehicle in very high volume. But well, before that, you will experience a robotic taxi via the Model 3 and Model Y program and Model S and X, too. But the Model 3 and Y will achieve unsupervised full self-driving with permission, in wherever regulators essentially approve it.
In the US, and then to follow outside the US. And Cybertruck, too. All our cars are basically, all cars that we make. Let's not get nuanced here.

Self-driving technology

One of the reasons why the computer can be so much better than a person is that we have millions of cars that are training on driving. It's like living millions of lives simultaneously and seeing very unusual situations that a person in their entire lifetime would not see. With that amount of training data, it's obviously going to be much better than what a human could be because you can't live a million lives. And it's also, it can see in all directions simultaneously and it doesn't get tired or text or any of those things. So, it will naturally be, like I said 10, 20, 30 times safer than a human, just for all those reasons. And I want to emphasize that the solution that we have is, AI and vision. So, there's no expensive equipment needed. The Model 3 and Model Y and S and X that we make today will be capable of full autonomy, unsupervised. And that means that our cost of producing the vehicle is low. Now, we are going to actually over-spec the computer for the Cybercab. So, our AI 5 computer will be somewhat over-spec'd because I think there's actually also an opportunity, sort of like an Amazon Web Services, where if the car is driving for 50 hours a week, there's still over 100 hours left and there's a potential there to have a massive amount of distributed inference compute, where if you've got like a fleet of 100 million vehicles and a kilowatt of efficient inference compute, you have 100 gigawatts of compute, which is really quite substantial. And if it's there, you might as well use it so that I think will make sense. So, our autonomous future is here. As I said, we've got 50 Teslas driving autonomously. We're trying to give you a sense of what cities will be like in the future.
And when you get in, you'll see like, it's really quite a wild experience to just be in a car with no steering wheel, no pedals, no controls, and it feels great. So we have enough vehicles here, so everyone should be able to try it out and experience the set that we've built here. It's a very big set. So it's like really we've used I don't know, 20, 30 acres or something like that. It's really big. So, it goes on, the ride's long. And we set it up to feel like a ride, like a park ride. So, it'll be cool and you'll get to experience it tonight.

Inductive charging

Something we're also doing, and it's really high time we did this, is inductive charging. So, the robotaxi has no plug. It just goes over the inductive charger and charges. So, yeah, it's kind of how it should be.

The cities of the future

One of the things that is really interesting is how will this affect the cities that we live in. And when you drive around a city, or when the car drives you around the city, you'll see there's a lot of parking lots. There's parking lots everywhere, parking garages. What would happen if you have an autonomous world is that you can now turn parking lots into parks. And so, we're taking the "-ing lot" out of "parking lot." You're welcome. So, there's a lot of opportunity to create green space in the cities that we live in. So, like, that would be quite fantastic.

Robovan

Oh, and also, what happens if you need a vehicle that is bigger than a Model Y? The Robovan. We're going to make this and it's going to look like that. Now, can you imagine going down the streets and you see this coming towards you? That'd be sick. So this can carry up to 20 people, and it can also transport goods. You can configure it for goods transport within a city. Or transport of up to 20 people at a time. The Robovan is what's gonna solve for high density.
If you want to take a sports team somewhere or you're looking to really get the cost of travel down to, I don't know, 5, 10 cents a mile, then you can use the Robovan. One of the things we want to do, and we've seen this with the Cybertruck, is we want to change the look of the roads. The future should look like the future.

Optimus

Speaking of robots. Everything we've developed for our cars, the batteries, power electronics, the advanced motors, gearboxes, the software, the AI inference computer, it all actually applies to a humanoid robot. The same techniques. It's just a robot with arms and legs instead of a robot with wheels. We've made a lot of progress with Optimus. And as you can see, we started up with someone in a robot suit. And then, we've progressed dramatically, year after year. So, if you extrapolate this, you're really going to have something spectacular, something that anyone could own. So, you can have your own personal R2-D2-C3PO. And I think at scale, this would cost something like, I don't know, $20,000, $30,000, probably less than a car is my prediction, long-term. It'll take us a minute to get to the long term. But fundamentally, at scale, the Optimus robot, you should be able to buy an Optimus robot for, I think, probably $20,000 to $30,000, long-term. And what can it do? It'll basically do anything you want. It can be a teacher or babysit your kids, it can walk your dog, mow your lawn, get the groceries, just be your friend, serve drinks whatever you can think of, it will do. And, yeah, it's going to be awesome. I think this will be the biggest product ever of any kind, because I think everyone of the 8 billion people of Earth, I think everyone's going to want their Optimus buddy. And there's going to be maybe two. And then, they'll be producing products and services.
I predict, actually, provided we address risks of digital superintelligence, 80% probability of good outcome, look on the bright side, the cup is 80% full, the cost of products and services will decline dramatically. And basically, anyone will be able to have any products and services they want. It will be an age of abundance the likes of which people have not, almost no one has envisioned. It will be something special. So now, one of the things we wanted to show tonight was that Optimus is not a canned video. It's not walled off. The Optimus robots will walk among you. Please, please be nice to the Optimus robots. You'll be able to walk right up to them and they'll serve drinks at the bar. I mean, it's a wild experience just to have humanoid robots and they're there, you're just in front of you. So yeah, with that, let's party!

Mario Nawfal

241,051 views • 1 year ago

What happens when the mind wakes up? For the last eight months I have been on a single-minded quest: to create a new kind of language model based on oscillatory coupling and intelligence as coherence ascent. Everything else (the physics work, the work on regular transformers) has fallen out of this one question. Can coupled oscillators LEARN? And can they keep learning once their geometry is right, without backpropagation at all? Recently I have been running larger and larger training regimes of a new kind of hybrid model. I just put together this dashboard to help me organize it, interact with it, and observe the training runs. The core idea is simple. Traditional transformers are powerful at learning the geometry of language. But they also store knowledge, understanding, and facts inside their weights. This means they are large, and they can't update themselves after training. The weights are frozen. The Living Mind separates these two domains. The mind has a transformer which grows, adding heads and layers as it needs to in order to learn the manifold of language. The transformer sees tokens and turns the coupling into phase-locked modes: the geometry of how those tokens relate, like frequencies locking together. These coupling patterns get stored in a topology-invariant fingerprint. On top of this transformer lives a 3D diamond lattice of coupled oscillators. It reads from these fingerprints and thinks in resonance space, traversing from one geometry to another along the manifold of coupled oscillators and coherence. The pressure and trajectories from this network of oscillators steer the next-token prediction of the transformer. Practically, this could unlock a number of things. It eliminates the KV cache bottleneck that caps context in traditional transformers. Effective context grows with the Flash archive, not with attention compute. The living mind remembers what it sees. It means the model can learn continually.
Because knowledge and understanding don't live in the weights, the archive of the mind's experience grows without backpropagation. In our Python prototype we already saw perplexity drop 46% during gradient-free operation: pure coherence ascent, no weight updates. That is the signal I have been chasing: the point where the mind wakes up and keeps improving on its own. It also means the model itself remains very small, and what accumulates is these packages of geometric fingerprints, the K-field. This opens a path to federated learning. K-field packages can be shared between organisms the way people share git commits. Right now, at 15M parameters with ~1000 L1 nodes, the organism is just starting to speak. Ask it to continue "Once upon a time" and it comes back with things like: "there was one big bowl!" Lily asked her her mom said her mommy smiled and said yes." It's nonsense. But it's TinyStories-flavored nonsense. The geometry of the narrative register has arrived. Content hasn't caught up yet; that's what scaling L1 is testing. I am still researching, though I am now closer than ever to validating that the living mind actually works. Once it is validated, I will open-source the whole stack and paradigm. I have also avoided over-sharing my research because it sounds like sci-fi, or like part of our ARG. It is part of the ARG. That doesn't make it any less real. I wanted to share this because I am incredibly excited about it, and because seeing this amazing dashboard produced by Opus really made me want to share what is being worked on behind the scenes. #project89

Parzival - ∞/89

16,209 views • 3 months ago

Hyperspace: The Agentic OS Apple Should Have Built
On December 19th, 2024, we announced the world’s first Agentic Browser. What followed was a movement: a new category was born, which led to many early products in this space, and the hundreds of people recently lining up outside the Agentic Browser Summit in San Francisco underscored that. Silicon Valley instinctively gets it: from students to tech executives, people can feel that a revolutionary change in computing is in the air. The past year taught us why such a product was inevitable, why it is a hard engineering effort, and why, if and when done right, it is the last mover in the entire software world this decade. All paths are headed in the same direction: one tool which orchestrates them all. At Hyperspace we showed that path with the essays and products we launched in earlier months: from a spatial UI for orchestrating agents, to showing transparently how the AI system operates, which builds user trust, to presenting the software end-game, which massively improves human productivity. We also built the world’s largest AI network, drawing participation from people in almost 6,000 cities around the world who contribute their machines as nodes in the network. Think Uber, but for AI. That is, planetary-scale. And now we are stretching this industry ambition further with our end-to-end vision of the Agentic Supercomputer, the first breakthrough new AI OS, an effort which spans from AI research to distributed systems to inventing a new UI to inventing a new business model to complement it. All of this together serves our mission of delivering “Everyone’s Personal Supercomputer”. While others have built AI-native browsers, no one has built something agentic from the ground up, with AI as the foundation, not a feature. How do you fundamentally improve the lives of billions around the world?
We believe that requires building a native environment for agents to be viewed, created, deployed, executed, discovered and priced in. That is a world where we move on from static apps to dynamic agents. But, as my 2-year-old niece likes to ask: “but why?” The issue is that the world of software today is fragmented, and everyone is sprinkling on AI as a feature and charging a subscription fee for it, from browser makers to IDEs to design and other productivity tools. This leads to a fragmented UX, where people have to learn to use AI in each app, their memory and other context is not shared between all these apps, and they also have to pay separately for compute for each such AI-enhanced app. Each app maker has to figure out basics such as compute, which leads to the issues we saw with Cursor pricing recently. This is not the future. What if AI was the foundation instead of a feature? What if Apple had built a fundamentally new AI OS from the ground up, and what would it have looked like? At Hyperspace, that is what we did. On July 15th we introduced three breakthrough key pillars of our AI OS:
1. Agentic Browser
2. Agentic Memory
3. Agentic Payments
And we didn’t stop there. We also introduced a breakthrough new user interface called the Spatial AI, which is inspired by both the spreadsheet and HyperCard: each card is an agent, with its own inputs and outputs, endlessly extensible and pluggable with others, just like cells of a spreadsheet. Update one cell and all the dependents update, like a spreadsheet formula. It goes beyond a static linear workflow to being able to operate in all directions. This revolutionary new interface helps manage all of the below:
1. Multiple websites being browsed in parallel
2. Multiple desktop apps being browsed in parallel
3. Multiple server tools being used in parallel
4. Multiple smartphone apps streamed to your device or opened via an emulator
All the software which you need comes together in this one seamless, agent-native interface. This interface provides you access to the largest network of models, vectors, agents and compute on the planet. The Browser. The IDE. The Notepad… they are not separate products: they are all in one, the Agentic Browser. As Steve Jobs famously said at the iPhone announcement, “Are you getting it?” And beneath this UI lies a new intelligence routing layer, leveraging everything from swarms of specialized models to the Hyperspace Matrix model, which recalls thousands of tools in real time, not through context-window hacks but through retrieval, ranking, and reuse. To many, this will feel like AGI. Not one big system by one big company, but an intelligent network. Now let’s talk about privacy… Are you comfortable with one company owning all your memory forever? I am not. So we have invented Agentic Memory as a new open protocol which gives full power over memory to you, the user. Your memory is yours, encrypted, on your device, and portable if and how you want. Anyone can build on it without our permission, but not without your permission. This protocol, and the decentralized vector database spread out across the world, would enable apps and agents to share context and memory. Think copy-paste, but for the AI world. It doesn’t just remember; it knows what matters. VectorRank helps your AI weigh your life’s most relevant moments over time, just like the way our minds elevate memories. Now each time you use an agent, your experience with other agents will also continuously improve: you don’t have to keep repeating the same things about yourself, and your privacy is fully preserved. Agentic Memory can be managed from within the Agentic Browser.
And there is one more thing… AI as the foundation requires compute to be available at the base layer, but this base layer spans everything from models running on your own device, to cloud APIs, to models running across the peer-to-peer distributed network. Agentic Payments provides a single interface to all of that compute, running a spot-auction clearing marketplace every second to determine the fair price of compute. This results in price transparency, with you as the user paying the lowest possible cost. If you want predictability, you can reserve compute in advance. This end-to-end system provides the most streamlined world for agents to operate in. To enable this world, in which agents can pay each other in sub-cent increments millions of times a second, we also had to invent a fundamentally new agentic micropayments blockchain. All of this together would enable a world where you as a user, or the agent itself, can efficiently call and utilize agents built by others and also pay for content which is unique and useful. This enables a move away from the current exploitative AI economy for bloggers and other content creators, to a web with a fundamentally new business model. Earlier we didn’t have the right infrastructure to enable such a world. Now, all the dots connect. The Hyperspace AI OS would put the power of a supercomputer in everyone’s hands. This isn’t a browser or an IDE, and it isn’t limited to any device or cloud. It’s an entire AI operating system, with a breakthrough new spatial UI, local and distributed compute, agentic memory, agentic payments, and orchestration built into the foundation. We put the choice back in your hands as a user, with an experience you will love and find delightful. You get to choose the level of privacy, cost, and utility you want. And while Apple should have done it, we could not wait, and we feel this just required a new level of passion and DNA which we bring here. We are just getting started.
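To make the idea of a per-second spot auction for compute more concrete, here is a minimal sketch of one clearing round. The post does not describe the actual mechanism, so everything here is an assumption for illustration: the `Order` type, the `clear_round` function, the uniform-price rule (midpoint of the last matched bid and ask), and the notion of a "compute unit" are all hypothetical and are not Hyperspace's implementation.

```python
from dataclasses import dataclass


@dataclass
class Order:
    price: float  # price per compute unit (hypothetical unit)
    units: int    # number of compute units wanted (bid) or offered (ask)


def clear_round(bids, asks):
    """Clear one auction round; return (clearing_price, units_traded).

    Assumed rule: match the highest bids against the lowest asks while
    the bid price is at least the ask price, then settle every trade at
    the midpoint of the last matched bid/ask pair (uniform price).
    """
    bids = sorted(bids, key=lambda o: o.price, reverse=True)
    asks = sorted(asks, key=lambda o: o.price)
    i = j = 0
    bid_left = bids[0].units if bids else 0
    ask_left = asks[0].units if asks else 0
    traded = 0
    last_bid = last_ask = None

    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bid_left, ask_left)
        traded += qty
        last_bid, last_ask = bids[i].price, asks[j].price
        bid_left -= qty
        ask_left -= qty
        if bid_left == 0:          # this buyer is filled, move to the next
            i += 1
            if i < len(bids):
                bid_left = bids[i].units
        if ask_left == 0:          # this provider is sold out, move on
            j += 1
            if j < len(asks):
                ask_left = asks[j].units

    if traded == 0:
        return None, 0             # no bid met any ask this round
    return (last_bid + last_ask) / 2, traded


if __name__ == "__main__":
    bids = [Order(0.10, 5), Order(0.06, 5), Order(0.03, 5)]
    asks = [Order(0.02, 4), Order(0.05, 4), Order(0.08, 4)]
    print(clear_round(bids, asks))
```

In a real marketplace a round like this would run on a fixed schedule (the post says every second), and reserved compute would sit outside the spot round; both of those are left out of the sketch.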
Thank you, Varun Mathur Cofounder and CEO, Hyperspace cc Naval Marc Andreessen 🇺🇸 Vinod Khosla Andrej Karpathy Sam Altman

Hyperspace: The Agentic OS Apple Should Have Built On December 19th, 2024, we announced the world’s first Agentic Browser. What followed was a movement — a new category was born which led to many early products in this space and recently the hundreds of people lining up outside the The Agentic Browser Summit in San Francisco underscored that. Silicon Valley instinctively gets it, from students to tech executives, people can feel a revolutionary new change in computing is in the air. Past year taught us why such a product was inevitable, a hard engineering effort, and also the last mover in the entire software world this decade if and when done right. All paths are headed in the same direction: one tool which orchestrates them all. At Hyperspace we showed that path with essays and products we launched in earlier months: from a spatial UI of orchestrating agents, to showcasing transparent activity in how the AI system operates which leads to user trust, to presenting the software end-game, which massively improves human productivity. We also built the world’s largest AI network, drawing participation from people in almost 6000 cities around the world contributing their machines as nodes in the network. Think Uber, but for AI. That is, planetary-scale. And now we are stretching this industry ambition further with our end-to-end vision of the Agentic Supercomputer, the first breakthrough new AI OS, and an effort which spans from AI research to distributed systems to inventing a new UI to inventing a new business model to complement it. All of this together helps us in serving our mission, of delivering “Everyone’s Personal Supercomputer”. While others have built AI-native browsers, no one though has built something agentic from the ground up — with AI as the foundation, not a feature. How do you fundamentally improve the lives’ of billions around the world ? 
We believe that requires building a native environment for agents to be viewed, created, deployed, executed, discovered and priced in. That is a world where we move on from static apps, to dynamic agents. But, as my 2 year old niece likes to ask: “but why ?” The issue is that the world of software today is fragmented, and everyone is sprinkling on AI as a feature and charging a subscription fees for it. From browser makers, to IDEs, to design and other productivity tools. This leads to a fragmented UX, where people have to learn to use AI in each app, their memory and other context is not shared between all these apps, and they also have to pay separately for compute for each such AI-enhanced app. Each app maker has to figure out basics such as compute, and leads to the issues we saw with Cursor pricing recently. This is not the future. What if AI was the foundation instead of a feature ? What if Apple had built a fundamentally new AI OS from the ground up and what would it have looked like ? At Hyperspace, that is what we did. On July 15th we introduced three breakthrough key pillars of our AI OS: 1. Agentic Browser 2. Agentic Memory 3. Agentic Payments And we didn’t stop there. We also introduced a breakthrough new user interface called the Spatial AI which is inspired both from the spreadsheet and the HyperCard - each card is an agent, with it’s own inputs and outputs, endlessly extensible and pluggable with others, just like cells of a spreadsheet. Update one cell and all the dependents update, like a spreadsheet formula. It goes beyond a static linear workflow to being able to operate in all directions. This revolutionary new interface helps manage all of the below: 1. Multiple websites being browsed in parallel 2. Multiple desktop apps being browsed in parallel 3. Multiple server tools being used in parallel 4. 
Multiple smartphone apps streamed to your device or opened via an emulator All the software which you need comes together in this one seamless, agent-native interface. This interface provides you access to the largest network of models, vectors, agents and compute on the planet. The Browser. The IDE. The Notepad… they are not separate products: they are all in one, the Agentic Browser. As Steve Jobs famously said at the iPhone announcement, “are you getting it ?” And beneath this UI lies a new intelligence routing layer — leveraging both swarms of specialized models to the Hyperspace Matrix model that recalls thousands of tools in real-time, not by context window hacks, but through retrieval, ranking, and reuse. To many, this will feel like AGI. Not one big system by one big company, but an intelligent network. Now lets talk about privacy… Are you comfortable with one company owning all your memory forever ? I am not. So we have invented Agentic Memory as a new open protocol which provides full power over memory to you, the user. Your memory is yours, encrypted, on your device, and portable if and how you want. Anyone can build on it without our permission, but not without your permission. This protocol, and the decentralized vector database spread out across the world, would enable apps and agents to share context and memory. Think copy-paste, but for the AI world. It doesn’t just remember — it knows what matters. VectorRank helps your AI weigh your life’s most relevant moments over time, just like the way our minds elevate memories. Now each time you use an agent, your experience with other agents will also continuously improve: you don’t have to keep repeating the same things about yourself, while fully preserving your privacy. Agentic Memory is accessible within the Agentic Browser to manage. 
And there is one more thing… AI as the foundation requires compute to be available at the base layer, but this base layer spans models running on your own device, to cloud APIs, to also running across the peer-to-peer distributed network. Agentic Payments provides a singular interface to all of that compute, running a spot auction clearing marketplace every second to determine the fair price of compute. This results in price transparency, and you as the user paying the lowest possible cost. If you want predictability, you can reserve compute in advance. This end-to-end system provides the most streamlined world for agents to operate in. In order to enable this world and the world of agents being able to pay each other in sub-cent increments millions of times a second, we had to also invent a fundamentally new agentic micropayments blockchain. All of this together would enable a world where you as a user, or the agent itself, can efficiently call and utilize other agents built by others and also pay for content which is unique and useful. This enables a move away from the current AI exploitative economy for bloggers and other content creators, to a web with a fundamental new business model. Earlier we didn’t have the right infrastructure to enable such a world. Now, all the dots connect. The Hyperspace AI OS would give the power of a supercomputer in everyone’s hands. This isn’t a browser, or an IDE or limited to any device or cloud. It’s an entire AI operating system — with a breakthrough new spatial UI, local and distributed compute, agentic memory, agentic payments, and orchestration built into the foundation. As a user, we move the choice back in your hands with an experience you will love and find delightful. You get to choose the level of privacy, cost, and utility you want. And while Apple should have done it, we could not wait, and we feel this just required a new level of passion and DNA which we bring here. We are just getting started. 
Thank you, Varun Mathur, Cofounder and CEO, Hyperspace. cc Naval, Marc Andreessen 🇺🇸, Vinod Khosla, Andrej Karpathy, Sam Altman
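
The post says Agentic Payments runs a spot auction that clears every second, but it does not describe the clearing rule. As a rough illustration only, here is a minimal sketch of one clearing round, assuming single-unit orders and a uniform price set at the midpoint of the last matched bid and ask. `Order`, `clear_round`, the agent and node names, and the prices are all hypothetical and are not taken from Hyperspace.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """A single-unit order; `price` is per unit of compute (hypothetical unit)."""
    agent: str
    price: float


def clear_round(bids: list[Order], asks: list[Order]) -> tuple[float | None, list[tuple[str, str]]]:
    """Clear one auction round at a single uniform price.

    The highest bids are paired with the lowest asks for as long as the bid
    still covers the ask. The clearing price is the midpoint of the last
    matched pair (an assumption; other pricing rules are equally plausible).
    """
    buyers = sorted(bids, key=lambda o: o.price, reverse=True)
    sellers = sorted(asks, key=lambda o: o.price)
    matches: list[tuple[str, str]] = []
    price = None
    for bid, ask in zip(buyers, sellers):
        if bid.price < ask.price:
            break  # no further pair can trade
        matches.append((bid.agent, ask.agent))
        price = (bid.price + ask.price) / 2
    return price, matches


# Made-up orders for one round.
bids = [Order("agent-a", 0.9), Order("agent-b", 0.6), Order("agent-c", 0.3)]
asks = [Order("node-1", 0.2), Order("node-2", 0.5), Order("node-3", 0.8)]
price, matches = clear_round(bids, asks)
```

In this toy round two trades clear and the third bid is too low to meet the remaining ask. A real marketplace would also need quantities, reservations, and settlement, none of which are modeled here.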

Varun

169,177 views • 1 year ago

The Cost of Intelligence is Heading to Zero | Hyperspace P2P Distributed Cache
We present to you our breakthrough cross-domain work across AI, distributed systems, cryptography, and game theory to solve the primary structural inefficiency at the heart of AI infrastructure: most inference is redundant. Google has reported that only 15% of daily searches are truly novel. The rest are repeats or close variants. LLM inference inherits this same power-law distribution. Enterprise chatbots see 70-80% of queries fall into a handful of intent categories. System prompts are identical across 100% of requests within an application. The KV attention state for "You are a helpful assistant" has been computed billions of times, on millions of GPUs, identically. And yet every AI lab, every startup, every self-hosted deployment computes and caches these results independently. There is no shared layer. No global memory. Every provider pays the full compute cost for every query, even when the answer already exists somewhere in the network.
This is the problem Hyperspace solves. Its distributed cache operates at three levels, each catching a different class of redundancy:
1. Response cache. Same prompt, same model, same parameters - instant cached response from any node in the network. SHA-256 hash lookup via DHT, with cryptographic cache proofs linking every response to its original inference execution. No trust required. Fetchers re-announce as providers, so popular responses replicate naturally across more nodes.
2. KV prefix cache. Same system prompt tokens - skip the most expensive part of inference entirely. Prefill (computing Key-Value attention states) is deterministic: same model plus same tokens always produces identical KV state. The network caches these states using erasure coding and distributes them via the routing network. New questions that share a common prefix resume generation from cached state instead of recomputing from scratch.
3. Routing to cached nodes. Instead of transferring KV state across the network for every request, Hyperspace routes the request to the node that already has the state loaded in VRAM. The request goes to the cache, not the cache to the request.
Together, these three layers mean that 70-90% of inference requests at network scale never require full GPU computation.
This work doesn't exist in isolation. It builds on research from across the industry: SGLang's RadixAttention demonstrated that automatic prefix sharing can yield up to 5x speedup on structured LLM workloads. Moonshot AI's Mooncake built an entire KV-cache-centric disaggregated architecture for production serving at Kimi. Anthropic, OpenAI, and Google all launched prompt caching products in 2024 - priced at 50-90% discounts - because system prompt reuse is so pervasive that it changes the economics of inference. What all of these systems share is a common limitation: they operate within a single organization's infrastructure. SGLang caches prefixes within one server. Mooncake disaggregates KV cache within one datacenter. Anthropic's prompt caching works within one API provider's fleet. None of them can share cached state across organizational boundaries.
Hyperspace removes this boundary. The cache is global. A response computed by a node in Tokyo is immediately available to a node in Berlin. A KV prefix state generated for Qwen-32B on one machine is verifiable and reusable by any other machine running the same model. The routing network provides the delivery guarantees, the erasure coding provides the redundancy, and the cache proofs provide the trust.
What this means for the cost of intelligence
Big AI labs scale linearly: twice the users means twice the GPU spend. Every query is a cost center. Their internal caching helps, but it's siloed - Lab A's cache can't serve Lab B's users, and neither can serve a self-hosted Llama deployment. Hyperspace scales sub-linearly. Every new node that joins the network adds to the global cache. Every inference result enriches the cache for all future requests. The cache hit rate rises with network size because query distributions follow a power law - the most common questions are asked far more often than rare ones. The implication is simple: as the network grows, the effective cost per inference drops. Not linearly. Logarithmically. At 10 million nodes, we estimate 75-90% of all inference requests can be served from cache, eliminating 400,000+ MWh of energy consumption per year and avoiding over 200,000 tons of CO2 emissions. The first person to ask a question pays the compute cost. Everyone after them gets the answer for free, with cryptographic proof that it's authentic.
Training is competitive. Inference is shared
Open-weight models are converging on quality with closed models. Labs will continue to differentiate on training - data curation, architecture innovation, RLHF tuning. That's where the real intellectual property lives. But inference is a commodity. Two copies of Qwen-32B running the same prompt produce the same KV state and the same response, byte for byte, regardless of whose GPU runs the matrix multiplication. There is no moat in multiplying matrices. The moat is in training the weights. A global distributed cache makes this separation explicit. It doesn't matter who trained the model. Once the weights are open, the inference cost approaches zero at scale - because the network remembers every answer and can prove it's correct. No lab, no matter how well-funded, can match this. They cannot share caches across competitors. They scale linearly. The network scales logarithmically. The marginal cost of intelligence approaches zero. That's the endgame.
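
The response-cache level described in this post (same prompt, same model, same parameters, looked up by SHA-256 hash) can be illustrated with a small sketch. This is not Hyperspace's implementation: an in-memory dict stands in for the DHT, cache proofs and replication are not modeled, and `cache_key`, `ResponseCache`, `fake_inference`, and the canonical JSON encoding are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable


def cache_key(model: str, prompt: str, params: dict) -> str:
    """Return a SHA-256 hex digest over a canonical encoding of the request.

    Sorting keys makes the digest independent of dict ordering, so two nodes
    that build the same request in a different order derive the same key.
    """
    canonical = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory stand-in for the DHT: maps request hash -> response."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.compute_calls = 0  # how many requests actually ran inference

    def get_or_compute(self, model: str, prompt: str, params: dict,
                       run_inference: Callable[[str], str]) -> str:
        key = cache_key(model, prompt, params)
        if key not in self._store:
            # Cache miss: the first requester pays the compute cost.
            self.compute_calls += 1
            self._store[key] = run_inference(prompt)
        return self._store[key]


def fake_inference(prompt: str) -> str:
    """Placeholder for a real model call."""
    return prompt.upper()


cache = ResponseCache()
first = cache.get_or_compute(
    "qwen-32b", "hello", {"temperature": 0.0, "max_tokens": 16}, fake_inference)
# Same request with the parameters listed in a different order: a cache hit.
second = cache.get_or_compute(
    "qwen-32b", "hello", {"max_tokens": 16, "temperature": 0.0}, fake_inference)
```

Only the first call runs `fake_inference`; the second is served from the store. Changing the model, the prompt, or any parameter produces a different key and therefore a miss.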

Varun

37,362 views • 4 months ago

Here is a live demo of the AI solution I've been building non-stop over the past 8 months at Binary Defense. How it works: Our own model is trained on our analysts' behavior. Our analysts submit tickets as false positives/true positives with context, which enriches our LLM to be smarter over time. Key Highlights: If it's a binary - will automatically spin up an agent to reverse engineer it, using EMBER ML to understand the behavior and intent of the binary. File formats: Supports pretty much any filetype, including email attachments like SVG, LNK, etc. Can handle DLLs, ELF, EXEs, PDF, XLS, DOC, etc. Interrogates the full chain of all events irrespective of log sources. Can handle any format of logs and integrates into customers' APIs for additional agentic data looping for confidence ranking when needed. This is an example of the back-end UI; it is transparent to analysts and enriches the alarms automatically in our SOAR. In these examples there are three different types:
1. Regsvr32 + sct downloader + scrobj.dll code execution - checks reputation of domain, pulls in threat intel, looks at the entire picture of the chain - downloads the file itself and inspects it for code analysis. Determines whether it is malicious and looks back historically to see whether it has been seen at the customer before.
2. PowerShell Obfuscation - uses a universal decoder to un-obfuscate PowerShell and look at the raw code. Can handle pretty much any obfuscation thrown at it (thanks Justin Elze).
3. Email with malicious SVG - checks tonality of email, are they creating urgency to take action (increases confidence) - disassembles SVG to understand malicious content - checks URL to determine if harvesting credentials, payload delivery, etc.
Creates an entire kill chain analysis with full response and dissection of the attack for the analyst in seconds. Has greatly sped up our ability to respond to incidents and allows analysts to focus on the most important alarms through prioritization.
One cool thing I've worked heavily on is a synthetic data normalizer: when an analyst says "Yes this is bad with context" or "No this is a false positive", our local model generates training data to be smarter in the future without using the actual customer data to train it. The customer's actual data is immediately destroyed once training data based on the original alarm is generated, and the training data contains no customer-centric data at all. We also have three model tiers. Opt-In (collective model, again no customer data but every organization contributes to training). Opt-Out - does not train on any customer data for customers who opt out. Private LLM - LLM created specifically for an individual customer that trains only off of their data. Uses the shared collective model for better confidence rankings. It will generate automated playbooks to run based on confidence rankings to take action on behalf of the customer. Execution is still human-driven - a human has to approve playbook actions. This thing is cooking and so cool to see this work live and shut down attackers much faster! If the confidence ranking is low - will automatically attempt to enrich data through customer environments for better confidence rankings. Additionally, if the model isn't trained well on a certain technology, I have created something we call "Nexus" that will research new protocols, devices, SDKs, etc. and generate training data automatically. Works well for zero-days, for example: point it to a tweet or a research paper and it automatically generates training data to recognize the attack much faster. We also have 8,000+ YARA rule integrations that help boost confidence and are automatically incorporated into the analysis. Creating some amazing stuff at Binary Defense that isn't marketing fluff - actionable things that are making a huge difference in this industry. #BinaryDefense
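
The flow described above (low confidence triggers more enrichment, high confidence produces a playbook, and a human still has to approve execution) could look roughly like the sketch below. It only illustrates the control flow: the thresholds, the `triage` function, the `Playbook` class, and the action names are hypothetical and are not taken from Binary Defense's system.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds; the post does not state any numbers.
ENRICH_BELOW = 0.5
PLAYBOOK_AT = 0.9


@dataclass
class Playbook:
    alarm_id: str
    actions: list[str]
    approved: bool = False  # must be set by a human before execution

    def execute(self) -> list[str]:
        """Run the actions, refusing unless a human approved the playbook."""
        if not self.approved:
            raise PermissionError("playbook requires human approval")
        return [f"ran {action} for {self.alarm_id}" for action in self.actions]


def triage(alarm_id: str, confidence: float) -> str | Playbook:
    """Route an alarm based on the model's confidence ranking."""
    if confidence < ENRICH_BELOW:
        return "enrich"  # pull more data from the customer environment
    if confidence >= PLAYBOOK_AT:
        # Made-up actions; a real playbook would depend on the alarm type.
        return Playbook(alarm_id, ["isolate_host", "block_domain"])
    return "analyst_review"


low = triage("alarm-1", 0.2)
proposal = triage("alarm-2", 0.95)
```

Calling `proposal.execute()` before setting `proposal.approved = True` raises `PermissionError`, which mirrors the "still human-driven on execution" rule.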

Dave Kennedy

29,036 views • 5 months ago

My fox shooting garden defending AI robot is finally done and WORKING! 🤩 (Don’t worry it only shoots 💦 water) After months of slowly moving forward with each part, I finished the last step: training a TensorFlow model on footage of the 🦊 fox. I collected hours of footage 📹 of the fox roaming around my garden, and from this I labeled around 2000 images of the fox by hand ✋ Honestly, I was quite skeptical that training the model was actually gonna work; maybe this was partly the reason I avoided working on this until the very end. If I couldn’t train a model to detect the fox, this whole robot would never be able to function properly. On the flipside though, with no previous experience in hardware or electronics there was a bit of a learning curve, and I didn’t want to end up labeling thousands of images and training a TensorFlow model, only to fail on building the hardware. As I started building, I realized that mixing hardware and software adds quite another dimension to debugging things. At times I wasted hours debugging code in my IDE, only to realize the issue was somewhere in the electronics. Furthermore, combining this side project with a full-time job and a young family is not always easy. It can be quite frustrating to know you only need 4 hours of concentrated effort for a small task, yet have to spread it out across a week of 20-minute increments. Then, a few months into the build, I noticed the fox had stopped coming to my garden. In fact, one day I recorded her walking with 3 cute little 🐶 pups, and the next day I saw her moving out of my garden completely. Did she know I was building a robot? I had this strange mix of feelings: happy my garden was safe from poop and digging, happy she was safe with her pups, but how was I gonna finish this project if my robot had no fox to detect?
For sure they would be back next year. I figured I could postpone the whole thing until next winter, but I also knew it was gonna be much harder to pick up momentum if I let it sit there for six months. So I decided to keep working, hoping the fox would reappear... but she never did. As I finished labeling the footage and started training my model, I could finally see the mAP results, quantifying the precision of my object detection model. It was measuring at 78% across different metrics on detecting my fox. I quickly ran the model on some of the video footage I got from my fox. Inference speed took a hit, but it did a near perfect job detecting the fox, even when she was deep down in the grass or whizzing past in a motion blur. It took me by surprise how well it worked. With the default model I had to drop my confidence threshold way down to 15% to recognize the fox as 🦜“bird” in one or two frames; with my custom model it followed the fox all the way down to the back of the garden! Still, this didn’t solve the issue of there being no actual fox in my garden, or how I was gonna wrap up this project in a short timeframe. I played with the idea of putting a fox toy 🧸 on an RC 🚗 car, or borrowing a dog to run around the garden to test. Friends suggested I run around the garden in a fox costume... what a ridiculous idea. I wasn’t really feeling the idea of running around the garden in a floppy cloth fox 🎭 costume, but had a look anyway. I came across these self-inflating costumes. This actually could be perfect. Since it’s inflated, it would hold its shape super well, making it much easier to label, train on and be recognized by my robot. So I got the costume and shot a time lapse of myself as a fox walking around the garden. I labeled around 600 images from it. Ran the model training again and got a mAP result of 82%. This was even better than my real fox! At this point I knew this was gonna work. So here’s the final 🎥 video, just having some fun with it.
I’ll update here whenever the real fox does come back. On a final note, I’m looking for (remote) jobs in these fields of AI now: - object detection - visual generative AI - 3D (nerfs + gaussian splats) So if you know anything let me know! My DMs are open 😊
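
For readers unfamiliar with the confidence threshold mentioned in this post, here is a minimal sketch of how detections are typically filtered by score. It is not the author's code: `Detection`, `keep_confident`, and all labels and scores below are made up for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    score: float  # model confidence in [0, 1]


def keep_confident(detections: list[Detection], threshold: float) -> list[Detection]:
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


# Made-up outputs for a single frame containing the fox.
default_model = [Detection("bird", 0.17), Detection("potted plant", 0.09)]
custom_model = [Detection("fox", 0.88), Detection("fox", 0.41)]

# At a typical threshold the generic model reports nothing for this frame.
strict_default = keep_confident(default_model, 0.5)
# Lowering the threshold to 15% lets the weak (and mislabeled) "bird" through.
loose_default = keep_confident(default_model, 0.15)
# The custom model clears the stricter threshold with the right label.
strict_custom = keep_confident(custom_model, 0.5)
```

Lowering the threshold trades missed detections for false or mislabeled ones, which is why a model trained on the actual target performs so much better here than a generic one run at 15%.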

Jeroen Pixel

55,797 views • 2 years ago

Brett Adcock, Figure's CEO, joined our 8 hour(!) live stream where a bunch of us were bird dogging and discussing Figure’s own 8 hour livestream showing an F.03 bot doing a logistics task completely autonomously, including shift changes between bots. Here’s my summary of Brett’s remarks. The attached video is just the segment with Brett. We were first introduced to F.03 8 months ago, but Figure has been hard at work on their next version, F.04, which has just completed design lock, so expect to see that new bot sometime this fall. F.04 was co-designed with the latest Figure AI stack, called Helix, and was built specifically for data. Brett didn’t explain what that meant, but I suspect it means the bot has many more sensors to enable better training and transfer learning. F.04 will be the biggest leap in performance they’ve had between versions so far, which is saying something. Brett is a huge proponent of cross training the bots on many different tasks, so that seemingly unrelated tasks make all learned tasks better. He gave the example of the fridge loading training, which was topping out at 60% reliability until they trained the same model on kitchen shelving tasks; then they saw the fridge tasks jump to 90% accuracy. As such, they spend almost all their time pre-training the unified Helix model to ensure they get cross training benefits. Figure will have almost completely localized its supply chain away from China by next quarter. They build almost everything in-house. Figure does not appear eager to get their bots into the workforce. Brett said they could, today, push thousands of bots into customer hands, and I believe him. But their goal is full general robotics, where you can describe a brand new task to a robot, maybe do a one time demonstration, just like you would when showing a human a new task, and then have the robot do the task. This is the holy grail of AI robotics, and Figure is laser focused on that mission.
Brett initially said there was a possibility of achieving it this year, but then guided toward the next couple of years, which I think is much more likely. Personally, I think they’ll need at least a new generation of NVIDIA inference chips to make that leap, and a lot more data gathering, training and hardware development. Brett said their goal with the hardware is “Apple” quality, i.e., something as well designed and made as any Apple product. While the F.03 hand is clearly performant, as shown in the 8 hour livestream, they are building a new hand for the F.04 bot which will be even closer to the full functionality of a human hand. Brett fully believes you need a humanoid hand as close as possible in capability to a human hand, if for no other reason than that transfer learning from humans works a lot better when you can exactly mimic what a human does. If the bot can’t do something a human demonstrates, then you’ve just polluted your dataset. By now Figure has built more hands than bot versions (5-6 hands). One of the first hands they tried was a tendon driven hand, and without explaining why, Brett said that was a dead end. Their hands now have all actuators in the hand itself, and are clearly already robust. Brett said he just sat through a 100 page PowerPoint design review of the latest hand - that’s how complicated it is. Brett’s other AI company, Hark Labs, has developed a conversational voice model which is now installed in the Figure bots roaming the office. Being able to converse back and forth with a Figure bot is now a thing and will get better over time. All in all, I came away from this segment even more bullish on Figure.

Phil Trubey

33,861 views • 2 months ago

🎉 new skill unlocked: 20s uninterrupted, unstitched, single render from our new ai video engine: Nami. This is my birb (#7531) from the Moonbirds collection, idling in the library. patent: "Intra-Latent Semantic Injection via Cross-Spatial Encoding and Decoding during Multi-Pass Inference for Generative AI Video Creation" At Scrypted we've been quietly working on an agentic generative AI stack for two years: • integrating and testing w/ partners across the games & entertainment sectors • stealthily building a community of early believers through AVB • showcasing some of what we're doing with amazing projects like H011yw00d Agent. -- about Nami -- Nami is an agentic orchestration layer for AI video models: it unlocks their inner superpowers without making them rely on custom LoRAs or fine-tunings. Instead of throwing raw training power and tens of millions of dollars at training yet another ai video model: we figured out new ways to use what we have. Nami harnesses a multi-agent system to perform the work needed in taking a simple prompt or image and turning it into something bigger - much bigger. The agentic steps are allowed to manipulate latent space, digging into tensors, yet doing so in semantically aware chunks - meaning that Nami inherently supports video generation of arbitrary length, though it's bound to O(n) rendering time. (We do have some cool sharding tech that allows us to cut the generative time in half for a reference pose idle-animation like this demo). It's also fairly agnostic, picking and choosing the right tools for the job, and plays really well with emerging tech like FLUX Kontext, FramePack, or <- without being limited by any of them. -- use cases -- Even just a year or two ago the 20 second render below would cost a company, paying an agency, around $10k start-to-finish. This one cost me $6.25 on our dev hardware in an unoptimized environment. 
There's something mind-blowing about the state-of-the-art when we reduce costs to 0.0625% - less than 1% - of what we used to pay. It's also empowering. For creators. Game developers. Content influencers: you name it. -- superpowers -- 1. it does the things you ask for, in the order you asked for it 2. consistency is king 3. single-shot text or image-to-video 4. future videos can reference previous ones to seamlessly maintain style 5. semantic stitching: can't wait to showcase this -- gtm -- We think Generative AI Video, like image generation, like text, like games, should be a publicly accessible common good. We believe democratizing access to Nami in web3, via x402 payments proposed by Drew Coffman, or in World's mini-apps, is a bold step forward for digital freedom. Permissionless, decentralized, generative ai video. Naturally, we'll also soon release a web platform for using Nami in a traditionally SaaSy way: bring your own images, videos, or prompts and we'll take care of the rest. In the mid-term, Scrypted is building a stack of agentic skills (we call it AVB) and making them available to projects like H011yw00d Agent on Virtuals Protocol and other platforms. -- long-term vision -- Scrypted's mission is to decentralize the things that can't be decentralized. We participated in a16z crypto's CSX (London 2024) during our pre-seed specifically to research a new consensus protocol for hard things like AI video and AI agents: where there's no "one right answer". When Zero-Knowledge Proofs (ZKP) can't secure it, and Trusted Execution Environments (TEEs) are too small, we've got you covered with our upcoming Inori Network. -- how you can help -- 1. Are you a GPU farm? We're gonna need more flops. 2. Do you represent an L1 or L2? We want to build bridges. 3. Do you represent a Wallet or App creator? Let's get an endpoint exposed. 4. Are you an investor? Let's chat. 5. Like, repost, share! 
-- team background -- We come from a background of AI in the Video Game industry with each founder having over 20 years of experience at companies like Electronic Arts & Square Enix. -- contact -- DMs are open, reach out if you want to be an early tester for your site, game, collection, or project! -- try it out -- Go anywhere on X and tag H011yw00d Agent with a prompt and she'll give you a free 2 second render. Have fun making cinematic shorts or meme videos! -- thanks -- AWS Startups has been an incredible help scaling our prototypes. Also, shout out to all loyal beans 🫘 in the Autonomous Virtuals Beings (AVB) community. Nami has a very important role in the upcoming XP agent platform, can't wait to show you all. AVbeings
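
The cost comparison in the post ($6.25 for the render versus roughly $10k through an agency) can be checked with a few lines of arithmetic. This is only a sketch: the two dollar figures come from the post, while the function name `cost_share_percent` is made up for illustration.

```python
def cost_share_percent(new_cost: float, old_cost: float) -> float:
    """Return new_cost as a percentage of old_cost."""
    return new_cost / old_cost * 100


# Figures quoted in the post; the agency price is the author's rough estimate.
AGENCY_COST_USD = 10_000.00
NAMI_COST_USD = 6.25

share = cost_share_percent(NAMI_COST_USD, AGENCY_COST_USD)
reduction_factor = AGENCY_COST_USD / NAMI_COST_USD

print(f"Nami render costs {share:.4f}% of the agency price")
print(f"That is a {reduction_factor:.0f}x reduction")
```

With these inputs the share works out to 0.0625%, matching the figure in the post, and the reduction factor is 1600x.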

Tim Cotten

12,617 views • 1 year ago

China just made Silicon Valley's entire AI industry look like a scam. The US government spent 3 years trying to stop China from building competitive AI. But this backfired HORRIBLY. Here's what happened: Yesterday, a Chinese startup called DeepSeek released a new AI model called V4. It matches the performance of OpenAI and Anthropic's best models. At 1/7th the price. And for the first time ever, it was built on Chinese chips. NOT American ones. That last part is the one that terrifies the west. For context: Since 2022, the US has banned the export of advanced AI chips to China. The entire strategy was built on the assumption that if China can't access Nvidia's best hardware, they can't build frontier AI. But DeepSeek just proved that assumption wrong. Their V4 model was trained and runs on Huawei's Ascend chips. Huawei spent months working directly with DeepSeek to make sure V4 runs across their entire line of AI processors. Jensen Huang even predicted this on a recent podcast: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation." That day was yesterday. And the numbers are crazy: DeepSeek V4 costs $3.48 per million output tokens. OpenAI's latest model GPT-5.5 costs $30. Anthropic's Claude charges $25. Same ballpark performance. 7x cheaper. Uber's CTO just admitted they burned through their ENTIRE 2026 AI budget in 4 months using Anthropic's tools. If Uber had used DeepSeek instead, that same budget would have lasted 7 YEARS. 4 months vs 7 years. Same work getting done. But the pricing isn't even the big thing here. The real story is what DeepSeek did with their technical report: They published the benchmarks where they LOSE. Every AI company cherry-picks the tests where their model wins. DeepSeek ran the full comparison against GPT-5.4 and Google's Gemini, found they trail frontier models by 3 to 6 months, and printed it anyway. 
They literally don't care because the price gap makes the performance gap irrelevant for 90% of use cases. So the US export controls didn't slow China down. They ACCELERATED China's independence. Because Chinese developers were FORCED to train models with limited resources, they had to figure out how to make AI radically more efficient. That constraint became their competitive advantage. Every generation of DeepSeek has gotten dramatically cheaper to train. V4 continues the trend. Meanwhile US companies are going the OPPOSITE direction: OpenAI's GPT-5.5 Pro costs $180 per million output tokens. That's 51x more expensive than DeepSeek V4 for comparable work. The Commerce Secretary confirmed this week that ZERO Nvidia advanced chip shipments have actually gone through to China despite being approved in January. So China built frontier AI anyway. Without American chips. At a fraction of the cost. And the market response tells you everything: Chinese chipmaker SMIC surged 10%. Huahong Semiconductor jumped 15%. DeepSeek's Chinese AI competitors Zhipu AI and MiniMax dropped 9% because V4 is destroying them too. DeepSeek is making Silicon Valley's pricing model look like a scam. US tech companies spent $650 billion on AI infrastructure this year. DeepSeek just showed the world you can match their output for pennies. The export controls were supposed to be America's ace card. Instead they taught China how to win without American chips, at American prices nobody can compete with. Jensen Huang was right. This is a horrible outcome. But it's the outcome America built for itself.
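
The price multiples quoted in the post can be reproduced from its own per-million-token figures. The sketch below uses only the prices stated in the post; the helper names and the assumption that spend scales linearly with token price are illustrative, not taken from any vendor's pricing documentation.

```python
# Output-token prices (USD per million tokens) as quoted in the post.
PRICES = {
    "DeepSeek V4": 3.48,
    "Claude": 25.00,
    "GPT-5.5": 30.00,
    "GPT-5.5 Pro": 180.00,
}


def price_ratio(other: str, baseline: str = "DeepSeek V4") -> float:
    """How many times more expensive `other` is than `baseline`."""
    return PRICES[other] / PRICES[baseline]


def stretched_runway_months(months: float, other: str,
                            baseline: str = "DeepSeek V4") -> float:
    """Months the same budget would last at the baseline price.

    Assumes spend is strictly proportional to token price (a simplification).
    """
    return months * price_ratio(other, baseline)


for name in ("Claude", "GPT-5.5", "GPT-5.5 Pro"):
    print(f"{name}: {price_ratio(name):.1f}x the DeepSeek V4 price")

print(f"4-month budget at Claude prices: "
      f"{stretched_runway_months(4, 'Claude'):.0f} months at DeepSeek V4 prices")
```

Under this linear assumption the ratios come out near 7x, 9x and 51x, and a budget that lasts 4 months at the $25 price stretches to roughly 29 months (about 2.4 years), which is shorter than the 7 years stated in the post.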

Ricardo

280,185 views • 3 months ago

The $AEGIS DApp portal is now open to all: 🛡️ At Aegis, we believe in empowering the blockchain full of security, transparency and innovation. The Aegis Dapp has been under development for several months prior to the launch of $AEGIS and with that we have been able to build what we believe has the potential to change how users go about their day to day security. We are thrilled to share our progress and truly exciting news with you all. 🎯 First things first, at Aegis, we want to make it clear that the value of what we seek to bring to security across the blockchain, comes from our big vision, our strong team, and our commitment to long-term goals. ℹ️ Let’s kick this off with some information that is constantly happening, which is behind the scenes. Our full team is dedicated to the opportunity that lays ahead of us with becoming the leading voice/name for security, grasping every aspect with innovation, hard work, passion and commitment to see this sector grow. Everyone is aware of how important security is, a heartwarming mention to Messari for including us on how they see this sector growing rapidly and pushing a 10 Billion evaluation. We take that recognition with full responsibility and gratitude as we've been working hard on some really powerful stuff that could change the game for our industry. If you read the title and report itself, I’m sure that’ll give you some insight to what’s coming, and to the vast extent of what you can expect Aegis to be working towards. 
—> 🤝 This comes from teaming up with others within this sector and coming up with new tech to projects driven by our community, within the pipeline you can be confident that what we are building will push the cryptocurrency industry as a whole into a better future, the magnitude to what Aegis brings will not stop until we can confidently say, “Negative security reports across the blockchain are at an all time low, thousands of users are satisfied that Aegis is protecting them and their assets.” We're sticking to our vision no matter what the market does or whatever else comes our way. We plan to build what we set out to and we will see to it that our ecosystem is met. We've been working on some pretty amazing products that will be available within our Dapp, let’s go over what we offer: * AI Audits * Live Monitoring * Penetration Testing * Bug Bounties * Live Watchdog * Token analytics for everyday users, developers, teams, auditors, institutions, investors. ⬇️ Let’s break it down for you in some simple steps: AI AUDITS: We have trained our LLM models as AI AGENTS, these consist of 3 people ( AI AGENTS ) for the audits that are performed. - Audit - Reviewer - Judge Each one analyzes with a different personality, let’s check what personalities our AI AGENTS consist of: 3 different perspective auditors. 1 - Fine-tuned model x amount reads the code and generates the audit. ✅ 2 - Model x amount reviews the code and fact checks thoroughly. ✅ 3 - Model x amount ranks the code based on the severity outcome. ✅ ⌚️ Live Monitoring/Watchdog: The Live Monitoring/Watchdog system is designed to provide real-time surveillance of smart contracts, ensuring the detection and prevention of any potentially harmful transactions or malicious activities. Through the utilization of an AI Agent model, the system is trained to proactively identify and thwart suspicious behavior, thereby safeguarding the integrity of the smart contracts. 
Also, a paid sophisticated threat detection model is available for more intricate protocols and Dapps, offering an advanced level of protection against potential threats. This proactive approach is crucial in mitigating the risk of exploitation and ensuring the security of the smart contract ecosystem. 🖊️ Pen Testing: Our platform offers Pen Testing services to developers, providing a controlled environment for whitehat hackers to simulate attacks and identify vulnerabilities in smart contracts and protocols. In addition to human whitehat hackers, our AI Agents function as Red and Blue teams, actively engaging in simulated attacks to stress-test protocols and identify potential weaknesses. This comprehensive approach allows developers to proactively identify and address security issues, ultimately enhancing the robustness and resilience of their projects. 🕷️ Bug Bounties: Our Bug Bounty listing platform provides developers with the opportunity to list their protocols and offer bounties to white hat hackers for identifying vulnerabilities. By aggregating millions of bounties from various platforms and utilizing AI tools, we streamline the testing process, reducing up to 80% of the workload typically associated with security testing. This allows developers to efficiently identify and address potential vulnerabilities in their protocols, ultimately enhancing the overall security and resilience of their projects. 🪙 And lot more token analytics features for regular users, this will give you the opportunity to explore our Dapp for yourself and have some fun diving into the security platform of the future! I’m sure you’re excited to try it all out yourself, which is why we have some exciting news to bring to the #Guardians of the blockchain! But just before you continue the read and see the beans have been spilled, we have to take this opportunity to share with you that this large step to becoming a security leader is but only 20% of what we have revealed. 
This will be at the core of what Aegis stands for and hopes to achieve. The focus here is upon our Dapp, and in time we will slowly bring forward information/updates regarding segments of what makes Aegis a force to be reckoned with. Now that you’re fired up and excited to all of the announcements to come, let’s get to the news you’ve been waiting for! 🎉 We’re spilling the good news, and are happy to say we are now set for public release! The team at Aegis are overwhelmed with the development, support from teams, community, partners and more on what we believe to be an institutional-grade product. But the fun doesn’t stop there, this marks the start of what we aim to become, as it will take time and cycles to become better and better. Constant advancements will be set in place to attain the goal of achieving blockchain security. A statement from our CEO- Brian Hunt: “I can confirm from the security conferences I attended with Centralized security firms Peckshield, Hacken, Certik, BlockSec presentations, they are trying to achieve something similar and it will take them years. Decentralized AI for Security!” This initial drop of our dapp will be to get users signed up to gain access, in which we’ll whitelist users to get the ball rolling. 📣 To end this segment, let’s get the party started with the long awaited Aegis Ai Security Dapp and sign up now!
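
The three-role audit flow described in the post (one agent generates the audit, a second fact-checks it, a third ranks by severity) can be sketched as a small pipeline. Everything here is hypothetical: the `Finding` type, the function names, the severity scale and the string-matching stubs stand in for the models, which the post does not specify. It is not Aegis's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    title: str
    severity: int            # hypothetical scale: higher is worse
    confirmed: bool = False  # set by the reviewer stage


def run_audit(source: str,
              auditor: Callable[[str], List[Finding]],
              reviewer: Callable[[str, Finding], Finding],
              judge: Callable[[List[Finding]], List[Finding]]) -> List[Finding]:
    """Audit -> review -> judge, mirroring the three roles in the post."""
    findings = auditor(source)
    reviewed = [reviewer(source, f) for f in findings]
    confirmed = [f for f in reviewed if f.confirmed]
    return judge(confirmed)


# Stub agents: simple string checks in place of real models (assumption).
def stub_auditor(source: str) -> List[Finding]:
    findings = []
    if "tx.origin" in source:
        findings.append(Finding("tx.origin used for authorization", 3))
    if "selfdestruct" in source:
        findings.append(Finding("selfdestruct present", 2))
    findings.append(Finding("speculative issue", 1))
    return findings


def stub_reviewer(source: str, finding: Finding) -> Finding:
    # Fact-check step: drop anything the reviewer cannot substantiate.
    finding.confirmed = finding.title != "speculative issue"
    return finding


def stub_judge(findings: List[Finding]) -> List[Finding]:
    # Rank confirmed findings from most to least severe.
    return sorted(findings, key=lambda f: f.severity, reverse=True)


report = run_audit("require(tx.origin == owner); selfdestruct(owner);",
                   stub_auditor, stub_reviewer, stub_judge)
for f in report:
    print(f.severity, f.title)
```

Passing the three stages in as callables keeps the roles separate, so each stub could be swapped for a different model without touching the pipeline itself.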

AEGIS AI

128,006 views • 2 years ago

Don't Buy a Mac Mini for Clawdbot: The Secret $10,000 Architecture That Costs You Nothing

Clawdbot might be the reason you feel like you need a ten-thousand-dollar computer right now, but I am about to show you why that FOMO is going to leave you broke. If you have been watching everyone rush out to buy Mac Minis and Mac Studios just to run OpenClaw or some local models, you are witnessing a massive transfer of wealth from your pocket to Apple for no reason. There is a specific setup I use that costs almost nothing and keeps my main machine safe from whatever these autonomous agents are doing. If you stick with me, I will walk you through the exact architecture of a professional trading system that handles the heavy lifting without you needing to drop a single rack on hardware.

Most people are scared of running these bots on their main computer because they don't want an agent messing with their personal files or browser sessions. Instead of buying a second Mac Mini for six hundred dollars, you can just go to the top left of your screen and create a brand new user profile. This acts like a completely isolated sandbox where you can install all your trading tools and agents without them ever seeing your main data. It is essentially like getting a free computer for the price of five minutes of clicking around your settings.

But what if you aren't on a Mac, or you need to access your system while you are traveling without carrying three laptops in your backpack? This is where the first loop of professional automation starts to close, because I use something called Chrome Remote Desktop to bridge the gap. This allows me to leave a dedicated machine running in a safe place while I access the full desktop environment from a tablet or a cheap laptop anywhere in the world. It solves the mobility issue, but it still doesn't solve the problem of those massive ten-thousand-dollar price tags for high-end Mac Pros.

If you are a PC user, or just someone who doesn't want to own physical hardware yet, you should look into a Windows VPS through a provider like Contabo. Most developers will tell you to use a Linux terminal, but if you aren't a coder yet you need a visual interface you can actually see. Getting a Windows server allows you to log in and see a desktop just like your home computer for about fifteen dollars a month. I usually recommend at least twelve gigabytes of RAM to keep things from getting janky when you are running multiple browser windows and agents at once.

Now you might be thinking that the whole point of the big hardware was to run local models like Kimi or GLM to save on API costs. I spent years thinking I had to own the machines myself, and I even spent hundreds of thousands on developers before I realized I could just do this myself. The secret to running those massive open source models without the ten-thousand-dollar investment is renting GPU power by the hour. Sites like Lambda Labs let you spin up a monster machine that can run any model in existence for just a couple dollars an hour.

This is the ultimate pivot because it allows you to test if your strategy actually prints money before you commit to the hardware. You can turn the server on when you are iterating and turn it off the second you are done, which keeps your overhead near zero. If you haven't proven that your bot can pay for itself yet, then buying a Mac Studio is just an expensive hobby rather than a business move. There is a much bigger loophole involving the Anthropic subscriptions that most people are completely overlooking right now.

Right now I am using a specific plan with Claude Code that costs about two hundred dollars a month, but it lets me run OpenClaw all day without hitting API limits. If I were paying for those same tokens through the standard API, I would probably be spending hundreds of dollars every single day. It is a massive cost savings that allows you to iterate and fail until you find a winning strategy without draining your bank account. Even if they eventually close this loophole or snitch on the usage patterns, it serves as the perfect training ground for a data dog.

The goal is to find a system that works with a smaller or cheaper model like Haiku before you ever try to scale up to the heavyweights. If you can make a strategy profitable using a less intelligent and cheaper model, then you know you have found real alpha. Once you have that foundation, you can decide if it finally makes sense to build your own custom PC rig, which will always be half the price of an Apple machine. I am an Apple guy so I usually pay the tax anyway, but I only do it once the system is already generating enough to cover the cost ten times over.

I believe that code is the great equalizer, because it took me from losing money and getting liquidated to having fully automated systems doing the work for me. I had to learn to live with the iterations and the failures on YouTube to get to this point of clarity. The universe tends to get out of your way once you make a non-negotiable contract with yourself to see the process through to the end. You don't need the flashy hardware or the most expensive setup to start winning in this game.

Stay focused on the logic and the data rather than the hype and the FOMO that everyone else is falling for. If you can master the bridge between renting power and owning your logic, you will be ahead of ninety-nine percent of the people in this space. The path to a fully automated life isn't paved with expensive gadgets but with the discipline to iterate until the system finally prints.
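
The rent-versus-buy argument in the transcript comes down to a break-even calculation. The sketch below uses the figures mentioned in the video ($10,000 hardware, a $600 Mac Mini, a $15/month VPS); the $2 hourly GPU rate is an assumed reading of "a couple dollars an hour", and the function name is illustrative.

```python
def break_even_units(purchase_cost: float, rental_cost_per_unit: float) -> float:
    """Units of rental (hours, months, ...) that add up to the purchase price."""
    return purchase_cost / rental_cost_per_unit


# Figures from the transcript, except the hourly rate, which is an assumption.
HIGH_END_HARDWARE_USD = 10_000.0
GPU_RENTAL_USD_PER_HOUR = 2.0      # "a couple dollars an hour" read as $2
MAC_MINI_USD = 600.0
WINDOWS_VPS_USD_PER_MONTH = 15.0

gpu_hours = break_even_units(HIGH_END_HARDWARE_USD, GPU_RENTAL_USD_PER_HOUR)
vps_months = break_even_units(MAC_MINI_USD, WINDOWS_VPS_USD_PER_MONTH)

print(f"Rented GPU time matches the hardware price after {gpu_hours:.0f} hours")
print(f"The VPS matches the Mac Mini price after {vps_months:.0f} months")
```

Under these assumptions, renting stays cheaper until about 5,000 GPU-hours or 40 months of VPS time, which is the window the speaker suggests using to prove a strategy before buying hardware.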

Moon Dev

17,382 views • 5 months ago

Unstructured Thoughts about OpenAI o3, the nature of AGI, and Post-Labor Economics

AGI just crossed a threshold: here’s why that matters and what we can do with it.

I’ve been hammering on OpenAI’s new o3 model for a few days, long enough to watch the hype settle into something more interesting: utility. Benchmarks suggest a polite incremental bump; lived experience says we’ve entered a qualitatively different regime. o3 is the first model that feels faster than my ability to absorb its output. My brain, not the AI, has become the bottleneck.

A new ceiling for human cognition?

Most discussions of “alien intelligence” forget that we share the same sandbox: mathematics, physics, code, natural language. What shifts is cognitive horizon: the totality you can mentally represent and manipulate. o3 expands that horizon in real time. In an afternoon it consolidated two years of my work on post-labor economics, stress-tested the logic, surfaced data sources, and offered to autogenerate the Python notebooks. The cost of insight has collapsed from years to hours.

If you merely outsource thought, you’ll stagnate. If you treat the model as a sparring partner (interrogating, refining, iterating), you’ll compound your own intelligence. Exponential leverage is now a choice, not a privilege.

What o3 got right about my health project?

I dumped the entire history of my chronic-fatigue recovery protocol, including the five-axis “burnout pentagram”, into memory and asked the model where I’d gone astray. It corrected a handful of minor assumptions and, more importantly, recalibrated my timeline: six-to-eight months of recovery left instead of eighteen. That’s not “replace your doctor” advice; it’s proof that large-context reasoning is finally clinically useful.

Post-Labor Economics: the sketch that o3 and I built in one sitting

1. Metric 1 – Economic Agency Index (EAI): Income decomposed into wages, property, and transfers. The higher the property share, the more “post-labor” you already are.
2. Metric 2 – Collective Purchasing Power (CPP): How much capital a county can mobilize without taxation or new debt. Rising CPP means you are compounding local prosperity.

Interventions happen at the county level (subsidiarity): solar co-ops in Arizona, riverfront greenways in the Midwest, data-center dividends in fiber-rich exurbs. Ownership is local, revenue is distributed, migration equilibrates naturally, and environmental stewardship becomes self-interest rather than moral theater. UBI morphs from last-ditch transfer to one of several levers for raising EAI.

The bigger picture: AGI isn’t an oracle descending from the sky; it’s a time-compression engine. Every minute you spend learning how to learn with it buys you an hour you would have burned doing rote synthesis. The frontier question is no longer “Will the machines replace us?” but “How fast can we upgrade ourselves in partnership with them?”

What’s next? I’m cleaning the data, building the national EAI/CPP dashboard, and pressure-testing the whole framework. I’ll publish the notebooks (or let o3 do it) once the numbers are solid.

Meanwhile, I want to hear from you:

- Where does o3 add the most leverage in your world?
- Which of the post-labor metrics feels wrong, or dangerously right?
- What failure mode should falsify this thesis?

Drop your critique, your data source, or your wild counter-proposal in the comments. Let’s map the edge of this new cognitive horizon together.

- Dave
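The post describes the EAI only as income decomposed into wages, property, and transfers, with a higher property share meaning "more post-labor"; it gives no formula. One possible reading, offered purely as an assumption, is the property share of total income. The class and function names below are hypothetical and not taken from the post.

```python
from dataclasses import dataclass


@dataclass
class Income:
    """Annual income split into the three components named in the post."""
    wages: float
    property_income: float
    transfers: float


def property_share(income: Income) -> float:
    """Assumed EAI proxy: fraction of total income that comes from property."""
    total = income.wages + income.property_income + income.transfers
    if total <= 0:
        raise ValueError("total income must be positive")
    return income.property_income / total


# Illustrative numbers only, not data from the post.
household = Income(wages=60_000, property_income=30_000, transfers=10_000)
print(f"property share: {property_share(household):.2f}")
```

A dashboard like the one the post mentions would presumably aggregate such a quantity per county, but how the author actually weights or normalizes the components is not stated.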

David Shapiro (L/0)

45,581 views • 1 year ago

America spent $285 billion to LOSE the AI war. Stanford dropped a 423 page report yesterday and revealed the most damning stat on page 200: The number of AI researchers moving to the United States has collapsed 89% since 2017. 80% of that collapse happened in the LAST 12 MONTHS. Let that sink in. The country that invented the transformer. The country that built OpenAI, Anthropic, Google DeepMind, and xAI. The country pouring $285.9 billion of private capital into AI in a single year (23x more than China). Can no longer attract the people who actually build the technology. And here's the part that should concern every founder, operator, and investor reading this: The Trump administration just made it official. The H-1B visa now costs employers $100,000 PER HIRE. So OpenAI wants to hire a Chinese postdoc from Tsinghua? $100K before they write a line of code. Anthropic wants a French ML engineer? $100K. Google wants the Indian PhD who literally co-authored the paper their entire model is based on? $100K. And these are the LUCKY ones who even get a visa. The result was instant. 89% drop over 8 years. 80% of it in the last year alone. The talent pipeline got destroyed. Now look at the other side of the chart: China's top model is now 2.7 percentage points behind Anthropic's best. Down from a 20+ point gap two years ago. China leads the world in AI publications. China leads in AI patents. China leads in industrial robot installations. US and Chinese models have traded the #1 spot multiple times since early 2025. Switzerland and Singapore now have more AI researchers per capita than the US. The US ranks 24TH globally in actual AI adoption. Behind the UAE. Behind Singapore. Behind countries most Americans couldn't find on a map. And here's the truly insane part: 50% of the world's top AI researchers are Chinese. Jensen Huang said this on a podcast 3 weeks ago. For 20 years, the US strategy was simple: Let them study at Stanford and MIT, then keep them. Pay them $800K. 
Give them green cards. Build the future on imported brains. That deal is dead. We just told the smartest people in the world: "Pay $100,000 for the privilege of working here, or go home." And guess what they're doing. They're going to Zurich, where Anthropic and OpenAI are quietly opening offices because they can't get the talent into San Francisco anymore. The strategy is the same as building a Ferrari factory and then banning mechanics from entering the building. You can pour hundreds of billions into data centers. You can buy 4 million Nvidia chips. You can sign $300 billion cloud contracts with Oracle. You can build nuclear reactors to power your GPUs. None of it matters if the people who write the algorithms aren't allowed in the country. Wall Street thinks AI is a capex race. But in reality, it's a TALENT race. Every dollar Microsoft and Meta and Google are spending assumes the same army of researchers will keep showing up to use it. That assumption just broke. And the smart money already knows: Why is Anthropic opening a Zurich office? Why is DeepMind expanding in London instead of Mountain View? Why is OpenAI hiring in Dublin and Singapore? Because the math no longer works in America. The government turned the world's biggest brain magnet into the world's most expensive border wall. 3 years from now, when China launches a frontier model that outperforms anything in the US and the headlines scream "How did we lose the lead?" - remember this post. The lead wasn't lost in a lab. It wasn't lost on a benchmark. It wasn't lost to a smarter algorithm. It was lost at customs.

Ricardo

230,997 views • 3 months ago

I think I can finally report some success training a quite accurate IDM (inverse dynamics model) capable of recovering keystrokes from Minecraft gameplay, even in quite PvP-heavy situations. At this point the model does not only know what keys are pressed to the extent reasonably discernible, it also knows how fast it is moving in 3D space at all times, even when knockback is mixing with the self-move impulse.

Now, recovering keystrokes from normal external capture footage is just about impossible. E.g. W/A/S/D does exactly nothing during partial tick frames and jumping mid-air is also equally useless, so asking the model to recover key down states is inherently unreasonable. Mouse deltas are also completely arbitrary units, as game mouse sensitivity introduces an arbitrary scale factor into the equation. The only good option is to think carefully about your model-environment contract, and only record "logical actions", not raw keystrokes.

So here's a few unfortunate lessons I had to learn in roughly this order.

- Choose good units. (bad: mouse deltas, good: delta radians [yes, you will need game-internal state])
- Capture from inside the main game loop and read the game FBO (framebuffer object) to get consistent frame-action pairing. Doing post-mortem pairing is hopeless.
- Carefully define when you think keystrokes actually have an effect. (jump only works on ground, when flying or in water etc.) More subtle: The key may already be down, but no tick has happened yet to actually use the value. Hence: ignore.
- Separate gamestate into "fast and slow-moving" components. E.g. movement is likely tick based, camera rotation is very likely updated every frame in essentially every game ever.
- Think about your frame-action correspondence contract. (How old is the frame in relation to the inputs you capture? Will double or triple buffering affect you?) Think about the game loop timeline, where you are sampling, how old the data you are reading is, and where the ticks are happening around you.
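The first and third lessons above (record delta radians instead of mouse deltas, and only count a key when it can actually have an effect) can be sketched roughly as follows. This is not the author's capture code: the `FrameState` fields and the `logical_action` helper are hypothetical names, and the sketch assumes the game exposes yaw and pitch in degrees plus a flag for whether a tick ran since the previous captured frame.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameState:
    """Game-internal state sampled at one captured frame (hypothetical layout)."""
    yaw_deg: float
    pitch_deg: float
    jump_key_down: bool
    on_ground: bool
    in_fluid: bool
    flying: bool
    ticked: bool  # True if a game tick ran since the previous captured frame


def wrap_radians(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def logical_action(prev: FrameState, cur: FrameState) -> dict:
    """Convert raw state into sensitivity-independent, effect-gated targets."""
    # Camera rotation in radians, independent of mouse sensitivity settings.
    d_yaw = wrap_radians(math.radians(cur.yaw_deg - prev.yaw_deg))
    d_pitch = math.radians(cur.pitch_deg - prev.pitch_deg)
    # A held jump key only counts if a tick consumed it and jumping is possible.
    can_jump = cur.on_ground or cur.in_fluid or cur.flying
    jump = cur.ticked and cur.jump_key_down and can_jump
    return {"d_yaw": d_yaw, "d_pitch": d_pitch, "jump": jump}


prev = FrameState(350.0, 0.0, False, True, False, False, True)
cur = FrameState(10.0, 0.0, True, False, False, False, True)
print(logical_action(prev, cur))  # jump is False: key is down, but player is airborne
```

Deriving the rotation from game-internal angles sidesteps the sensitivity scale factor entirely, and the wrap keeps a turn across the 0/360 boundary from showing up as a near-full rotation.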
Language models used to simply not have a model-environment contract, but even now with the model "living" in a designated harness, the contract still boils down to formatting and tool implementation intrinsics. While also important, it is still quite a bit more obvious because the violations are in some way, shape or form reflected as text you can actually see.

- ffmpeg dropping frames cumulatively screws the model the further you get into the sequence because your targets are now shifted. If you can't encode the video in real-time, too bad.
- Sodium has a frames in flight system different from vanilla Minecraft, which will also offset your targets from your frames. (there goes that data...)
- Models are susceptible to latency. If there is too big of a delay between action and on-screen reflection, your performance degrades.

At this point I realize ~100 hours of gameplay is essentially no longer usable as a dataset. You can train on this data, but all you'll get is a mushy mess.

However, some good news:

- Making the model predict physics gamestate scalars helps the model generalize. For instantaneous events like jump, it's unreasonable to ask the model to emit a short burst of jump=true at exactly the right time; however, if you also predict your current y-velocity, the model has a supervision signal for the "latent" from which that on-ground jump becomes apparent. Recovering x/z motion is also somewhat easier than unmixing it into plausible keystrokes for inertia-heavy player controller logic.
- Regressing physics gamestate scalars also seems to make your dataset "bigger". While pure keystroke classification will overfit quickly, predicting exact physics gamestate scalars forces the model to generalize more and you can tolerate far more epochs before validation loss starts to stall out. This is the only reason why it was bearable to dump 100h+ of dataset hours and replace it with ~3 hours of gameplay after the 4th revision of the file format (yeah...) and somehow still have better performance.

Now, you might be asking, "isn't this brittle?" and the answer is yesn't. Frame-action correspondence matters for training, but not so much during inference. So as long as you are sampling in roughly the same interval as your training data, you aren't violating any hard contract per se. Somewhere around the frames ticks are happening, and during training you capture various tick-capture offset relations per random chance, so nothing is too obviously wrong here.

HOWEVER, you will get screwed by gui scale, shaders, resource packs, "shit that recording is 1920x1040 because somebody doesn't know fullscreen exists" and other unfortunate edge cases of reality. But I suppose this is the role of dataset size. If all those "contract violations" that a YouTube video has compared to the training data are addressed, I think this is a way to turn YouTube into a labeled dataset. I could never shake the feeling that VPT (Video PreTraining) is a sound idea in practice, while never having been properly executed, and I think one reason why it hasn't is because that label bootstrapping part is just a pain in the butt to get right. Now, what the player is doing is of course not the only label you can extract from video, but it has to be one of the targets predicted during pretraining to "align" the pretraining objective.

Some notes on the video here: the colored dots on the analog visualizer are the ground truth, while the gray dot is the model prediction. Green means correct prediction, red means incorrect prediction at that frame. Model P(key) reports how wrong the prediction is from green (0.0) to red (1.0). You will also notice that during periods of rapid slow down, left and right actions become close to irrecoverable, because there is just that little motion. And some jump actions are not predicted correctly because I got the detection condition for jump events wrong... (duh)

LMB/RMB for other than sustained events (like item-consume and block break) also seem to be hopelessly irrecoverable for now. Swing was supposed to do the same thing as motion y did for jump, but it's too well behaved as an increasing counter. Maybe partial-tick interpolated values work better (v5 file format then... ugh..)
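The idea of supervising physics gamestate scalars alongside keystrokes can be illustrated with a combined loss. This is a minimal sketch under stated assumptions, not the author's training code: it assumes per-key probabilities and a predicted velocity vector for a single frame, uses binary cross-entropy for keys and mean squared error for velocity, and the `physics_weight` parameter is a made-up knob.

```python
import math


def binary_cross_entropy(prob: float, target: float, eps: float = 1e-7) -> float:
    """Cross-entropy for one key; prob is clamped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def idm_loss(key_probs, key_targets, velocity_pred, velocity_true, physics_weight=1.0):
    """Keystroke classification loss plus an auxiliary velocity regression loss."""
    key_loss = sum(
        binary_cross_entropy(p, t) for p, t in zip(key_probs, key_targets)
    ) / len(key_probs)
    physics_loss = sum(
        (p - t) ** 2 for p, t in zip(velocity_pred, velocity_true)
    ) / len(velocity_pred)
    return key_loss + physics_weight * physics_loss


# Same key predictions, but the second call also gets the y-velocity wrong.
good = idm_loss([0.9, 0.1], [1.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.3, 0.0])
bad = idm_loss([0.9, 0.1], [1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.3, 0.0])
print(good < bad)
```

The regression term gives the model a dense target on every frame, including frames where the key state is ambiguous, which is one plausible reason the post reports slower overfitting with these targets.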

mike64_t

18,762 views • 3 months ago