Another piece from the Sasha Rush conversation, this time on rewards for coding RL. He said Cursor uses a mix. Some rewards look at the tool calls themselves, some only at the final output. Everything end-to-end, no process rewards guessing what happens in the middle. I agree with him...

14,826 views • 1 month ago • via X (Twitter)
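
To make the mix concrete, here is a minimal Python sketch of an episode-level reward that blends a tool-call signal with a final-output signal. It is not Cursor's implementation: the `ToolCall` and `Episode` types, the pass-rate scoring, and the 0.25/0.75 weights are all hypothetical, chosen only for illustration. The one property it tries to respect from the post is that both terms are computed once over the finished episode, with no per-step process reward.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCall:
    name: str        # hypothetical tool name, e.g. "run_tests"
    succeeded: bool  # whether the call returned without error


@dataclass
class Episode:
    tool_calls: List[ToolCall] = field(default_factory=list)
    tests_passed: int = 0  # hidden tests passed by the final code
    tests_total: int = 0   # hidden tests run against the final code


def tool_call_reward(ep: Episode) -> float:
    """Fraction of tool calls that succeeded, scored over the whole episode."""
    if not ep.tool_calls:
        return 0.0
    return sum(c.succeeded for c in ep.tool_calls) / len(ep.tool_calls)


def final_output_reward(ep: Episode) -> float:
    """Test pass rate of the final output only."""
    if ep.tests_total == 0:
        return 0.0
    return ep.tests_passed / ep.tests_total


def episode_reward(ep: Episode, w_tools: float = 0.25, w_output: float = 0.75) -> float:
    """One scalar per episode; nothing is scored step by step."""
    return w_tools * tool_call_reward(ep) + w_output * final_output_reward(ep)


example = Episode(
    tool_calls=[ToolCall("read_file", True), ToolCall("run_tests", False)],
    tests_passed=3,
    tests_total=4,
)
reward = episode_reward(example)
```

The weights are the obvious knob here: setting `w_tools` to zero gives a pure final-output reward, and raising it pays more for clean tool use.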

Related Videos

I have a friend who doesn't read anything published in the past 50 years, and the more I think about it, the more I think he's onto something. The reason is that time is the best filter we have for quality. People are bad at judging quality in the moment but very good at getting rid of junk over time. "History is not very good at capturing all that is great in art. It is not good at that. There are many great symphonies that have been lost permanently, there are many great painters that died unknown and their paintings are gone, there's novels that have been written that no one will ever read. So history is not good at capturing all this great art. But history is very good at discarding all that is mediocre. And the amount of time that that takes, it's something like 50 years. So over the course of 50 years, what will happen is a lot of stuff that was prominent will be re-filtered and re-filtered and re-filtered, and you'll end up with a smaller group of things which have survived that test of time. So if you think about it right now, if you go back and look at the bestseller lists for 1974, 1973, there's a lot of that that would have been highly regarded at the time, which people do not read anymore for a variety of reasons, and there's some that has survived, and that's a very telling distinction. So in a world where I'm turning 60 this year, you have a limited amount of time, all four of us have active lives, we want to make sure that if we're going to sit down, we're going to read carefully, we're going to meet and we're going to discuss it in detail, we want to make sure that the work is rewarding. And the best way to ensure that is by drawing from the past." Amor Towles

David Perell

86,723 views • 1 year ago

You know the thing with random rewards? The thing is that you never get what you want, right? And everyone is hoping for Manga Kenji Skin now... but is it really everyone? What does your heart most desire? If you look back at your life journey, are you where you wanted to be? And if not, why not? What changes would you have made in your life to be happy with yourself? Because this is what really matters. Your reaction to this reward is only a reflection of how deeply you feel inside, isn't it? Is the reward good? Is the reward bad? Isn't that only a matter of perspective? ANYWAY, enough talking, let's finally announce the reward you will receive today, after you watch this video, and after you read this text... But, what is the point of having the reward written here in this text, if it's also announced in the video AND announced in the game... isn't all of this pointless? What does it matter what door you open in the end, right? Some of them have hints, some of them don't, it's pretty much random. OR IS IT? The Mega Box door was pretty obvious, wasn't it? I guess some of them DO have hints, and some of them don't. We did say that in the video actually... the trick is to find out which ones are misleading and which ones are not. To be honest, the plain white door wasn't designed with Manga Kenji in mind... you were the ones who came up with your own theory and when it didn't land, you thought we fooled you, but in reality, you fooled yourself. But one thing I can say, we've seen a lot of crazy theories out there, and some of them are actually right, surprisingly. Maybe an accident, maybe whoever came up with that theory is a future version of me. It can happen, I read it in a book once. Or maybe not. ANYWAY, hope you have enjoyed today's reward, and let's bring the community together to vote for the best door today! #ScaryDoors

Brawl Stars

314,899 views • 7 months ago

THE MOST IMPORTANT Q&A OF MEDIA DAY. Mariana: You came from a very solid weekend on top of everything, but at the same time, it seems that you don't feel that the team is listening to you. Am I right? And how do you balance that? Lewis: I feel like we're going in the right direction. Rome wasn't built in one day, so it takes time to build. For me, coming into the team, I wanted to be respectful of the way they've done things in the past and just to really observe and see where our strengths and where our weaknesses are and to highlight where our weaknesses are and areas that we need to work on. But I do feel that they've been responding. I think you're starting to see, hopefully, some of the impact of the work that we're doing in the background and also into next year's car. This is a car that I've had nothing to do with in terms of developing this car over the years. Hopefully, from next year, my input goes into that car, and that will be a car that I've hopefully been a part of or will have been a part of developing. But I think we've got a really great rapport. I think we're really progressing, particularly since the summer break. I think things have started to get better, and it's all just about building trust and communication. Also, I'm coming into a team that English is not the first language, and I don't speak Italian, so it's finding a common ground. And the fact is we all want to win. We're all here to achieve the same thing, and we've got to just keep pushing. So that's why I'm trying to keep everyone motivated on difficult weekends, trying to keep everyone lifted up. But there have been many, many things we've changed this year that I suggested that they hadn't done in the past, and so they have been listening. It doesn't change straight away, just like that. It takes time to build. And as engineers, they really need proof. They need numbers. That's what they work on. 
So you have to sometimes push to get certain changes to be made, and then when you change it and then it works, you're like, okay. Mariana: That's what I was talking about. Lewis: Yeah! - F1 2025 Mexico -

sim

170,303 views • 7 months ago

This guy was doing SO, SO WELL until he OVER-GAMED it when she asked what he does — and then it just went off the rails! Up until that point, he just about nails every bit of the approach: 1️⃣ Opens confident and keeps going despite her taking a second to register him 2️⃣ Uses a unique, fashion-based, unambiguous compliment on her “classy look” 3️⃣ Expands on the topic of her going home, talking about “having a herbal tea, reading a book” (she false disqualifies here saying she is “not that cool”) 4️⃣ Doesn’t get hung up on the boring topic of “were you in there” but switches to a compliment on her accent, which leads to a conversation about accents 5️⃣ She then asks “Really? Where are you from?” (sign of interest) with body language, expression, and voice tone that denote EXTRA interest. This girl’s getting into it! 6️⃣ He teases her about being from “the farms.” She doesn’t really respond well to this but gets right back into it immediately when he asks what she does — which she then excitedly tells him all about. — INTERMISSION — What do we know so far? We know this girl does NOT respond well to arousal (hence the poor reaction to the tease). However, she DOES respond to similarity-building (hence the positive reactions to positive comments on her fashion and accent, asking about what she does at home, discussions of where she’s from, and talking about what she does). If you’re familiar with the SAC attraction model, you know this girl is a SIMILARITY SEEKER. That means the most important thing with her (at least for now) is to establish a sense of connection. — END INTERMISSION — 7️⃣ THEN, THE BIG MISTAKE: she then asks what he does, seeking connection, but he tells her to guess. First off: this girl’s been behaving very compliant and good. She does not need silly gambits at this point. “Guess what I do” at this point is punishing good behavior. Second off: she’s a similarity seeker. Not an arousal seeker. Arousal seekers get titillated by guessing games. 
Similarity seekers need CONNECTION. “Guess” is anti-connection. 8️⃣ The girl then tries guessing, but is clearly not into it. Rather than cut the topic that is not connecting, the guy tries to keep the guessing game going by running another arousal gambit about how she’ll “find out on the second date.” Wrong move after a wrong move. At this point, the girl is reacting less well and giving resistance. Her body language shows she’s pulling away, as her eye contact breaks off and she looks around. The guy can feel he’s lost control. He gets into a negative topic about how the clubs here aren’t as good as the ones in Manchester, and the girl tries to help him out by attempting to connect with him on the topic and agree with what he says — but he continues to go on and on about how “Manchester is better than here.” PRO TIP: do not complain to women. It doesn’t matter if the complaint is “interesting”, like how this other place is so much better than the current place. Negative topics push people away. The girl then brings it back to what he does, giving him another chance to connect with her, using some of the humor he obviously likes on him, but he continues to try to keep the guessing game going and she continues to lose interest. — THE BIG TAKEAWAY — You need to calibrate to women. Using SAC (similarity-arousal-compliance) is the easy way to do this. You need to FIGURE OUT whether they need more similarity, more arousal, or to comply more — then whichever one it is, you need to put your focus there. If you run a bunch of arousal stuff on a girl who’s looking for similarity, it’s not going to end well. Use the right strategy on the right girl, and get the girl!

Girls Chase 🏃‍♀️💨

118,938 views • 1 year ago

US Vice President J.D. Vance made a sharp statement warning Russia that it will be "very bad" for them if they do not actively engage in "good-faith" negotiations on Ukraine. According to Vance, Trump believes the conflict harms everyone – Russia, Ukraine, and America – and is dissatisfied that Russia allegedly is not sufficiently committed to a peaceful resolution. "And here is what I will say about the president: he is keenly responsive to the reality on the ground. And the reality is this: first, we have conducted incredibly good-faith negotiations with both the Russians and the Ukrainians. And I believe the president is becoming increasingly impatient with the Russians because he feels they are not offering enough to end the war. That is first. Second, the president is increasingly convinced that this war is bad for Russia. You constantly hear this from me. You constantly hear this from the president. The war is bad for Russia. It is bad for Ukraine. It is bad for America. We want the killings to stop. That remains the president's position. But the president can also look at the reality on the ground. He sees economic indicators coming from Eastern Europe. He sees the number of casualties, both Russian and Ukrainian, in this war. And the president tells Vladimir Putin that it is time to stop the killings. He would say this to Zelensky as well, and essentially has said this to Zelensky. He wants this war to end, and he is doing everything possible to stop it. But listen, if the Russians refuse to negotiate in good faith, I think it will be very, very bad for their country. That is what the president has made clear. This is not a change of position. This is an acknowledgment of the reality on the ground. We will continue to fight for peace every day in the Trump administration." They are losing and are desperate to pressure Russia to agree to Minsk 3. 
What they don’t say is - Russia gave their terms in summer of 2024 and their position has never changed; as such, accept and fulfill Russia’s conditions and the war will end.

Dagny Taggart

55,272 views • 8 months ago

BREAKING.🚨 Judge Merchan has instructed the jury they do not need to have a *UNANIMOUS* verdict in order to convict former President Donald J. Trump. "One thing in particular that the judge said the jurors could do. He delivered what is being called really the pinnacle of all of this. There is no need to agree on what has occurred. They can disagree on what the crime was among the three choices." "This means they could split 4-4-4 and the judge would still treat them unanimously. What does that mean?" "Outrageous. In a normal criminal case every statutory crime has what we call elements of the offense. Like in a bank robbery case you have to rob – it has to be a financial institution, you have to show intent," said former prosecutor Andrew McCarthy. "Those are the things the jury has to agree on unanimously that they were proved beyond a reasonable doubt. Here what we’re doing is taking the element that actually makes this a felony, because remember falsification of records is normally a misdemeanor in New York. What makes it a felony is that you are concealing or committing another crime." "And here the judge is telling them they don’t have to agree about what the other crime is under circumstances where that not only is what makes this a felony, makes it a four-year potential prison penalty rather than a year or less, but it is also what gets us into the courtroom." "If this had been a misdemeanor, the time to bring this case would have lapsed in 2019. The only reason they are still able to bring this case is because it’s a felony allegedly and yet now the judge is saying you know, you don’t have to agree on what the felony is." The jury has now gone to deliberations.

Kyle Becker

5,834,876 views • 2 years ago

The most interesting part for me is where Andrej Karpathy describes why LLMs aren't able to learn like humans. As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.” A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer. > “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.” So what do humans do instead? > “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.” > “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.” Why can’t we just add this training to LLMs today? > “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.” > “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.” > “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.” How do humans get around model collapse? 
> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.” In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by Erik Hoel. I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization? > “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”

Dwarkesh Patel

1,050,100 views • 8 months ago
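
The reward broadcast described in the Karpathy post above can be sketched in a few lines of Python. This is an illustrative simplification of outcome-only credit assignment, not any specific training setup; the function name, the `baseline` parameter, and the toy step labels are assumptions made for the example.

```python
from typing import List, Tuple


def token_level_credit(
    tokens: List[str], final_reward: float, baseline: float = 0.0
) -> List[Tuple[str, float]]:
    """Give every token in the trajectory the same credit.

    With only a single end-of-trajectory reward, there is no signal that
    separates helpful steps from wrong or irrelevant ones, so each token
    receives (final_reward - baseline).
    """
    advantage = final_reward - baseline
    return [(tok, advantage) for tok in tokens]


# Toy trajectory: a wrong turn and a backtrack before the correct answer.
trajectory = ["try_wrong_idea", "backtrack", "correct_step", "answer"]
credit = token_level_credit(trajectory, final_reward=1.0)
```

In this toy case `try_wrong_idea` is upweighted exactly as much as `correct_step`, which is the "sucking supervision bits through a straw" problem: one scalar has to supervise the whole sequence.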

Dr. Andrew Wakefield: "I think (COVID vaccine) it'll turn out to be the worst mistake they ever made. I really do, because when I got involved, like I said, there were five people around the world, a handful of people worldwide prepared to talk about the thorny subject of vaccine safety. Now it's more than half the adult population of the world." "Around every dinner table, the people are talking about this issue. They've made a huge mistake. And on the issues that were raised, the EUA was only possible for this vaccine, the emergency use authorization, if there was no other treatment available. So they had to suppress the use of ivermectin in favor of a vaccine-only narrative. And this was being driven by people like Tony Fauci right up to the hilt. And that was part of the reason. The other reason, the other thing that emerged was this in the context of the MMR vaccine and autism is that I went to the CDC back in 2000. And they said, look, Dr. Wakefield, everybody gets the MMR, only a few children have autism." "So how do you equate it? Well, that's how medicine works. A lot of people smoke, only a few develop lung cancer. But the point is, it's pattern of exposure. I said, look, we believe, for example, that age of exposure is important. And we know this because with the natural infection, the younger you get measles, the greater the risk of a more severe reaction. If you get it under one, for example, does the same pertain to the vaccine?" "If you get the vaccine younger, are you at greater risk? And they went away to their credit and they tested that hypothesis. And they found that it was exactly true. And they spent the next 14 years destroying the documents, putting them in a dumpster and covering up and publishing a paper that exonerated the vaccine until one of the whistleblowers, William Thompson, head scientist in that study came forward and said, I can't live with this any longer." "I kept the documents. Here they are. It showed fraud. 
Most appalling violation of medical and scientific ethics on behalf of the CDC. So people say now, oh, there's no evidence vaccines cause autism, absolutely there is." "And when you have to commit to that level of fraud, any level of fraud, but that level of fraud, not only in the context of this vaccine, but Pfizer in the context of the COVID vaccine, everybody else, you're on a losing run right from the start." "You are gonna lose because you can only sustain that lie for so long, and then someone somewhere is going to come forward, some brave person and say, actually, no, it was completely the opposite."

Camus

226,631 views • 1 year ago