Another piece from the Sasha Rush conversation, this time on rewards for coding RL. He said Cursor uses a mix: some rewards look at the tool calls themselves, others only at the final output. Everything is scored end-to-end, with no process rewards guessing at what happens in the middle. I agree with him...
14,826 views • 1 month ago • via X (Twitter)
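To make the reward mix in the post concrete, here is a minimal Python sketch. It is not Cursor's actual reward code: the `ToolCall` and `Trajectory` types, the tool-call success rate, the test pass rate, and the weights are all hypothetical, chosen only to illustrate one signal that reads the tool calls and one that reads the final output, both computed once over the finished episode with no per-step process reward.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str        # hypothetical tool name, e.g. "read_file"
    succeeded: bool  # whether the call returned without an error


@dataclass
class Trajectory:
    # Everything the agent did in one episode (hypothetical structure).
    tool_calls: list[ToolCall] = field(default_factory=list)
    tests_passed: int = 0
    tests_total: int = 0


def tool_call_reward(traj: Trajectory) -> float:
    """Reward that looks at the tool calls: fraction that succeeded."""
    if not traj.tool_calls:
        return 0.0
    ok = sum(1 for call in traj.tool_calls if call.succeeded)
    return ok / len(traj.tool_calls)


def final_output_reward(traj: Trajectory) -> float:
    """Reward that looks only at the final output: test pass rate."""
    if traj.tests_total == 0:
        return 0.0
    return traj.tests_passed / traj.tests_total


def episode_reward(
    traj: Trajectory, w_tools: float = 0.2, w_output: float = 0.8
) -> float:
    """Single end-to-end score for the whole episode.

    The weights are assumptions for illustration. No intermediate step
    receives its own reward; both terms are computed after the episode ends.
    """
    return w_tools * tool_call_reward(traj) + w_output * final_output_reward(traj)
```

For example, an episode with four tool calls, three of which succeeded, and a final patch that passes all five tests would be scored once at the end with `episode_reward(traj)`; nothing is assigned to individual steps along the way.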
