Video wird geladen...
Video konnte nicht geladen werden
Another piece from the Sasha Rush conversation, this time on rewards for coding RL. He said Cursor uses a mix. Some rewards look at the tool calls themselves, some only at the final output. Everything end-to-end, no process rewards guessing what happens in the middle. I agree with him... show more
14,826 Aufrufe • vor 2 Monaten •via X (Twitter)
0 Kommentare
Keine Kommentare verfügbar
Kommentare vom Original-Post werden hier angezeigt
