Video wird geladen...
Video konnte nicht geladen werden
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly... show more
430,406 Aufrufe • vor 3 Monaten •via X (Twitter)
0 Kommentare
Keine Kommentare verfügbar
Kommentare vom Original-Post werden hier angezeigt
