$TETSUO Update: Set up a Flask embeddings service using a local model for the vector DBs that the reinforcement learning component uses to measure responses against its previous storyline 🚀

BEELD 🐝/acc

84,894 subscribers

25,536 views • 1 year ago • via X (Twitter)

Science & Technology
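
The update describes an embeddings service whose vectors are used to measure new responses against the earlier storyline. The post does not show any code, so what follows is only a minimal sketch under stated assumptions: the hashed bag-of-words `embed` function is a stand-in for the local model, `StorylineIndex` is a hypothetical in-memory substitute for the vector DB, cosine similarity is assumed as the measure, and the Flask layer is left out (in a service, `score` would presumably sit behind an HTTP route).

```python
import hashlib
import math

DIM = 64  # hypothetical embedding size for the stand-in embedder


def embed(text):
    """Stand-in for the local embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    # Vectors from embed() are unit length (or all zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class StorylineIndex:
    """Hypothetical in-memory substitute for a vector DB of earlier storyline passages."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        self._items.append((text, embed(text)))

    def score(self, response):
        """Return (best_similarity, best_matching_passage) for a candidate response."""
        if not self._items:
            return 0.0, None
        query = embed(response)
        best_text, best_vec = max(self._items, key=lambda item: cosine(query, item[1]))
        return cosine(query, best_vec), best_text
```

A reinforcement learning component could use the returned similarity as one signal of how consistent a response is with the stored storyline; how the real system turns that into a reward is not stated in the post.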

9 Comments

tetsuo.ai • 1 year ago

LMFAO 😭

hitaro • 1 year ago

What a wonderful dev , TITSUUUUOO! 🇨🇦🇺🇸

tetsuo.ai • 1 year ago

TITSUUUUOO!!!

Paschamo • 1 year ago

Need also AI Agent for my physical Art studio :) lazy Artist here 😂🎨

tetsuo.ai • 1 year ago

lol, lazy dev here. nice to meet you.

makintosh • 1 year ago

$TETSUO ready to make history

Jia Zhen • 1 year ago

TITS UP TETSUOOOOOOO

tetsuo.ai • 1 year ago

😂

RPS_Crypto • 1 year ago

$TETSUO 💎

Related Videos

building an audio similarity search engine.. given an input track it will find others that sound similar using vector embeddings from a machine learning model with an index of 1.1 million electronic music tracks

hurf

308,742 views • 2 years ago

E-2 Thrust Vector Control testing is underway! This test includes frequency sweeps starting at a maximum gimbal angle of ±7 degrees, using hardware from a previous engine revision. After completing additional testing on the component stand, the actuator will be deployed to the test site for its first gimballed test fire.

LΛUNCHER

73,012 views • 3 months ago

notes on my reinforcement learning for rocket league side quest so far: i’m in awe that — we live in a time where i can train a reinforcement learning model that is learning to play rocket league and at the same time, test community made rocket league models. where, both processes can run in parallel & locally on my hardware. in the game window — i’m playing against a reinforcement learning ppo model that’s running locally. and in the vscode window — i’m training my own reinforcement learning model (just a dummy example script for now). also, the rocket league botting ecosystem has incredible tooling where you can test out different reinforcement learning ideas against other algorithms + it has great documentation for linux and windows. it’s honestly a beautiful hidden gem of a rabbit hole :)

naklecha

46,302 views • 1 year ago

Robot policies must be both reliable and highly capable to be useful; the best way to achieve this level of performance is with reinforcement learning. However, for reinforcement learning you are usually stuck between two difficult options: reinforcement in the real world is often risky and expensive, while reinforcement learning in a traditional simulator takes a lot of engineering work and has a persistent sim-to-real gap. What if instead you could train your robot purely in a world model? RISE by Jiazhi Yang et al. uses a compositional world model to predict the future and evaluate progress. This allows for a self-improving pipeline, which learns a world model from real data and then learns how the robot should perform different tasks. This pipeline results in a data-driven way to improve policy performance from real data but without real-world reinforcement learning. Watch Episode #86 of RoboPapers, with Chris Paxton and Jiafei Duan, to learn more!

RoboPapers

38,334 views • 1 month ago

BYD just rolled out its new God’s Eye 5.0 update in China, which now builds on the end-to-end architecture and “reinforcement learning”. BYD says this software is now intended to feel more “human-like” than previous software versions.

Nic Cruz Patane

36,622 views • 6 months ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.
Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning.
Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.
In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.
By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,457 views • 1 year ago

What you’re seeing Is the truffle learning To express itself Through light Every input (user keystroke, prompt, query) Produces embeddings That are live-diffused To the LED’s The truffle runs a diffusion model In parallel To express itself to the world

simp 4 satoshi

19,542 views • 9 months ago

Reinforcement learning is used to speed the production of behavior for the Boston Dynamics Atlas humanoid robot. At the heart of the learning process is a physics-based simulator that generates training data for a variety of maneuvers.

RAI Institute

76,634 views • 1 year ago

New short course on Building Applications with Vector Databases, taught by Pinecone’s Tim Tully! At the heart of a vector database is the ability to store a collection of vectors and then query against that, meaning input a new vector and find similar ones. This is useful for many AI applications. In this course, you'll learn how to use vector databases to build: (i) Semantic Search: Create a text search tool that goes beyond keyword matching, and instead focuses on the meaning of content. (ii) RAG (retrieval augmented generation): Enhance your LLM output by incorporating context from sources the model wasn't trained on. (iii) Recommender System: Combine semantic search and RAG to recommend topics, and demonstrate it with a news article recommender. (iv) Hybrid Search: Build an application that finds items using both images and descriptive text -- by combining both sparse and dense vector representations of the data -- using an eCommerce dataset as an example. (v) Image Similarity: Use image vector embeddings to create an app to compare facial features, using a database of public figures to determine the likeness between them. (vi) Anomaly Detection: Build an anomaly detection app that identifies unusual patterns in network communication logs. I hope you’ll enjoy learning how to build all these types of applications! Please sign up here:

Andrew Ng

137,091 views • 2 years ago

[RLHF] by Hand ✍️ Yesterday, Jan Leike announced he is joining #Anthropic to lead their "super-alignment" mission. He is the co-inventor of Reinforcement Learning from Human Feedback (#RLHF). How does RLHF work?
[1] Given ↳ Reward Model (RM) ↳ Large Language Model (LLM) ↳ Two (Prompt, Next) Pairs
🟪 TRAIN RM Goal: Learn to give higher rewards to winners
[2] Preferences ↳ A human reviews the two pairs and picks a "winner" ↳ (doc is, him) Embeddings ↳ This prompt has never received human feedback directly ↳ [S] is the special start symbol
[11] Transformer ↳ Attention (yellow) ↳ Feed Forward (4x2 weight and bias matrix) ↳ Output: 3 "transformed" feature vectors, one per position ↳ More details in my previous post 8. Transformer
[12] Output Probabilities ↳ Apply a linear layer to map each transformed feature vector to a probability distribution over the vocabulary.
[13] Sample ↳ Apply the greedy method, which is to pick the word with the highest score ↳ For outputs 1 and 2, the model accurately predicts the next word ↳ For the 3rd output position, the model predicts "him"
[14] Reward Model ↳ The new pair (CEO is, him) is fed to the reward model ↳ The process is the same as [3]-[6] ↳ Output: Reward = 3
[15] Loss Gradient ↳ We set the loss as the negative of the reward. ↳ The loss gradient is simply a constant -1. ↳ Run backpropagation and gradient descent to update the LLM's weights and biases (red border)

Tom Yeh

79,758 views • 2 years ago

Using reinforcement learning, we trained policies for Boston Dynamics Spot that allow the robot to achieve record running speeds of 11.5 mph (5.2 m/s) — over three times faster than Spot's default max speed.

RAI Institute

98,538 views • 1 year ago

New week, new Solara release 🚀! We're excited to introduce Solara 1.16.0. The standout feature? The effortless integration between #VueJS and #Python. Watch the demo to see how to create a new dashboard component with minimal setup. #Solara #DataScience

Maarten A. Breddels

15,165 views • 3 years ago

A Switzerland-based startup Flexion has created a robotic brain that helps the Unitree G1 move smoothly and work on its own. It uses reinforcement learning, where the robot trains in simulations to learn walking, balancing, and picking objects. In tests, it cleaned a space by finding and placing items in a basket without human help.

Space and Technology

17,557 views • 3 months ago

🔧 Let your coding agent identify and fix INP issues for you → With Chrome DevTools MCP, your coding agent can measure performance for specific user interactions to find and resolve slow UI responses - on its own with the help of the browser!

Chrome for Developers

16,696 views • 9 months ago

OpenAI's Kevin Weil 🇺🇸 says 24/7 robotic labs could automate scientific discovery using "reinforcement learning with a loop through the real world": "There’s a lot of science that can be totally automated. There’s no reason at this point that you need to have grad students pipetting one thing into another thing." "The idea is to have robotic labs that are online 24/7 and can scale in parallel. You have models reasoning for two days to find the most efficient experiments to run, once they get to a good point, they pass that to a robotic lab which can experiment in parallel at high volume." "The results pass back into a model which reasons about the results and then goes out and runs a different set of experiments. You’re doing reinforcement learning with a loop through the real world."

TBPN

98,711 views • 6 months ago

OpenAI's Kevin Weil 🇺🇸 says 24/7 robotic labs could automate scientific discovery using "reinforcement learning with a loop through the real world": "There’s a lot of science that can be totally automated. There’s no reason at this point that you need to have grad students pipetting one thing into another thing." "The idea is to have robotic labs that are online 24/7 and can scale in parallel. You have models reasoning for two days to find the most efficient experiments to run, once they get to a good point, they pass that to a robotic lab which can experiment in parallel at high volume." "The results pass back into a model which reasons about the results and then goes out and runs a different set of experiments. You’re doing reinforcement learning with a loop through the real world."

TBPN

29,721 views • 5 months ago