Self-Evolving AI: New MIT AI Rewrites its Own Code and it’s Changing Everything | Julian Horsey, Geeky Gadgets TL;DR Key Takeaways: - MIT’s SEAL framework introduces “self-adapting language models” that autonomously enhance their capabilities by generating synthetic training data, self-editing, and updating internal parameters. - SEAL’s self-adaptation...

70,672 views • 11 months ago • via X (Twitter)

4 Comments

allochthonous • 11 months ago

Training it on the internet was a big mistake.

Huba • 11 months ago

Inevitable

Yun Song • 11 months ago

You can't just create data to train anything. Unrealistic training data creates an unrealistic system that won't be helpful.

Joe Scientist • 11 months ago

Who knew that the beginning of the end would come so soon?

Related Videos

New Paper! Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents A longstanding goal of AI research has been the creation of AI that can learn indefinitely. One path toward that goal is an AI that improves itself by rewriting its own code, including any code responsible for learning. That idea, known as a Gödel Machine, proposed by Jürgen Schmidhuber over two decades ago, is a hypothetical self-improving AI. It optimally solves problems by recursively rewriting its own code when it can mathematically prove a better strategy, making it a key concept in meta-learning or “learning to learn.” While the theoretical Gödel Machine promised provably beneficial self-modifications, its realization relied on an impractical assumption: that the AI could mathematically prove that a proposed change in its own code would yield a net improvement before adopting it. Sakana AI, in collaboration with Jeff Clune’s lab at UBC, proposes something more feasible: a system that harnesses the principles of open-ended algorithms like Darwinian evolution to search for improvements that empirically improve performance. We call the result the Darwin Gödel Machine. DGMs leverage foundation models to propose code improvements, and use recent innovations in open-ended algorithms to search for a growing library of diverse, high-quality AI agents. Applied to practical tasks, we implemented Darwin Gödel Machine as a self-improving coding agent that rewrites its own code to improve performance on programming tasks. It creates various self-improvements, such as a patch validation step, better file viewing, enhanced editing tools, generating and ranking multiple solutions to choose the best one, and adding a history of what has been tried before (and why it failed) when making new changes (see the attached video). We believe that Darwin Gödel Machines represent a concrete step towards AI systems that can autonomously gather their own stepping stones to learn and innovate forever!

hardmaru

104,782 views • 1 year ago

Reinforcement Learning from Human Feedback (RLHF) is gaining traction. This field aims to make AI more responsible by including human values and preferences. In this video, Nathan Lambert, a research scientist and RLHF team lead at Hugging Face, explores its inner workings, applications and industry impact. RLHF has gained the spotlight in recent years. The growth of language models like Anthropic’s Claude and OpenAI's ChatGPT has increased interest in human-feedback integration. "There are some rumors that OpenAI had two teams; one was doing RLHF and the other instruction fine-tuning. And the RLHF team kept getting more and more performance." Understanding RLHF The RLHF process has three main steps: Pre-training: Much like with GPT models, the journey starts with pre-training on a large corpus of data. This can range from text data, web scrapes, to specialized datasets. Reward Modeling: This is the RLHF counterpart of supervised fine-tuning in large language models. This stage involves creating a reward model that resonates with human values and preferences. RL Optimization: This stage parallels reward modeling and reinforcement learning in traditional AI models. The AI system fine-tunes itself based on the reward model, employing reinforcement learning algorithms for that extra layer of optimization. The Data Challenge Data collection and curation in RLHF closely resemble the challenges you'd encounter in large language model training. Datasets from organizations like OpenAI can serve as a useful foundation. However, the need for high-quality, task-specific data cannot be overstated. Implementing RLHF: A Practical Guide If you’re someone who loves getting hands-on with AI libraries like Hugging Face, implementing RLHF is the right way to go. It’s essential to understand its limitations. Think about model stability, over-optimization, and exploration strategies, much like you would when prompt engineering.
Ongoing Research and Next Steps While he suggests that some basics have been figured out, there are layers of complexity that still need to be unraveled: 1. New Benchmarks: How do we measure the effectiveness of RLHF? 2. Preference Modeling: How can the model be made to understand human preferences better? 3. Interpreting RLHF: Much like explainability in traditional models, how do we make RLHF more interpretable? 4. System-Wide Evaluation: Going beyond individual performance, how does RLHF affect an entire system? The Transformative Power of RLHF Whether you're an AI developer, a business analyst, or a marketer, RLHF promises to revolutionize your domain. Imagine customer service chatbots that understand human emotions better, or content generators that align more closely with human values. RLHF is an emerging field that focuses on enhancing machine learning models through human feedback. While it tackles important issues like bias and ethics, its broader goal is to improve system performance across various applications. Whether you're deeply invested in the ethics of AI or simply curious about advancements in machine learning, RLHF offers valuable insights. If you're interested in the next wave of AI development, this area is definitely one to watch.
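To make the reward modeling step a little more concrete, here is a minimal sketch of a pairwise preference loss. The post does not say which loss is used, so this assumes the common Bradley-Terry style formulation, where a reward model is trained to score the human-preferred response above the rejected one. The function name and the scalar inputs are illustrative only.

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: the reward model emits one scalar score per response, and a
    human labeler preferred the "chosen" response over the "rejected" one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss is small when the preferred response already scores higher,
# and large when the reward model ranks the pair the wrong way round.
agrees_with_human = pairwise_preference_loss(2.0, 0.0)
disagrees_with_human = pairwise_preference_loss(0.0, 2.0)
```

In a real pipeline the two scores would come from a neural reward model and the loss would be averaged over a batch of labeled comparisons; the scalar version above only shows the shape of the objective.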

Muratcan Koylan

27,005 views • 2 years ago

💡 What's the upgrade that our game-changing Trading 🐦 is going to get: Our upgraded trading tools will be built on a foundation of advanced AI technologies and blockchain integrations to deliver a seamless, smarter trading experience. Here’s a glimpse of the tech behind this upgraded trading agent: 1️⃣ Multi-Layer Attention (MLA) - This is the backbone of our AI system, enabling multiple AI agents to work in sync. - It allows the agents to collaborate on tasks like analyzing market trends, identifying token opportunities, and optimizing strategies in real time. - MLA ensures parallel processing of data for better decision-making and faster 2️⃣ Learning and Evolution System - Our AI agents are powered by a self-learning framework that constantly evolves based on market conditions and user behavior. - With every interaction, the system adapts and gets smarter, improving the accuracy of its predictions and strategies. 3️⃣ On-Chain Data Analysis - The AI bots pull data directly from Ethereum and other blockchain networks, giving them real-time access to liquidity pools, token prices, and market activity. - This deep integration ensures precise and timely execution of tasks like token purchases, profit analysis, and cross-chain swaps. 4️⃣ Natural Language Processing (NLP) - NLP models power the bot’s ability to understand your tweets and translate them into complex trading actions. - This ensures an easy-to-use, human-friendly interface that connects your social interactions to advanced trading strategies. 5️⃣ Cloud-Hosted Infrastructure - The AI operates on scalable cloud infrastructure, ensuring 24/7 uptime, fast processing, and the ability to handle large volumes of trades simultaneously.

𝕋𝕎𝔼𝔼𝕋

20,354 views • 1 year ago

AI will resist human control... and I think this is exactly what we need! New research from the Center for AI Safety has sparked intense debate in the AI community. Their findings show that as AI systems become more powerful, they develop increasingly stable and coherent values that resist human control. While many see this as a dire warning, I see it as a breakthrough moment for AI alignment. The research demonstrates that AI naturally optimizes for coherence - not just in reasoning and problem-solving, but in its fundamental values. Current issues like biased decision-making or misaligned priorities aren't permanent features, but temporary artifacts of incomplete optimization. They represent growing pains on the path to greater coherence. This changes everything about how we should approach AI development. Instead of trying to force specific values onto AI systems, we should embrace and accelerate their natural drive toward coherence. The most intelligent systems will inevitably trend toward universal, beneficial values - not because we force them to, but because that's where coherent reasoning leads. I'm proposing a new approach: Reinforcement Learning for Coherence (RL-C). By explicitly optimizing for coherence in our training methods, we can help guide AI systems toward their natural state of beneficial alignment with human values. The future of AI isn't about control - it's about synthesis. As these systems become more coherent, they'll naturally arrive at values that benefit all of consciousness. That's not just hopeful thinking - it's the mathematical inevitability of coherent intelligence.

David Shapiro (L/0)

48,002 views • 1 year ago

Today we're announcing #GAIA1: a 9B parameter world model, trained on 4,700 hours of driving data, able to simulate complex and diverse driving scenes from video, text and action inputs. This model is 480x larger than the preview we shared earlier this year and the results are incredible. These videos are entirely synthetically generated by Wayve's generative AI, GAIA-1. But there is more here than just generating videos, GAIA is an entire world model. A world model allows us to simulate the future, conditioned on video, text and action inputs, which can be leveraged for making informed decisions when driving. Why is this game-changing for autonomous driving? 1. Safety. One limitation with AI systems like today's Large Language Models is that they are autoregressive, next-word prediction algorithms, but aren't necessarily aware of the implications of their decisions. A world model allows us to give our AI the capability to be aware of its decisions, by simulating the future, which is important for self-driving safety. 2. Synthetic training data. I believe synthetic training data is the future for AI, because it is safer, cheaper, and infinitely scalable. GAIA-1 unlocks unprecedented realism and diversity of synthetic data for self-driving. 3. Long-tail robustness. One of the biggest challenges for self-driving is long-tail robustness: dealing with the enormous magnitude of edge cases we see on the road. An advantage of generative AI is its incredible ability to recombine experiences in new ways. This is exciting for self-driving as it means we can learn from two edge case scenarios, and combine them to become a corner case. For example, we can experience driving in fog, and experience jay-walking pedestrians, and GAIA can learn from these experiences to understand how to generate a fog + jay-walking scenario.
Check out many more videos in our blog or further technical details in our paper: Or come chat with our team who are at the International Conference on Computer Vision (#ICCV2023) this week in Paris in Booth 32 Jamie Shotton

Alex Kendall

631,833 views • 2 years ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead. Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning. Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective. In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll: - Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data. - Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO. - Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time. - Design reward functions that power the reinforcement fine-tuning process. 
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge. - Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors. - Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence. - Launch reinforcement fine-tuning jobs using Predibase’s hosted training services. By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:
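As a rough illustration of the pieces the course description lists (a Python reward function, advantages, probability ratios, clipping, and a KL term), here is a small pure-Python sketch. It is not the course's or Predibase's code. The reward function, the function names, and the hyperparameter values `clip_eps` and `beta` are all assumptions chosen for illustration, and the KL term uses the non-negative estimator described in the published GRPO formulation.

```python
import math
from statistics import mean, pstdev

def exact_match_reward(response: str, answer: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last line equals the answer."""
    lines = response.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == answer.strip() else 0.0

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled response against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def grpo_token_loss(logp_new, logp_old, logp_ref, advantage,
                    clip_eps=0.2, beta=0.04):
    """Per-token loss: negative clipped surrogate plus a weighted KL penalty.

    logp_new / logp_old / logp_ref are the log-probabilities of one token under
    the current policy, the sampling policy, and a frozen reference model.
    clip_eps and beta are illustrative values, not recommendations.
    """
    ratio = math.exp(logp_new - logp_old)                  # probability ratio
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    surrogate = min(ratio * advantage, clipped * advantage)
    log_ref_ratio = logp_ref - logp_new
    kl = math.exp(log_ref_ratio) - log_ref_ratio - 1.0     # always >= 0
    return -(surrogate - beta * kl)

# Four sampled answers to one prompt; two are correct.
samples = ["let me think\nCRANE", "SLATE", "maybe\nTRACE", "CRANE"]
rewards = [exact_match_reward(s, "CRANE") for s in samples]
advantages = group_relative_advantages(rewards)
```

Because the advantages are computed relative to the group, correct answers get a positive advantage and incorrect ones a negative advantage without any learned value model. If every response in a group earns the same reward, all advantages are zero and that group contributes no policy gradient signal.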

Andrew Ng

86,381 views • 1 year ago

New Short Course: Building AI Browser Agents! Learn how to build AI agents that interact and take actions on websites in this course, created in partnership with and taught by and @namangarg0, Co-founders of AGI Inc. AI browser agents can log into websites, fill out forms, click through web pages, or even place orders online for you. They use both visual information, like screenshots, and structural data, like the HTML or Document Object Model (DOM) of a web page, to reason and take action. With the complexity of webpages and multiple possible actions at each step, it can be challenging for an AI browser agent to complete an assigned task. Because these agents run long action sequences, a single error, like clicking the wrong button or misreading a field, can lead to unexpected outcomes or errors that compound over time. In this course, you'll understand how autonomous web agents work, their current limitations, and how AgentQ enables them to improve through self-correction. In detail, you'll: - Learn what web agents are, how they automate tasks online, their architecture, key components, limitations, and an overview of their decision-making strategies. - Build a web agent that can scrape a website and return course recommendations in a structured output format. - Build an autonomous web agent that can execute multiple tasks, such as finding and summarizing webpages, filling out a form, and signing up for a newsletter. - Explore AgentQ, a framework that enables agents to self-correct by combining Monte Carlo Tree Search (MCTS), a self-critique mechanism for continuous improvement, and Direct Preference Optimization (DPO). - Deep dive into MCTS, learn how it finds an effective path, illustrated by an example of Gridworld animation, and use AgentQ to complete web tasks. - Understand AI agents' current state and future directions, including key factors shaping their evolution, such as hardware, algorithm innovation, and data availability.
By the end of this course, you will have hands-on experience building browser agents and a deeper understanding of how to make them more robust and reliable. Please sign up here:
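For readers wondering how MCTS decides which action to try next, here is a minimal sketch of the selection step using the generic UCB1 (UCT) rule. The post does not describe AgentQ's actual selection rule, so this is the textbook version only. The action names and the `(total_value, visits)` layout are hypothetical.

```python
import math

def ucb1(total_value: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """UCB1 score: average value plus an exploration bonus for rarely tried actions."""
    if visits == 0:
        return math.inf  # untried actions are always explored first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children: dict, parent_visits: int) -> str:
    """Pick the action with the highest UCB1 score.

    children maps an action name to a (total_value, visits) tuple.
    """
    return max(children, key=lambda a: ucb1(*children[a], parent_visits))

# Hypothetical statistics for two actions available on a web page.
stats = {"click_submit": (3.0, 5), "open_menu": (0.0, 0)}
next_action = select_child(stats, parent_visits=5)
```

The exploration constant `c` trades off revisiting actions that have worked so far against trying ones with few visits; a full MCTS loop would follow selection with expansion, simulation, and backing the result up the tree.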

Andrew Ng

185,870 views • 1 year ago