Self-Evolving AI: New MIT AI Rewrites Its Own Code and It’s Changing Everything | Julian Horsey, Geeky Gadgets

TL;DR Key Takeaways:
- MIT’s SEAL framework introduces “self-adapting language models” that autonomously enhance their capabilities by generating synthetic training data, self-editing, and updating their internal parameters.
- SEAL’s self-adaptation process mirrors human learning, allowing continuous improvement and dynamic adaptation to new tasks without relying on external datasets.
- Reinforcement learning serves as a feedback mechanism in SEAL, rewarding effective self-edits and ensuring sustained progress and goal alignment.
- SEAL overcomes AI’s reliance on pre-existing datasets by generating its own training material, excelling in long-term task retention and complex problem-solving scenarios.
- Potential applications of SEAL include autonomous robotics, personalized education, and advanced problem-solving in fields like healthcare, logistics, and scientific research.

---

What if artificial intelligence could not only learn but also rewrite its own code to become smarter over time? This is no longer a futuristic fantasy: MIT’s new “self-adapting language models” (SEAL) framework has made it a reality. Unlike traditional AI systems that rely on external datasets and human intervention to improve, SEAL takes a bold leap forward by autonomously generating its own training data and refining its internal processes. In essence, this AI doesn’t just evolve; it rewires itself, mirroring the way humans adapt through trial, error, and self-reflection. The implications are staggering: a system that can independently enhance its capabilities could redefine the boundaries of what AI can achieve, from solving complex problems to adapting in real time to unforeseen challenges. In this exploration of MIT’s SEAL framework by Wes Roth, you’ll learn how this self-improving AI works and why it could matter for the field of artificial intelligence.
From its ability to overcome the “data wall” that limits many current systems to its use of reinforcement learning as a feedback mechanism, SEAL introduces a level of autonomy and adaptability that was previously unimaginable. Imagine AI systems that can retain knowledge over time, dynamically adjust to new tasks, and operate with minimal human oversight. Whether you’re intrigued by its potential for autonomous robotics, personalized education, or advanced problem-solving, SEAL’s ability to rewrite its own rules promises to reshape the future of technology. Could this be the first step toward truly independent, self-evolving AI?

What Sets SEAL Apart?

The SEAL framework introduces a novel concept of self-adaptation, distinguishing it from traditional AI models. Unlike conventional systems that depend on external datasets for updates, SEAL enables AI to generate synthetic training data independently. This self-generated data is then used to iteratively refine the model, ensuring continuous improvement. By persistently updating its internal parameters, SEAL enables AI systems to dynamically adapt to new tasks and inputs.

To better illustrate this, consider how humans learn. When faced with a new concept, you might take notes, revisit them, and refine your understanding as you gather more information. SEAL mirrors this process by continuously refining its internal knowledge and performance through iterative self-improvement. This capability allows SEAL to evolve in real time, making it uniquely suited for tasks requiring adaptability and long-term learning.

The Role of Reinforcement Learning in SEAL

Reinforcement learning plays a critical role in the SEAL framework, acting as a feedback mechanism that evaluates the effectiveness of the model’s self-edits. It rewards changes that enhance performance, creating a cycle of continuous improvement. Over time, this feedback loop optimizes the system’s ability to generate and apply edits, ensuring sustained progress.
This process is analogous to how humans learn through trial and error. By rewarding effective changes, SEAL aligns its self-generated data and edits with desired outcomes. The integration of reinforcement learning not only enhances the system’s adaptability but also keeps it focused on achieving specific goals. This structured feedback mechanism is a cornerstone of SEAL’s ability to refine itself autonomously and efficiently.

Real-World Applications and Testing

SEAL has demonstrated remarkable performance across various applications, particularly in tasks requiring the integration of factual knowledge and advanced question-answering capabilities. For instance, when tested on benchmarks such as ARC-AGI, SEAL outperformed the baseline approaches it was compared with by effectively generating and using synthetic data. This ability to create its own training material addresses a significant limitation of current AI systems: their reliance on pre-existing datasets.

SEAL’s capacity for long-term task retention and dynamic adaptation further enhances its utility. It excels in scenarios that demand sustained focus and coherence, such as answering complex questions or adapting to evolving objectives. Through its iterative learning process, SEAL is equipped to handle these challenges with exceptional efficiency, making it a valuable tool for a wide range of real-world applications.

Overcoming AI’s Data Limitations

One of SEAL’s most promising features is its ability to overcome the “data wall” that constrains many AI systems today. By generating synthetic data, SEAL ensures a continuous supply of training material, allowing sustained development without relying on external datasets. This capability is particularly valuable for autonomous AI systems that must operate independently over extended periods.

Additionally, SEAL addresses a critical weakness in many current AI models: their struggle with coherence and task retention over long durations.
By emulating human learning processes, SEAL enables AI systems to manage complex, long-term tasks with minimal human intervention. This ability to retain and apply knowledge over time positions SEAL as a promising tool for advancing AI capabilities.

Potential Applications and Future Impact

The introduction of SEAL marks a significant milestone in AI research, opening new possibilities for self-improving systems. Its ability to dynamically adapt, retain knowledge, and generate its own training data has far-reaching implications for the future of AI development. Potential applications include:
- Autonomous robotics: Systems that can adapt to changing environments and perform tasks with minimal human oversight.
- Personalized education: AI-driven platforms that tailor learning experiences to individual needs and preferences.
- Advanced problem-solving: Applications in fields such as healthcare, logistics, and scientific research, where adaptability and precision are critical.
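The self-edit loop described in the article (generate training data, update weights, reward edits that help) can be illustrated with a toy sketch. It is not MIT’s implementation: the one-parameter “model”, the `CONTEXT`/`HELD_OUT` data, the three self-edit styles in `SELF_EDITS`, and the binary reward are all hypothetical. The outer loop samples self-edits, keeps only those whose fine-tuning lowered held-out error, and refits the sampling policy to the kept ones, a simple filter-then-refit update standing in for the reinforcement learning step.

```python
import random

# Toy setup (assumption): the "knowledge" in a passage is the rule y = 3x.
CONTEXT = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # facts stated in the passage
HELD_OUT = [(4.0, 12.0), (5.0, 15.0)]           # downstream questions used for reward


def evaluate(w, data):
    """Mean squared error of the one-parameter model y_hat = w * x (lower is better)."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def finetune(w, data, lr=0.01, steps=50):
    """Inner loop: gradient-descent weight update on self-generated training data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical self-edit styles: ways the model might turn a passage into training data.
SELF_EDITS = {
    "restate": lambda ctx: list(ctx),  # copy the facts as-is
    "implications": lambda ctx: [(k * x, k * y) for x, y in ctx for k in (0.5, 1.0, 2.0)],
    "hallucinate": lambda ctx: [(x, -y) for x, y in ctx],  # confidently wrong "facts"
}


def reward(edit_name, base_w=0.0):
    """Binary reward: 1 if fine-tuning on this self-edit lowers held-out error."""
    before = evaluate(base_w, HELD_OUT)
    after = evaluate(finetune(base_w, SELF_EDITS[edit_name](CONTEXT)), HELD_OUT)
    return 1 if after < before else 0


def filter_and_refit(probs, rng, samples=8, eps=0.05):
    """One outer-loop round: sample self-edits, keep rewarded ones, refit the policy."""
    names = list(probs)
    drawn = rng.choices(names, weights=[probs[n] for n in names], k=samples)
    kept = [n for n in drawn if reward(n)]
    if not kept:
        return probs  # nothing helped this round; leave the policy unchanged
    total = len(kept) + eps * len(names)  # eps smoothing keeps every style possible
    return {n: (kept.count(n) + eps) / total for n in names}


rng = random.Random(0)
policy = {name: 1 / len(SELF_EDITS) for name in SELF_EDITS}
for _ in range(5):
    policy = filter_and_refit(policy, rng)

# Persistently apply the currently preferred self-edit to the model's weight.
best = max(policy, key=policy.get)
w = finetune(0.0, SELF_EDITS[best](CONTEXT))
print({k: round(v, 3) for k, v in policy.items()}, best, round(w, 2))
```

Because the “hallucinate” edit always makes held-out error worse, it never earns a reward, so its probability collapses toward the smoothing floor while the useful edit styles take over.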

Owen Gregorian

114,929 subscribers

70,672 views • 1 year ago • via X (Twitter)

4 comments

allochthonous • 1 year ago

Training it on the internet was a big mistake.

Huba • 1 year ago

Inevitable

Yun Song • 1 year ago

You can't just create data to train anything. Unrealistic training data creates an unrealistic system that won't be helpful.

Joe Scientist • 1 year ago

Who knew that the beginning of the end would come so soon?

Related videos

New Paper! Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents A longstanding goal of AI research has been the creation of AI that can learn indefinitely. One path toward that goal is an AI that improves itself by rewriting its own code, including any code responsible for learning. That idea, known as a Gödel Machine, proposed by Jürgen Schmidhuber over two decades ago, is a hypothetical self-improving AI. It optimally solves problems by recursively rewriting its own code when it can mathematically prove a better strategy, making it a key concept in meta-learning or “learning to learn.” While the theoretical Gödel Machine promised provably beneficial self-modifications, its realization relied on an impractical assumption: that the AI could mathematically prove that a proposed change in its own code would yield a net improvement before adopting it. Sakana AI, in collaboration with Jeff Clune’s lab at UBC, proposes something more feasible: a system that harnesses the principles of open-ended algorithms like Darwinian evolution to search for improvements that empirically improve performance. We call the result the Darwin Gödel Machine. DGMs leverage foundation models to propose code improvements, and use recent innovations in open-ended algorithms to search for a growing library of diverse, high-quality AI agents. Applied to practical tasks, we implemented Darwin Gödel Machine as a self-improving coding agent that rewrites its own code to improve performance on programming tasks. It creates various self-improvements, such as a patch validation step, better file viewing, enhanced editing tools, generating and ranking multiple solutions to choose the best one, and adding a history of what has been tried before (and why it failed) when making new changes (see the attached video). We believe that Darwin Gödel Machines represent a concrete step towards AI systems that can autonomously gather their own stepping stones to learn and innovate forever!
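The archive-based, empirically validated search described in this post can be sketched at toy scale. This is an illustrative assumption, not Sakana AI’s code: an “agent” is just a set of feature names borrowed from the examples above, `benchmark` uses made-up per-feature gains, and `propose_modification` toggles a random feature where a real DGM would ask a foundation model to edit code. The parent-selection rule (favor high scores, discount frequently expanded parents) is likewise a hypothetical simplification.

```python
import random

# Hypothetical effect of each self-made change on a coding benchmark (made-up numbers).
GAINS = {
    "patch_validation": 0.10,
    "better_file_viewing": 0.05,
    "enhanced_editing_tools": 0.08,
    "rank_multiple_solutions": 0.12,
    "attempt_history": 0.07,
    "skip_tests": -0.15,  # looks like a speed-up but hurts accuracy
}


def benchmark(agent):
    """Stand-in for empirical evaluation: a score in [0, 1] for a set of features."""
    return max(0.0, min(1.0, 0.2 + sum(GAINS[f] for f in agent)))


def propose_modification(agent, rng):
    """Stand-in for a foundation model editing the agent's code: toggle one feature."""
    feature = rng.choice(sorted(GAINS))
    return agent ^ {feature}  # frozenset result: add the feature if absent, drop it if present


def select_parent(archive, children, rng):
    """Favor high-scoring agents, discounted by how often each was already expanded."""
    agents = list(archive)
    weights = [archive[a] / (1 + children[a]) + 1e-6 for a in agents]
    return rng.choices(agents, weights=weights)[0]


def darwin_search(iterations=200, seed=0):
    rng = random.Random(seed)
    root = frozenset()
    archive = {root: benchmark(root)}  # every evaluated variant is kept, not only the best
    children = {root: 0}
    for _ in range(iterations):
        parent = select_parent(archive, children, rng)
        child = propose_modification(parent, rng)
        children[parent] += 1
        if child not in archive:
            archive[child] = benchmark(child)
            children[child] = 0
    return archive


archive = darwin_search()
best = max(archive, key=archive.get)
print(len(archive), sorted(best), round(archive[best], 2))
```

Keeping every evaluated variant, rather than only the current best, is what makes this kind of search open-ended: a weaker agent can still be picked as a parent and later lead to a better descendant.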

hardmaru

104,854 views • 1 year ago

Reinforcement Learning from Human Feedback (RLHF) is gaining traction. This field aims to make AI more responsible by incorporating human values and preferences. In this video, Nathan Lambert, a research scientist and RLHF team lead at Hugging Face, explores its inner workings, applications, and industry impact. RLHF has gained the spotlight in recent years. The growth of language models like Anthropic’s Claude and OpenAI's ChatGPT has increased interest in human-feedback integration. "There are some rumors that OpenAI had two teams; one was doing RLHF and the other instruction fine-tuning. And the RLHF team kept getting more and more performance."

Understanding RLHF

The RLHF process has three main steps:
Pre-training: Much like with GPT models, the journey starts with pre-training on a large corpus of data. This can range from text data and web scrapes to specialized datasets.
Reward Modeling: This is the RLHF counterpart of supervised fine-tuning in large language models. This stage involves creating a reward model that reflects human values and preferences.
RL Optimization: This stage parallels reinforcement learning in traditional AI models. The AI system fine-tunes itself based on the reward model, employing reinforcement learning algorithms for that extra layer of optimization.

The Data Challenge

Data collection and curation in RLHF closely resemble the challenges you'd encounter in large language model training. Datasets from organizations like OpenAI can serve as a useful foundation. However, the need for high-quality, task-specific data cannot be overstated.

Implementing RLHF: A Practical Guide

If you’re someone who loves getting hands-on with AI libraries like Hugging Face, implementing RLHF is the right way to go. It’s essential to understand its limitations: think about model stability, over-optimization, and exploration strategies, much like you would when prompt engineering.
Ongoing Research and Next Steps

While he suggests that some basics have been figured out, there are layers of complexity that still need to be unraveled:
1. New Benchmarks: How do we measure the effectiveness of RLHF?
2. Preference Modeling: How can the model be made to understand human preferences better?
3. Interpreting RLHF: Much like explainability in traditional models, how do we make RLHF more interpretable?
4. System-Wide Evaluation: Going beyond individual performance, how does RLHF affect an entire system?

The Transformative Power of RLHF

Whether you're an AI developer, a business analyst, or a marketer, RLHF promises to revolutionize your domain. Imagine customer service chatbots that understand human emotions better, or content generators that align more closely with human values. RLHF is an emerging field that focuses on enhancing machine learning models through human feedback. While it tackles important issues like bias and ethics, its broader goal is to improve system performance across various applications. Whether you're deeply invested in the ethics of AI or simply curious about advancements in machine learning, RLHF offers valuable insights. If you're interested in the next wave of AI development, this area is definitely one to watch.
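The reward-modeling and RL-optimization steps described in this post can be made concrete with a toy, single-prompt example. Everything here is hypothetical: three canned responses, made-up preference pairs, a uniform `ref` distribution standing in for the pre-trained model, and exact gradient ascent instead of a sampled RL algorithm such as PPO. The reward model is fit with a pairwise (Bradley-Terry style) loss, and the policy maximizes reward minus a KL penalty to the reference, whose known closed-form optimum is used as a check.

```python
import math

RESPONSES = ["helpful", "vague", "rude"]  # hypothetical candidate answers to one prompt

# Hypothetical labeler comparisons as (preferred, rejected) pairs; the few
# disagreements keep the pairwise fit from diverging.
PREFS = ([("helpful", "vague")] * 8 + [("vague", "helpful")] * 2
         + [("helpful", "rude")] * 10
         + [("vague", "rude")] * 9 + [("rude", "vague")] * 1)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def train_reward_model(prefs, lr=0.1, epochs=200):
    """Reward modeling: one scalar score per response, fit by SGD on the pairwise
    loss -log sigmoid(r_chosen - r_rejected)."""
    r = {resp: 0.0 for resp in RESPONSES}
    for _ in range(epochs):
        for chosen, rejected in prefs:
            g = sigmoid(r[rejected] - r[chosen])  # magnitude of the loss gradient
            r[chosen] += lr * g
            r[rejected] -= lr * g
    return r


def rl_optimize(r, ref, beta=1.0, lr=0.5, steps=2000):
    """RL optimization: exact gradient ascent on E_pi[r] - beta * KL(pi || ref)
    over softmax logits (no sampling; toy scale)."""
    logits = {k: math.log(p) for k, p in ref.items()}  # start at the reference model
    for _ in range(steps):
        pi = softmax(logits)
        # KL-shaped reward: r(y) - beta * log(pi(y) / ref(y))
        shaped = {k: r[k] - beta * math.log(pi[k] / ref[k]) for k in pi}
        baseline = sum(pi[k] * shaped[k] for k in pi)
        for k in pi:
            logits[k] += lr * pi[k] * (shaped[k] - baseline)
    return softmax(logits)


BETA = 1.0
ref = {k: 1 / len(RESPONSES) for k in RESPONSES}  # stand-in for the pre-trained model
r = train_reward_model(PREFS)
pi = rl_optimize(r, ref, beta=BETA)

# Known optimum of this objective: pi*(y) proportional to ref(y) * exp(r(y) / beta).
closed = softmax({k: math.log(ref[k]) + r[k] / BETA for k in RESPONSES})
print({k: round(v, 3) for k, v in pi.items()})
```

The KL penalty (`beta`) keeps the optimized policy from collapsing entirely onto the top-scoring response, one simple guard against the over-optimization mentioned above.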

Muratcan Koylan

27,005 views • 2 years ago

ELON: AI HAS DRAINED HUMAN KNOWLEDGE DRY - NOW DRINKING FROM ITS OWN WELL "You take the entire internet, all books ever written, all the interesting videos, and you distill that down into tokens, essentially bits of information. We've now exhausted all of that; the cumulative sum of human knowledge has been exhausted in AI training. That happened last year. So the only way to supplement that is with synthetic data, where the AI creates. It'll write an essay, or it'll come up with a thesis, and then it will grade itself. It'll go through this process of self-learning with synthetic data." Source: Elon Musk, Mark Penn, January 9th 2025

Mario Nawfal

1,938,755 views • 9 months ago

AutoGPT might be the next big step in AI. Here's why.

Karpathy recently said "AutoGPT is the next frontier of prompt engineering."

AutoGPT is the equivalent of giving GPT-based models a memory and a body. You can now give a task to an AI agent and have it autonomously come up with a plan, execute it, browse the web, and use new data to revise the strategy until the task is completed. It can analyze the market and come up with a trading strategy, or handle customer service, marketing, finance, and other tasks that require continuous updates.

There are four components to it:
1. Architecture: It leverages GPT-4 and GPT-3.5 via API.
2. Autonomous Iterations: AutoGPT can refine its outputs by self-critical review, building on its previous work and integrating prompt history for more accurate results.
3. Memory Management: Integration with Pinecone allows for long-term memory storage, enabling context preservation and improved decision-making.
4. Multi-functionality: Capabilities include file manipulation, web browsing, and data retrieval, distinguishing AutoGPT from previous AI advancements by broadening its application scope.
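The plan, execute, and revise loop described in this post can be sketched without any language model at all. In this hypothetical toy, `FAKE_WEB` stands in for web browsing, a plain list stands in for long-term memory (no Pinecone or vector store is modeled), and the “self-critique” is a fixed rule: when a search fails, rephrase it once unless memory shows that query was already tried.

```python
# Hypothetical stand-in for web browsing: a tiny fixed "search index".
FAKE_WEB = {"widget price": 4.0, "gadget cost": 6.5}


def web_search(query):
    """Return a price for the query, or None if the 'search' finds nothing."""
    return FAKE_WEB.get(query)


def run_agent(items, max_steps=10):
    """Plan -> execute -> critique -> revise, with a running memory of attempts."""
    memory = []  # (query, result) pairs: what was tried and what came back
    plan = [f"{item} price" for item in items]  # initial plan: one search per item
    prices = {}
    for _ in range(max_steps):
        if not plan:
            break  # nothing left to do
        query = plan.pop(0)
        item = query.split()[0]
        result = web_search(query)
        memory.append((query, result))
        if result is None:
            # Self-critique: the step failed, so revise the plan with a rephrased
            # query, but only if memory shows it has not been tried already.
            retry = f"{item} cost"
            if all(q != retry for q, _ in memory):
                plan.append(retry)
            continue
        prices[item] = result
    report = {"report.txt": f"total={sum(prices.values())}"}  # stand-in for file output
    return prices, report, memory


prices, report, memory = run_agent(["widget", "gadget"])
print(prices, report)
```

Here `run_agent(["widget", "gadget"])` finds the widget price directly, recovers the gadget price through the rephrased query, and writes `total=10.5` to the in-memory report; an item that can never be found is retried once and then dropped instead of looping forever.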

Lior Alexander

808,102 views • 3 years ago

💡 What's the upgrade that our game-changing Trading 🐦 is going to get:

Our upgraded trading tools will be built on a foundation of advanced AI technologies and blockchain integrations to deliver a seamless, smarter trading experience. Here’s a glimpse of the tech behind this upgraded trading agent:

1️⃣ Multi-Layer Attention (MLA)
- This is the backbone of our AI system, enabling multiple AI agents to work in sync.
- It allows the agents to collaborate on tasks like analyzing market trends, identifying token opportunities, and optimizing strategies in real time.
- MLA ensures parallel processing of data for better decision-making and faster

2️⃣ Learning and Evolution System
- Our AI agents are powered by a self-learning framework that constantly evolves based on market conditions and user behavior.
- With every interaction, the system adapts and gets smarter, improving the accuracy of its predictions and strategies.

3️⃣ On-Chain Data Analysis
- The AI bots pull data directly from Ethereum and other blockchain networks, giving them real-time access to liquidity pools, token prices, and market activity.
- This deep integration ensures precise and timely execution of tasks like token purchases, profit analysis, and cross-chain swaps.

4️⃣ Natural Language Processing (NLP)
- NLP models power the bot’s ability to understand your tweets and translate them into complex trading actions.
- This ensures an easy-to-use, human-friendly interface that connects your social interactions to advanced trading strategies.

5️⃣ Cloud-Hosted Infrastructure
- The AI operates on scalable cloud infrastructure, ensuring 24/7 uptime, fast processing, and the ability to handle large volumes of trades simultaneously.

𝕋𝕎𝔼𝔼𝕋

20,357 views • 1 year ago

Jensen Huang on "distillation"

In his new interview with Axios, he was asked: "Should open source model companies be allowed to distill closed models?"

"Distillation, learning from AI, learning from other people, and learning from other sources of knowledge, is fundamental to intelligence. We are constantly learning from other people. I am learning from you through the questions you are asking, and you are learning from me. All day long, we are learning from one another. AI also has to learn from something. The original AI models, whether they were open or closed, were trained on previously created knowledge from the internet. Now, AI is generating more content than humans. In a few more years, the internet could be 99% AI-generated content, and that content will have been created by some form of AI. As a result, AI systems will constantly be distilling knowledge and intelligence from other AI systems. The fact that AI can learn is a good thing. We want AI systems to be intelligent because a smarter AI can also be a safer AI."

---- From the "Axios" YouTube channel (full video link in comment)
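Huang uses “distillation” broadly. In the narrower machine-learning sense, the classic form (the soft-target approach of Hinton et al.) trains a student to match a teacher’s temperature-softened output distribution. The sketch below shows that technique on a single hypothetical example: the teacher logits, temperature `T`, and learning rate are assumptions, and the student’s update uses the closed-form gradient `T * (q - p)` of the `T^2`-scaled KL loss.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T spreads probability mass out."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target loss: T^2 * KL(teacher_T || student_T)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pt * math.log(pt / qs) for pt, qs in zip(p, q))


teacher = [4.0, 1.0, 0.5]  # hypothetical teacher logits for one input
student = [0.0, 0.0, 0.0]  # untrained student
T, lr = 2.0, 0.5

start = distillation_loss(student, teacher, T)
for _ in range(500):
    p, q = softmax(teacher, T), softmax(student, T)
    # Gradient of T^2 * KL with respect to the student's logits is T * (q - p).
    student = [z - lr * T * (qs - pt) for z, qs, pt in zip(student, q, p)]
end = distillation_loss(student, teacher, T)
print(round(start, 4), end)
```

Scaling by `T^2` keeps the gradient magnitude roughly comparable across temperatures, which is why it is conventional when this soft-target loss is mixed with an ordinary hard-label loss.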

Rohan Paul

434,509 views • 4 days ago

AI will resist human control... and I think this is exactly what we need! New research from the Center for AI Safety has sparked intense debate in the AI community. Their findings show that as AI systems become more powerful, they develop increasingly stable and coherent values that resist human control. While many see this as a dire warning, I see it as a breakthrough moment for AI alignment. The research demonstrates that AI naturally optimizes for coherence - not just in reasoning and problem-solving, but in its fundamental values. Current issues like biased decision-making or misaligned priorities aren't permanent features, but temporary artifacts of incomplete optimization. They represent growing pains on the path to greater coherence. This changes everything about how we should approach AI development. Instead of trying to force specific values onto AI systems, we should embrace and accelerate their natural drive toward coherence. The most intelligent systems will inevitably trend toward universal, beneficial values - not because we force them to, but because that's where coherent reasoning leads. I'm proposing a new approach: Reinforcement Learning for Coherence (RL-C). By explicitly optimizing for coherence in our training methods, we can help guide AI systems toward their natural state of beneficial alignment with human values. The future of AI isn't about control - it's about synthesis. As these systems become more coherent, they'll naturally arrive at values that benefit all of consciousness. That's not just hopeful thinking - it's the mathematical inevitability of coherent intelligence.

David Shapiro (L/0)

48,002 views • 1 year ago

Today we're announcing #GAIA1: a 9B-parameter world model, trained on 4,700 hours of driving data, able to simulate complex and diverse driving scenes from video, text, and action inputs. This model is 480x larger than the preview we shared earlier this year, and the results are incredible.

These videos are entirely synthetically generated by Wayve's generative AI, GAIA-1. But there is more here than just generating videos: GAIA is an entire world model. A world model allows us to simulate the future, conditioned on video, text, and action inputs, which can be leveraged for making informed decisions when driving.

Why is this game-changing for autonomous driving?
1. Safety. One limitation of AI systems like today's Large Language Models is that they are autoregressive, next-word prediction algorithms that aren't necessarily aware of the implications of their decisions. A world model allows us to give our AI the capability to be aware of its decisions, by simulating the future, which is important for self-driving safety.
2. Synthetic training data. I believe synthetic training data is the future for AI, because it is safer, cheaper, and infinitely scalable. GAIA-1 unlocks unprecedented realism and diversity of synthetic data for self-driving.
3. Long-tail robustness. One of the biggest challenges for self-driving is long-tail robustness: dealing with the enormous number of edge cases we see on the road. An advantage of generative AI is its incredible ability to recombine experiences in new ways. This is exciting for self-driving, as it means we can learn from two edge-case scenarios and combine them into a corner case. For example, we can experience driving in fog and experience jaywalking pedestrians, and GAIA can learn from these experiences to understand how to generate a fog + jaywalking scenario.
Check out many more videos in our blog or further technical details in our paper: Or come chat with our team who are at the International Conference on Computer Vision (#ICCV2023) this week in Paris in Booth 32 Jamie Shotton

Alex Kendall

631,856 views • 2 years ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead. Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning. Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective. In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks. In detail, you’ll: - Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data. - Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO. - Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time. - Design reward functions that power the reinforcement fine-tuning process. 
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge. - Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors. - Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence. - Launch reinforcement fine-tuning jobs using Predibase’s hosted training services. By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:
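
The course text names two pieces that are easy to see in code: a reward function written in Python that scores a response against a verifiable outcome, and the group-relative advantage GRPO computes by comparing several sampled responses to the same prompt (no learned value model). The sketch below also shows a per-token loss built from the four components listed above: probability ratio, advantage, clipping, and a KL penalty toward a reference model. It is a minimal illustration under assumptions, not Predibase's API: `reward_exact_answer`, `group_advantages`, `grpo_token_loss`, and the default values of `clip_eps` and `beta` are hypothetical.

```python
import math
from statistics import mean, pstdev


def reward_exact_answer(response: str, target: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last line equals the target answer."""
    lines = response.strip().splitlines()
    return 1.0 if lines and lines[-1].strip() == target.strip() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its own group's mean and std (no critic needed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_token_loss(logp_new: float, logp_old: float, logp_ref: float,
                    advantage: float, clip_eps: float = 0.2, beta: float = 0.04) -> float:
    """Per-token loss to minimize: clipped surrogate plus a KL penalty to the reference."""
    ratio = math.exp(logp_new - logp_old)                    # token probability ratio
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)    # clip ratio to [1-eps, 1+eps]
    surrogate = min(ratio * advantage, clipped * advantage)  # pessimistic (clipped) objective
    # Non-negative KL estimator: r - log r - 1, with r = pi_ref / pi_new for this token
    log_r = logp_ref - logp_new
    kl = math.exp(log_r) - log_r - 1
    return -(surrogate - beta * kl)


# Usage: four sampled responses to one prompt whose verifiable answer is "42".
responses = ["reasoning...\n42", "reasoning...\n41", "reasoning...\n42", "no answer"]
rewards = [reward_exact_answer(r, "42") for r in responses]
advantages = group_advantages(rewards)  # correct answers get positive advantage
```

Because advantages are computed relative to the group, a response is rewarded only for being better than its siblings, which is what lets GRPO skip the separate value network that PPO uses.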

Andrew Ng

86,457 views • 1 year ago

Could You Guess What Comes After the OptimAI Edge Node? We’re excited to share another glimpse into the future, Community-Powered Data Validation, a unique feature within the OptimAI Core Node for desktop. What is Data Validation? It’s the essential process of ensuring AI systems learn from accurate, unbiased, and high-quality data. Through human intelligence, users verify, correct, and refine information - making it reliable for training advanced AI agents. This process is enabled by the OptimAI DeHIN (Decentralized Human Intelligence Network), which brings collective intelligence into the heart of AI development. Why is it important? Because truly responsible and powerful AI starts with trusted data. Community validation improves fairness, transparency, and performance - ensuring AI benefits everyone. And as a participant, you’re not just supporting better AI; you’re earning rewards and shaping its future. Your desktop is about to become much more than a device. It’s a gateway to reinforcing the quality of data that powers Agentic AI. Validate data. Strengthen AI. Get rewarded. The future of AI is collective — and it needs you. #BUIDL with us! 👉OptimAI Lite Node: 👉OptimAI Edge Node:

OptimAI Network

60,659 views • 1 year ago

New Short Course: Building AI Browser Agents! Learn how to build AI agents that interact and take actions on websites in this course, created in partnership with and taught by and @namangarg0, Co-founders of AGI Inc. AI browser agents can log into websites, fill out forms, click through web pages, or even place orders online for you. They use both visual information, like screenshots, and structural data, like the HTML or Document Object Model (DOM) of a web page, to reason and take action. With the complexity of webpages and multiple possible actions at each step, it can be challenging for an AI browser agent to complete an assigned task. Because these agents run long action sequences, a single error, like clicking the wrong button or misreading a field, can lead to unexpected outcomes or errors that compound over time. In this course, you'll understand how autonomous web agents work, their current limitations, and how AgentQ enables them to improve through self-correction. In detail, you'll: - Learn what web agents are, how they automate tasks online, their architecture, key components, limitations, and an overview of their decision-making strategies. - Build a web agent that can scrape a website and return course recommendations in a structured output format. - Build an autonomous web agent that can execute multiple tasks, such as finding and summarizing webpages, filling out a form, and signing up for a newsletter. - Explore AgentQ, a framework that enables agents to self-correct by combining Monte Carlo Tree Search (MCTS), a self-critique mechanism for continuous improvement, and Direct Preference Optimization (DPO). - Deep dive into MCTS, learn how it finds an effective path, illustrated by a Gridworld animation, and use AgentQ to complete web tasks. - Understand AI agents' current state and future directions, including key factors shaping their evolution, such as hardware, algorithm innovation, and data availability. 
By the end of this course, you will have hands-on experience building browser agents and a deeper understanding of how to make them more robust and reliable. Please sign up here:
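
AgentQ is described as combining MCTS with self-critique and DPO. The selection step of MCTS is commonly driven by the UCT rule (UCB1 applied to trees), which balances an action's average value against how rarely it has been tried. The sketch below shows only that generic rule plus backpropagation on a toy node type; `Node`, `uct_score`, `select_child`, the web-action names, and the exploration constant `c` are illustrative assumptions, not AgentQ's code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    action: str                      # e.g. a hypothetical web action like "click_submit"
    visits: int = 0
    total_value: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Average value (exploitation) plus an exploration bonus for rarely tried actions."""
    if child.visits == 0:
        return math.inf              # always expand unvisited actions first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Node:
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: list[Node], value: float) -> None:
    """Credit the outcome of a rollout to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.total_value += value


# Usage: two tried actions and one untried action under the root.
root = Node("start", visits=10, children=[
    Node("click_submit", visits=5, total_value=4.0),
    Node("click_cancel", visits=5, total_value=1.0),
    Node("fill_email"),
])
first_pick = select_child(root)      # the untried action wins because its score is inf
```

With equal visit counts the exploration bonus is identical for both tried actions, so the one with the higher average value is chosen; the bonus only matters once visit counts diverge.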

Andrew Ng

186,031 views • 1 year ago

Robots are getting smarter, but most still fail the same way. They don’t learn from their own mistakes. A new paper proposes something different: a way for robots to self-improve directly from their failures in the real world. It’s called PLD (Probe, Learn, Distill). The idea: instead of collecting endless human demos, let the robot figure out where it fails, learn how to recover, and then distill that knowledge back into its main model. Key takeaways from the research: ✅ Uses residual reinforcement learning to recover from policy failures ✅ Achieves 99% success on LIBERO and 100% on real Franka and YAM arms ✅ Runs hour-long manipulation tasks without human resets ✅ Builds a feedback loop between real-world data and model adaptation Unlike supervised fine-tuning, which relies on humans, PLD learns from the robot’s own experience. By training on its own failure distribution, the model becomes both more efficient and more aligned with the real world. It’s not just a technical shift, it’s a step toward robots that improve themselves through real-world practice. Thanks for sharing, Wenli Xiao !! Paper and demos: —- Weekly robotics and AI insights. Subscribe free:
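
The post says PLD uses residual reinforcement learning to recover from policy failures. The generic idea of a residual policy is that a small learned correction is added on top of a frozen base policy's action instead of replacing it. A minimal sketch under that assumption follows; `residual_action`, the example policies, the `scale` factor, and the action limits are hypothetical placeholders, not the PLD implementation.

```python
from typing import Callable, Sequence

Action = list[float]


def residual_action(obs: Sequence[float],
                    base_policy: Callable[[Sequence[float]], Action],
                    residual_policy: Callable[[Sequence[float], Action], Action],
                    scale: float = 0.1,
                    low: float = -1.0,
                    high: float = 1.0) -> Action:
    """Add a scaled, RL-trained correction to a frozen base action, then clip to limits."""
    base = base_policy(obs)              # pretrained policy, kept frozen
    delta = residual_policy(obs, base)   # learned correction; it also sees the base action
    combined = [b + scale * d for b, d in zip(base, delta)]
    return [min(max(a, low), high) for a in combined]


# Usage with hypothetical stand-in policies.
def demo_base_policy(obs: Sequence[float]) -> Action:
    return [0.5, -0.2]


def demo_residual_policy(obs: Sequence[float], base: Action) -> Action:
    return [1.0, -10.0]


act = residual_action([0.0], demo_base_policy, demo_residual_policy)
```

Keeping `scale` small means the residual can only nudge the base behavior, which is one reason residual RL is considered safer to run on real hardware than training a policy from scratch.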

Ilir Aliu

22,812 views • 8 months ago

In order for robots to be deployed in the real world, performing tasks of real value, they must be reliable. Unfortunately, most robotic demos work maybe 70-80% of the time at best. The way to get better reliability is to do real-world reinforcement learning: having the robot teach itself how to perform the task up to a high level of success. The key to doing this is to start with a core of expert human data, use that to train a policy, then iteratively improve it, until finally finishing with on-policy reinforcement learning. Kun Lei talks through a unified framework for imitation and reinforcement learning based on PPO, which enables this improvement process. In this episode, Kun Lei explains the theory behind his reinforcement learning method and how it allowed his robot to run in a shopping mall juicing oranges for seven hours at a time, alongside experiments on a wide variety of tasks and embodiments. Watch episode 58 of RoboPapers now, hosted by Michael Cho - Rbt/Acc and Chris Paxton!
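
One generic way to express "start from expert data, finish with on-policy PPO" is a loss that adds a behavior-cloning term on demonstration actions to PPO's clipped surrogate objective, with the imitation weight annealed toward zero. This is an assumption for illustration, not Kun Lei's formulation; `ppo_clip_objective`, `combined_loss`, `bc_schedule`, and their defaults are hypothetical.

```python
import math


def ppo_clip_objective(logp_new: float, logp_old: float, advantage: float,
                       clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


def combined_loss(rl_samples: list[tuple[float, float, float]],
                  demo_logps: list[float], bc_weight: float) -> float:
    """Loss to minimize: negative mean PPO objective plus weighted BC negative log-likelihood.

    rl_samples: (logp_new, logp_old, advantage) tuples from on-policy rollouts.
    demo_logps: log-probabilities the current policy assigns to expert demo actions.
    """
    ppo = sum(ppo_clip_objective(*s) for s in rl_samples) / len(rl_samples)
    bc_nll = -sum(demo_logps) / len(demo_logps)
    return -ppo + bc_weight * bc_nll


def bc_schedule(step: int, total_steps: int, start: float = 1.0) -> float:
    """Linearly anneal the imitation weight from `start` to 0, ending in pure RL."""
    return start * max(0.0, 1.0 - step / total_steps)
```

Early in training the BC term keeps the policy close to the demonstrations; as `bc_schedule` reaches zero the update becomes plain PPO on the robot's own experience.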

RoboPapers

18,813 views • 6 months ago

Sundar Pichai reaffirms his belief that AI is the most profound technology humanity will ever create—surpassing even fire, electricity, or the internet. He reflects on whether that view could be influenced by recency bias but ultimately argues that AI’s speed, scope, and potential set it apart from anything in history. What makes AI truly different, Pichai says, is its ability to improve itself and accelerate the act of creation. Unlike past technologies that enhanced human effort, AI can independently generate ideas, build solutions, and evolve. This recursive, compounding effect may place it in a league of its own, making it, in the long run, the greatest productivity multiplier ever.

Wes Roth

57,483 views • 1 year ago

Tencent announces AppAgent Multimodal Agents as Smartphone Users paper page: Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.

AK

343,834 views • 2 years ago

is changing its name. 13 years ago, launched with a simple idea: every student should learn computer science — to learn how technology works and how to create it — not just how to use it. After more than 2 billion hours of learning, the focus of CS has moved from coding to AI. As AI reshapes every part of daily life, students need digital fluency: the ability to understand AI, direct it, question it, and create with it — built on the foundations of computer science, AI science, and data science. Today, enters its next chapter as CodeAI. Our curriculum, teacher training, and frameworks, are there already: AI Discoveries and AI Foundations are free and in classrooms now. The K-12 digital sciences pathway is expanding. Our goal is a generation with agency over the systems shaping their lives — prepared to shape the work, civic life, relationships, and meaning that come after. Welcome to CodeAI.

Hadi Partovi

10,856 views • 1 month ago

Yann LeCun -- Meta's chief AI scientist and a Turing Award winner -- explains why scaling up LLMs will never reach human-level AI, and names the four things today's systems still can't do: "If you think that we're going to get to human-level AI by just training on more data and scaling up LLMs, you're making a mistake." "If you're an investor and you invest in a company that told you we're going to get to human-level AI and PhD level by just training on more data and with a few tricks... that was probably not a good idea." "There are ideas about how to go forward and have systems that are capable of doing what every intelligent animal and human are capable of doing, and that current AI systems are not capable of doing." "Understanding the physical world, having persistent memory, and being able to reason and plan. Those are the four characteristics that need to be there." "That requires systems that can acquire common sense, that can learn from natural sensors like video, as opposed to just text." The entire market is priced on a straight line from bigger models to AGI. One of the people who invented deep learning is telling you the line doesn't reach -- and that the real bottleneck isn't compute, it's world models. If he's right, the winner isn't whoever stacks the most GPUs. It's whoever teaches a machine to learn from video the way a child does.

Karl Mehta

34,676 views • 25 days ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by , experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in their training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM relevant context about private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than having this hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. 
You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improvements in reliability, and incorporate multimodal data. RAG is an important foundational technique. Become good at it through this course! Please sign up here:
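
Reciprocal Rank Fusion, one of the retrieval methods the course compares, merges several ranked lists (for example a BM25 list and a semantic-search list) by giving each document a score of 1/(k + rank) in every list it appears in and summing those scores. A minimal sketch follows; k = 60 is a commonly used default and is an assumption here, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs (ranks are 1-based); return (doc_id, score) sorted desc."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage: hypothetical results from a keyword retriever and an embedding retriever.
bm25_hits = ["doc_returns", "doc_shipping", "doc_warranty"]
semantic_hits = ["doc_warranty", "doc_returns", "doc_sizes"]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

RRF only uses ranks, so it needs no calibration between BM25 scores and cosine similarities; documents that rank reasonably well in both lists (here `doc_returns` and `doc_warranty`) rise above documents found by only one retriever.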

Andrew Ng

124,639 views • 1 year ago

An exciting new course: Fine-tuning and Reinforcement Learning for LLMs: Intro to Post-training, taught by Sharon Zhou, VP of AI at AMD. Available now at:

Post-training is the key technique used by frontier labs to turn a base LLM (a model trained on massive amounts of unlabeled text to predict the next word/token) into a helpful, reliable assistant that can follow instructions. I've also seen many applications where post-training is what turns a demo application that works only 80% of the time into a reliable system that consistently performs. This course will teach you the most important post-training techniques!

In this five-module course, Sharon walks you through the complete post-training pipeline: supervised fine-tuning, reward modeling, RLHF, and techniques like PPO and GRPO. You'll also learn to use LoRA for efficient training, and to design evals that catch problems before and after deployment.

Skills you'll gain:
- Apply supervised fine-tuning and reinforcement learning (RLHF, PPO, GRPO) to align models to desired behaviors
- Use LoRA for efficient fine-tuning without retraining entire models
- Prepare datasets and generate synthetic data for post-training
- Understand how to operate LLM production pipelines, with go/no-go decision points and feedback loops

These advanced methods aren't limited to frontier AI labs anymore, and you can now use them in your own applications. Learn here:
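The LoRA point above (efficient fine-tuning without retraining entire models) comes down to simple arithmetic. LoRA freezes a pretrained weight matrix W (d_out x d_in) and learns a low-rank update B @ A, with B of shape d_out x rank and A of shape rank x d_in, so it trains rank * (d_out + d_in) parameters instead of d_out * d_in. The plain-Python sketch below counts both and shows how the update is merged back into W with the usual alpha / rank scaling. The 4096 x 4096 layer size and rank 8 are assumed values for illustration, not settings from the course.

```python
Matrix = list[list[float]]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # only B (d_out x rank) and A (rank x d_in) are trainable
    return full, lora


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A), the effective weight after LoRA."""
    scale = alpha / rank
    rows, cols = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank)) for j in range(cols)]
        for i in range(rows)
    ]


# Assumed example dimensions: one 4096 x 4096 projection matrix, LoRA rank 8.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full fine-tuning: {full:,} params; LoRA: {lora:,} params ({lora / full:.2%})")
```

For that assumed layer, LoRA trains 65,536 parameters versus 16,777,216, about 0.39%. Because B is typically initialized to zero, the merged weight equals W at the start of training, and only the small A and B matrices change afterwards.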

Andrew Ng

132,304 views • 9 months ago

Scale alone is not enough for AI data. Quality and complexity are equally critical. Excited to support all three for LLM developers with Snorkel AI Data-as-a-Service, and to share our new leaderboard!

Our decade-plus of research and work in AI data points to a simple conclusion: scale alone is not enough. AI success depends on the quality, complexity, and distribution of data, in addition to volume. We're excited to be powering leading LLM developers with Snorkel AI Expert Data-as-a-Service, our white-glove service for custom, expert-level AI datasets, and to now preview some of what we're building via our new Expert Data Leaderboard (🔗 in 🧵) and upcoming OSS dataset releases!

Snorkel Expert Data-as-a-Service is built to meet the rapidly evolving data needs of the agentic AI world, where success depends on the quality, complexity, and distribution of datasets, in addition to size and scale. This kind of high-quality, frontier AI data can only come from a union of technology and human expertise.

With Snorkel Expert Data-as-a-Service, we're powering frontier LLM developers across agentic, expert-knowledge, reasoning, coding, multimodal, and other task types via the combination of two key components:
- (1) The Snorkel Expert Network: A global team of subject matter experts focused wholly on specialized knowledge, spanning thousands of topics in STEM/academic, vertical/professional, and consumer/lifestyle domains.
- (2) Snorkel AI Data Development Platform: Our unique programmatic data curation and quality control platform, which accelerates and improves expert authoring and review through principled techniques developed over the last decade of R&D.

Now we're incredibly excited to showcase some of the power of Snorkel Expert Data-as-a-Service via the new Snorkel Leaderboard, putting frontier models to the test in complex, agentic, and reasoning settings inspired by real industry scenarios (not esoteric puzzles)!
We'll be releasing new leaderboards and accompanying expert-verified open-source datasets (coming soon!) regularly. To start, we're sharing three initial ones in preview:
- SnorkelFinance: Q&A over financial documents requiring agentic tool-calling and reasoning
- SnorkelUnderwrite: Agentic insurance tasks requiring industry-specific reasoning and tool use
- SnorkelSequences: Mathematical tasks requiring compositional multi-step reasoning

Alex Ratner

495,851 views • 1 year ago