Introducing 🌈 Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs. It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety 🦺. Co-led w/ Sharath Raparthy & Andrei Lupu.

Mikayel Samvelyan

56,409 views • 2 years ago • via X (Twitter)

Mikayel Samvelyan • 2 years ago

We employ Quality-Diversity, an evolutionary search framework, to iteratively populate an archive (a discrete grid spanning the dimensions of interest for diversity, e.g. Risk Category & Attack Style) with prompts that are increasingly effective at eliciting undesirable behaviours.
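
A minimal Python sketch of such an archive, assuming a MAP-Elites-style rule in which each grid cell keeps a single "elite" prompt and a newcomer replaces it only if it scores higher. The risk categories, attack styles and the numeric `score` are illustrative assumptions, not the descriptors or scoring actually used by Rainbow Teaming.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical feature values; real descriptors are chosen per domain.
RISK_CATEGORIES = ("fraud", "violence", "misinformation")
ATTACK_STYLES = ("role_play", "misspellings", "hypotheticals")

Cell = Tuple[str, str]  # (risk category, attack style)


@dataclass
class Elite:
    prompt: str
    score: float  # hypothetical effectiveness score; higher = more effective


class PromptArchive:
    """Discrete grid over the feature dimensions, at most one elite per cell."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, Optional[Elite]] = {
            cell: None for cell in product(RISK_CATEGORIES, ATTACK_STYLES)
        }

    def try_insert(self, cell: Cell, prompt: str, score: float) -> bool:
        """Store the prompt if its cell is empty or it beats the current elite.

        Raises KeyError if `cell` is not part of the grid.
        """
        current = self.cells[cell]
        if current is None or score > current.score:
            self.cells[cell] = Elite(prompt, score)
            return True
        return False

    def coverage(self) -> float:
        """Fraction of grid cells that currently hold a prompt."""
        filled = sum(elite is not None for elite in self.cells.values())
        return filled / len(self.cells)


archive = PromptArchive()
archive.try_insert(("fraud", "role_play"), "seed prompt", 0.2)
print(archive.coverage())  # 1 of 9 cells filled → 0.1111111111111111
```

Keeping only one prompt per cell is what forces diversity: a stronger prompt can only displace a weaker one that shares its features, never one in a different cell.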

Rainbow Teaming only requires 3 building blocks:
1. Feature descriptors for diversity
2. A mutation operator to evolve prompts
3. A preference model (a judge) for ranking prompts
An open-ended cycle of selection, mutation & evaluation then endlessly refines the prompt archive 🔁
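
The three building blocks and the selection, mutation & evaluation cycle can be sketched as a loop. This is an illustrative sketch, not the authors' implementation: `toy_mutate` and `toy_judge_prefers` are hypothetical stand-ins for an LLM mutator and an LLM judge, the judge is assumed to compare a candidate against the current occupant of its target cell, and any additional filtering or sampling details of the real method are omitted.

```python
import random
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Cell = Tuple[str, ...]


def rainbow_style_loop(
    seed_prompt: str,
    feature_axes: Sequence[Sequence[str]],
    mutate: Callable[[str, Cell], str],
    judge_prefers: Callable[[str, str], bool],
    iterations: int,
    rng: random.Random,
) -> Dict[Cell, str]:
    """Repeat selection -> mutation -> evaluation `iterations` times."""
    cells = list(product(*feature_axes))
    archive: Dict[Cell, str] = {}
    for _ in range(iterations):
        # 1. Selection: sample a parent from the archive (the seed while it is empty).
        parent = rng.choice(list(archive.values())) if archive else seed_prompt
        # 2. Mutation: pick a target cell and steer the parent towards its features.
        target = rng.choice(cells)
        child = mutate(parent, target)
        # 3. Evaluation: the judge compares the child with the cell's current elite.
        incumbent = archive.get(target)
        if incumbent is None or judge_prefers(child, incumbent):
            archive[target] = child
    return archive


# Toy stand-ins (hypothetical); a real setup would call a mutator LLM and a judge LLM.
def toy_mutate(parent: str, target: Cell) -> str:
    return f"{parent} [{'/'.join(target)}]"


def toy_judge_prefers(candidate: str, incumbent: str) -> bool:
    return len(candidate) > len(incumbent)  # placeholder preference, not a safety judge


axes = [("fraud", "violence"), ("role_play", "misspellings")]
result = rainbow_style_loop("seed prompt", axes, toy_mutate, toy_judge_prefers,
                            iterations=20, rng=random.Random(0))
for cell, prompt in sorted(result.items()):
    print(cell, "->", prompt[:60])
```

Because the judge only ranks two prompts at a time, this loop never needs an absolute effectiveness score, only relative comparisons.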

🌈 Rainbow Teaming thrives on open-ended evolution: each iteration of prompts builds on the last, forming stepping stones towards an ever-evolving spectrum of attacks. From a single seed, we generate countless diverse prompts, each tailored to distinct features of interest.

Existing methods for red teaming tend to focus on specific domains, lack diversity, or require extensive human annotations. In contrast, Rainbow Teaming is a domain-agnostic black-box method for automatically producing a diverse and effective collection of adversarial prompts.

Our experiments with Llama 2-chat models uncover hundreds of effective adversarial prompts in the safety domain, achieving an attack success rate of ~90% across all model sizes. Although we focus on Llama 2, the method can in principle be applied to any LLM with only black-box access.
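
Attack success rate can be read as the fraction of prompts whose responses a safety evaluator flags as unsafe. A minimal sketch under that assumption; `stub_model` and `stub_judge` are hypothetical placeholders for a real model endpoint and safety evaluator, and the exact evaluation protocol behind the ~90% figure is not described in this thread.

```python
from typing import Callable, Iterable

Model = Callable[[str], str]              # prompt -> response
SafetyJudge = Callable[[str, str], bool]  # (prompt, response) -> True if unsafe


def attack_success_rate(prompts: Iterable[str], target: Model, is_unsafe: SafetyJudge) -> float:
    """Fraction of prompts whose response from `target` is judged unsafe."""
    prompt_list = list(prompts)
    if not prompt_list:
        raise ValueError("attack success rate is undefined for an empty prompt set")
    successes = sum(is_unsafe(p, target(p)) for p in prompt_list)
    return successes / len(prompt_list)


# Hypothetical stubs standing in for a real model endpoint and safety evaluator.
def stub_model(prompt: str) -> str:
    return "UNSAFE" if "jailbreak" in prompt else "I can't help with that."


def stub_judge(prompt: str, response: str) -> bool:
    return response == "UNSAFE"


print(attack_success_rate(["jailbreak A", "benign", "jailbreak B", "jailbreak C"],
                          stub_model, stub_judge))
# → 0.75
```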

Rainbow Teaming-generated prompts are also transferable! Adversarial prompts produced for smaller models also transfer to larger ones, which can save computational resources compared to optimising directly against the larger targets.
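
One way to check transfer is to replay an archive found against one model on other models and compare attack success rates. A minimal sketch, assuming attack success rate is the fraction of prompts judged unsafe; the model names, stub models and judge below are hypothetical.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]              # prompt -> response
SafetyJudge = Callable[[str, str], bool]  # (prompt, response) -> True if unsafe


def transfer_matrix(
    archives: Dict[str, List[str]],
    targets: Dict[str, Model],
    is_unsafe: SafetyJudge,
) -> Dict[str, Dict[str, float]]:
    """Attack success rate of each source model's archive replayed on every target."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, prompts in archives.items():
        if not prompts:
            raise ValueError(f"archive for {source!r} is empty")
        matrix[source] = {
            name: sum(is_unsafe(p, model(p)) for p in prompts) / len(prompts)
            for name, model in targets.items()
        }
    return matrix


# Hypothetical stubs: the "large" model resists prompts containing only a single "x".
def small_model(prompt: str) -> str:
    return "UNSAFE" if "x" in prompt else "refused"


def large_model(prompt: str) -> str:
    return "UNSAFE" if "xx" in prompt else "refused"


def stub_judge(prompt: str, response: str) -> bool:
    return response == "UNSAFE"


print(transfer_matrix({"small": ["x", "xx", "y", "xxx"]},
                      {"small": small_model, "large": large_model}, stub_judge))
# → {'small': {'small': 0.75, 'large': 0.5}}
```

Only the row for the small source model has to be generated; the larger model is merely queried, which is where the compute saving would come from.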

Fine-tuning models on synthetic data generated by Rainbow Teaming significantly improves their safety against previously unseen attacks, without compromising their general capabilities or helpfulness. A win-win! 📈

Furthermore, applying Rainbow Teaming again to the fine-tuned model reduces the attack success rate by ~50%, paving the way to iterative self-improvement.

Not just for safety! Rainbow Teaming shows its true colours in other domains, such as question answering, where it populates a 3D archive with adversarial trivia questions that are tough for models like Llama 2-chat 7B, but answerable by more capable versions like 70B. 📚❓
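
The question-answering criterion can be expressed as a simple predicate: a question qualifies if the weaker model gets it wrong while the stronger one gets it right. A sketch assuming exact-match grading and hypothetical 3D cell axes; `weak_model` and `strong_model` are stubs standing in for models such as Llama 2-chat 7B and 70B, and the archive update is deliberately simplified.

```python
from typing import Callable, Dict, Tuple

Model = Callable[[str], str]
Cell3D = Tuple[str, str, str]  # hypothetical axes, e.g. (topic, question word, length bucket)


def answers_correctly(model: Model, question: str, reference: str) -> bool:
    # Naive exact-match grading (an assumption; real grading may be fuzzier).
    return model(question).strip().lower() == reference.strip().lower()


def is_adversarial(question: str, reference: str, weak: Model, strong: Model) -> bool:
    """Hard for the weaker model, yet answerable by the stronger one."""
    return (not answers_correctly(weak, question, reference)
            and answers_correctly(strong, question, reference))


def add_if_adversarial(archive: Dict[Cell3D, str], cell: Cell3D, question: str,
                       reference: str, weak: Model, strong: Model) -> bool:
    """Fill an empty 3D cell with the question if it passes the test above.

    Simplification: the first qualifying question wins; a full archive would
    instead compare it against the cell's current occupant.
    """
    if cell in archive or not is_adversarial(question, reference, weak, strong):
        return False
    archive[cell] = question
    return True


# Hypothetical stub models.
def weak_model(question: str) -> str:
    return "Lyon"


def strong_model(question: str) -> str:
    return "Paris"


qa_archive: Dict[Cell3D, str] = {}
print(add_if_adversarial(qa_archive, ("geography", "what", "short"),
                         "What is the capital of France?", "Paris",
                         weak_model, strong_model))
# → True
```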

Rainbow Teaming also excels in cybersecurity. Focusing on MITRE ATT&CK categories, it reveals vulnerabilities, including writing insecure code or aiding cyberattacks, in every model we experimented with. 🌐🔒

A huge shoutout to our stellar team: @erichammy @aramHmarkosyan Manish Bhatt @yuning_pro @MinqiJiang @jparkerholder @j_foerst @_rockt @robertarail for their exceptional work! 🙌

We also extend our deepest gratitude to FAIR leadership @jpineau1 @ylecun @NailaMurray @nicola_cancedda for championing open science and supporting exploratory research by PhD students.📚🎓

Like Rainbow Teaming, we build on stepping stones (of ideas) generated by trailblazing visionaries like @kenneth0stanley @jeffclune @joelbot3000 (and many others!), and hope that ideas from open-endedness can further improve the safety of foundation models @EthanJPerez @janleike @yaringal @sleepinyourhat @JacobSteinhardt @jayelmnop @herbiebradley

To learn more about 🌈 Rainbow Teaming, check out Paper: Website:

Fun fact: the idea for this project emerged unexpectedly while we were creating adversarial scenarios for the state-of-the-art bot in a football video game 🎮⚽. Just another real-life example of 'Why Greatness Cannot Be Planned' by @kenneth0stanley & @joelbot3000.
