💡Divergent thinking💡 is a hallmark of human creativity and problem-solving. 🤖Can LLMs also do divergent reasoning to generate diverse solutions🤔? Introducing Flow-of-Reasoning (FoR) 🌊, a data-efficient way of training an LLM policy to generate diverse, high-quality reasoning trajectories. Unlike existing RL (like PPO) and planning (like MCTS) methods, which find the max-reward trajectory (akin to convergent thinking), FoR connects LLM reasoning with the #GFlowNet formulation and enables LLMs to sample trajectories with probability proportional to their reward. 🎬The demo video illustrates how FoR learns and infers multiple solutions to a ♠️Game24 puzzle. 🎯Inferring diverse solutions could be useful for robustness, data augmentation, and enhanced model generalization. Project page: Paper: Github:
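The contrast between max-reward search and reward-proportional sampling can be illustrated with a toy sketch. This is not the FoR implementation: the trajectory names and reward values below are made up for illustration, and the sampler draws directly from the known rewards rather than from a trained LLM policy.

```python
import random
from collections import Counter

# Hypothetical rewards for four candidate reasoning trajectories (illustrative values only).
REWARDS = {"traj_a": 4.0, "traj_b": 3.0, "traj_c": 2.0, "traj_d": 1.0}


def pick_max_reward(rewards):
    """Convergent behaviour: always return the single highest-reward trajectory."""
    return max(rewards, key=rewards.get)


def target_distribution(rewards):
    """GFlowNet-style target: p(trajectory) = R(trajectory) / sum of all rewards."""
    total = sum(rewards.values())
    return {name: r / total for name, r in rewards.items()}


def sample_proportional(rewards, n, seed=0):
    """Divergent behaviour: draw n trajectories with probability proportional to reward."""
    rng = random.Random(seed)
    names = list(rewards)
    weights = [rewards[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))


if __name__ == "__main__":
    print("max-reward pick:", pick_max_reward(REWARDS))
    print("target distribution:", target_distribution(REWARDS))
    print("sample counts:", sample_proportional(REWARDS, n=10_000))
```

The max-reward pick returns one trajectory every time, while the proportional sampler keeps visiting lower-reward but still valid trajectories, which is the source of the diversity the post describes.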

Lianhui Qin

7,327 subscribers

50,447 views • 2 years ago • via X (Twitter)

Comments: 10

Lianhui Qin • 2 years ago

On BlocksWorld, FoR produces both more diverse and higher-quality reasoning trajectories than CoT, Tree-of-Thoughts, RAP (MCTS), Supervised Finetuning (SFT), and PPO.

Lianhui Qin • 2 years ago

FoR is very data-efficient. With only 15 training examples, FoR achieves much better accuracy and diversity than SFT using more data.

Lianhui Qin • 2 years ago

Thanks to our amazing students: Fangxu Yu @nerv_599164778, Lai Jiang @pero733858111, Haoqiang Kang @haoqik322, Shibo Hao @Ber18791531

Dinghuai Zhang 张鼎怀 • 2 years ago

Interesting work! I suppose one cannot use very long trajectories for training here due to GPU memory constraints? Transition-based objectives should be more appropriate for large models.

BensenHsu • 2 years ago

The flow-based formulation allows FoR (Flow of Reasoning) to adapt successful GFlowNet approaches for efficient LLM policy training. The diverse sampling enabled by the trajectory balance objective and the exploration mechanisms lead to the superior performance of FoR compared to other methods. full paper:
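For readers unfamiliar with the trajectory balance objective mentioned above, here is a minimal sketch of its standard form from the GFlowNet literature: the squared difference between log Z plus the summed forward log-probabilities and log R plus the summed backward log-probabilities. The function name, arguments, and numbers are illustrative assumptions, not the FoR codebase.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Trajectory balance loss for a single trajectory.

    log_z         -- learned estimate of the log partition function
    log_pf_steps  -- forward-policy log-probabilities, one per step
    log_pb_steps  -- backward-policy log-probabilities, one per step
    reward        -- positive reward of the terminal state
    """
    forward_flow = log_z + sum(log_pf_steps)
    backward_flow = math.log(reward) + sum(log_pb_steps)
    return (forward_flow - backward_flow) ** 2


if __name__ == "__main__":
    # Illustrative numbers: total reward mass Z = 10, a two-step trajectory with
    # forward probabilities 0.8 and 0.5, reward 4, and a tree-structured state
    # space (each state has one parent, so every backward log-probability is 0).
    balanced = trajectory_balance_loss(
        log_z=math.log(10.0),
        log_pf_steps=[math.log(0.8), math.log(0.5)],
        log_pb_steps=[0.0, 0.0],
        reward=4.0,
    )
    print("loss when policy matches R/Z:", balanced)
```

The loss is zero exactly when the policy assigns the trajectory probability R/Z. Because the sum runs over every step of the trajectory, the whole trajectory has to be held in memory for one gradient step, which is the constraint the comment above raises about long trajectories.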

Nando Fioretto • 2 years ago

cool idea!

Lianhui Qin • 2 years ago

Thanks!!

Bluetick Consultants Inc. • 2 years ago

Flow-of-Reasoning (FoR) can transform how LLMs approach problem-solving by fostering divergent thinking. Excited to see its applications in robustness, data augmentation, and model generalization.

FreeMind • 2 years ago

What is the dataset like?

Lianhui Qin • 2 years ago

It's text-based

Related videos

Very powerful but very simple prompting technique. Simply ask the LLM to re-read the question - and this significantly boosts LLM reasoning across diverse tasks and model types. 💡 Repeating the question input twice in the prompt unlocks latent reasoning potential.

Original Problem 🤔: Decoder-only LLMs with unidirectional attention struggle with nuanced reasoning tasks due to limited global understanding of input questions.

Key Insights from this Paper 💡:
• Re-reading (RE2) input enhances reasoning by improving question comprehension
• Enables "bidirectional" understanding in unidirectional LLMs
• Compatible with existing thought-eliciting prompting methods
• Effective across various LLM types and reasoning tasks
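A re-reading prompt of this kind can be sketched in a few lines. The exact wording of the template (the "Read the question again:" cue and the default thought-eliciting suffix) is an assumption for illustration; the post only states that the question is repeated twice and that the method combines with thought-eliciting prompts.

```python
def build_re2_prompt(question: str, thought_prompt: str = "Let's think step by step.") -> str:
    """Build a prompt that states the question, repeats it, then elicits reasoning.

    The template wording is a hypothetical choice, not a fixed specification.
    """
    question = question.strip()
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        f"A: {thought_prompt}"
    )


if __name__ == "__main__":
    print(build_re2_prompt("Use the numbers 4, 7, 8, 8 to make 24."))
```

The second copy of the question lets every token of the repeated question attend to the full first copy, which is the "bidirectional" effect the post refers to.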

Rohan Paul

169,522 views • 1 year ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO!

Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks.

In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.

By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:
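Three of the pieces named above (a Python reward function, group-relative advantages, and clipping) can be sketched as follows. This is not the course's or Predibase's code: `wordle_reward` is a hypothetical reward, the advantage normalization uses the population standard deviation as one common choice, and the KL-divergence term is left out for brevity.

```python
import statistics


def wordle_reward(guess: str, secret: str) -> float:
    """Hypothetical reward: fraction of letters placed in the correct position."""
    if len(guess) != len(secret):
        return 0.0  # malformed guesses earn nothing
    return sum(g == s for g, s in zip(guess, secret)) / len(secret)


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_term(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one token: min(ratio * A, clip(ratio) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    guesses = ["crane", "crate", "slate", "pious"]  # one group of sampled responses
    rewards = [wordle_reward(g, "crate") for g in guesses]
    for guess, adv in zip(guesses, group_relative_advantages(rewards)):
        print(guess, round(adv, 3))
```

Because advantages are computed within the group, no separate value model is needed: a response is rewarded for being better than its siblings, not for its absolute score.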

Andrew Ng

86,457 views • 1 year ago

Reasoning LLMs Guide [Full Video - Unedited] 1 hr talk on reasoning LLMs and how to best use them for different applications. Share with your devs & students. I discuss lots of fun ideas like meta-prompting, LLM-as-a-Judge, use cases, prompting tips, and much more.

elvis

61,360 views • 1 year ago

The DeepSeek-R1 paper is a gem! Highly encourage everyone to read it. It's clear that LLM reasoning capabilities can be learned in different ways. RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties. There is more to RL than meets the eye! Here is my breakdown of the paper along with a few tests: The multi-stage training might not make sense initially, but it provides clues on optimizations that we can continue to tap into. Data quality is still very important for enhancing the usability of the LLM. Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open so we can build on top of it. This opens up exciting research opportunities. About the attached clip: the previous preview model wasn't able to solve this task. DeepSeek-R1 can solve this and many other tasks that o1 can solve. It's a very good model for coding and math.

elvis

140,692 views • 1 year ago

Geoffrey Hinton says the current path of scaling is hitting a limit. Most high-value data is locked inside companies, and the "free internet" is largely exhausted. The solution is for models to generate their own training data through reasoning: "that's how AlphaGo beat humans"

Haider.

149,374 views • 7 months ago

Like LLMs, autoregressive transformers for action tokens need a reasoning layer to reduce hallucinations and boost reliability. Grounding this layer in the physics of the action space using DVBFs makes for scalable, task-agnostic training—far simpler than creating RL reward functions for each task. Learn more about our novel approach in the video below 👇🏼

Sankaet

52,565 views • 1 year ago

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: Paper: Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting. Enter our RLTs—a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students. Remarkably, an RLT with only 7B parameters produces superior results when distilling and cold-starting students in competitive and graduate-level reasoning tasks than orders-of-magnitude larger LLMs. RLTs are as effective even when distilling 32B students, much larger than the teacher itself—unlocking a new standard for efficiency in developing reasoning language models with RL. Code:

Sakana AI

179,276 views • 1 year ago

OpenAI just announced API access to o1 (advanced reasoning model) yesterday. I'm delighted to announce today a new short course, Reasoning with o1, built with OpenAI, and taught by Colin Jarvis, Head of AI Solutions at OpenAI, to show you how to use this effectively!

Unlike previous language models which generate output directly, o1 “thinks before it responds,” and generates many reasoning tokens before returning a more thoughtful and accurate response. It is great at complex reasoning -- including planning for agentic workflows, coding, and domain-specific reasoning in fields like law and STEM. But how you should use it is quite different from other LLMs.

I think o1 will be a game changer for many AI applications; and in this course, you'll learn how to use it effectively.

In detail, you’ll:
- Learn to recognize what tasks o1 is suited for, and when to use a smaller model, or combine o1 with a smaller model
- Understand the new principles of prompting reasoning models: Be simple and direct; no explicit chain-of-thought required; use structure; show rather than tell
- Implement multi-step orchestration in which o1 plans, and hands tasks over to gpt-4o-mini to execute specific steps; this illustrates a design pattern to optimize intelligence (accuracy) and cost
- Use o1 for a coding task to build a new application, edit existing code, and test performance by running a coding competition between o1-mini and GPT 4o
- Use o1 for image understanding and learn how it performs better with a "hierarchy of reasoning," in which it incurs the latency and cost upfront, preprocessing the image and indexing it with rich details so it can be used for Q&A later
- Learn a technique called meta-prompting, in which you use o1 to improve your prompts. Using a customer support evaluation set, you'll iteratively use o1 to modify a prompt to improve performance

You'll also learn about how OpenAI used reinforcement learning to produce a model that uses "test-time compute" to improve performance. I think you'll find this course enjoyable and valuable. Please sign up for it here:
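The plan-then-execute orchestration pattern described above can be sketched without any model API. In this hedged example the planner and executor are plain callables; in practice the planner would wrap a call to a larger reasoning model and the executor a call to a smaller, cheaper model. The stub functions are hypothetical stand-ins and make no real API calls.

```python
from typing import Callable, List


def orchestrate(
    task: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str], str],
) -> List[str]:
    """Ask the planner once for a list of steps, then run each step through the executor."""
    steps = planner(task)
    return [executor(step) for step in steps]


def stub_planner(task: str) -> List[str]:
    """Hypothetical stand-in for an expensive reasoning model that returns a plan."""
    return [f"{task}: step {i}" for i in (1, 2, 3)]


def stub_executor(step: str) -> str:
    """Hypothetical stand-in for a cheaper model that carries out one step."""
    return f"done({step})"


if __name__ == "__main__":
    for result in orchestrate("ship order", stub_planner, stub_executor):
        print(result)
```

The cost argument is visible in the call counts: the expensive planner runs once per task, while the cheap executor runs once per step.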

Andrew Ng

357,661 views • 1 year ago

Hiring RL Engineer! Started off as a curious project at Lossfunk to push the boundaries of LLMs in social reasoning - we are now building RL environments, data, and benchmarks to simulate more real-world scenarios. If you want to train SoTA RL models over multi-GPUs (H200s/B200s) to unlock the next AI frontier, this is for you.

Satpal Singh Rathore

45,974 views • 11 months ago

New short course on Reinforcement Learning from Human Feedback! RLHF is one of the key techniques that led to the rise of modern LLMs. It is used to align LLMs with human preferences, to make them more honest, helpful and harmless, by (i) learning a reward function that mimics human preferences, as expressed in human-provided labels, then, (ii) tuning an LLM to generate outputs that receive a high reward. In this course, taught by Nikita Namjoshi, Developer Advocate for GenAI at Google Cloud, you'll learn the details of how RLHF works, including how to apply it to tune an LLM for your own applications. You'll also use an open source library to tune a base LLM to align with human preferences expressed in a training set, and evaluate the tuned model by comparing its responses before and after RLHF-tuning. Please sign up here!
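Step (i), learning a reward function from human preference labels, is commonly done with a pairwise (Bradley-Terry style) loss. The sketch below shows that loss for one labeled pair; it is a generic illustration and an assumption about the setup, since the post does not say which loss or library the course uses.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred response scores higher.

    Equals -log(sigmoid(reward_chosen - reward_rejected)), computed in a
    numerically stable way for large positive or negative differences.
    """
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    print("reward model agrees with the label:   ", pairwise_preference_loss(2.0, 0.0))
    print("reward model is indifferent:          ", pairwise_preference_loss(0.0, 0.0))
    print("reward model disagrees with the label:", pairwise_preference_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the reward model to score preferred responses higher, and step (ii) then tunes the LLM against that learned score.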

Andrew Ng

205,542 views • 2 years ago

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models paper page: github: Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. Additionally, our method naturally allows dialog-based scene specification and is able to handle prompts in a language that is not well-supported by the underlying diffusion model.

AK

83,657 views • 3 years ago

Scale alone is not enough for AI data. Quality and complexity are equally critical. Excited to support all of these for LLM developers with Snorkel AI Data-as-a-Service, and to share our new leaderboard!

Our decade-plus of research and work in AI data has a simple point: scale alone is not enough. AI success is all about the quality, complexity, and distribution of data, in addition to volume.

We’re excited to be powering leading LLM developers with Snorkel AI Expert Data-as-a-Service, our white glove service for custom, expert-level AI datasets, and to now preview some of what we’re building via our new Expert Data Leaderboard (🔗 in 🧵) + upcoming OSS dataset releases!

Snorkel Expert Data-as-a-Service is built to meet the rapidly evolving data needs of the agentic AI world, where success is built on the quality, complexity, and distribution of datasets, in addition to size and scale. This kind of high-quality, frontier AI data can only come from a union of technology and human expertise.

With Snorkel Expert Data-as-a-Service, we’re powering frontier LLM developers across agentic, expert knowledge, reasoning, coding, multi-modal, and other task types via the combination of these two key components:
- (1) The Snorkel Expert Network: A global team of subject matter experts focused wholly on specialized knowledge, spanning thousands of topics in STEM/academic, vertical/professional, and consumer/lifestyle domains.
- (2) Snorkel AI Data Development Platform: Our unique programmatic data curation and quality control platform, accelerating and improving expert authoring and review through principled techniques developed over the last decade of R&D.

Now: we’re incredibly excited to showcase some of the power of Snorkel Expert Data-as-a-Service via the new Snorkel Leaderboard, putting frontier models to the test in complex, agentic, and reasoning settings inspired by real industry scenarios (not esoteric puzzles)! We’ll be releasing new leaderboards and accompanying expert-verified open source datasets (coming soon!) regularly. To start, we’re sharing three initial ones in preview:
- SnorkelFinance: Q&A over financial documents requiring agentic tool-calling and reasoning
- SnorkelUnderwrite: Agentic insurance tasks requiring industry-specific reasoning and tool use
- SnorkelSequences: Mathematical tasks requiring compositional multi-step reasoning

Alex Ratner

495,851 views • 1 year ago

#AGIBOTAIWeek Day 3: Introducing GO-2: the next-generation foundation model for embodied AI, built to unify reasoning and action. To truly bridge “thinking” and “doing,” embodied AI must solve two challenges at once:
• It must generate executable action plans through deep spatial reasoning
• It must deliver stable execution in real-world environments
GO-2 tackles both with a comprehensive architecture: Action Chain-of-Thought for action reasoning, and an Asynchronous Dual-System for robust execution. #AGIBOT #AGIBOTAIWeek #Foundation #model #EmbodiedAI Official account tags X: AGIBOT LinkedIn: agibot

Hasan Toor

36,202 views • 3 months ago

English Is The New SQL LLMs have made it easy for everyone to create on-the-fly dashboards and plots. You can connect to your database and then use a SOTA LLM like 4o, Opus, or Gemini to generate SQL and Python code to create on-the-fly dashboards and plots. ChatLLM makes it possible across all state-of-the-art LLMs -

Bindu Reddy

50,564 views • 2 years ago

NEWS: NVIDIA just announced Alpamayo, what CEO Jensen Huang calls the world’s first thinking, reasoning autonomous vehicle AI, launching on U.S. roads later this year, starting with the Mercedes CLA. Jensen: "It's trained end-to-end. Literally from camera in to actuation out; It reasons what action it is about to take, the reason by which it came about that action, and the trajectory." Alpamayo introduces Vision-Language-Action (VLA) models, which enable self-driving systems to interpret what they see, reason about complex driving scenarios, and generate driving actions. The platform includes large reasoning models, simulation tools for testing rare and edge-case scenarios, and open datasets for training and validation. NVIDIA says the approach improves transparency, safety, and robustness in autonomous systems, particularly in complex real-world environments, and supports progress toward higher levels of vehicle autonomy: "With a 10-billion-parameter architecture, Alpamayo 1 uses video input to generate trajectories alongside reasoning traces, showing the logic behind each decision. Developers can adapt Alpamayo 1 into smaller runtime models for vehicle development, or use it as a foundation for AV development tools such as reasoning-based evaluators and auto-labeling systems. Alpamayo 1 provides open model weights and open-source inferencing scripts. Future models in the family will feature larger parameter counts, more detailed reasoning capabilities, more input and output flexibility, and options for commercial usage."

Sawyer Merritt

1,603,680 views • 6 months ago

New Course: Post-training of LLMs

Learn to post-train and customize an LLM in this short course, taught by Banghua Zhu, Assistant Professor at the University of Washington and co-founder of @NexusflowX.

Training an LLM to follow instructions or answer questions has two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning.

Post-training transforms a general-purpose token predictor, trained on trillions of unlabeled text tokens, into an assistant that follows instructions and performs specific tasks. Because it is much cheaper than pre-training, many more teams can practically incorporate post-training methods into their workflows.

In this course, you’ll learn three common post-training methods and how to use each one effectively: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL). With SFT, you train the model on pairs of input and ideal output responses. With DPO, you provide both a preferred (chosen) and a less preferred (rejected) response and train the model to favor the preferred output. With RL, the model generates an output, receives a reward score based on human or automated feedback, and is updated to improve performance.

You’ll learn the basic concepts, common use cases, and principles for curating high-quality data for effective training. Through hands-on labs, you’ll download a pre-trained model from Hugging Face and post-train it using SFT, DPO, and RL to see how each technique shapes model behavior.

In detail, you’ll:
- Understand what post-training is, when to use it, and how it differs from pre-training.
- Build an SFT pipeline to turn a base model into an instruct model.
- Explore how DPO reshapes behavior by minimizing a contrastive loss that penalizes poor responses and reinforces preferred ones.
- Implement a DPO pipeline to change the identity of a chat assistant.
- Learn online RL methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), and how to design reward functions.
- Train a model with GRPO to improve its math capabilities using a verifiable reward.

Post-training is one of the most rapidly developing areas of LLM training. Whether you’re building a high-accuracy context-specific assistant, fine-tuning a model's tone, or improving task-specific accuracy, this course will give you experience with the most important techniques shaping how LLMs are post-trained today.

Please sign up here:
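The DPO idea described in this post (a chosen and a rejected response, with the model trained to favor the chosen one) can be sketched as a per-example loss. This is a minimal illustration of the commonly published DPO objective, not the course's lab code. The function name `dpo_loss`, its argument names, and the `beta` default are assumptions made for this example, and each log-probability is assumed to be already summed over the response tokens.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin))).

    All names here are hypothetical. Inputs are sequence log-probabilities
    under the trainable policy and under a frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Contrastive term: positive when the policy prefers the chosen response
    # more strongly than the reference does.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# The policy has shifted probability toward the chosen response.
good = dpo_loss(-5.0, -9.0, -6.0, -6.0)
# The policy has shifted probability toward the rejected response.
bad = dpo_loss(-9.0, -5.0, -6.0, -6.0)
print(good < bad)
```

When the policy and the reference agree exactly, the loss equals log 2. It falls below that as the policy favors the chosen response and rises above it when the policy favors the rejected one. In practice the same expression would be computed on batched tensors inside a training framework.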

Andrew Ng

125,146 views • 1 year ago

Introducing `AutoRL` 📈 The world's simplest way to train a task-specific LLM with RL. *Just write a SENTENCE describing the model you want.* A chain of AI systems will generate data + rubrics and train a model for you. Powered by ART, it's open source. Link in thread:

Matt Shumer

150,107 views • 1 year ago

Data preprocessing is critical for building effective RAG systems. Our new short course, Preprocessing Unstructured Data for LLM Applications, taught by Matt Robinson of Unstructured, demonstrates important but sometimes overlooked aspects of RAG systems:
- Extracting and normalizing content from diverse formats like PDF, PowerPoint, and HTML to expand your LLM's knowledge
- Enriching data with metadata to enable more powerful retrieval and reasoning
- Applying document layout analysis and vision transformers to process embedded images and tables

Then you’ll use all these skills to build a RAG bot that draws from a corpus that includes PDF, PowerPoint, and Markdown documents.

Please sign up here:

Andrew Ng

150,317 views • 2 years ago

We have 13B hours of player interaction on the platform monthly. This data enables us to train intelligent NPCs that can reason and interact in 3D worlds. Our training goes beyond videos of gameplay and simple WASD actions, utilizing our full data model for a more detailed representation of human interactions. Our video below shows Roblox NPCs figuring out how to build a campfire by reasoning backwards to find an axe, cut down a tree, and bring the wood to the firepit. This is still early research but we imagine a future where intelligent NPCs could play alongside real players. 4/4

Roblox

288,109 views • 5 months ago