Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By...
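
To make "exploring in parameter space" concrete, here is a minimal, generic Evolution Strategies update in the style of OpenAI-ES, written in Python with NumPy. It is not the authors' framework. The 10-parameter quadratic reward stands in for an LLM's task reward, and `es_step`, `sigma`, `lr` and `pop_size` are illustrative choices. Each step samples Gaussian perturbations of the whole parameter vector, scores every perturbed copy, and moves toward the better-scoring directions without any backpropagation.

```python
import numpy as np


def es_step(theta, reward_fn, rng, sigma=0.05, lr=0.01, pop_size=32):
    """One Evolution Strategies update: perturb the parameters, score each
    perturbation, and move toward the better-scoring directions. No backprop."""
    half = pop_size // 2
    eps = rng.standard_normal((half, theta.size))
    eps = np.concatenate([eps, -eps])  # antithetic (mirrored) noise lowers variance
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    # Normalize rewards so the update size does not depend on the reward scale.
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_est = eps.T @ rewards / (len(eps) * sigma)  # search-gradient estimate
    return theta + lr * grad_est


# Toy usage: a "model" with 10 parameters whose reward peaks at `target`.
rng = np.random.default_rng(0)
target = np.linspace(-1.0, 1.0, 10)
reward = lambda p: -np.sum((p - target) ** 2)

theta = np.zeros(10)
initial_dist = np.linalg.norm(theta - target)
for _ in range(300):
    theta = es_step(theta, reward, rng)
final_dist = np.linalg.norm(theta - target)
print(f"distance to optimum: {initial_dist:.3f} -> {final_dist:.3f}")
```

Mirrored noise and reward normalization are common variance-reduction choices. Large-scale ES implementations typically regenerate each perturbation from a random seed rather than storing it, which keeps memory close to that of inference.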

Yulu Gan

3,732 subscribers

415,255 views • 9 months ago • via X (Twitter)

Science & Technology, Education

Similar Videos

🚀New paper out - We present Video-MSG (Multimodal Sketch Guidance), a novel planning-based training-free guidance method for T2V models, improving control of spatial layout and object trajectories. 🔧 Key idea: • Generate a Video Sketch — a spatio-temporal plan with background, foreground, and motion in the pixel space. • Encode this structure directly into the latent space of the diffusion model during generation, which does not require fine-tuning or additional memory during inference. 🧵

Jialu Li

35,060 views • 1 year ago

Full Fine-tuning vs. Freezing Layers. == Full Fine-tuning == A real network has many layers: three in this example, billions of parameters in a production model. What does fine-tuning look like when you update all of them? That's full fine-tuning: continue training every weight in the pretrained network on your new task. Every layer's W gets its own ΔW. Nothing is frozen; every parameter is in play. Think of an MLP as a chain of prerequisites leading to an advanced course. Layer 1 might be Linear Algebra, layer 2 Probability, layer 3 Advanced Machine Learning, each one building on what came before. Fine-tuning is what happens during graduate study: the foundations are already there from undergrad, so you're not re-learning them. Full fine-tuning is reviewing every prerequisite to see what new topics have appeared and what discoveries the field has made since you last sat through them. Effective, but exhausting. This diagram shows the same three-layer MLP twice, side by side. On the left, the pretrained network runs on input X: three weight matrices W₁, W₂, W₃, each followed by a ReLU activation. Full fine-tuning gives the model the most freedom to specialize. Every parameter can move, and every parameter that can move must be stored. But not every prerequisite needs revisiting. The further back you go in the chain, the less the material has changed since pretraining: the linear-algebra basics under your computer-vision course are largely the same as they ever were. The next page does exactly that: freeze the prerequisites that haven't moved, and only refresh the advanced one closest to your specialization. == Freezing Layers == Full fine-tuning reviewed every prerequisite (Linear Algebra, Probability, Advanced ML) to refresh each subject with the latest topics. Effective, but exhausting. Then you realize something. The prerequisites haven't actually changed that much.
Linear Algebra is still Linear Algebra; the matrix decompositions you learned still hold. Probability is still Probability; the distributions and Bayes' rule haven't moved. Almost all the new material, the new ideas and the recent discoveries, lives in the advanced layer at the top. That's freezing layers: keep the prerequisite layers fixed at their pretrained state, and only update the advanced one. In the diagram below, W₁ and W₂, the foundational prerequisites, stay frozen. Only W₃, the layer closest to your task-specific output, gets a ΔW.
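
A minimal NumPy sketch of the freezing-layers idea, assuming a toy regression task on random data (the shapes, learning rate and data are illustrative, not from the original). W₁ and W₂ are never updated, so their output can be computed once and reused; only W₃ receives a gradient step. Simplification: the output layer is kept linear, whereas the diagram puts a ReLU after W₃ as well.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


# Toy stand-ins for the pretrained weights of the three-layer MLP.
W1 = rng.standard_normal((4, 16)) * 0.5
W2 = rng.standard_normal((16, 16)) * 0.5
W3 = rng.standard_normal((16, 1)) * 0.5
X = rng.standard_normal((64, 4))   # new-task inputs
y = rng.standard_normal((64, 1))   # new-task targets
W1_before, W2_before = W1.copy(), W2.copy()

# W1 and W2 are frozen, so the features they produce never change during
# fine-tuning and can be computed once.
h2 = relu(relu(X @ W1) @ W2)


def mse(W3):
    return float(np.mean((h2 @ W3 - y) ** 2))


loss_before = mse(W3)
lr = 0.01
for _ in range(200):
    residual = h2 @ W3 - y
    grad_W3 = 2.0 * h2.T @ residual / len(X)  # gradient of the MSE w.r.t. W3 only
    W3 = W3 - lr * grad_W3                    # only W3 gets a ΔW
loss_after = mse(W3)

trainable, total = W3.size, W1.size + W2.size + W3.size
print(f"loss {loss_before:.3f} -> {loss_after:.3f}; trainable {trainable}/{total} parameters")
```

In a framework such as PyTorch the same effect is usually achieved by setting `requires_grad` to `False` on the frozen parameters. The cached-activation shortcut above is only valid because nothing below W₃ changes.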

Tom Yeh

27,587 views • 3 months ago

What if we could teach an AI to master the strategic game of 2048 through pure reinforcement learning? I did exactly that with "Agent 2048": fine-tuning a Qwen 7B model with GRPO to develop spatial reasoning and merge strategies with zero prior gameplay SFT data! Thanks to Hugging Face and Unsloth AI for their easy-to-use implementations. kalomaze and Will Brown, you might like this :)
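
GRPO's core idea is that each prompt gets a group of sampled completions, and each completion's advantage is its reward standardized against the rest of the group, so no learned value model is needed. The sketch below shows just that computation in Python. The reward (points scored by merging tiles on a single 2048 row) and the helper names `merge_row_left` and `group_relative_advantages` are hypothetical stand-ins, not the reward used for Agent 2048, and the clipped policy-gradient update that consumes the advantages is omitted.

```python
import numpy as np


def merge_row_left(row):
    """Slide one 2048 row left, merging each pair of equal tiles once.
    Returns (new_row, points), where points is the sum of the merged tiles."""
    tiles = [t for t in row if t != 0]
    merged, points, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            points += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), points


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward against the other
    completions sampled for the same prompt (its group)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


# Hypothetical group of 4 completions for one prompt; here each completion's
# reward is simply the points its proposed move scores on one row.
rows = [[2, 2, 4, 4], [2, 0, 2, 0], [2, 4, 8, 16], [4, 4, 0, 0]]
rewards = [merge_row_left(r)[1] for r in rows]   # [12, 4, 0, 8]
advantages = group_relative_advantages(rewards)
print(rewards, advantages.round(3))
```

This uses the population standard deviation; some implementations use the sample standard deviation instead, which rescales all advantages by the same factor and preserves their ranking.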

Hrishbh Dalal

31,863 views • 1 year ago

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with compute supercharged by physics simulation on GPU. However, these methods rely on zeroth-order gradient estimates and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are already juicing the simulation on GPU, so why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in our ICLR 2022 paper on Short Horizon Actor Critic (SHAC). Naive gradient-based methods get stuck in local minima and suffer from exploding/vanishing gradients. SHAC solved this problem with truncated rollouts and model-based value estimation, where the model is a differentiable simulator. This boosted sample efficiency and wall-clock time immensely, especially in high-dimensional systems such as humanoids. Yet, given enough compute, PPO often caught up. Our follow-up paper on Adaptive Horizon Actor Critic (AHAC) at ICML 2024 discovers the cause and provides a fix. However, we find that even when given ground-truth dynamics, not all gradients are useful due to sample error. First-order model-based reinforcement learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC, which dynamically adapts its rollout horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO.
This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, and Eric Heiden, with ample assistance from the Warp team at NVIDIA Robotics (Miles Macklin).
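
The zeroth-order vs. first-order distinction can be seen on a toy problem. Both estimators below target the gradient of E[f(θ + σε)]: one uses only function values (the score-function form that model-free methods rely on), the other uses the analytic derivative (what a differentiable simulator supplies). The function `f` is an arbitrary smooth stand-in for a rollout return, not SHAC or AHAC. On smooth functions the first-order estimate has far lower variance; the post's point is that stiff contact breaks this smoothness and biases first-order gradients, which this toy does not model.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n = 1.0, 0.1, 10_000

f = lambda x: np.sin(3 * x) + x ** 2        # smooth stand-in for a rollout return
df = lambda x: 3 * np.cos(3 * x) + 2 * x    # its analytic derivative

eps = rng.standard_normal(n)
x = theta + sigma * eps
# Zeroth-order (score-function / likelihood-ratio) estimator: only f values.
g0 = f(x) * eps / sigma
# First-order (pathwise / reparameterized) estimator: uses the derivative.
g1 = df(x)
# Exact gradient of E[f(theta + sigma * eps)] for this particular f.
true_grad = 3 * np.cos(3 * theta) * np.exp(-(3 * sigma) ** 2 / 2) + 2 * theta

print(f"true gradient: {true_grad:.3f}")
print(f"zeroth-order : mean {g0.mean():.3f}, std {g0.std():.3f}")
print(f"first-order  : mean {g1.mean():.3f}, std {g1.std():.3f}")
```

Both estimators are unbiased here, but the zeroth-order one needs far more samples for the same accuracy, which is the gap first-order methods exploit.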

Animesh Garg

52,300 views • 2 years ago

1/ Happy to share UniDisc (Unified Multimodal Discrete Diffusion). We train a 1.5-billion-parameter transformer model from scratch on 250 million image/caption pairs using a discrete diffusion objective. Our model has all the benefits of diffusion models, but now in multimodal space: flexible compute-quality tradeoff, zero-shot inpainting and editing, better control via classifier-free guidance, and lower latency! We open-source everything: our code, weights, and the training dataset.
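
The post does not spell out the corruption process, so this sketch assumes the absorbing-state (masking) forward process that is common in discrete diffusion: at noise level t, each token is independently replaced by a [MASK] token with probability t, and the model is trained to recover the masked tokens. `MASK_ID`, the token ids, and the caption-then-image layout are hypothetical, and the 1.5B-parameter denoiser itself is not shown.

```python
import numpy as np

MASK_ID = -1  # hypothetical id for the [MASK] token


def mask_tokens(tokens, t, rng):
    """Absorbing-state forward corruption: each token is independently
    replaced by [MASK] with probability t (0 <= t <= 1)."""
    tokens = np.asarray(tokens)
    masked = rng.random(tokens.shape) < t
    return np.where(masked, MASK_ID, tokens), masked


rng = np.random.default_rng(0)
# Hypothetical joint sequence: caption tokens followed by image tokens.
seq = np.arange(100, 120)
noisy, masked = mask_tokens(seq, t=0.5, rng=rng)
# A denoiser would be trained to predict seq[masked] from `noisy`,
# e.g. with cross-entropy on the masked positions only.
print(noisy)
```

Under this formulation, zero-shot inpainting amounts to masking only the positions to be filled and letting the trained denoiser predict them.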

Mihir Prabhudesai

104,934 views • 1 year ago

EDGS: Eliminating Densification for Efficient Convergence of 3DGS
Contributions:
• We show that initial triangulation based on 2D correspondences can replace the incremental refinement process, fundamentally changing how 3DGS models allocate resources.
• Our method reduces the path each Gaussian must travel in parameter space. Careful initialization not only accelerates convergence but also guides optimization toward a convergence point corresponding to lower reconstruction error and thus higher reconstruction quality.
• Our approach outperforms both speed-optimized and quality-focused state-of-the-art models while using only half the splats of standard 3DGS. By improving initialization rather than altering the optimization process, this method is compatible with other 3DGS acceleration techniques, making it a flexible enhancement to existing models.
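
The post does not say how the triangulation is done. As a hedged illustration of the underlying geometry, this is textbook linear (DLT) triangulation of one 3D point from a single 2D correspondence in two calibrated views, in Python with NumPy. The camera intrinsics, baseline and test point are made up, and EDGS's actual pipeline (dense matching, many views, filtering) is not reproduced. With pᵢ the i-th row of a camera matrix P, each observed pixel (u, v) gives two linear constraints on the homogeneous point X: u(p₃·X) − p₁·X = 0 and v(p₃·X) − p₂·X = 0.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from a 2D correspondence.
    P1, P2: 3x4 camera projection matrices; x1, x2: pixel coords (u, v)."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous point is the right singular vector with the smallest
    # singular value (the least-squares null vector of A).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 camera matrix; returns pixel coords (u, v)."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]


# Hypothetical calibrated pair: same intrinsics K, second camera shifted 1 unit along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

point = np.array([0.2, -0.1, 4.0])
estimate = triangulate_dlt(P1, P2, project(P1, point), project(P2, point))
print(estimate)  # recovers `point` up to floating-point error for noise-free pixels
```

With noisy matches the same linear system is solved in the least-squares sense, which is why a triangulated initialization is usually followed by optimization rather than replacing it.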

MrNeRF

124,131 views • 1 year ago

The Sabotaging Practice of Over Supply and Sameness in the NFT Space. The current zeitgeist of the NFT space is that the same artists are doing the same kind of work five times a year, with project after project leaving a trail of disappointment and discontent among collectors, and all of us watching in disbelief as huge resources are extracted from the space over work that feels like it could have been left as an "artist study." I understand that as collectors you can do what you want with your money, but we are killing the whole space with this incestuous practice. No artist is prolific enough to do five collections of 100+ pieces each every year and actually deliver innovation and some kind of creative evolution. Of course, they can pretend that the work has something new, but there is no precedent for, nor proof of, that ever happening at the speed it happens in the NFT space. Again, people are free to throw away their resources on whatever they want, but with this way of doing things, we are going to see more and more of the consequences. Oh! There are consequences? Yes. Maybe unintended, but there are. Let's see. Let's start with the loss of belief in the NFT space as a place where emerging artists can come and find support for their experiments. Why even bother to bring experiments, innovation, and new ways of thinking about art to the blockchain if the same people have all the collectors hypnotized with their magic flutes? Why even come to a space where taking risks and challenging the status quo (the mission of art!!!) is overlooked? This makes the NFT space a social club and not a space for art. I guess that is fine, but IMO it is a recipe for disaster. New collectors stay away because the art will slowly but surely become stale and unchallenging. Why even bother to come and see what is happening here if you can't, as a collector, see new, weird, and up-and-coming artists?
The noise emitted by the same artists doing the same art over and over drowns out any new voices. Again, a recipe for disaster. The NFT space is becoming a space of disappointment and doubt. Do we think that collections going to zero one after the other, over and over, is not damaging? I feel we are kidding ourselves. Disappointment piles up, and again, the people who will be hurt are the emerging artists, the new blood, the ones who are willing to risk the most and, in return, put fire in this cold space of sameness. I love this space, don't get me wrong. It has changed my life, and I believe it has a ton of potential, but things need to change for it to become a beacon of light in art. We need to support new voices. We need to support new ideas. The challenge is huge. I hope to contribute all I can to this change. I hope more and more people see how exciting it is to go out, discover what else is out there, and move this space forward. Again, I understand the leaps of faith needed, but if there is a space that is based on that, it's the NFT space... so there is hope. We will see. 📺by Boldtron

alejandro cartagena

98,261 views • 2 years ago

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs. Current long-context large language models (LLMs) can process inputs of up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the samples it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT examples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B-parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long-context LLMs already possess the potential for a larger output window: all you need is data with extended outputs during model alignment to unlock this capability.
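
The AgentWrite idea (plan first, then write the long output one section at a time) can be sketched as a simple loop. Everything here is illustrative: `agent_write`, the prompt wording, and the `llm` callable are hypothetical, and a stub model is used so the example runs offline. Each section prompt includes the plan and the text written so far, which is intended to keep a very long output coherent across many separate generation calls.

```python
from typing import Callable


def agent_write(instruction: str, llm: Callable[[str], str], n_sections: int = 4) -> str:
    """Plan-then-write: ask for an outline, then generate each section in order,
    conditioning on the instruction, the full plan, and the text written so far."""
    plan_prompt = (f"Break this writing task into {n_sections} numbered sections, "
                   f"one per line, each with a target word count:\n{instruction}")
    plan = [line for line in llm(plan_prompt).splitlines() if line.strip()][:n_sections]
    written: list[str] = []
    for step in plan:
        prompt = (f"Task: {instruction}\nPlan:\n" + "\n".join(plan)
                  + "\nText so far:\n" + "\n\n".join(written)
                  + f"\nNow write only this section: {step}")
        written.append(llm(prompt))
    return "\n\n".join(written)


# Offline stub standing in for a real LLM call (hypothetical behavior).
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Break"):
        return ("1. Intro (500 words)\n2. Background (800 words)\n"
                "3. Method (1200 words)\n4. Outlook (500 words)")
    section = prompt.splitlines()[-1].removeprefix("Now write only this section: ")
    return f"<text for {section}>"


essay = agent_write("Write a 3,000-word history of the printing press.", fake_llm)
print(essay)
```

Replacing `fake_llm` with a real model call is the only change needed to try the loop; each call only has to produce one section, which is within the length range current models already handle.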

AK

50,995 views • 1 year ago

New research from Databricks: LLMs Can Learn to Reason via Off-Policy RL. Optimal Advantage-based Policy Optimization with Lagged Inference policy (OAPL) shows you don't need strict on-policy training to improve reasoning. It matches or beats Group Relative Policy Optimization (GRPO), stays stable with large policy lag, and uses ~3× fewer training generations. It's a simpler, practical, and equally powerful approach to RL that Databricks is pioneering internally and bringing directly to Databricks customers, so enterprises can improve agents using the same methods we use for our in-house agents, without complex infrastructure changes.

Databricks AI Research

12,633 views • 5 months ago

Fine-tune DeepSeek-OCR on your own language! (100% local) DeepSeek-OCR is a 3B-parameter vision model that achieves 97% precision while using 10× fewer vision tokens than text-based LLMs. It handles tables, papers, and handwriting without killing your GPU or budget. Why it matters: Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents. The best part? You can easily fine-tune it for your specific use case on a single GPU. I used Unsloth to run this experiment on Persian text and saw an 88.26% improvement in character error rate. ↳ Base model: 149% character error rate (CER) ↳ Fine-tuned model: 60% CER (57% more accurate) ↳ Training time: 60 steps on a single GPU Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you're working with. I've shared the complete guide in the next tweet - all the code, notebooks, and environment setup ready to run with a single click. Everything is 100% open-source!
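
Character error rate is the character-level edit distance (substitutions + deletions + insertions) divided by the number of characters in the reference, which is why it can exceed 100% when a model outputs many spurious characters. A minimal Python sketch follows; the function names are illustrative, and real evaluations usually normalize text (whitespace, Unicode forms) first. It also shows how the post's rounded figures (149% and 60%) read as an absolute drop in percentage points versus a relative reduction.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length.
    It exceeds 1.0 (100%) when the hypothesis adds many extra characters."""
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("abc", "axc"))      # one substitution out of three characters
print(cer("ab", "xyzxyz"))    # more edits than reference characters: CER > 100%

# The post's rounded figures: CER 149% for the base model vs 60% after fine-tuning.
base_cer, tuned_cer = 1.49, 0.60
absolute_drop = base_cer - tuned_cer        # x100 gives percentage points
relative_drop = absolute_drop / base_cer    # share of the original errors removed
print(f"absolute: {absolute_drop * 100:.0f} points, relative: {relative_drop:.0%}")
```

Stating which of the two numbers is meant avoids ambiguity when reporting an "improvement" in an error rate.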

Akshay 🚀

126,122 views • 8 months ago

HTML enters 3D! Or vice versa? With the new HTML in Canvas by WICG, we can finally put native DOM elements directly into WebGL/WebGPU scenes. It is experimental for now, but the possibilities for 3D interfaces and special effects are huge. This demo was built using Three.js and Omma AI (a tool by Spline). It's a fun new way to explore what the web can do! Are you interested in seeing the demo?

Gábor Pribék

176,257 views • 3 months ago

We release Diamond Maps💎 unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this! Accurate guidance has been a notoriously hard problem, but in this work, we’re bringing TWO (!) solutions to the table. The recipe for success: 1️⃣ Speed: Use distilled models (flow maps, mean flows, consistency models). 2️⃣ Exploration: Inject stochasticity to properly explore your search space. Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond. Paper: Code: Huge thanks to an amazing team: Douglas Chen, Luca Eyring @ ICML26, Ishin Shah, Giri Anantharaman, Yutong (Kelly) He, Zeynep Akata, Tommi Jaakkola, Nicholas Boffi, and Max Simchowitz. It was awesome bringing this to life together!

Peter Holderrieth

60,179 views • 3 months ago

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians. Creating high-fidelity 3D head avatars has long been a research hotspot, but it remains a great challenge under lightweight sparse-view setups. In this paper, we propose Gaussian Head Avatar, represented by controllable 3D Gaussians, for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, so our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a geometry-guided initialization strategy based on an implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.

AK

65,853 views • 2 years ago

M+8: The ISS has multiple airlocks. Some are designed for astronauts and equipment, and others, like this one (the JEM airlock from the Japan Aerospace Exploration Agency), are designed not for humans but for small payloads. This particular airlock is positioned next to a Japanese platform for payloads that sits outside the ISS. We can use this airlock to move items from inside the station out to space, or from space back inside. Takuya Onishi (JAXA astronaut) and I worked together to assemble the sliding table so that we can deliver science payloads to the outside of the International Space Station. Then we double-checked the airlock interior to ensure we didn't leave any tools or equipment inside that could get unintentionally lost in space.

Jonny Kim

17,967 views • 1 year ago

For more than 15 years, Codrops has been a place for sharing experimental demos that push the boundaries of web design and development. Over time, we've also highlighted many creative demos from the community in our demo roundups and newsletter. For this reason we have evolved our Demos Hub into the new Creative Hub, which brings everything together in one place: a growing collection of hand-picked, open-source demos from Codrops and beyond. It's a space to discover, learn from, and celebrate the creativity of the web community. We're curating this collection carefully, but we also welcome submissions from creators who'd like to share their work. Come and explore the new Creative Hub:

Codrops

29,750 views • 10 months ago

🧬 We have many foundation models and language models for DNA, but can we control them? We introduce Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL — a reinforcement learning framework for controllable cis-regulatory sequence generation. Paper: Code: 🔬What’s the challenge? Designing regulatory DNA that is both highly expressive in target cell types and inactive in others is essential for synthetic biology, gene therapy, and precision medicine. Yet, controlling these trade-offs is challenging due to sparse, sequence-level rewards and biological constraints. 🔥Why Ctrl-DNA? Ctrl-DNA fine-tunes pre-trained DNA language models using a value-model-free, Lagrangian-guided RL framework, enabling flexible and customizable constraint optimization. Users can define application-specific thresholds across cell types, balancing expression strength with specificity. ✅ Maximize target-cell expression ✅ Constrain off-target activity under user-defined thresholds ✅ Preserve cell-type-specific TF motif structure Benchmarked on human enhancer and promoter datasets, Ctrl-DNA consistently outperforms prior methods, achieving stronger specificity, higher fitness, and more biologically grounded sequence generation — all with direct control over regulatory trade-offs. Shoutout to the PhD students Xingyu Chen and Rex Ma for their amazing work leading this project!
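Lagrangian-guided constraint handling is a general technique, so here is a generic Python sketch of it. This is not Ctrl-DNA's actual objective: all names are hypothetical, and it assumes a scalar target-cell expression score plus one off-target expression score and one user-defined threshold per non-target cell type. The reward subtracts a multiplier-weighted constraint violation, and the multipliers are updated by projected dual ascent.

```python
def lagrangian_reward(target_expr: float,
                      off_target_expr: list[float],
                      thresholds: list[float],
                      lambdas: list[float]) -> float:
    # Hypothetical shaped reward: target expression minus
    # sum_i lambda_i * (off_target_i - threshold_i).
    # A term is negative (a small bonus) when activity is under its threshold;
    # dual ascent then pushes that lambda_i toward 0.
    penalty = sum(lam * (off - thr)
                  for lam, off, thr in zip(lambdas, off_target_expr, thresholds))
    return target_expr - penalty


def update_multipliers(lambdas: list[float],
                       mean_off_target: list[float],
                       thresholds: list[float],
                       lr: float = 0.1) -> list[float]:
    # Projected dual ascent: raise lambda_i when the batch-mean off-target
    # activity exceeds its threshold, lower it otherwise, and clip at 0
    # because Lagrange multipliers for inequality constraints are non-negative.
    return [max(0.0, lam + lr * (off - thr))
            for lam, off, thr in zip(lambdas, mean_off_target, thresholds)]
```

In an RL loop, the policy would be trained on `lagrangian_reward` while `update_multipliers` runs once per batch, so a cell type whose constraint keeps being violated gets an ever-larger penalty weight.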

Bo Wang

30,719 views • 1 year ago

We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025! Paper: Code: Biological systems are capable of rapid adaptation, given limited sensory cues. For example, our human visual system can quickly adapt and tune its light sensitivity to our surroundings. While modern LLMs exhibit a wide variety of capabilities and knowledge, they remain rigid when adding task-specific capabilities. Traditionally, customizing these models requires gathering large datasets and performing often expensive, time-consuming fine-tuning for specific applications. To bypass these limitations, Text-to-LoRA (T2L) meta-learns a “hypernetwork” that takes in a text description of a desired task, as a prompt, and generates a task-specific LoRA that performs well on the task. In our experiments, we show that T2L can encode hundreds of existing LoRA adapters. While the compression is lossy, T2L maintains the performance of task-specifically tuned LoRA adapters. We also show that T2L can even generalize to unseen tasks given a natural language description of the tasks. Importantly, Text-to-LoRA is parameter-efficient. It generates LoRAs in a single, inexpensive step, based solely on a simple text description of the task. This approach is a step towards dramatically lowering the technical and computational barriers, allowing non-technical users to specialize foundation models using plain language, rather than needing deep technical expertise or large compute resources.
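To make the hypernetwork idea concrete, here is a toy pure-Python sketch. It assumes the task description has already been encoded into a fixed-size embedding vector by some text encoder (not shown), uses plain linear heads with random weights where T2L would use meta-learned ones, and applies the standard LoRA update ΔW = (α/r)·B·A. `LoRAHypernetwork`, `lora_delta`, and `matmul` are hypothetical names, not the T2L codebase's API.

```python
import random


def matmul(X, Y):
    # Plain list-of-lists matrix product, (n x k) @ (k x m) -> (n x m).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


class LoRAHypernetwork:
    """Hypothetical linear hypernetwork: task embedding -> LoRA factors (A, B)."""

    def __init__(self, emb_dim: int, d_in: int, d_out: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # One linear head per factor. Random here; meta-learned in practice.
        self.head_a = [[rng.gauss(0, 0.02) for _ in range(emb_dim)]
                       for _ in range(rank * d_in)]
        self.head_b = [[rng.gauss(0, 0.02) for _ in range(emb_dim)]
                       for _ in range(d_out * rank)]

    def __call__(self, task_emb):
        flat_a = [sum(w * e for w, e in zip(row, task_emb)) for row in self.head_a]
        flat_b = [sum(w * e for w, e in zip(row, task_emb)) for row in self.head_b]
        # Reshape the flat outputs: A is (rank x d_in), B is (d_out x rank).
        A = [flat_a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        B = [flat_b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return A, B


def lora_delta(A, B, alpha: float, rank: int):
    # LoRA weight update: delta_W = (alpha / r) * B @ A, shape (d_out x d_in).
    return [[alpha / rank * v for v in row] for row in matmul(B, A)]
```

The design point the sketch illustrates is that one forward pass of the hypernetwork yields a full adapter, so generating a LoRA for a new task description needs no gradient steps on task data.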

Sakana AI

403,159 views • 1 year ago

One of THE MOST impactful changes we can make to urban roads in India is to maintain consistent road widths from one intersection to the next. Our current, variable road widths disrupt the orderly flow of traffic. For example, if there's just a bit of extra space on the side, auto-rickshaws attempt to squeeze in, causing chaos. If a road suddenly narrows from three lanes to two and a half, congestion follows. In cities like London, when extra right-of-way is found, the footpaths are widened up to the road. But in Mumbai, we tend to expand the road instead. The issue with this approach is that the number of private vehicles in a densely populated city like Mumbai can grow without limits. Expanding roads only encourages more vehicles. On the other hand, widening footpaths doesn't cause the same problem because the space required for pedestrians is limited—walking is highly space-efficient. #WalkingProject

Walking Project

21,448 views • 1 year ago

What if you kept asking an LLM to "make it better"? In some recent work at FAIR, we investigate how we can efficiently use RL to fine-tune LLMs to iteratively self-improve on their previous solutions at inference time. Training for iterated self-improvement can be costly. The naive approach to training for K self-improvement steps leads to K times the number of rollout steps per episode. We introduce Exploratory Iteration (ExIt), an RL-based automatic curriculum method that bootstraps diverse training distributions of self-improvement tasks by upcycling the LLM's own responses at previous turns as the starting points for both self-improvement and self-divergence. In order to decide what task to train on next, the curriculum prioritizes sampling of partial turn histories that led to higher return variance in its GRPO group (a learnability score that comes for free). This automatic curriculum over the bootstrapped task space teaches the model how to perform iterated self-improvement while only ever training the model on single-step self-improvement tasks. We look at ExIt's impact in both single-turn settings (contest math problems) and multi-turn settings (BFCLv3 multi-turn tasks), as well as MLE-bench, where the LLM is run in a search scaffold to produce solutions to real Kaggle competitions. Across these eval settings, we find ExIt produces models with greater capacity for inference-time self-improvement compared to GRPO. Notably, ExIt models can self-improve on test tasks for many more steps than the typical solution depth encountered during training, including a 22% improvement in MLE-bench performance compared to GRPO.
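The curriculum step described above can be sketched in Python. This is a minimal sketch under stated assumptions: `TaskBuffer` and `learnability` are hypothetical names, and sampling histories in proportion to their score is our assumption, not necessarily ExIt's exact prioritization rule. It only illustrates how the within-group return variance that GRPO already computes can rank starting points without extra rollouts.

```python
import random
from statistics import pvariance


def learnability(group_returns: list[float]) -> float:
    # Variance of the returns in one GRPO group. GRPO already samples a
    # group of responses per prompt, so this score costs no extra rollouts.
    # All-success or all-failure groups score 0: nothing left to learn there.
    return pvariance(group_returns)


class TaskBuffer:
    """Hypothetical buffer of partial turn histories used as starting points."""

    def __init__(self, seed: int = 0):
        self.entries: list[tuple[tuple[str, ...], float]] = []
        self.rng = random.Random(seed)

    def add(self, history: list[str], group_returns: list[float]) -> None:
        # Upcycle a previous turn history as a new task, scored by learnability.
        self.entries.append((tuple(history), learnability(group_returns)))

    def sample(self, eps: float = 1e-6) -> tuple[str, ...]:
        # Assumption: draw proportionally to score; eps keeps zero-variance
        # entries reachable with a tiny probability.
        histories = [h for h, _ in self.entries]
        weights = [score + eps for _, score in self.entries]
        return self.rng.choices(histories, weights=weights, k=1)[0]
```

A history whose group always succeeds (returns all 1.0) scores 0 and is almost never drawn, while one with mixed returns such as [0, 1, 0, 1] scores 0.25 and dominates sampling.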

Minqi Jiang

41,099 views • 10 months ago

💬 We get asked: What should I do if I don’t have my own trading ideas yet? ❕ Answer from a GT App Specialist: You don’t need to be a professional strategist to start trading. GT App’s AI layer generates new strategy ideas every day that you can immediately explore and test. 🔸 Daily AI-generated strategies Advanced LLMs build fresh trading strategies daily. The LLM Builder creates complete strategy setups that you can instantly optimize to see how they would have performed. 🔸 Pick and test in seconds Inside the app you’ll find AI strategy cards labeled by the LLM that generated them. Select a strategy, run an optimization, and instantly review metrics like win rate, trade history, and profit performance. 🔸 Or build a strategy directly in Telegram You can also generate and test strategies through our Telegram bot. Just open @gt_ai_trading_bot, request a strategy for a trading pair, and the AI will build and backtest it for you. Explore AI-generated strategies 👉

GT Protocol

32,277 views • 4 months ago