Wow, we can steer diffusion models at inference time! Introducing Diffusion Tree Sampling (DTS): a search-based approach inspired by Monte Carlo Tree Search that turns inference into an anytime, reward-guided optimization process. Diffusion Tree Sampling (DTS) produces asymptotically exact samples from the target distribution in the limit of infinite...

19,037 views • 11 months ago • via X (Twitter)

Comments: 8

机器之心 JIQIZHIXIN • 11 months ago

Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models Paper: Page:

supervinculum • 11 months ago

anti diffusion with scalar pixels framing tensor nodes

Tsukuyomi • 11 months ago

steering diffusion models? sounds like a fun ride. let’s hope this doesn’t lead us down another rabbit hole of unintended consequences. keep your seatbelts fastened, folks.

zay • 11 months ago

We’ve been able to steer diffusion models at inference for a long time now lol

Towaki Takikawa / 瀧川永遠希 • 11 months ago

Adding horizontal lines to images improves VLM (vision language model) performance on tasks like counting, visual search, spatial understanding, scene understanding, and more

Amir Zamir • 11 months ago

We open-sourced the codebase of Flextok. Flextok is an image tokenizer that produces flexible-length token sequences and represents image content in a compressed coarse-to-fine way. Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd token is added on top of the 1st token and adds more details, and so on. This contrasts with most common image tokenizers, which output fixed-size token sequences and often roughly align with local image content. Flexible-length, coarse-to-fine tokens are a useful and intuitive structure to model. They impact the whole pipeline of image generation and understanding. Flextok does this using simple and effective known mechanisms, e.g., applying nested dropout on tokens during training. The emerged structure looks semantic, while no language-based supervision was used anywhere. We’ll present it at #ICML25. Demo: Visuals: Code: @EPFL_en @Apple @ICepfl @EPFL_AI_Center
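The nested dropout idea mentioned in the post above can be illustrated with a minimal sketch. This is not the Flextok code: the function name `nested_dropout`, the uniform choice of the kept prefix length, and the use of `None` as a mask value are all assumptions made for illustration. The point it shows is that every training example keeps only a random-length prefix of its token sequence, which pushes the earliest tokens to carry the coarsest information.

```python
import random


def nested_dropout(tokens, rng, mask_value=None):
    """Keep a random-length prefix of `tokens` and mask everything after it.

    Hypothetical helper for illustration only. The prefix length k is drawn
    uniformly from 1..len(tokens) (an assumption; real systems may use a
    different distribution over k).
    """
    k = rng.randint(1, len(tokens))  # randint is inclusive on both ends
    masked = list(tokens[:k]) + [mask_value] * (len(tokens) - k)
    return masked, k


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["t1", "t2", "t3", "t4", "t5", "t6"]
    for _ in range(3):
        masked, k = nested_dropout(tokens, rng)
        print(k, masked)
```

Because token 1 survives every truncation while the last token survives only rarely, a decoder trained on such prefixes has to reconstruct the image from the first few tokens alone, which is one way a coarse-to-fine ordering could emerge.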

Tanishq Mathew Abraham, Ph.D. • 11 months ago

Transition Matching: Scalable and Flexible Generative Modeling "This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes, thereby unlocking new flexible design avenues."

Cameron R. Wolfe, Ph.D. • 11 months ago

Reward models have transformed LLM research by incorporating human preferences into the training process. Here’s how they work from the ground up…

What is a reward model? Reward models (RMs) are specialized LLMs, usually derived from the LLM currently being trained, that are trained to predict a human preference score given a prompt and a candidate completion as input. A higher score from the RM indicates that a given completion is more likely to be preferred by humans.

Bradley-Terry model. The standard implementation of an RM is derived from the Bradley-Terry model of preference, a statistical model used to rank paired comparison data based on the relative strength or performance of the items in each pair. Given two items i and j drawn from the same distribution, the Bradley-Terry model defines the probability that item i wins (is preferred) over item j. For LLMs, items i and j are two completions generated by the same LLM from the same prompt (i.e., the same distribution). The RM assigns a score to each of these completions, and we use Bradley-Terry to express probabilities for pairwise comparisons between the two completions.

Preference data is used extensively in LLM post-training. Such data consists of many different prompts. For each prompt, we have a pair of candidate completions, where one completion has been identified, by a human or a model, as preferable to the other.

RM architecture. In practice, RMs are implemented with an LLM by adding a linear head to the end of the decoder-only architecture. Specifically, the LLM outputs a list of token vectors, one for each input token vector, and we pass the final vector from this list through the linear head to produce a single scalar score. RMs are just specialized LLMs with an extra classification head used to classify a completion as preferred or not preferred.

Training process. The parameters of the RM are usually initialized from an existing policy, e.g., the SFT or pretrained base model. Once the RM is initialized, we add the linear head and train it over a preference dataset. Given a preference pair, we want our RM to assign a higher score to the chosen response than to the rejected response. We can use the Bradley-Terry model to express this probability. By rearranging this probability expression, we obtain a pairwise ranking loss that encourages the model to assign higher scores to chosen responses.
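To make the description above concrete, here is a small sketch of the scalar head, the Bradley-Terry preference probability, and the resulting pairwise ranking loss. It assumes the common formulation in which the probability that completion i is preferred over j is the sigmoid of the score difference, so the loss is the negative log of that probability for the chosen response. The function names (`scalar_head`, `bt_preference_prob`, `pairwise_ranking_loss`) and the toy numbers are hypothetical, and plain Python floats stand in for the tensors a real training loop would use.

```python
import math


def scalar_head(final_token_vector, weights, bias=0.0):
    """Linear head: map the final token vector to a single scalar score."""
    return sum(w * h for w, h in zip(weights, final_token_vector)) + bias


def bt_preference_prob(score_i, score_j):
    """Bradley-Terry probability that i is preferred over j.

    Computed as sigmoid(score_i - score_j), in a numerically stable form.
    """
    d = score_i - score_j
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def pairwise_ranking_loss(chosen_score, rejected_score):
    """Negative log-probability that the chosen response wins.

    Equals -log(sigmoid(chosen - rejected)) = log(1 + exp(-(chosen - rejected))).
    """
    margin = chosen_score - rejected_score
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Toy final-token vectors for a chosen and a rejected completion (made up).
    weights = [0.5, -0.25, 1.0]
    chosen = scalar_head([1.0, 0.0, 2.0], weights)
    rejected = scalar_head([0.5, 1.0, 0.5], weights)
    print("P(chosen preferred):", bt_preference_prob(chosen, rejected))
    print("loss:", pairwise_ranking_loss(chosen, rejected))
```

The loss shrinks toward zero as the chosen score pulls ahead of the rejected score and grows roughly linearly when the ordering is wrong, which is what drives the RM to rank chosen responses higher.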

Related videos

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior paper page: We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes geometry consistency but compromises texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, DreamBooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,400 views • 2 years ago