Wow, we can steer diffusion models at inference time! Introducing Diffusion Tree Sampling (DTS): a search-based approach inspired by Monte Carlo Tree Search that turns inference into an anytime, reward-guided optimization process. Diffusion Tree Sampling (DTS) produces asymptotically exact samples from the target distribution in the limit of infinite...
19,037 views • 11 months ago • via X (Twitter)
Comments: 8

Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models Paper: Page:

anti diffusion with scalar pixels framing tensor nodes

steering diffusion models? sounds like a fun ride. let’s hope this doesn’t lead us down another rabbit hole of unintended consequences. keep your seatbelts fastened, folks.

We’ve been able to steer diffusion models at inference for a long time now lol

Adding horizontal lines to images improves VLM (vision language model) performance on tasks like counting, visual search, spatial understanding, scene understanding, and more

We open-sourced the codebase of Flextok. Flextok is an image tokenizer that produces flexible-length token sequences and represents image content in a compressed, coarse-to-fine way. Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd token is added on top of the 1st and adds more detail, and so on. This contrasts with most common image tokenizers, which output fixed-size token sequences whose tokens often roughly align with local image content. Flexible-length, coarse-to-fine tokens are a useful and intuitive structure to model. They impact the whole pipeline of image generation and understanding. Flextok does this using simple, effective, known mechanisms, e.g., applying nested dropout on tokens during training. The structure that emerges looks semantic, even though no language-based supervision was used anywhere. We’ll present it at #ICML25. Demo: Visuals: Code: @EPFL_en @Apple @ICepfl @EPFL_AI_Center
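For readers unfamiliar with nested dropout, here is a minimal sketch of the general idea, not Flextok's actual code. It assumes the common formulation in which a random prefix length is sampled at each training step and every token after that prefix is masked, so that earlier tokens are pushed to carry the coarsest information. The function name `nested_dropout`, the uniform choice of prefix length, and the zero mask value are all illustrative assumptions.

```python
import numpy as np


def nested_dropout(tokens, rng, mask_value=0.0):
    """Keep a random-length prefix of the token sequence and mask the rest.

    tokens: array of shape (num_tokens, dim), ordered coarse to fine.
    Returns the masked copy and the number of tokens kept.
    Hypothetical helper for illustration; not taken from the Flextok codebase.
    """
    num_tokens = tokens.shape[0]
    # Assumption: prefix length drawn uniformly from 1..num_tokens.
    keep = int(rng.integers(1, num_tokens + 1))
    masked = tokens.copy()
    masked[keep:] = mask_value  # drop every token after the kept prefix
    return masked, keep


rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))  # 8 tokens, 4 dimensions each
truncated, keep = nested_dropout(tokens, rng)
print(f"kept the first {keep} of {tokens.shape[0]} tokens")
```

Because a decoder trained this way must reconstruct from any prefix, the ordering of tokens by importance emerges from training rather than being imposed by hand.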

Transition Matching: Scalable and Flexible Generative Modeling "This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes, thereby unlocking new flexible design avenues."

Reward models have transformed LLM research by incorporating human preferences into the training process. Here’s how they work from the ground up…
What is a reward model? Reward models (RMs) are specialized LLMs, usually derived from the LLM that we are currently training, that are trained to predict a human preference score given a prompt and a candidate completion as input. A higher score from the RM indicates that a given completion is likely to be preferred by humans.
Bradley-Terry model. The standard implementation of an RM is derived from the Bradley-Terry model of preference, a statistical model used to rank paired comparison data based on the relative strength or performance of the items in each pair. Given two items i and j drawn from the same distribution, the Bradley-Terry model defines the probability that item i wins (i.e., is preferred) over item j. For LLMs, items i and j are two completions generated by the same LLM from the same prompt (i.e., the same distribution). The RM assigns a score to each of these completions, and we use Bradley-Terry to express probabilities for pairwise comparisons between the two completions.
Preference data is used extensively in LLM post-training. Such data consists of many different prompts. For each prompt, we have a pair of candidate completions, where one completion has been identified, by a human or a model, as preferable to the other.
RM architecture. In practice, RMs are implemented with an LLM by adding a linear head to the end of the decoder-only architecture. Specifically, the LLM outputs a list of token vectors, one for each input token vector, and we pass the final vector from this list through the linear head to produce a single scalar score. RMs are just specialized LLMs with an extra classification head used to classify a completion as preferred or not preferred.
Training process. The parameters of the RM are usually initialized with an existing policy; e.g., the SFT or pretrained base model. Once the RM is initialized, we add the linear head and train it over a preference dataset. Given a preference pair, we want our RM to assign a higher score to the chosen response than to the rejected response. We can use the Bradley-Terry model to express this probability. By rearranging this probability expression, we obtain a pairwise ranking loss that encourages the model to assign higher scores to chosen responses.
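To make the scoring and loss concrete, here is a small NumPy sketch under stated assumptions. It uses the usual exponential-score form of Bradley-Terry, where the probability that the chosen completion beats the rejected one is sigmoid(s_chosen - s_rejected), and the pairwise ranking loss is the negative log of that probability. The function names, the toy hidden states, and the random head weights are illustrative only; a real RM would compute hidden states with an LLM and train the head by gradient descent.

```python
import numpy as np


def reward_score(hidden_states, head_weight, head_bias=0.0):
    """Scalar score from the final token vector passed through a linear head.

    hidden_states: (seq_len, dim) token vectors (stand-in for LLM outputs).
    head_weight:   (dim,) weights of the hypothetical linear head.
    """
    final_vector = hidden_states[-1]
    return float(final_vector @ head_weight + head_bias)


def bt_preference_prob(score_chosen, score_rejected):
    """Bradley-Terry probability that the chosen completion is preferred."""
    return 1.0 / (1.0 + np.exp(-(score_chosen - score_rejected)))


def pairwise_ranking_loss(scores_chosen, scores_rejected):
    """Mean of -log sigmoid(s_chosen - s_rejected) over a batch of pairs."""
    diff = np.asarray(scores_chosen, dtype=float) - np.asarray(scores_rejected, dtype=float)
    # -log sigmoid(d) = log(1 + exp(-d)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -diff)))


rng = np.random.default_rng(0)
head_weight = rng.normal(size=16)          # toy linear head
chosen_hidden = rng.normal(size=(5, 16))   # toy token vectors, chosen completion
rejected_hidden = rng.normal(size=(7, 16)) # toy token vectors, rejected completion

s_chosen = reward_score(chosen_hidden, head_weight)
s_rejected = reward_score(rejected_hidden, head_weight)
print("P(chosen preferred):", bt_preference_prob(s_chosen, s_rejected))
print("pairwise loss:", pairwise_ranking_loss([s_chosen], [s_rejected]))
```

The loss shrinks toward zero as the chosen score pulls ahead of the rejected score, and equals log 2 when the two scores are tied, which is what "encourages the model to assign higher scores to chosen responses" means in practice.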
