Adjoint-based diffusion samplers have simple & scalable objectives w/o importance-weight complications. Like many, though, they solve degenerate Schrödinger bridges, despite all being SB-inspired. 📢 Proud to introduce #Adjoint #Schrödinger #Bridge #Sampler, a full SB-based sampler that is simple to implement, scalable, practically very effective, theoretically sound, and extends AM...
44,389 views • 11 months ago • via X (Twitter)
7 Comments

What works best in practice? The adjoint SB sampler with Brownian reference dynamics to bypass computation of the lean adjoint or your non-equilibrium annealed adjoint sampler?

ASBS scales much better, whereas NAAS converges much faster in smaller-scale setups. Both perform well 🙂 I view them as two orthogonal directions for improving Adjoint Sampling with better exploration (with non-Dirac priors or annealing reference paths)

This is super cool!

The 2-Simplicial is really cool work, but it's a negative result IMO. O(n^3) made approximately O(n^2) through sliding windows. Faster at >50k tokens, but untested for long context. Tested on short-context metrics (2k-16k tokens), where it's 3-10x slower but 0.45-2.27% better.

Diffusion models have analytical solutions, but they involve sums over the entire training set, and they don't generalise at all. They are mainly useful to help us understand how practical diffusion models generalise. Nice blog + code by Raymond Fan:
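To make the "sums over the entire training set" point concrete, here is a minimal sketch of the closed-form denoiser for an empirical data distribution. It is not taken from the linked blog; the function name `optimal_denoiser` and the noising convention `x_t = x_0 + sigma * noise` are assumptions made for illustration. Under that convention, the posterior mean of `x_0` given `x_t` is a softmax-weighted average of all training points.

```python
import math

def optimal_denoiser(x_t, train_set, sigma):
    """Posterior mean E[x_0 | x_t], assuming x_0 is uniform over train_set
    and x_t = x_0 + sigma * noise with standard Gaussian noise."""
    # One log-weight per training point: -||x_t - x_i||^2 / (2 sigma^2).
    logits = [
        -sum((a - b) ** 2 for a, b in zip(x_t, x_i)) / (2.0 * sigma ** 2)
        for x_i in train_set
    ]
    # Numerically stable softmax over the whole training set.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the training points, coordinate by coordinate.
    dim = len(x_t)
    return [
        sum(w * x_i[d] for w, x_i in zip(weights, train_set))
        for d in range(dim)
    ]

train = [[0.0, 0.0], [1.0, 1.0]]
low_noise = optimal_denoiser([0.9, 0.9], train, sigma=0.01)
high_noise = optimal_denoiser([0.9, 0.9], train, sigma=100.0)
```

At low noise the softmax puts nearly all its weight on the nearest training point, so the output collapses onto a memorised sample; at high noise it approaches the training-set mean. Either way the output is a convex combination of training points, which is one way to see why this solution does not generalise.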

Excited to share that I’ll be presenting two oral papers at this ICML. See u guys in Vancouver!!🇨🇦 1️⃣ Understanding Masked Diffusion Models theoretically/scientifically 2️⃣ Theoretical analysis on LoRA training

We benchmarked leading multimodal foundation models (GPT-4o, Claude 3.5 Sonnet, Gemini, Llama, etc.) on standard computer vision tasks, from segmentation to surface normal estimation, using standard datasets like COCO and ImageNet. These models have made remarkable progress; however, it is unclear exactly where they stand in terms of understanding vision in detail, especially when it comes to tasks beyond question-answering. How well do they understand an object's segments or geometry? Our analyses yield an assessment that is quantitatively and qualitatively detailed and is compatible with evaluations developed in the field of computer vision over the past decades. Observed trends: 🔹 The foundation models consistently underperform task-specific SOTA models across all tasks. However, they are respectable generalists, which is remarkable as they are presumably trained primarily on image-text-based tasks. 🔹 They perform semantic tasks notably better than geometric ones. 🔹 GPT-4o performs the best among non-reasoning models, getting the top position in 4 out of 6 tasks. 🔹 Reasoning models, e.g., o3, show improvements in geometric tasks. 🔹 The 'image generation' models, e.g., GPT-4o Image Generation, which have been natively trained multimodally, exhibit quirks, e.g., hallucinated objects, misalignment between the input and output, etc. 🔹 While the prompting techniques affect performance, better models exhibit less sensitivity to variations in prompts. We control for the variance introduced by the prompting methods in our experiments. 🌐 Detailed analyses, visualizations: ⌨️ code: 🧵 1/n

