Adjoint-based diffusion samplers have simple & scalable objectives w/o importance-weight complications. Like many others, though, they solve degenerate Schrödinger bridges, despite all being SB-inspired. 📢 Proud to introduce #Adjoint #Schrödinger #Bridge #Sampler, a full SB-based sampler that is simple to implement, scalable, practically very effective, theoretically sound, and extends AM...

44,389 views • 11 months ago • via X (Twitter)

7 comments

Arnaud Doucet · 11 months ago

What works best in practice? The adjoint SB sampler with Brownian reference dynamics to bypass computation of the lean adjoint or your non-equilibrium annealed adjoint sampler?

Guan-Horng Liu · 11 months ago

ASBS scales much better, whereas NAAS converges much faster in smaller-scale setups. Both perform well 🙂 I view them as two orthogonal directions for improving Adjoint Sampling with better exploration (with non-Dirac priors or annealing reference paths)

Gilad (SF summer) · 11 months ago

This is super cool!

Alexia Jolicoeur-Martineau · 11 months ago

The 2-Simplicial is really cool work, but it's a negative result IMO. O(n^3) made approximately O(n^2) through sliding windows. Faster at >50k tokens, but untested for long-context. Tested on short-context metrics (2k-16k tokens), where it's 3-10x slower but 0.45-2.27% better.

Sander Dieleman · 11 months ago

Diffusion models have analytical solutions, but they involve sums over the entire training set, and they don't generalise at all. They are mainly useful to help us understand how practical diffusion models generalise. Nice blog + code by Raymond Fan:
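To make the "sums over the entire training set" point concrete, here is a minimal NumPy sketch of the closed-form denoiser for an empirical data distribution. It is not the code from the linked blog. It assumes a variance-exploding noising process, x_t = x_0 + sigma * noise, with x_0 drawn uniformly from the training set; the function name `optimal_denoiser` and the toy data are made up for illustration.

```python
import numpy as np

def optimal_denoiser(x_t, train, sigma):
    """Posterior mean E[x_0 | x_t], assuming x_0 is uniform over `train`
    and x_t = x_0 + sigma * noise (an assumed variance-exploding setup)."""
    # Squared distance from the noisy point to every training example.
    sq_dist = np.sum((train - x_t) ** 2, axis=1)
    # Softmax over the whole training set; subtract the max for stability.
    logits = -sq_dist / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # The denoised estimate is a weighted average of training points.
    return weights @ train

# Hypothetical toy "training set" of three 2-D points.
train = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x_t = np.array([3.5, 0.3])

low_noise = optimal_denoiser(x_t, train, sigma=0.1)     # snaps to a training point
high_noise = optimal_denoiser(x_t, train, sigma=100.0)  # close to the dataset mean
```

At low noise the estimate collapses onto the nearest training example, and at high noise it approaches the dataset mean, so this denoiser can only reproduce the training data. That is the sense in which the analytical solution does not generalise.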

Jaeyeon Kim · 11 months ago

Excited to share that I’ll be presenting two oral papers in this ICML—see u guys in Vancouver!!🇨🇦 1️⃣ Understanding Masked Diffusion Models theoretically/scientifically 2️⃣ Theoretical analysis on LoRA training

Amir Zamir · 11 months ago

We benchmarked leading multimodal foundation models (GPT-4o, Claude 3.5 Sonnet, Gemini, Llama, etc.) on standard computer vision tasks, from segmentation to surface normal estimation, using standard datasets like COCO and ImageNet. These models have made remarkable progress; however, it is unclear exactly where they stand in terms of understanding vision in detail, especially when it comes to tasks beyond question-answering. How well do they understand an object's segments or geometry? Our analyses yield an assessment that is quantitatively and qualitatively detailed and is compatible with evaluations developed in the field of computer vision over the past decades.

Observed trends:
🔹 The foundation models consistently underperform task-specific SOTA models across all tasks. However, they are respectable generalists, which is remarkable as they are presumably trained primarily on image-text-based tasks.
🔹 They perform semantic tasks notably better than geometric ones.
🔹 GPT-4o performs the best among non-reasoning models, getting the top position in 4 out of 6 tasks.
🔹 Reasoning models, e.g., o3, show improvements in geometric tasks.
🔹 The 'image generation' models, e.g., GPT-4o Image Generation, which have been natively trained multimodally, exhibit quirks, e.g., hallucinated objects, misalignment between the input and output, etc.
🔹 While the prompting techniques affect performance, better models exhibit less sensitivity to variations in prompts. We control for the variance introduced by the prompting methods in our experiments.

🌐 Detailed analyses, visualizations: ⌨️ code:  🧵 1/n
