Adjoint-based diffusion samplers have simple and scalable objectives without importance-weight complications. Like many others, though, they solve degenerate Schrödinger bridges, despite all being SB-inspired. 📢 We proudly introduce #Adjoint #Schrödinger #Bridge #Sampler, a full SB-based sampler that is simple to implement, scalable, very effective in practice, theoretically sound, and extends AM...

44,389 views • 11 months ago • via X (Twitter)

7 Comments

Arnaud Doucet • 11 months ago

What works best in practice? The adjoint SB sampler with Brownian reference dynamics to bypass computation of the lean adjoint or your non-equilibrium annealed adjoint sampler?

Guan-Horng Liu • 11 months ago

ASBS scales much better, whereas NAAS converges much faster in smaller-scale setups. Both perform well 🙂 I view them as two orthogonal directions for improving Adjoint Sampling with better exploration (via non-Dirac priors or annealed reference paths).

Gilad (SF summer) • 11 months ago

This is super cool!

Alexia Jolicoeur-Martineau • 11 months ago

The 2-Simplicial work is really cool, but it's a negative result IMO. O(n^3) is made approximately O(n^2) through sliding windows. It's faster at >50k tokens, but untested on long context. It was tested on short-context metrics (2k-16k tokens), where it's 3-10x slower but 0.45-2.27% better.

Sander Dieleman • 11 months ago

Diffusion models have analytical solutions, but they involve sums over the entire training set, and they don't generalise at all. They are mainly useful to help us understand how practical diffusion models generalise. Nice blog + code by Raymond Fan:
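To make the "sums over the entire training set" point concrete, here is a minimal sketch, assuming a variance-exploding forward process x_t = x_0 + σ·ε and a data distribution that is uniform over a small toy training set. Under those assumptions the optimal denoiser E[x_0 | x_t] is a softmax-weighted average of all training points, and the score follows from it. The function names (`optimal_denoiser`, `optimal_score`) are hypothetical and are not taken from the linked blog or code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def optimal_denoiser(x_t: Vector, train: List[Vector], sigma: float) -> List[float]:
    """E[x_0 | x_t] when x_0 is uniform over `train` and x_t = x_0 + sigma * eps.

    The posterior over training points is a softmax of
    -||x_t - x_i||^2 / (2 sigma^2), so every call loops over the whole set.
    """
    logits = [-sq_dist(x_t, x) / (2.0 * sigma ** 2) for x in train]
    m = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim = len(x_t)
    # Posterior-weighted average of the training points, per coordinate.
    return [sum(w * x[d] for w, x in zip(weights, train)) / total for d in range(dim)]


def optimal_score(x_t: Vector, train: List[Vector], sigma: float) -> List[float]:
    """Score of the noised empirical distribution: (E[x_0 | x_t] - x_t) / sigma^2."""
    denoised = optimal_denoiser(x_t, train, sigma)
    return [(d - x) / sigma ** 2 for d, x in zip(denoised, x_t)]


if __name__ == "__main__":
    train = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
    # Small noise: the denoiser snaps to the nearest training point.
    print(optimal_denoiser([0.9, 1.2], train, sigma=0.05))
    # Large noise: the denoiser approaches the training-set mean.
    print(optimal_denoiser([0.9, 1.2], train, sigma=1000.0))
```

At small σ the output collapses onto the nearest training example, so sampling with this exact score can only reproduce the training set. This is one way to see why the analytical solution does not generalise, while it still serves as a reference point for studying the models that do.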

Jaeyeon Kim • 11 months ago

Excited to share that I’ll be presenting two oral papers at this year's ICML. See u guys in Vancouver!!🇨🇦 1️⃣ Understanding Masked Diffusion Models theoretically/scientifically 2️⃣ Theoretical analysis of LoRA training

Amir Zamir • 11 months ago

We benchmarked leading multimodal foundation models (GPT-4o, Claude 3.5 Sonnet, Gemini, Llama, etc.) on standard computer vision tasks, from segmentation to surface normal estimation, using standard datasets such as COCO and ImageNet. These models have made remarkable progress; however, it is unclear exactly where they stand in understanding vision in detail, especially on tasks beyond question answering. How well do they understand an object's segments or geometry? Our analyses yield an assessment that is quantitatively and qualitatively detailed and compatible with evaluations developed in the field of computer vision over the past decades. Observed trends:
🔹 The foundation models consistently underperform task-specific SOTA models across all tasks. However, they are respectable generalists, which is remarkable given that they are presumably trained primarily on image-text tasks.
🔹 They perform notably better on semantic tasks than on geometric ones.
🔹 GPT-4o performs best among non-reasoning models, taking the top position in 4 out of 6 tasks.
🔹 Reasoning models, e.g., o3, show improvements on geometric tasks.
🔹 The image generation models, e.g., GPT-4o Image Generation, which were natively trained multimodally, exhibit quirks, e.g., hallucinated objects and misalignment between input and output.
🔹 While the prompting technique affects performance, better models are less sensitive to prompt variations. We control for the variance introduced by prompting methods in our experiments.
🌐 Detailed analyses, visualizations: ⌨️ code:  🧵 1/n
