
Ravid Shwartz Ziv
@ziv_ravid • 11,969 subscribers
AI researcher | Meta | NYU. Working on compression, representation learning, and memory. I have an AI podcast! https://t.co/Bzzp2Oq4Cc

New episode of The Information Bottleneck is out, this time with Zhuang Liu (Princeton). We talked about ConvNeXt and whether architecture still matters; dataset bias and what "good data" actually looks like; ImageBind and why vision is the natural bridge across modalities; CLIP's blind spots; memory as the real bottleneck behind the agent hype; whether LLMs have world models; and Transformers Without Normalization. For years, the vision community debated what actually matters: architecture, inductive bias, self-attention vs convolution. After a lot of back-and-forth, we ended up in a funny place: ViT and ConvNet give roughly the same performance once you tune the details. What I find interesting is that once you reach a certain performance level, it becomes much easier to swap and tweak components without really changing the outcome. Talking to Zhuang on this episode, I kept wondering whether the same is now true for LLMs. If we spent serious time on an alternative architecture today, would we actually get a meaningfully different model, or just land on the same Pareto curve with extra steps? I'm starting to suspect it's the latter. Architecture matters less than we think. Data, compute, and a handful of pillars do most of the work.
Ravid Shwartz Ziv • 25,121 views • 1 month ago

Some more thoughts about the Yann interview: Even if LLMs work great, that's missing the point. Everyone's doing the same thing now. More scale, more data, longer CoT, tweak RL. But the path to get there was completely stochastic. Attention, transformers, scaling laws, RLHF, none of it was obvious, it came from people trying very different things. We wouldn't be here if everyone had agreed early on which direction to go. Assuming the next leap comes from all of us optimizing the same recipe is dangerous. I agree that the situation today is indeed way more complicated. Back then you could test a wild idea on a few GPUs. Now every serious bet costs millions in compute. That makes it harder to explore, which makes the convergence problem even worse. But that's exactly why we need to be more intentional about funding diverse approaches, not less.
Ravid Shwartz Ziv • 57,486 views • 6 months ago

New episode of The Information Bottleneck is out 🥳 In this one, we talked with Sasha Rush, a researcher at Cursor and professor at Cornell, about the hottest topic - coding agents! We talked about how Cursor trains Composer, the challenges, and where all this is going. One point Sasha made stuck with me. Coding agents work really well, but only when you can specify a clear hill-climbing signal. A problem where the agent knows it's getting better. Karpathy tried this on nanochat a few weeks back, letting an agent run overnight to optimize the validation loss autonomously. The follow-ups were mixed. Sometimes the "improvement" was worse than classical methods. Most real problems don't come pre-packaged with a reward signal at all. You don't know if a new architecture is better until you've already run the experiment. You don't know if a refactor is cleaner until someone reads it. You don't know if a product decision worked until months later. A lot of what we call hard problems are hard precisely because the signal is missing, noisy, or expensive to get. I think the next big challenge isn't getting agents to solve problems. It's finding problems you can actually formalize as hill climbing, or building a cheap proxy that correlates with what you care about.
Ravid Shwartz Ziv • 18,911 views • 2 months ago

New episode of the Information Bottleneck! We talked with Stefano Ermon about why he thinks diffusion LLMs will replace autoregressive ones. Stefano co-invented DDIM, FlashAttention, DPO, and score-based diffusion models. He's a Stanford professor and now runs Inception AI, where they built Mercury II. We go deep but also cover the bigger picture - the startup journey, PhD vs industry, and where AI is heading. A few things that stuck with me:
- He thinks of autoregressive models as typewriters and diffusion models as editors. One goes left to right. The other starts messy and refines.
- Mercury II (their text diffusion model) already beats the fastest autoregressive models on latency-critical stuff such as voice agents, code suggestions, anything where you have a tight time budget. And it does it because diffusion generates tokens in parallel instead of one at a time.
- We also got into whether AI will actually replace software engineers (his answer: no), PhD vs industry advice, and what it was like going from an ICML best paper to raising money.
Ravid Shwartz Ziv • 22,334 views • 2 months ago

Another piece from the Sasha Rush conversation, this time on rewards for coding RL. He said Cursor uses a mix. Some rewards look at the tool calls themselves, some only at the final output. Everything end-to-end, no process rewards guessing what happens in the middle. I agree with him that the next push (and the current one) is going past verifiers. Coding is a good example where you get both. You have the 0/1 signals, but also a ton of soft ones. Does the diff read well? Did the agent burn context for no reason? Is the PR the right size? Did it rewrite a file it didn't need to touch? So many signals sitting there, and we need to figure out how to actually train on them. This is also why I think the environment (and the data) is becoming the bottleneck, not the algorithm. GRPO works. PPO works. The hard part is building a sandbox that exposes enough signal to learn from. We're already seeing this, and it'll only accelerate: a whole category of companies selling environments is emerging, and they have a good reward to do so…
Ravid Shwartz Ziv • 14,826 views • 1 month ago

Here's a simple rule I use: When someone tells you AGI is coming, ask yourself what they're selling. VCs need the hype for valuations. Researchers need it for funding. Doomers sell fear. Skeptics sell their brand. Even I'm selling you my podcast 😅 There's no single moment where we'll suddenly have it. We'll reach superhuman level on some tasks, human level on others, and stay behind on many. Some tasks have already happened, while others will take forever. The thing is that for 99% of problems, I don't care about "general intelligence." I just want something that solves my problem. Sometimes it's way better than humans. Sometimes about the same. Sometimes worse but still useful. None of this is "AGI." "AGI" is not a scientific concept. It's a story. And usually someone is selling it to you.
Ravid Shwartz Ziv • 27,881 views • 6 months ago

I believe safety is important, but we must distinguish between two very different narratives: The "Sci-Fi" Narrative: Stories about AI controlling the world, having "feelings," or possessing secret intelligence designed to fool us. Too often, the research in this narrative is used for PR or as an excuse to stifle open-source research. The Engineering Reality: How do we build systems that are robust and hard to break? As Yann points out, current LLMs rely on post-training for safety, which is inherently fragile and can always be jailbroken. He argues for "Objective-Driven AI," which means systems that satisfy safety constraints by construction, similar to how a jet engine is engineered to handle stress. I agree with Yann that patching models with fine-tuning isn't the long-term solution. However, the practical path to embedding these hard "guardrails" into a reasoning agent is still a massive open question. We know what we need, but we haven't figured out how to build it effectively in practice.
Ravid Shwartz Ziv • 23,782 views • 6 months ago