Rudy Gilman

@rgilman333,109 subscribers

substrate agnostic

Shorts

The attention layers in the VAEs for FLUX, Stable Diffusion 3.5, and SDXL don't do anything. You can ablate them with almost no effect. At first I thought they might be involved in some clever circuitry, maybe moving global information, but no, they're just flailing around doing nothing.

88,587 views
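A minimal sketch of how an ablation like this can be measured, assuming the attention sits on a residual branch so that ablating it reduces the block to an identity. Everything here is a toy stand-in rather than the actual VAE code: `self_attention`, `residual_attention_block`, and `relative_change` are hypothetical helpers, and `branch_scale` is an assumed knob that makes the branch's contribution small on purpose. The sketch illustrates the measurement only; it does not reproduce the finding.

```python
import math
import random

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Toy single-head self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def residual_attention_block(tokens, branch_scale, ablate=False):
    """Return x + branch_scale * attn(x); ablate=True skips the attention branch."""
    if ablate:
        return [list(t) for t in tokens]
    attn = self_attention(tokens)
    return [[x + branch_scale * a for x, a in zip(t, at)] for t, at in zip(tokens, attn)]

def relative_change(baseline, ablated):
    """L2 norm of the difference divided by the L2 norm of the baseline."""
    diff = sum((b - a) ** 2 for bt, at in zip(baseline, ablated) for b, a in zip(bt, at))
    norm = sum(b ** 2 for bt in baseline for b in bt)
    return math.sqrt(diff / norm)

# Assumed toy input: 16 tokens of dimension 8 drawn from a unit Gaussian.
random.seed(0)
tokens = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
full = residual_attention_block(tokens, branch_scale=0.01)
skipped = residual_attention_block(tokens, branch_scale=0.01, ablate=True)
change = relative_change(full, skipped)
print(f"relative change after ablation: {change:.4f}")
```

On a real model the same comparison would be made on decoded images, with and without the attention branch, over a set of inputs rather than a single random one.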

Half the channels in the last layer of the SDXL VAE are disabled intentionally by the model itself. I've never seen this type of dead neuron before; it's like a form of apoptosis, where the model kills off a part of itself so the rest may thrive. This is the final conv layer, where we produce the output RGB. Notice half the neurons never contribute. The weights on the ignored channels are tiny.

59,882 views
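One way to look for channels like these is to compare the weight norm attached to each input channel of the final conv layer. The sketch below assumes the weights are available as nested lists shaped `[out_ch][in_ch][kh][kw]`; `input_channel_norms`, `near_dead_channels`, the `rel_threshold` cutoff, and the toy weights are all hypothetical, not values taken from the SDXL VAE.

```python
import math

def input_channel_norms(weight):
    """L2 norm of the weights attached to each input channel.

    weight is nested lists shaped [out_ch][in_ch][kh][kw].
    """
    in_ch = len(weight[0])
    norms = []
    for c in range(in_ch):
        sq = sum(v * v for out in weight for row in out[c] for v in row)
        norms.append(math.sqrt(sq))
    return norms

def near_dead_channels(weight, rel_threshold=1e-2):
    """Indices of input channels whose norm is below rel_threshold * the largest norm."""
    norms = input_channel_norms(weight)
    cutoff = rel_threshold * max(norms)
    return [c for c, n in enumerate(norms) if n < cutoff]

# Hypothetical toy weights: 3 output channels (RGB), 4 input channels, 1x1 kernels.
# Input channels 1 and 3 are given tiny weights by construction.
toy_weight = [
    [[[0.5]], [[1e-5]], [[-0.3]], [[2e-5]]],
    [[[0.2]], [[-1e-5]], [[0.7]], [[1e-5]]],
    [[[-0.4]], [[3e-5]], [[0.1]], [[-2e-5]]],
]
dead = near_dead_channels(toy_weight)
print(dead)  # prints [1, 3]
```

Tiny weights alone only show that a channel is ignored by this layer; whether the channel also carries near-zero activations would need a separate check.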

The SDXL VAE models a substantial amount of noise, including things we can't even see. It meticulously encodes the noise, uses precious bottleneck capacity to store it, then faithfully reconstructs it in the decoder. I grabbed what I thought was a simple black vector circle on a white background, but the VAE latched onto a veritable Navajo quilt of background noise. I was absolutely flummoxed until I realized what was going on! A lot of capacity is being used on things we don't need to be modelling at all. The suspected culprit is group norm.

52,110 views
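Background noise of this kind is invisible because the pixel values differ from pure white by only a level or two. A rough sketch of one way to make it visible is to amplify each pixel's deviation from the expected background level. The helper `amplify_deviation`, the `gain` value, and the toy patch are assumptions for illustration, not the method used in the post.

```python
def amplify_deviation(pixels, reference=255, gain=50):
    """Stretch small deviations from a reference level so they become visible.

    pixels is a list of rows of 8-bit grayscale values.
    An output of 128 means "no deviation"; results are clamped to 0..255.
    """
    out = []
    for row in pixels:
        out.append([min(255, max(0, 128 + gain * (p - reference))) for p in row])
    return out

# Hypothetical patch of "white" background with off-by-one and off-by-two values.
patch = [
    [255, 254, 255, 255],
    [255, 255, 253, 255],
    [254, 255, 255, 255],
]
amplified = amplify_deviation(patch)
for row in amplified:
    print(row)
```

Pixels that were exactly white map to mid-gray, while the off-by-one and off-by-two pixels become clearly darker, which is enough to reveal structure in an otherwise uniform background.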

The later features in DINOv2 are more abstract and semantically meaningful than I'd expected from the training objectives. This neuron responds only to hugs. Nothing else, just hugs.

34,369 views

This is SigLIP 2's dedicated DEI neuron. It fires for LGBTQ and indigenous flags, BLM imagery, and especially mixed-race groups doing happy things together (e.g. business meetings, jumping triumphantly, reaching summits, etc.).

22,825 views

A challenge for ML diagnosticians: The patient, Segment-Anything-Model-2 (SAM-2), is presenting with two prominent symptoms: Symptom 1) Extremely high-magnitude tokens spread evenly across spatial dimensions.

25,931 views
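A symptom like the first one can be checked by comparing per-token norms against a typical value. The sketch below flags tokens whose L2 norm exceeds a multiple of the median norm; `token_norms`, `high_magnitude_tokens`, the `ratio` threshold, and the toy tokens are hypothetical and not taken from SAM-2.

```python
import math
import statistics

def token_norms(tokens):
    """L2 norm of each token vector."""
    return [math.sqrt(sum(v * v for v in t)) for t in tokens]

def high_magnitude_tokens(tokens, ratio=10.0):
    """Indices of tokens whose norm exceeds ratio times the median norm."""
    norms = token_norms(tokens)
    med = statistics.median(norms)
    return [i for i, n in enumerate(norms) if n > ratio * med]

# Hypothetical toy sequence: 12 small tokens, with 3 large ones at even spacing.
tokens = [[0.1, -0.2, 0.3] for _ in range(12)]
for i in (2, 6, 10):
    tokens[i] = [40.0, -35.0, 50.0]
outliers = high_magnitude_tokens(tokens)
print(outliers)  # prints [2, 6, 10]
```

Mapping the flagged indices back to their spatial positions would then show whether the outliers are spread evenly or clustered.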
