This layer in DINO-v2 dedicates about half its attention mass to a single operation. Each of the sixteen heads independently learns the same circuit to perform this task. What is this all-important operation? The “no-op”. That’s right, we’re spending half our computation to do… absolutely nothing.

Rudy Gilman

3,112 subscribers

97,147 views • 1 year ago • via X (Twitter)

Education, Science & Technology
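To make "attention mass" concrete, here is a minimal sketch that assumes toy heads whose attention weights are a softmax over made-up scores, not numbers measured from DINO-v2. It averages the weight that each query puts on one key position, treated as the no-op, first over queries and then over heads. `sink_mass` and `layer_sink_mass` are hypothetical helper names.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sink_mass(attn: Matrix, sink_idx: int) -> float:
    """Mean attention weight that all queries of one head put on key `sink_idx`."""
    return sum(row[sink_idx] for row in attn) / len(attn)

def layer_sink_mass(heads: List[Matrix], sink_idx: int) -> float:
    """Average of the per-head sink mass over every head in the layer."""
    return sum(sink_mass(h, sink_idx) for h in heads) / len(heads)

# Hypothetical scores: 2 heads x 3 queries x 4 keys; key 0 plays the no-op role.
raw_scores = [
    [[3.0, 0.5, 0.1, 0.2], [3.0, 2.8, 0.0, 0.1], [3.0, 0.0, 0.1, 3.5]],
    [[2.0, 0.3, 0.2, 0.0], [2.0, 0.1, 2.5, 0.4], [2.0, 0.2, 0.0, 0.1]],
]
heads = [[softmax(row) for row in head] for head in raw_scores]
for i, head in enumerate(heads):
    print(f"head {i}: {sink_mass(head, 0):.2f} of its attention on the no-op")
print(f"layer average: {layer_sink_mass(heads, 0):.2f}")
```

With real attention maps you would pass in each head's softmax output in place of these toy rows.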

10 comments

Rudy Gilman • 1 year ago

The “no-op” acts as an attention sink. It lets a head move nothing when it has nothing worth moving. The circuit works by attending strongly to a single spatial location whose value projection is a “nothing vector”. Each head uses one or two of its pairwise QK detectors to attend to the no-op position. This sets a threshold: if other QK matches want to count for anything in the softmax, they have to exceed it.
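A minimal sketch of the threshold idea. It assumes one head, a made-up sink score `SINK_SCORE`, and a sink whose value vector is exactly zero; in the real model that vector is only small. Content tokens get to move information only when their score is comparable to or above the sink's.

```python
import math
from typing import List

Vec = List[float]

def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores: Vec, values: List[Vec]) -> Vec:
    """Softmax-weighted sum of value vectors: what the head actually moves."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))

SINK_SCORE = 4.0   # hypothetical score every query gives the no-op position
values = [
    [0.0, 0.0],    # the no-op's "nothing vector" (idealized as exactly zero)
    [1.0, 0.0],    # content token A
    [0.0, 1.0],    # content token B
]

# No content score gets near the threshold: the output is close to zero ("move nothing").
quiet = attend([SINK_SCORE, 0.5, 1.0], values)
# Token A's score clears the threshold: its value dominates the output.
active = attend([SINK_SCORE, 7.0, 1.0], values)
print(f"quiet head output norm:  {norm(quiet):.3f}")
print(f"active head output norm: {norm(active):.3f}")
```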

Rudy Gilman • 1 year ago

If we look at the K projection, we see that one channel looks for the no-op token. The matching Q channel fires all the time, creating a coupled QK detector that promotes attention to the no-op location regardless of the query's location. We can tell it’s a “nothing vector” from its small magnitude in the V projection.
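Here is a toy version of the coupled QK detector. It assumes a 3-channel Q/K space where channel 0 is a hypothetical no-op channel: K is large there only at the no-op token, and Q holds a constant there for every query. The no-op score then comes out the same for every query, and the value-vector norms pick out the "nothing vector". All numbers are illustrative.

```python
import math
from typing import Dict, List

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))

Q_NOOP = 2.0   # hypothetical: the Q channel that "fires all the time"
K_NOOP = 3.0   # hypothetical: the K channel that fires only at the no-op token

def make_query(content: List[float]) -> List[float]:
    """Channel 0 is constant for every query; the rest is query-specific content."""
    return [Q_NOOP] + content

def make_key(content: List[float], is_noop: bool) -> List[float]:
    """Channel 0 lights up only at the no-op token."""
    return [K_NOOP if is_noop else 0.0] + content

noop_key = make_key([0.0, 0.0], is_noop=True)
queries = [make_query(c) for c in ([1.0, 0.0], [0.0, -1.0], [0.5, 0.5])]
noop_scores = [dot(q, noop_key) for q in queries]
print(noop_scores)   # the same for every query: Q_NOOP * K_NOOP

# The "nothing vector" shows up as the value with by far the smallest norm.
values: Dict[str, List[float]] = {
    "no-op": [0.01, -0.02, 0.0],
    "patch_a": [0.9, 0.3, -0.4],
    "patch_b": [-0.5, 0.8, 0.2],
}
v_norms = {name: norm(v) for name, v in values.items()}
print(min(v_norms, key=v_norms.get))
```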

Rudy Gilman • 1 year ago

The “no-op” isn’t a new finding. Clark et al. saw it in BERT, where the no-op attention lands on the [SEP] token. Kobayashi points out that you need to weight your attention patterns by the magnitude of your V projection if you want a real sense of what’s being moved. Anthropic reiterated Kobayashi’s point in their Transformer Circuits work. I was just surprised by the sheer mass of attention we’re spending on the no-op, especially when this attention matrix is the memory bottleneck!
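Here is a small sketch of weighting attention by the V projection, using made-up weights and value vectors. Each key's contribution is its attention weight times the norm of its value vector, renormalized into shares. This is a simplified version of the idea that only looks at value vectors. `norm_weighted` and `shares` are hypothetical helper names.

```python
import math
from typing import List

def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))

def shares(xs: List[float]) -> List[float]:
    """Rescale non-negative contributions so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def norm_weighted(attn_row: List[float], values: List[List[float]]) -> List[float]:
    """Contribution of each key as ||alpha_j * v_j|| = alpha_j * ||v_j||."""
    return [a * norm(v) for a, v in zip(attn_row, values)]

attn_row = [0.6, 0.25, 0.15]   # hypothetical: most raw attention goes to key 0 (the no-op)
values = [
    [0.01, 0.0],               # the no-op's "nothing vector"
    [1.0, 0.5],
    [0.3, -0.8],
]
raw_share = shares(attn_row)
weighted_share = shares(norm_weighted(attn_row, values))
print(f"raw attention on the no-op:       {raw_share[0]:.2f}")
print(f"norm-weighted share of the no-op: {weighted_share[0]:.3f}")
```

The raw pattern says the head is mostly "about" the no-op. The norm-weighted view says the no-op contributes almost nothing to what the head moves.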

Rudy Gilman • 1 year ago

For the record, I’ve never liked the softmax. It feels coercive. Activations in linear/conv layers are voluntary, and indeed many neurons choose not to fire most of the time. But with softmax attention we force the model to develop this complex “no-op” apparatus just to get what other layers enjoy naturally: the ability to not fire when the pattern it’s looking for isn’t there!
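The asymmetry described here is easy to check with toy numbers that are illustrative only, not taken from DINO-v2. A softmax row always sums to 1, so a head with no match still has to put its mass somewhere, while a ReLU unit can output exactly zero. Appending an always-high score for a sink position gives the softmax somewhere to park that mass, which is effectively what the no-op circuit does.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relu(xs: List[float]) -> List[float]:
    return [max(0.0, x) for x in xs]

no_match = [-5.0, -6.0, -4.5]   # toy scores: nothing the head looks for is present

print(sum(softmax(no_match)))   # still 1 (up to rounding): the mass must land somewhere
print(sum(relu(no_match)))      # a ReLU unit can simply stay silent

# Appending an always-high sink score lets the head park that mass on the no-op.
with_sink = softmax(no_match + [2.0])
print(f"mass on the sink: {with_sink[-1]:.3f}")
```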

Mingjie Sun • 1 year ago

Check out our paper. In the last section, we explain this phenomenon in DINO-v2 with registers.

Rudy Gilman • 1 year ago

This is a great reference, thank you! Definitely worth the read.

Zach Nussbaum • 1 year ago

I’m pretty sure there are ViTs trained with registers/sinks based on that finding!

Rudy Gilman • 1 year ago

Yes, I suspect these are related to @TimDarcet's registers! More evidence for this: the no-op detectors also fire somewhat for the CLS token, which we know carries global information.
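For readers who haven't seen registers: the idea (Darcet et al., "Vision Transformers Need Registers") is to add a few extra learnable tokens to the ViT input sequence and discard their outputs. That gives the model dedicated slots for this kind of global or no-op role. The minimal sketch below covers only the sequence bookkeeping, with toy 2-d token vectors. Placing the registers right after [CLS] is an assumption of this sketch, and `NUM_REGISTERS`, `with_registers` and `drop_registers` are hypothetical names.

```python
from typing import List

Token = List[float]

def with_registers(cls_tok: Token, registers: List[Token], patches: List[Token]) -> List[Token]:
    """Input sequence with register tokens placed right after [CLS] (ordering assumed)."""
    return [cls_tok] + registers + patches

def drop_registers(outputs: List[Token], num_registers: int) -> List[Token]:
    """Discard register outputs; keep [CLS] and patch outputs for downstream use."""
    return [outputs[0]] + outputs[1 + num_registers:]

NUM_REGISTERS = 4                                        # hypothetical register count
cls_tok = [1.0, 0.0]
registers = [[0.0, 0.0] for _ in range(NUM_REGISTERS)]   # learnable in a real model
patches = [[0.1 * i, -0.1 * i] for i in range(6)]        # toy patch embeddings

seq = with_registers(cls_tok, registers, patches)
# A real model would run its transformer blocks on `seq` here; this sketch skips them.
kept = drop_registers(seq, NUM_REGISTERS)
print(len(seq), len(kept))
```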

uɐɥdǝʇS • 1 year ago

Wasn't this supposed to be a fix for some of this, or am I misunderstanding?

Rudy Gilman • 1 year ago

Yes, this definitely seems connected to the phenomenon of registers; you are understanding correctly!

Similar videos

DINO-v3 has a single high-magnitude channel on its residual pathway: channel 416. Turning off this one channel changes DINO's entire output by 50-80%. For context, turning off a random channel has an effect of less than one percent. The model builds up channel 416 in its last two layer-scale operations, using a single high-magnitude weight in each op to drastically ramp up channel 416's magnitude. This channel doesn't depend on the input: every image fires with the same constant overlay. After bringing channel 416 up to a value of about ten thousand, DINO-v3 scales it down in the final layer-norm to almost nothing, removing it without a trace.

Rudy Gilman

85,888 views • 10 months ago
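Here is a toy sketch of the build-up and removal described above, using made-up numbers in an 8-channel residual stream, with index 7 standing in for channel 416. Two LayerScale-style residual updates, each with one huge per-channel scale, ramp that channel to about ten thousand. A final LayerNorm whose weight for that channel is near zero then erases it. None of these numbers come from DINO-v3's weights. As an assumption not stated in the post, the toy also shows one way such a channel can still matter: it dominates LayerNorm's mean and variance, so ablating it shifts every other normalized channel.

```python
import math
from typing import List

Vec = List[float]

def layer_scale_update(residual: Vec, block_out: Vec, scale: Vec) -> Vec:
    """Residual update with a per-channel LayerScale: residual + scale * block_out."""
    return [r + s * b for r, b, s in zip(residual, block_out, scale)]

def layer_norm(x: Vec, gamma: Vec, beta: Vec, eps: float = 1e-6) -> Vec:
    """LayerNorm over one token's channels, with learned per-channel gamma and beta."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    std = math.sqrt(var + eps)
    return [g * (xi - mean) / std + b for xi, g, b in zip(x, gamma, beta)]

D, BIG = 8, 7                  # hypothetical: index 7 plays channel 416
residual = [0.3, -1.2, 0.8, 0.1, -0.4, 1.5, -0.7, 0.2]
block_out = [0.1] * D          # toy block output, held fixed in this sketch
scale = [0.1] * D
scale[BIG] = 50_000.0          # one high-magnitude LayerScale weight

x = residual
for _ in range(2):             # "the last two layer-scale operations"
    x = layer_scale_update(x, block_out, scale)

gamma, beta = [1.0] * D, [0.0] * D
gamma[BIG] = 1e-4              # final LayerNorm weight that shrinks the channel
y = layer_norm(x, gamma, beta)

x_ablated = x[:BIG] + [0.0] + x[BIG + 1:]
y_ablated = layer_norm(x_ablated, gamma, beta)
other_shift = max(abs(a - b) for i, (a, b) in enumerate(zip(y, y_ablated)) if i != BIG)
print(f"channel {BIG} before LayerNorm: {x[BIG]:.1f}, after: {y[BIG]:.6f}")
print(f"largest change in another channel when it is ablated: {other_shift:.2f}")
```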
