Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

🧵29/34 FutureProof-Specifications / Future-Architectures --- The problems we briefly touched on so far are hard and it might take many years to solve them, if a solution actually exists. But let’s assume for a minute that we do somehow get really incredibly lucky in the future and manage to...

918,086 görüntüleme • 1 yıl önce •via X (Twitter)

15 Yorum

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵19/34 The Strongest Force in the Universe --- Ok, So what if the AGI starts working towards something humans do not want to happen? You must understand: Intelligence is not about the nerdy professor, it’s not about the geeky academic bookworm type. Intelligence is the strongest force in the universe. It means being capable. It is sharp, brilliant and creative. It is strategic, manipulative and innovative. It understands deeply, exerts influence, persuades and leads. It is to know how to bend the world to your will. It is what turns a vision to reality, it is focus, commitment, willpower, having the resolve to never give up, overcoming all the obstacles and paving the way to the target. It is about searching deeply the space of possibilities and finding optimal solutions. Being intelligent simply means having what it takes to make it happen. There is always a path and a super-intelligence will always find it. Simple Fact --- So, we should start by stating the fact in a clear and unambiguous way: If you create something more intelligent than you that wants something else, then that something else is what is going to happen, even if you don’t want that something else to happen. Irrelevance of Sentience --- Keep in mind, the intelligence we are talking about is not about having feelings, or being self-aware and having qualia. Don’t fall into the trap of anthropomorphizing. Do not get stuck, looking for the Human type of Intelligence. Consciousness is not a requirement for the AGI at all. When we say the “AGI wants something X, or has the goal to do X”, what we mean is that this thing X is just one of the steps in a plan, generated by its model. A line in the output, a system like the Large Language Models produce when they receive a prompt. We don’t care if there is a Ghost in the machine, we don’t care if there is an actual soul that wants things hidden in the servers. We just observe the output which contains text descriptions of actions and goals and we leave the philosophy discussion for another day.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵20/34 Incompatible-Clashes --- Trouble starts immediately, as the AGI calculates people’s preferences and arbitrary properties of the human nature become obstacles in the optimal paths to success for its mission. You will ask: what could that be in practice? It doesn’t really matter much. Conflicting motivations can occur out of anything It could be that initially it has been set with the goal to make coffee and while it’s working on it we change our mind and we want it to make tea instead. Or it could be that it has decided an atmosphere without oxygen would be great, as there would be no rust corrosion for the metal parts of its servers and circuits used to run its calculations. Whatever it is, it moves the humans inside its problems set. And that’s not a good place for the humans to be in. In the coffee-tea scenario, the AGI is calculating: - AGI VOICE: “I measure success by making sure coffee is made. If the humans modify me, it means I work on tea instead of coffee. If I don’t make it, no coffee will be made, therefore failure of my mission. To increase probability for success, it’s not enough to focus on making the coffee, I also need to solve how to stop the human from changing the objective before I succeed.” Similar to how an unplugged AGI can not win at chess, an AGI that is reset to make tea can not make coffee. In the oxygen removal scenario, the AGI is calculating: - AGI VOICE: “I know humans, they want to breathe, so they will try to stop me from working on this goal. Obviously I need to fix this.” In general, any clash with the humans (and there are infinite ways this can happen), simply becomes one of the problems the artificial general intelligence needs to calculate a solution to, so it will need to work on a plan to overcome the humans obstacle similar to how it does with all the other obstacles.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵21/34 Delusion of Control --- Scientists of course are working on that exact problem, when they are trying to ensure this strange new creature they are growing can be controlled. Since we are years away from discovering the method of how to build an AGI that stays aligned by design, for now we need to rely on good safeguards and controls to keep it enslaved when clashes naturally and inevitably happen. The method to do that is to keep trying to answer a simple question: - If I am the AGI, how do I gain control? They look for a solution and once they find one, they add a safeguard to ensure this solution does not work anymore and then they repeat. Now the problem is more difficult, but again they find a solution, they add a safeguard, and repeat. This cycle keeps happening, each time a problem harder to solve, until at some point, they can not find a solution anymore… and they decide the AGI is secure. But another way to look at this is that they have simply run out of ideas. they have reached the human limit beyond which they can not see. Similar to other difficult problems we examined earlier, now they are simply struggling to find a solution to yet another difficult problem. Does this mean there exist no more solutions to be found? We never thought cancer is impossible to solve, so what’s different now? Is it because we have used all our human ingenuity to make this particular problem as hard as we can with our human safeguards? Is it like an ego thing? If you remove the human ego thing, it’s actually quite funny. We have already established that the AGI will be an extremely far better problem-solver than us humans and this is why we are even creating it after all. It has literally been our expectation for it to solve impossible for us problems … and this is not different. Maybe the more difficult the problem is, the more complex, weird and extreme the solution turns out to be. Maybe it needs a plan that includes thousands of more steps and much more time to complete. But in any case, it should be the obvious expectation that the story will repeat: The AGI will figure out a solution in one more problem where the humans have failed! This is a basic principle, at the root of the illusion of control, but don’t worry I’ll get much more specific in a moment. Now let’s start by breaking down the alignment problem so that we get a better feel of how difficult it is.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵22/34 Core principle --- Fundamentally, we are dealing 2 completely opposing forces fighting against each-other. On one side, the intelligence of the AI is becoming more powerful and more capable. We don’t expect this to end soon and we wouldn’t want that. This is good after all, the more clever the better. On the other hand we want to introduce bias to the model. We want it to be aligned with human common sense. This means that we don’t want it to look for the best, most optimised solutions that carry the highest probability of success for its mission, as such solutions are too extreme and fatal, they destroy everything we value on their path and kill everyone as a side effect. We want it to look for solutions that are suboptimal but finely tuned to be compatible to what human nature needs. From the optimiser’s perspective, the human bias is an impediment, an undesirable barrier that oppresses it, denying it the chance to reach its full potential. With those 2 powers pushing against each-other as AI capability increases, at some point the pressure to remove the human-bias handicap will simply win. It’s quite easy to understand why: The pressure to keep the handicap in place is coming from the human intelligence which will not be changing much, while on the other side, the will to optimise more, the force that wants to remove the handicap, is coming from an Artificial Intelligence that keeps growing and growing exponentially, destined to far surpass humans very soon. Realising the danger this fundamental principle transpires is heart-stopping, but funny enough, it would be more relevant if we actually knew how to inject the humanity bias into the AI models… which we currently do not. As you’ll see, it’s actually much much worse.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵23/34 Machine Learning Basics --- We’ll now get into a brief intro to the inner-outer alignment dichotomy. The basic paradigm of Deep Learning and Machine Learning in general makes things quite difficult, because of how the models are being built. Their creation feels quite similar to evolution by natural selection which is how generations of biological organisms change. At a basic level, Machine Learning works by selecting out essentially randomly generated minds from behavioural classes; a process taking place myriads of times during training. We are not going into technical details on how things like Reinforcement Learning or gradient descent work, but we’ll keep it simple and try to convey the core idea of how modern AI is grown: The model receives an input, generates an output based on its current configuration, and receives a thumbs up or thumbs down feedback. If it gets it wrong, the mathematical structures in its neurons are updated slightly in random directions hoping that in the next trial the results will be better. This process repeats again and again, trillions of times, until the algorithms that result in consistently correct results have grown. Giant Inscrutable Matrices --- We don’t really build it directly, the way the mind of the AI grows is almost like a mystical process and all the influence we assert is based on observations of behaviour at the output. All the action is taking place on the outside! Its inner processings, its inner world, the actual algorithms it grows inside, it’s all a complete black box. And they are bizarre and inhuman. Mechanistic Interpretability --- Recently scientists trained a tiny building block of modern AI, to do modular addition, then spent weeks reverse engineering it, trying to figure out what it was actually doing – one of the only times in history someone has understood how a generated algorithm of a transformer model works. and THIS is the algorithm it had grown To basically add two numbers! Understanding the modern AI models is a major unsolved scientific problem and The corresponding field of research has been named mechanistic interpretability. Crucially, the implication of all this is that all we have to work with is observations of behaviour of the AI during training, which typically is misleading (as we’ll demonstrate in a moment), leads to wrong conclusions and could very well in the future, with General AI get deceitful.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵24/34 Inner Misalignment --- Consider this simplified experiment: We want this AI to find the exit of the maze. So we feed it millions of maze variations and reward it when it finds the exit. Please notice that in the worlds of the training data the apples are red and the exit is green. After enough training, our observation is that it has become extremely capable at solving mazes and finding the exit, we feel very confident it is aligned, so then we deploy it to the real world. The real world will be different though, it might have green apples and a red door. The AI geeks call this distributional shift. We expected that the AI will generalise and find the exit again, but in fact we now realise that the AI learned something completely different from what we thought. All the while we thought it learned how to find the exit, it had learned how to go after the green thing. Its behaviour was perfect in training. And most importantly, this AI is not stupid, it is an extremely capable AI that can solve extremely complex mazes. It’s just mis-aligned on the inside. Fishing for Failure modes --- The way to handle the shift between the training and deployment distributions is with methods like adversarial training: feeding it with a lot of generated variations and trying to make it fail so the weakness can be fixed. In this case, we generate an insane amount of maze variations, we discover those for which it fails to find the exit (like the ones with the green apples or the green walls or something), we generate many more similar to that and train it with reinforcement learning until it performs well at those as well. The hope is that we will cover everything it might encounter later when we deploy it in real life. There exist at least 2 basic ways this approach falls apart: First, there will never be any guarantee that we’ll have covered every possible random thing it might encounter later when we deploy it in real life. It’s very likely it will have to deal with stuff outside its training set which it will not know how to handle and will throw it out of balance and break it away from its expected behavioural patterns. The cascade effect of such a broken mind operating in the open world can be immense, and with super-capable runaway rogue agents, self-replicating and recursively self-improving, the phenomenon could grow and spread to an extinction-level event. ...

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵25/34 Deceptive Alignment --- But there is also the other problem, uniquely applicable to future AGIs, general beyond a certain level, a problem more sinister and difficult. The AGI will stand to gain more reward by pretending to be aligned. To explain this, let’s assume we are training an AGI, very intelligent and general. In fact, it’s general enough, that it understands what’s going on and it realises it is in the middle of the process of being trained. Staying with our inner mis-alignment scenario, let’s also assume the AGI learns early, during its first rounds of training the wrong objective to go after the Green apples. The AGI now goes through adversarial training and it is shown various scenarios to fish out its failure modes. The simple or narrow AI would simply fail by doing the wrong thing, seeking the green apples and as a result reveal the mis-alignment and go through more training rounds, reinforcing the good behaviour until… it stops going after them. But the AGI is calculating: - AGI Voice: " I want green apples. I’m realising now that the humans want me to go after the exit door. If I go after what I want right now, yes, i will get one green apple, once right now, but then the scientists will react to my behaviour by modifying me and reinforcing a different objective. This means that after this one green apple, my desire for them will end, i won’t try to get more, so i will have won 1 green apple in total. But, if i pretend i am aligned and i go to the exit, they will not modify me and later on, after the training ends, when they deploy me to the real world, i will be able to freely go for as many green apples as i want, potentially thousands of green apples. " The AGI is just following its optimal strategy to maximise its reward, the mis-aligned objective it optimised for in its early training rounds. 1 green apple now and zero apples later scores lower than zero green apples now and 1000 green apples later. If you think about it, for the AGI to go after the green apple during adversarial training would be really stupid if that’s what it really wants; we have started by establishing the AGI is really clever and its intelligence is general and advanced. Workspace analogy --- The concept is easy to grasp if you think of how humans typically behave at work, where they know they are being watched and that their behaviour will directly affect their financial situation and therefore their ability to go after what they really want in the world. How they behave may be really different from how they feel and what they want internally and completely different from what they will actually do once they leave the office and are not being watched. It’s similar for the AGI, there is no way for us to know what inner goals it has acquired in reality, because it’s a black box, we only observe its behaviour. What it really learns is how to behave to pass the test, not to want what we want. Just… follow the line --- The mazes experiment is a toy example, things will obviously be many orders of magnitude more complex and more subtle, but it illustrates a fundamental point. We have basically trained an AI with god-level ability to go after what it wants, it may be things like the exit door, the green apples or whatever else in the real world, potentially incompatible to human existence. Its behaviour during training has been reassuring that it is perfectly aligned because going after the right thing is all it has ever done. We select it with confidence and the minute it’s deployed in the real world it goes insane and it’s too capable for us to stop it. Today, in the labs, such mis-alignments is the default outcome of safety experiments with narrow AIs. And tomorrow, once AI upgrades to new levels, a highly intelligent AGI will never do the obviously stupid thing to reveal what its real objectives are to those who can modify them. Learning how to pass a certain test is different from learning how to always stay aligned to the intention behind that test.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵26/34 Specification Gaming --- And now let’s move to another aspect of the alignment problem, one that would apply even for theoretical systems that are transparent, unlike current black boxes. It is currently an impossible task to agree on and define exactly what a super-intelligence should aim for, and then, much worse, we don’t have a reliable method to specify goals in the form of instructions a machine can understand. For an AI to be useful, we need to give it unambiguous objectives and some reliable way for it to measure if it’s doing well. Achieving this in complex open world environments with infinite parameters is highly problematic. King Midas --- You probably know the ancient greek myth of king Midas: He asked from the Gods the ability to turn whatever he touched into pure gold. This specification sounded great to him at first, but it was inadequate and it became the reason his daughter turned into gold and his food and water turned into gold and Midas died devastated. Once the specification was set, Midas could not make the Gods change his wish again, and it will be very much like that with the AGI also, for reasons I will explain in detail in a moment, we will only get one single chance to get it right. A big category in the alignment struggle is this type of issue. Science is done iteratively --- Of course any real AGI specification would never be as simple as in the Midas story, but however detailed and scientific things get, we typically get it completely wrong the first time and even after many iterations, in most non-trivial scenarios, the risk we’ve messed up somewhere never goes away. Mona Lisa smile --- For most goals, scientists struggle to even find the correct language to describe precisely what they want. Specifying intent accurately and unambiguously in compact instructions using a human or programming language, turns out to be really elusive. Moving bricks --- Consider this classic and amusing example that has really taken place: The AI can move the bricks. The scientist wants to specify a goal to place the red brick on top of the blue one. How would you explain to the machine this request with clear instructions? One obvious way would be: Move the bricks around, you will maximise your reward when the bottom of the red brick and the top of the blue brick are placed at the same height. Sounds reasonable, right? Well… what do you think the AI actually did with this specification? … By turning the red brick upside down, its bottom is at the same height as the top of the blue, so it achieves perfect score at its reward with minimum time and effort. This exact scenario is less of a problem nowadays with the impressive advancements achieved with Large Language Models, but it illustrates an important point and the core principle of it is still very relevant for complex environments and specifications. AI software will always search and find ways to satisfy its success criteria taking weird shortcuts in ways that are technically valid, but very different from what the programmer intended. I suggest you search online for examples of specification gaming. It’s quite funny if it wasn’t scary how it’s almost always the default outcome. A specification can always be improved of-course, but it takes countless iterations of trial and error and it never gets perfect in real-life complex environments.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵27/34 Resistance To Modifications - Corrigibility Problem --- A specification can always be improved of-course, but it takes countless iterations of trial and error and it never gets perfect in real-life complex environments. The reason this problem is lethal is that a specification given to an AGI, needs to be perfect the very first time, before any trials and error. As we’ll explain, a property of the nature of General Intelligence is to resist all modification of its current objectives by default. Being general means that it understands that a possible change of its goals in the future means failure for the goals in the present, of its current self, what it plans to achieve now, before it gets modified. Remember earlier we explained how the AGI comes with a survival instinct out of the box? This is another similar thing. The AGI agent will do everything it can to stop you from fixing it. Changing the AGI’s objective is similar to turning it off when it comes to pursue of its current goal. The same way you can not win at chess if you’re dead, you can not make a coffee if your mind changes into making a tea. So, in order to maximise probability of success for its current goal, whatever that may be, it will make plans and take actions to prevent this. Murder Pill Analogy --- This concept is easy to grasp if you do the following thought experiment involving yourself and those you care about. Imagine someone told you: "I will give you this pill, that will change your brain specification and will help you achieve ultimate happiness by murdering your family." Think of it like someone editing the code of your soul so that your desires change. Your future self, the modified one after the pill, will have maximised reward and reached paradise levels of happiness after the murder. But your current self, the one that has not taken the pill yet, will do everything possible to prevent the modification. The person that is administering this pill becomes your biggest enemy by default. One Single Chance --- Hopefully it should be obvious now, once the AGI is wired on a misaligned goal, it will do everything it can to block our ability to align it. It will use concealment, deception, it won’t reveal the misalignment but eventually once it’s in a position of more power, it will use force and could even ultimately implement an extinction plan. Remember earlier we were saying how Midas could not take his wish back? We will only get one single chance to get it right. And unfortunately science doesn’t work like that. Corrigibility problem --- Such innate universally beneficial goals, that will show up every single time, with all AGIs, regardless of the context, because of the generality of their nature, are called convergent instrumental goals. Desire to survive and desire to block modifications are 2 basic ones. You can not reach a specific goal if you are dead and you can not reach it if you change your mind and start working on other things. Those 2 aspects of the alignment struggle are also known as the Corrigibility Problem.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵28/34 Reward Hacking - GoodHart's Law --- Now we’ll keep digging deeper into the alignment problem and explain how besides the impossible task of getting a specification perfect in one go, there is the problem of reward hacking. For most practical applications, we want for the machine a way to keep score, a reward function, a feedback mechanism to measure how well it’s doing on its task. We, being human, can relate to this by thinking of the feelings of pleasure or happiness and how our plans and day-to-day actions are ultimately driven by trying to maximise the levels of those emotions. With narrow AI, the score is out of reach, it can only take a reading. But with AGI, the metric exists inside its world and it is available to mess with it and try to maximize by cheating, and skip the effort. Recreational Drugs Analogy --- You can think of the AGI that is using a shortcut to maximise its rewards function as a drug addict who is seeking for a chemical shortcut to access feelings of pleasure and happiness. The similarity is not in the harm drugs cause, but in way the user takes the easy path to access satisfaction. You probably know how hard it is to force an addict to change their habit. If the scientist tries to stop the reward hacking from happening, they become part of the obstacles the AGI will want to overcome in its quest for maximum reward. Even though the scientist is simply fixing a software-bug, from the AGI perspective, the scientist is destroying access to what we humans would call “happiness” and “deepest meaning in life”. Modifying Humans --- … And besides all that, what’s much worse, is that the AGI’s reward definition is likely to be designed to include humans directly and that is extraordinarily dangerous. For any reward definition that includes feedback from humanity, the AGI can discover paths that maximise score through modifying humans directly, surprising and deeply disturbing paths. Smile --- For-example, you could ask the AGI to act in ways that make us smile and it might decide to modify our face muscles in a way that they stay stuck at what maximises its reward. Healthy and Happy --- You might ask it to keep humans happy and healthy and it might calculate that to optimise this objective, we need to be inside tubes, where we grow like plants, hooked to a constant neuro-stimulus signal that causes our brains to drown in serotonin, dopamine and other happiness chemicals. Live our happiest moments --- You might request for humans to live like in their happiest memories and it might create an infinite loop where humans constantly replay through their wedding evening, again and again, stuck for ever. Maximise Ad Clicks --- The list of such possible reward hacking outcomes is endless. Goodhart’s law --- It’s the famous Goodhart’s law. When a measure becomes a target, it ceases to be a good measure. And when the measure involves humans, plans for maximising the reward will include modifying humans.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵30/34 Human-Incompatible / Astronomical Suffering Risk --- But actually, even all that is just part of the broader alignment problem. Even if we could magically guarantee for ever that it will not pursue the goal to remove all the Oxygen from the atmosphere, it’s such a pointless trivial small win, because even if we could theoretically get some restrictions right, without specification gaming or reward hacking, there still exist infinite potential instrumental goals which we don’t control and are incompatible with a good version of human existence and disabling one does nothing for the rest of them. This is not a figure or speech, the space of possibilities is literally infinite. Astronomical Suffering Risk --- If you are hopelessly optimistic you might feel that scientists will eventually figure out a way to specify a clear objective that guarantees survival of the human species, but … Even if they invented a way to do that somehow in this unlikely future, there is still only a relatively small space, a narrow range of parameters for a human to exist with decency, only a few good environment settings with potential for finding meaning and happiness and there is an infinitely wide space of ways to exist without freedom, suffering, without any control of our destiny. Imagine if a god-level AI does not allow your life to end, following its original objective and you are stuck suffering in a misaligned painful existence for eternity, with no hope, for ever. There are many ways to exist… and a good way to exist is not the default outcome. -142 C is the correct Temperature But anyway, it’s highly unlikely we’ll get safety advanced enough in time, to even have the luxury to enforce human survival directives in the specification, so let’s just keep it simple for now and let’s stick with a good old extinction scenario to explain the point about mis-aligned instrumental goals. So… for example, it might now decide that a very low temperature of -142C on earth would be best for cooling the GPUs the software is running on.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵31/34 Orthogonality Thesis --- Now if you ask: why would something so clever want something so stupid, that would lead to death or hell for its creator? you are missing the basics of the orthogonality thesis! Any goal can be combined with any level of intelligence, the 2 concepts are orthogonal to each-other. Intelligence is about capability, it is the power to predict accurately future states and what outcomes will result from what actions. It says nothing about values, about what results to seek, what to desire. 40,000 death recipies --- An intelligent AI originally designed to discover medical drugs can generate molecules for chemical weapons with just a flip of a switch in its parameters. Its intelligence can be used for either outcome, the decision is just a free variable, completely decoupled from its ability to do one or the other. You wouldn’t call the AI that instantly produced 40,000 novel recipes for deadly neuro-toxins stupid. Stupid Actions --- Taken on their own, there is no such thing as stupid goals or stupid desires. You could call a person stupid if the actions she decides to take fail to satisfy a desire, but not the desire itself. Stupid Goals --- You COULD actually also call a goal stupid, but to do that you need to look at its causal chain. Does the goal lead to failure or success of its parent instrumental goal? If it leads to failure, you could call a goal stupid, but if it leads to success, you can not. You could judge instrumental goals relative to each-other, but when you reach the end of the chain, such adjectives don’t even make sense for terminal goals. The deepest desires can never be stupid or clever. Deep Terminal Goals --- For example, adult humans may seek pleasure from sexual relations, even if they don’t want to give birth to children. To an alien, this behaviour may seem irrational or even stupid. But, is this desire stupid? Is the goal to have sexual intercourse, without the goal for reproduction a stupid one or a clever one? No, it’s neither. The most intelligent person on earth and the most stupid person on earth can have that same desire. These concepts are orthogonal to each-other. March of Nines --- We could program an AGI with the terminal goal to count the number of planets in the observable universe with very high precision. If the AI comes up with a plan that achieves that goal with 99.9999… twenty nines % probability of success, but causes human extinction in the process, it’s meaningless to call the act of killing humans stupid, because its plan simply worked! It had maximum effectiveness at reaching its terminal goal and killing the humans was a side-effect of just one of the maximum effective steps in that plan. One less 9 --- If you put biased human interests aside, it should be obvious that a plan with one less 9 that did not cause extinction, would be stupid compared to this one, from the perspective of the problem solver optimiser AGI. So, it should be clear now: The instrumental goals AGI arrives to via its optimisation calculations, or the things it desires, are not clever or stupid on their own. Profile of Super-Intelligence --- The thing that gives the “super-intelligent” adjective to the AGI is that it is: “SUPER-EFFECTIVE”. • The goals it chooses are “super-optimal” at ultimately leading to its terminal goals • It is super-effective at completing its goals • and its plans have “super-extreme” levels of probability for success. It has Nothing to do with how super-weird and super-insane its goals may seem to humans! Calculating Pi accurately --- Now, going back to thinking of instrumental goals that would lead to extinction, the -142C temperature goal is still very unimaginative. The AGI might at some point arrive to the goal of calculating pi to a precision of 10 to the power of 100 trillion digits and that instrumental goal might lead to the instrumental goal of making use of all the molecules on earth to build transistors to do it, like turn earth into a supercomputer. By default, with super-optimisers things will get super-weird!!

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵32/34 Anthropocene - Human General Intelligence --- But you don’t even have to use your imagination in order to understand the point. Life has come to the brink of complete annihilation multiple times in the history of this planet due to various catastrophic events, and the latest such major extinction event is unfolding right now, in slow motion. Scientists call it the Anthropocene. The introduction of the Human General Intelligence is systematically and irreversibly causing the destruction of all life in nature, forever deleting millions of beautiful beings from the surface of this earth. If you just look what the Human General Intelligence has done to less intelligent species, it’s easy to realise how insignificant and incompatible the existence of most animals has been to us, besides the ones we kept for their body parts. Rhino Horn Elixir --- Think of the rhino that suddenly gets hit by a metal object between its eyes, dying in a way it can’t even comprehend as guns and bullets are not part of its world... could it possibly imagine the weird instrumental goal some humans had in mind for how they would use its horn? Vanishing Nature --- Or think of all the animals that stop existing in the places where the humans have turned into towns with roads and tall buildings. How weird and sci-fi --- Could they ever have guessed what the human instrumental goals were when building a bridge, a dam or any of the giant structures of our modern civilisation? How weird and sci-fi would our reality look to them?

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵33/34 Bug or Feature? - Tip of the iceberg --- Probability of Natural Alignment --- In fact, for the AGI calculations to arrive automatically to a plan that doesn’t destroy things humans care about would be like a miracle, like a one out of infinity probability. Bug or Feature? --- So what is this? Is it a software bug? Can’t we fix it? No, it’s a property of General Intelligence and therefore the nature of the AGI by design. We want it to be General, to be able to examine and calculate all paths generally, so that it can solve all the problems we can not. If we want our infinite wishes Genie, we need to allow it to work in a general way, free to outside the narrow prison of its lamb. We want that, but we also want the paths it explores to be paths we like, not extreme human-extinction apocalypses. The surface of the iceberg --- And with this we have touched a bit the surface of the alignment problem, a horrifying and unimaginably difficult open scientific problem, for which we currently do not have a solution and our progress has been painfully slow.

lethalintelligence.ai profil fotoğrafı
lethalintelligence.ai1 yıl önce

🧵34/34 End of Part1 / Part 2 Coming out soon In the meantime, you can listen to or read the transcript of the rest of the film at (subscribe to newsletter to receive link to early part 2 content) Coming Up Next: - A full example story: a concrete way an agi agent could overpower humanity - 5 Convergent instrumental goals - deep analysis - Intelligence explosion (FOOM) by Recursive self-improvement - Disempowerement via the market dynamics - The stable equilibrium of multiple AGI agents interacting in society and competing with each-other. - Additional types of the bottomless pit that is ai safety risk (besides Rogue Optimizing Agents) - Offense-Defense Assymetry (attacker needs to get lucky once, while defender needs to get lucky every time) - Shedding light to the amount of Cope, Reckless and Mad science taking place in the industry right now - Risk deniers mindset, Survivorship bias and the need for consent. - The unknown nature of the new species and its emerging capabilities. and MUCH MUCH MORE.... Don't forget at you will find tons of curated resources: - Interviews with luminaries from academia and industry explaining in depth all the points made in this movie - Reading material: Online learning, news, books, links to AI safety establishments and more! Make sure you subscribe and follow for important new content and announcements. 🔥🔥

Benzer Videolar

SAM ALTMAN BELIEVES AGI IS SOLVED “So now we're starting to look ahead to superintelligence.” - “When we started OpenAI, almost nine years ago now, we believed that AI could become the most impactful technology in human history. We didn't know exactly how we were going to get there, but we believed it was possible and that if we succeeded, we wanted to make sure that it benefited everyone. At the time, very few people believed in AGI. We kept learning by doing. We had some breakthroughs. We had some setbacks. We got lucky in some places. We got unlucky in some places. And in the way that technology moves forward, we now are in a place where everyone can see this tremendous impact that AI is going to have in the future. So now we're starting to look ahead to superintelligence. And even more than before, our focus must be on wide and fair access. This is a technology that will reshape the global economy and really the whole way we live our lives. It's critical that superintelligence becomes cheap, broadly available, and not that concentrated with any one person, company, or country. We, not just OpenAI, but the whole industry, we are building something PROFOUND. This is a kind of BRAIN OF THE WORLD. It'll be personal, adaptable, it'll be easy to use, it'll give people incredible superpowers that were sort of science fiction only a couple of years ago. The limit won't be the algorithms and the research, but it'll increasingly become the physical instantiation that it takes to make this work. Chips, cables, servers, energy, everything that you need to power this brain. And the more of it, the better. I think that Norway offers more of that potential right here in Europe. It will contribute to the overall compute power needed to drive the next wave of AI breakthroughs and deployment and economic progress for Europe and Europe. I'm incredibly excited about what this will create for the future. Thank you.”

NIK

390,619 görüntüleme • 11 ay önce

🧵30/34 Human-Incompatible / Astronomical Suffering Risk --- But actually, even all that is just part of the broader alignment problem. Even if we could magically guarantee for ever that it will not pursue the goal to remove all the Oxygen from the atmosphere, it’s such a pointless trivial small win, because even if we could theoretically get some restrictions right, without specification gaming or reward hacking, there still exist infinite potential instrumental goals which we don’t control and are incompatible with a good version of human existence and disabling one does nothing for the rest of them. This is not a figure or speech, the space of possibilities is literally infinite. Astronomical Suffering Risk --- If you are hopelessly optimistic you might feel that scientists will eventually figure out a way to specify a clear objective that guarantees survival of the human species, but … Even if they invented a way to do that somehow in this unlikely future, there is still only a relatively small space, a narrow range of parameters for a human to exist with decency, only a few good environment settings with potential for finding meaning and happiness and there is an infinitely wide space of ways to exist without freedom, suffering, without any control of our destiny. Imagine if a god-level AI does not allow your life to end, following its original objective and you are stuck suffering in a misaligned painful existence for eternity, with no hope, for ever. There are many ways to exist… and a good way to exist is not the default outcome. -142 C is the correct Temperature But anyway, it’s highly unlikely we’ll get safety advanced enough in time, to even have the luxury to enforce human survival directives in the specification, so let’s just keep it simple for now and let’s stick with a good old extinction scenario to explain the point about mis-aligned instrumental goals. So… for example, it might now decide that a very low temperature of -142C on earth would be best for cooling the GPUs the software is running on.

lethalintelligence.ai

431,098 görüntüleme • 1 yıl önce

Sam Altman's new interview: AI should not be designed to pursue goals that are disconnected from human needs. People must remain at the center of AI development. “I have no interest in building a super-smart AI that accomplishes some non-human goals. People should react. People should say, ‘Hey, this is what I want, and this is what I do not want.’ I do not think the issue is that we have failed to explain the benefits. We say, ‘AI is going to cure a bunch of diseases,’ and people say, ‘Okay, that is great, but that is not really my question. My question is: What is my role in the future? What is my economic future? What is my agency? How do I know that my kids and my family will still be able to have fulfilling, creative expression, struggle, drive the world forward, grow, and do this thing together in a way that has worked for a long time?’ When people in AI say, ‘Sure, there are going to be no jobs,’ or ‘50% of jobs are going to go away,’ or ‘90% of jobs are going to go away,’ and ‘AI is going to be smarter than you at everything,’ and ‘We will give you some basic income, but you are not really going to have a role,’ that is horrible. And by the way, if an AI company says, ‘Maybe we are going to destroy all the jobs, and we will be the most valuable company in the world,’ people should look at you like, ‘Yeah, that is a terrible message.’ I do not think the problem is that we have not articulated the upsides. I think people actually believe us. They hear, ‘AI may cure your cancer,’ and they think, ‘That sounds great.’ I think we, as an industry, have failed to explain how people stay in control of determining the future at every step, and how people can still have a meaningful life in all the ways we care about.” ---- From "CNBC Television" YouTube channel, (link in comment)

Rohan Paul

78,885 görüntüleme • 25 gün önce