AF - In Defense of Open-Minded UDT by Abram Demski
Archived series ("Inactive feed" status)
When? This feed was archived on October 23, 2024 10:10. Last successful fetch was on September 19, 2024 11:06.
Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: In Defense of Open-Minded UDT, published by Abram Demski on August 12, 2024 on The AI Alignment Forum.
A Defense of Open-Minded Updatelessness, with applications to alignment.
This work owes a great debt to many conversations with Sahil, Martín Soto, and Scott Garrabrant.
Sahil and I will have a public discussion about this on Wednesday (August 14) at 11am EDT; join via this link.
You can support my work on Patreon.
Iterated Counterfactual Mugging On a Single Coinflip
Iterated counterfactual mugging on a single coinflip begins like a classic counterfactual mugging, with Omega approaching you, explaining the situation, and asking for your money. Let's say you buy the classic UDT idea, so you happily give Omega your money.
Next week, Omega appears again, with the same question. However, Omega clarifies that it has used the same coin-flip as last week.
This throws you off a little bit, but you see that the math is the same either way; your prior still assigns a 50-50 chance to both outcomes. If you thought it was a good deal last week, you should also think it is a good deal this week. You pay up again.
On the third week, Omega makes the same offer again, and once again has used the same coinflip. You ask Omega how many times it's going to do this. Omega replies, "forever". You ask Omega whether it would have continued coming if the coin had landed heads; it says, "Of course! How else could I make you this offer now? Since the coin landed tails, I will come and ask you for $100 every single week going forward. If the coin had landed heads, I would have simulated what would happen if it had landed tails, and I would come and give you $10,000 on every week that simulated-you gives up $100!"
Let's say for the sake of the thought experiment that you can afford to give Omega $100 once a week. It hurts, but it doesn't hurt as much as getting $10,000 from Omega every week would have benefited you, if that had happened.[1]
Nonetheless, I suspect many readers will feel some doubt creep in as they imagine giving Omega $100 week after week after week. For the first few weeks, the possibility of the coin landing heads might feel "very real". Heck yeah, I want to be the sort of person who gets a 50% chance of $10,000 from Omega for a (50% chance) cost of $100!
By the hundredth week, though, you may feel yourself the fool for giving up so much money for the imaginary benefit of the "heads" world that never was.
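The tension here can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original essay: it assumes utility is linear in dollars (a simplification the thought experiment itself does not commit to) and uses hypothetical function names. It contrasts the expected value as judged from the prior with the money actually lost in the tails world.

```python
from fractions import Fraction

# Figures taken from the thought experiment; linear utility in dollars is an assumption.
P_HEADS = Fraction(1, 2)   # prior probability that the single coinflip landed heads
PAYMENT = 100              # dollars handed to Omega each week in the tails world
REWARD = 10_000            # dollars received each week in the heads world


def prior_expected_value(weeks_paid: int) -> Fraction:
    """Expected dollars, judged from the prior, of paying for `weeks_paid` weeks."""
    heads_total = REWARD * weeks_paid    # what the heads-world self would receive
    tails_total = -PAYMENT * weeks_paid  # what the tails-world self gives up
    return P_HEADS * heads_total + (1 - P_HEADS) * tails_total


def realized_total_in_tails_world(weeks_paid: int) -> int:
    """Dollars actually gained (negative = lost) once the coin is known to be tails."""
    return -PAYMENT * weeks_paid


print(prior_expected_value(1))             # 4950
print(prior_expected_value(100))           # 495000
print(realized_total_in_tails_world(100))  # -10000
```

Under these assumptions the prior view keeps improving linearly with every week paid, while the realized tails-world total only ever gets worse, which is the gap the hundredth-week intuition is reacting to.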
If you think you'd still happily give up the $100 for as long as Omega kept asking, then I would ask you to consider a counterlogical mugging instead. Rather than flipping a coin, Omega uses a digit of the binary expansion of π; as before, Omega uses the same digit week after week, for infinitely many counterlogical muggings.
Feeling uneasy yet? Does the possibility of the digit of π going one way or the other continue to feel "just as real" as time passes? Or do you become more sympathetic to the idea that, at some point, you're wasting money on helping a non-real world?
UDT vs Learning
Updateless Decision Theory (UDT) clearly keeps giving Omega the $100 forever in this situation, at least, under the usual assumptions. A single Counterfactual Mugging is not any different from an infinitely iterated one, especially in the version above where only a single coinflip is used. The ordinary decision between "give up $100" and "refuse" is isomorphic to the choice of general policy "give up $100 forever" and "refuse forever".[2]
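One way to see the claimed isomorphism is to enumerate policies over a finite horizon and score each from the prior. This is an illustrative sketch only: the finite horizon, the linear-in-dollars utility, and the names `policy_value` and `best_policy` are assumptions introduced here, not anything specified in the essay.

```python
from fractions import Fraction
from itertools import product

# Figures from the thought experiment; linear utility in dollars is an assumption.
P_HEADS = Fraction(1, 2)
PAYMENT, REWARD = 100, 10_000


def policy_value(policy: tuple) -> Fraction:
    """Prior expected dollars of a policy, given as one boolean per week (pay or not).

    Because a single coinflip governs every week, each paid week contributes the
    same amount, so only the number of paid weeks matters.
    """
    weeks_paid = sum(policy)
    return P_HEADS * REWARD * weeks_paid - (1 - P_HEADS) * PAYMENT * weeks_paid


def best_policy(horizon: int) -> tuple:
    """Exhaustively search all pay/refuse policies over `horizon` weeks."""
    return max(product([False, True], repeat=horizon), key=policy_value)


for horizon in (1, 2, 5):
    # The one-shot answer (pay) and the multi-week answer (always pay) coincide.
    print(horizon, best_policy(horizon))
```

For every horizon tried, the prior-optimal policy is to pay in every week, matching the answer to the one-shot problem. That is the sense in which the iterated problem adds nothing new for an agent that evaluates policies from a fixed prior.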
However, the idea of applying a decision theory to a specific decision problem is actually quite subtle, especially for UDT. We generally assume an agent's prior equals the probabilities described in the decision problem.[3] A simple interpretation of this could be that the agent is born with this prior (and immediately placed into the decision problem). This isn't v...