LW - My AI Model Delta Compared To Christiano by johnswentworth

The Nonlinear Library: LessWrong

Content provided by The Nonlinear Fund.

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My AI Model Delta Compared To Christiano, published by johnswentworth on June 12, 2024 on LessWrong.

Preamble: Delta vs Crux

This section is redundant if you already read My AI Model Delta Compared To Yudkowsky.

I don't natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I'll call a delta.

Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the "weather" variable in my program at a particular time[1] takes on the value "cloudy".

Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3. Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution - in other words, we might have very different beliefs about lots of stuff in the world.

If your model and my model differ in that way, and we're trying to discuss our different beliefs, then the obvious useful thing-to-do is figure out where that one-parameter difference is. That's a delta: one or a few relatively "small"/local differences in belief, which when propagated through our models account for most of the differences in our beliefs.

For those familiar with Pearl-style causal models: think of a delta as one or a few do() operations which suffice to make my model basically match somebody else's model, or vice versa.

This post is about my current best guesses at the delta between my AI models and Paul Christiano's AI models. When I apply the delta outlined here to my models, and propagate the implications, my models mostly look like Paul's as far as I can tell.
That said, note that this is not an attempt to pass Paul's Intellectual Turing Test; I'll still be using my own usual frames.

My AI Model Delta Compared To Christiano

Best guess: Paul thinks that verifying solutions to problems is generally "easy" in some sense. He's sometimes summarized this as "verification is easier than generation", but I think his underlying intuition is somewhat stronger than that.

What do my models look like if I propagate that delta? Well, it implies that delegation is fundamentally viable in some deep, general sense.

That propagates into a huge difference in worldviews. Like, I walk around my house and look at all the random goods I've paid for - the keyboard and monitor I'm using right now, a stack of books, a tupperware, water bottle, flip-flops, carpet, desk and chair, refrigerator, sink, etc. Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which were either nonobvious or nonsalient to me, but unambiguously make my life worse and would unambiguously have been worth-to-me the cost to make better. But because the badness is nonobvious/nonsalient, it doesn't influence my decision-to-buy, and therefore companies producing the good are incentivized not to spend the effort to make it better. It's a failure of ease of verification: because I don't know what to pay attention to, I can't easily notice the ways in which the product is bad. (For a more game-theoretic angle, see When Hindsight Isn't 20/20.)

On (my model of) Paul's worldview, that sort of thing is rare; at most it's the exception to the rule. On my worldview, it's the norm for most goods most of the time. See e.g. the whole air conditioner episode for us debating the badness of single-hose portable air conditioners specifically, along with a large sidebar on the badness of portable air conditioner energy ratings.

How does the ease-of-verification delta propagate to AI?

Well, most obviously, Paul expects AI to go well mostly via ...
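Returning to the preamble's picture of a delta as one differing parameter in two otherwise identical programs, here is a minimal Python sketch of that idea. Everything in it is made up for illustration: the toy model, the variable names (`demand`, `price`, `surplus`), and the shared parameter values are assumptions, with only the 5-versus-0.3 parameter taken from the post. The `do` helper is only a loose analogy to Pearl's do() operation, not an implementation of causal inference.

```python
from typing import Dict, List

Params = Dict[str, float]
Beliefs = Dict[str, float]


def run_model(params: Params) -> Beliefs:
    """Toy 'world program' (hypothetical): every output depends on params['k']."""
    demand = params["k"] * params["base"]
    price = demand / 10.0 + params["offset"]
    surplus = params["capacity"] - demand
    return {"demand": demand, "price": price, "surplus": surplus}


def do(params: Params, **overrides: float) -> Params:
    """Return a copy of params with some values forced (loosely like do())."""
    return {**params, **overrides}


def disagreements(a: Beliefs, b: Beliefs, tol: float = 1e-9) -> List[str]:
    """Names of the downstream variables on which two models disagree."""
    return sorted(name for name in a if abs(a[name] - b[name]) > tol)


# Same program, same parameters, except for one value: 5 versus 0.3.
shared = {"base": 20.0, "offset": 1.0, "capacity": 50.0}
my_params = {**shared, "k": 5.0}
your_params = {**shared, "k": 0.3}

my_beliefs = run_model(my_params)
your_beliefs = run_model(your_params)

# One small difference in the parameters, disagreement on every output.
print(disagreements(my_beliefs, your_beliefs))

# Applying the delta (forcing k to your value) makes my model match yours.
patched = run_model(do(my_params, k=0.3))
print(disagreements(patched, your_beliefs))
```

In this sketch the first call reports disagreement on all three downstream variables, and the second reports none once the single parameter is overridden. That is the sense in which a delta "accounts for most of the differences" between two sets of beliefs.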
