
LW - If-Then Commitments for AI Risk Reduction [by Holden Karnofsky] by habryka

1:05:50
 
Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ar.player.fm/legal.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If-Then Commitments for AI Risk Reduction [by Holden Karnofsky], published by habryka on September 14, 2024 on LessWrong.
Holden just published this paper on the Carnegie Endowment website. I thought it was a decent reference, so I figured I would crosspost it. (It's included in full for convenience, but if either Carnegie Endowment or Holden has a preference for just having an excerpt or a pure link post, I'm happy to change that.)
If-then commitments are an emerging framework for preparing for risks from AI without unnecessarily slowing the development of new technology. The more attention and interest there is in these commitments, the faster a mature framework can progress.
Introduction
Artificial intelligence (AI) could pose a variety of catastrophic risks to international security in several domains, including the proliferation and acceleration of cyberoffense capabilities, and of the ability to develop chemical or biological weapons of mass destruction. Even the most powerful AI models today are not yet capable enough to pose such risks,[1] but the coming years could see fast and hard-to-predict changes in AI capabilities.
Both companies and governments have shown significant interest in finding ways to prepare for such risks without unnecessarily slowing the development of new technology.
This piece is a primer on an emerging framework for handling this challenge: if-then commitments. These are commitments of the form: If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time.
A specific example: If an AI model has the ability to walk a novice through constructing a weapon of mass destruction, we must ensure that there are no easy ways for consumers to elicit behavior in this category from the AI model.
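As a minimal sketch of how the "if capability X, then mitigations Y" structure could be made machine-checkable (this is an illustration, not anything from the paper; the class, scores, and mitigation names are all hypothetical):

```python
from dataclasses import dataclass


@dataclass
class IfThenCommitment:
    """One hypothetical 'if capability X, then mitigations Y' commitment."""
    capability: str                  # the tripwire capability being evaluated
    trigger_score: float             # evaluation score at which the "if" fires
    required_mitigations: list[str]  # the "then": mitigations that must be in place

    def missing_mitigations(self, eval_score: float, deployed: set[str]) -> list[str]:
        """Return mitigations still missing once the capability threshold is crossed."""
        if eval_score < self.trigger_score:
            return []  # below the tripwire: no obligations are triggered yet
        return [m for m in self.required_mitigations if m not in deployed]


# Hypothetical instance mirroring the weapons-of-mass-destruction example above.
wmd_walkthrough = IfThenCommitment(
    capability="walk a novice through constructing a weapon of mass destruction",
    trigger_score=0.5,
    required_mitigations=["jailbreak-resistant refusals", "vetted API access"],
)

gaps = wmd_walkthrough.missing_mitigations(eval_score=0.7, deployed={"vetted API access"})
if gaps:
    # Per the commitment, deployment (and possibly development) is delayed
    # until every required mitigation is in place.
    print("Delay until in place:", gaps)
```

Encoding a commitment as explicit data like this is one way it could be audited, whether adopted voluntarily by a developer or enforced by a regulator.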
If-then commitments can be voluntarily adopted by AI developers; they also, potentially, can be enforced by regulators. Adoption of if-then commitments could help reduce risks from AI in two key ways: (a) prototyping, battle-testing, and building consensus around a potential framework for regulation; and (b) helping AI developers and others build roadmaps of what risk mitigations need to be in place by when.
Such adoption does not require agreement on whether major AI risks are imminent - a polarized topic - only that certain situations would require certain risk mitigations if they came to pass.
Three industry leaders - Google DeepMind, OpenAI, and Anthropic - have published relatively detailed frameworks along these lines.
Sixteen companies have announced their intention to establish frameworks in a similar spirit by the time of the upcoming 2025 AI Action Summit in France.[2] Similar ideas have been explored at the International Dialogues on AI Safety in March 2024[3] and the UK AI Safety Summit in November 2023.[4] As of mid-2024, most discussions of if-then commitments have been in the context of voluntary commitments by companies, but this piece focuses on the general framework as something that could be useful to a variety of actors with different enforcement mechanisms.
This piece explains the key ideas behind if-then commitments via a detailed walkthrough of a particular if-then commitment, pertaining to the potential ability of an AI model to walk a novice through constructing a chemical or biological weapon of mass destruction.
It then discusses some limitations of if-then commitments and closes with an outline of how different actors - including governments and companies - can contribute to the path toward a robust, enforceable system of if-then commitments.
Context and aims of this piece. In 2023, I helped with the initial development of ideas related to if-then commitments.[5] To date, I have focused on private discussion of this new fram...
