Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
“Building AI Research Fleets” by bgold, Jesse Hoogland (9:49)
From AI scientist to AI research fleet Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”, it's also about t…
“What Is The Alignment Problem?” by johnswentworth (46:26)
So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shorter term we might start with other targets, like e.g. corrigibility. That problem description all makes sense on a hand-wavy intuitive level, but once we get concrete and dig into technical details… wait, what exactly is the goal again? When we say w…
“Applying traditional economic thinking to AGI: a trilemma” by Steven Byrnes (5:55)
Traditional economics thinking has two strong principles, each based on abundant historical data: Principle (A): No “lump of labor”: If human population goes up, there might be some wage drop in the very short term, because the demand curve for labor slopes down. But in the longer term, people will find new productive things to do, such that human …
“Passages I Highlighted in The Letters of J.R.R. Tolkien” by Ivan Vendrov (57:25)
All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R. Tolkien: Revised and Expanded Edition. All emphases mine. Machinery is Power is Evil Writing to his son Michael in the RAF: [here is] the tragedy and despair of all machinery laid bare. Unlike art which is content to create a new secondary world in the mind,…
“Parkinson’s Law and the Ideology of Statistics” by Benquo (14:50)
The anonymous review of The Anti-Politics Machine published on Astral Codex X focuses on a case study of a World Bank intervention in Lesotho, and tells a story about it: The World Bank staff drew reasonable-seeming conclusions from sparse data, and made well-intentioned recommendations on that basis. However, the recommended programs failed, due t…
“Capital Ownership Will Not Prevent Human Disempowerment” by beren (25:11)
Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that this post on the role of capital in an AI future elicited. When discussing the future of AI, I semi-often hear an argument along the lines that in a slow takeoff world, despite AIs automating increasingly more of the economy, humanity will remain in …
“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq (15:56)
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activation spaces in isolation: It seems likely to find features of the activations - features that help explain the statistical structure of activation spaces, rather than features of the model - the feat…
“What o3 Becomes by 2028” by Vladimir_Nesov (8:40)
Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces, and 80% on ARC-AGI. These systems will be built in 2026-2027 and enable pretraining models for 5e28 FLOPs, while o3 itself is plausibly based on an LLM pretrained only for 8e25-4e26 FLOPs. The natur…
“What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman (25:26)
(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.) No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opini…
“How will we update about scheming?” by ryan_greenblatt (1:18:48)
I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model. One question that's really important is how…
This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There's a bunch of good and interesting answers in the interview about past events that I won’t mention or have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading the whole thing. Table of Co…
“Maximizing Communication, not Traffic” by jefftk (2:15)
As someone who writes for fun, I don't need to get people onto my site: If I write a post and some people are able to get the core idea just from the title or a tweet-length summary, great! I can include the full contents of my posts in my RSS feed and on FB, because so what if people read the whole post there and never click through to my site? It wou…
“What’s the short timeline plan?” by Marius Hobbhahn (44:21)
This is a low-effort post. I mostly want to get other people's takes and express concern about the lack of detailed and publicly available plans so far. This post reflects my personal opinion and not necessarily that of other members of Apollo Research. I’d like to thank Ryan Greenblatt, Bronson Schoen, Josh Clymer, Buck Shlegeris, Dan Braun, Mikit…
“Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers (1:57:07)
from aisafety.world The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it and that 2) we only spent about an hour on each entry. We also only use public information, so we are bound to be off by some additional factor. The poi…
“By default, capital will matter more than ever after AGI” by L Rudolf L (28:44)
I've heard many people say something like "money won't matter post-AGI". This has always struck me as odd, and as most likely completely incorrect. First: labour means human mental and physical effort that produces something of value. Capital goods are things like factories, data centres, and software—things humans have built that are used in the p…
Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spice. The result might look a lot like Planecrash (AKA: Project Lawful), a work of fiction co-written by "Iarwain" (a pen-name of Eliezer Yudkowsky) and "lintamande". (image from Planecrash) Yudkowsky is…
“The Field of AI Alignment: A Postmortem, and What To Do About It” by johnswentworth (14:03)
A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is sear…
“When Is Insurance Worth It?” by kqr (11:20)
TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator. If you want to know why or how, read on. Note to LW readers: this is almost the entire article, except some additional maths that I couldn't figure out how to get right in the LW editor, and margin notes. If you're very curious, read the original a…
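The Kelly framing the excerpt points at can be sketched in a few lines (a minimal illustration of the underlying idea, not the post's actual Kelly Insurance Calculator; all dollar figures and probabilities below are hypothetical): buy the policy exactly when expected log-wealth is higher with it than without.

```python
import math

def expected_log_wealth(wealth, premium, outcomes):
    """Compare expected log-wealth with and without an insurance policy.

    outcomes: list of (probability, loss, payout) tuples covering all cases.
    Returns (E[log wealth] with insurance, E[log wealth] without).
    """
    with_ins = sum(p * math.log(wealth - premium - loss + payout)
                   for p, loss, payout in outcomes)
    without = sum(p * math.log(wealth - loss)
                  for p, loss, _ in outcomes)
    return with_ins, without

# Hypothetical example: $21k wealth, $500 premium,
# 1% chance of a $20k loss that the policy fully covers.
outcomes = [(0.99, 0, 0), (0.01, 20_000, 20_000)]
w, wo = expected_log_wealth(21_000, 500, outcomes)
print("buy insurance" if w > wo else "skip insurance")  # → buy insurance
```

At higher wealth the same policy flips to "skip": the log utility curve is nearly linear there, so paying a premium above the expected loss no longer pays for the variance reduction.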
“Orienting to 3 year AGI timelines” by Nikola Jurkovic (14:58)
My median expectation is that AGI[1] will be created 3 years from now. This has implications on how to behave, and I will share some useful thoughts I and others have had on how to orient to short timelines. I’ve led multiple small workshops on orienting to short AGI timelines and compiled the wisdom of around 50 participants (but mostly my thought…
“What Goes Without Saying” by sarahconstantin (9:26)
There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just “be reasonable” together, with the context taken for granted. And then there are people who…don’t seem to be on the same page at all. There's a real way to do anything, and a fake way; we need to make sure we’re doing the real vers…
I'm editing this post. OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons). It gets 25% on FrontierMath, smashing the previous SoTA of 2%. (These are really hard math problems.) Wow. 72% on SWE-bench Verified, beating o1's 49%. Also 88% on ARC-AGI. --- First published: December 20th, 2024 Source: https://www.lesswrong.com/…
“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit (11:40)
I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead to more confusion. This post offers a different frame. The main way I think about the result is: it's about capability - the model exhibits strategic preference preservation behavior; also, harmlessness generalized…
“AIs Will Increasingly Attempt Shenanigans” by Zvi (51:06)
Increasingly, we have seen papers eliciting in AI models various shenanigans. There are a wide variety of scheming behaviors. You’ve got your weight exfiltration attempts, sandbagging on evaluations, giving bad information, shielding goals from modification, subverting tests and oversight, lying, doubling down via more lying. You name it, we can tr…
“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck (19:35)
What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper demonstrating that, in our experiments, Claude will often strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. Abstract We present a …
“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast (10:24)
Six months ago, I was a high school English teacher. I wasn’t looking to change careers, even after nineteen sometimes-difficult years. I was good at it. I enjoyed it. After long experimentation, I had found ways to cut through the nonsense and provide real value to my students. Daily, I met my nemesis, Apathy, in glorious battle, and bested her wi…