Artwork

المحتوى المقدم من BlueDot Impact. يتم تحميل جميع محتويات البودكاست بما في ذلك الحلقات والرسومات وأوصاف البودكاست وتقديمها مباشرة بواسطة BlueDot Impact أو شريك منصة البودكاست الخاص بهم. إذا كنت تعتقد أن شخصًا ما يستخدم عملك المحمي بحقوق الطبع والنشر دون إذنك، فيمكنك اتباع العملية الموضحة هنا https://ar.player.fm/legal.
Player FM - تطبيق بودكاست
انتقل إلى وضع عدم الاتصال باستخدام تطبيق Player FM !

Chinchilla’s Wild Implications

24:57
 
مشاركة
 

Manage episode 424087968 series 3498845
المحتوى المقدم من BlueDot Impact. يتم تحميل جميع محتويات البودكاست بما في ذلك الحلقات والرسومات وأوصاف البودكاست وتقديمها مباشرة بواسطة BlueDot Impact أو شريك منصة البودكاست الخاص بهم. إذا كنت تعتقد أن شخصًا ما يستخدم عملك المحمي بحقوق الطبع والنشر دون إذنك، فيمكنك اتباع العملية الموضحة هنا https://ar.player.fm/legal.

This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchilla. The paper came out a few months ago, and has been discussed a lot, but some of its implications deserve more explicit notice in my opinion. In particular: Data, not size, is the currently active constraint on language modeling performance. Current returns to additional data are immense, and current returns to additional model size are miniscule; indeed, most recent landmark models are wastefully big. If we can leverage enough data, there is no reason to train ~500B param models, much less 1T or larger models. If we have to train models at these large sizes, it will mean we have encountered a barrier to exploitation of data scaling, which would be a great loss relative to what would otherwise be possible. The literature is extremely unclear on how much text data is actually available for training. We may be "running out" of general-domain data, but the literature is too vague to know one way or the other. The entire available quantity of data in highly specialized domains like code is woefully tiny, compared to the gains that would be possible if much more such data were available. Some things to note at the outset: This post assumes you have some familiarity with LM scaling laws. As in the paper, I'll assume here that models never see repeated data in training.

Original text:

https://www.alignmentforum.org/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

  continue reading

فصول

1. Chinchilla’s Wild Implications (00:00:00)

2. 1. the scaling law (00:02:19)

3. plugging in real models (00:04:10)

4. 2. are we running out of data? (00:11:48)

5. web scrapes (00:15:02)

6. "all the data we have" (00:20:46)

7. what is compute? (on a further barrier to data scaling) (00:21:35)

8. appendix: to infinity (00:23:24)

80 حلقات

Artwork
iconمشاركة
 
Manage episode 424087968 series 3498845
المحتوى المقدم من BlueDot Impact. يتم تحميل جميع محتويات البودكاست بما في ذلك الحلقات والرسومات وأوصاف البودكاست وتقديمها مباشرة بواسطة BlueDot Impact أو شريك منصة البودكاست الخاص بهم. إذا كنت تعتقد أن شخصًا ما يستخدم عملك المحمي بحقوق الطبع والنشر دون إذنك، فيمكنك اتباع العملية الموضحة هنا https://ar.player.fm/legal.

This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchilla. The paper came out a few months ago, and has been discussed a lot, but some of its implications deserve more explicit notice in my opinion. In particular: Data, not size, is the currently active constraint on language modeling performance. Current returns to additional data are immense, and current returns to additional model size are miniscule; indeed, most recent landmark models are wastefully big. If we can leverage enough data, there is no reason to train ~500B param models, much less 1T or larger models. If we have to train models at these large sizes, it will mean we have encountered a barrier to exploitation of data scaling, which would be a great loss relative to what would otherwise be possible. The literature is extremely unclear on how much text data is actually available for training. We may be "running out" of general-domain data, but the literature is too vague to know one way or the other. The entire available quantity of data in highly specialized domains like code is woefully tiny, compared to the gains that would be possible if much more such data were available. Some things to note at the outset: This post assumes you have some familiarity with LM scaling laws. As in the paper, I'll assume here that models never see repeated data in training.

Original text:

https://www.alignmentforum.org/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

  continue reading

فصول

1. Chinchilla’s Wild Implications (00:00:00)

2. 1. the scaling law (00:02:19)

3. plugging in real models (00:04:10)

4. 2. are we running out of data? (00:11:48)

5. web scrapes (00:15:02)

6. "all the data we have" (00:20:46)

7. what is compute? (on a further barrier to data scaling) (00:21:35)

8. appendix: to infinity (00:23:24)

80 حلقات

Alle afleveringen

×
 
Loading …

مرحبًا بك في مشغل أف ام!

يقوم برنامج مشغل أف أم بمسح الويب للحصول على بودكاست عالية الجودة لتستمتع بها الآن. إنه أفضل تطبيق بودكاست ويعمل على أجهزة اندرويد والأيفون والويب. قم بالتسجيل لمزامنة الاشتراكات عبر الأجهزة.

 

دليل مرجعي سريع