Content provided by Turpentine, Erik Torenberg, and Nathan Labenz. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Turpentine, Erik Torenberg, and Nathan Labenz or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ar.player.fm/legal.

Data, data, everywhere - enough for AGI?

1:01:40
 

In this podcast, Nathan and Nick dive deep into the data requirements for achieving Artificial General Intelligence. They explore the current paradigms, the role of data in approximating intelligence, and the scaling trends for GPT models. The discussion covers various datasets, from email and Twitter to YouTube and genomic data, as they analyze the feasibility of reaching the target of 100 trillion high-quality tokens. While the bull case suggests an abundance of data, the bear case highlights the limits on high-quality data, prompting a fascinating exploration of what makes data good for AI and the potential for AI to generate its own data.

Sponsors

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off: https://www.omneky.com/

The Brave search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR

Head to Squad to access global engineering without the headache and at a fraction of the cost: visit http://choosesquad.com/ and mention “Turpentine” to skip the waitlist.

Plumb is a no-code AI app builder designed for product teams who care about quality and speed. What is taking you weeks to hand-code today can be done confidently in hours. Check out https://bit.ly/PlumbTCR for early access.


Chapters

(00:00) Introduction

(05:04) Scaling Hypothesis of Intelligence

(07:32) Is There Enough High Quality Data?

(10:19) Algorithms Impacting Data Requirements

(17:42) Sponsor: Omneky

(18:04) Estimating High Quality Token Requirements

(24:07) Astronomy and YouTube Data Scale

(29:42) Genomics Data

(37:58) Sponsors: Brave / Plumb / Squad

(41:16) Code Datasets and Synthetic Data

(45:48) The Bear Case: Quality and Usability of Data

(50:54) Investment Trends and Compute Efficiency

(54:19) Training Run

(57:21) Synthetic Data Generation and Self-Play
