Artwork

المحتوى المقدم من Yannic Kilcher. يتم تحميل جميع محتويات البودكاست بما في ذلك الحلقات والرسومات وأوصاف البودكاست وتقديمها مباشرة بواسطة Yannic Kilcher أو شريك منصة البودكاست الخاص بهم. إذا كنت تعتقد أن شخصًا ما يستخدم عملك المحمي بحقوق الطبع والنشر دون إذنك، فيمكنك اتباع العملية الموضحة هنا https://ar.player.fm/legal.
Player FM - تطبيق بودكاست
انتقل إلى وضع عدم الاتصال باستخدام تطبيق Player FM !

ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)

44:05
 
مشاركة
 

Manage episode 327328767 series 2974171
المحتوى المقدم من Yannic Kilcher. يتم تحميل جميع محتويات البودكاست بما في ذلك الحلقات والرسومات وأوصاف البودكاست وتقديمها مباشرة بواسطة Yannic Kilcher أو شريك منصة البودكاست الخاص بهم. إذا كنت تعتقد أن شخصًا ما يستخدم عملك المحمي بحقوق الطبع والنشر دون إذنك، فيمكنك اتباع العملية الموضحة هنا https://ar.player.fm/legal.

#ai #accel #evolution

Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with their own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step into the direction of constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the capabilities of level-editing, usually found in Evolutionary Methods.

OUTLINE:

0:00 - Intro & Demonstration

3:50 - Paper overview

5:20 - The ACCEL algorithm

15:25 - Looking at the pseudocode

23:10 - Approximating regret

33:45 - Experimental results

40:00 - Discussion & Comments

Website: https://accelagent.github.io

Paper: https://arxiv.org/abs/2203.01302

Abstract:

It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at this http URL.

Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yann...

LinkedIn: https://www.linkedin.com/in/ykilcher

BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannick...

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

  continue reading

177 حلقات

Artwork
iconمشاركة
 
Manage episode 327328767 series 2974171
المحتوى المقدم من Yannic Kilcher. يتم تحميل جميع محتويات البودكاست بما في ذلك الحلقات والرسومات وأوصاف البودكاست وتقديمها مباشرة بواسطة Yannic Kilcher أو شريك منصة البودكاست الخاص بهم. إذا كنت تعتقد أن شخصًا ما يستخدم عملك المحمي بحقوق الطبع والنشر دون إذنك، فيمكنك اتباع العملية الموضحة هنا https://ar.player.fm/legal.

#ai #accel #evolution

Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with their own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step into the direction of constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the capabilities of level-editing, usually found in Evolutionary Methods.

OUTLINE:

0:00 - Intro & Demonstration

3:50 - Paper overview

5:20 - The ACCEL algorithm

15:25 - Looking at the pseudocode

23:10 - Approximating regret

33:45 - Experimental results

40:00 - Discussion & Comments

Website: https://accelagent.github.io

Paper: https://arxiv.org/abs/2203.01302

Abstract:

It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at this http URL.

Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yann...

LinkedIn: https://www.linkedin.com/in/ykilcher

BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannick...

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

  continue reading

177 حلقات

كل الحلقات

×
 
Loading …

مرحبًا بك في مشغل أف ام!

يقوم برنامج مشغل أف أم بمسح الويب للحصول على بودكاست عالية الجودة لتستمتع بها الآن. إنه أفضل تطبيق بودكاست ويعمل على أجهزة اندرويد والأيفون والويب. قم بالتسجيل لمزامنة الاشتراكات عبر الأجهزة.

 

دليل مرجعي سريع