Player FM - Internet Radio Done Right
89 subscribers
Checked 9d ago
Added six years ago
Content provided by Robin Ranjit Singh Chauhan. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Robin Ranjit Singh Chauhan or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ar.player.fm/legal.
TalkRL: The Reinforcement Learning Podcast
TalkRL podcast is All Reinforcement Learning, All the Time. In-depth interviews with brilliant people at the forefront of RL research and practice. Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. Hosted by Robin Ranjit Singh Chauhan.
72 episodes
All episodes
Jake Beck, Alex Goldie, & Cornelius Braun on Sutton's OaK, Metalearning, LLMs, Squirrels @ RLC 2025 (12:20)
Outstanding Paper Award Winners - 2/2 @ RLC 2025 (14:18)
We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.
Featured References
Empirical Reinforcement Learning Research: Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions. Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim
Applications of Reinforcement Learning: WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies. William Solow, Sandhya Saisubramanian, Alan Fern
Emerging Topics in Reinforcement Learning: Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners. Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor
Scientific Understanding in Reinforcement Learning: Multi-Task Reinforcement Learning Enables Parameter Scaling. Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro…
Outstanding Paper Award Winners - 1/2 @ RLC 2025 (6:46)
We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at the University of Alberta in Edmonton, Alberta, Canada, in August 2025.
Featured References
Scientific Understanding in Reinforcement Learning: How Should We Meta-Learn Reinforcement Learning Algorithms? Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson
Tooling, Environments, and Evaluation for Reinforcement Learning: Syllabus: Portable Curricula for Reinforcement Learning Agents. Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson
Resourcefulness in Reinforcement Learning: PufferLib 2.0: Reinforcement Learning at 1M steps/s. Joseph Suarez
Theory of Reinforcement Learning: Deep Reinforcement Learning with Gradient Eligibility Traces. Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White…
Thomas Akam on Model-based RL in the Brain (52:06)
Prof Thomas Akam is a neuroscientist in the Department of Experimental Psychology at the University of Oxford, where he is a Wellcome Career Development Fellow and Associate Professor and leads the Cognitive Circuits research group. Featured References Brain Architecture for Adaptive Behaviour, Thomas Akam, RLDM 2025 Tutorial Additional References Thomas Akam on Google Scholar pyPhotometry: open-source, Python-based fiber photometry data acquisition pyControl: open-source, Python-based behavioural experiment control Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, Nathaniel D Daw, Yael Niv, Peter Dayan, 2005 Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M., Milner, B., Corkin, S., & Teuber, H. L., 1968 Internally generated cell assembly sequences in the rat hippocampus, Pastalkova E, Itskov V, Amarasingham A, Buzsáki G., Science, 2008 Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2025…
Stefano Albrecht on Multi-Agent RL @ RLDM 2025 (31:34)
Stefano V. Albrecht was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at startup Deepflow . He is a Program Chair of RLDM 2025 and is co-author of the MIT Press textbook " Multi-Agent Reinforcement Learning: Foundations and Modern Approaches ". Featured References Multi-Agent Reinforcement Learning: Foundations and Modern Approaches Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer MIT Press, 2024 RLDM 2025: Reinforcement Learning and Decision Making Conference Dublin, Ireland EPyMARL: Extended Python MARL framework https://github.com/uoe-agents/epymarl Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht…
Satinder Singh: The Origin Story of RLDM @ RLDM 2025 (5:57)
Professor Satinder Singh of Google DeepMind and U of Michigan is co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference). Recorded on location at Trinity College Dublin, Ireland during RLDM 2025. Featured References RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) June 11-14, 2025 at Trinity College Dublin, Ireland Satinder Singh on Google Scholar…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring Claire Bizon Monroc from Inria: WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control Andrew Wagenmaker from UC Berkeley: Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL Harley Wiltzer from MILA: Foundations of Multivariate Distributional Reinforcement Learning Vinzenz Thoma from ETH AI Center: Contextual Bilevel Reinforcement Learning for Incentive Alignment Haozhe (Tony) Chen & Ang (Leon) Li from Columbia: QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring Jonathan Cook from University of Oxford: Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Yifei Zhou from Berkeley AI Research: DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Rory Young from University of Glasgow: Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach Glen Berseth from MILA: Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn Alexander Rutherford from University of Oxford: JaxMARL: Multi-Agent RL Environments and Algorithms in JAX…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring Jiaheng Hu of University of Texas: Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning Skander Moalla of EPFL: No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO Adil Zouitine of IRT Saint Exupery/Hugging Face : Time-Constrained Robust MDPs Soumyendu Sarkar of HP Labs : SustainDC: Benchmarking for Sustainable Data Center Control Matteo Bettini of Cambridge University: BenchMARL: Benchmarking Multi-Agent Reinforcement Learning Michael Bowling of U Alberta : Beyond Optimism: Exploration With Partially Observable Rewards…

Abhishek Naik on Continuing RL & Average Reward (1:21:40)
Abhishek Naik was a student at the University of Alberta and the Alberta Machine Intelligence Institute, where he recently finished his PhD in reinforcement learning under Rich Sutton. He is now a postdoctoral fellow at the National Research Council of Canada, doing AI research on space applications. Featured References Reinforcement Learning for Continuing Problems Using Average Reward Abhishek Naik Ph.D. dissertation 2024 Reward Centering Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton 2024 Learning and Planning in Average-Reward Markov Decision Processes Yi Wan, Abhishek Naik, Richard S. Sutton 2020 Discounted Reinforcement Learning Is Not an Optimization Problem Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton 2019 Additional References Explaining dopamine through prediction errors and beyond, Gershman et al 2024 (proposes a Differential-TD-like learning mechanism in the brain, around Box 4)…

NeurIPS 2024 RL meetup Hot takes: What sucks about RL? (17:45)
What do RL researchers complain about after hours at the bar? In this "Hot takes" episode, we find out! Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024. Special thanks to "David Beckham" for the inspiration :)
Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst MA. Featuring: 0:01 David Radke of the Chicago Blackhawks NHL on RL for professional sports 0:56 Abhishek Naik from the National Research Council on Continuing RL and Average Reward 2:42 Daphne Cornelisse from NYU on Autonomous Driving and Multi-Agent RL 08:58 Shray Bansal from Georgia Tech on Cognitive Bias for Human AI Ad hoc Teamwork 10:21 Claas Voelcker from University of Toronto on Can we hop in general? 11:23 Brent Venable from The Institute for Human & Machine Cognition on Cooperative information dissemination…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst MA. Featuring: 0:01 David Abel from DeepMind on 3 Dogmas of RL 0:55 Kevin Wang from Brown on learning variable depth search for MCTS 2:17 Ashwin Kumar from Washington University in St Louis on fairness in resource allocation 3:36 Prabhat Nagarajan from UAlberta on Value overestimation…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst MA. Featuring: 0:01 Kris De Asis from Openmind on Time Discretization 2:23 Anna Hakhverdyan from U of Alberta on Online Hyperparameters 3:59 Dilip Arumugam from Princeton on Information Theory and Exploration 5:04 Micah Carroll from UC Berkeley on Changing preferences and AI alignment…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst MA. Featuring: 0:01 Hector Kohler from Centre Inria de l'Université de Lille with " Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning " 2:29 Quentin Delfosse from TU Darmstadt on " Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents " 4:15 Sonja Johnson-Yu from Harvard on " Understanding biological active sensing behaviors by interpreting learned artificial agent policies " 6:42 Jannis Blüml from TU Darmstadt on " OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments " 8:20 Cameron Allen from UC Berkeley on " Resolving Partial Observability in Decision Processes via the Lambda Discrepancy " 9:48 James Staley from Tufts on " Agent-Centric Human Demonstrations Train World Models " 14:54 Jonathan Li from Rensselaer Polytechnic Institute…
AI Generating Algos, Learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and Open Endedness, AI-GAs and ChatGPT, AGI predictions, and lots more! Professor Jeff Clune is Associate Professor of Computer Science at University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at Vector Institute, and Senior Research Advisor at DeepMind. Featured References Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ] Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune Robots that can adapt like animals Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret Illuminating search spaces by mapping elites Jean-Baptiste Mouret, Jeff Clune Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley First return, then explore Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune…
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more! Dr Natasha Jaques is a Senior Research Scientist at Google Brain. Featured References Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience Marwa Abdulhai, Natasha Jaques, Sergey Levine Additional References Fine-Tuning Language Models from Human Preferences , Daniel M. Ziegler et al 2019 Learning to summarize from human feedback , Nisan Stiennon et al 2020 Training language models to follow instructions with human feedback , Long Ouyang et al 2022…
Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning. Jacob and Risto are Ph.D. students at Whiteson Research Lab at University of Oxford. Featured Reference A Survey of Meta-Reinforcement Learning Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson Additional References VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning , Luisa Zintgraf et al Mastering Diverse Domains through World Models (Dreamerv3), Hafner et al Unsupervised Meta-Learning for Reinforcement Learning (MAML), Gupta et al Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices (DREAM), Liu et al RL2: Fast Reinforcement Learning via Slow Reinforcement Learning , Duan et al Learning to reinforcement learn , Wang et al…
John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI. Featured References WebGPT: Browser-assisted question-answering with human feedback Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman Training language models to follow instructions with human feedback Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe Additional References Our approach to alignment research , OpenAI 2022 Training Verifiers to Solve Math Word Problems , Cobbe et al 2021 UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation , John Schulman 2017 Proximal Policy Optimization Algorithms , Schulman 2017 Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs , Schulman 2016…
Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. Featured References RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning Ray: Documentation RLlib: Abstractions for Distributed Reinforcement Learning Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica Episode sponsor: Anyscale Ray Summit 2022 is coming to San Francisco on August 23-24. Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib. Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.…
Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments. Fei Xia is a Research Scientist with Google Research. Fei Xia is mostly interested in robot learning in complex and unstructured environments. Previously he has been approaching this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring using foundation models for those challenges. Featured References Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [ website ] Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan Inner Monologue: Embodied Reasoning through Planning with Language Models Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter Additional References Large-scale simulation for embodied perception and robot learning, Xia 2021 QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al 2018 MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale, Kalashnikov et al 2021 ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, Xia et al 2020 Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, Chebotar et al 2021 Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, Zeng et al 2022 Episode sponsor: Anyscale Ray Summit 2022 is coming to San Francisco on August 23-24. Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib. Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.…
Saikrishna Gottipati is an RL Researcher at AI Redefined, working on RL, MARL, human in the loop learning. Featured References Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning Currently under review Learning to navigate the synthetically accessible chemical space using reinforcement learning Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio Additional References Asymmetric self-play for automatic goal discovery in robotic manipulation , 2021 OpenAI et al Continuous Coordination As a Realistic Scenario for Lifelong Learning , 2021 Nekoei et al Episode sponsor: Anyscale Ray Summit 2022 is coming to San Francisco on August 23-24. Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib. Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.…
Aravind Srinivas is back! He is now a Research Scientist at OpenAI. Featured References Decision Transformer: Reinforcement Learning via Sequence Modeling Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch VideoGPT: Video Generation using VQ-VAE and Transformers Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas…
Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter. Featured References The MineRL BASALT Competition on Learning from Human Feedback Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan Preferences Implicit in the State of the World Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan Benefits of Assistance over Reward Learning Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell On the Utility of Learning about Humans for Human-AI Coordination Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan Evaluating the Robustness of Collaborative Agents Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah Additional References AGI Safety Fundamentals , EA Cambridge…
Jordan Terry is a PhD candidate at University of Maryland, the maintainer of Gym, the maintainer and creator of PettingZoo and the founder of Swarm Labs. Featured References PettingZoo: Gym for Multi-Agent Reinforcement Learning J. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen Ravi PettingZoo on Github gym on Github Additional References Time Limits in Reinforcement Learning , Pardo et al 2017 Deep Reinforcement Learning at the Edge of the Statistical Precipice , Agarwal et al 2021…
Robert Tjarko Lange is a PhD student working at the Technical University Berlin. Featured References Learning not to learn: Nature versus nurture in silico Lange, R. T., & Sprekeler, H. (2020) On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning Vischer, M. A., Lange, R. T., & Sprekeler, H. (2021). Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions Lange, R. T., & Faisal, A. (2019). MLE-Infrastructure on Github Additional References RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning , Duan et al 2016 Learning to reinforcement learn , Wang et al 2016 Decision Transformer: Reinforcement Learning via Sequence Modeling , Chen et al 2021…
NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop (24:07)
We hear about the idea of PERLS and why it's important to talk about. Political Economy of Reinforcement Learning Systems (PERLS) Workshop at NeurIPS 2021, on Tuesday, December 14th.
Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. Featured References Invariant Causal Prediction for Block MDPs Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup Multi-Task Reinforcement Learning with Context-based Representations Shagun Sodhani, Amy Zhang, Joelle Pineau MBRL-Lib: A Modular Library for Model-based Reinforcement Learning Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra Additional References Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute…
Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University. He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology. At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. Featured References DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng…
Eugene Vinitsky is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and Deepmind. Featured References A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion Eugene Vinitsky; Kanaad Parvate; Aboudy Kreidieh; Cathy Wu; Alexandre Bayen 2018 The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu Additional References SUMO: Simulation of Urban MObility…
Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst MA. Featuring: 0:01 Ann Huang from Harvard on Learning Dynamics and the Geometry of Neural Dynamics in Recurrent Neural Controllers 1:37 Jannis Blüml from TU Darmstadt on HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning 3:13 Benjamin Fuhrer from NVIDIA on Gradient Boosting Reinforcement Learning 3:54 Paul Festor from Imperial College London on Evaluating the impact of explainable RL on physician decision-making in high-fidelity simulations: insights from eye-tracking metrics…
Finale Doshi-Velez on RL for Healthcare @ RLC 2024 (7:35)
Finale Doshi-Velez is a Professor at the Harvard Paulson School of Engineering and Applied Sciences. This off-the-cuff interview was recorded at UMass Amherst during the workshop day of the RL Conference on August 9th, 2024. Host notes: I've been a fan of some of Prof Doshi-Velez's past work on clinical RL and have hoped to feature her for some time, so I jumped at the chance to get a few minutes of her thoughts -- even though you can tell I was not prepared and a bit flustered tbh. Thanks to Prof Doshi-Velez for taking a moment for this, and I hope to cross paths in the future for a more in-depth interview. References Finale Doshi-Velez Homepage @ Harvard Finale Doshi-Velez on Google Scholar…

1 David Silver 2 - Discussion after Keynote @ RLC 2024 16:17
Thanks to Professor Silver for permission to record this discussion after his RLC 2024 keynote lecture. Recorded at UMass Amherst during RLC 2024. Due to the live recording environment, audio quality varies. We publish this audio in its raw form to preserve the authenticity and immediacy of the discussion. References AlphaProof announcement on DeepMind's blog Discovering Reinforcement Learning Algorithms, Oh et al -- his keynote at RLC 2024 referred to a more recent update to this work, yet to be published Reinforcement Learning Conference 2024 David Silver on Google Scholar…

David Silver is a principal research scientist at DeepMind and a professor at University College London. This interview was recorded at UMass Amherst during RLC 2024. References Discovering Reinforcement Learning Algorithms, Oh et al -- his keynote at RLC 2024 referred to a more recent update to this work, yet to be published Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al 2017 -- the AlphaZero algorithm was used in his recent work on AlphaProof AlphaProof on the DeepMind blog AlphaFold on the DeepMind blog Reinforcement Learning Conference 2024 David Silver on Google Scholar…

Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta and an author of TorchRL and TensorDict in PyTorch. Featured References TorchRL: A data-driven decision-making library for PyTorch Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens Additional References TorchRL on GitHub TensorDict Documentation…

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focused on preference training of large language models. He's also a researcher at the Vector Institute. Featured Reference Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker Additional References Self-Rewarding Language Models, Yuan et al 2024 Reinforcement Learning: An Introduction, Sutton and Barto 1998 Learning from Delayed Rewards, Chris Watkins 1989 Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992…

Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of the Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL). Featured Links Reinforcement Learning Conference Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach…

Ian Osband is a research scientist at OpenAI (ex DeepMind, Stanford) working on decision making under uncertainty. We spoke about: information theory and RL; exploration, epistemic uncertainty, and joint predictions; Epistemic Neural Networks and scaling to LLMs. Featured References Reinforcement Learning, Bit by Bit Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen From Predictions to Decisions: The Importance of Joint Predictive Distributions Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy Epistemic Neural Networks Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy Approximate Thompson Sampling via Epistemic Neural Networks Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy Additional References Thesis defence, Ian Osband Homepage, Ian Osband Epistemic Neural Networks at Stanford RL Forum Behaviour Suite for Reinforcement Learning, Osband et al 2019 Efficient Exploration for LLMs, Dwaracherla et al 2024…

Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more! Sharath Chandra Raparthy is an AI Resident at FAIR at Meta and did his Master's at Mila. Featured Reference Generalization to New Sequential Decision Making Tasks with In-Context Learning Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu Additional References Sharath Chandra Raparthy Homepage Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023 Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022 Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021…

Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more! Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta. Martin Klissarov is a PhD student at Mila and McGill and a research scientist intern at Meta. Featured References Motif: Intrinsic Motivation from Artificial Intelligence Feedback Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare To keep doing RL research, stop calling yourself an RL researcher Pierluca D'Oro…

Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more! Martin Riedmiller is a research scientist and team lead at DeepMind. Featured References Magnetic control of tokamak plasmas through deep reinforcement learning Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller Human-level control through deep reinforcement learning Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method Martin Riedmiller…

Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind and is now at Apple Machine Learning Research. Featured References Bigger, Better, Faster: Human-level Atari with human-level efficiency Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville The Primacy Bias in Deep Reinforcement Learning Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville Additional References Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017 When to use parametric models in reinforcement learning? van Hasselt et al 2019 Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020 Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021…

Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and co-founder and research director at modl.ai Featured References Choose Your Weapon: Survival Strategies for Depressed AI Academics Julian Togelius, Georgios N. Yannakakis Learning Controllable 3D Level Generators Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius PCGRL: Procedural Content Generation via Reinforcement Learning Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi…

Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more. Jakob Foerster is an Associate Professor at University of Oxford. Featured References Learning with Opponent-Learning Awareness Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch Model-Free Opponent Shaping Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster Off-Belief Learning Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster Learning to Communicate with Deep Multi-Agent Reinforcement Learning Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson Adversarial Cheap Talk Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson Additional References Lectures by Jakob on youtube…

Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design! Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind. He has been our guest before, back on episode 11. Featured References Mastering Diverse Domains through World Models [ blog ] DreamerV3 Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap DayDreamer: World Models for Physical Robot Learning [ blog ] Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel Deep Hierarchical Planning from Pixels [ blog ] Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel Action and Perception as Divergence Minimization [ blog ] Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess Additional References Mastering Atari with Discrete World Models [ blog ] DreamerV2; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba Dream to Control: Learning Behaviors by Latent Imagination [ blog ] Dreamer; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi Planning to Explore via Self-Supervised World Models; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak…