
Content provided by Mr. Dew. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Mr. Dew or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://ar.player.fm/legal.

The Human Touch - How RLHF Aligns Models with Our Values

5:54

How do we make AI not just smart, but safe and genuinely helpful? In this episode of "All Things LLM," Alex and Ben break down the vital process of alignment—transforming a powerful language model into a trustworthy assistant you can rely on.

Inside this episode:

  • What is RLHF? Discover Reinforcement Learning from Human Feedback—the multi-stage process that transforms next-word predictors into helpful, instruction-following bots like ChatGPT or Claude.
  • Step-by-Step Alignment:
    1. Supervised Fine-Tuning (SFT) — Human-written prompt-response pairs teach the model how to answer like a real assistant.
    2. Reward Modeling — Human labelers rank AI-generated responses so a reward model can learn to “judge” answers the way people do.
    3. Reinforcement Learning — Using techniques like PPO, the model iteratively improves, getting nudged to produce ever more helpful, safe, and truthful outputs.
  • Why Human Judgment Matters: Learn how the quality of human feedback and rating instructions shape an AI’s values and its ability to avoid bias, harmful outputs, and unhelpful answers.
  • Limitations & Costs: Understand why RLHF is so powerful yet labor-intensive, and get a practical sense of the real-world constraints involved in aligning advanced AI.
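The reward-modeling step outlined above is often trained with a pairwise (Bradley-Terry style) preference loss: the reward model should score the human-preferred answer higher than the rejected one. As a rough illustration only (not code from the show, and the function name is ours), the loss for a single comparison looks like this:

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style preference loss: -log(sigmoid(r_chosen - r_rejected)).
    The loss shrinks as the reward model scores the human-preferred
    answer increasingly higher than the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# If the reward model already prefers the chosen answer, the loss is small:
agrees = pairwise_reward_loss(2.0, -1.0)
# If it prefers the rejected answer, the loss is large:
disagrees = pairwise_reward_loss(-1.0, 2.0)
print(f"agrees={agrees:.3f}  disagrees={disagrees:.3f}")
```

Minimizing this loss over many human-ranked pairs is what lets the reward model "judge" answers the way labelers do, which the reinforcement-learning stage (e.g. PPO) then optimizes against.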

Perfect for listeners searching for:

  • How does RLHF work in AI?
  • Alignment in language models
  • Safe, human-aligned AI
  • PPO and reward modeling
  • Instruction tuning for LLMs
  • Factual and helpful AI assistants

This is the final word on the “human touch” behind the future of trustworthy, reliable AI. Subscribe now—and don’t miss next week’s launch of a new season, as the show tackles the open-source vs. closed-source model debate and what it means for the future of AI development!

All Things LLM is a production of MTN Holdings, LLC. © 2025. All rights reserved.
For more insights, resources, and show updates, visit allthingsllm.com.
For business inquiries, partnerships, or feedback, contact: [email protected]

The views and opinions expressed in this episode are those of the hosts and guests, and do not necessarily reflect the official policy or position of MTN Holdings, LLC.

Unauthorized reproduction or distribution of this podcast, in whole or in part, without written permission is strictly prohibited.
Thank you for listening and supporting the advancement of transparent, accessible AI education.


15 episodes
