[QA] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
This paper introduces Tar, a multimodal framework integrating visual understanding and generation through a shared semantic representation, enhancing efficiency and performance in cross-modal tasks.
https://arxiv.org/abs//2506.18898
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
2489 episodes
All episodes
[QA] On the Theoretical Limitations of Embedding-Based Retrieval 8:55
On the Theoretical Limitations of Embedding-Based Retrieval 23:17
[QA] Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing 7:03
Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing 9:39
[QA] Measuring the environmental impact of delivering AI at Google Scale 8:17
Measuring the environmental impact of delivering AI at Google Scale 22:09
[QA] Intern-S1: A Scientific Multimodal Foundation Model 8:33
Intern-S1: A Scientific Multimodal Foundation Model 49:42
[QA] SSRL: Self-Search Reinforcement Learning 7:39
SSRL: Self-Search Reinforcement Learning 32:32
[QA] Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs 7:19
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs 31:24
[QA] Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL 7:42
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL 28:28
[QA] Part 1: Tricks or Traps? A Deep Dive into RL for LLM Reasoning 7:57
Part 1: Tricks or Traps? A Deep Dive into RL for LLM Reasoning 25:13
[QA] MolmoAct: Action Reasoning Models that can Reason in Space 7:34
MolmoAct: Action Reasoning Models that can Reason in Space 36:14
[QA] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification 7:59
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification 21:20
[QA] R-Zero: Self-Evolving Reasoning LLM from Zero Data 7:18
R-Zero: Self-Evolving Reasoning LLM from Zero Data 22:10