#2: Large Language Models Can Self-Improve
Google recently announced a significant breakthrough: a new Language Model Self-Improvement (LMSI) system that makes it possible for big language models to improve their own performance on many tasks without using any additional labeled data. In this post, and its accompanying podcast, we’ll take a look at LMSI to understand why it’s such a big deal.
When applying LMSI to a 540B parameter PaLM model, the Google researchers achieved state-of-the-art results across a variety of arithmetic reasoning, commonsense reasoning, and natural language inference tasks.
The LMSI system allows a language model to self-improve in 3 steps:
- First, you give the system some questions like “Stefan goes to a restaurant with his family. They order an appetizer that costs $10 and 4 entrees that are $20 each. If they tip 20% of the total, what is the total amount of money that they spend?”
- Then, you ask the language model to answer the question 32 times, each time generating step-by-step reasoning. For example, one generated explanation could be “The appetizer costs $10. The entrees cost 4 * $20 = $80. The tip is 20% of the total, so it is 20% of the $90 they have spent. The tip is 0.2 * 90 = $18. The total they spent is $90 + $18 = $108. The answer is 108.”
- Finally, the system picks the explanations with the most common answer and trains the language model on these explanations. For example, if 16 out of 32 explanations give $108 as the answer, and the other explanations have a mix of different answers, then the system will pick the explanations that gave $108 as the answer.
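The selection step above amounts to a majority vote over the sampled answers. Here is a minimal sketch in Python; the function name and the sample data are illustrative (in LMSI there would be 32 chain-of-thought samples drawn from the model itself):

```python
from collections import Counter

def select_self_consistent(samples):
    """Keep only the (explanation, answer) pairs whose final answer
    matches the most common answer among all samples."""
    answers = [ans for _, ans in samples]
    majority, _ = Counter(answers).most_common(1)[0]
    return [(expl, ans) for expl, ans in samples if ans == majority]

# Hypothetical sampled explanations for the restaurant question.
samples = [
    ("$10 + 4*$20 = $90; tip 0.2*$90 = $18; total $108", "108"),
    ("Food is $90; with a 20% tip the total is $108", "108"),
    ("$10 + 4*$20 = $90; forgot to add the tip", "90"),
]

kept = select_self_consistent(samples)
print(len(kept))  # 2 explanations agree on the majority answer "108"
```

The explanations that `select_self_consistent` keeps would then be used as training data to fine-tune the model.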
This approach lets an LMSI-augmented language model significantly improve its own performance and achieve state-of-the-art results on reasoning problems.
The authors found that the LMSI system makes language models substantially more capable. When they fine-tuned a small language model with LMSI, it answered questions better than models 9 times its size that did not use LMSI.
Industry Context
Using only unlabeled text questions, large language models like PaLM fine-tuned with the LMSI system were able to outperform existing state-of-the-art approaches that rely on more complex reasoning methods and/or ground-truth labels. Small language models fine-tuned with LMSI likewise outperformed models 9 times their size that did not use it.
This example shows that we are still discovering ways to improve large language models without increasing model or dataset size, and that language models can be improved without any labeled data. Since LMSI enables small language models to match larger models that lack it, it also lowers the cost barrier for malicious uses of these capabilities.