An improvisation on self-improving AI
Consider a simple idea. The intellectual capabilities of humanity increase with each generation[1]. One way to explain the rise of intelligence is by looking at the intelligence we store in our environment. We store knowledge in books, computers, art, in the things we use in everyday life, in the systems we live in, the cities, the modes of transportation. Each new “invention” is a way of storing some key piece of information. And most importantly, a new generation will grow up with that invention and will therefore think of it, or not think of it, in a completely different light from the generation that came before. Just think of the way different generations have interacted with computers throughout their development. The same is likely to be observed with AI and everything that it affects. This rise in intelligence is a gradual process, but it can be abstracted as a form of shaping one’s environment and then allowing the next generation to develop in it without the burden of what came before. This frees up computational resources to be used on other things, like changing the environment in even more novel ways. And thus the cycle continues. There are definitely nuances to be worked out here; this is only meant as a sketch. But let’s now see how it might apply to AI, specifically to LLMs[2].
[1] There are probably many ways of interpreting this, but many measurable metrics have steadily increased over the course of known history; let’s assume at least one of those metrics is a good proxy for intelligence.
[2] Though this should in principle be applicable to other models too.
LLMs are trained mostly on human-created data. Take the entirety of the internet, apply some preprocessing and general data cleaning, and you’re left with a dataset, call it \(\mathcal{D}\). You then train the LLM to minimize the loss in predicting the next token given a sequence from the dataset; in other words, the LLM is trying to become a sufficient representation of \(\mathcal{D}\). If we think of the occurrence of words, sentences, paragraphs, concepts, and anything else in \(\mathcal{D}\) as having some distribution, the model will tend to approximate that distribution. The power of the trained LLM then lies in bending that distribution in interesting ways; this is what “prompting” does. By prompting the model, you skew the outputs towards the part of the distribution that is likely to follow that set of tokens as a prefix, and this way you can get the LLM to generate data that is useful to you. Importantly, you can do this indefinitely. Of course, while there is a LOT of data out there, it is still limited, and while you can build very useful systems out of LLMs, it’s an open question whether that can scale indefinitely[3]. Nevertheless, you can use current LLMs to generate effectively infinite amounts of data. The question is: for language and LLMs, is there a process analogous to the environmental augmentation that increases the fitness of the human population across generations?
[3] Human data in → human-like data out?
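To make the training objective concrete, here is a minimal sketch in PyTorch of next-token prediction over sequences drawn from \(\mathcal{D}\). The toy model, hyperparameters, and `train_step` helper are illustrative assumptions, not a real LLM training stack.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512  # assumed toy hyperparameters

# Stand-in for a real transformer LM: embed tokens, project back to the vocabulary.
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(batch: torch.Tensor) -> float:
    """batch: (B, T+1) token ids sampled from the dataset D."""
    inputs, targets = batch[:, :-1], batch[:, 1:]   # predict token t+1 from tokens up to t
    logits = model(inputs)                          # (B, T, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling from the trained model with a prompt as the prefix is exactly the distribution-skewing described above: the prompt conditions which part of the learned distribution the continuation is drawn from.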
There’s already a lot of work that begins to address this question, e.g. Prystawski, Li, and Goodman (2023), and what it shows is roughly that applying methods colloquially useful for human performance (e.g. “don’t just jump to conclusions, think through the problem step-by-step”) to LLMs tends to improve task performance. Furthermore, some of these works have shown that you can improve performance either by forcing the model to structure its responses in a particular way or by having it generate new data for itself that it is then finetuned on. And importantly, these are phenomena which have previously been observed to work for humans. Human data → human cognitive science and psychology → mechanisms for improving AI. The follow-up question is: what are the scaling laws of these methods? Can STaR keep improving a model indefinitely on more complex tasks? And if not, what do you attribute the tapering-off to: the complexity of the task, the expressivity of the model, the method, or our execution of the method? Can we say something about the limits of human reasoning by trying to apply human reasoning mechanisms to improve LLMs? Is there a way to discover new mechanisms that are better suited to LLMs but are perhaps not relevant for humans? The fundamental question is: can you build a model that’s as powerful as GPT-x using only data generated by GPT-(x-1)? And further, can you build a model that’s better than GPT-x using only data generated by GPT-x? Can you do that indefinitely? If not, why not?
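As a rough illustration (a sketch of the idea, not the actual STaR implementation), the kind of self-improvement loop these works study looks something like the following; `generate`, `is_correct`, and `finetune` are hypothetical helpers standing in for sampling, answer checking, and fine-tuning.

```python
def self_improve(model, problems, answers, rounds=3):
    """STaR-like loop: sample rationales, keep the ones that reach the known
    answer, and fine-tune the model on its own filtered outputs."""
    for _ in range(rounds):
        kept = []
        for problem, answer in zip(problems, answers):
            rationale, prediction = generate(model, problem)  # sample a chain of thought + answer
            if is_correct(prediction, answer):                # filter: keep only successful traces
                kept.append((problem, rationale, prediction))
        model = finetune(model, kept)                         # train on the kept self-generated data
    return model
```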
Let the dataset generated by an LLM \(\mathcal{M}\) be called \(\mathcal{D}'\). Then we are interested in the following questions:
- Can you train a model \(\mathcal{M}'\) with \(\mathcal{D}'\) that surpasses the performance of \(\mathcal{M}\)?
- Can you train a model \(\mathcal{M}'\) on the ensemble \(\{\mathcal{D}, \mathcal{D}'\}\) that surpasses the performance of \(\mathcal{M}\)?
- Can you fine-tune \(\mathcal{M}\) to get a model \(\mathcal{M}'\) with better performance?
- Can you generate successive datasets \(\mathcal{D}'', \mathcal{D}''', \ldots\) and better models \(\mathcal{M}', \mathcal{M}'', \ldots\) indefinitely? If not, what are the scaling laws? And how can you quantify the best possible performance that you can reach? How should you reason about the structure you’re imposing on \(\mathcal{D}'\) to maximize its usefulness in improving the model? (A minimal sketch of this generational loop follows below.)
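Here is a minimal sketch of that generational loop, under the assumption that we have some benchmark to track performance across generations; `sample_dataset`, `train`, and `evaluate` are hypothetical helpers, and the stopping rule is just one crude way to probe where the tapering-off happens.

```python
def generational_loop(base_model, base_data, benchmark, max_generations=5):
    """M generates D', D' trains M', and the cycle repeats while the
    benchmark score keeps improving."""
    model, data = base_model, base_data
    history = [evaluate(model, benchmark)]
    for _ in range(max_generations):
        data = sample_dataset(model, size=len(base_data))  # D' generated by the current model
        model = train(model, data)                         # M' trained on D' (or on {D, D'})
        score = evaluate(model, benchmark)
        if score <= history[-1]:                           # no improvement: tapering-off observed
            break
        history.append(score)
    return model, history
```

Variants of this loop correspond to the earlier questions: train \(\mathcal{M}'\) on the ensemble \(\{\mathcal{D}, \mathcal{D}'\}\) instead of \(\mathcal{D}'\) alone, or fine-tune \(\mathcal{M}\) directly rather than training from scratch.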
Citation
@misc{smékal2024,
  author = {Jakub Smékal},
  title = {An Improvisation on Self-Improving {AI}},
  date = {2024-08-10},
  url = {https://jakubsmekal.com/writings/081024_self_improving_ai/self_improving_ai.html},
  langid = {en}
}