Open Source LLMs: Llama and Its Competitors
In this video, Christopher Brooks, Associate Professor of Information, discusses Llama 2 as an open-source large language model (LLM) and what it means for a large language model to be considered "open-source."
"AI large language models" by Wes Cockx & Google DeepMind is licensed under CC BY 4.0.
Transcript
Perhaps the most exciting thing about Llama 2 is that Meta announced it as an open-source LLM and has pledged to continue open-sourcing future Llama variants as well. But what does it mean for a large language model to be open source? Well, it's not totally clear. With software, the definition of open source is set by the Open Source Initiative, the OSI, and it focuses explicitly on approving licenses which adhere to 10 specific principles. But are these principles as relevant, or even appropriate, when we consider that the software part of a large language model is really only a small part of what makes the model useful? For instance, what about the data the model was trained on; should that have to be open and freely distributed for a model to be considered open source? Or what about the weights of the pre-trained model; is releasing those sufficient to consider the model itself open source? These questions aren't well answered, but nonetheless, we're starting to see a clearer distinction arise between models where the weights are available for download, like Llama 2, and models where they aren't, for instance, OpenAI's ChatGPT.

Now, Meta certainly made waves when they released Llama 2 as an open-source large language model, and as I've said, they've doubled down on this and pledged that future Llama models will be open source too. The history of how Llama 2 was released is actually quite intriguing. It turns out that Meta didn't release the first Llama model openly; instead, you needed to apply for access, and it was only provided to certain researchers. However, someone else ended up releasing the model as a torrent on 4chan, presumably after getting a copy from someone who had registered with Meta as a researcher.

Using at least a pragmatic definition of what it means to be open, it does seem like the Llama 2 codebase is indeed open source. It doesn't fit the OSI definition, as the license being used here is a proprietary one, the Llama 2 Community License, which the OSI has not endorsed, but you can go to GitHub and freely browse the model architecture source code. Similarly, the pre-trained model data, the weights, are also available, at least once you apply for and are granted access by Meta. And we'll talk about that in a moment. But if you consider the larger ecosystem of artifacts, all of the training data, the system for doing reinforcement learning and fine-tuning, and so forth, there are some pieces missing, so it's far from trivial to train your own Llama 2 variant based only on the published papers and the code that's been provided.
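To make concrete what "the weights are available for download" looks like in practice, here is a minimal sketch using the Hugging Face transformers library. It assumes you have applied for and been granted access to the gated meta-llama/Llama-2-7b-hf repository and have authenticated locally (for example, with huggingface-cli login); the repository name and prompt are illustrative, not part of the transcript.

```python
# Minimal sketch: loading the gated Llama 2 weights, assuming Meta has
# already approved your access request and you are logged in locally
# (e.g., via `huggingface-cli login`).
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 7B base model; larger variants follow the same naming pattern.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short completion to confirm the weights loaded correctly.
inputs = tokenizer("Open-source large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that even this "open" distribution is gated: the download only succeeds after Meta approves your request, which is exactly the partial openness described above.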