

What is Generative AI? What are Large Language Models (LLMs)?

In this video, Joyce Chai, Professor of Electrical Engineering and Computer Science, shares insights about generative AI, embodied AI, and large language models (LLMs).


Transcript

My name is Joyce Chai. I'm a professor in the Computer Science and Engineering Division, a member of the Michigan AI Lab, and an associate director of the Michigan Institute for Data Science. My research is in the area of natural language processing and artificial intelligence. My recent work focuses on developing situated language understanding models and enabling language communication with embodied AI agents.

Generative AI is a new branch of AI concerned with a model's ability to generate new content based on some kind of textual input. It can generate answers to a question, generate images, generate music, or generate programs such as code. What enables generative AI is recent advances in foundation models, such as large language models like ChatGPT and models that generate language such as [inaudible]. These are models trained on huge amounts of data, usually at Internet scale. Once these models are trained, they can be used for downstream tasks.

In natural language processing, language modeling refers to computational models that are trained on corpora of text to predict the next word. For example, given a linguistic context such as "I went to a conference last," what would be the word following "last"? It could be "last week," "last year," "last May." There are many possibilities. A language model is used to predict the most likely word following "last."

Large language models refer to the use of deep neural networks to predict the next word, and these models are large in the sense that they have billions, or even hundreds of billions, of parameters. Researchers use tons of textual data from the Internet, such as hundreds of billions of words, to train these large models. Neural networks are inspired by human brains, which consist of billions of neurons. These large language models are likewise made up of numerous nodes and the connections between them. The parameters of these models refer to the weights of those connections between the nodes.

When we talk about training a large language model, what we are trying to get is the numerical values of these parameters that make the model do a good job at certain tasks. For example, when predicting the next word, we have tons of sentences, and we know exactly what the next word is. This is really helpful because it gives us the supervision to train the language model, which is called self-supervision. Essentially, we want to train the model so that it can predict the true next word in the data. By comparing the true next word with the word the model predicts, a mathematical procedure called backpropagation adjusts the model to improve its predictions. You can imagine that these models are trained on every word in the data, and at the end of training you have what is called a pre-trained model: its parameters are tuned, and it can be used for downstream tasks.

ChatGPT is made available by OpenAI. On top of a model trained to predict the next word, this model is trained further using instruction data. This instruction data essentially tells the model, given some input, what the right output should be. If you try ChatGPT in the OpenAI playground, you'll notice that they ask you to give a thumbs up or thumbs down to the generated content. ChatGPT also uses this kind of human feedback to fine-tune the model.
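To make the next-word prediction described in the transcript concrete, here is a minimal sketch in Python. It assumes the Hugging Face transformers library and the publicly available GPT-2 checkpoint, neither of which is mentioned in the talk; they simply stand in for "a large language model."

```python
# A minimal sketch of next-word prediction with a pretrained language model.
# Assumes the Hugging Face transformers library and the public GPT-2 checkpoint
# (not mentioned in the talk; any causal language model would work the same way).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# The linguistic context from the transcript: "I went to a conference last ..."
context = "I went to a conference last"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The score for every possible next word comes from the last position in the sequence.
next_token_logits = logits[0, -1]
top = torch.topk(next_token_logits, k=5)

for score, token_id in zip(top.values, top.indices):
    print(tokenizer.decode(int(token_id)), float(score))
```

Running this prints the model's top candidates for the word after "last" with a score for each; words like "week" and "year" tend to rank highly, which is exactly the "many possibilities" point made in the talk.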
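The transcript also describes self-supervision and backpropagation: the true next word is already in the data, so the model's prediction can be compared against it and the parameters adjusted. The following toy sketch, assuming PyTorch, uses a deliberately tiny model (an embedding layer plus a linear layer, nothing like a real LLM) just to show that training loop.

```python
# A toy sketch of self-supervised next-word training with backpropagation.
# The model here is deliberately tiny; the point is the loop:
# predict the next word, compare with the true next word, backpropagate.
import torch
import torch.nn as nn

vocab = {"i": 0, "went": 1, "to": 2, "a": 3, "conference": 4, "last": 5, "week": 6}
sentence = ["i", "went", "to", "a", "conference", "last", "week"]
ids = torch.tensor([vocab[w] for w in sentence])

# Inputs are every word except the last; targets are every word except the first,
# so each position is supervised by the true next word (self-supervision).
inputs, targets = ids[:-1], ids[1:]

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.out(self.embed(x))

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    logits = model(inputs)           # model's scores for the next word at each position
    loss = loss_fn(logits, targets)  # compare with the true next words
    optimizer.zero_grad()
    loss.backward()                  # backpropagation computes how to adjust the weights
    optimizer.step()                 # the parameters (weights) are updated

print("final loss:", loss.item())
```

The "parameters" the talk refers to are exactly what `model.parameters()` exposes here: the numerical weights adjusted by training. In a real large language model there are billions of them rather than the few hundred in this toy.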
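Finally, the talk distinguishes two extra ingredients behind ChatGPT: instruction data and human feedback (the thumbs up or thumbs down). The records below are purely illustrative; the field names and examples are made up, but they show the general shape of each kind of data.

```python
# Illustrative only: the field names and examples are invented, not OpenAI's actual format.
# Instruction data pairs an input with a desired output.
instruction_examples = [
    {"input": "Summarize: Large language models are trained to predict the next word.",
     "output": "LLMs learn by predicting the next word in text."},
    {"input": "Translate to French: good morning",
     "output": "bonjour"},
]

# Human feedback records a rating on content the model generated.
feedback_examples = [
    {"prompt": "Explain backpropagation simply.",
     "response": "It nudges the model's weights to reduce its prediction error.",
     "rating": "thumbs_up"},
    {"prompt": "Explain backpropagation simply.",
     "response": "It is a kind of database index.",
     "rating": "thumbs_down"},
]
```

The instruction data is used to fine-tune the pre-trained model toward following directions, and the ratings provide the human feedback the transcript mentions, used to further adjust the model so that preferred responses become more likely.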