Earlier this year, OpenAI made waves when the company previewed its artificial intelligence model Sora. A YouTube video for the text-to-video generator showed hyperrealistic depictions of astronauts on an alien world, woolly mammoths barreling through the snow, a papercraft coral reef, and more.
The type of neural network behind Sora is called a diffusion model. “Diffusion models are my favorite deep learning model because they have elegant underlying mathematics,” said Mengdi Wang, an associate professor of Electrical and Computer Engineering at Princeton University. On April 23, Wang gave a seminar at the Center for Statistics and Machine Learning on diffusion models, how they work, and how they can be used to solve complex tasks.
A diffusion model is like a map that charts the path from noise – like static on a television screen – to data. When a researcher trains a diffusion model, “we’re thinking about figuring out a sequence of state transitions,” said Wang.
For example, take an image of a dog and gradually add statistical noise. A diffusion model aims to learn to reverse this process: starting from the noisy version, the neural network should generate an output that matches the original data. By learning to undo the noising step by step, the model can ultimately take in pure noise and produce an image of a dog in return.
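To make the idea concrete, here is a minimal sketch of the standard recipe many diffusion models follow (an illustration, not Sora's or Wang's specific setup): the forward process corrupts data with a schedule of Gaussian noise, and a network is trained to predict that noise so the chain can later be run in reverse. The noise schedule, toy denoiser, and two-dimensional stand-in data below are all assumptions made for the sake of a short, runnable example.

```python
# Sketch of the forward (noising) process and one training step of the
# reverse (denoising) network, in the DDPM style. All choices here are
# illustrative placeholders, not the models discussed in the talk.
import torch
import torch.nn as nn

T = 100                                    # number of noising steps
betas = torch.linspace(1e-4, 0.02, T)      # assumed variance schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # signal retained up to step t

def add_noise(x0, t):
    """Forward process: jump directly to step t of the noising chain."""
    noise = torch.randn_like(x0)
    signal = alpha_bars[t].sqrt().unsqueeze(-1)
    spread = (1 - alpha_bars[t]).sqrt().unsqueeze(-1)
    return signal * x0 + spread * noise, noise

# A toy denoiser standing in for the real neural network; its input is the
# noisy sample concatenated with the (normalized) step index.
model = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x0 = torch.randn(256, 2)                   # placeholder "images" (2-D points)
t = torch.randint(0, T, (256,))
xt, noise = add_noise(x0, t)

# One training step: the network learns to predict the noise that was added,
# which is what lets it reverse the process at sampling time.
pred = model(torch.cat([xt, t.float().unsqueeze(-1) / T], dim=-1))
loss = ((pred - noise) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

Repeated over many such steps on real data, training like this lets the network start from pure noise and walk the chain backward, which is the reversal described above.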
Diffusion models aren’t only used to generate images of cute dogs and realistic videos of woolly mammoths frolicking through icy tundras, though. They’re also used for important scientific research, such as the design of proteins. “When we use diffusion models for these critical tasks, we need guarantees [that the generated data is accurate],” said Wang.
The question, Wang said in her talk, is, “Can we control the generation of diffusion models towards specific objectives? How efficiently?”
The answer, Wang showed, is yes: the generation process can be guided toward a specific objective by recasting generation as an optimization problem for generative AI. Rather than sampling freely, the model is steered at each step toward outputs that score better on the objective, so that optimization selects the best element from the set of options the model could generate.
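One common way to implement this kind of guidance (a hedged sketch, not necessarily the method presented in the talk) is to nudge each reverse-diffusion step in the direction that improves an objective, using the gradient of a reward function. The reward, the stand-in denoiser, and the step sizes below are hypothetical placeholders:

```python
# Sketch of gradient-guided sampling: each reverse step combines the model's
# ordinary denoising update with a small push toward higher reward.
import torch

def reward(x):
    # Hypothetical objective: prefer samples near the point (1, 1).
    return -((x - 1.0) ** 2).sum(dim=-1)

def guided_step(x, denoise_fn, guidance_scale=0.1):
    """One reverse-diffusion step with gradient guidance."""
    x = x.detach().requires_grad_(True)
    grad = torch.autograd.grad(reward(x).sum(), x)[0]  # direction of improvement
    x_denoised = denoise_fn(x.detach())                # the model's usual step
    return x_denoised + guidance_scale * grad

# Stand-in denoiser; a real sampler would call the trained diffusion model.
denoise_fn = lambda x: 0.95 * x

x = torch.randn(8, 2)          # start from pure noise
for _ in range(50):
    x = guided_step(x, denoise_fn)
# x now clusters near the high-reward region rather than wandering freely.
```

The key design choice is the guidance scale: too small and the objective is ignored, too large and the samples drift away from what the model considers realistic data.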
In her own research, Wang uses large language models to optimize partial genome sequences for use in biology and medicine. In a paper published in Nature Machine Intelligence on April 5, Wang and colleagues detail a large language model used to design a more effective mRNA vaccine. “I found this very cool because this is another form of generative AI models and we can utilize it for optimization,” said Wang.