
From noise to artwork - How does a diffusion model work?

Cover image: a self-portrait by Vincent van Gogh (photo by Alina Grubnyak / Unsplash)

In the past, we've touched on how language models work: broadly speaking, they simply predict the next word based on what has already been said. But have you ever wondered how AIs actually generate images?

Imagine you have painstakingly drawn an image, only to destroy it with an eraser until nothing remains but messy noise. Then you have to do the exact opposite: conjure everything back out of that noise, step by step, restoring the details until an impressive work of art emerges from the chaos - over and over again. This is exactly what a diffusion model does.

An example of the use of diffusion models for image generation.
Source: www.ultralytics.com/

In essence, it's about breaking real images down into tiny puzzle pieces of noise and then learning how to put those pieces back together. In the first phase - the forward process - noise is added to a photo or an illustration again and again. With each step, a little more randomness is mixed in until the original motif is barely recognizable. You can picture it like a watercolor painting that gets blurrier and blurrier. Then comes the exciting part: the backward process. Here, a neural network is trained to see through precisely this noise and remove it systematically. It learns what real images typically look like - which shapes, colors and structures belong together. During training, it is shown countless examples where the computer already knows what the image looked like before the noise was added. In this way, it picks up the patterns that make a photo or drawing look "realistic".

💡
Forward diffusion: The image is gradually noised until only random patterns remain.

Backward diffusion: A trained model removes the noise step by step and reconstructs a realistic image from it.
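If you'd like to see how the forward process looks in code, here is a minimal sketch in Python following the standard DDPM formulation. The number of steps, the noise schedule and the tiny 8x8 "image" are illustrative assumptions, not values taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                               # number of noising steps (assumption)
betas = np.linspace(1e-4, 0.02, T)     # how much noise each step adds (illustrative schedule)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)        # cumulative share of the original signal kept at step t

def forward_diffuse(x0, t):
    """Jump from the clean image x0 straight to its noisy version at step t."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise                   # during training, the network learns to predict `noise`

x0 = rng.standard_normal((8, 8))       # stand-in for a real image
x_noisy, true_noise = forward_diffuse(x0, t=999)
print(f"signal kept at the last step: {np.sqrt(alpha_bars[999]):.4f}")
```

By the final step, almost nothing of the original image is left - which is exactly the point: the network only ever has to learn to guess the noise that was added.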

After training, you can set the model loose by feeding it pure noise - practically a blank sheet of paper filled with nothing but randomness. Step by step, the program turns it into a new image - for example, a sunset on the beach, a comic book hero or a surreal landscape. Everything is created from nothing, based solely on what the model has learned.
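The reverse direction can be sketched in a few lines as well. Note that `predict_noise` below is only a placeholder for the trained neural network (typically a U-Net); in a real system it would return the model's estimate of the noise in the current image:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # same illustrative schedule as in the forward sketch
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Placeholder for the trained network; a real model returns its noise estimate."""
    return np.zeros_like(x_t)

def sample(shape=(8, 8)):
    """Standard DDPM reverse loop: start from pure noise and denoise step by step."""
    x = rng.standard_normal(shape)                      # the "blank sheet": pure noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        # Remove the predicted noise contribution (DDPM mean estimate).
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)   # keep a little randomness
    return x

image = sample()
```

With the placeholder network the loop just produces noise, of course - the "magic" sits entirely in what a real, trained `predict_noise` has learned about images.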

Another advantage is the so-called guidance mechanisms, which let you steer the result. For example, you can give the model the instruction "paint me a dragon in the style of an old woodcut illustration". It then uses the image statistics it has learned to create something that matches that description as closely as possible.
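One common way this is done is classifier-free guidance: the model makes two noise predictions, one with and one without the text prompt, and the difference between them is amplified. The sketch below reuses a hypothetical `predict_noise` stand-in (now with an optional text condition) and a `guidance_scale` of 7.5, a typical but by no means universal choice:

```python
import numpy as np

def predict_noise(x_t, t, condition=None):
    """Placeholder for a trained network that can optionally take a text embedding."""
    return np.zeros_like(x_t)

def guided_noise_estimate(x_t, t, prompt_embedding, guidance_scale=7.5):
    """Classifier-free guidance: blend the unconditional and the prompt-conditioned
    prediction, pushing the result further in the direction of the prompt."""
    eps_uncond = predict_noise(x_t, t, condition=None)
    eps_cond = predict_noise(x_t, t, condition=prompt_embedding)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

In a full system this guided estimate simply replaces the plain noise prediction inside the sampling loop, so every denoising step is nudged toward the prompt.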

Why is this so fascinating? Because diffusion models are incredibly flexible: they can generate not only images but also music or speech, and even restore noisy audio data. They play a major role in current AI systems such as Stable Diffusion or DALL-E 2 and drive creative collaboration between humans and machines. In principle, you could say that almost any type of digital media can be generated by a trained model.

The end result is something that looks like handmade art, even though it was generated entirely algorithmically. And all because a clever trick from physics - the idea of slowly mixing and then unmixing particles - has been applied to modern AI. A little excursion into the world of diffusion that will hopefully make you want to experiment creatively with AI yourself!


Sources:

Diffusion Models: Generative AI Explained | Ultralytics (www.ultralytics.com)