Diffusion Models
Also known as: diffusion, denoising diffusion, latent diffusion, diffusion-based generation
What are Diffusion Models?
Diffusion models are a class of generative AI models that learn to create data by reversing a gradual noising process. During training, the model learns to undo that noising, so that at generation time it can iteratively refine pure noise into coherent outputs, whether images, audio, video, or 3D structures. First introduced theoretically in 2015 and made practical by the DDPM paper in 2020, diffusion models now power the majority of state-of-the-art image and video generation systems, including Stable Diffusion, DALL-E 3, Midjourney, and OpenAI’s Sora.
How Diffusion Works
The process has two phases. In the forward process, Gaussian noise is gradually added to training data over many steps until the original signal is completely destroyed. In the reverse process, a neural network learns to predict and remove the noise at each step, effectively learning the path from pure randomness back to structured data. At generation time, the model starts from random noise and applies its learned denoising steps to produce a new sample. Latent diffusion models, the variant behind Stable Diffusion, operate in a compressed latent space rather than pixel space, making the process far more computationally efficient.
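The two phases above can be sketched in a few lines of numpy. This is a toy illustration, not a trained model: the beta schedule values, the step count `T`, and the use of a closed-form forward sample are standard DDPM conventions, but in a real system the predicted noise would come from a neural network rather than being given.

```python
import numpy as np

# Toy DDPM-style sketch (assumptions: linear beta schedule, T = 1000).
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # per-step noise variances
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative signal retention

def forward_noise(x0, t, eps):
    """Forward process: sample x_t from q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_step(x_t, t, eps_pred, rng):
    """One reverse (denoising) step; eps_pred would come from the network."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added on the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

rng = np.random.default_rng(0)
x0 = np.ones(4)                        # toy "data" sample
eps = rng.standard_normal(4)
x_T = forward_noise(x0, T - 1, eps)    # near pure noise: alpha_bar_T ≈ 0
x_prev = reverse_step(x_T, T - 1, eps, rng)
```

Note how `alpha_bars` decays toward zero: by step `T` essentially none of the original signal survives, which is exactly why generation can start from random noise.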
Why Diffusion Models Won
Diffusion models displaced previous generative approaches like GANs (Generative Adversarial Networks) because they produce higher-quality, more diverse outputs without the training instability that plagued GANs. They are also more controllable: text-to-image diffusion models accept natural language prompts that guide the denoising process through cross-attention mechanisms. This controllability is what made tools like Midjourney and DALL-E accessible to non-technical users and kicked off the generative AI boom in 2022.
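The cross-attention mechanism mentioned above can be illustrated with a minimal sketch: image latents act as queries that attend over text-token embeddings, letting the prompt steer each denoising step. This is a simplified, hypothetical version; real models apply learned query/key/value projections, which are omitted here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, text_emb):
    """latents: (n_positions, d) queries; text_emb: (n_tokens, d) keys/values.
    Learned projection matrices are omitted in this sketch."""
    d = latents.shape[-1]
    scores = latents @ text_emb.T / np.sqrt(d)   # query-key similarities
    weights = softmax(scores, axis=-1)           # each latent attends over tokens
    return weights @ text_emb                    # token-weighted update per latent

rng = np.random.default_rng(0)
lat = rng.standard_normal((16, 8))   # 16 latent positions, dim 8
txt = rng.standard_normal((5, 8))    # 5 text-token embeddings
out = cross_attention(lat, txt)
```

Because every latent position mixes in information from every prompt token, changing the prompt changes the denoising trajectory, which is the basis of text conditioning.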
The Frontier: Video and Beyond
The same diffusion principles now extend to video generation (Sora, Kling, Wan), audio synthesis, 3D object generation, and even molecular design for drug discovery. Each domain adapts the core framework by operating in the appropriate latent space and adding temporal or structural consistency constraints.
Related Reading
- Sora - OpenAI’s diffusion-based video generation model
- Deep Learning - The foundation for diffusion model architectures