Diffusion Models

Generative models are a class of machine learning techniques designed to learn the underlying distribution of a dataset and generate new samples that resemble the training data. They can be broadly categorized into two types: explicit and implicit generative models. Explicit (likelihood-based) models, like Gaussian Mixture Models (GMMs), Variational Autoencoders (VAEs), and Diffusion Models, define a probability distribution over the data space, either exactly or approximately, while implicit models, such as Generative Adversarial Networks (GANs), never write down a density and instead learn to generate samples directly, for example through adversarial training.

Beyond predicting properties of known or hypothetical materials, a grand challenge in materials informatics is inverse design: computationally generating novel material structures that exhibit desired properties. While various generative models exist (like VAEs and GANs), Diffusion Models have recently emerged as a particularly powerful and promising approach, demonstrating the ability to generate high-quality, realistic data in domains like image synthesis, and increasingly, in materials science.

Controlled Corruption and Learned Reversal

Diffusion model framework: forward process (corruption) and reverse process (denoising). Figure adapted from Deep Unsupervised Learning using Nonequilibrium Thermodynamics.

Diffusion models operate based on two complementary processes:

Forward Process (Noise Addition)

This is a fixed process (no learning involved) where we start with a real data sample $x_0$ (e.g., the atomic coordinates and lattice parameters of a crystal structure). We gradually add a small amount of Gaussian noise over a large number of discrete time steps $T$. If the steps are small enough and $T$ is large enough, the distribution at the final step, $x_T$, becomes indistinguishable from pure Gaussian noise $\mathcal{N}(0, I)$. Mathematically, this defines a sequence of increasingly noisy samples $x_1, x_2, \dots, x_T$.

To formalize this, we define a Markov chain where each step $t$ is conditioned on the previous step $x_{t-1}$ and a noise term $\epsilon_{t-1} \sim \mathcal{N}(0, I)$:

$$x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}$$

where $\epsilon_{t-1} \sim \mathcal{N}(0, I)$ and the $\alpha_t$ are predefined noise-schedule constants.
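
As a minimal sketch of the forward process in PyTorch: the linear schedule values, tensor shapes, and helper names below are illustrative assumptions, not taken from any specific library. The `forward_jump` helper uses the standard closed-form shortcut for sampling $x_t$ directly from $x_0$.

```python
import torch

T = 1000                                     # number of diffusion steps (illustrative choice)
betas = torch.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas                         # alpha_t from the equation above
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative product, used for the closed form

def forward_step(x_prev, t):
    """One Markov step: x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps."""
    eps = torch.randn_like(x_prev)
    return alphas[t].sqrt() * x_prev + (1.0 - alphas[t]).sqrt() * eps

def forward_jump(x0, t):
    """Equivalent closed form: sample x_t directly from x_0 in a single step."""
    eps = torch.randn_like(x0)
    x_t = alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * eps
    return x_t, eps                          # return the noise too; training needs it
```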

Reverse Process (Denoising)

This is where the learning happens. The goal is to learn a “denoiser” model $\epsilon_\theta(x_t, t)$ that can reverse the noise-addition process. Starting from pure noise $x_T \sim \mathcal{N}(0, I)$, the model iteratively predicts the noise that was added at step $t$ (or, equivalently, predicts the slightly less noisy sample $x_{t-1}$) and gradually denoises the sample step by step, eventually yielding a generated sample $x_0$ that should resemble the original data distribution.

U-Net architecture for denoising. The U-Net consists of an encoder-decoder structure with skip connections, allowing it to capture both local and global features effectively. Figure adapted from U-Net: Convolutional Networks for Biomedical Image Segmentation.

This reverse process is typically parameterized by a neural network (often a U-Net architecture, possibly incorporating specialized layers such as GNNs or equivariant layers when dealing with atomic structures), which is trained to predict the noise $\epsilon_t$ added at each step $t$, given the noisy input $x_t$ and the time step $t$.
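
As a rough sketch of such a noise-prediction network, the toy PyTorch model below stands in for the U-Net or GNN; the MLP architecture, layer sizes, and time-conditioning scheme are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class NoisePredictor(nn.Module):
    """Toy MLP stand-in for the denoiser epsilon_theta(x_t, t).

    A real materials model would replace this with a U-Net or an equivariant
    GNN operating on atom types, fractional coordinates, and lattice parameters.
    """
    def __init__(self, dim, n_steps=1000, hidden=128):
        super().__init__()
        self.n_steps = n_steps
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t, t):
        # Condition on the time step by appending a normalized t to the input features.
        t_feat = t.float().view(-1, 1) / self.n_steps
        return self.net(torch.cat([x_t, t_feat], dim=-1))
```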

Generation

Once the denoiser model $\epsilon_\theta$ is trained, we can generate new crystal structures: sample pure noise $x_T \sim \mathcal{N}(0, I)$ and repeatedly apply the learned reverse step for $t = T, \dots, 1$ until we obtain a denoised sample $x_0$.
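
A sketch of this ancestral (DDPM-style) sampling loop, reusing the schedule and toy denoiser defined above; the variance choice $\sigma_t^2 = \beta_t$ is one common convention, not the only one.

```python
import torch

@torch.no_grad()
def sample(model, shape, betas):
    """Generate samples by reversing the diffusion: x_T ~ N(0, I), then denoise to x_0."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                     # start from pure noise x_T
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t)
        eps_hat = model(x, t_batch)                            # predicted noise at step t
        # Mean of the reverse step p(x_{t-1} | x_t) implied by the predicted noise
        mean = (x - (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise                     # add sigma_t * z, sigma_t^2 = beta_t
    return x                                                   # approximate new sample x_0
```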

Training

The model is trained on many examples of real crystal structures. For each structure, we can simulate the forward process to obtain noisy versions $x_t$ and the actual noise $\epsilon$ added at each step $t$. The model $\epsilon_\theta$ learns by matching its noise prediction to the actual noise, using standard deep learning optimization techniques.
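
A minimal sketch of one training step under these assumptions (the closed-form noising from the earlier snippet, a noise-prediction model, and a mean-squared-error loss, as in standard DDPM training); the function signature is hypothetical.

```python
import torch
import torch.nn.functional as F

def training_step(model, x0, alpha_bars, optimizer):
    """One optimization step: make the model's predicted noise match the actual noise."""
    t = torch.randint(0, len(alpha_bars), (x0.shape[0],))      # random time step per structure
    eps = torch.randn_like(x0)                                 # the actual noise we add
    a_bar = alpha_bars[t].view(-1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps         # noisy version of the data
    loss = F.mse_loss(model(x_t, t), eps)                      # compare prediction to actual noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```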

Application to Materials Generation

Applying diffusion models to generate crystal structures presents unique challenges and opportunities.

Strengths and Limitations

Strengths:

Limitations: