In this blog post, I talk about the project Learning Stochastic Dynamics using Diffusion Models. The repository for this project will be open-sourced soon.
Problem Definition
In this section we give the exact mathematical formulation of the learning problem. We start with the dataset: we assume we observe $N$ 3-D trajectories of length $L$, i.e., a data sample is a tuple $\left((x_i,y_i,z_i),\, t_i\right) \in \mathbb{R}^3 \times [0,T]$, where $[0,T]$ is discretized by time stamps $\{t_i\}_{i=1}^L$. We assume that the trajectories are sample paths of a system of stochastic differential equations (SDEs), such as the stochastic Lorenz System:
$$\begin{aligned} & (x_0, y_0, z_0) \sim \mathcal{N}(0, I_{3\times 3})\\ & dx = \sigma_1(y-x)\, dt + \sigma_2 x\, dw(t)\\ & dy = (x(\rho_1 - z) - y)\, dt + \rho_2 y\, dw(t)\\ & dz = (xy - \beta_1 z)\, dt + \beta_2 z\, dw(t) \end{aligned}$$
where $w(t)$ is a standard Brownian motion. This is an example of a chaotic system with explicit noise terms; in particular, with a mild noise term and a proper choice of parameters, this system admits a global attractor. A useful property of systems like this is that the dynamics near the attractor remain Markovian (such a system is known as a dissipative system). Therefore, near the equilibrium state, we can treat this system as a Markov process and lay out the following assumption:
Assumption (Invariant Markov Kernel): We assume that the underlying stochastic process $\{X(t),\, t\in [0,T]\}$ generating the data has an invariant transition kernel for a fixed time discretization. In particular, the transition kernel for a fixed $\Delta t$ is invariant with respect to the start time, i.e.:
$$\begin{aligned} \forall t, t'\in [0,T-\Delta t], \quad T_{\Delta t} := P(X_{t+\Delta t} \mid X_t) = P(X_{t'+\Delta t} \mid X_{t'}) \end{aligned}$$
where $P$ is the conditional distribution induced by the process. Moreover, we assume the process is Markov.
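For concreteness, here is a minimal sketch of how sample paths of the Stochastic Lorenz System could be generated with the `torchsde` library. The parameter values, the mild noise magnitudes, and the single shared Brownian motion are illustrative assumptions mirroring the equations above, not the project's actual configuration:

```python
import torch
import torchsde

class StochasticLorenz(torch.nn.Module):
    """Stochastic Lorenz System with multiplicative noise, as defined above."""
    noise_type = "scalar"  # one shared Brownian motion w(t), matching the equations
    sde_type = "ito"

    def __init__(self, sigma=(10.0, 0.1), rho=(28.0, 0.1), beta=(8.0 / 3.0, 0.1)):
        super().__init__()
        (self.s1, self.s2), (self.r1, self.r2), (self.b1, self.b2) = sigma, rho, beta

    def f(self, t, state):  # drift term, shape (batch, 3)
        x, y, z = state.unbind(dim=-1)
        return torch.stack([
            self.s1 * (y - x),
            x * (self.r1 - z) - y,
            x * y - self.b1 * z,
        ], dim=-1)

    def g(self, t, state):  # diffusion term, shape (batch, 3, 1) for scalar noise
        x, y, z = state.unbind(dim=-1)
        return torch.stack([self.s2 * x, self.r2 * y, self.b2 * z], dim=-1).unsqueeze(-1)

batch, L, T = 64, 1000, 10.0
x0 = torch.randn(batch, 3)          # (x_0, y_0, z_0) ~ N(0, I_{3x3})
ts = torch.linspace(0.0, T, L)      # time stamps {t_i}_{i=1}^L
with torch.no_grad():
    trajs = torchsde.sdeint(StochasticLorenz(), x0, ts, method="euler")  # (L, batch, 3)
```

Setting `noise_type = "scalar"` drives all three coordinates with the same Brownian motion, as written in the equations above; independent per-coordinate noise would instead use diagonal noise.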
Under this assumption, we can slice the observed time series into equal chunks and learn the transition kernel by formulating a supervised learning problem. Fixing a discretization $\Delta t$, we observe tuples of the form $\left((x_i,y_i,z_i),\, (x_{i'}, y_{i'}, z_{i'})\right)$, which are realizations of the random variable $X(t)$ at times $t$ and $t+\Delta t$, respectively. From these empirically observed pairs, our goal is to recover the invariant transition kernel $T_{\Delta t}$, and we aim to learn this kernel with a diffusion model. Suppose the dynamics and $\Delta t$ are fixed, i.e., the transition kernel $T_{\Delta t}$ is fixed. Given data $\{(X_{t_i}, X_{t_i+\Delta t})\}_i$, can we train a model to approximate the transition kernel? The model input is $(X,\xi)$, where $\xi$ is random noise, and the output is a sample from the target distribution induced by the kernel. A minimal sketch of this pair-construction step follows.
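The pairing step itself is a simple slicing operation; the function name below is hypothetical, and trajectories are assumed stored as an `(L, batch, 3)` tensor as returned by `torchsde.sdeint`:

```python
def make_transition_pairs(trajs, stride=1):
    """Slice trajectories of shape (L, batch, 3) into (X_t, X_{t+Δt}) pairs.

    `stride` fixes the discretization: Δt = stride * (t_2 - t_1). Under the
    invariant-kernel assumption, every pair is a draw from the same T_Δt.
    """
    x_t = trajs[:-stride].reshape(-1, 3)     # states at time t
    x_next = trajs[stride:].reshape(-1, 3)   # matching states at time t + Δt
    return x_t, x_next
```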
Conditional Diffusion Model
The approach to this problem is rather straightforward. To learn the kernel and capture the stochastic nature of the process, we use Conditional Diffusion Models:
- We use diffusion models since they are known to be effective approximators for probability distributions.
- We use conditional models, since the transition kernel $P(X_{t+\Delta t} \mid X_t)$ depends on the state $X_t$.
To formulate the training loss for a conditional diffusion model, we perform diffusion on the target $X_{t+\Delta t}$ during training, with $X_t$ as conditioning information, much like guidance in conditional diffusion models. We therefore formulate the loss as follows (a PyTorch sketch is given after the notation below):
$$\begin{align} & \mathcal{L}(\theta) = \mathbb{E}_{\tau}\, \mathbb{E}_{p(X_t)}\, \mathbb{E}_{P(X^{(\tau)}_{t+\Delta t},\, X_{t+\Delta t} \mid X_t)} \left\| l(\theta) \right\|_2^2\\ & l(\theta) = s_\theta\!\left(X^{(\tau)}_{t+\Delta t}, \tau, X_t\right) - \nabla_{X_{t+\Delta t}^{(\tau)}} \log P\!\left(X_{t+\Delta t}^{(\tau)}\mid X_{t+\Delta t}\right) \end{align}$$
where $s_\theta$ is the conditional score network, and we use the following notation to remove the ambiguity in time:
- $t$ represents the time stamps in the time series we are interested in, which is the trajectory produced by the Stochastic Lorenz System.
- $\tau$ represents the time steps in the diffusion process, i.e., forward diffusion and backward sampling steps.
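To make this concrete, here is a minimal sketch of the loss under a VE-style perturbation kernel $P(X^{(\tau)} \mid X) = \mathcal{N}(X, \sigma(\tau)^2 I)$, following the conventions of `score_sde_pytorch`. The network `score_model` and the geometric $\sigma(\tau)$ schedule are assumptions for illustration:

```python
import torch

def conditional_dsm_loss(score_model, x_next, x_cond, sigma_min=0.01, sigma_max=50.0):
    """One Monte Carlo estimate of L(θ) above.

    Assumes a VE-style perturbation kernel P(X^(τ) | X) = N(X, σ(τ)² I), whose
    score is -(X^(τ) - X) / σ(τ)².  `score_model` is a hypothetical network
    s_θ(X^(τ), τ, X_t) taking the conditioning state X_t as an extra input.
    """
    tau = torch.rand(x_next.shape[0], device=x_next.device)  # τ ~ U(0, 1)
    sigma = sigma_min * (sigma_max / sigma_min) ** tau       # geometric σ(τ) schedule
    noise = torch.randn_like(x_next)
    x_tau = x_next + sigma[:, None] * noise                  # sample X^(τ)_{t+Δt}
    target = -noise / sigma[:, None]                         # ∇ log P(X^(τ) | X_{t+Δt})
    pred = score_model(x_tau, tau, x_cond)                   # s_θ(X^(τ), τ, X_t)
    # weight by σ(τ)² so all noise levels contribute on a comparable scale
    return (sigma[:, None] ** 2 * (pred - target) ** 2).sum(dim=-1).mean()
```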
To implement this model, we need to:
- Generate the dataset: produce a ground-truth dataset for the Stochastic Lorenz System using the `torchsde` library.
- Formulate training: score-based generative modeling training, adapting the skeleton code from `score_sde_pytorch`, with the additional conditional information.
- Formulate sampling (inference) with conditional information (a minimal sampler sketch follows this list).
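For the sampling step, here is a minimal predictor-only reverse-diffusion sampler matching the VE schedule assumed in the training sketch above; real samplers typically add corrector (Langevin) steps:

```python
import torch

@torch.no_grad()
def sample_transition(score_model, x_cond, n_steps=500, sigma_min=0.01, sigma_max=50.0):
    """Draw X_{t+Δt} ~ T_Δt(· | X_t) by reverse-time diffusion.

    Predictor-only reverse-diffusion sampler under the same assumed VE σ(τ)
    schedule as the training sketch; `score_model` is the trained s_θ.
    """
    batch = x_cond.shape[0]
    taus = torch.linspace(1.0, 0.0, n_steps + 1, device=x_cond.device)
    sigmas = sigma_min * (sigma_max / sigma_min) ** taus         # decreasing σ(τ_i)
    x = sigmas[0] * torch.randn(batch, 3, device=x_cond.device)  # sample from the prior
    for i in range(n_steps):
        step = sigmas[i] ** 2 - sigmas[i + 1] ** 2               # σ_i² - σ_{i+1}² > 0
        x = x + step * score_model(x, taus[i].expand(batch), x_cond)  # mean update
        if i < n_steps - 1:
            x = x + step.sqrt() * torch.randn_like(x)            # no noise on final step
    return x
```

Once trained, rolling this sampler forward autoregressively, feeding each sample back in as the next condition, would produce synthetic trajectories from the learned kernel.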
These will be covered in full once we open source the code.