Note that for this course (and all UCLA courses) the slides are the intellectual property of the instructor; this post therefore contains only my own curated notes and projects based on the course (which are, in turn, my own intellectual property).
The notes here consist of two parts:
- My learning journey, taking this challenging course as someone with no fluid mechanics background (though plenty of background in applied mathematics and machine learning).
- A record of the course project (this is a project-based class).
During weeks 1 and 2, I conducted a crash-course study of fluid mechanics using the textbook. Since the course is project-based, I will keep a week-to-week journal of my journey exploring graph/network-based methods for fluid dynamics, with a focus on neural-network-based methods.
Week 1
Materials
The main reading material is a paper by Professor Taira. It contains much more content than what was covered in week 1's lecture. The lecture covered a review of linear algebra, in particular matrix eigendecomposition and the Singular Value Decomposition (SVD), together with their geometric interpretations. These are fundamentals, but fluid dynamics data has some special aspects: data is usually obtained from flow field snapshots, e.g., a 2D grid of size $(n_x, n_y)$, where each grid point carries feature values (for the Navier-Stokes equations, this corresponds to 5 values):
- $\mathbf{u} = (u,v,w)$: velocity.
- $p$: pressure.
- $\boldsymbol{\omega} = \nabla \times \mathbf{u} = (\omega_x, \omega_y, \omega_z)$: vorticity.
- $\rho$: density.
- $e$: energy.
- $T$: temperature.
Layout of the fluid field data matrix:
- Each column is usually a flattened snapshot, e.g., of length $n_x n_y d$, where $d$ is the number of features per grid point.
- The matrix is then of size $5 n_x n_y \times T$ (with $d = 5$ features), where $T$ is the number of time stamps at which data is collected.
- Usually in fluid dynamics $5 n_x n_y \gg T$, meaning that the matrix is very tall; a small sketch of this layout is given below.
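As a minimal sketch of this layout (the dimensions and variable names here are made up for illustration):

```python
import numpy as np

# hypothetical dimensions: T snapshots of an (n_x, n_y) grid with d features each
n_x, n_y, d, T = 64, 32, 5, 100
snapshots = np.random.rand(T, n_x, n_y, d)  # stand-in for simulation output

# flatten each snapshot into one column -> tall data matrix of shape (d*n_x*n_y, T)
X = snapshots.reshape(T, -1).T
print(X.shape)  # (10240, 100): rows (space x features) far outnumber columns (time)
```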
Some comments on machine learning for physics and scientific ML:
It is crucial to remind ourselves that the data contains physical insight that researchers hope to extract. Paraphrasing Brunton and Taira, 'appropriate methods should be chosen to learn from the data and the problem should be formulated in a suitable manner', rather than reaching for a black-box approach.
We can ask the right questions starting from the following observations/facts:
- Turbulent flows have multi-scale structure in both space and time, and smaller-scale behavior can aggregate into larger-scale behavior, especially over long time horizons.
- Frequency analysis (Fourier analysis) shows that signals have multiple peaks (modes), corresponding to vortical structures associated with those frequencies.
- Modeling turbulence as a stochastic system makes more physical sense than modeling it as a deterministic one; modeling the ensemble (average behavior) is more meaningful.
An overview of data-driven models:
- Dimensionality Reduction: PCA, POD, SVD, auto-encoders
- Modal analysis: DMD, Koopman Analysis, global linear stability analysis, resolvent analysis.
- Dynamical system models (usually in a latent space): DMD (linear embedding + linear dynamics), Galerkin projection, neural ODEs, latent ODEs/SDEs, auto-regressive models.
- Losses: $L^2$, $L^1$ (sparsity), norms of derivatives (physics-informed), and so on.
Determination of Project: Network Science for Fluid Dynamics
The instructor, Professor Taira, has an interest overlapping with mine: network science applied to fluid dynamics; his group has even hosted a series of workshops on the topic here. The first goal is to understand the motivation for using networks in fluid dynamics and the kind of data in use, and then the potential of neural tools such as Graph Neural Networks. Several existing works are candidates for expansion:
- Deep Learning for Turbulent Convection Networks: a U-Net is used to map the turbulent flow generated by Rayleigh-Bénard convection into a low-dimensional space, where a planar network whose nodes are superstructures is learned. This is an example of using deep-learning auto-encoders to compress high-dimensional fluid dynamics data and extract physical insights.
- Network-based Study of Lagrangian Transport and Mixing: networks are used to uncover coherent structures in a fluid; the network is called a Lagrangian transport network, and its nodes are Lagrangian trajectories. In this paper, a very simple approach is proposed: nodes are trajectories (of dimension $3\times T$, where $T$ is the number of time stamps and $3$ is for the 3D coordinates), and a static network is formed. Classical network analysis techniques (spectral clustering) are then used to find coherent structures; a minimal sketch of this pipeline appears after this list.
- GNN to accelerate Lagrangian Fluid Simulation
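A minimal sketch of the Lagrangian transport network pipeline described above, assuming synthetic trajectories and one simple choice (of many possible) of similarity kernel; all the names here are my own:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# hypothetical data: N Lagrangian trajectories, each T time stamps of 3D positions
N, T = 200, 50
trajs = np.random.rand(N, T, 3)  # stand-in for tracked particle positions

# build a trajectory similarity network: Gaussian kernel on pairwise distances
flat = trajs.reshape(N, -1)
dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
A = np.exp(-(dists / dists.mean()) ** 2)

# spectral clustering on the network; clusters are candidate coherent structures
labels = SpectralClustering(n_clusters=4, affinity="precomputed").fit_predict(A)
```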
Week 2
This week was more about thinking of a particular domain that both benefits from the study of fluid dynamics and suits the course project. In the end, I settled on climate modeling. The machine learning community currently cares a lot about prediction accuracy, especially for weather forecasting, but this provides little physical insight and hence falls short for climate modeling, which requires longer-term, stable rollouts and a certain degree of interpretability. The hope is to gain more physical insight from climate and weather datasets using network science, together with models that can predict over a longer rollout period.
In short, at this point, focus on the following:
- Climate modeling / weather forecasting using spatiotemporal datasets (ERA5)
- Achieving long-term rollouts (stable long time-series prediction)
- Gaining physical insights on sea temperature, perhaps regarding superstructures and vortices, using network-based methods
Points 1 and 2 suggest the use of graph networks or neural operators; point 3 suggests network-based methods. For a weather dataset such as ERA5, there exists a natural (latitude, longitude) grid on the earth's surface, so we naturally have a network. The kind of network we want to obtain through learning should consist of coarse-grained, stable structures evolving over time. Currently neural operators lack interpretability, but graph networks may be leveraged to obtain insights into the underlying network. For example, if a graph-neural-network-based model (such as a graph auto-encoder) is able to accurately predict long-term dynamics, then feature extraction from this network (based on network science) may provide physical insights.
There are two major dimensions to this modeling problem: spatial and temporal.
Spatial Dimension
The spatial dimension should be handled by a graph-based auto-encoder, or an auto-encoder structure in general. The input is the mesh grid for the earth (dimensions to be filled in later), and the encoder should use a graph-based approach to encode the features.
Temporal Dimension
The temporal dimension, per the above description, lives in the latent space. There we should learn a dynamics that evolves node features so as to generate accurate predictions; the specific requirement is that such predictions be stable over a long period.
For the purpose of stability, two recent works stand out:
- Spherical Fourier Neural Operator (SFNO) leverages the spherical symmetry of climate data and proposes to combine operator learning with symmetry on the sphere.
- PDE-Refiner uses noise injection and diffusion-like training methods to promote stable long-term simulation of PDEs.
Combining these two approaches may yield predictions that are stable over long time horizons.
Physical Insights
For physical insights, we can focus on heat diffusion in the ocean and at the poles.
In the end, we would like to uncover some physical insights, rather than just producing stable predictions (although this is already a great achievement). For this reason, it is preferable to pick a more interpretable model as encoder/decoder and operator in the latent space, or to have some good physics-based guidance in training.
Some curious phenomena I hope to uncover are the vortices governing heat diffusion on earth, especially the ocean currents, which also have the potential to provide some cool visualizations; an example of using the ERA5 dataset for ocean temperature prediction is this workshop paper. Given a good predictive model, we can use network science to perform cluster-based analysis (a classical analysis method) to predict the future and interpret the present via vortices in the ocean currents.
Another modeling task is the temperature at the poles (and visualizing the polar vortex) over a long-term rollout. For this purpose we may want to change the loss function so that prediction accuracy at the poles is prioritized over the rest, as the polar vortex is known to have a huge impact on global weather patterns and can be seen as a stable climate pattern over a long time.
Challenges
- ERA5: the dataset size (scale) and long-time rollouts could require massive GPU memory.
- May need to adjust to only consider data around the poles, or use a pole-specific dataset.
- Implementation of the two approaches.
- Interpretation via the ocean-current subset.
Some Lingering Resources to Go Over
- The video/lecture on ML in fluid dynamics and climate physics, with a quick summary
- Documentary on the polar vortex
- Heat transfer on Earth and the cold blob
Getting Familiar with Spherical Fourier Neural Operators (SFNO)
Basic definitions: SFNO relies on the spherical Fourier transform (called the Spherical Harmonics Transform, or SHT), defined with two input angles on the sphere following spherical coordinates ($\theta$ is the colatitude, $\lambda$ is the longitude). A function $f$ on the sphere has the following series expansion in terms of the associated Legendre polynomials $\bar{P}_n^m$ and coefficients $F_n^m$:
$$f(\theta, \lambda) = \sum_{m=-M}^M \exp (im\lambda) \sum_{n=|m|}^M F_n^m \bar{P}_n^m (\cos\theta)$$
The Fourier transform is then defined as a Fourier transform in longitude and a Legendre transform in latitude:
$$F^m(\theta) = \frac{1}{2\pi}\int_0^{2\pi} f(\theta,\lambda)\exp(-im\lambda) d\lambda$$
$$F_n^m = \frac{1}{2}\int_{-1}^1 F^m(\theta)\bar{P}_n^m(\cos\theta)d\cos\theta$$
The transform and its inverse have simple calling signatures in the torch-harmonics library; in the forward pass, the last two dimensions of the input tensor are assumed to be the spatial dimensions (colatitude, longitude):
```python
import torch
from torch_harmonics import RealSHT, InverseRealSHT

device = torch.device("cuda")

# data lives on a 512x1024 equiangular grid (colatitude, longitude)
data = torch.randn(512, 1024, device=device)
n_theta, n_lambda = data.shape

sht = RealSHT(n_theta, n_lambda, grid="equiangular").to(device)
isht = InverseRealSHT(n_theta, n_lambda, grid="equiangular").to(device)

coeffs = sht(data)       # complex spectral coefficients, shape = (512, 513)
data_rec = isht(coeffs)  # back to the spatial grid
```
For end-to-end training of a spherical Fourier neural operator, an example can be found here. The SFNO model itself has the following calling convention, as seen from the source code implementation:
```python
import torch
# SphericalFourierNeuralOperatorNet imported from the source implementation

model = SphericalFourierNeuralOperatorNet(
    img_shape=(128, 256),  # grid dimensions
    scale_factor=4,        # used for image downsampling
    in_chans=2,            # input channels
    out_chans=2,           # output channels
    embed_dim=16,          # embedding dimension (spectral)
    num_layers=4,          # number of SFNO layers
    use_mlp=True,
)

x = torch.randn(1, 2, 128, 256)  # (batch_size, in_chans, img_height, img_width)
output = model(x)                # output has the same shape, (1, 2, 128, 256)
```
A peek at how training is done: in the sample code, training uses PyTorch Lightning, but the basic setup is:
- a minibatch is of size $(B, C_{in}, H, W)$ for both the input $x$ and the target $y$
- operator learning usually uses the default $L^1$ loss; a minimal sketch of a training step follows
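A minimal sketch of a single training step under this setup (the `model` and `dataloader` here are placeholders, not the repository's actual names):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.L1Loss()  # the default L1 loss mentioned above

x, y = next(iter(dataloader))  # x: (B, C_in, H, W), y: (B, C_out, H, W)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```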
FNO for the ERA5 Dataset
Here: a simple analysis of the ERA5 dataset, data loading, and a baseline FNO model from the FourCastNet codebase. For the ERA5 dataset, I have a technical note on how this dataset is processed. After proper processing of the dataset, which gives minibatch pairs of size (batch_size, input_channel, width, height) and (batch_size, output_channel, width, height), we can perform standard operator learning.
Week 3
Lectures
In the first lecture this week, several applications of Convolutional Neural Networks (CNNs) to fluid dynamics were listed. First, it was mentioned that CNN padding can correspond to boundary conditions in fluid simulation data (circular padding for periodic boundary conditions, replicate padding for no-slip boundary conditions, and convective padding for outflow boundary conditions); a small illustration follows.
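A sketch of this padding/boundary-condition pairing in PyTorch (my own reading of the remark, not code from the lecture):

```python
import torch.nn as nn

# periodic boundary condition -> circular padding
conv_periodic = nn.Conv2d(3, 16, kernel_size=3, padding=1, padding_mode="circular")
# no-slip wall -> replicate padding
conv_noslip = nn.Conv2d(3, 16, kernel_size=3, padding=1, padding_mode="replicate")
# convective/outflow padding has no built-in mode and would need a custom pad layer
```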
Also, in CFD the kernel size of the CNN should be proportional to the resolution (this relates to the super-resolution topic covered later); in particular, it was mentioned that for fluid dynamics problems the kernel size should be an odd number.
Applications of CNN to fluid data:
- Auto-encoder based: this refers to reduced-order modeling with auto-encoders, mapping the data into a low-dimensional space and then using the lower-dimensional representation for predictive or other modeling tasks.
- Inpainting-based reconstruction: this refers to flawed/damaged inputs (such as flow field snapshots with missing parts). A question is how to evaluate the produced output (its physics, etc.).
- It was mentioned that a Reynolds number difference between train and test may not be a huge issue when using CNNs, because of locally self-similar structures.
- Off-wall state estimation.
- 3D reconstruction from limited 2D sectional data: this is related to the NeRF line of work in computer vision.
- Sparse sensor reconstruction: an inverse problem with a large difference in degrees of freedom between input and output.
- Temporal prediction: this refers to modeling and predicting the temporal evolution of dynamical systems. Assessment can be short-term or long-term; in the latter case it is often sensible to measure performance using long-term statistics rather than an $L^2$-based error.
- ML-based reduced-order surrogate modeling: latent temporal evolution models; time-averaged statistics (P-K-D attractor, dissipation); GNN-based models.
The second lecture focused on super-resolution, on which Professors Fukami and Taira have a recent survey paper. In fluid dynamics there are different perspectives on what exactly super-resolution means:
- sparse sensor to global field reconstruction
- removing noise from experimental data
- time super-resolution (finer-grained in-between prediction in time)
- turbulence modeling with multiscale physics
An interesting remark was made about the problem of extrapolation in fluid mechanics: a Reynolds number difference alone does not translate to extrapolation, because turbulent flows can exhibit scale invariance. This symmetry can be captured by downsample-then-upsample and multi-scale filtering techniques from computer vision. Also, at the time of writing, the fluid dynamics community has generally underexplored more recent ML models, such as vision transformers and diffusion models, which could also provide insights into uncertainty quantification.
SFNO for the ERA5 Dataset
Use the SFNO model on the ERA5 dataset and compare against the FNO model under the default training, validation, and testing setup.
- Done: the SFNO can be used off-the-shelf to predict on ERA5 (once training finishes, record its performance against FNO).
SFNO seems to be trained with several different types of losses, so I need to check those as well.
Week 4
Lectures
This week the main focus is modal analysis, which studies the dominant behavior of fluid flows; for high Reynolds number flows, often only the dominant modes are of interest. In short, unsteady fluid flows exhibit nonlinear, high-dimensional, multiscale behavior, and modal analysis techniques extract dominant modes, compress data, identify instabilities, and reveal input-output properties.
There are mainly two types of modal analysis methods:
- Data-driven methods, such as POD, DMD, Koopman analysis, etc.
- Operator-based methods, which linearize the Navier-Stokes operator; these usually arise in stability and resolvent analysis and use CFD.
POD: in the computer science and machine learning communities this is known as PCA, an algorithm that finds an optimal linear projection maximizing the variance of the data; equivalently, it finds a linear basis to project onto. For fluid mechanics, this usually involves a data matrix $X$ of size $(n, d)$ with $n \gg d$, so instead of dealing with $XX^T$ one usually starts with the much smaller $X^TX$.
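A minimal POD sketch via the SVD of a synthetic snapshot matrix (the dimensions are placeholders); equivalently, one can eigendecompose the small matrix $X^TX$:

```python
import numpy as np

# synthetic tall snapshot matrix: n spatial DOFs >> T snapshots
n, T = 5 * 64 * 32, 100
X = np.random.rand(n, T)
X = X - X.mean(axis=1, keepdims=True)  # subtract the mean field

# economy SVD: columns of Phi are POD modes, S**2 ranks their energy content
Phi, S, Vt = np.linalg.svd(X, full_matrices=False)
energy = S**2 / np.sum(S**2)
print(Phi.shape, energy[:5])  # (10240, 100), leading modes' energy fractions
```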
DMD (Dynamic Mode Decomposition): a data-driven way to approximate arbitrary dynamics using a linear transition kernel, that is:
$$ x(t_{k+1}) = f(x(t_k)) \approx A x(t_k) $$
where it is assumed that the autoregressive transition between snapshots is time-invariant and can be approximated linearly. Learning in this case corresponds to collecting pairs of snapshots and then solving an eigenvalue problem; a minimal sketch follows below. It is worth noting that DMD is a finite, discrete version of Koopman-operator-based analysis of dynamics, which says that any nonlinear dynamics can be represented by an infinite-dimensional linear operator.
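A minimal sketch of (exact) DMD on synthetic snapshot pairs, following the standard SVD-based algorithm; the rank $r$ and the data here are placeholders:

```python
import numpy as np

# snapshot pairs: columns of Xp are the columns of X advanced one step in time
X  = np.random.rand(1000, 99)   # x(t_1) ... x(t_99), stand-in data
Xp = np.random.rand(1000, 99)   # x(t_2) ... x(t_100)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
r = 10                                           # truncation rank
Ur, Sr_inv, Vr = U[:, :r], np.diag(1.0 / S[:r]), Vt[:r].T

A_tilde = Ur.conj().T @ Xp @ Vr @ Sr_inv         # reduced linear operator
eigvals, W = np.linalg.eig(A_tilde)              # DMD eigenvalues
modes = Xp @ Vr @ Sr_inv @ W                     # (exact) DMD modes
```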
The second part of the lecture was about resolvent analysis, which belongs to operator-based study of the Navier-Stokes operator. Consider a dynamical system of the form $\frac{\partial u}{\partial t} = F(u)$, where $F$ is nonlinear. A straightforward way to study this dynamics is to linearize it, using a first-order Taylor expansion about the decomposition below:
$$ u(\underline{x}, t) = \underbrace{\bar{u}(\underline{x})}_{\text{base state, fixed point}} + \underbrace{u'(\underline{x}, t)}_{\text{perturbation}} $$
then, since $\frac{\partial \bar{u}}{\partial t} = 0$, we have:
$$ \frac{\partial u'}{\partial t} = F(u) = F(\bar{u} + u') = F(\bar{u}) + \nabla_u F\mid_{\bar{u}} \cdot u' + \text{H.O.T.} $$
where H.O.T. denotes higher-order terms, and the term $\nabla_u F\mid_{\bar{u}}$ is usually denoted $\mathcal{L}$, the linear Jacobian operator. $F(\bar{u}) = 0$ if $\bar{u}$ is a fixed point; otherwise it can be something like a mean flow. Under these assumptions, the analysis of the dynamics reduces to studying the resulting linear dynamical system:
$$ \frac{\partial u'}{\partial t} = \mathcal{L} u', \quad \mathcal{L} = \nabla_u F \mid_{\bar{u}} $$
This is a first-order linear dynamical system, which can be analyzed using traditional dynamical systems methods. Note that in the context of fluid flow, $F$ is usually the Navier-Stokes operator $\mathcal{N}$.
The general solution of this system is $u' = \hat{u}(\underline{x})\exp(\lambda t)$, where $\lambda$ ranges over the eigenvalues of the problem $\mathcal{L}\hat{u} = \lambda \hat{u}$, with $\hat{u}$ the corresponding spatial eigenfunction (mode shape). The problem thus becomes one of spectral analysis, which can produce a great number of eigenmodes.
- Among these eigenmodes, however, there are usually many spurious modes due to discretization error and the initial/boundary conditions, and these require some correction against numerical simulation.
- After this examination, the remaining eigenmodes can aid stability analysis; the eigenvectors are called stability modes, and the eigenvalues give growth rates/frequencies.
Resolvent analysis studies the same linear dynamical system, but with a harmonic forcing term:
$$ \frac{\partial u'}{\partial t} = \mathcal{L}u' + f' $$
After a Fourier transform, this becomes:
$$ \hat{u} = \underbrace{(i\omega I - \mathcal{L})^{-1}}_{\text{Resolvent operator } H} \hat{f} $$
Usually SVD is then applied to $H$ to find the dominant modes; after the decomposition we have:
$$ H = \underbrace{Q}_{\text{Response modes}} \, \underbrace{\Sigma}_{\text{Amplification gains}} \, \underbrace{F^{\ast}}_{\text{Forcing modes}} $$
Some comments about this approach:
- It is closely related to pseudospectral analysis.
- It can usually be applied even to turbulent flows, with $\bar{u}$ being the mean flow and the flow entering a statistically stationary state (e.g., with stable Fourier statistics).
In the second scenario, $f$ can then be a sum of Fourier terms; a small numerical sketch of resolvent analysis is given below.
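A small numerical sketch, assuming a toy discretized linear operator $\mathcal{L}$ (random here, purely for illustration):

```python
import numpy as np

n = 50
L = np.random.randn(n, n) - 2 * np.eye(n)        # toy stable linear operator
omega = 1.0                                      # forcing frequency

H = np.linalg.inv(1j * omega * np.eye(n) - L)    # resolvent operator

Q, Sigma, Fh = np.linalg.svd(H)                  # response modes, gains, forcing modes
# Sigma[0] is the leading amplification gain;
# Q[:, 0] and Fh[0].conj() are the leading response/forcing mode pair
```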
After resolvent analysis and the discovery of dominant modes, the flow can be decomposed as:
$$ u(t,\underline{x}) = \rho_0 (\underline{x}) + \sum_{j=1}^r a_j(t) \phi_j(\underline{x}) $$
with potentially $r \ll n$. Taking the derivative with respect to time and projecting onto $\phi_k$, we have:
$$ \frac{du}{dt} = f(x) \Longrightarrow \sum_{j=1}^r \frac{da_j}{dt} \langle \phi_j, \phi_k \rangle = \langle f, \phi_k \rangle $$
This motivates the line of analysis called Galerkin projection methods, which project the flow onto a subspace spanned by the modes. The overall approach is known as Reduced Order Modeling (ROM), which projects the flow/function into a linear subspace spanned by dominant modes (analogous to POD); an example of reduced order modeling (from Stanford) was linked here.
SFNO: Training Tricks
To use the loss functions specific to the SFNO model (the spherical ones), a solver class is required. The GitHub repository contains a solver class for the shallow water equations but does not provide example code for the ERA5 dataset. They do claim steady, decent results on ~2-week weather forecasts and ~1-year rollouts on a 0.25-degree lat-lon grid (i.e., a grid of dimension $721 \times 1440$). In their setup, some tricks are used to make it work better:
- Two-stage training: an initial training stage with a single autoregressive step and a second fine-tuning stage with two autoregressive steps (need to check the FourCastNet code); a sketch of the idea is given after this list.
- In their Appendix C, they report their choices of hyperparameters.
- They still use L1 and L2 losses (rather than the spherical losses).
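My paraphrase of the two-stage idea as a PyTorch-level sketch (not the actual FourCastNet implementation; the names are placeholders):

```python
import torch

loss_fn = torch.nn.L1Loss()

def rollout_loss(model, x, targets, n_steps):
    """Unroll the model autoregressively and average the per-step loss."""
    loss, state = 0.0, x
    for k in range(n_steps):
        state = model(state)
        loss = loss + loss_fn(state, targets[k])
    return loss / n_steps

# stage 1: train with a single autoregressive step (n_steps=1)
# stage 2: fine-tune with two autoregressive steps (n_steps=2)
```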
Their claim:
> While both FFT- and SHT-based approaches achieve similar accuracy for medium-range forecasts, the advantage of respecting the underlying spherical geometry becomes evident at longer rollouts.
I have verified the medium-range result, but have not yet done so for long-term rollouts.
Study of FourCastNet Code base
The aim of this task is to examine the training process more closely.
- ERA5 with more features: add more features and observe the performance behavior.
- ERA5 with higher resolution: use the 0.25-degree data (rather than the 5.625-degree data) and check the performance.
- Spherical loss: can I use the solver for the shallow water equations as a spherical loss for the ERA5 dataset? (The code runs, at least, but different combinations still need to be tried.)
How to Analyze the Spectrum of Spherical Harmonics
To understand how this is done, I need the following two pieces:
- How is frequency-component-based analysis performed (say, in FNO and in PDE-Refiner)?
- How does the above analysis translate to the study of spherical harmonics?
These questions are explored in various posts on the technical blog; a minimal sketch of a per-degree spectrum computation is given below.
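As a starting point, a minimal sketch of a per-degree spectral energy computation with torch-harmonics (the summation and normalization conventions here are my own simplification):

```python
import torch
from torch_harmonics import RealSHT

data = torch.randn(512, 1024)   # field on an equiangular grid
sht = RealSHT(512, 1024, grid="equiangular")
coeffs = sht(data)              # complex coefficients, indexed by (degree l, order m)

# energy spectrum: sum |F_l^m|^2 over orders m for each degree l
energy_per_degree = (coeffs.abs() ** 2).sum(dim=-1)
```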
Week 5-6
During these weeks I was able to generate spectrum-based visualizations of prediction and target snapshots, as can be seen in this post, which explains how to perform visualizations of the 2D FFT and 2D SHT using spectral energy plots.
The lecture content of these two weeks focused on dynamical system modeling. Some new things I learned:
- If a nonlinear system has an equilibrium state, we are allowed to perform linearization about that equilibrium.
The beauty of taking a project-focused class is that, if you are overwhelmed by your other duties (TAing, publishing papers, etc.), you can take it easy on this class.