Last time, we took an introductory look at optimal transport (OT): the definition and computation of the Wasserstein distance.

A few years ago, as part of a project, I wrote a solver for this variational formulation of mean-field games. It eventually grew into a small toy C++ library, which may or may not still be usable.

Mean-Field Games

The dynamics are defined by \begin{equation} dX_t = \alpha_t dt + \sigma dW_t \end{equation} where \( (\alpha_t)_t \) is the control and \( W \) is a Brownian motion. Typically, \( W \) is an \( \RR^d \)-valued Brownian motion with independent components.
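Just to fix ideas, here is a minimal Euler-Maruyama sketch of these dynamics (Python with NumPy; the feedback control and every parameter below are arbitrary placeholders, not part of the model):

```python
import numpy as np

def simulate(alpha, x0, T=1.0, sigma=0.5, n_steps=200, n_paths=5, seed=0):
    """Euler-Maruyama sample paths of dX_t = alpha(t, X_t) dt + sigma dW_t."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.empty((n_paths, n_steps + 1))
    X[:, 0] = x0
    for k in range(n_steps):
        drift = alpha(k * dt, X[:, k])
        noise = sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        X[:, k + 1] = X[:, k] + drift * dt + noise
    return X

# hypothetical feedback control, just to draw some sample paths
paths = simulate(alpha=lambda t, x: -x, x0=1.0)
```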

The control problem reads \begin{equation} \inf_{(\alpha_t)_t} J(\alpha) = \EE\left[ \int_0^T \frac{1}{2}|\alpha_t|^2 + f(X_t, \rho_t)\, dt + g(X_T, \rho_T) \right] \end{equation}

where \( \rho_t = \mathcal{L}(X_t) \) is the law of the random variable \( X_t \). The optimality conditions for this stochastic control problem were established by Lasry and Lions: they are known as the mean-field game equations. They form a system of coupled Fokker-Planck and Hamilton-Jacobi-Bellman equations: \begin{align} \partial_t \rho &= \frac{\sigma^2}{2}\Delta \rho - \mathrm{div}(\rho\nabla u) \\ -\partial_t u - \frac{\sigma^2}{2}\Delta u + \frac{1}{2}|\nabla u|^2 &= f(x, \rho) \\ u(T, x) &= g(x, \rho_T) \end{align} These equations are nonlinear because of the control cost term \( \frac{1}{2}|\nabla u|^2 \) in the second (HJB) equation. The first PDE is a linear Fokker-Planck equation. The optimal control is given in feedback form as \( \alpha^*_t = \nabla_x u(t, X_t) \). Numerical methods for these equations can involve a fair bit of finite-difference discretisation, Krylov subspace methods...
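As a taste of that PDE route, here is a rough sketch of a single implicit Euler step for the one-dimensional Fokker-Planck equation above, on a periodic grid with central differences and a direct sparse solve standing in for a Krylov method (Python with NumPy/SciPy; the drift field and all parameters are illustrative assumptions):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def fokker_planck_step(rho, v, dt, dx, sigma):
    """One implicit Euler step for d_t rho = (sigma^2/2) rho_xx - (rho v)_x
    on a periodic 1D grid, central differences (no upwinding, for brevity)."""
    n = rho.size
    # periodic discrete Laplacian and centred first-derivative matrices
    lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)).tolil()
    lap[0, n - 1] = lap[n - 1, 0] = 1.0
    d1 = sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n)).tolil()
    d1[0, n - 1] = -1.0
    d1[n - 1, 0] = 1.0
    lap = lap.tocsc() / dx**2
    d1 = d1.tocsc() / (2.0 * dx)
    # generator acting on densities: A rho = (sigma^2/2) rho_xx - (rho v)_x
    A = 0.5 * sigma**2 * lap - d1 @ sp.diags(v)
    return spla.spsolve(sp.identity(n, format="csc") - dt * A, rho)

# illustrative setup: a narrow Gaussian density pushed by the fixed drift v(x) = -x
x = np.linspace(-2.0, 2.0, 200, endpoint=False)
dx = x[1] - x[0]
rho = np.exp(-x**2 / 0.1)
rho /= rho.sum() * dx
for _ in range(50):
    rho = fokker_planck_step(rho, v=-x, dt=1e-3, dx=dx, sigma=0.5)
```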

Here, we lay out a solution method for this particular class of mean-field games (some might say the control Hamiltonian has a rather specific form...) using optimal transport theory.

A variational approach

Define \( F(\rho) = \int f(x, \rho)d\rho(x) \) and similarly \( G(\rho) = \int g(x, \rho)d\rho(x) \). The variational approach treats the path of probability distributions \( (\rho_t)_t \) as a problem variable, and directly optimizes over the feedback control function \( v(t,x) = \nabla_x u(t,x) \), which we recall is the drift coefficient of the underlying SDE. The formulation reads $$ \begin{equation} \begin{split} &\inf_{\rho,v} \int_0^T \left[ \int \frac{1}{2}|v(t,x)|^2 \, d\rho_t(x) + F(\rho_t) \right] dt + G(\rho_T) \\ &\suchthat \partial_t\rho = \frac{\sigma^2}{2}\Delta\rho - \mathrm{div}(\rho v) \end{split} \end{equation} $$ The optimality conditions of this problem should yield the HJB equation from before. There are several ways of solving it directly by casting it as a nonlinear program: the transcription step discretises the continuous space (using a grid, triangulated mesh, PL manifold...) and discretises the integral and the constraint in time. The resulting finite-dimensional program is a traditional NLP, but with the structure of a discrete-time optimal control problem (DTOCP). Benamou and Carlier introduced an augmented Lagrangian method for this problem in [1].
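To make the transcription step concrete, here is a sketch of assembling the objective and the dynamics residual of the finite-dimensional program (Python with NumPy; the periodic 1D grid, the explicit Euler/central-difference discretisation of the constraint, and all names are assumptions of mine, not the scheme of [1]). The two callables could then be handed to a generic NLP solver, e.g. scipy.optimize.minimize with an equality constraint.

```python
import numpy as np

def transcribe(rho0, n_steps, dt, dx, sigma, F, G):
    """Assemble the objective and dynamics residual of the transcribed problem.
    Decision variables: densities rho[0..n_steps] and drifts v[0..n_steps-1] on a
    periodic 1D grid, stacked into a single flat vector z."""
    n = rho0.size

    def unpack(z):
        rho = z[: (n_steps + 1) * n].reshape(n_steps + 1, n)
        v = z[(n_steps + 1) * n:].reshape(n_steps, n)
        return rho, v

    def objective(z):
        rho, v = unpack(z)
        kinetic = 0.5 * np.sum(v**2 * rho[:-1]) * dx * dt  # int int 1/2 |v|^2 drho_t dt
        running = sum(F(rho[i]) for i in range(n_steps)) * dt
        return kinetic + running + G(rho[-1])

    def dynamics_residual(z):
        """Explicit Euler / central-difference residual of
        d_t rho = (sigma^2/2) rho_xx - (rho v)_x, plus the initial condition."""
        rho, v = unpack(z)
        res = [rho[0] - rho0]
        for i in range(n_steps):
            flux = rho[i] * v[i]
            diff = (np.roll(rho[i], -1) - 2 * rho[i] + np.roll(rho[i], 1)) / dx**2
            adv = (np.roll(flux, -1) - np.roll(flux, 1)) / (2 * dx)
            res.append(rho[i + 1] - rho[i] - dt * (0.5 * sigma**2 * diff - adv))
        return np.concatenate(res)

    return objective, dynamics_residual
```

Each block of the residual couples only consecutive time steps, which is exactly the DTOCP structure mentioned above and what a structure-exploiting solver would take advantage of.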

[2] introduces a Lagrangian variational formulation of the mean-field game, which reads \begin{equation} \begin{split} &\inf_{Q, (\mu_t)_t} \mathcal{K}(Q) + \int_0^T F(\mu_t)dt + G(\mu_T) \\ &\suchthat P_tQ = \mu_t \quad \forall t \in [0, T] \end{split} \end{equation} where the unknown \( Q \) is a measure over the space of trajectories \( \mathbb{X} \), and \( P_t \) denotes the marginal operator at time \( t \), which pushes \( Q \) forward through the evaluation map \( \delta_t: \omega \longmapsto \omega(t) \).1 The important part is the kinetic energy term \[ \mathcal{K}(Q) = \int_{\mathbb{X}} \int_{[0,T]} \frac{1}{2} |\dot{\omega}(t)|^2\,dt\,dQ(\omega). \]
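For an empirical measure over piecewise-linear paths sampled on a uniform time grid, the kinetic energy has a simple closed form: a sum of squared increments divided by twice the step size. A small sketch (Python with NumPy; the straight-line example paths are an arbitrary choice):

```python
import numpy as np

def kinetic_energy(paths, dt):
    """K(Q) for the empirical path measure Q = (1/M) sum_m delta_{omega_m}.
    `paths` has shape (M, N+1): each row samples a path on a uniform time grid
    and is read as piecewise linear, so |omega_dot|^2 is constant on each step:
    K(Q) = (1/M) sum_m sum_i |omega_m(t_{i+1}) - omega_m(t_i)|^2 / (2 dt)."""
    increments = np.diff(paths, axis=1)
    return np.mean(np.sum(increments**2, axis=1)) / (2.0 * dt)

# illustrative check: straight-line paths from x0 to x1 on [0, 1] have
# kinetic energy (1/M) sum_m |x1_m - x0_m|^2 / 2
rng = np.random.default_rng(0)
M, N = 1000, 50
x0, x1 = rng.standard_normal(M), rng.standard_normal(M)
t = np.linspace(0.0, 1.0, N + 1)
paths = x0[:, None] + (x1 - x0)[:, None] * t[None, :]
print(kinetic_energy(paths, dt=1.0 / N), np.mean((x1 - x0) ** 2) / 2)
```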

Now, you may ask where the dynamics went. They are not far: they are baked into the optimization objective over path distributions \( Q \), specifically into the kinetic energy term.

The result is that the above problem is an optimal transport problem! Granted, it has a continuum of marginals, but we can still discretize it. We consider $ N + 1 $ time points $(t_0, \ldots, t_N)$ (for convenience, with fixed step size $\Delta t$) and the corresponding marginals $ (\mu_0, \ldots, \mu_N) $.

Then, the discrete-time counterpart is a multimarginal optimal transport problem $$ \begin{equation} \begin{split} &\inf_{Q, (\mu_i)} \mathcal{K}(Q) + \sum_{i=0}^N F(\mu_i) + G(\mu_N) \\ &\suchthat P_{t_i}Q = \mu_i, \quad i = 0, \ldots, N. \end{split} \end{equation} $$
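As a sanity check on what this discrete problem looks like, here is a sketch that evaluates the objective for a given coupling \( Q \) stored as a dense tensor over \( N+1 \) copies of a common 1D grid (Python with NumPy; the dense storage, the grid, and the placeholder couplings are illustrative assumptions):

```python
import numpy as np

def mmot_objective(Q, grid, dt, F, G):
    """Objective of the discretised multimarginal OT problem for a dense coupling
    tensor Q of shape (n,) * (N+1) over a common 1D grid: kinetic cost of
    piecewise-linear paths through grid points, plus sum_i F(mu_i) + G(mu_N),
    where mu_i = P_{t_i} Q is obtained by summing out all other time slots."""
    N1 = Q.ndim  # N + 1 marginals
    axes = range(N1)
    marginals = [Q.sum(axis=tuple(j for j in axes if j != i)) for i in axes]
    # kinetic cost |x - y|^2 / (2 dt) between consecutive time slots, weighted by
    # the pairwise marginal of Q at times (t_i, t_{i+1})
    cost = (grid[:, None] - grid[None, :]) ** 2 / (2.0 * dt)
    kinetic = 0.0
    for i in range(N1 - 1):
        pair = Q.sum(axis=tuple(j for j in axes if j not in (i, i + 1)))
        kinetic += np.sum(pair * cost)
    return kinetic + sum(F(mu) for mu in marginals) + G(marginals[-1])

# tiny illustrative instance: N+1 = 3 times, n = 5 grid points, product coupling
n = 5
grid = np.linspace(-1.0, 1.0, n)
mu = np.full(n, 1.0 / n)
Q = np.einsum("i,j,k->ijk", mu, mu, mu)   # independent coupling, uniform marginals
F = lambda m: np.sum(m * grid**2)         # placeholder running coupling
G = lambda m: 0.0                         # placeholder terminal coupling
print(mmot_objective(Q, grid, dt=0.5, F=F, G=G))
```

The dense tensor has \( n^{N+1} \) entries, so this is only a check of the formula, not a viable solver.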


1. Meaning \( P_t \) takes a measure \( Q \) over the space of paths and maps it to the distribution of \( \omega(t) \) given \( \omega \sim Q \).