Last time, we had an introductory look at optimal transport (OT): the definition and computation of the Wasserstein distance.

A few years ago, as part of a project, I wrote a solver for the variational formulation of mean-field games presented below. This eventually led to a small C++ toy library, which might or might not be usable.

Mean-Field Games

The dynamics are defined by
\begin{equation}
dX_t = \alpha_t\,dt + \sigma\,dW_t
\end{equation}
where $(\alpha_t)_t$ is the control and $W$ is a Brownian motion; typically, $W$ is an $\RR^d$-valued Brownian motion with independent components.

The agent solves the stochastic control problem
\begin{equation}
\inf_{(\alpha_t)_t} J(\alpha) = \EE\left[ \int_0^T \frac{1}{2}|\alpha_t|^2 + f(X_t, \rho_t)\,dt + g(X_T, \rho_T) \right]
\end{equation}
where $\rho_t = \mathcal{L}(X_t)$ is the marginal distribution of the random variable $X_t$.
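Before turning to the PDE system, here is a minimal sketch of what this objective means computationally: a Monte-Carlo / Euler-Maruyama estimate of $J(\alpha)$ for a 1-D particle system, where the empirical particle cloud stands in for $\rho_t$. The function name, the vectorised callables `alpha(t, x)`, `f(x, cloud)`, `g(x, cloud)` and all parameters are hypothetical choices of mine, not anything from the library mentioned above.

```python
import numpy as np

def estimate_cost(alpha, f, g, x0, sigma, T, n_steps, n_paths, seed=0):
    """Monte-Carlo / Euler-Maruyama estimate of J(alpha) for 1-D particles,
    using the empirical particle cloud as a stand-in for the law rho_t."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))          # all particles start at x0
    cost = np.zeros(n_paths)
    for k in range(n_steps):
        a = alpha(k * dt, x)                 # feedback control, vectorised over particles
        cloud = x                            # empirical approximation of rho_t
        cost += (0.5 * a**2 + f(x, cloud)) * dt
        x = x + a * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return np.mean(cost + g(x, x))           # terminal cost; the cloud approximates rho_T
```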

The optimality conditions for this stochastic control problem were established by Lasry and Lions: they are known as the mean-field game equations, a system of coupled Fokker-Planck and Hamilton-Jacobi-Bellman equations:
\begin{align}
\partial_t \rho &= \frac{\sigma^2}{2}\Delta \rho - \mathrm{div}(\rho\nabla u) \\
-\partial_t u - \frac{\sigma^2}{2}\Delta u + \frac{1}{2}|\nabla u|^2 &= f(x, \rho) \\
u(T,x) &= g(x, \rho_T)
\end{align}
These equations are nonlinear due to the control cost term $|\nabla u|^2$ in the second, Hamilton-Jacobi-Bellman PDE; the first PDE is a linear Fokker-Planck equation. The optimal control is given in feedback form as $\alpha^*_t = \nabla_x u(t, X_t)$. Numerical methods for these equations can involve a fair bit of finite-difference discretisation, Krylov subspace methods...
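As an illustration of the finite-difference angle, here is a minimal sketch of one explicit time step for the Fokker-Planck equation above, assuming a 1-D periodic grid and centred differences. Real solvers tend to use implicit or upwinded schemes (hence the Krylov subspace methods), so treat this as a toy discretisation subject to a CFL-type restriction on `dt`.

```python
import numpy as np

def fokker_planck_step(rho, u, dx, dt, sigma):
    """One explicit finite-difference step of the Fokker-Planck equation
        d_t rho = (sigma^2 / 2) * Laplacian(rho) - div(rho * grad(u))
    on a 1-D periodic grid, using centred differences."""
    drift = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)           # grad(u)
    flux = rho * drift                                            # rho * grad(u)
    div_flux = (np.roll(flux, -1) - np.roll(flux, 1)) / (2 * dx)  # div(rho * grad(u))
    lap = (np.roll(rho, -1) - 2 * rho + np.roll(rho, 1)) / dx**2  # 3-point Laplacian
    return rho + dt * (0.5 * sigma**2 * lap - div_flux)
```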

Here, we lay out a solution method for this particular case of MFG (some might say the control Hamiltonian has a rather specific form...) using optimal transport theory.

A variational approach

Define $F(\rho) = \int f(x, \rho)\,d\rho(x)$ and similarly $G(\rho) = \int g(x, \rho)\,d\rho(x)$. The variational approach treats the path of probability distributions $(\rho_t)_t$ as a problem variable, and optimises directly over the feedback control $v(t,x) = \nabla_x u(t,x)$, which we recall is the drift coefficient of the underlying SDE. That formulation reads
\begin{equation}
\begin{split}
&\inf_{\rho,v} \int_0^T \left[ \int \frac{1}{2}|v(t,x)|_2^2\,d\rho_t(x) + F(\rho_t) \right] dt + G(\rho_T) \\
&\suchthat \partial_t\rho = \frac{\sigma^2}{2}\Delta\rho - \mathrm{div}(\rho v)
\end{split}
\end{equation}
The optimality conditions of this problem should recover the HJB equation from before. There are several ways of solving this problem directly by casting it as a nonlinear program: the transcription step discretises the continuous space (using a grid, a triangulated mesh, a PL manifold...), and discretises the integral and constraint in time. The resulting finite-dimensional program is a traditional NLP, but with the structure of a discrete-time optimal control problem (DTOCP). Benamou and Carlier introduced an augmented Lagrangian method for this problem in [1].
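To give a feel for the transcription step, here is a sketch of the discretised objective on a 1-D grid, written in the momentum variable $m = \rho v$; this change of variables makes the kinetic term jointly convex in $(\rho, m)$, which is the key to convex formulations like the one used in [1]. The discrete continuity constraint is not shown (it would enter the NLP as equality constraints), and the function name and the callables `F`, `G` are hypothetical.

```python
import numpy as np

def discretised_objective(rho, m, dx, dt, F, G):
    """Crude transcription of the variational objective
        int_0^T [ int 0.5 |v|^2 drho_t + F(rho_t) ] dt + G(rho_T)
    on a 1-D grid, in the momentum variable m = rho * v.
    rho, m: arrays of shape (N + 1, nx); F, G: callables on grid densities."""
    eps = 1e-12                                             # avoid division by zero where rho = 0
    kinetic = 0.5 * np.sum(m**2 / (rho + eps)) * dx * dt    # int int 0.5 * m^2 / rho dx dt
    running = sum(F(rho[i]) for i in range(rho.shape[0])) * dt  # crude Riemann sum in time
    return kinetic + running + G(rho[-1])
```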

[2] introduces a Lagrangian variational formulation of the mean-field game, which reads
\begin{equation}
\begin{split}
&\inf_{Q, (\mu_t)_t} \mathcal{K}(Q) + \int_0^T F(\mu_t)\,dt + G(\mu_T) \\
&\suchthat P_tQ = \mu_t
\end{split}
\end{equation}
where the unknown $Q$ is a measure over the space of trajectories $\mathbb{X}$, and $P_t$ denotes the marginal operator at time $t$, the pushforward of $Q$ under the evaluation map $\delta_t: \omega \longmapsto \omega(t)$.¹ The important part is the kinetic energy term
\begin{equation}
\mathcal{K}(Q) = \int_{\mathbb{X}} \int_{[0,T]} \frac{1}{2} |\dot{\omega}(t)|^2\,dt\,dQ(\omega).
\end{equation}
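If $Q$ is approximated by an empirical measure over sampled, time-discretised trajectories, the kinetic energy has a direct estimator. Here is a small sketch under that assumption (the array layout and function name are mine):

```python
import numpy as np

def kinetic_energy(paths, dt):
    """Estimate K(Q) = E_Q [ int_0^T 0.5 * |w'(t)|^2 dt ] for an empirical
    path measure: paths has shape (n_paths, N + 1, d), and finite
    differences of the sampled paths stand in for the time derivative."""
    vel = np.diff(paths, axis=1) / dt                   # velocities, shape (n_paths, N, d)
    per_path = 0.5 * np.sum(vel**2, axis=(1, 2)) * dt   # int 0.5 * |w'|^2 dt for each path
    return per_path.mean()                              # average over the path measure
```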

Now, you may ask where the dynamics went. It's subtle, but they're not far: they are baked into the optimisation objective over path distributions $Q$.

The result is that the above problem is an optimal transport problem! Granted, it has an uncountably infinite number of marginals, but we can still discretise it. We consider $N + 1$ time steps $(t_0, \ldots, t_N)$ (for convenience, with fixed step size $\Delta t$), and corresponding marginals $(\mu_0, \ldots, \mu_N)$.

Then, the discrete-time counterpart is a multimarginal optimal transport problem
\begin{equation}
\inf_{Q, (\mu_i)} \mathcal{K}(Q) + \sum_{i=0}^N F(\mu_i) + G(\mu_N) \quad \suchthat P_{t_i}Q = \mu_i.
\end{equation}
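For a piecewise-linear path through grid points $(x_{i_0}, \ldots, x_{i_N})$, the kinetic energy is $\sum_{k} |x_{i_{k+1}} - x_{i_k}|^2 / (2\Delta t)$, so the discrete problem has a chain-structured multimarginal cost. Here is a toy sketch that builds this cost tensor explicitly on a tiny 1-D grid; in practice the tensor is never formed, and one would instead exploit the chain structure (e.g. with entropic regularisation and Sinkhorn-type iterations).

```python
import itertools
import numpy as np

def multimarginal_cost_tensor(grid, N, dt):
    """Cost tensor c[i_0, ..., i_N] = sum_k |x_{i_{k+1}} - x_{i_k}|^2 / (2 dt)
    on a 1-D grid: the discrete counterpart of the kinetic energy K(Q) for
    piecewise-linear paths through the grid points. Only viable for tiny
    problems, since the tensor has len(grid) ** (N + 1) entries."""
    grid = np.asarray(grid, dtype=float)
    n = len(grid)
    cost = np.zeros((n,) * (N + 1))
    for idx in itertools.product(range(n), repeat=N + 1):
        xs = grid[list(idx)]                              # the path's grid points
        cost[idx] = np.sum(np.diff(xs) ** 2) / (2 * dt)   # summed squared displacements
    return cost
```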


¹ Meaning $P_t$ takes a measure $Q$ over the space of paths and maps it to the marginal distribution of $\omega(t)$ for $\omega \sim Q$.