Optimal Control III: what is the Hamilton-Jacobi-Bellman equation?

Consider once again an optimal control problem (OCP) in continuous time: $$ \begin{equation} \begin{alignedat}{2} &\min_{x, u}{}&& \int_0^T \ell(x(t), u(t)) dt + h(x(T)) \\ &\suchthat&& \dot{x} = f(x, u), \quad t \in [0, T) \end{alignedat} \end{equation} $$

The Hamilton-Jacobi-Bellman (HJB) equation takes the form of a partial differential equation: $$ \begin{equation} \frac{\partial V}{\partial t} + \min_{u\in\mathcal{U}} { \ell(x, u) + \langle \nabla_x V(x, t), f(x, u)\rangle } = 0 \end{equation} $$ subject to the terminal condition $V(x, T) = h(x)$ for all $x \in \mathcal{X}$.

A solution of the HJB equation easily provides the optimal feedback control policy as the optimum of the inner minimization: $$ \begin{equation} u_t^\star(x) = \argmin_{u} { \ell(x, u) + \langle \nabla_x V(x, t), f(x, u) \rangle }. \end{equation} $$

At first hand, the HJB equation looks incredibly powerful. If one can easily find an expression for $u^\star$ as a function of the value gradient $\nabla_xV$, then we can formulate the PDE and solve it. Then, we could recover a global optimal policy which will work for a whole range of initial conditions $x(0)$.

The inner function in the minimization is the continuous-time $ Q $-function $ Q(x, u) $. Notice its resemblance to the Hamiltonian in the Pontryaguine minimum principle. Actually, we have \[ Q(x, u) = \mathcal{H}(x, u, p=\nabla_x V(x, t)) \]

However, the resulting PDE is nonlinear, and nonlinear PDEs are a very intensive field of research. The existence of solutions in itself is a question of importance, though Lions has introduced the notion of viscosity solutions which are shown to exist under mild assumptions. Numerical solutions can be investigated, for instance Galerkin or finite-difference methods (converging to the above viscosity solutions). These are only practial for low-dimensional systems (typically $d \leq 3$), if one can be satisfied with a grid that is not too large: the curse of dimensionality comes into play. For higher-dimensional systems (one need only look at something as simple as a quadrotor), there is no really practical method.

The LQR case

Let's remind ourselves of the continuous-time LQR model: $$ \begin{equation} f(x, u) = Ax + Bu, \quad \ell(x, u) = \frac12 x^\top Qx + \frac12 u^\top Ru, \quad h(x) = \frac12 x^\top Q_f x \end{equation} $$ with a control space $\calU = \RR^{n_u}$. Then, the minimizer and PDE are given by $$ \begin{equation} u = -R^{-1}B^\top V_x,\quad \frac{\partial V}{\partial t} + \frac12 x^\top Qx + V_x^\top A x - \frac12 V_x^\top BR^{-1}B^\top V_x \end{equation} $$ Injecting $ V(x,t) = \frac12 x^\top P(t)x $ leads to a matrix differential equation after removing $ x $: $$ \begin{equation} \textcolor{RoyalBlue}{ \dot{P}(t) + Q + P(t)^\top A + A^\top P(t) - P(t)^\top BR^{-1}B^\top P(t)}, \quad P(T) = Q_f. \end{equation} $$ this equation is called the matrix continuous Riccati equation.