Consider once again an optimal control problem (OCP) in continuous time: minx,u0T(x(t),u(t))dt+h(x(T))s.t. x˙=f(x,u),t[0,T) \begin{equation} \begin{alignedat}{2} &\min_{x, u}{}&& \int_0^T \ell(x(t), u(t)) dt + h(x(T)) \\ &\suchthat&& \dot{x} = f(x, u), \quad t \in [0, T) \end{alignedat} \end{equation}

The Hamilton-Jacobi-Bellman (HJB) equation takes the form of a partial differential equation: Vt+minuU(x,u)+xV(x,t),f(x,u)=0 \begin{equation} \frac{\partial V}{\partial t} + \min_{u\in\mathcal{U}} { \ell(x, u) + \langle \nabla_x V(x, t), f(x, u)\rangle } = 0 \end{equation} subject to the terminal condition V(x,T)=h(x)V(x, T) = h(x) for all xXx \in \mathcal{X}.

A solution of the HJB equation easily provides the optimal feedback control policy as the optimum of the inner minimization: ut(x)=argminu(x,u)+xV(x,t),f(x,u). \begin{equation} u_t^\star(x) = \argmin_{u} { \ell(x, u) + \langle \nabla_x V(x, t), f(x, u) \rangle }. \end{equation}

At first hand, the HJB equation looks incredibly powerful. If one can easily find an expression for uu^\star as a function of the value gradient xV\nabla_xV, then we can formulate the PDE and solve it. Then, we could recover a global optimal policy which will work for a whole range of initial conditions x(0)x(0).

The inner function in the minimization is the continuous-time Q Q -function Q(x,u) Q(x, u) . Notice its resemblance to the Hamiltonian in the Pontryaguine minimum principle. Actually, we have Q(x,u)=H(x,u,p=xV(x,t)) Q(x, u) = \mathcal{H}(x, u, p=\nabla_x V(x, t))

However, the resulting PDE is nonlinear, and nonlinear PDEs are a very intensive field of research. The existence of solutions in itself is a question of importance, though Lions has introduced the notion of viscosity solutions which are shown to exist under mild assumptions. Numerical solutions can be investigated, for instance Galerkin or finite-difference methods (converging to the above viscosity solutions). These are only practial for low-dimensional systems (typically d3d \leq 3), if one can be satisfied with a grid that is not too large: the curse of dimensionality comes into play. For higher-dimensional systems (one need only look at something as simple as a quadrotor), there is no really practical method.

The LQR case

Let's remind ourselves of the continuous-time LQR model: f(x,u)=Ax+Bu,(x,u)=12xQx+12uRu,h(x)=12xQfx \begin{equation} f(x, u) = Ax + Bu, \quad \ell(x, u) = \frac12 x^\top Qx + \frac12 u^\top Ru, \quad h(x) = \frac12 x^\top Q_f x \end{equation} with a control space U=Rnu\calU = \RR^{n_u}. Then, the minimizer and PDE are given by u=R1BVx,Vt+12xQx+VxAx12VxBR1BVx \begin{equation} u = -R^{-1}B^\top V_x,\quad \frac{\partial V}{\partial t} + \frac12 x^\top Qx + V_x^\top A x - \frac12 V_x^\top BR^{-1}B^\top V_x \end{equation} Injecting V(x,t)=12xP(t)x V(x,t) = \frac12 x^\top P(t)x leads to a matrix differential equation after removing x x : P˙(t)+Q+P(t)A+AP(t)P(t)BR1BP(t),P(T)=Qf. \begin{equation} \textcolor{RoyalBlue}{ \dot{P}(t) + Q + P(t)^\top A + A^\top P(t) - P(t)^\top BR^{-1}B^\top P(t)}, \quad P(T) = Q_f. \end{equation} this equation is called the matrix continuous Riccati equation.