Optimal Control III: what is the Hamilton-Jacobi-Bellman equation?
Consider once again an optimal control problem (OCP) in continuous time:
$$
\min_{u(\cdot)} \int_0^T \ell\big(x(t), u(t)\big)\,\mathrm{d}t + \ell_T\big(x(T)\big)
\quad \text{subject to} \quad \dot{x}(t) = f\big(x(t), u(t)\big), \quad x(0) = x_0,
$$
with state $x(t) \in \mathbb{R}^{n_x}$ and control $u(t) \in U$.
The Hamilton-Jacobi-Bellman (HJB) equation takes the form of a partial differential equation:
$$
-\frac{\partial V}{\partial t}(t, x) = \min_{u \in U} \left[ \ell(x, u) + \frac{\partial V}{\partial x}(t, x)^{\!\top} f(x, u) \right],
$$
where $V(t, x)$ is the value function (the optimal cost-to-go from state $x$ at time $t$), subject to the terminal condition $V(T, x) = \ell_T(x)$ for all $x$.
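For completeness, here is the standard way this PDE is obtained (a sketch, not spelled out in the text above): apply Bellman's principle of optimality over a short interval $[t, t + \delta]$,
$$
V(t, x) = \min_{u \in U} \Big[ \ell(x, u)\,\delta + V\big(t + \delta,\; x + f(x, u)\,\delta\big) \Big] + o(\delta),
$$
then expand $V$ to first order in $\delta$, subtract $V(t, x)$, divide by $\delta$ and let $\delta \to 0$.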
A solution $V$ of the HJB equation readily provides the optimal feedback control policy as the optimum of the inner minimization:
$$
u^*(t, x) = \operatorname*{arg\,min}_{u \in U} \left[ \ell(x, u) + \frac{\partial V}{\partial x}(t, x)^{\!\top} f(x, u) \right].
$$
At first glance, the HJB equation looks incredibly powerful. If one can easily find an expression for $u^*$ as a function of the value gradient $\frac{\partial V}{\partial x}$, then we can formulate the PDE and solve it. We could then recover a globally optimal policy which works for a whole range of initial conditions $x_0$.
The inner function in the minimization is the continuous-time $Q$-function
$$
Q(t, x, u) = \ell(x, u) + \frac{\partial V}{\partial x}(t, x)^{\!\top} f(x, u).
$$
Notice its resemblance to the Hamiltonian $H(x, u, p) = \ell(x, u) + p^\top f(x, u)$ of the Pontryagin minimum principle. Indeed, we have
$$
Q(t, x, u) = H\!\left(x, u, \frac{\partial V}{\partial x}(t, x)\right).
$$
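To make the link with the minimum principle explicit (a standard fact, stated here under the assumption that $V$ is differentiable along the optimal trajectory $x^*$), the PMP costate is precisely the value gradient evaluated along that trajectory:
$$
p(t) = \frac{\partial V}{\partial x}\big(t, x^*(t)\big),
$$
so minimizing the $Q$-function at $(t, x^*(t))$ amounts to minimizing the Hamiltonian $H\big(x^*(t), \cdot, p(t)\big)$.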
However, the resulting PDE is nonlinear, and nonlinear PDEs are a very active field of research. The existence of solutions is itself an important question, though Crandall and Lions introduced the notion of viscosity solutions, which can be shown to exist under mild assumptions. Numerical solutions can be investigated, for instance with Galerkin or finite-difference methods (which converge to the above viscosity solutions). These are only practical for low-dimensional systems (typically a very small state dimension), if one can be satisfied with a grid that is not too large: the curse of dimensionality comes into play. For higher-dimensional systems (one need only look at something as simple as a quadrotor), there is no really practical method.
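To give a feel for the grid-based approach (and why it scales so badly), here is a minimal sketch of an explicit upwind finite-difference solve of the HJB equation for a toy scalar problem; the dynamics $\dot{x} = u$, the costs and the grid sizes are illustrative choices, not taken from the text above.

```python
import numpy as np

# Toy HJB solve: dynamics xdot = u, running cost x^2 + u^2, terminal cost x^2.
# Explicit backward-in-time integration with upwind spatial differences.
nx, T = 201, 1.0
xs = np.linspace(-2.0, 2.0, nx)
dx = xs[1] - xs[0]
us = np.linspace(-1.0, 1.0, 21)        # discretized control set
nt = int(np.ceil(T / (0.5 * dx / np.max(np.abs(us)))))
dt = T / nt                            # respects a CFL-type stability condition

V = xs**2                              # terminal condition V(T, x) = x^2

for _ in range(nt):
    # one-sided (upwind) approximations of dV/dx
    dV_fwd = np.empty(nx); dV_bwd = np.empty(nx)
    dV_fwd[:-1] = (V[1:] - V[:-1]) / dx; dV_fwd[-1] = dV_fwd[-2]
    dV_bwd[1:]  = (V[1:] - V[:-1]) / dx; dV_bwd[0]  = dV_bwd[1]

    # minimize the continuous-time Q-function over the control grid
    best = np.full(nx, np.inf)
    for u in us:
        dV = dV_fwd if u >= 0 else dV_bwd          # upwind in the flow direction
        best = np.minimum(best, xs**2 + u**2 + dV * u)

    V = V + dt * best                  # backward Euler step: -dV/dt = min_u Q

print("V(0, x=1) ≈", V[np.argmin(np.abs(xs - 1.0))])
```

With $n$ state dimensions, the same grid would have $201^n$ points, which is exactly the curse of dimensionality mentioned above.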
The LQR case
Let's remind ourselves of the continuous-time LQR model:
$$
\min_{u(\cdot)} \int_0^T \frac{1}{2}\Big( x(t)^\top Q\, x(t) + u(t)^\top R\, u(t) \Big)\,\mathrm{d}t + \frac{1}{2}\, x(T)^\top Q_T\, x(T)
\quad \text{subject to} \quad \dot{x} = A x + B u,
$$
with a control space $U = \mathbb{R}^{n_u}$. Then, the minimizer and PDE are given by
$$
u^*(t, x) = -R^{-1} B^\top \frac{\partial V}{\partial x}(t, x),
\qquad
-\frac{\partial V}{\partial t} = \frac{1}{2} x^\top Q x + \frac{\partial V}{\partial x}^{\!\top}\! A x - \frac{1}{2} \frac{\partial V}{\partial x}^{\!\top}\! B R^{-1} B^\top \frac{\partial V}{\partial x}.
$$
Injecting $V(t, x) = \frac{1}{2} x^\top P(t)\, x$ leads to a matrix differential equation after removing $x$:
$$
-\dot{P}(t) = Q + P(t) A + A^\top P(t) - P(t) B R^{-1} B^\top P(t), \qquad P(T) = Q_T;
$$
this equation is called the continuous-time matrix Riccati equation.
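As a sanity check of the derivation (a minimal sketch; the system matrices and weights below are arbitrary illustrative values, not taken from the text), one can integrate the continuous-time Riccati equation backward from $P(T) = Q_T$ with SciPy and compare $P(0)$, for a long horizon, with the infinite-horizon solution of the algebraic Riccati equation:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Illustrative double integrator and weights (arbitrary choices).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])
QT = np.eye(2)          # terminal weight, P(T) = QT
T = 10.0

def riccati_rhs(t, p_flat):
    # dP/dt = -(Q + P A + A^T P - P B R^{-1} B^T P)
    P = p_flat.reshape(2, 2)
    dP = -(Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P)
    return dP.ravel()

# solve_ivp accepts a decreasing time span, so we integrate from t = T to t = 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), QT.ravel(), rtol=1e-8, atol=1e-8)
P0 = sol.y[:, -1].reshape(2, 2)

# For a long horizon, P(0) approaches the algebraic Riccati solution.
P_inf = solve_continuous_are(A, B, Q, R)
print("P(0)   =\n", P0)
print("P_inf  =\n", P_inf)
print("Gain K = R^{-1} B^T P(0) =", np.linalg.solve(R, B.T @ P0))
```

The resulting feedback $u^*(t, x) = -R^{-1} B^\top P(t)\, x$ is the familiar LQR controller.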