Most control loops used in real-world systems are simple feedback loops in which the control is proportional to the error, its derivative, or its integral (this is called PID control). However, this kind of control can exhibit undesirable behavior such as oscillations, or it can fail to converge to a given setpoint quickly, if at all. More complex systems such as robots, satellites, or cars can come with precise performance requirements, and more carefully constructed control actions need to be supplied, along with guarantees about their optimality.

In this blog post, we'll be looking at the most basic problem in optimal control, the Linear Quadratic Regulator (LQR), which is analogous to the classic Kalman filter in signal processing.

The goal is to control a linear dynamical system \[ x_{t+1} = Ax_t + Bu_t \] where \(x_t\in\RR^n\) is the state and \(u_t\in\RR^k\) are the control actions, which can represent motor torques in robotic systems, for instance. The objective is to find a sequence of controls \(u_0,\ldots,u_{T-1}\) that minimizes a cost over a horizon of \(T\) steps.

We define a running cost \[ C(x,u) = \frac{1}{2}x^\top Qx + \frac{1}{2}u^\top Ru \] and terminal cost \[ C_f(x) = \frac{1}{2}x^\top Q_fx \] which are quadratic functions of the state \(x\) and control \(u\). We assume \( Q \) is positive semidefinite and \( R \) is positive definite, making the cost function strongly convex in the control.
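To make things concrete, here is a small numerical instance we can carry through the rest of the post: a discretized double integrator (a point mass controlled in acceleration). The specific matrices, horizon, and initial state below are an illustrative choice, not prescribed by the LQR formulation itself.

```python
import numpy as np

# Illustrative problem data (my choice, not part of the general formulation):
# a discrete-time double integrator with a small control penalty.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])        # state: position and velocity
B = np.array([[0.0],
              [dt]])              # control: acceleration
Q = np.diag([1.0, 0.1])           # running state cost, positive semidefinite
R = np.array([[0.01]])            # control cost, positive definite
Qf = np.diag([10.0, 1.0])         # terminal cost
T = 50                            # horizon length
x0 = np.array([1.0, 0.0])         # initial state
```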

This is the most basic form of the LQR problem: more complications can be introduced, such as state-control cross terms and linear terms in the cost functions, or making the dynamics affine rather than linear in \( (x_t, u_t) \).

The optimal control problem is written as a constrained optimization problem \begin{align} &\min_{u_0,\ldots,u_{T-1}} \sum_{t=0}^{T-1} C(x_t,u_t) + C_f(x_T) \\\ &\mathrm{s.t.} \quad x_{t+1} = Ax_t + Bu_t \end{align}

One can solve this problem directly by writing the corresponding KKT conditions; in fact, this view reduces it to a sparse least-squares problem. Looking at the KKT conditions also introduces the familiar co-state equations \( \lambda_t = A^\top \lambda_{t+1} - Qx_t \) for the Lagrange multipliers, which appear in Pontryagin's minimum principle when solving continuous-time problems.
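As a sketch of this view, the snippet below assembles the KKT system of the quadratic program and solves it as one large linear system. For readability it is built densely with NumPy, whereas a practical implementation would exploit the block-banded sparsity; the function name `solve_lqr_kkt` and the variable ordering are my own choices.

```python
import numpy as np

def solve_lqr_kkt(A, B, Q, R, Qf, x0, T):
    """Assemble and solve the KKT system of the LQR quadratic program."""
    n, k = B.shape
    nz = T * n + T * k          # decision variables z = (x_1..x_T, u_0..u_{T-1})
    nc = T * n                  # equality constraints: dynamics for t = 0..T-1

    # Hessian of the objective: blkdiag(Q, ..., Q, Qf, R, ..., R).
    H = np.zeros((nz, nz))
    for t in range(T):
        i = t * n
        H[i:i+n, i:i+n] = Qf if t == T - 1 else Q
        j = T * n + t * k
        H[j:j+k, j:j+k] = R

    # Constraint Jacobian C and right-hand side d for x_{t+1} - A x_t - B u_t = 0.
    C = np.zeros((nc, nz))
    d = np.zeros(nc)
    for t in range(T):
        r = t * n
        C[r:r+n, t*n:(t+1)*n] = np.eye(n)              # coefficient of x_{t+1}
        if t > 0:
            C[r:r+n, (t-1)*n:t*n] = -A                 # coefficient of x_t, t >= 1
        C[r:r+n, T*n + t*k:T*n + (t+1)*k] = -B         # coefficient of u_t
    d[:n] = A @ x0                                     # t = 0: x_1 - B u_0 = A x_0

    # Stationarity H z + C^T lambda = 0 and feasibility C z = d, solved jointly.
    KKT = np.block([[H, C.T], [C, np.zeros((nc, nc))]])
    rhs = np.concatenate([np.zeros(nz), d])
    z_lam = np.linalg.solve(KKT, rhs)
    z, lam = z_lam[:nz], z_lam[nz:]
    xs = z[:T*n].reshape(T, n)                         # x_1, ..., x_T
    us = z[T*n:].reshape(T, k)                         # u_0, ..., u_{T-1}
    lams = lam.reshape(T, n)                           # lambda_1, ..., lambda_T
    return xs, us, lams
```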

Dynamic programming

The canonical way of solving this is by introducing the optimal cost-to-go or value function \[ V_t(x) = \min_{u_t,u_{t+1},\ldots,u_{T-1}} \sum_{\tau=t}^{T-1} C(x_\tau, u_\tau) + C_f(x_T), \quad\text{where } x_t = x \] and showing that this sequence of functions obeys a dynamic programming principle, Bellman's equation:

\[ V_t(x) = \min_u C(x, u) + V_{t+1}(Ax + Bu) \]

This equation can be solved by noticing that \( V_T(x) = \frac{1}{2}x^\top P_Tx \) with \( P_T = Q_f \) and introducing the ansatz \( V_t(x) = \frac{1}{2}x^\top P_tx \). This is an educated guess, as quadratics are closed under (partial) minimization. Plugging this into the Bellman equation yields

\[ V_t(x) = \min_u \frac12 x^\top (Q + A^\top P_{t+1}A)x + \frac12 u^\top (R + B^\top P_{t+1}B)u + u^\top B^\top P_{t+1} Ax. \] The minimizer is \( u = K_tx\), where \[ K_t = -(R + B^\top P_{t+1}B)^{-1}B^\top P_{t+1} A \] is called the feedback gain matrix. The cost-to-go matrix then satisfies the Riccati recursion \begin{equation} P_t = Q + A^\top P_{t+1}A + A^\top P_{t+1}BK_t = Q + A^\top P_{t+1}(A + BK_t) \end{equation}
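In code, the backward recursion and the resulting time-varying feedback might look like the following sketch (the helper names `lqr_backward` and `rollout` are illustrative):

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, T):
    """Backward Riccati recursion: returns gains K_0..K_{T-1} and matrices P_0..P_T."""
    P = Qf
    Ps, Ks = [Qf], []
    for _ in range(T):
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_t = Q + A^T P_{t+1} (A + B K_t)
        P = Q + A.T @ P @ (A + B @ K)
        P = 0.5 * (P + P.T)      # keep symmetric against round-off
        Ks.append(K)
        Ps.append(P)
    Ks.reverse()
    Ps.reverse()
    return Ks, Ps

def rollout(A, B, Ks, x0):
    """Forward pass applying the time-varying feedback u_t = K_t x_t."""
    xs, us = [x0], []
    x = x0
    for K in Ks:
        u = K @ x
        x = A @ x + B @ u
        us.append(u)
        xs.append(x)
    return np.array(xs), np.array(us)
```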

What about the KKT conditions?

The equations above can also be derived from the Karush-Kuhn-Tucker (KKT) conditions of the control problem, viewing it as a quadratic program over the variables \(\underline{x}=(x_1,\ldots,x_T), \underline{u}=(u_0,\ldots,u_{T-1})\).

Its Lagrangian reads \[ L(x,u,\lambda) = \sum_{t=0}^{T-1}\frac12u_t^\top Ru_t + \frac12x_t^\top Qx_t + \lambda_{t+1}^\top(x_{t+1} - Ax_t - Bu_t) + \frac{1}{2}x_T^\top Q_f x_T \] Its associated KKT conditions boil down to \begin{align} \lambda_t &= A^\top\lambda_{t+1} - Qx_t, \ t = 1,\ldots,T-1 \\\ \lambda_T &= -Q_f x_T \\\ Ru_t &= B^\top\lambda_{t+1}, \ t = 0,\ldots,T-1 \end{align} Note that positive semidefiniteness of \( Q \) and \( Q_f \) is needed for the problem to be convex, so that these KKT conditions characterize a minimum, and positive definiteness of \( R \) is required to solve for \( u_t \) and obtain a well-defined solution. The Lagrange multipliers are also called the co-states. This result is the discrete-time equivalent of Pontryagin's minimum principle, which is established in the continuous-time control literature.

Derivation and equivalence to dynamic programming. We can introduce a co-state gain matrix \( M_t \) such that \( \lambda_t = -M_tx_t \), and the feedback \( u_t = K_tx_t \). The terminal co-state matrix is \( M_T = Q_f \). Let's establish a recurrence equation for these matrices: starting from the KKT conditions, \begin{equation} \begin{aligned} \lambda_t &= A^\top\lambda_{t+1} - Qx_t \\\ &= -A^\top M_{t+1}x_{t+1} - Qx_t \end{aligned} \end{equation} then, using \( x_{t+1} = Ax_t + Bu_t \) and \( u_t = K_tx_t \), \begin{equation} \begin{aligned} \lambda_t &= -(A^\top M_{t+1}A + Q)x_t - A^\top M_{t+1} B u_t \\\ &= -(A^\top M_{t+1}A + Q + A^\top M_{t+1}BK_t)x_t \end{aligned} \end{equation} That is, \( M_t = Q + A^\top M_{t+1}(A + BK_t) \). Also, \( Ru_t = B^\top\lambda_{t+1} = -B^\top M_{t+1}x_{t+1} = -B^\top M_{t+1} (Ax_t + Bu_t) \), thus \begin{equation} (R + B^\top M_{t+1}B)u_t = -B^\top M_{t+1}Ax_t \quad\Rightarrow \quad K_t = -(R + B^\top M_{t+1}B)^{-1}B^\top M_{t+1}A \end{equation}

Hence, the sequences of co-state and feedback matrices \( (M_t)_t \) and \( (K_t)_t \) satisfy the same recurrence equations, with the same terminal condition, as the Riccati and feedback gain matrices of the dynamic programming approach, and are therefore identical.
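As a quick numerical sanity check of this equivalence, we can reuse the illustrative problem data and the helper functions sketched above (`lqr_backward`, `rollout`, `solve_lqr_kkt`):

```python
import numpy as np

# Dynamic programming solution and the induced co-states lambda_t = -P_t x_t.
Ks, Ps = lqr_backward(A, B, Q, R, Qf, T)
xs, us = rollout(A, B, Ks, x0)
lams = np.array([-Ps[t] @ xs[t] for t in range(1, T + 1)])   # lambda_1..lambda_T

# Direct KKT solve should give the same trajectory, controls, and multipliers.
xs_kkt, us_kkt, lams_kkt = solve_lqr_kkt(A, B, Q, R, Qf, x0, T)
print(np.allclose(xs[1:], xs_kkt), np.allclose(us, us_kkt), np.allclose(lams, lams_kkt))

# The co-state recursion lambda_t = A^T lambda_{t+1} - Q x_t ...
for t in range(1, T):
    assert np.allclose(lams[t - 1], A.T @ lams[t] - Q @ xs[t])
# ... and the stationarity condition R u_t = B^T lambda_{t+1} hold along the trajectory.
for t in range(T):
    assert np.allclose(R @ us[t], B.T @ lams[t])
```

On this example, all checks pass up to floating-point tolerance.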