Least Squares Method – Intro to Kalman Filter

Consider a Linear Equation,

$$ y_i = \sum_{j=1}^n C_{i,j} x_j + v_i, \quad i = 1, 2, \dots$$

, where the C_{i,j} are known scalars and v_i \in \mathbb{R} is the measurement noise. The noise itself is unknown, but we assume it satisfies some statistical properties: v_i and v_j are independent for i \neq j, each v_i has zero mean, and its variance is \sigma_i^2.

$$\mathbb{E}(v_i)=0$$

$$\mathbb{E}(v_i^2) = \sigma_i^2$$


Stacking the measurements i = 1, 2, \dots, s, we can rewrite y_i = \sum_{j=1}^n C_{i,j} x_j + v_i as,

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_s \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{s1} & C_{s2} & \cdots & C_{sn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_s \end{pmatrix} $$

or, in matrix form,

$$ \vec{y} = C \vec{x} + \vec{v} $$

which I will write in the short form,

$$ y= C x +v$$

The least squares estimator is the solution of the optimisation problem (the norm is the squared L2 norm),

$$ \min_x || y-Cx ||_2^2 $$
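As a quick sanity check, the batch problem above can be solved in a few lines. The sketch below assumes a small synthetic system (s = 50 measurements, n = 3 unknowns, noise level sigma); all numbers and names are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
s, n = 50, 3
x_true = np.array([1.0, -2.0, 0.5])          # the unknown vector x
C = rng.normal(size=(s, n))                  # stacked measurement matrix
sigma = 0.1
y = C @ x_true + sigma * rng.normal(size=s)  # y = C x + v, zero-mean noise

# Least-squares estimate: argmin_x ||y - C x||_2^2
x_hat, residuals, rank, sv = np.linalg.lstsq(C, y, rcond=None)
print(x_hat)   # close to x_true for small sigma
```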

Recursive Least Squares Method

The classic least squares estimator does not scale well when data arrive sequentially, since it re-solves the whole problem every time a new measurement comes in. The recursive least squares method handles this by processing measurements one time instant at a time. At the discrete-time instant k, the measurement vector y_k \in \mathbb{R}^l follows,

$$y_k = C_k x + v_k$$

, where C_k \in \mathbb{R}^{l\times n}, and v_k \in \mathbb{R}^l is the measurement noise vector. We assume that the covariance of the measurement noise is given by,

$$ \mathbb{E}[v_k v_k^T] = R_k$$

, and

$$\mathbb{E}[v_k]=0$$
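For concreteness, one measurement block from this model can be simulated as below; the dimensions l = 2, n = 3 and the particular R_k are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n = 2, 3                                   # l measurements per instant, n unknowns
x = np.array([1.0, -2.0, 0.5])                # true (constant) parameter vector
C_k = rng.normal(size=(l, n))                 # measurement matrix at instant k
R_k = np.diag([0.05, 0.10])                   # assumed noise covariance E[v_k v_k^T]

v_k = rng.multivariate_normal(mean=np.zeros(l), cov=R_k)  # zero-mean noise draw
y_k = C_k @ x + v_k                           # one measurement vector at instant k
```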

The recursive least squares method updates the estimate with the following form,

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

, where \hat{x}_k and \hat{x}_{k-1} are the estimates of the vector x at the discrete-time instants k and k-1, and K_k \in \mathbb{R}^{n\times l} is the gain matrix that we still need to determine.

The above equation updates the estimate of x at time instant k from three ingredients: the previous estimate \hat{x}_{k-1}, the new measurement y_k obtained at time instant k, and the gain matrix K_k computed at time instant k.
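In code, this update is just the innovation y_k - C_k \hat{x}_{k-1} scaled by the gain. A minimal sketch, with K_k passed in as an argument because its formula is only derived later; the function name is mine, not from any library.

```python
import numpy as np

def rls_estimate_update(x_prev, K_k, y_k, C_k):
    """One step of x_k = x_{k-1} + K_k (y_k - C_k x_{k-1})."""
    innovation = y_k - C_k @ x_prev   # how much the new data disagrees with the old estimate
    return x_prev + K_k @ innovation
```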

Notation

$\hat{x}$ is the estimate.

$$ \hat{x}_k = \begin{pmatrix} \hat{x}_{1,k} \\ \hat{x}_{2,k} \\ \vdots \\ \hat{x}_{n,k} \end{pmatrix} $$

, which corresponds to the true vector x,

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

The estimation error of each component is \epsilon_{i,k} = x_i - \hat{x}_{i,k}, \quad i = 1, 2, \dots, n, and we stack these errors into the vector

$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} = x - \hat{x}_k = \begin{pmatrix} x_1 - \hat{x}_{1,k} \\ x_2 - \hat{x}_{2,k} \\ \vdots \\ x_n - \hat{x}_{n,k} \end{pmatrix} $$

The gain K_k is computed by minimising the sum of variances of the estimation errors,

$$ W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2) $$

Next, let's show that this cost function can be written as follows, where tr(\cdot) denotes the trace of a matrix,

$$ W_k = tr(P_k) $$

, and P_k is the estimation error covariance matrix defined by

$$ P_k = \mathbb{E}(\epsilon_k \epsilon_k^T )$$

Equivalently,

$$ K_k = \arg\min_{K_k} W_k, \quad W_k = tr\big( \mathbb{E}(\epsilon_k \epsilon_k^T) \big)$$

Why is that?

$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$

$$ = \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \\ \vdots & \ddots & \vdots \\ \epsilon_{n,k}\epsilon_{1,k} & \cdots & \epsilon_{n,k}^2 \end{pmatrix} $$

So,

$$ P_k = \mathbb{E}[\epsilon_k \epsilon_k^T] $$

$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$
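Although the identity is pure algebra, a quick numeric check can make it tangible: draw many samples of a zero-mean error vector and compare the trace of the sample covariance with the sum of the per-component mean squares. The error distribution below is an arbitrary assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 200_000
# Arbitrary zero-mean error samples, one per row, with covariance A A^T
A = rng.normal(size=(n, n))
eps = rng.normal(size=(N, n)) @ A.T

P = (eps.T @ eps) / N                             # sample estimate of E[eps eps^T]
sum_of_variances = np.mean(eps**2, axis=0).sum()  # sum_i E[eps_i^2]
print(np.trace(P), sum_of_variances)              # identical up to floating-point rounding
```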


Optimisation

$$ K_k = \arg\min_{K_k} W_k, \quad W_k = tr\big( \mathbb{E}(\epsilon_k \epsilon_k^T) \big) = tr(P_k)$$

Let's derive the solution, starting by expanding the estimation error.

$$\epsilon_k = x-\hat{x}_k$$

$$ = x - \hat{x}_{k-1} - K_k(y_k - C_k \hat{x}_{k-1}) $$

$$ = x - \hat{x}_{k-1} - K_k (C_k x + v_k - C_k \hat{x}_{k-1}) $$

$$ = (I - K_k C_k)(x - \hat{x}_{k-1}) - K_k v_k $$

$$ = (I - K_k C_k)\epsilon_{k-1} - K_k v_k $$

Recall y_k = C_k x + v_k and \hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1}).

So, \epsilon_k \epsilon_k^T would be,

$$\epsilon_k \epsilon_k^T = \bigg((I - K_k C_k)\epsilon_{k-1} - K_k v_k\bigg)\bigg((I - K_k C_k)\epsilon_{k-1} - K_k v_k\bigg)^T$$

By definition, $P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$ and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.

The cross term vanishes, $\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) = 0$, because the noise $v_k$ is zero-mean and independent of the earlier estimation error $\epsilon_{k-1}$, while $\mathbb{E}(v_k v_k^T) = R_k$ by assumption. Taking the expectation of $\epsilon_k \epsilon_k^T$ and substituting these facts, we get,

$$P_k = (I - K_k C_k)P_{k-1}(I - K_k C_k)^T + K_k R_k K_k^T$$
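Note that this propagation formula holds for any gain K_k, not only the optimal one. A small Monte Carlo sketch of that claim, with P_{k-1}, R_k, C_k and K all chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l, N = 3, 2, 500_000
P_prev = np.diag([1.0, 2.0, 0.5])            # assumed E[eps_{k-1} eps_{k-1}^T]
R_k = np.diag([0.1, 0.2])                    # assumed E[v_k v_k^T]
C_k = rng.normal(size=(l, n))
K = rng.normal(size=(n, l))                  # an arbitrary (not optimal) gain

eps_prev = rng.multivariate_normal(np.zeros(n), P_prev, size=N)
v = rng.multivariate_normal(np.zeros(l), R_k, size=N)
eps_new = eps_prev @ (np.eye(n) - K @ C_k).T - v @ K.T   # eps_k = (I - K C) eps_{k-1} - K v_k

P_mc = (eps_new.T @ eps_new) / N
P_formula = (np.eye(n) - K @ C_k) @ P_prev @ (np.eye(n) - K @ C_k).T + K @ R_k @ K.T
print(np.max(np.abs(P_mc - P_formula)))      # matches the formula up to Monte Carlo error
```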

Expanding the product,

$$ P_k = P_{k-1} - P_{k-1} C_k^T K_k^T - K_k C_k P_{k-1} + K_k C_k P_{k-1} C_k^T K_k^T + K_k R_k K_k^T $$

$$W_k = tr(P_k) = tr(P_{k-1}) - tr(P_{k-1} C_k^T K_k^T) - tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1} C_k^T K_k^T) + tr(K_k R_k K_k^T) $$

We take the first-order condition (F.O.C.) to solve K_k = \arg\min_{K_k} W_k by setting \frac{\partial W_k}{\partial K_k} = 0. See the Matrix Cookbook for the derivatives of the trace terms w.r.t. K_k.

$$\frac{\partial W_k}{\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$
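If the matrix-calculus step feels opaque, it can be checked numerically by comparing this closed-form gradient against finite differences of W_k = tr(P_k); all matrices below are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(4)
n, l = 3, 2
P_prev = np.diag([1.0, 2.0, 0.5])
R = np.diag([0.1, 0.2])
C = rng.normal(size=(l, n))
K = rng.normal(size=(n, l))

def W(K):
    # W_k = tr( (I - K C) P_{k-1} (I - K C)^T + K R K^T )
    M = np.eye(n) - K @ C
    return np.trace(M @ P_prev @ M.T + K @ R @ K.T)

# Closed-form gradient: -2 P_{k-1} C^T + 2 K C P_{k-1} C^T + 2 K R
grad_analytic = -2 * P_prev @ C.T + 2 * K @ C @ P_prev @ C.T + 2 * K @ R

# Central finite differences, entry by entry
h = 1e-6
grad_fd = np.zeros_like(K)
for i in range(n):
    for j in range(l):
        E = np.zeros_like(K)
        E[i, j] = h
        grad_fd[i, j] = (W(K + E) - W(K - E)) / (2 * h)

print(np.max(np.abs(grad_fd - grad_analytic)))   # ~1e-8 or smaller: W is quadratic in K
```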

We solve for K_k,

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

Let L_k = R_k + C_k P_{k-1} C_k^T; note that L_k is symmetric, so L_k = L_k^T and L_k^{-1} = (L_k^{-1})^T. With this shorthand,

$$ K_k = P_{k-1} C_k^T L_k^{-1} $$

Plug K_k = P_{k-1} C_k^T L_k^{-1} back into the expanded P_k. Since K_k L_k = P_{k-1} C_k^T, the last two terms combine to K_k L_k K_k^T = P_{k-1} C_k^T K_k^T, which cancels with -P_{k-1} C_k^T K_k^T, leaving

$$ P_k = P_{k-1} - K_k C_k P_{k-1} = (I - K_k C_k) P_{k-1} $$
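In practice it is preferable to avoid forming L_k^{-1} explicitly: since L_k is symmetric positive definite, the gain can be obtained from a linear solve, and for this optimal gain the long form of P_k collapses to (I - K_k C_k)P_{k-1}, which the sketch below confirms numerically. The test matrices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, l = 3, 2
P_prev = np.diag([1.0, 2.0, 0.5])
R = np.diag([0.1, 0.2])
C = rng.normal(size=(l, n))

# K_k = P_{k-1} C^T L^{-1} with L = R + C P_{k-1} C^T, computed via a solve on symmetric L
L = R + C @ P_prev @ C.T
K = np.linalg.solve(L, C @ P_prev).T         # solves L K^T = C P_{k-1}

P_long  = (np.eye(n) - K @ C) @ P_prev @ (np.eye(n) - K @ C).T + K @ R @ K.T
P_short = (np.eye(n) - K @ C) @ P_prev
print(np.max(np.abs(P_long - P_short)))      # ~1e-15: the two forms coincide at the optimal gain
```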


Summary

In the end, the Recursive Least Squares Method can be summarised by the following three equations, applied at each time instant k.

  • 1. Update the Gain Matrix.

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

  • 2. Update the Estimate.

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

  • 3. Propagate the Estimation Error Covariance Matrix.

$$ P_k = (I - K_k C_k)P_{k-1} $$
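Putting the three equations together, here is a minimal end-to-end sketch: a constant vector x is estimated from a stream of noisy scalar measurements (l = 1). The initial guess, the initial covariance P_0, the dimensions, and the noise level are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
x_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1

# Initial guess and a large initial error covariance (low confidence in the guess)
x_hat = np.zeros(n)
P = 100.0 * np.eye(n)

for k in range(200):
    C_k = rng.normal(size=(1, n))                     # one scalar measurement per step (l = 1)
    R_k = np.array([[sigma**2]])
    y_k = C_k @ x_true + sigma * rng.normal(size=1)

    # 1. Gain:       K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}
    K = P @ C_k.T @ np.linalg.inv(R_k + C_k @ P @ C_k.T)
    # 2. Estimate:   x_k = x_{k-1} + K_k (y_k - C_k x_{k-1})
    x_hat = x_hat + K @ (y_k - C_k @ x_hat)
    # 3. Covariance: P_k = (I - K_k C_k) P_{k-1}
    P = (np.eye(n) - K @ C_k) @ P

print(x_hat)   # converges towards x_true as measurements accumulate
```

Starting from a large P_0 encodes low confidence in the initial guess, so the first few measurements move the estimate aggressively; as P_k shrinks, the gain shrinks and the estimate settles down.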

Reference

K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, Technical University of Denmark, 2012.