Least Squares Method – Intro to Kalman Filter

Consider a Linear Equation,

$$ y_i = \sum_{j=1}^n C_{i,j} x_j + v_i, \quad i = 1, 2, \dots$$

, where the C_{i,j} are known scalars and v_i \in \mathbb{R} is the measurement noise. The noise itself is unknown, but we assume it satisfies some statistical properties: v_i and v_j are independent for i \neq j, each v_i has zero mean, and its variance is \sigma_i^2.

$$\mathbb{E}(v_i)=0$$

$$\mathbb{E}(v_i^2) = \sigma_i^2$$


Stacking the measurements i = 1, 2, \dots, s, we can rewrite y_i = \sum_{j=1}^n C_{i,j} x_j + v_i as,

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_s \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{s1} & C_{s2} & \cdots & C_{sn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_s \end{pmatrix} $$

or, in matrix form,

$$ \vec{y} = C \vec{x} + \vec{v} $$

which I will write in the short form,

$$ y= C x +v$$

The least squares estimator is the solution of the optimisation problem (the norm is the squared L2 norm),

$$ \min_x || y-Cx ||_2^2 $$
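As a quick sanity check, the batch problem above can be solved in a few lines. The sketch below assumes a small synthetic system (s = 50 measurements, n = 3 unknowns, noise level sigma); all numbers and names are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
s, n = 50, 3
x_true = np.array([1.0, -2.0, 0.5])          # the unknown vector x
C = rng.normal(size=(s, n))                  # stacked measurement matrix
sigma = 0.1
y = C @ x_true + sigma * rng.normal(size=s)  # y = C x + v, zero-mean noise

# Least-squares estimate: argmin_x ||y - C x||_2^2
x_hat, residuals, rank, sv = np.linalg.lstsq(C, y, rcond=None)
print(x_hat)   # close to x_true for small sigma
```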

Recursive Least Squares Method

The classic least squares estimator does not scale well when data arrive sequentially, since it re-solves the whole problem every time a new measurement comes in. The recursive least squares method handles this by processing measurements one time instant at a time. At the discrete-time instant k, the measurement vector y_k \in \mathbb{R}^l follows,

$$y_k = C_k x + v_k$$

, where C_k \in \mathbb{R}^{l\times n}, and v_k \in \mathbb{R}^l is the measurement noise vector. We assume that the covariance of the measurement noise is given by,

$$ \mathbb{E}[v_k v_k^T] = R_k$$

, and

$$\mathbb{E}[v_k]=0$$
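For concreteness, one measurement block from this model can be simulated as below; the dimensions l = 2, n = 3 and the particular R_k are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n = 2, 3                                   # l measurements per instant, n unknowns
x = np.array([1.0, -2.0, 0.5])                # true (constant) parameter vector
C_k = rng.normal(size=(l, n))                 # measurement matrix at instant k
R_k = np.diag([0.05, 0.10])                   # assumed noise covariance E[v_k v_k^T]

v_k = rng.multivariate_normal(mean=np.zeros(l), cov=R_k)  # zero-mean noise draw
y_k = C_k @ x + v_k                           # one measurement vector at instant k
```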

The recursive least squares method updates the estimate with the following form,

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

, where \hat{x}_k and \hat{x}_{k-1} are the estimates of the vector x at the discrete-time instants k and k-1, and K_k \in \mathbb{R}^{n\times l} is the gain matrix that we still need to determine.

The above equation updates the estimate of x at time instant k from three ingredients: the previous estimate \hat{x}_{k-1}, the new measurement y_k obtained at time instant k, and the gain matrix K_k computed at time instant k.
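In code, this update is just the innovation y_k - C_k \hat{x}_{k-1} scaled by the gain. A minimal sketch, with K_k passed in as an argument because its formula is only derived later; the function name is mine, not from any library.

```python
import numpy as np

def rls_estimate_update(x_prev, K_k, y_k, C_k):
    """One step of x_k = x_{k-1} + K_k (y_k - C_k x_{k-1})."""
    innovation = y_k - C_k @ x_prev   # how much the new data disagrees with the old estimate
    return x_prev + K_k @ innovation
```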

Notation

$\hat{x}$ is the estimate.

$$ \hat{x}_k = \begin{pmatrix} \hat{x}_{1,k} \\ \hat{x}_{2,k} \\ \vdots \\ \hat{x}_{n,k} \end{pmatrix} $$

, which corresponds to the true vector x,

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

The estimation error of each component is \epsilon_{i,k} = x_i - \hat{x}_{i,k}, \quad i = 1, 2, \dots, n, and we stack these errors into the vector

$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} = x - \hat{x}_k = \begin{pmatrix} x_1 - \hat{x}_{1,k} \\ x_2 - \hat{x}_{2,k} \\ \vdots \\ x_n - \hat{x}_{n,k} \end{pmatrix} $$

The gain K_k is computed by minimising the sum of variances of the estimation errors,

$$ W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2) $$

Next, let's show that this cost function can be written as follows, where tr(\cdot) denotes the trace of a matrix,

$$ W_k = tr(P_k) $$

, and P_k is the estimation error covariance matrix defined by

$$ P_k = \mathbb{E}(\epsilon_k \epsilon_k^T )$$

Equivalently,

$$ K_k = \arg\min_{K_k} W_k, \quad W_k = tr\big( \mathbb{E}(\epsilon_k \epsilon_k^T) \big)$$

Why is that?

$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$

$$ = \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \\ \vdots & \ddots & \vdots \\ \epsilon_{n,k}\epsilon_{1,k} & \cdots & \epsilon_{n,k}^2 \end{pmatrix} $$

So,

$$ P_k = \mathbb{E}[\epsilon_k \epsilon_k^T] $$

$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$
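Although the identity is pure algebra, a quick numeric check can make it tangible: draw many samples of a zero-mean error vector and compare the trace of the sample covariance with the sum of the per-component mean squares. The error distribution below is an arbitrary assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 200_000
# Arbitrary zero-mean error samples, one per row, with covariance A A^T
A = rng.normal(size=(n, n))
eps = rng.normal(size=(N, n)) @ A.T

P = (eps.T @ eps) / N                             # sample estimate of E[eps eps^T]
sum_of_variances = np.mean(eps**2, axis=0).sum()  # sum_i E[eps_i^2]
print(np.trace(P), sum_of_variances)              # identical up to floating-point rounding
```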


Optimisation

$$ K_k = \arg\min_{K_k} W_k, \quad W_k = tr\big( \mathbb{E}(\epsilon_k \epsilon_k^T) \big) = tr(P_k)$$

Let's derive the solution, starting by expanding the estimation error.

$$\epsilon_k = x-\hat{x}_k$$

$$ = x - \hat{x}_{k-1} - K_k(y_k - C_k \hat{x}_{k-1}) $$

$$ = x - \hat{x}_{k-1} - K_k (C_k x + v_k - C_k \hat{x}_{k-1}) $$

$$ = (I - K_k C_k)(x - \hat{x}_{k-1}) - K_k v_k $$

$$ = (I - K_k C_k)\epsilon_{k-1} - K_k v_k $$

Recall y_k = C_k x + v_k and \hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1}).

So, \epsilon_k \epsilon_k^T would be,

$$\epsilon_k \epsilon_k^T = \bigg((I - K_k C_k)\epsilon_{k-1} - K_k v_k\bigg)\bigg((I - K_k C_k)\epsilon_{k-1} - K_k v_k\bigg)^T$$

By definition, $P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$ and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.

The cross term vanishes, $\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) = 0$, because the noise $v_k$ is zero-mean and independent of the earlier estimation error $\epsilon_{k-1}$, while $\mathbb{E}(v_k v_k^T) = R_k$ by assumption. Taking the expectation of $\epsilon_k \epsilon_k^T$ and substituting these facts, we get,

$$P_k = (I - K_k C_k)P_{k-1}(I - K_k C_k)^T + K_k R_k K_k^T$$
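Note that this propagation formula holds for any gain K_k, not only the optimal one. A small Monte Carlo sketch of that claim, with P_{k-1}, R_k, C_k and K all chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l, N = 3, 2, 500_000
P_prev = np.diag([1.0, 2.0, 0.5])            # assumed E[eps_{k-1} eps_{k-1}^T]
R_k = np.diag([0.1, 0.2])                    # assumed E[v_k v_k^T]
C_k = rng.normal(size=(l, n))
K = rng.normal(size=(n, l))                  # an arbitrary (not optimal) gain

eps_prev = rng.multivariate_normal(np.zeros(n), P_prev, size=N)
v = rng.multivariate_normal(np.zeros(l), R_k, size=N)
eps_new = eps_prev @ (np.eye(n) - K @ C_k).T - v @ K.T   # eps_k = (I - K C) eps_{k-1} - K v_k

P_mc = (eps_new.T @ eps_new) / N
P_formula = (np.eye(n) - K @ C_k) @ P_prev @ (np.eye(n) - K @ C_k).T + K @ R_k @ K.T
print(np.max(np.abs(P_mc - P_formula)))      # matches the formula up to Monte Carlo error
```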

Expanding the product,

$$ P_k = P_{k-1} - P_{k-1} C_k^T K_k^T - K_k C_k P_{k-1} + K_k C_k P_{k-1} C_k^T K_k^T + K_k R_k K_k^T $$

$$W_k = tr(P_k) = tr(P_{k-1}) - tr(P_{k-1} C_k^T K_k^T) - tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1} C_k^T K_k^T) + tr(K_k R_k K_k^T) $$

We take the first-order condition (F.O.C.) to solve K_k = \arg\min_{K_k} W_k by setting \frac{\partial W_k}{\partial K_k} = 0. See the Matrix Cookbook for the derivatives of the trace terms w.r.t. K_k.

$$\frac{\partial W_k}{\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$
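If the matrix-calculus step feels opaque, it can be checked numerically by comparing this closed-form gradient against finite differences of W_k = tr(P_k); all matrices below are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(4)
n, l = 3, 2
P_prev = np.diag([1.0, 2.0, 0.5])
R = np.diag([0.1, 0.2])
C = rng.normal(size=(l, n))
K = rng.normal(size=(n, l))

def W(K):
    # W_k = tr( (I - K C) P_{k-1} (I - K C)^T + K R K^T )
    M = np.eye(n) - K @ C
    return np.trace(M @ P_prev @ M.T + K @ R @ K.T)

# Closed-form gradient: -2 P_{k-1} C^T + 2 K C P_{k-1} C^T + 2 K R
grad_analytic = -2 * P_prev @ C.T + 2 * K @ C @ P_prev @ C.T + 2 * K @ R

# Central finite differences, entry by entry
h = 1e-6
grad_fd = np.zeros_like(K)
for i in range(n):
    for j in range(l):
        E = np.zeros_like(K)
        E[i, j] = h
        grad_fd[i, j] = (W(K + E) - W(K - E)) / (2 * h)

print(np.max(np.abs(grad_fd - grad_analytic)))   # ~1e-8 or smaller: W is quadratic in K
```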

We solve for K_k,

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

Let L_k = R_k + C_k P_{k-1} C_k^T; note that L_k is symmetric, so L_k = L_k^T and L_k^{-1} = (L_k^{-1})^T. With this shorthand,

$$ K_k = P_{k-1} C_k^T L_k^{-1} $$

Plug K_k = P_{k-1} C_k^T L_k^{-1} back into the expanded P_k. Since K_k L_k = P_{k-1} C_k^T, the last two terms combine to K_k L_k K_k^T = P_{k-1} C_k^T K_k^T, which cancels with -P_{k-1} C_k^T K_k^T, leaving

$$ P_k = P_{k-1} - K_k C_k P_{k-1} = (I - K_k C_k) P_{k-1} $$
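In practice it is preferable to avoid forming L_k^{-1} explicitly: since L_k is symmetric positive definite, the gain can be obtained from a linear solve, and for this optimal gain the long form of P_k collapses to (I - K_k C_k)P_{k-1}, which the sketch below confirms numerically. The test matrices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, l = 3, 2
P_prev = np.diag([1.0, 2.0, 0.5])
R = np.diag([0.1, 0.2])
C = rng.normal(size=(l, n))

# K_k = P_{k-1} C^T L^{-1} with L = R + C P_{k-1} C^T, computed via a solve on symmetric L
L = R + C @ P_prev @ C.T
K = np.linalg.solve(L, C @ P_prev).T         # solves L K^T = C P_{k-1}

P_long  = (np.eye(n) - K @ C) @ P_prev @ (np.eye(n) - K @ C).T + K @ R @ K.T
P_short = (np.eye(n) - K @ C) @ P_prev
print(np.max(np.abs(P_long - P_short)))      # ~1e-15: the two forms coincide at the optimal gain
```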


Summary

In the end, the Recursive Least Squares Method can be summarised by the following three equations, applied at each time instant k.

  • 1. Update the Gain Matrix.

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

  • 2. Update the Estimate.

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

  • 3. Propagate the Estimation Error Covariance Matrix.

$$ P_k = (I - K_k C_k)P_{k-1} $$
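Putting the three equations together, here is a minimal end-to-end sketch: a constant vector x is estimated from a stream of noisy scalar measurements (l = 1). The initial guess, the initial covariance P_0, the dimensions, and the noise level are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
x_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1

# Initial guess and a large initial error covariance (low confidence in the guess)
x_hat = np.zeros(n)
P = 100.0 * np.eye(n)

for k in range(200):
    C_k = rng.normal(size=(1, n))                     # one scalar measurement per step (l = 1)
    R_k = np.array([[sigma**2]])
    y_k = C_k @ x_true + sigma * rng.normal(size=1)

    # 1. Gain:       K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}
    K = P @ C_k.T @ np.linalg.inv(R_k + C_k @ P @ C_k.T)
    # 2. Estimate:   x_k = x_{k-1} + K_k (y_k - C_k x_{k-1})
    x_hat = x_hat + K @ (y_k - C_k @ x_hat)
    # 3. Covariance: P_k = (I - K_k C_k) P_{k-1}
    P = (np.eye(n) - K @ C_k) @ P

print(x_hat)   # converges towards x_true as measurements accumulate
```

Starting from a large P_0 encodes low confidence in the initial guess, so the first few measurements move the estimate aggressively; as P_k shrinks, the gain shrinks and the estimate settles down.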

Reference

K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, Technical University of Denmark, 2012.