Consider a linear equation,
$$ y_i = \sum_{j=1}^n C_{i,j} x_j + v_i,\quad i=1,2,\dots$$
, where $C_{i,j}$ are known scalars and $v_i \in \mathbb{R}$ is the measurement noise. The noise itself is unknown, but we assume it satisfies certain statistical properties: $v_i$ and $v_j$ are independent for $i \neq j$, and each $v_i$ has zero mean and variance $\sigma_i^2$,
$$\mathbb{E}(v_i)=0$$
$$\mathbb{E}(v_i^2) = \sigma_i^2$$
We can rewrite $y_i = \sum_{j=1}^n C_{i,j} x_j + v_i$ as,
$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_s\end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{s1} & C_{s2} & \cdots & C_{sn}\end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n\end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_s\end{pmatrix} $$
, or in matrix form,
$$ \vec{y} = C \vec{x} + \vec{v} $$
, which I will write in the short form,
$$ y= C x +v$$
We solve for the least squares estimator from the optimisation problem (the objective is the squared L2 norm),
$$ \min_x || y-Cx ||_2^2 $$
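As a concrete illustration, here is a minimal NumPy sketch of the batch problem above; the particular $C$, true $x$, and noise level are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

s, n = 50, 3                         # number of measurements, number of unknowns
C = rng.normal(size=(s, n))          # example measurement matrix (assumed known)
x_true = np.array([1.0, -2.0, 0.5])  # example true parameter vector
sigma = 0.1
v = sigma * rng.normal(size=s)       # zero-mean noise with variance sigma^2
y = C @ x_true + v                   # y = C x + v

# Least squares estimate: minimises ||y - C x||_2^2
x_hat, *_ = np.linalg.lstsq(C, y, rcond=None)
print(x_hat)                         # should be close to x_true
```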
Recursive Least Squares Method
The classic least squares estimator might not work well when data arrive over time. The Recursive Least Squares method handles this discrete-time setting. At a discrete-time instant $k$, the measurement vector $y_k \in \mathbb{R}^l$ follows,
$$y_k = C_k x + v_k$$
, where $C_k \in \mathbb{R}^{l\times n}$ and $v_k \in \mathbb{R}^l$ is the measurement noise vector. We assume that the covariance of the measurement noise is given by,
$$ \mathbb{E}[v_k v_k^T] = R_k$$
, and
$$\mathbb{E}[v_k]=0$$
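For intuition, these two assumptions can be mimicked in simulation. The sketch below draws zero-mean noise vectors with a made-up covariance $R_k$ and checks that the sample covariance is close to it.

```python
import numpy as np

rng = np.random.default_rng(1)

l = 2                                  # dimension of each measurement vector y_k
R_k = np.array([[0.04, 0.01],
                [0.01, 0.09]])         # example noise covariance (an assumption)

# Draw many samples of v_k with mean zero and covariance R_k,
# then check that the sample covariance is approximately R_k.
V = rng.multivariate_normal(mean=np.zeros(l), cov=R_k, size=100_000)
print(np.cov(V, rowvar=False))
```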
The recursive least squares method has the following update form,
$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$
, where $\hat{x}_k$ and $\hat{x}_{k-1}$ are the estimates of the vector $x$ at the discrete-time instants $k$ and $k-1$, and $K_k \in \mathbb{R}^{n\times l}$ is the gain matrix that we need to determine.
The above equation updates the estimate of $x$ at time instant $k$ on the basis of the previous estimate $\hat{x}_{k-1}$, the measurement $y_k$ obtained at time instant $k$, and the gain matrix $K_k$ computed at time instant $k$.
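In code, this update is a one-liner; below is a minimal sketch written as a function (the name `rls_update` and the argument layout are my own, not from the text).

```python
import numpy as np

def rls_update(x_hat_prev: np.ndarray, K_k: np.ndarray,
               C_k: np.ndarray, y_k: np.ndarray) -> np.ndarray:
    """Return x_hat_k = x_hat_{k-1} + K_k (y_k - C_k x_hat_{k-1})."""
    return x_hat_prev + K_k @ (y_k - C_k @ x_hat_prev)
```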
Notation
$\hat{x}$ is the estimate.
$$ \hat{x}_k = \begin{pmatrix} \hat{x}_{1,k} \\ \hat{x}_{2,k} \\ \vdots \\ \hat{x}_{n,k} \end{pmatrix} $$
, which corresponds to the true vector $x$,
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
The estimation error is $\epsilon_{i,k} = x_i - \hat{x}_{i,k},\quad i=1,2,\dots,n$, collected in the vector
$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} = x - \hat{x}_k = \begin{pmatrix} x_1-\hat{x}_{1,k} \\ x_2 - \hat{x}_{2,k} \\ \vdots \\ x_n-\hat{x}_{n,k} \end{pmatrix} $$
The gain $K_k$ is computed by minimising the sum of the variances of the estimation errors,
$$ W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2) $$
Next, let's show that the cost function can be represented as follows, where $tr(\cdot)$ is the trace of a matrix,
$$ W_k = tr(P_k) $$
, and $P_k$ is the estimation error covariance matrix defined by
$$ P_k = \mathbb{E}(\epsilon_k \epsilon_k^T )$$
Or, put differently,
$$ K_k = \arg\min_{K_k} W_k = \arg\min_{K_k} tr\big( \mathbb{E}(\epsilon_k \epsilon_k^T ) \big)$$
Why is that?
$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$
$$ = \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \\ \vdots & \ddots & \vdots \\ \epsilon_{n,k}\epsilon_{1,k} & \cdots & \epsilon_{n,k}^2\end{pmatrix} $$
So,
$$ P_k = \mathbb{E}[\epsilon_k \epsilon_k^T] $$
$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$
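A quick numerical sanity check of this identity, with an arbitrary (made-up) error covariance and a large number of simulated error vectors:

```python
import numpy as np

rng = np.random.default_rng(2)

n, num_samples = 4, 200_000
A = rng.normal(size=(n, n))
true_cov = A @ A.T + 0.1 * np.eye(n)   # an arbitrary valid covariance matrix
eps = rng.multivariate_normal(np.zeros(n), true_cov, size=num_samples)

P = (eps.T @ eps) / num_samples        # sample estimate of E[eps eps^T]
sum_of_variances = np.sum(np.mean(eps**2, axis=0))

# The two quantities coincide: the trace of P is the sum of the diagonal terms
print(np.trace(P), sum_of_variances)
```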
Optimisation
$$ K_k = \arg\min_{K_k} W_k = \arg\min_{K_k} tr\big( \mathbb{E}(\epsilon_k \epsilon_k^T ) \big) = \arg\min_{K_k} tr(P_k)$$
Let's now work through the optimisation problem.
$$\epsilon_k = x-\hat{x}_k$$
$$ = x - \hat{x}_{k-1} - K_k(y_k - C_k \hat{x}_{k-1}) $$
$$ = x - \hat{x}_{k-1} - K_k (C_k x + v_k - C_k \hat{x}_{k-1}) $$
$$ = (I - K_k C_k)(x-\hat{x}_{k-1}) - K_k v_k $$
$$ = (I-K_k C_k )\epsilon_{k-1} - K_k v_k $$
Recall that $y_k = C_k x + v_k$ and $\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$, which are the substitutions used above.
So, $\epsilon_k \epsilon_k^T$ is,
$$\epsilon_k \epsilon_k^T = \bigg((I-K_k C_k )\epsilon_{k-1} - K_k v_k\bigg)\bigg((I-K_k C_k )\epsilon_{k-1} - K_k v_k\bigg)^T$$
Recall that $P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$ and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.
Moreover, $\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) = 0$, since the estimation error $\epsilon_{k-1}$ depends only on the noise up to time $k-1$, which is independent of $v_k$, and $\mathbb{E}(v_k) = 0$. Finally, $\mathbb{E}(v_k v_k^T) = R_k$. Substituting all of these into $P_k$, we get,
$$P_k = (I - K_k C_k)P_{k-1}(I - K_k C_k)^T + K_k R_k K_k^T$$
$$ P_k = P_{k-1} - P_{k-1} C_k^T K_k^T - K_k C_k P_{k-1} + K_k C_k P_{k-1}C_k^T K_k^T + K_k R_k K_k^T $$
$$W_k = tr(P_k)= tr(P_{k-1}) - tr(P_{k-1} C_k^T K_k^T) - tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1}C_k^T K_k^T) + tr(K_k R_k K_k^T) $$
We take the first-order condition to solve for $K_k = \arg\min_{K_k} W_k$ by setting $\frac{\partial W_k}{\partial K_k} = 0$. See the Matrix Cookbook for how to take derivatives of traces with respect to $K_k$.
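For reference, the relevant trace derivative rules (for matrices of compatible dimensions, with $B$ symmetric in the last one) are
$$ \frac{\partial}{\partial K} tr(A K^T) = A, \qquad \frac{\partial}{\partial K} tr(K A) = A^T, \qquad \frac{\partial}{\partial K} tr(K B K^T) = 2 K B $$
Applying these term by term to $W_k$ (using the symmetry of $P_{k-1}$ and $R_k$) gives,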
$$\frac{\partial W_k}{\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$
We solve for $K_k$,
$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$
, where we let $L_k = R_k + C_k P_{k-1} C_k^T$. Note that $L_k = L_k^T$ and hence $L_k^{-1} = (L_k^{-1})^T$, so we can write
$$ K_k = P_{k-1} C_k^T L_k^{-1} $$
Plugging $K_k = P_{k-1} C_k^T L_k^{-1}$ back into $P_k$,
$$ P_k = P_{k-1} - K_k C_k P_{k-1} = (I-K_k C_k)P_{k-1} $$
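As a quick numerical sanity check of this simplification, the sketch below builds arbitrary symmetric positive definite $P_{k-1}$ and $R_k$, computes the optimal gain, and confirms that the long (Joseph-form) expression for $P_k$ collapses to $(I - K_k C_k)P_{k-1}$; all matrices are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

n, l = 3, 2
A = rng.normal(size=(n, n))
P_prev = A @ A.T + n * np.eye(n)    # example P_{k-1}, symmetric positive definite
B = rng.normal(size=(l, l))
R_k = B @ B.T + l * np.eye(l)       # example R_k, symmetric positive definite
C_k = rng.normal(size=(l, n))

# Optimal gain K_k = P_{k-1} C_k^T L_k^{-1}, computed with a linear solve
# rather than an explicit matrix inverse.
L_k = R_k + C_k @ P_prev @ C_k.T
K_k = np.linalg.solve(L_k, C_k @ P_prev).T

I = np.eye(n)
P_joseph = (I - K_k @ C_k) @ P_prev @ (I - K_k @ C_k).T + K_k @ R_k @ K_k.T
P_short = (I - K_k @ C_k) @ P_prev

print(np.allclose(P_joseph, P_short))   # True for the optimal gain
```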
Summary
In the end, the Recursive Least Squares method can be summarised by the following three equations; a minimal end-to-end sketch follows the list.
- 1. Update the gain matrix.
$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$
- 2. Update the estimate.
$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$
- 3. Propagate the estimation error covariance matrix.
$$ P_k = (I-K_k C_k)P_{k-1} $$
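Putting the three equations together, here is a minimal NumPy sketch of the full recursion on simulated data. The initial values $\hat{x}_0$ and $P_0$, and the particular system, are illustrative assumptions, not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(4)

n, l, num_steps = 3, 2, 500
x_true = np.array([1.0, -2.0, 0.5])    # the constant vector to be estimated
R_k = 0.05 * np.eye(l)                 # measurement noise covariance (assumed known)

# Initial guess and a large initial error covariance (assumption)
x_hat = np.zeros(n)
P = 100.0 * np.eye(n)

for k in range(num_steps):
    C_k = rng.normal(size=(l, n))      # measurement matrix at time k
    v_k = rng.multivariate_normal(np.zeros(l), R_k)
    y_k = C_k @ x_true + v_k           # y_k = C_k x + v_k

    # 1. Update the gain matrix: K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}
    L_k = R_k + C_k @ P @ C_k.T
    K_k = np.linalg.solve(L_k, C_k @ P).T

    # 2. Update the estimate
    x_hat = x_hat + K_k @ (y_k - C_k @ x_hat)

    # 3. Propagate the estimation error covariance
    P = (np.eye(n) - K_k @ C_k) @ P

print(x_hat)                           # should be close to x_true
```

A numerically more robust variant would propagate the covariance with the symmetric Joseph form from the derivation above, but the short form matches the three-equation summary.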