Least Squares Method – Intro to Kalman Filter

Consider a Linear Equation,

$$y_i = \sum_{j=1}^{n} C_{i,j} x_j + v_i, \quad i = 1, 2, \ldots, s$$

, where $C_{i,j}$ are scalars and $v_i \in \mathbb{R}$ is the measurement noise. The noise itself is unknown, but we assume it satisfies certain statistical properties: $v_i$ and $v_j$ are independent for $i \neq j$, each $v_i$ has zero mean, and its variance equals $\sigma_i^2$.

$$\mathbb{E}(v_i) = 0$$

$$\mathbb{E}(v_i^2) = \sigma_i^2$$


We can rewrite $y_i = \sum_{j=1}^n C_{i,j} x_j + v_i$ as,

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_s \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{s1} & C_{s2} & \cdots & C_{sn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_s \end{pmatrix}$$

, or, in matrix form,

$$\boldsymbol{y} = C \boldsymbol{x} + \boldsymbol{v}$$

, which I will write in the short form,

$$y = Cx + v$$

We solve for the least-squares estimator from the optimisation problem (the norm is the squared $L_2$ norm),

$$\min_x \| y - Cx \|_2^2$$
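As a quick illustration, here is a minimal sketch of this batch problem in Python/NumPy (the dimensions $s = 100$, $n = 3$, the true vector, and the noise level are arbitrary assumptions for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

s, n = 100, 3                         # number of measurements and unknowns (assumed)
x_true = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter vector
C = rng.normal(size=(s, n))           # measurement matrix C
sigma = 0.1
v = sigma * rng.normal(size=s)        # zero-mean noise with E(v_i^2) = sigma^2
y = C @ x_true + v                    # y = C x + v

# Least-squares estimate: argmin_x ||y - C x||_2^2
x_hat, *_ = np.linalg.lstsq(C, y, rcond=None)
print(x_hat)                          # should be close to x_true
```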

Recursive Least Squares Method

The classic least-squares estimator might not work well when the data keep evolving, so the Recursive Least Squares Method emerges to deal with the discrete-time case. At a discrete-time instant $k$, the vector of measurements $y_k \in \mathbb{R}^l$ collected at that instant follows,

$$y_k = C_k x + v_k$$

, where $C_k \in \mathbb{R}^{l \times n}$, and $v_k \in \mathbb{R}^l$ is the measurement noise vector. We assume that the covariance of the measurement noise is given by,

$$\mathbb{E}[v_k v_k^T] = R_k$$

, and

$$\mathbb{E}[v_k] = 0$$

In this section, the recursive least squares method takes the following form,

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

, where $\hat{x}_k$ and $\hat{x}_{k-1}$ are the estimates of the vector $x$ at the discrete-time instants $k$ and $k-1$, and $K_k \in \mathbb{R}^{n \times l}$, coined the 'gain matrix', is what we need to determine.

The above equation updates the estimate of $x$ at the time instant $k$ on the basis of the previous estimate $\hat{x}_{k-1}$, the measurement $y_k$ obtained at the time instant $k$, and the gain matrix $K_k$ computed at the time instant $k$.
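For concreteness, a minimal sketch of this single update step in Python/NumPy (the gain $K_k$ is passed in as an argument here because we have not derived it yet; the function name is illustrative):

```python
import numpy as np

def rls_update(x_prev, K, C_k, y_k):
    """One step of x_hat_k = x_hat_{k-1} + K_k (y_k - C_k x_hat_{k-1})."""
    innovation = y_k - C_k @ x_prev   # how much the new measurement disagrees with the prediction
    return x_prev + K @ innovation    # correct the previous estimate by the weighted innovation
```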

Notation

$\hat{x}$ is the estimate,

$$\hat{x}_k = \begin{pmatrix} \hat{x}_{1,k} \\ \hat{x}_{2,k} \\ \vdots \\ \hat{x}_{n,k} \end{pmatrix}$$

, which corresponds to the true vector $x$,

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

The estimation error is $\epsilon_{i,k} = x_i - \hat{x}_{i,k}, \quad i = 1, 2, \ldots, n$.

$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} = x - \hat{x}_k = \begin{pmatrix} x_1 - \hat{x}_{1,k} \\ x_2 - \hat{x}_{2,k} \\ \vdots \\ x_n - \hat{x}_{n,k} \end{pmatrix}$$

The gain $K_k$ is computed by minimising the sum of variances of the estimation errors,

$$W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$

Next, let's show that the cost function can be represented as follows ($tr(\cdot)$ denotes the trace of a matrix),

$$W_k = tr(P_k)$$

, where $P_k$ is the estimation error covariance matrix defined by

$$P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$$

Or, equivalently,

$$K_k = \arg\min_{K_k} tr\big(\mathbb{E}(\epsilon_k \epsilon_k^T)\big)$$

Why is that?

$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$

$$= \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \\ \vdots & \epsilon_{i,k}^2 & \vdots \\ \epsilon_{1,k}\epsilon_{n,k} & \cdots & \epsilon_{n,k}^2 \end{pmatrix}$$

So, taking the expectation of each entry,

$$P_k = \mathbb{E}[\epsilon_k \epsilon_k^T]$$

$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$
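A quick numerical sanity check of this trace identity, sketched with an arbitrary error vector (NumPy assumed):

```python
import numpy as np

eps = np.array([0.3, -1.2, 0.7])        # an arbitrary estimation-error vector
outer = np.outer(eps, eps)              # eps eps^T
print(np.trace(outer), np.sum(eps**2))  # both print 2.02: the trace equals the sum of squares
```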


Optimisation

$$K_k = \arg\min_{K_k} W_k, \quad W_k = tr\big(\mathbb{E}(\epsilon_k \epsilon_k^T)\big) = tr(P_k)$$

Let's derive the solution of the optimisation problem. First, expand the estimation error:

$$\begin{aligned}
\epsilon_k &= x - \hat{x}_k \\
&= x - \hat{x}_{k-1} - K_k (y_k - C_k \hat{x}_{k-1}) \\
&= x - \hat{x}_{k-1} - K_k (C_k x + v_k - C_k \hat{x}_{k-1}) \\
&= (I - K_k C_k)(x - \hat{x}_{k-1}) - K_k v_k \\
&= (I - K_k C_k)\epsilon_{k-1} - K_k v_k
\end{aligned}$$

Recall $y_k = C_k x + v_k$ and $\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$.

So, $\epsilon_k \epsilon_k^T$ would be,

$$\epsilon_k \epsilon_k^T = \big( (I - K_k C_k)\epsilon_{k-1} - K_k v_k \big) \big( (I - K_k C_k)\epsilon_{k-1} - K_k v_k \big)^T$$

$P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$, and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.

$\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) = 0$, because the white noise $v_k$ is independent of the past estimation error $\epsilon_{k-1}$ and has zero mean. Also, $\mathbb{E}(v_k v_k^T) = R_k$. Substituting all of these into $P_k$, we get,

$$P_k = (I - K_k C_k) P_{k-1} (I - K_k C_k)^T + K_k R_k K_k^T$$

Expanding the product,

$$P_k = P_{k-1} - P_{k-1} C_k^T K_k^T - K_k C_k P_{k-1} + K_k C_k P_{k-1} C_k^T K_k^T + K_k R_k K_k^T$$

Taking the trace of both sides,

$$W_k = tr(P_k) = tr(P_{k-1}) - tr(P_{k-1} C_k^T K_k^T) - tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1} C_k^T K_k^T) + tr(K_k R_k K_k^T)$$

We take the first-order condition (F.O.C.) to solve for $K_k = \arg\min_{K_k} W_k$ by letting $\frac{\partial W_k}{\partial K_k} = 0$. See the Matrix Cookbook for the rules on taking derivatives of traces with respect to the matrix $K_k$.

$$\frac{\partial W_k}{\partial K_k} = -2 P_{k-1} C_k^T + 2 K_k C_k P_{k-1} C_k^T + 2 K_k R_k = 0$$

We solve for $K_k$,

$$K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$
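To see numerically that this $K_k$ is indeed the minimiser, here is a small sketch (the random $P_{k-1}$, $C_k$, $R_k$ are illustrative assumptions) that compares $tr(P_k)$ at the optimal gain with $tr(P_k)$ at perturbed gains, using the general expression $P_k = (I - K C_k) P_{k-1} (I - K C_k)^T + K R_k K^T$ that holds for any gain $K$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, l = 3, 2   # assumed dimensions

# Random symmetric positive-definite P_{k-1} and R_k (illustrative values).
A = rng.normal(size=(n, n)); P_prev = A @ A.T + n * np.eye(n)
B = rng.normal(size=(l, l)); R = B @ B.T + l * np.eye(l)
C = rng.normal(size=(l, n))

def trace_P(K):
    """tr(P_k) from the general covariance expression, valid for any gain K."""
    I_KC = np.eye(n) - K @ C
    return np.trace(I_KC @ P_prev @ I_KC.T + K @ R @ K.T)

K_opt = P_prev @ C.T @ np.linalg.inv(R + C @ P_prev @ C.T)

for _ in range(5):
    K_other = K_opt + 0.1 * rng.normal(size=(n, l))   # a perturbed gain
    assert trace_P(K_opt) <= trace_P(K_other)          # the optimal gain never loses
print("K_opt yields the smallest tr(P_k) among the gains tried")
```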

We let $L_k = R_k + C_k P_{k-1} C_k^T$. $L_k$ is symmetric, so it has the properties $L_k = L_k^T$ and $L_k^{-1} = (L_k^{-1})^T$, and the gain can be written as

$$K_k = P_{k-1} C_k^T L_k^{-1}$$

Plug $K_k = P_{k-1} C_k^T L_k^{-1}$ back into $P_k$. Since $K_k L_k = P_{k-1} C_k^T$, the terms $K_k C_k P_{k-1} C_k^T K_k^T + K_k R_k K_k^T = K_k L_k K_k^T = P_{k-1} C_k^T K_k^T$ cancel with $-P_{k-1} C_k^T K_k^T$, leaving

$$P_k = P_{k-1} - K_k C_k P_{k-1} = (I - K_k C_k) P_{k-1}$$
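One can also check numerically that this short form agrees with the longer expression $(I - K_k C_k) P_{k-1} (I - K_k C_k)^T + K_k R_k K_k^T$ once $K_k$ is the optimal gain; a sketch with arbitrary illustrative matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, l = 3, 2
A = rng.normal(size=(n, n)); P_prev = A @ A.T + n * np.eye(n)   # illustrative P_{k-1}
B = rng.normal(size=(l, l)); R = B @ B.T + l * np.eye(l)        # illustrative R_k
C = rng.normal(size=(l, n))

K = P_prev @ C.T @ np.linalg.inv(R + C @ P_prev @ C.T)          # optimal gain K_k

joseph = (np.eye(n) - K @ C) @ P_prev @ (np.eye(n) - K @ C).T + K @ R @ K.T
short = (np.eye(n) - K @ C) @ P_prev

print(np.allclose(joseph, short))    # True: the two forms coincide at the optimal gain
```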


Summary

In the end, the Recursive Least Squares Method can be summarised by the following three equations.

  • 1. Update the Gain Matrix.

$$K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

  • 2. Update the Estimate.

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

  • 3. Propagate the estimation error covariance matrix using this equation.

$$P_k = (I - K_k C_k) P_{k-1}$$
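Putting the three equations together, a minimal end-to-end sketch (scalar measurements, $l = 1$; the dimensions, noise level, and the large initial covariance $P_0 = 1000 \cdot I$ are assumptions for illustration) that runs the recursion over a stream of measurements and compares the result with the batch least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, steps = 3, 0.1, 200

x_true = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter vector
x_hat = np.zeros(n)                   # initial estimate x_hat_0
P = 1e3 * np.eye(n)                   # large initial covariance: we know little about x at first
R = np.array([[sigma**2]])            # measurement noise covariance R_k (scalar case)

C_all = rng.normal(size=(steps, 1, n))   # one measurement row C_k per time instant
y_all = np.array([C_k @ x_true + sigma * rng.normal(size=1) for C_k in C_all])

for C_k, y_k in zip(C_all, y_all):
    # 1. Update the gain matrix.
    K = P @ C_k.T @ np.linalg.inv(R + C_k @ P @ C_k.T)
    # 2. Update the estimate.
    x_hat = x_hat + K @ (y_k - C_k @ x_hat)
    # 3. Propagate the estimation error covariance matrix.
    P = (np.eye(n) - K @ C_k) @ P

# Compare with the batch least-squares solution on the same data.
x_batch, *_ = np.linalg.lstsq(C_all.reshape(steps, n), y_all.ravel(), rcond=None)
print(x_hat, x_batch)                 # both should be close to x_true
```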

Reference