{"id":5592,"date":"2023-08-25T16:22:14","date_gmt":"2023-08-25T08:22:14","guid":{"rendered":"https:\/\/fanyuzhao.com\/?p=5592"},"modified":"2023-08-25T16:32:39","modified_gmt":"2023-08-25T08:32:39","slug":"least-squares-method-intro-to-kalman-filter","status":"publish","type":"post","link":"https:\/\/fanyuzhao.com\/?p=5592","title":{"rendered":"Least Squares Method &#8211; Intro to Kalman Filter"},"content":{"rendered":"\n<p>Consider a Linear Equation,<\/p>\n\n\n\n<p>$$ y_i = \\sum_{j=1}^n C_{i,j} x_j +v_i,\\quad i=1,2,\u2026$$<\/p>\n\n\n\n<p>, where <span class=\"katex math inline\">C_{i,j}<\/span> are scalars and <span class=\"katex math inline\">v_i\\in \\mathbb{R}<\/span> is the measurement noise. The noise is unknown, while we assume it follows certain patterns (the assumptions are due to some statistical properties of the noise). We assume <span class=\"katex math inline\">v_i, v_j<\/span> are independent for <span class=\"katex math inline\">i\\neq j<\/span>. Properties are mean of zero, and variance equals sigma squared.<\/p>\n\n\n\n<p>$$\\mathbb{E}(v_i)=0$$<\/p>\n\n\n\n<p>$$\\mathbb{E}(v_i^2) = \\sigma_i^2$$<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>We can rewrite <span class=\"katex math inline\">y_i = \\sum_{j=1}^n C_{i,j} x_j +v_i<\/span> as,<\/p>\n\n\n\n<p>$$ \\begin{pmatrix} y_1 \\ y_2 \\ \\vdots\\ y_s\\end{pmatrix} = \\begin{pmatrix} C_{11} &amp; C_{12} &amp; \\cdots &amp; C_{1n} \\ C_{21} &amp; C_{22}&amp; \\cdots &amp; C_{2n} \\ \\vdots &amp; \\vdots &amp; \\cdots &amp; \\vdots \\ C_{s1} &amp; C_{s2} &amp; \\cdots &amp; C_{sn}\\end{pmatrix} \\begin{pmatrix} x_1 \\ x_2 \\ \\vdots\\ x_n\\end{pmatrix} + \\begin{pmatrix} v_1 \\ v_2 \\ \\vdots\\ v_s\\end{pmatrix} $$<\/p>\n\n\n\n<p>, in a matrix form,<\/p>\n\n\n\n<p>$$ \\vec{y} = C \\vec{x} + \\vec{v} $$<\/p>\n\n\n\n<p>, but I would write in a short form,<\/p>\n\n\n\n<p>$$ y= C x +v$$<\/p>\n\n\n\n<p>We solve for the least squared estimator from the optimisation problem, (there is a squared L2 norm)<\/p>\n\n\n\n<p>$$ \\min_x || y-Cx ||_2^2 $$<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Recursive Least Squared Method<\/h3>\n\n\n\n<p>The classic least squared estimator might not work well when data evolving. So, there emerges a Recursive Least Squared Method to deal with the discrete-time instance. Let&#8217;s say, for a discrete-time instance <span class=\"katex math inline\">k<\/span>, <span class=\"katex math inline\">y_k \\in \\mathbb{R}&#8217;<\/span> is within a set of measurements group follows,<\/p>\n\n\n\n<p>$$y_k = C_k x + v_k$$<\/p>\n\n\n\n<p>, where <span class=\"katex math inline\">C_k \\in \\mathbb{R}^{l\\times n}<\/span>, and <span class=\"katex math inline\">v_k \\in \\mathbb{R}^l<\/span> is the measurement noise vector. We assume that the covariance of the measurement noise is given by,<\/p>\n\n\n\n<p>$$ \\mathbb{E}[v_k v_k^T] = R_k$$<\/p>\n\n\n\n<p>, and<\/p>\n\n\n\n<p>$$\\mathbb{E}[v_k]=0$$<\/p>\n\n\n\n<p>The recursive least squared method has the following form in this section,<\/p>\n\n\n\n<p>$$\\hat{x}k = \\hat{x}{k-1} + K_k (y_k &#8211; C_k \\hat{x}_{k-1})$$<\/p>\n\n\n\n<p>, where <span class=\"katex math inline\">\\hat{x}k<\/span> and <span class=\"katex math inline\">\\hat{x}{k-1}<\/span> are the estimates of the vector <span class=\"katex math inline\">x<\/span> at the discrete-time instants <span class=\"katex math inline\">k<\/span> and <span class=\"katex math inline\">k-1<\/span>, and <span class=\"katex math inline\">K_k \\in \\mathbb{R}^{n\\times l}<\/span> is the gain matrix that we need to determine. <span class=\"katex math inline\">K_k<\/span> is coined the &#8216;Gain Matrix&#8217;<\/p>\n\n\n\n<p>The above equation updates the estimate of <span class=\"katex math inline\">x<\/span> at the time instant <span class=\"katex math inline\">k<\/span> on the basis of the estimate <span class=\"katex math inline\">\\hat{x}_{k-1}<\/span> at the previous time instant <span class=\"katex math inline\">k-1<\/span> and on the basis of the measurement <span class=\"katex math inline\">y_k<\/span> obtained at the time instant <span class=\"katex math inline\">k<\/span>, as well as on the basis of the gain matrix <span class=\"katex math inline\">K_k<\/span> computed at the time instant <span class=\"katex math inline\">k<\/span>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Notation<\/h4>\n\n\n\n<p>$\\hat{x}$ is the estimate.<\/p>\n\n\n\n<p>$$ \\hat{x}k = \\begin{pmatrix} \\hat{x}{1,k} \\ \\hat{x}{2,k} \\ \\vdots \\\\hat{x}{n,k} \\end{pmatrix} $$<\/p>\n\n\n\n<p>, which is corresponding with the true vector <span class=\"katex math inline\">x<\/span>.<\/p>\n\n\n\n<p>$$x = \\begin{pmatrix} x_1 \\ x_2 \\ \\vdots \\ x_n \\end{pmatrix}$$<\/p>\n\n\n\n<p>The estimation error, <span class=\"katex math inline\">\\epsilon_{i,k} = x_i &#8211; \\hat{x}_{i,k} \\quad i=1,2,\u2026,n<\/span>.<\/p>\n\n\n\n<p>$$\\epsilon_k = \\begin{pmatrix} \\epsilon_{1,k} \\ \\epsilon_{2,k} \\ \\vdots \\\\epsilon_{n,k} \\end{pmatrix} = x &#8211; \\hat{x}_k = \\begin{pmatrix} x_1-\\hat{x}_{1,k} \\ x_2 &#8211; \\hat{x}_{2,k} \\ \\vdots \\x_n-\\hat{x}_{n,k} \\end{pmatrix} $$<\/p>\n\n\n\n<p>The gain <span class=\"katex math inline\">K_k<\/span> is computed by minimising the sum of variances of the estimation errors,<\/p>\n\n\n\n<p>$$ W_k = \\mathbb{E}(\\epsilon_{1,k}^2) + \\mathbb{E}(\\epsilon_{2,k}^2) + \\cdots + \\mathbb{E}(\\epsilon_{n,k}^2) $$<\/p>\n\n\n\n<p>Next, let&#8217;s show the cost function could be represented as follows, (<span class=\"katex math inline\">tr(.)<\/span> is the trace of a matrix)<\/p>\n\n\n\n<p>$$ W_k = tr(P_k) $$<\/p>\n\n\n\n<p>, and <span class=\"katex math inline\">P_k<\/span> is the estimation error covariance matrix defined by<\/p>\n\n\n\n<p>$$ P_k = \\mathbb{E}(\\epsilon_k \\epsilon_k^T )$$<\/p>\n\n\n\n<p>Or, says,<\/p>\n\n\n\n<p>$$ K_k = arg\\min_{K_k} W_k = tr\\bigg( \\mathbb{E}(\\epsilon_k \\epsilon_k^T ) \\bigg)$$<\/p>\n\n\n\n<p>Why is that?<\/p>\n\n\n\n<p>$$\\epsilon_k \\epsilon_k^T = \\begin{pmatrix} \\epsilon_{1,k} \\ \\epsilon_{2,k} \\\\vdots \\ \\epsilon_{n,k} \\end{pmatrix} \\begin{pmatrix} \\epsilon_{1,k} &amp; \\epsilon_{2,k} &amp; \\cdots &amp; \\epsilon_{n,k} \\end{pmatrix}$$<\/p>\n\n\n\n<p>$$ = \\begin{pmatrix} \\epsilon_{1,k}^2 &amp; \\cdots &amp; \\epsilon_{1,k}\\epsilon_{n,k} \\ \\vdots &amp; \\epsilon_{i,k}^2 &amp; \\vdots \\ \\epsilon_{1,k}\\epsilon_{n,k} &amp; \\cdots &amp; \\epsilon_{n,k}^2\\end{pmatrix} $$<\/p>\n\n\n\n<p>So,<\/p>\n\n\n\n<p>$$ P_k = \\mathbb{E}[\\epsilon_k \\epsilon_k^T] $$<\/p>\n\n\n\n<p>$$tr(P_k) = \\mathbb{E}(\\epsilon_{1,k}^2) + \\mathbb{E}(\\epsilon_{2,k}^2) + \\cdots + \\mathbb{E}(\\epsilon_{n,k}^2)$$<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Optimisation<\/h4>\n\n\n\n<p>$$ K_k = arg\\min_{K_k} W_k = tr\\bigg( \\mathbb{E}(\\epsilon_k \\epsilon_k^T ) \\bigg) = tr(P_k)$$<\/p>\n\n\n\n<p>Let&#8217;s derive the optimisation problem.<\/p>\n\n\n\n<p>$$\\epsilon_k = x-\\hat{x}_k$$<\/p>\n\n\n\n<p>$$ =x-\\hat{x}{k-1} &#8211; K_k(y_k &#8211; C_k \\hat{x}{k-1}) $$<\/p>\n\n\n\n<p>$$ = x- \\hat{x}{k-1} &#8211; K_k (C_k x + v_k &#8211; C_k \\hat{x}{k-1}) $$<\/p>\n\n\n\n<p>$$ = (I &#8211; K_k C_k)(x-\\hat{x}_{k-1}) &#8211; K_k v_k $$<\/p>\n\n\n\n<p>$$ =(I-K_k C_k )\\epsilon_{k-1} &#8211; K_k v_k $$<\/p>\n\n\n\n<p>Recall <span class=\"katex math inline\">y_k = C_k x + v_k<\/span> and <span class=\"katex math inline\">\\hat{x}k = \\hat{x}{k-1} + K_k (y_k &#8211; C_k \\hat{x}_{k-1})<\/span><\/p>\n\n\n\n<p>So, <span class=\"katex math inline\">\\epsilon_k \\epsilon_k^T<\/span> would be,<\/p>\n\n\n\n<p>$$\\epsilon_k \\epsilon_k^T = \\bigg((I-K_k C_k )\\epsilon_{k-1} &#8211; K_k v_k\\bigg)\\bigg((I-K_k C_k )\\epsilon_{k-1} &#8211; K_k v_k\\bigg)^T$$<\/p>\n\n\n\n<p>$P_k = \\mathbb{E}(\\epsilon_k \\epsilon_k^T)$, and $P_{k-1} = \\mathbb{E}(\\epsilon_{k-1} \\epsilon_{k-1}^T)$.<\/p>\n\n\n\n<p>$\\mathbb{E}(\\epsilon_{k-1} v_k^T) = \\mathbb{E}(\\epsilon_{k-1}) \\mathbb{E}(v_k^T) =0$ by the white noise property of $\\epsilon$ and $v$. However, $\\mathbb{E}(v_k v_k^T) = R_k$. Substituting all those into $P_k$, we would get,<\/p>\n\n\n\n<p>$$P_k = (I &#8211; K_k C_k)P_{k-1}(I &#8211; K_k C_k)^T + K_k R_k K_k^T$$<\/p>\n\n\n\n<p>$$ P_k = P_{k-1} &#8211; P_{k-1} C_k^T K_k^T &#8211; K_k C_k P_{k-1} + K_k C_k P_{k-1}C_k^T K_k^T + K_k R_k K_k^T $$<\/p>\n\n\n\n<p>$$W = tr(P_k)= tr(P_{k-1}) &#8211; tr(P_{k-1} C_k^T K_k^T) &#8211; tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1}C_k^T K_k^T) + tr(K_k R_k K_k^T) $$<\/p>\n\n\n\n<p>We take F.O.C. to solve for <span class=\"katex math inline\">K_k = arg\\min_{K_k} W_k = tr\\bigg( \\mathbb{E}(\\epsilon_k \\epsilon_k^T ) \\bigg) = tr(P_k)<\/span>, by letting <span class=\"katex math inline\">\\frac{\\partial W_k}{\\partial K_k} = 0<\/span>. See the <a href=\"https:\/\/www.math.uwaterloo.ca\/~hwolkowi\/matrixcookbook.pdf\">Matrix Cookbook<\/a> and find how to do derivatives w.r.t. <span class=\"katex math inline\">K_k<\/span>.<\/p>\n\n\n\n<p>$$\\frac{\\partial W_k}{\\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$<\/p>\n\n\n\n<p>We solve for <span class=\"katex math inline\">K_k<\/span>,<\/p>\n\n\n\n<p>$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$<\/p>\n\n\n\n<p>, we let <span class=\"katex math inline\">L_k = R_k + C_k P_{k-1} C_k^T<\/span>, and <span class=\"katex math inline\">L_k<\/span> has the following property <span class=\"katex math inline\">L_k = L_k^T<\/span> and <span class=\"katex math inline\">L_k^{-1} = (L_k^{-1})^T<\/span><\/p>\n\n\n\n<p>$$ K_k = P_{k-1} C_k^T L_k^{-1} $$<\/p>\n\n\n\n<p>Plug <span class=\"katex math inline\">K_k = P_{k-1} C_k^T K_k^{-1}<\/span> back into <span class=\"katex math inline\">P_k<\/span>.<\/p>\n\n\n\n<p>$$ P_k = P_{k-1} &#8211; K_kC_k P_{k-1} = (I-K_k C_k)P_{k-1} $$<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Summary<\/h4>\n\n\n\n<p>In the end, the Recursive Least Squared Method could be summarised as the following three equations.<\/p>\n\n\n\n<ul><li>1. Update the Gain Matrix.<\/li><\/ul>\n\n\n\n<p>$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$<\/p>\n\n\n\n<ul><li>2. Update the Estimate. <\/li><\/ul>\n\n\n\n<p>$$\\hat{x}_k = \\hat{x}_{k-1} + K_k (y_k &#8211; C_k \\hat{x}_{k-1})$$<\/p>\n\n\n\n<ul><li>3. Propagation of the estimation error covariance matrix by using this equation.<\/li><\/ul>\n\n\n\n<p> <span class=\"katex math multi-line\">(I-K_k C_k)P_{k-1}<\/span><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Reference<\/h4>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-fusion-of-engineering-control-coding-machine-learning-and-science wp-block-embed-fusion-of-engineering-control-coding-machine-learning-and-science\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"6oDzTHBiF0\"><a href=\"https:\/\/aleksandarhaber.com\/introduction-to-kalman-filter-derivation-of-the-recursive-least-squares-method-with-python-codes\/\">Introduction to Kalman Filter: Derivation of the Recursive Least Squares Method<\/a><\/blockquote><iframe class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;Introduction to Kalman Filter: Derivation of the Recursive Least Squares Method&#8221; &#8212; Fusion of Engineering, Control, Coding, Machine Learning, and Science\" src=\"https:\/\/aleksandarhaber.com\/introduction-to-kalman-filter-derivation-of-the-recursive-least-squares-method-with-python-codes\/embed\/#?secret=Lc36Z4vYQb#?secret=6oDzTHBiF0\" data-secret=\"6oDzTHBiF0\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Consider a Linear Equation, $$ y_i = \\sum_{j=1}^n C_{i,j} x_j +v_i,\\quad i=1,2,\u2026$$ , where C_{i,j} are scalars and v_i\\in \\mathbb{R} is the measurement noise. The noise is unknown, while we assume it follows certain patterns (the assumptions are due to some statistical properties of the noise). We assume v_i, v_j are independent for i\\neq j. &hellip; <a href=\"https:\/\/fanyuzhao.com\/?p=5592\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Least Squares Method &#8211; Intro to Kalman Filter<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,18,26],"tags":[],"_links":{"self":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts\/5592"}],"collection":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5592"}],"version-history":[{"count":6,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts\/5592\/revisions"}],"predecessor-version":[{"id":5598,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts\/5592\/revisions\/5598"}],"wp:attachment":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5592"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5592"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5592"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}