Bernoulli’s Inequality

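Bernoulli’s inequality states that \((1+x)^n\geq 1+nx\) for every integer \(n\geq 1\) and real \(x\geq -1\). Near \(x=0\) the bound is also a good approximation:

$$(1+x)^n \approx 1+nx \quad \text{as } x\rightarrow 0$$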

Proof:

Taylor expansion at \(x=0\):

$$ (1+x)^n = (1+x)^n\big|_{x=0} + \frac{d (1+x)^n}{d x}\Big|_{x=0} (x-0) + \frac{1}{2!}\frac{d^2 (1+x)^n}{d x^2}\Big|_{x=0} (x-0)^2 + O(x^3) $$

$$ = 1 + nx + \frac{n(n-1)}{2} x^2 + O(x^3)$$

$$ = 1 + nx + O(x^2)$$
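Here is a quick numerical sanity check of the expansion, written as a small Python sketch. The choice \(n=5\) and the sample points are arbitrary. The gap between \((1+x)^n\) and \(1+nx\) should shrink like \(x^2\), with \(\text{gap}/x^2\) approaching \(n(n-1)/2\). By the inequality, the gap should also never be negative for \(x\geq -1\):

```python
def bernoulli_gap(x: float, n: int) -> float:
    """Exact value (1 + x)^n minus its first-order approximation 1 + n x."""
    return (1 + x) ** n - (1 + n * x)

n = 5
for x in (0.1, 0.01, 0.001):
    gap = bernoulli_gap(x, n)
    # gap / x^2 should approach n(n-1)/2 = 10 as x -> 0
    print(f"x={x:<6} gap={gap:.3e} gap/x^2={gap / x ** 2:.4f}")

# Bernoulli's inequality: the gap is never negative for x >= -1 (integer n >= 1)
print(all(bernoulli_gap(x, n) >= 0 for x in (-1.0, -0.5, 0.0, 0.3, 2.0)))
```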

Isoquant

Consider an isoquant map in which production output satisfies Q3 > Q2 > Q1. Typically, inputs X and Y refer to labour and capital respectively. More of input X, input Y, or both is required to move from isoquant Q1 to Q2, or from Q2 to Q3.

The MRTS equals the absolute value of the slope of the isoquant.

Difference from the Indifference Curve

Isoquants and indifference curves behave similarly, since both are contour curves. The difference is that an isoquant maps output, whereas an indifference curve maps utility.

In addition, an indifference curve describes only an individual's preferences. It does not capture a meaningful numerical value of utility, because utility is only ordinal. Preference is the relative desire for certain goods or services over others. An isoquant, by contrast, records an exact quantity of output.

Shape of the Isoquant

The shape of the isoquant depends on whether the inputs are substitutes or complements.

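Example of an isoquant map with two inputs that are perfect substitutes: the isoquants are straight lines, so the MRTS is constant.
Example of an isoquant map with two inputs that are perfect complements: the isoquants are L-shaped, because the inputs are used in fixed proportions.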

Convexity

Because we usually assume diminishing marginal returns, the MRTS normally declines as one input is substituted for the other. Thus, the isoquant is convex to the origin.

However, the isoquant can be non-convex in two cases. One is increasing returns to scale. The other is a negative elasticity of substitution: as the ratio of input A to input B increases, the marginal product of A relative to B increases rather than decreases.

A nonconvex isoquant tends to produce large and discontinuous changes in the cost-minimising input mix in response to price changes. Consider, for example, a globally nonconvex isoquant with a linear isocost line. In this case the minimum-cost mix of inputs is a corner solution that uses only one input (either input A or input B). Which input is used depends on relative prices. At some critical price ratio, a small change in relative prices shifts the optimal input mix from all input A to all input B, or vice versa.
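The corner-solution behaviour can be illustrated with a small Python sketch. It assumes a hypothetical production function \(Q=\sqrt{A^2+B^2}\). Its isoquants are quarter circles bowed away from the origin, so they are globally nonconvex. The sketch finds the cheapest point on the isoquant \(Q=1\) by brute-force search along the curve:

```python
import math

def cheapest_mix(price_a, price_b, q=1.0, steps=10_000):
    """Brute-force cost minimisation on the hypothetical nonconvex isoquant
    sqrt(A^2 + B^2) = q, parametrised as A = q cos t, B = q sin t."""
    best = None
    for i in range(steps + 1):
        t = (math.pi / 2) * i / steps          # angle along the quarter circle
        a, b = q * math.cos(t), q * math.sin(t)
        cost = price_a * a + price_b * b       # linear isocost
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best

# A small change in relative prices flips the optimum between the two corners.
for pa, pb in [(1.0, 1.1), (1.1, 1.0)]:
    cost, a, b = cheapest_mix(pa, pb)
    print(f"p_A={pa}, p_B={pb}: A={a:.3f}, B={b:.3f}, cost={cost:.3f}")
```

Along this isoquant the cost function is concave in the angle. Its minimum therefore always sits at an endpoint, which is why the search never returns an interior mix.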

Reference

Learned from Wikipedia.

https://en.wikipedia.org/wiki/Isoquant

Lagrange Multiplier

Here is a review of the Lagrangian method. Maximising a utility function subject to a budget constraint with a Lagrangian also yields the MRS condition.

$$\max_{x,y} U(x,y)\quad s.t.\quad BC$$

For example, with a Cobb-Douglas utility function:

$$\max_{x,y} x^a y^b\quad s.t.\quad p_x x+p_y y\leq w $$

Using the Lagrange Multiplier,

$$\mathcal{L}=x^a y^b +\lambda (w-p_x x- p_y y)$$

Check complementary slackness first. Since utility increases in both goods, the budget constraint binds and \(\lambda>0\). Then take the F.O.C.

$$ \frac{\partial \mathcal{L}}{\partial x}=0 \Rightarrow a x^{a-1}y^b=\lambda p_x $$

$$ \frac{\partial \mathcal{L}}{\partial y}=0 \Rightarrow x^a b y^{b-1}=\lambda p_y $$

Dividing the two equations gives,

$$ \frac{MU_x}{MU_y}=\frac{ay}{bx}=\frac{p_x}{p_y}=MRS_{x,y} $$

Once we know the Marshallian demand \(x=f(p_x,p_y,w)\), we can calculate its elasticities.

  • \(\varepsilon=\frac{\partial x}{\partial p_x}\frac{p_x}{x}\), elasticity to price of x.
  • \(\varepsilon_I=\frac{\partial x}{\partial w}\frac{w}{x}\), elasticity to wealth.
  • \( \varepsilon_{xy}=\frac{\partial x}{\partial p_y}\frac{p_y}{x} \), elasticity to price of y.
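In the Cobb-Douglas case, the F.O.C. and the budget constraint give the closed-form demand \(x=\frac{a}{a+b}\frac{w}{p_x}\). This is a standard result, stated here without the derivation. The Python sketch below uses hypothetical values \(a=0.3\), \(b=0.7\), \(p_x=2\), \(p_y=5\), \(w=100\). It estimates the three elasticities by central finite differences, and we expect roughly \(-1\), \(0\) and \(1\):

```python
def demand_x(px, py, w, a=0.3, b=0.7):
    """Cobb-Douglas Marshallian demand for x: a/(a+b) * w/px (does not depend on py)."""
    return a / (a + b) * w / px

def elasticity(f, args, i, h=1e-6):
    """Point elasticity of f w.r.t. its i-th argument: (df/dz) * z / f, central differences."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    slope = (f(*up) - f(*down)) / (2 * h)
    return slope * args[i] / f(*args)

point = (2.0, 5.0, 100.0)                # (p_x, p_y, w), hypothetical values
own = elasticity(demand_x, point, 0)     # epsilon: about -1
cross = elasticity(demand_x, point, 1)   # epsilon_xy: 0
income = elasticity(demand_x, point, 2)  # epsilon_I: about 1
print(own, cross, income)
```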

Meaning of Lambda

Recall the graphical version of the utility maximisation problem. The budget constraint is the black plane, the utility function is the green surface, and the level of utility is read off the contours of the utility function.

After solving the utility maximisation problem, we get \(x^*\) and \(y^*\) (they have exact values). Plugging them back into the F.O.C., we can easily get the numerical value of \(\lambda\).

Since \(\frac{\partial \mathcal{L}}{\partial w}=\lambda\), by the envelope theorem \(\lambda\) measures how much the maximised utility changes when wealth increases by one unit. In other words, it is the marginal utility of wealth.

\(\lambda\) is like the slope of maximised utility with respect to wealth. As wealth increases, the budget constraint (the black wall) moves outwards. This raises the attainable utility, given by where the utility surface meets the budget constraint surface.

Similarly, if the utility function is replaced by a production function, \(\lambda\) has the analogous meaning for output.
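This interpretation can be checked numerically. The Python sketch below uses hypothetical parameters \(a=0.3\), \(b=0.7\), \(p_x=2\), \(p_y=5\), \(w=100\) and the standard closed-form Cobb-Douglas demand \(x=\frac{a}{a+b}\frac{w}{p_x}\), \(y=\frac{b}{a+b}\frac{w}{p_y}\). It computes \(\lambda\) from the F.O.C. for \(x\) and compares it with the finite-difference slope of maximised utility with respect to wealth:

```python
def demand(a, b, px, py, w):
    """Closed-form Cobb-Douglas demand for U = x^a y^b."""
    return a / (a + b) * w / px, b / (a + b) * w / py

def utility(x, y, a, b):
    return x ** a * y ** b

a, b, px, py, w = 0.3, 0.7, 2.0, 5.0, 100.0   # hypothetical parameters
x, y = demand(a, b, px, py, w)
lam = a * x ** (a - 1) * y ** b / px          # from dL/dx = 0: a x^(a-1) y^b = lambda p_x

def max_utility(wealth):
    """Value function V(w): utility at the optimal bundle."""
    return utility(*demand(a, b, px, py, wealth), a, b)

h = 1e-4
dV_dw = (max_utility(w + h) - max_utility(w - h)) / (2 * h)
print(f"lambda = {lam:.6f}, dV/dw = {dV_dw:.6f}")   # the two agree
```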

Geometric Meaning

At the optimum, the gradient of the objective \(f\) points in the same direction as the gradient of the constraint \(g\), because the contour of \(f\) is tangent to the constraint. The two gradients are therefore proportional, \(\nabla f=\lambda\nabla g\), and \(\lambda\) is the proportionality factor.

In other words, solving the Lagrangian system gives the optimal values of \(x\) and \(y\). The multiplier \(\lambda\) gives the rate at which the optimal value of the objective function \(f\) changes when the constraint \(g\) is relaxed.

Lagrange Multiplier:

Simultaneously solve \(\nabla f=\lambda\nabla g\), and \(g=0\). \(f\) here is the objective function (utility function in our case), and \(g\) here is the constraint (the budget constraint in our case).
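Here is a geometric check on a toy problem that is not from the text: maximise \(f(x,y)=x+y\) subject to \(g(x,y)=x^2+y^2-1=0\). In the Python sketch below, a brute-force search along the unit circle finds the optimum. At that point \(\nabla f\) and \(\nabla g\) should be parallel, with \(\lambda=1/\sqrt{2}\):

```python
import math

def f(x, y):
    return x + y

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    # g(x, y) = x^2 + y^2 - 1
    return (2 * x, 2 * y)

# Brute-force search over points on the constraint g = 0 (the unit circle).
steps = 100_000
angles = (2 * math.pi * i / steps for i in range(steps))
t_best = max(angles, key=lambda t: f(math.cos(t), math.sin(t)))
x, y = math.cos(t_best), math.sin(t_best)

gf, gg = grad_f(x, y), grad_g(x, y)
lam = gf[0] / gg[0]                      # the lambda in grad f = lambda * grad g
cross = gf[0] * gg[1] - gf[1] * gg[0]    # zero when the two gradients are parallel
print(f"x*={x:.6f}, y*={y:.6f}, lambda={lam:.6f}, cross={cross:.1e}")
```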

Reference

Thanks to Professor Burkey's video, which helped me a lot in rethinking the meaning of lambda.

https://www.youtube.com/watch?v=O3MFXT7AdPg

And for the geometric meaning of the Lagrange multiplier method:

https://www.youtube.com/watch?v=8mjcnxGMwFo

MRS and MRTS

Derivations

We here derive why \(MRS_{x,y}=\frac{MU_x}{MU_y}\).

Let \(U(x,y)=f(x,y)\). By definition, the MRS measures how many units of y a consumer is willing to give up for one more unit of x while holding utility constant. Thus, we keep utility fixed at \(U(x,y)=C\). We then totally differentiate with respect to \(x\), treating \(y\) as a function of \(x\) along the indifference curve, and find \(-dy/dx\).

$$\frac{d}{dx}f(x,y(x))=\frac{d}{dx}C=0$$

$$ \frac{\partial f(x,y)}{\partial x}+\frac{\partial f(x,y)}{\partial y}\frac{dy}{dx}=0 $$

$$\frac{dy}{dx}=-\frac{\partial f(x,y)/\partial x}{\partial f(x,y)/\partial y}=-\frac{MU_x}{MU_y}$$

Therefore,

$$MRS_{x,y}=\left|\frac{dy}{dx}\right|=-\frac{dy}{dx}=\frac{MU_x}{MU_y} $$

Example 1

$$U=x^2+y^2$$

$$MRS_{x,y}=\frac{MU_x}{MU_y} =\frac{x}{y}$$

Example 2

$$U=x\cdot y$$

, which is the Cobb-Douglas form with both exponents equal to one.

$$MRS_{x,y}=\frac{MU_x}{MU_y} =\frac{y}{x}$$

Example 3

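Perfect substitutes (e.g. \(U=ax+by\)): the MRS is constant, \(a/b\).
Perfect complements (e.g. \(U=\min(ax,by)\)): the indifference curves are L-shaped. The MRS is 0 on the horizontal segment, infinite on the vertical segment, and undefined at the kink.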
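The derivation can be checked numerically. The Python sketch below uses helper names of my own. It traces the indifference curve through a point by bisection, takes \(-dy/dx\) by finite differences, and compares the result with \(MU_x/MU_y\) for Examples 1 and 2. It assumes \(U\) is increasing in \(y\) for \(y>0\):

```python
def y_on_curve(u, x, level, lo=1e-9, hi=1e6, iters=200):
    """Bisection for the y with u(x, y) = level (assumes u is increasing in y)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if u(x, mid) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mrs_from_curve(u, x, y, h=1e-5):
    """-dy/dx along the indifference curve through (x, y)."""
    level = u(x, y)
    slope = (y_on_curve(u, x + h, level) - y_on_curve(u, x - h, level)) / (2 * h)
    return -slope

def mrs_from_marginal_utilities(u, x, y, h=1e-6):
    """MU_x / MU_y with partial derivatives by central differences."""
    mu_x = (u(x + h, y) - u(x - h, y)) / (2 * h)
    mu_y = (u(x, y + h) - u(x, y - h)) / (2 * h)
    return mu_x / mu_y

def u_example1(x, y):
    return x ** 2 + y ** 2   # Example 1: MRS = x / y

def u_example2(x, y):
    return x * y             # Example 2: MRS = y / x

for u, formula in [(u_example1, 2.0 / 3.0), (u_example2, 3.0 / 2.0)]:
    print(mrs_from_curve(u, 2.0, 3.0), mrs_from_marginal_utilities(u, 2.0, 3.0), formula)
```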

MRTS

The Marginal Rate of Technical Substitution (MRTS) measures the rate at which one input can be substituted for another while keeping output constant.

$$MRTS_{K,L}=-\frac{\Delta K}{\Delta L}=-\frac{d K}{d L}=\frac{MP_L}{MP_K}$$

How to derive that?

Recall that an isoquant is a contour line of the production function, and the MRTS is the absolute value of its slope. Let,

$$Q=L^a K^b$$

Then,

$$MP_K=\frac{\partial Q}{\partial K}=b L^a K^{b-1}$$

$$MP_L=\frac{\partial Q}{\partial L}=a L^{a-1}K^b$$

$$MRTS=\frac{MP_L}{MP_K}=\frac{ a L^{a-1}K^b }{ b L^a K^{b-1} }=\frac{aK}{bL}$$

In short, the MRTS is the production counterpart of the MRS.

Cobb-Douglas Function

Cobb-Douglas Utility function

$$U=C x^a y^b$$

When we use a Cobb-Douglas utility function, we are really using it as a proxy for people's preferences. A utility function is a mathematical representation of preferences, provided those preferences are rational. In the utility function, what we mainly focus on is the Marginal Rate of Substitution between goods.

$$MRS_{x,y}=\frac{MU_x}{MU_y}=\frac{\partial U/\partial x}{\partial U/\partial y}=\frac{Cax^{a-1}y^b}{Cx^a by^{b-1}}$$

$$MRS_{x,y}=\frac{ay}{bx}$$

P.S. Cobb-Douglas is the limiting case of the CES utility function as \(\rho\rightarrow 0\), so the two give the same MRS in that limit. When solving the utility maximisation problem, we take partial derivatives of the Lagrangian and solve the resulting equations. Dividing the first two F.O.C.s is exactly the calculation of the MRS.

The key point is that the numerical value of a utility function does not matter. What matters is the preference it represents. Any positive monotonic transformation leaves the preference unchanged, for example taking the logarithm or the square root, or multiplying by a positive constant.

Exponents Do Not Matter

The powers of the Cobb-Douglas function do not really matter as long as they are in the same ratio. For example,

$$ U_1=Cx^7y^1,\quad and \quad U_2=Cx^{7/8}y^{1/8} $$

$$MRS_1=\frac{7y}{x}\quad and \quad MRS_2=\frac{7y/8}{x/8}=\frac{7y}{x}$$

Therefore, we can find that those two utility functions represent the same preference!

Equivalently, \(U_1=(U_2)^8 \cdot C^{-7}\). Raising a positive function to a positive power and multiplying by a positive constant are both positive monotonic transformations. So the exponents of a Cobb-Douglas function matter only through their ratio. For example, \(U=Cx^a y^{1-a}\) is just one normalisation, and the exponents of the utility function do not have to sum to one.

$$U=x^a y^b \Leftrightarrow x^{\frac{a}{a+b}}y^{\frac{b}{a+b}}$$
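Here is a quick Python check of the claim, using \(C=2\) as an arbitrary constant. \(U_1\) and \(U_2\) should give the same MRS (\(7y/x\)) and rank randomly drawn bundles identically:

```python
import random

C = 2.0

def u1(x, y):
    return C * x ** 7 * y

def u2(x, y):
    return C * x ** (7 / 8) * y ** (1 / 8)

def mrs(u, x, y, h=1e-6):
    """MU_x / MU_y by central finite differences."""
    mu_x = (u(x + h, y) - u(x - h, y)) / (2 * h)
    mu_y = (u(x, y + h) - u(x, y - h)) / (2 * h)
    return mu_x / mu_y

print(mrs(u1, 2.0, 3.0), mrs(u2, 2.0, 3.0), 7 * 3.0 / 2.0)   # all about 10.5

# Same preference ordering: compare many random pairs of bundles.
rng = random.Random(1)
bundles = [(rng.uniform(0.5, 3), rng.uniform(0.5, 3)) for _ in range(1000)]
same_order = all(
    (u1(*p) > u1(*q)) == (u2(*p) > u2(*q))
    for p, q in zip(bundles, bundles[1:])
)
print(same_order)
```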

Constant Elasticity of Substitution

A CES function can be either a production function or a utility function. Its defining feature is a constant elasticity of substitution. This gives a clear picture of how readily producers or consumers substitute between inputs or goods.

CES Production

The two-factor (capital, labour) CES production function was introduced by Solow and later made popular by Arrow, Chenery, Minhas and Solow (1961).

$$Q=A\cdot(\alpha K^{-\rho}+(1-\alpha)L^{-\rho})^{-\frac{1}{\rho}}$$

  • \(\alpha\) is the distribution (share) parameter, which governs the relative weight of K and L.
  • \(\rho=\frac{1-\sigma}{\sigma}\) is the substitution parameter.
  • \(\sigma=\frac{1}{1+\rho}\) is the elasticity of substitution.

When identical producers maximise profits in competitive markets, each factor is paid its marginal product (with the output price normalised to one):

$$MP_L=\frac{\partial Q}{\partial L}=w$$

$$MP_K=\frac{\partial Q}{\partial K}=r$$

So we get,

$$ \frac{w}{r}=\frac{1-\alpha}{\alpha}(\frac{K}{L})^{\rho+1} $$

$$\frac{K}{L}=(\frac{\alpha}{1-\alpha}\frac{w}{r})^{\frac{1}{1+\rho}}$$

Here, the ratio of K to L is a function of the factor prices w and r. Since we are studying the elasticity of substitution, i.e. how K/L responds to w/r, we take derivatives. Denote \(V=K/L\) and \(Z=w/r\). Then,

$$V=(\frac{\alpha}{1-\alpha}Z)^{\frac{1}{1+\rho}}$$

The Elasticity of Substitution (the percentage change of K/L in terms of the percentage change of w/r) is,

$$ \sigma=\frac{dV/V}{dZ/Z}=\frac{dV}{dZ}\frac{Z}{V}=\frac{1}{1+\rho} $$

Therefore, the elasticity of substitution is constant and depends only on \(\rho\). This gives three cases:

  • If \(-1<\rho<0\), then \(\sigma>1\).
  • If \(0<\rho<\infty\), then \(\sigma<1\).
  • If \(\rho=0\), then, \(\sigma=1\).
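The Python sketch below checks the two steps above numerically, using hypothetical parameters \(A=1\), \(\alpha=0.4\), \(\rho=0.5\). First, at the \(K/L\) implied by \(w/r=2\), the finite-difference ratio \(MP_L/MP_K\) should equal 2. Second, the log-slope of \(K/L\) with respect to \(w/r\) should equal \(1/(1+\rho)\):

```python
import math

def ces(K, L, A=1.0, alpha=0.4, rho=0.5):
    """CES production Q = A (alpha K^-rho + (1 - alpha) L^-rho)^(-1/rho)."""
    return A * (alpha * K ** -rho + (1 - alpha) * L ** -rho) ** (-1 / rho)

def marginal_products(K, L, h=1e-6, **params):
    """(MP_K, MP_L) by central finite differences."""
    mp_k = (ces(K + h, L, **params) - ces(K - h, L, **params)) / (2 * h)
    mp_l = (ces(K, L + h, **params) - ces(K, L - h, **params)) / (2 * h)
    return mp_k, mp_l

def capital_labour_ratio(w_over_r, alpha=0.4, rho=0.5):
    """K/L implied by the F.O.C.: (alpha / (1 - alpha) * w/r)^(1 / (1 + rho))."""
    return (alpha / (1 - alpha) * w_over_r) ** (1 / (1 + rho))

# 1) At the K/L implied by w/r = 2, MP_L / MP_K should be 2.
v = capital_labour_ratio(2.0)
mp_k, mp_l = marginal_products(v, 1.0)
print(mp_l / mp_k)

# 2) sigma = d ln(K/L) / d ln(w/r) should equal 1 / (1 + rho).
for rho in (-0.5, 0.5, 2.0):
    sigma = (math.log(capital_labour_ratio(4.0, rho=rho))
             - math.log(capital_labour_ratio(2.0, rho=rho))) / math.log(2.0)
    print(rho, sigma, 1 / (1 + rho))
```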

Utility Function

Marginal Rate of Substitution (MRS) measures the substitution rate between two goods while holding the utility constant. The elasticity between X and Y could be defined as the following,

$$ Elasticity=\frac{\%\Delta Y}{\% \Delta X}=\frac{\Delta Y/Y}{\Delta X/X}=\frac{X/Y}{\Delta X/\Delta Y} $$

The elasticity of substitution here measures how easy it is to substitute between goods x and y. It is the percentage change in the ratio of the quantities of the two goods relative to the percentage change in the ratio of their marginal prices (the MRS). In the utility function case, we can apply the formula,

$$\sigma=\frac{\Delta \ln(Y/X)}{\Delta \ln(MRS_{X,Y})}=\frac{\Delta \ln(Y/X)}{\Delta \ln(U_X/U_Y)}$$

$$\sigma=\frac{\frac{\Delta(Y/X)}{Y/X}}{\frac{\Delta (p_x/p_y)}{p_x/p_y}}$$

  • \(U_X=\frac{\partial U}{\partial X}=\lambda p_x\) at the optimum (from the F.O.C.), and similarly \(U_Y=\lambda p_y\).
  • \(MRS_{X,Y}=-\frac{dY}{dX}=\frac{U_X}{U_Y}=\frac{p_x}{p_y}\), the marginal price ratio in equilibrium.

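For the CES utility function, note that it is written here with \(+\rho\) rather than the \(-\rho\) in the production function above, which is why \(\sigma\) takes a different form: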

$$ u(x,y)=(a x^{\rho}+b y^{\rho})^{1/\rho} $$

$$\sigma=\frac{1}{1-\rho}$$

If \(\rho\rightarrow 1\), then \(\sigma\rightarrow \infty\).

If \(\rho\rightarrow -\infty\), then \(\sigma\rightarrow 0\).

Two well-known special cases of the CES function are:
  • the Walras-Leontief-Harrod-Domar (fixed-proportions) function, with \(\sigma=0\);
  • the Cobb-Douglas function, with \(\sigma=1\).
P.S. Cobb-Douglas is restrictive because its elasticity of substitution always equals one.

When \(\rho=1\), the utility function is linear, i.e. the goods are perfect substitutes.

As \(\rho\rightarrow 0\), the utility function approaches the Cobb-Douglas form.

Later, the CES utility function can be used to derive the Marshallian demand functions, the indirect utility function, and so on. It is also easy to show that the indirect utility function \(V(p_x,p_y,w)\) is homogeneous of degree 0 in \((p_x,p_y,w)\).
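The Python sketch below puts these pieces together for hypothetical parameters \(a=1\), \(b=2\), \(\rho=0.5\) (so \(\sigma=2\)). It derives the Marshallian demand from the tangency condition \(MRS=p_x/p_y\) and the budget constraint. It then checks three things: the demand against a brute-force search along the budget line, the elasticity of substitution, and homogeneity of degree zero of the indirect utility function:

```python
import math

def ces_utility(x, y, a=1.0, b=2.0, rho=0.5):
    """u = (a x^rho + b y^rho)^(1/rho), for rho < 1 and rho != 0."""
    return (a * x ** rho + b * y ** rho) ** (1 / rho)

def ces_demand(px, py, w, a=1.0, b=2.0, rho=0.5):
    """Marshallian demand: MRS = (a/b)(x/y)^(rho-1) = px/py gives
    y/x = (b px / (a py))^sigma with sigma = 1/(1-rho); the budget then fixes x."""
    sigma = 1 / (1 - rho)
    ratio = (b * px / (a * py)) ** sigma          # y / x
    x = w / (px + py * ratio)
    return x, ratio * x

def indirect_utility(px, py, w, **params):
    """V(px, py, w) = u(x*, y*)."""
    return ces_utility(*ces_demand(px, py, w, **params), **params)

px, py, w = 2.0, 3.0, 100.0
x_star, y_star = ces_demand(px, py, w)

# Brute-force check: search along the budget line for the best bundle.
N = 200_000
grid = [w / px * i / N for i in range(1, N)]
x_grid = max(grid, key=lambda x: ces_utility(x, (w - px * x) / py))

# Elasticity of substitution: d ln(y/x) / d ln(px/py) should be 1/(1-rho) = 2.
r1, r2 = ces_demand(2.0, 3.0, w), ces_demand(4.0, 3.0, w)
sigma_hat = (math.log(r2[1] / r2[0]) - math.log(r1[1] / r1[0])) / math.log(2.0)

# Homogeneity of degree zero: scaling all prices and wealth leaves V unchanged.
v1, v2 = indirect_utility(2.0, 3.0, 100.0), indirect_utility(20.0, 30.0, 1000.0)
print(x_star, x_grid, sigma_hat, v1, v2)
```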

Reference

Arrow, K.J., Chenery, H.B., Minhas, B.S. and Solow, R.M., 1961. Capital-labor substitution and economic efficiency. The Review of Economics and Statistics, 43(3), pp.225-250.

Causal Inference in Statistics

Basic idea: correlation does not imply causality.

  • If X and Y are statistically dependent, X does not necessarily cause Y (or Y cause X): correlation does not imply causation.
  • If X causes Y, then X and Y are very likely to be statistically dependent (but not always; there are extreme cases): causation generally does imply correlation.

Study 1. Three Basic Structures (Chain, Fork, Collider):

  • Chain

$$ X\rightarrow Y\rightarrow Z $$

Z and X are likely dependent. However, Z and X are independent, conditional on Y.

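$$ P(Z=z|X=x,Y=c)=P(Z=z|Y=c) $$

For example, take the structural equations:

\(f_X: X=u_X\)

\(f_Y: Y=84-X+u_Y := c\)

\(f_Z: Z=100\underbrace{Y}_{c}+u_Z\)

With Y fixed at c, \(Z=100c+u_Z\) no longer depends on X, so Z and X are independent conditional on Y.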

Therefore, in the chain:

$$ X\not\bot Z$$

$$ X\bot Z|Y $$

  • Fork

$$ Y\leftarrow X\rightarrow Z $$

Y and Z are likely dependent. However, Y and Z are independent conditional on X.

$$P(Z=z|Y=y, X=c)=P(Z=z|X=c)$$

Conditioning on the common cause X makes Z and Y independent.

$$Y\bot Z|X$$

  • Collider

$$ X\rightarrow Z\leftarrow Y $$

X and Y are independent. However, X and Y are dependent conditional on Z.

$$ P(X=x|Y=y, Z=c)\neq P(X=x|Z=c) $$

For example:

Suppose \(Z=X+Y+u_Z\) and we know \(Z=c\). Then \(X=c-Y-u_Z\), so X and Y become dependent conditional on \(Z=c\). Unconditionally, \(X=u_X\) and \(Y=u_Y\) are independent.

Conditioning on \(Z\) opens the path between X and Y. Without conditioning, the path is blocked and X and Y are independent.

P.S. Conditioning on a descendant \(W\) of the collider \(Z\) also makes X and Y dependent:

$$ X\ (\text{or}\ Y)\rightarrow Z\rightarrow W $$

Similarly, in the collider:

$$ X\bot Y $$

$$ X\not\bot Y\mid Z $$

$$ X\not\bot Y\mid W $$

  • See notes for further studies.
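The three structures can be illustrated with a simulation sketch in Python. The chain uses the structural equations above, with standard normal noise as an assumption. The fork equations are hypothetical linear examples, and the collider uses \(Z=X+Y+u_Z\). Conditioning is approximated by linear partial correlation (regressing out the conditioning variable). This is adequate for these linear-Gaussian models but not in general:

```python
import random
import statistics

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residual(a, given):
    """Residuals from a simple linear regression of a on `given`."""
    ma, mg = statistics.fmean(a), statistics.fmean(given)
    slope = (sum((g - mg) * (x - ma) for g, x in zip(given, a))
             / sum((g - mg) ** 2 for g in given))
    return [x - ma - slope * (g - mg) for x, g in zip(a, given)]

def partial_corr(a, b, given):
    """Correlation of a and b after linearly removing the effect of `given`."""
    return corr(residual(a, given), residual(b, given))

rng = random.Random(0)
n = 20_000

def noise():
    return rng.gauss(0.0, 1.0)

# Chain X -> Y -> Z, using the structural equations from the text.
X = [noise() for _ in range(n)]
Y = [84 - x + noise() for x in X]
Z = [100 * y + noise() for y in Y]
chain = (corr(X, Z), partial_corr(X, Z, Y))

# Fork Y <- X -> Z (hypothetical linear equations).
X = [noise() for _ in range(n)]
Y = [x + noise() for x in X]
Z = [x + noise() for x in X]
fork = (corr(Y, Z), partial_corr(Y, Z, X))

# Collider X -> Z <- Y (Z = X + Y + u_Z).
X = [noise() for _ in range(n)]
Y = [noise() for _ in range(n)]
Z = [x + y + noise() for x, y in zip(X, Y)]
collider = (corr(X, Y), partial_corr(X, Y, Z))

for name, (marginal, conditional) in [("chain", chain), ("fork", fork), ("collider", collider)]:
    print(f"{name:8s} corr = {marginal:+.3f}, partial corr given the middle node = {conditional:+.3f}")
```

The chain and fork show dependence that disappears after conditioning. The collider shows the reverse pattern.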

Reference

Pearl, J., Glymour, M. and Jewell, N.P., 2016. Causal inference in statistics: A primer. John Wiley & Sons.

Math Tools

1. Homogeneous of Degree \(k\)

Definition (Homogeneity of degree \(k\)). A utility function \(u:\mathbb{R}^n\rightarrow \mathbb{R}\) is homogeneous of degree \(k\) if and only if for all \(x \in \mathbb{R}^n\) and all \(\lambda>0\), \(u(\lambda x)=\lambda^ku(x)\).

$$f(\lambda x_1,…,\lambda x_n)=\lambda^kf(x_1,…,x_n)$$

Property

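  1. Returns to scale: a production function that is homogeneous of degree 1 exhibits constant returns to scale (CRTS). One that is homogeneous of degree \(k>1\) exhibits increasing returns (IRTS), and one that is homogeneous of degree \(k<1\) exhibits decreasing returns (DRTS).
  2. The Marshallian demand is homogeneous of degree zero in prices and wealth: \(x(\lambda p,\lambda w)=x(p,w)\) (maximise \(u(x)\) s.t. \(p\cdot x\leq w\)). This is the “No Money Illusion” property.
  3. Excess demand is also homogeneous of degree zero in prices. This follows easily from the homogeneity of the Marshallian demand.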

$$CRTS:\quad F(aK,aL)=aF(K,L) \quad a>0$$

$$IRTS:\quad F(aK,aL)>aF(K,L) \quad a>1$$

$$DRTS:\quad F(aK,aL)<aF(K,L) \quad a>1$$
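Here is a small Python sketch that classifies returns to scale for a Cobb-Douglas function \(F(K,L)=K^aL^b\), which is homogeneous of degree \(a+b\). It compares \(F(2K,2L)\) with \(2F(K,L)\) at one point, which is enough for a homogeneous function. The exponents and the evaluation point are arbitrary:

```python
def cobb_douglas(a, b):
    """Return F(K, L) = K^a L^b, which is homogeneous of degree a + b."""
    def F(K, L):
        return K ** a * L ** b
    return F

def returns_to_scale(F, K=4.0, L=9.0, scale=2.0, tol=1e-9):
    """Compare F(scale*K, scale*L) with scale*F(K, L) at one point."""
    lhs, rhs = F(scale * K, scale * L), scale * F(K, L)
    if abs(lhs - rhs) <= tol * rhs:
        return "CRTS"
    return "IRTS" if lhs > rhs else "DRTS"

for a, b in [(0.3, 0.7), (0.6, 0.7), (0.3, 0.5)]:
    print(a, b, returns_to_scale(cobb_douglas(a, b)))
```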

2. Euler’s Theorem

Theorem (Euler’s Theorem) Let \(f(x_1,…,x_n)\) be a function that is homogeneous of degree k. Then,

$$ x_1\frac{\partial f(x)}{\partial x_1}+…+ x_n\frac{\partial f(x)}{\partial x_n} =kf(x) $$

or, in gradient notation,

$$ x\cdot \nabla f(x)=kf(x) $$

Proof: Differentiate \(f(tx_1,…,tx_n)=t^k f(x_1,…,x_n)\) w.r.t \(t\) and then set \(t=1\).

P.S. We use Euler’s Theorem in the proof of the Solow Model.
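Euler's theorem can be checked numerically with finite-difference partial derivatives. The example below uses the Cobb-Douglas \(Q=K^{0.3}L^{0.5}\), which is homogeneous of degree 0.8; the exponents are chosen arbitrarily:

```python
def euler_sides(f, x, k, h=1e-6):
    """Return (sum_i x_i * df/dx_i, k * f(x)), with partials by central differences."""
    lhs = 0.0
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        lhs += x[i] * (f(up) - f(down)) / (2 * h)
    return lhs, k * f(x)

def q(v):
    """Cobb-Douglas Q = K^0.3 L^0.5, homogeneous of degree 0.8."""
    K, L = v
    return K ** 0.3 * L ** 0.5

lhs, rhs = euler_sides(q, [4.0, 9.0], k=0.8)
print(lhs, rhs)   # K*MP_K + L*MP_L versus 0.8*Q
```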

3. Envelope Theorem

Motivation:

Given \(y=ax^2+bx+c\) with \(a<0\) and \(b,c \in \mathbb{R}\), how does a change in the parameter \(a\) affect the maximum value of \(y\), \(y^*\)?

We first define \(y^*=\max_{x} y= \max_{x}\, ax^2+bx+c\). The solution is \(x^*=-\frac{b}{2a}\), and plugging it back into \(y\) gives \(y^*=f(x^*)=\frac{4ac-b^2}{4a}\). Taking the derivative w.r.t. \(a\) gives \(\frac{\partial y^*}{\partial a}=\frac{b^2}{4a^2}\). On the other hand, differentiating \(y\) directly w.r.t. \(a\) while holding \(x\) fixed gives \(\frac{\partial y}{\partial a}=x^2\), which at \(x^*\) also equals \(\frac{b^2}{4a^2}\). We would find that,

$$\frac{\partial y^*}{\partial a}= {\frac{\partial y}{\partial a}}|_{x=x^*} $$

A Simple Envelope Theorem

$$v(q)=\max_{x} f(x,q)$$

$$=f(x^*(q),q)$$

$$ \frac{d}{dq}v(q)=\underbrace{\frac{\partial}{\partial x}f(x^*(q),q)}_{=0\ by\ f.o.c.}\frac{d}{dq}x^*(q)+\frac{\partial}{\partial q} f(x^*(q),q) $$

$$ \frac{d}{dq}v(q) =\frac{\partial}{\partial q}f(x^*(q),q) $$

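Think of the envelope theorem as the chain rule combined with the F.O.C. Our goal is to find how a parameter affects the already maximised function \(v(q)=f(x^*(q),q)\). The indirect effect through \(x^*(q)\) vanishes because \(\partial f/\partial x=0\) at the optimum.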
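Here is a numerical check of the motivating example, with arbitrary values \(a=-2\), \(b=3\), \(c=1\). The total derivative of \(y^*\) with respect to \(a\), with \(x\) re-optimised, should match the partial derivative \(\partial y/\partial a=x^2\) evaluated at \(x^*\):

```python
def maximised(a, b=3.0, c=1.0):
    """Return (y*, x*) for max_x a x^2 + b x + c, which requires a < 0."""
    x_star = -b / (2 * a)
    return a * x_star ** 2 + b * x_star + c, x_star

a, h = -2.0, 1e-6
total = (maximised(a + h)[0] - maximised(a - h)[0]) / (2 * h)   # dy*/da, x re-optimised
x_star = maximised(a)[1]
partial = x_star ** 2        # dy/da holding x fixed, evaluated at x*
print(total, partial, 3.0 ** 2 / (4 * a ** 2))   # all about 0.5625
```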

A formal expression

Theorem (Envelope Theorem). Consider a constrained optimisation problem \(v(\theta)=\max_x f(x,\theta)\) such that \(g_1(x,\theta)\geq0,…,g_K(x,\theta)\geq0\).

Comparative statics on the value function are given by: (\(v(\theta)=f(x,\theta)|_{x=x^*(\theta)}=f(x^*(\theta),\theta)\))

$$ \frac{\partial v}{\partial \theta_i}=\sum_{k=1}^{K}\lambda_k \frac{\partial g_k}{\partial \theta_i}|_{x^*}+{\frac{\partial f}{\partial \theta_i}}|_{x^*}=\frac{\partial \mathcal{L}}{\partial \theta_i}|_{x^*} $$

(for Lagrangian \(\mathcal{L}(x,\theta,\lambda)\equiv f(x,\theta)+\sum_{k}\lambda_k g_k(x,\theta)\)) for all \(\theta\) such that the set of binding constraints does not change in an open neighborhood.

Roughly, the derivative of the value function with respect to the parameters \(\theta\) equals the partial derivative of the Lagrangian with respect to \(\theta\). This is evaluated at the optimum \(x=x^*\) and the corresponding multipliers.

4. Hicksian and Marshallian demand + Shepherd’s Lemma

To be continued.

https://www.bilibili.com/video/BV1VJ411J7ZL?spm_id_from=333.999.0.0

5. KKT

6. Taylor Series

A Taylor series is a series expansion of a function about a point. The one-dimensional Taylor series of a real function \(f(x)\) about the point \(x=a\) is given by,

$$ f(x)=f(a)+f'(a)(x-a)+\frac{f''(a)}{2!}(x-a)^2+\frac{f^{(3)}(a)}{3!}(x-a)^3+\dots+\frac{f^{(n)}(a)}{n!}(x-a)^n+\dots $$

A Taylor expansion approximates a function near a given point using the function's derivatives at that point.

For example, we approximate \(f(x)=x^3\) around \(x=2\).

$$ f(x)\approx f(2)+\frac{f'(2)}{1!}(x-2)+\frac{f''(2)}{2!}(x-2)^2+\frac{f^{(3)}(2)}{3!}(x-2)^3+\dots $$

$$ f(x)\approx 8+\frac{12}{1}(x-2)+\frac{12}{2}(x-2)^2+\frac{6}{3\times2}(x-2)^3 $$

Simplifying, we get that around \(x=2\),

$$ f(x)=x^3 $$

This is not a coincidence. For a polynomial of degree \(n\), the Taylor expansion up to order \(n\) reproduces the polynomial exactly, because all higher-order derivatives are zero.

As another example, we take a first-order Taylor approximation of \(f(k)=\ln(k)\) at \(k^*\),

$$ \ln(k)\approx \ln(k^*)+\frac{1}{k^*}(k-k^*) $$

Thus, we know,

$$ ln(k)- ln(k^*)\approx\frac{k-k^*}{k^*} $$
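Both examples can be checked in a few lines of Python. The evaluation points and \(k^*=10\) are arbitrary choices. The cubic expansion should match \(x^3\) everywhere. The log approximation should be close only when \(k\) is near \(k^*\):

```python
import math

def cubic_taylor_at_2(x):
    """Taylor expansion of f(x) = x^3 about x = 2: f(2)=8, f'(2)=12, f''(2)=12, f'''(2)=6."""
    d = x - 2
    return 8 + 12 * d + 12 / 2 * d ** 2 + 6 / (3 * 2) * d ** 3

for x in (-1.0, 0.5, 3.7):
    print(x, x ** 3, cubic_taylor_at_2(x))      # the two columns agree

k_star = 10.0
for k in (10.5, 11.0, 15.0):
    exact = math.log(k) - math.log(k_star)
    approx = (k - k_star) / k_star
    print(f"k={k}: ln(k)-ln(k*)={exact:.4f}, (k-k*)/k*={approx:.4f}")
```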

New Study and Idea of Taylor Expansion

A Taylor expansion uses a polynomial to approximate a given function.

For example, to describe the shape of \(\cos(x)\) near \(x=0\), we first construct a polynomial.

(P.S. We let \(c_0=1\) because \(\cos(0)=1\), so the polynomial must equal 1 at \(x=0\).)

$$ P(x)=c_0+c_1x+c_2x^2 \ and\ at\ x=0\ P(0)=c_0 $$

, where the coefficients are free to choose, and their magnitudes determine what the approximating curve looks like.

To get a good approximation, we choose the coefficients so that the derivatives of the polynomial at \(x=0\) match those of the target function, one order at a time.

The first derivative of \(\cos(x)\) at \(x=0\) is \(-\sin(0)=0\), so we set the first derivative of our polynomial at \(x=0\) to zero as well!

$$\frac{dP(x)}{dx}\Big|_{x=0}=(c_1+2c_2x)\big|_{x=0}=c_1$$

Therefore, \(c_1\) must be zero.

Let’s go one step further. The second derivative of \(\cos(x)\) at \(x=0\) is \(-\cos(0)=-1\). So the second derivative of our polynomial, which comes only from its second-order term, must also be \(-1\).

$$\frac{d^2 P(x)}{d x^2}\Big|_{x=0}=2c_2$$

We adjust that to be negative one, so \(c_2=-\frac{1}{2}\).

Therefore, we get,

$$\cos(x)\approx P(x)=c_0+c_1 x +c_2 x^2 = 1-\frac{1}{2}x^2 \quad \text{near } x=0$$

Great! For a more accurate approximation, we would keep matching higher-order derivatives to find the coefficients of the higher-order terms. I won't do that here. Instead, I simply add a term \(O(x^3)\) to stand for the remaining terms, which are of order \(x^3\) or smaller near \(x=0\). In fact, because \(\cos(x)\) is an even function, \(c_3=0\) and the next nonzero term is \(\frac{x^4}{24}\). More precise descriptions will come in later posts.
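The coefficient-matching procedure above can be automated. The Python sketch below relies on one fact: the derivatives of \(\cos\) at 0 cycle through \(1, 0, -1, 0\). So the \(k\)-th coefficient is that value divided by \(k!\). The sketch also compares the order-2 and order-4 polynomials with `math.cos`:

```python
import math

def cos_taylor_coefficients(order):
    """c_k = cos^(k)(0) / k!; the derivatives of cos at 0 cycle through 1, 0, -1, 0."""
    cycle = [1, 0, -1, 0]
    return [cycle[k % 4] / math.factorial(k) for k in range(order + 1)]

def taylor_poly(coeffs, x):
    """Evaluate c_0 + c_1 x + c_2 x^2 + ..."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

print(cos_taylor_coefficients(2))   # [1.0, 0.0, -0.5], i.e. c_0, c_1, c_2 above
for x in (0.1, 0.5, 1.0):
    err2 = abs(math.cos(x) - taylor_poly(cos_taylor_coefficients(2), x))
    err4 = abs(math.cos(x) - taylor_poly(cos_taylor_coefficients(4), x))
    print(f"x={x}: error of 1 - x^2/2 = {err2:.2e}, error with the x^4/24 term = {err4:.2e}")
```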

Kalman Filter

Definition

In statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, who was one of the primary developers of its theory.

Wikipedia

During my study in Cambridge, Professor Oliver Linton introduced the Kalman Filter in Time Series analysis, but I did not get it at that time. So, here is a revisit.

My Thinking of Kalman Filter

The Kalman filter is an algorithm that produces optimal estimates from uncertain observations, e.g. time series data. We only observe a sample; we never know the true distribution of the data or the true, error-free value.

Suppose I want to know my weight, but the bathroom scale cannot give me the true value. How can I estimate my true weight?

Assume the scale has an error of 2 and my own estimate has an error of 1. If each source is weighted inversely to its error, the scale's reading gets weight 1/3 and my own estimate gets weight 2/3. Then the optimal weight should be,

$$ \text{Optimal Result} = \frac{1}{3}\times \text{Measurement} + \frac{2}{3}\times \text{Estimate} $$

, where \(\text{Measurement}\) is the value shown on the scale and \(\text{Estimate}\) is my own estimate. Rearranging,

$$ \text{Optimal Result} = \frac{1}{3}\times \text{Measurement} + \text{Estimate} - \frac{1}{3}\times \text{Estimate} $$

$$ \text{Optimal Result} = \text{Estimate} + \frac{1}{3}\times \text{Measurement} - \frac{1}{3}\times \text{Estimate} $$

$$ \text{Optimal Result} = \text{Estimate} + \frac{1}{3}\times (\text{Measurement} - \text{Estimate}) $$

In general, we get

$$ \text{Optimal Result} = \text{Estimate} + \frac{p}{p+r}\times (\text{Measurement} - \text{Estimate}) $$

, where \(p\) is the estimation error and \(r\) is the measurement error.

For example, if the estimation error is zero, then the fraction is equal to zero. Thus, the optimal result is just the estimate.
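The weighted combination can be written as a small Python function. The numbers in the usage lines are hypothetical: an estimate of 70 with error 1 and a measurement of 73 with error 2. They are chosen to match the 1/3 weighting above:

```python
def fuse(estimate, measurement, p, r):
    """Blend an estimate (error p) with a measurement (error r), as in the text:
    Optimal Result = Estimate + p / (p + r) * (Measurement - Estimate)."""
    gain = p / (p + r)
    return estimate + gain * (measurement - estimate)

print(fuse(70.0, 73.0, p=1.0, r=2.0))   # 71.0, i.e. 2/3 * 70 + 1/3 * 73
print(fuse(70.0, 73.0, p=0.0, r=2.0))   # 70.0: a perfect estimate ignores the measurement
```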

Applying to Time Series Data

If the quantity being measured is constant and every measurement is equally reliable, the optimal result after \(n\) measurements is their average. Writing OResult for Optimal Result:

$$ \text{OResult}_n=\frac{1}{n}\times (meas_1+meas_2+meas_3+\dots+meas_{n}) $$

$$ \text{OResult}_n=\frac{1}{n}\times (meas_1+meas_2+\dots+meas_{n-1})+\frac{1}{n}\times meas_n $$

$$ \text{OResult}_n=\frac{n-1}{n}\times \frac{1}{n-1}\times (meas_1+\dots+meas_{n-1})+\frac{1}{n}\times meas_n $$

Since \(\frac{1}{n-1}\times (meas_1+\dots+meas_{n-1}) = \text{OResult}_{n-1}\), the first term can be rewritten recursively:

$$ \text{OResult}_n=\frac{n-1}{n}\times \text{OResult}_{n-1}+\frac{1}{n}\times meas_n $$

$$ \text{OResult}_n=\text{OResult}_{n-1}-\frac{1}{n}\times \text{OResult}_{n-1}+\frac{1}{n}\times meas_n $$

$$ \text{OResult}_n=\text{OResult}_{n-1}+\frac{1}{n}\times (meas_n-\text{OResult}_{n-1}) $$
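The recursive form can be checked against the plain average in a few lines of Python. Any list of numbers works; the values used here are the measurements from the Example table in this post:

```python
def running_mean(measurements):
    """OResult_n = OResult_{n-1} + (meas_n - OResult_{n-1}) / n, starting from OResult_0 = 0."""
    estimate, history = 0.0, []
    for n, meas in enumerate(measurements, start=1):
        estimate += (meas - estimate) / n
        history.append(estimate)
    return history

meas = [81, 83, 79, 78, 81, 79, 80, 78, 81, 79, 80, 78, 81, 79, 82]
history = running_mean(meas)
print(history[-1], sum(meas) / len(meas))   # both equal the sample mean
```

The recursion never stores past measurements, only the current estimate and the count. This is the same "previous estimate plus gain times innovation" shape as the Kalman update.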

Kalman Filter Equation

$$ \hat{x}_{n,n}=\hat{x}_{n,n-1}+K_n(z_n-\hat{x}_{n,n-1}) $$

$$ K_n=\frac{p_{n,n-1}}{p_{n,n-1}+r_n} $$

, where \(p_{n,n-1}\) is Uncertainty in Estimate, \(r_n\) is Uncertainty in Measurement, \(\hat{x}_{n,n}\) is the Optimal Estimate at \(n\), and \(z_n\) is the Measurement Value at \(n\).

The estimate uncertainty is updated through the Covariance Update Equation,

$$ p_{n,n}=(1-K_n)p_{n,n-1} $$

In a more intuitive way (1),

$$ OEstimate_n=OEstimate_{n-1}+K_n (meas_n-OEstimate_{n-1})$$

$$ K_n=\frac{OEstimateError_{n-1}}{OEstimateError_{n-1}+MeasureError_n}$$

$$OEstimateError_{n-1}=(1-K_{n-1})\times OEstimateError_{n-2}$$

Example

Fifteen measurements, each with measurement error 3, starting from an initial estimate of 75 with error 5 (row 0):

n   Meas  MeasError  K         OEstimate  OEstimateError
0   -     -          -         75         5
1   81    3          0.625     78.75      1.875
2   83    3          0.384615  80.38462   1.153846
3   79    3          0.277778  80         0.833333
4   78    3          0.217391  79.56522   0.652174
5   81    3          0.178571  79.82143   0.535714
6   79    3          0.151515  79.69697   0.454545
7   80    3          0.131579  79.73684   0.394737
8   78    3          0.116279  79.53488   0.348837
9   81    3          0.104167  79.6875    0.3125
10  79    3          0.09434   79.62264   0.283019
11  80    3          0.086207  79.65517   0.258621
12  78    3          0.079365  79.52381   0.238095
13  81    3          0.073529  79.63235   0.220588
14  79    3          0.068493  79.58904   0.205479
15  82    3          0.064103  79.74359   0.192308
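The table can be reproduced with a short Python sketch of the one-dimensional filter above. It assumes a constant true value (no process noise), measurement error 3, and an initial estimate of 75 with error 5:

```python
def kalman_1d_static(measurements, meas_error, init_estimate, init_error):
    """1-D Kalman filter for a constant state (no process noise), following
    K_n = p/(p + r), x_n = x_{n-1} + K_n (z_n - x_{n-1}), p_n = (1 - K_n) p."""
    estimate, error = init_estimate, init_error
    rows = []
    for n, z in enumerate(measurements, start=1):
        gain = error / (error + meas_error)          # Kalman gain K_n
        estimate = estimate + gain * (z - estimate)  # state update
        error = (1 - gain) * error                   # covariance update
        rows.append((n, z, gain, estimate, error))
    return rows

measurements = [81, 83, 79, 78, 81, 79, 80, 78, 81, 79, 80, 78, 81, 79, 82]
for n, z, gain, est, err in kalman_1d_static(measurements, 3, 75, 5):
    print(f"{n:2d}  {z}  K={gain:.6f}  OEstimate={est:.5f}  OEstimateError={err:.6f}")
```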

A More Advanced Study

Estimation (prediction) equations:

$$ \hat{x}_k^-=A\hat{x}_{k-1}+Bu_k $$

$$ P_k^-=AP_{k-1}A^T+Q$$

Update equations (the matrix version of the ones introduced in (1)):

$$K_k=P_k^- C^T\left(CP_k^-C^T+R\right)^{-1}$$

$$ \hat{x}_k=\hat{x}_k^-+K_k(y_k-C\hat{x}_k^-) $$

$$ P_k=(I-K_kC)P_k^-$$

Intuitively, I need \(\hat{x}_{k-1}\) (last week's optimal weight estimate) to calculate this week's optimal estimate \(\hat{x}_k\). First, I predict this week's weight \(\hat{x}_k^-\) and measure this week's weight \(y_k\). Then I combine the two to get this week's optimal weight estimate.
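Here is a scalar version of these equations in Python. It is a sketch: \(A, B, C, Q, R\) are plain numbers here rather than matrices, and the process-noise value \(Q=0.5\) is hypothetical. With \(A=C=1\), \(B=0\), \(Q=0\) it reduces to the weight example and should reproduce the final estimate in the Example table. With \(Q>0\), the gain no longer shrinks towards zero, so new measurements keep some weight:

```python
def kalman_step(x_prev, p_prev, y, A=1.0, B=0.0, u=0.0, C=1.0, Q=0.0, R=1.0):
    """One predict-and-update cycle of a scalar Kalman filter."""
    # Predict (estimation equations)
    x_pred = A * x_prev + B * u                 # x_k^- = A x_{k-1} + B u_k
    p_pred = A * p_prev * A + Q                 # P_k^- = A P_{k-1} A^T + Q
    # Update
    gain = p_pred * C / (C * p_pred * C + R)    # K_k
    x_new = x_pred + gain * (y - C * x_pred)    # x_k = x_k^- + K_k (y_k - C x_k^-)
    p_new = (1 - gain * C) * p_pred             # P_k = (I - K_k C) P_k^-
    return x_new, p_new, gain

def run_filter(measurements, x0, p0, **model):
    """Apply kalman_step to each measurement; return final state, variance, and all gains."""
    x, p, gains = x0, p0, []
    for y in measurements:
        x, p, gain = kalman_step(x, p, y, **model)
        gains.append(gain)
    return x, p, gains

weights = [81, 83, 79, 78, 81, 79, 80, 78, 81, 79, 80, 78, 81, 79, 82]
x_static, _, gains_static = run_filter(weights, 75.0, 5.0, R=3.0)          # Q = 0
x_drift, _, gains_drift = run_filter(weights, 75.0, 5.0, R=3.0, Q=0.5)     # hypothetical Q
print(round(x_static, 5), round(gains_static[-1], 4), round(gains_drift[-1], 4))
```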

Reading

The application of the Kalman Filter could be found in the following reading. Also, I will continue in my further study.

https://towardsdatascience.com/state-space-model-and-kalman-filter-for-time-series-prediction-basic-structural-dynamic-linear-2421d7b49fa6

Reference

https://www.kalmanfilter.net/kalman1d.html

https://www.bilibili.com/video/BV1aS4y197bT?share_source=copy_web