Least Squares Method – Intro to Kalman Filter

Consider a Linear Equation,

$$ y_i = \sum_{j=1}^n C_{i,j} x_j +v_i,\quad i=1,2,…$$

, where C_{i,j} are scalars and v_i \in \mathbb{R} is the measurement noise. The noise itself is unknown, but we assume it satisfies certain statistical properties: v_i and v_j are independent for i \neq j, and each v_i has mean zero and variance \sigma_i^2,


$$\mathbb{E}(v_i^2) = \sigma_i^2$$

We can rewrite y_i = \sum_{j=1}^n C_{i,j} x_j +v_i as,

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_s\end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22}& \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{s1} & C_{s2} & \cdots & C_{sn}\end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n\end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_s\end{pmatrix} $$

, in a matrix form,

$$ \vec{y} = C \vec{x} + \vec{v} $$

, but I will write it in the short form,

$$ y= C x +v$$

We solve for the least squares estimator from the optimisation problem (the norm is the squared L2 norm),

$$ \min_x || y-Cx ||_2^2 $$
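To make the estimator concrete, here is a minimal pure-Python sketch (not from the notes) that solves the normal equations C^T C x = C^T y by Gaussian elimination; in practice one would call a library routine instead.

```python
def lstsq(C, y):
    """Solve min_x ||y - Cx||_2^2 via the normal equations C^T C x = C^T y.
    Plain-list implementation; fine for small, well-conditioned problems."""
    m, n = len(C), len(C[0])
    # Build A = C^T C and b = C^T y
    A = [[sum(C[k][i] * C[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [sum(C[k][i] * y[k] for k in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for j in range(i, n):
                A[r][j] -= f * A[i][j]
            b[r] -= f * b[i]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

# Three noiseless measurements of x = (1, 2)
x_hat = lstsq([[1, 0], [0, 1], [1, 1]], [1.0, 2.0, 3.0])
```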

Recursive Least Squares Method

The classic least squares estimator might not work well when data arrive sequentially. The Recursive Least Squares Method deals with this discrete-time setting. At a discrete-time instant k, the measurement vector y_k \in \mathbb{R}^l follows,

$$y_k = C_k x + v_k$$

, where C_k \in \mathbb{R}^{l\times n}, and v_k \in \mathbb{R}^l is the measurement noise vector. We assume that the covariance of the measurement noise is given by,

$$ \mathbb{E}[v_k v_k^T] = R_k$$

, and that the noise is zero-mean and uncorrelated across time instants.

The recursive least squares method takes the following form,

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

, where \hat{x}_k and \hat{x}_{k-1} are the estimates of the vector x at the discrete-time instants k and k-1, and K_k \in \mathbb{R}^{n\times l} is the gain matrix that we need to determine.

The above equation updates the estimate of x at time instant k using the previous estimate \hat{x}_{k-1}, the new measurement y_k obtained at time instant k, and the gain matrix K_k computed at time instant k.


$\hat{x}$ is the estimate.

$$ \hat{x}_k = \begin{pmatrix} \hat{x}_{1,k} \\ \hat{x}_{2,k} \\ \vdots \\ \hat{x}_{n,k} \end{pmatrix} $$

, which corresponds to the true vector x.

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

The estimation error, \epsilon_{i,k} = x_i - \hat{x}_{i,k}, \quad i=1,2,…,n.

$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} = x - \hat{x}_k = \begin{pmatrix} x_1-\hat{x}_{1,k} \\ x_2 - \hat{x}_{2,k} \\ \vdots \\ x_n-\hat{x}_{n,k} \end{pmatrix} $$

The gain K_k is computed by minimising the sum of variances of the estimation errors,

$$ W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2) $$

Next, let’s show the cost function could be represented as follows, (tr(.) is the trace of a matrix)

$$ W_k = tr(P_k) $$

, and P_k is the estimation error covariance matrix defined by

$$ P_k = \mathbb{E}(\epsilon_k \epsilon_k^T )$$

Or, says,

$$ K_k = arg\min_{K_k} W_k, \quad W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg)$$

Why is that?

$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$

$$ = \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \\ \vdots & \ddots & \vdots \\ \epsilon_{n,k}\epsilon_{1,k} & \cdots & \epsilon_{n,k}^2\end{pmatrix} $$


$$ P_k = \mathbb{E}[\epsilon_k \epsilon_k^T] $$

$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$


$$ K_k = arg\min_{K_k} W_k, \quad W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg) = tr(P_k)$$

Let’s derive the optimisation problem.

$$\epsilon_k = x-\hat{x}_k$$

$$ =x-\hat{x}_{k-1} - K_k(y_k - C_k \hat{x}_{k-1}) $$

$$ = x- \hat{x}_{k-1} - K_k (C_k x + v_k - C_k \hat{x}_{k-1}) $$

$$ = (I - K_k C_k)(x-\hat{x}_{k-1}) - K_k v_k $$

$$ =(I-K_k C_k )\epsilon_{k-1} - K_k v_k $$

Recall y_k = C_k x + v_k and \hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1}).

So, \epsilon_k \epsilon_k^T would be,

$$\epsilon_k \epsilon_k^T = \bigg((I-K_k C_k )\epsilon_{k-1} - K_k v_k\bigg)\bigg((I-K_k C_k )\epsilon_{k-1} - K_k v_k\bigg)^T$$

$P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$, and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.

$\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) =0$, since the current noise $v_k$ is zero-mean and independent of the past estimation error $\epsilon_{k-1}$. However, $\mathbb{E}(v_k v_k^T) = R_k$. Substituting all of those into $P_k$, we get,

$$P_k = (I - K_k C_k)P_{k-1}(I - K_k C_k)^T + K_k R_k K_k^T$$

$$ P_k = P_{k-1} - P_{k-1} C_k^T K_k^T - K_k C_k P_{k-1} + K_k C_k P_{k-1}C_k^T K_k^T + K_k R_k K_k^T $$

$$W_k = tr(P_k)= tr(P_{k-1}) - tr(P_{k-1} C_k^T K_k^T) - tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1}C_k^T K_k^T) + tr(K_k R_k K_k^T) $$

We take the F.O.C. to solve for K_k = arg\min_{K_k} W_k, by letting \frac{\partial W_k}{\partial K_k} = 0. See the Matrix Cookbook for how to take derivatives w.r.t. K_k.

$$\frac{\partial W_k}{\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$

We solve for K_k,

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

, where we let L_k = R_k + C_k P_{k-1} C_k^T. Since L_k is symmetric, L_k = L_k^T and L_k^{-1} = (L_k^{-1})^T.

$$ K_k = P_{k-1} C_k^T L_k^{-1} $$

Plug K_k = P_{k-1} C_k^T L_k^{-1} back into P_k.

$$ P_k = P_{k-1} - K_k C_k P_{k-1} = (I-K_k C_k)P_{k-1} $$


In the end, the Recursive Least Squares Method can be summarised by the following three equations.

  • 1. Update the Gain Matrix.

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

  • 2. Update the Estimate.

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

  • 3. Propagate the estimation error covariance matrix by using this equation.

$$ P_k = (I-K_k C_k)P_{k-1} $$
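The three equations above can be sketched in code. Here is a minimal pure-Python version (not from the notes) for the scalar-measurement case l = 1; the large initial P encodes an uninformative prior, which is a common initialisation choice.

```python
def rls_update(x_hat, P, C_k, y_k, R_k):
    """One recursive-least-squares step for a scalar measurement (l = 1).
    x_hat: length-n list, P: n x n list of lists, C_k: length-n list,
    y_k and R_k: scalars. Returns the updated estimate and covariance."""
    n = len(x_hat)
    PCt = [sum(P[i][j] * C_k[j] for j in range(n)) for i in range(n)]  # P_{k-1} C_k^T
    L = R_k + sum(C_k[i] * PCt[i] for i in range(n))                   # R_k + C_k P_{k-1} C_k^T
    K = [PCt[i] / L for i in range(n)]                                 # gain K_k
    innov = y_k - sum(C_k[i] * x_hat[i] for i in range(n))             # y_k - C_k x_hat
    x_new = [x_hat[i] + K[i] * innov for i in range(n)]                # estimate update
    P_new = [[P[i][j] - K[i] * PCt[j] for j in range(n)] for i in range(n)]  # (I - K_k C_k) P_{k-1}
    return x_new, P_new

# Estimate x = (1, 2) from three noiseless measurements
x_hat, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for C_k, y_k in [([1, 0], 1.0), ([0, 1], 2.0), ([1, 1], 3.0)]:
    x_hat, P = rls_update(x_hat, P, C_k, y_k, 0.01)
```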


Sigmoid & Logistic

The Sigmoid function is widely used for binary classification, in both machine learning and econometrics.

Why does the Sigmoid Function take this form?

Firstly, let’s introduce the odds.

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce the outcome to the number that do not.

Odds also have a simple relation with probability: the odds of an outcome are the ratio of the probability that the outcome occurs to the probability that it does not occur. In mathematical terms, p is the probability of the outcome occurring, and 1-p is the probability of it not occurring.

$$ odds = \frac{p}{1-p} $$

Odd and Probability

Let’s find some insights behind probability and odds. Probability attaches a number to each outcome: Pr(Y), where Y is the outcome and Pr(\cdot) is the probability function that maps outcomes to their probabilities.

What about the odds? The odds are a ratio computed from the probability, as the formula says.

Implication: compared to the probability, the odds tell us more about how balanced (or unbalanced) the binary outcome is, rather than describing the full probability distribution.


Rolling a six-sided die: the probability of rolling a 6 is 1/6, but the odds are 1/5.


$$ odd = \frac{Pr(Y)}{1-Pr(Y)} $$

, where Y is the outcomes.


As the probability Pr(Y) is always in [0,1], the odds must be non-negative, odds \in [0,\infty). We may want to apply a monotonic transformation to re-gauge the range of the odds. We apply the logarithm.

$$ logit := log(odds) =log\bigg( \frac{Pr(Y)}{1-Pr(Y)} \bigg) $$

This log-odds transformation is the logit function; the Sigmoid function we are after is its inverse, which we derive below.

As the transformation we applied is monotonic, the logit keeps the same ordering as the odds, so it carries the same implication: it represents how balanced the binary outcomes are.

Then, we bridge Y = f(X): the outcome Y is a function of the covariates X. Here, we assume the log-odds is linear in X, so with p denoting the probability of the outcome,

$$g(X) = log\bigg( \frac{p}{1-p} \bigg) = X\beta $$

$$ e^g = \frac{p}{1-p} $$

$$ p = \frac{e^g}{e^g+1}=\frac{1}{1+e^{-g}}$$

$$ p = \frac{1}{1+e^{-X\beta}}$$

We finally get our logistic sigmoid function as above.
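As a tiny illustration of the derivation above, the map from log-odds g to probability p and back:

```python
import math

def sigmoid(g):
    """Map log-odds g to probability: p = 1 / (1 + e^{-g})."""
    return 1.0 / (1.0 + math.exp(-g))

def log_odds(p):
    """Inverse map (the logit): probability back to log-odds."""
    return math.log(p / (1.0 - p))

p_even = sigmoid(0.0)   # g = 0 means even odds, so p = 0.5
```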

Dirac Delta Function

The Dirac Delta Function could be applied to simplify the differential equation. There are three main properties of Dirac Delta Function.

$$\delta (x-x') =\lim_{\tau\to0}\delta_{\tau} (x-x')$$

, where \delta_{\tau} is a family of functions concentrating at x' as \tau \to 0,

such that,

$$ \delta (x-x') = \begin{cases} \infty & x= x' \\ 0 & x\neq x' \end{cases} $$

$$\int_{-\infty}^{\infty} \delta (x-x')\ dx =1$$

Three Properties:

  • Property 1:

$$\delta(x-x')=0 \quad \quad ,x\neq x' $$

  • Property 2:

$$ \int_{x'-\epsilon}^{x'+\epsilon} \delta (x-x')dx =1\quad \quad ,\epsilon >0 $$

  • Property 3:

$$\int_{x'-\epsilon}^{x'+\epsilon} f(x)\ \delta (x-x')dx = f(x')$$

At x=x', the Dirac Delta function is sometimes thought of as having an "infinite" value. So, the Dirac Delta function is zero everywhere except at one point, and at that point it can be thought of as either undefined or as having an "infinite" value.

Girsanov’s Theorem


We can change the probability measure, and then make a random variable follows a certain probability measure.

  • Radon-Nikodym Derivative:

$$Z(\omega) = \frac{\tilde{P}(\omega)}{P(\omega)}$$

  • $\tilde{P}(\omega)$ is the risk-neutral probability measure.
  • ${P}(\omega)$ is the actual probability measure.
  • Properties:
    • $Z(\omega)>0$
    • $\mathbb{E}(Z)=1$
    • As \tilde{P}(\omega) = Z(\omega) P(\omega), if Z(\omega)>1 then \tilde{P}(\omega)>P(\omega), and vice versa.

We can calculate that,

$$ \underbrace{\tilde{\mathbb{E}}(X)}_{\text{Expectation under Risk-neutral Probability Measure}} = \underbrace{\mathbb{E}(ZX)}_{\text{Expectation under Actual Probability Measure}} $$

Proof & Example

Under (\Omega,\mathcal{F},P), A\in \mathcal{F}, let X be a random variable X\sim N(0,1). \mathbb{E}(X)=0, and \mathbb{Var}(X)=1.

$Y=X+\theta$, $\mathbb{E}(Y)=\theta$, and $\mathbb{Var}(Y)=1$.

$X$ here is standard normal under the actual probability measure.

However, Y here is not standard normal under the current probability P(.), because \mathbb{E}(Y)\neq0.

What do we do?

We change the probability measure from P(.)\to\tilde{P}(.) to let Y be standard normal under the new probability measure!

We set the Radon-Nikodym Derivative,

$$Z(\omega) = exp\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \}$$

Now, we can create the probability measure \tilde{P}(A), with A=\{ \omega : Y(\omega)\leq b \},

$$\tilde{P}(A) = \int_A Z(\omega)\ dP(\omega)$$

such that Y=X+\theta would be standard normal distributed under the new probability measure \tilde{P}(A).

$$\tilde{P}(A) = \tilde{P}(Y(\omega) \leq b)$$

$$ = \int_{\{ Y(\omega)\leq b \} } exp\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \} \ dP(\omega)$$

, then change the integral range from the set A to \Omega by multiplying that indicator.

$$ = \int_{\Omega }\mathbb{1}_{\{ Y(\omega)\leq b \}}\ exp\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \} \ dP(\omega)$$

, change from dP to dX,

$$ = \int_{-\infty }^{\infty }\mathbb{1}_{\{ X(\omega)\leq b-\theta \}}\ exp\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \} \ \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}X^2(\omega)} \ dX(\omega)$$

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b-\theta}\ exp\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2- \frac{1}{2}X^2(\omega)\} \ dX(\omega)$$

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b-\theta}\ exp\Bigg\{ -\frac{1}{2}\bigg(\theta+ X(\omega)\bigg)^2\Bigg\} \ dX(\omega)$$

, as Y=X+\theta, dY = dX, we now change dX to dY,

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b}\ exp\big\{ -\frac{1}{2}Y(\omega)^2\big\} \ dY(\omega)$$

, the above is now a standard normal distribution for Y(\omega).
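A quick Monte Carlo sanity check of this change of measure (a sketch, not from the notes): sampling X under P and weighting by Z should make Y = X + \theta behave like a standard normal, i.e. \tilde{P}(Y \leq b) \approx \Phi(b).

```python
import math, random
from statistics import NormalDist

random.seed(0)
theta, b, n = 0.7, 0.3, 200_000

# Sample X ~ N(0,1) under P and weight by Z = exp(-theta*X - theta^2/2);
# the weighted frequency of {Y <= b} with Y = X + theta estimates P~(Y <= b).
acc = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    z = math.exp(-theta * x - 0.5 * theta ** 2)
    if x + theta <= b:
        acc += z
p_tilde = acc / n   # should be close to Phi(b), the standard normal CDF at b
```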

Value at Risk & Expected Shortfalls

Value at Risk – VaR

VaR is a probability statement about the potential change in the value of a portfolio.


$$Prob(x\leq VaR(X))= 1-c$$

$$ Prob\bigg(z \leq \frac{VaR(X)-\mu}{\sigma}\bigg)=1-c $$

  • $c$ – confidence level, e.g. $c=99\%$. Then $1-c = 1\% $
  • $\mu$ and $\sigma$ are for $X$.
    • For example, if X is the yearly return, then \mu_{252days}=252\cdot\mu_{1day}, and \sigma_{252days}=\sqrt{252}\cdot\sigma_{1day}
  • $x$ here is the return, and $c$ is the confidence level, e.g. 99%.
    • VaR focuses on the tail risk. If x stands for return, the tail risk is on the left tail, z_{1-c}.
  • If x is the loss, the tail risk is on the right tail, z_c.

$$VaR(X) = \mu + \sigma\cdot \Phi^{-1}(1-c)$$

$$VaR(X) = \mu + \sigma\cdot z_{1-c}$$

  • I.E.

​ If c=99\%, then 1-c=1\%, so z_{1-c}=z_{0.01} \approx -2.33

VaR(X) = \mu - 2.33\cdot \sigma


The unit of VaR is the amount of loss, so it should be a monetary amount. For example, if the total value of the portfolio is USD 1 million, then VaR = \$1m \cdot (\mu - 2.33\cdot \sigma).

Loss Distribution

Remember X is the loss. If we know the distribution of the Portfolio Return R, R\sim N(\mu, \sigma^2), then what is the distribution of X?

$$X \sim N(-\mu, \sigma^2)$$

Right! The loss is just the negative of the return, X = -R. Also, the volatility is not affected by the sign change.

Expected Shortfall (ES)

Expected Shortfall states the Expected Loss during time T conditional on the loss being greater than the c^{th} percentile of the loss distribution.


$$ ES_c (X) = \mathbb{E}\bigg[ X|X\leq VaR_c(X) \bigg] $$

  • Note that X is a r.v. and x stands for the return here, while the only variable in ES_c(X) is c, the confidence level, not X.
  • $c$ is the confidence level, i.e. $c$ = 99%.
  • If x stands for return, then the VaR is the left-tail, z_{1-c}.

$$ ES_c (X) = \mathbb{E}\bigg[ X|X\geq VaR_c(X) \bigg] $$

  • If x stands for loss (, which is the negative of return ), then the VaR is the right-tail, z_{c}.


Notation Form

Consider x as the return; then ES_c (X) = \mathbb{E}\bigg[ X|X\leq VaR_c(X) \bigg], and VaR_c(X)= \mu + z_{1-c}\sigma, where c is the confidence level, e.g. c=99\%.

$$ES_c(X) = \frac{\int_{-\infty}^{VaR} xf(x)dx }{\int_{-\infty}^{VaR} f(x)dx } = \frac{1}{1-c}\int_{-\infty}^{VaR} x\ \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\ dx $$

, since the denominator \int_{-\infty}^{VaR} f(x)dx = Prob(X\leq VaR_c(X)) = 1-c.

Replace z = \frac{x-\mu}{\sigma}, then x = \mu + z \sigma, and dx = \sigma dz. The upper limit becomes \frac{VaR-\mu}{\sigma} = z_{1-c}.

$$ = \frac{1}{1-c} \int_{-\infty}^{z_{1-c}}(\mu + z\sigma) \phi(z)\ dz $$

$$ = \frac{\mu}{1-c} \int_{-\infty}^{z_{1-c}}\phi(z)\ dz + \frac{\sigma}{1-c}\int_{-\infty}^{z_{1-c}} z\ \phi(z)\ dz = \mu + \frac{\sigma}{1-c}\int_{-\infty}^{z_{1-c}} z\ \phi(z)\ dz $$

, since \int_{-\infty}^{z_{1-c}}\phi(z)\ dz = \Phi(z_{1-c}) = 1-c. For the remaining integral,

$$ \int_{-\infty}^{z_{1-c}} z\ \phi(z)\ dz = \frac{1}{\sqrt{2\pi }}\int_{-\infty}^{z_{1-c}} z e^{-\frac{z^2}{2}}\ dz = \frac{1}{\sqrt{2\pi }} \Big[-e^{-\frac{z^2}{2}}\Big]_{-\infty}^{z_{1-c}} = -\phi(z_{1-c}) $$

Recall z_{1-c} = \Phi^{-1}(1-c), so \phi(z_{1-c}) = \phi\big( \Phi^{-1}(1-c) \big). Therefore,

$$ ES_c(X) =\mu -\sigma \frac{\phi\big( \Phi^{-1}(1-c) \big)}{1-c}$$

VaR Form

we ‘sum up’ (integrate) the VaR from c to 1, and normalise by 1-c.

$$ES_c(X) = \frac{1}{1-c} \int_c^1 VaR_u(X)du$$

$$ ES_c(X) = \frac{1}{1-c} \int_c^1 \bigg( \mu + \sigma\cdot \Phi^{-1}(1-u) \bigg) du $$

$$ =\mu + \frac{\sigma}{1-c} \int^1_c \Phi^{-1}(1-u) du $$

We let u = \Phi(Z), where Z \sim N(0,1). Then,

  • $du =d(\Phi(z)) =\phi(z) dz$.
  • $u\in (c,1)$, so $z = \Phi^{-1}(u)\in (z_c \ , \infty)$


$$ ES_c(X) =\mu + \frac{\sigma}{1-c} \int^{\infty}_{z_c} \Phi^{-1}\big(1-\Phi(z)\big)\phi(z) dz $$

As 1-\Phi(z) = \Phi(-z)

$$ ES_c(X) =\mu + \frac{\sigma}{1-c} \int^{\infty}_{z_c} \Phi^{-1}(\Phi(-z))\phi(z) dz = \mu – \frac{\sigma}{1-c} \int^{\infty}_{z_c} z\phi(z) dz $$

$ \int_{z_c}^{\infty} z \phi(z)dz = \frac{1}{\sqrt{2\pi}}\int_{z_c}^{\infty} z e^{-\frac{z^2}{2}}dz = \frac{1}{\sqrt{2\pi}} \Big[ -e^{-\frac{z^2}{2}} \Big]_{z_c}^{\infty}$

$=\frac{1}{\sqrt{2\pi}}e^{-\frac{z_c^2}{2}}=\phi(z_c)=\phi\big(\Phi^{-1}(c)\big)$. Bringing it back into $ES_c(X)$,

$$ES_c(X) = \mu - \sigma\frac{ \phi\big(\Phi^{-1}(c)\big)}{1-c}$$
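Both closed forms can be evaluated directly. A small sketch using Python's statistics.NormalDist (the parameter values below are illustrative, not from the notes):

```python
from statistics import NormalDist

def var_es_normal(mu, sigma, c):
    """Closed-form VaR and ES for returns R ~ N(mu, sigma^2), left tail."""
    z = NormalDist().inv_cdf(1 - c)                       # z_{1-c}, e.g. about -2.33 for c = 99%
    var = mu + sigma * z                                  # VaR_c(X) = mu + sigma * z_{1-c}
    es = mu - sigma * NormalDist().pdf(z) / (1 - c)       # ES_c(X) = mu - sigma * phi(z_{1-c})/(1-c)
    return var, es

# Daily return with mean 0 and volatility 2%, at 99% confidence
var99, es99 = var_es_normal(mu=0.0, sigma=0.02, c=0.99)
# ES lies further out in the tail than VaR
```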

Modern Portfolio Theory

  • $x$ – vector weights
  • $R$ – vector of all assets’ returns
  • $\mu = \mathbb{E}(R)$ – mean return of all assets
  • $\Sigma = \mathbb{E}\bigg[ (R-\mu)(R-\mu)^T \bigg]$ – var-cov matrix of all assets


  • $\mu_x = x^T \mu$ – becomes a scalar now
  • $\sigma_x^2 = x^T \Sigma x$ – collapses to a scalar


  • Maximise Expected Return s.t. volatility constraint.

$$ \max_{x} \mu_x \quad s.t. \quad \sigma_x \leq \sigma^* $$

  • Minimise Volatility s.t. return constraint.

$$ \min_{x} \sigma_x \quad s.t. \quad \mu_x \geq \mu^* $$

Portfolio Risk Measures

By definition, the loss of a portfolio is the negative of return, L(x) = -R(x).

The Loss distribution becomes the same normal distribution with x-axis reversed.

  • Volatility of Loss: \sigma(L(x)) = \sigma_x; the minus sign does not matter for the s.d.
  • Standard deviation-based risk measure: \mathbb{E}(L(x)) + z_{c}\sigma(L(x)); the x-axis is reversed, so z_{1-c} for return becomes z_c for loss.
  • VaR: VaR_{\alpha}(x)=\inf\big\{ \ell : Prob\big[ L(x)\leq \ell \big] \geq \alpha \big\}
  • Expected Shortfall: ES_{\alpha}(x) = \frac{1}{1-\alpha} \int_{\alpha}^1 VaR_u(x) du. In another form, ES_{\alpha}(x)=\mathbb{E}\big( L(x)\mid L(x)\geq VaR_{\alpha}(x) \big)

As R \sim N(\mu, \Sigma),

  • for our portfolio with weights x, mean = x^T\mu, and \sigma_x = \sqrt{x^T \Sigma x}.
  • for the loss, mean = -x^T\mu, and \sigma_x = \sqrt{x^T \Sigma x}.
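A minimal sketch of the portfolio mean and volatility computations (the weights and covariance values below are made up for illustration):

```python
import math

def portfolio_stats(x, mu, Sigma):
    """mu_x = x^T mu and sigma_x = sqrt(x^T Sigma x) for a weight vector x
    (plain lists, so it stays dependency-free)."""
    n = len(x)
    mu_x = sum(x[i] * mu[i] for i in range(n))
    var_x = sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))
    return mu_x, math.sqrt(var_x)

# Two assets: expected returns 8% and 5%, vols 20% and 30%, correlation 0.1
mu_x, sigma_x = portfolio_stats(
    x=[0.6, 0.4],
    mu=[0.08, 0.05],
    Sigma=[[0.04, 0.006], [0.006, 0.09]],
)
```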

Taylor Series and Transition Density Functions

See my Github repo for full details.


1. Trinomial Random Walk

2. Transition Probability Density Function

The transition probability density function, p(y,t;y’,t’), is defined by,

$$ Prob(a<y'<b \ at\ time \ t' \mid y \ at \ time\ t) = \int_a^b p(y,t;y',t')dy'$$

In words this is "the probability that the random variable y' lies between a and b at time t' in the future, given that it started out with value y at time t."

Think of y and t as the current values, with y' and t' being future values. The transition probability density function can be used to answer the question,

"What is the probability of the variable y' being in a specified range at time t' in the future, given that it started out with value y at time t?"

Our goal is to find the transition probability p.d.f., so we look for the relationship between p(y,t;y',t') and p(y,t;y',t'-\delta t).

3. From the Trinomial model to the Transition Probability Density function

The variable y can either rise, fall or take the same value after a time step δt. These movements have certain probabilities associated with them.

We are going to assume that the probability of a rise and of a fall are both equal to the same value \alpha, with \alpha<\frac{1}{2}. (But, of course, this can be generalised. Why would we want to generalise this?)

3.1 The Forward Equation

Given \{y,t\} as the current state, \{y',t'\} varies over future times.

The probability of being at y' at time t' is related to the probabilities of being at the previous three values and moving in the right direction:

$$ p(y,t;y',t') = \alpha \ p(y,t;y'+\delta y,t'-\delta t) + \ (1-2\alpha) \ p(y,t;y',t'-\delta t) + \alpha \ p(y,t;y'-\delta y,t'-\delta t) $$

Given \{y,t\}, we relate \{y',t'\} to \{y'\pm \delta y, t'-\delta t\}, that is, to the states a little earlier in time.

Remember, our goal is to find a solution of p(.), we try to solve the above equation.

3.2 Taylor Series Expansion

We expand each term of the equation.

$$ p(y,t;y',t') = \alpha \ p(y,t;y'+\delta y,t'-\delta t) + \ (1-2\alpha) \ p(y,t;y',t'-\delta t) + \alpha \ p(y,t;y'-\delta y,t'-\delta t) $$

Why do we do that? Because there are too many variables in the equation, making it hard to solve. We have to reduce the dimension.

$$ p(y,t;y'+\delta y,t'-\delta t)\approx \ p(y,t;y',t') - \delta t \frac{\partial p}{\partial t'} +\delta y \frac{\partial p}{\partial y'} + \frac{1}{2}\delta y^2 \frac{\partial^2 p}{\partial y'^2} + O(\delta t^2) $$

$$ p(y,t;y',t'-\delta t)\approx \ p(y,t;y',t') - \delta t \frac{\partial p}{\partial t'} + O(\delta t^2) $$

$$ p(y,t;y'-\delta y,t'-\delta t)\approx \ p(y,t;y',t') - \delta t \frac{\partial p}{\partial t'} -\delta y \frac{\partial p}{\partial y'} + \frac{1}{2}\delta y^2 \frac{\partial^2 p}{\partial y'^2} + O(\delta t^2) $$

Plug them back into that equation; after cancelling the repeated terms we are left with,

$$ \frac{\partial p}{\partial t'} =\alpha \frac{\delta y^2}{\delta t} \frac{\partial^2 p}{\partial y'^2} + O(\delta t)$$

We drop the terms of order O(\delta t) and higher.

$$ \frac{\partial p}{\partial t'} =\alpha \frac{\delta y^2}{\delta t} \frac{\partial^2 p}{\partial y'^2} $$

On the RHS, we focus first on \alpha \frac{\delta y^2}{\delta t}. The numerator and denominator have to be of the same order for this term to remain finite; in other words, \delta y \sim O(\sqrt{\delta t}).

We thus let c^2 = \alpha \frac{\delta y^2}{\delta t}

$$ \frac{\partial p}{\partial t'} =c^2 \frac{\partial^2 p}{\partial y'^2} $$

The above equation is also named Fokker–Planck or forward Kolmogorov equation.

Now, we have a partial differential equation. Solve it, we can get the form of p.

3.3 The Backward Equation works similarly.

$$ p(y,t;y',t') = \alpha \ p(y+\delta y,t+\delta t;y',t') + \ (1-2\alpha) \ p(y,t+\delta t;y',t') + \alpha \ p(y-\delta y,t+\delta t;y',t') $$

The dimension-reduced result is below; it is called the backward Kolmogorov equation (note the derivatives are now with respect to the current variables y and t).

$$ \frac{\partial p}{\partial t} + c^2 \frac{\partial^2 p}{\partial y^2} =0 $$

4. Solve the Forward Kolmogorov Equation

We will solve for p right now! We will do so by assuming a similarity solution.

$$ \frac{\partial p}{\partial t'} =c^2 \frac{\partial^2 p}{\partial y'^2} $$

This equation has an infinite number of solutions; it has different solutions for different initial conditions and different boundary conditions. We need only a special solution here. The detailed process of finding that solution is shown as follows,

4.1 Assume a Solution Form

$$ p=t'^a f(\frac{y'}{t'^b}) = t'^a f(\xi)$$

$$ \xi = \frac{y'}{t'^b}$$

, where a and b are constants to be determined.

Again, don’t ask why it is in this form, because it is a special solution!

4.2 Derivation

$$\frac{\partial p}{\partial y'}=t'^{a-b}\frac{df}{d\xi}$$

$$\frac{\partial^2 p}{\partial y'^2}=t'^{a-2b}\frac{d^2f}{d\xi^2}$$

$$\frac{\partial p}{\partial t'}=at'^{a-1}f(\xi)-by't'^{a-b-1}\frac{df}{d\xi}$$

Substitute back into the forward Kolmogorov equation (remember y' = t'^b \xi), and we get,

$$ af(\xi) - b\xi \frac{df}{d\xi} = c^2 t'^{-2b+1} \frac{d^2f}{d\xi^2}$$

4.3 Choose b

As we need the RHS to be independent of t', we choose b=\frac{1}{2}, so that t' has a power of 0. Why do we do that? Because we aim to reduce the partial differential equation to an ordinary differential equation, in which the only variable is \xi and t' disappears.

By assuming the special form of p, and letting b= 1/2, our forward Kolmogorov becomes,

$$ af(\xi) - \frac{1}{2}\xi \frac{df}{d\xi} = c^2 \frac{d^2f}{d\xi^2}$$

$$ p=t'^a f(\frac{y'}{\sqrt{t'}}) = t'^a f(\xi)$$

$$ \xi = \frac{y'}{\sqrt{t'}}$$

4.4 Choose a

$$p=t'^a f(\frac{y'}{\sqrt{t'}}) $$

We know that p is the transition p.d.f., so its integral over y' must equal 1 for every t'. So we integrate p w.r.t. y' only.

$$\int_{\mathbb{R}}p\ dy' = \int_{\mathbb{R}} t'^a f(\frac{y'}{\sqrt{t'}})\ dy' = 1$$

, replacing x = \frac{y'}{\sqrt{t'}},

$$ \int_{\mathbb{R}} t'^{a+1/2} f(x)\ dx =t'^{a+1/2} \int_{\mathbb{R}} f(x)\ dx= 1 $$

This must equal 1 for every $t'$, which is only possible if the power of $t'$ vanishes. Thus, $a = -\frac{1}{2}$.

Also, we get \int_{\mathbb{R}} f(x)\ dx= 1.

4.5 Integrate! Solve it!

By assuming the special form of p, and letting a=-1/2, b=1/2, we get,

$$ -\frac{1}{2}f(\xi) - \frac{1}{2}\xi \frac{df}{d\xi} = c^2 \frac{d^2f}{d\xi^2}$$

$$ p=\frac{1}{\sqrt{t'}} f(\frac{y'}{\sqrt{t'}}) = \frac{1}{\sqrt{t'}}f(\xi)$$

$$ \xi = \frac{y'}{\sqrt{t'}}$$

The forward Kolmogorov equation becomes,

$$ -\frac{1}{2}\bigg(f(\xi) + \xi \frac{df}{d\xi} \bigg)= c^2 \frac{d^2f}{d\xi^2}$$

$$ -\frac{1}{2}\bigg( \frac{d \big(\xi f(\xi)\big)}{d \xi} \bigg)= c^2 \frac{d^2f}{d\xi^2}$$

, as f(\xi) + \xi \frac{df}{d\xi} = \frac{d \big(\xi f(\xi)\big)}{d \xi} by the product rule.

Integrate 1st Time

$$ -\frac{1}{2}\xi f(\xi)= c^2 \frac{df}{d\xi} + constant$$

There’s an arbitrary constant of integration that could go in here, but for the answer we want it is zero: we need only a special solution, so we can set the arbitrary constant to zero.

So, the equation can be rewritten as,

$$ -\frac{1}{2c^2}\xi d\xi = \frac{1}{f(\xi)}df $$

Integrate 2nd Time

$$ ln\ f(\xi) = -\frac{\xi^2}{4c^2} + C$$

Take exponential, f(\xi) = e^C e^{-\frac{\xi^2}{4c^2}} = A e^{-\frac{\xi^2}{4c^2}} .

Find A

The Last Step here is to find the exact value of A. A is chosen such that the integral of f is one.

$$\int_{\mathbb{R}}f(\xi)\ d\xi =1$$

$$ \int_{\mathbb{R}}A e^{-\frac{\xi^2}{4c^2}} \ d\xi = 2cA\int_{\mathbb{R}} e^{-\frac{\xi^2}{4c^2}} \ d\big(\frac{\xi}{2c}\big) =1 $$

$$ 2cA \sqrt{\pi} = 1 $$

, so we get A = \frac{1}{2c\sqrt{\pi}}

Plug f(\xi), a, b, and A back into p = t'^a f(\xi).

$$ p(y')=\frac{1}{2c\sqrt{\pi \ t'}}e^{-\frac{\xi^2}{4c^2}} =\frac{1}{2c\sqrt{\pi \ t'}}e^{-\frac{y'^2}{4c^2t'}} $$

$p(\cdot)$ now has the form of a normal distribution.

$$N(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

So, we may say \mu_{y'}=0 and \sigma^2_{y'}=2c^2t'. Or, y' \sim N(0, 2c^2t').

5. Summary

$$p(y')=\frac{1}{2c\sqrt{\pi \ t'}}e^{-\frac{\xi^2}{4c^2}} =\frac{1}{2c\sqrt{\pi \ t'}}e^{-\frac{y'^2}{4c^2t'}} $$

Finally, we solved for the transition probability density function p(\cdot). By assuming the forward or backward type of trinomial model, we found a partial differential relationship. Then, assuming a special form of p(\cdot) by the similarity method, we solved it.

The meaning is that p(y')=\frac{1}{2c\sqrt{\pi \ t'}}e^{-\frac{y'^2}{4c^2t'}} is one transition probability density function that satisfies the trinomial random walk.

Also, we find that p(\cdot) is normally distributed.
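A quick simulation check (a sketch, with illustrative step sizes): simulating the trinomial walk and comparing the sample variance of y' at t' = 1 against the theoretical 2c^2 t'.

```python
import random

random.seed(1)
alpha, dy, dt = 0.25, 0.01, 0.001   # rise/fall probability alpha; delta_y ~ O(sqrt(delta_t))
c2 = alpha * dy ** 2 / dt           # c^2 = alpha * delta_y^2 / delta_t = 0.025
steps, paths = 1000, 2000
t_prime = steps * dt                # horizon t' = 1.0

finals = []
for _ in range(paths):
    y = 0.0
    for _ in range(steps):
        u = random.random()
        if u < alpha:               # rise with probability alpha
            y += dy
        elif u < 2 * alpha:         # fall with probability alpha
            y -= dy
        # else: stay put with probability 1 - 2*alpha
    finals.append(y)

mean = sum(finals) / paths
var = sum((v - mean) ** 2 for v in finals) / paths
# Theory: y' ~ N(0, 2 c^2 t'), so var should be close to 2 * c2 * t_prime = 0.05
```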

Black and Scholes, 1973

See A bit Stochastic Calculus .


$$ dS_t = \mu S_t \ dt +\sigma S_t \ dW_t $$

  • In calculating d f(S_t), we would get, (by Taylor Expansion)

$$ df(S_t) = \bigg( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 f}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial f}{\partial S_t}\sigma S_t \ dW_t $$

  • A Special Form of f(\cdot) is f(S) = log(S),

$$ d\ log(S_t) = \bigg( \mu – \frac{1}{2}\sigma^2 \bigg)dt + \sigma \ dW_t $$

Now let Y_t be the price of a derivative security written on S_t and t; by Ito's lemma,

$$ dY_t= \bigg( \frac{\partial Y_t}{\partial t} + \frac{\partial Y_t}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 Y_t}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial Y_t}{\partial S_t}\sigma S_t \ dW_t $$

Consider a portfolio \Pi constructed by (1) shorting one derivative and (2) longing some fraction \Delta of the stock, such that the portfolio is riskless. (\Delta = \frac{\partial Y}{\partial S})

$$ \Pi_t = -Y +\Delta \ S_t $$

Differentiate it,

$$ d\Pi_t = -dY_t +\frac{dY}{dS}dS_t $$

Substitute dY_t and dS_t into the above equation; we then get the stochastic process of the portfolio, by Ito's Lemma.

$$ d\Pi_t =-\bigg( \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 \bigg) dt $$

The diffusion term dW_t disappears, and that means the portfolio is riskless during the interval dt. Under a no arbitrage assumption, this portfolio can only earn the riskless return, r.

$$ d\Pi_t =r\Pi_t \ dt $$

  • Substituting d\Pi_t and \Pi_t into this, we get the Partial Differential Equation (PDE) / Black-Scholes equation,

$$ – \bigg( \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 \bigg) dt = r\bigg(- Y_t + \frac{\partial Y}{\partial S}S_t \bigg)dt $$

$$ rY_t = \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 + r\frac{\partial Y}{\partial S}S_t $$

Then, we guess (where U(\cdot) is the payoff as a function of S_T at the maturity t=T),

$$ Y_t = e^{-r(T-t)} U(S_T) $$

For a European call with strike price, K, U(S_T) would be the payoff at maturity,

$$ U(S_T) = max( S_T - K , 0 ) $$

Finally, through a series of steps to find a specific solution of the PDE, we can solve for the value of the call, (\Phi(\cdot) is the cumulative standard normal distribution)

$$ c = S_t\Phi(d_1) - K e^{-r (T-t)}\Phi(d_2) $$


$$ d_1 =\frac{ log(S_t/K) + (r+\frac{1}{2}\sigma^2)(T-t)}{\sigma \sqrt{T-t}} $$

$$ d_2 = d_1 - \sigma \sqrt{T-t} $$
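Putting the pieces together, a minimal implementation of the call-price formula (using Python's statistics.NormalDist for \Phi; the inputs in the example are illustrative):

```python
import math
from statistics import NormalDist

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes European call price, with tau = T - t in years."""
    N = NormalDist().cdf
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * N(d1) - K * math.exp(-r * tau) * N(d2)

# At-the-money call: S = K = 100, r = 5%, sigma = 20%, one year to maturity
c = bs_call(S=100, K=100, r=0.05, sigma=0.2, tau=1.0)
```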

Volatility Forecast – Ornstein Uhlenbeck Process

See notes and the realisation.

We assume the volatility of returns follows a stochastic process. Define it as the Ornstein-Uhlenbeck process,

$$dX_t = \kappa (\theta - X_t)dt +\sigma \ dW_t$$

What’s the distribution function of the Ornstein-Uhlenbeck process?

We could apply the MLE estimation to find the parameter of the above random process.

Recall our Vasicek Form Ornstein-Uhlenbeck process is like the following:

$$ dX_t = \kappa (\theta - X_t)dt +\sigma \ dW_t $$

Multiply both sides by e^{kt} (writing k for \kappa), then we get,

$$ e^{kt}dX_t = k e^{kt} \theta \ dt - k e^{kt} X_t \ dt +\sigma e^{kt}\ dW_t $$

We know that d( e^{kt} X_t )=e^{kt}dX_t + k e^{kt}X_t dt; substituting this in, we get,

$$ d(e^{kt}X_t)=k\theta e^{kt} \ dt + \sigma e^{kt} \ dW_t $$

Take an integral from [0,T],

$$ \int_0^T d(e^{kt}X_t)= \int_0^T k\theta e^{kt} \ dt + \int_0^T \sigma e^{kt} \ dW_t $$

$$ X_T = X_0 e^{-kT} +\theta (1-e^{-kT}) + \int_0^T e^{-k(T-t)}\sigma \ dW_t $$

$\int_0^T e^{-k(T-t)}\sigma \ dW_t \sim N(0, \sigma^2\int_0^T e^{-2k(T-t)}dt)$

We then find \mathbb{E}(X_T) and Var(X_T).

$ \mathbb{E}(X_T) =\mathbb{E}\bigg( X_0 e^{-kT} +\theta (1-e^{-kT}) + \int_0^T e^{-k(T-t)}\sigma \ dW_t \bigg)$

$ \mathbb{E}(X_T)= X_0 e^{-kT} +\theta (1-e^{-kT}) $

$Var(X_T) = \mathbb{E}\bigg( \big( X_T – \mathbb{E}(X_T) \big)^2 \bigg)$

$Var(X_T)= \mathbb{E}\bigg( \big( \int_0^T e^{-k(T-t)}\sigma \ dW_t \big)^2 \bigg) $

By Ito’s Isometry: I(t)=\int_0^t \Delta(s)dW_s, then

$Var[I(t)]=\mathbb{E}[(I^2(t))]=\int_0^t \Delta^2(s)ds$


$ Var(X_T)= \int_0^T e^{-2k(T-t)}\sigma^2 \ dt $

$ Var(X_T)= \frac{\sigma^2}{2k}\big( 1-e^{-2kT} \big) $

Therefore, we finally get,

$ X_T \sim N\bigg(X_0 e^{-kT} +\theta (1-e^{-kT}), \frac{\sigma^2}{2k}\big( 1-e^{-2kT} \big) \bigg)$

MLE of the Ornstein-Uhlenbeck Process

$X_t \sim N(\cdot,\cdot)$

, where \mathbb{E}(X_{t+\delta t}|X_t)=X_t e^{-k\ \delta t} +\theta (1-e^{-k\ \delta t}), and Var(X_{t+\delta t}|X_t)= \frac{\sigma^2}{2k}(1-e^{-2k\ \delta t}).

$$ f_{\theta}(x_{t+\delta t}|\theta)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

We replace \mu = \mathbb{E}(X_{t+\delta t}) and \sigma^2 = Var(X_{t+\delta t}) inside.

Code realisation could be found in notes.
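In the meantime, here is a minimal sketch of such a realisation (all parameter values and starting guesses are illustrative): simulate a discretely observed OU path with the exact Gaussian transition, then recover (\kappa, \theta, \sigma) by minimising the negative log-likelihood built from the conditional density above.

```python
import numpy as np
from scipy.optimize import minimize

# Each increment X_{t+dt} | X_t is Gaussian with
#   mean = X_t e^{-k dt} + theta (1 - e^{-k dt})
#   var  = sigma^2 / (2k) * (1 - e^{-2k dt})
# so the log-likelihood is a sum of normal log-densities over the transitions.
def ou_neg_log_likelihood(params, x, dt):
    k, theta, sigma = params
    a = np.exp(-k * dt)
    mean = x[:-1] * a + theta * (1 - a)
    var = sigma**2 / (2 * k) * (1 - a**2)
    resid = x[1:] - mean
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid**2 / var)

# Simulate a path with known (illustrative) parameters, then recover them
rng = np.random.default_rng(1)
k_true, theta_true, sigma_true, dt, n = 2.0, 0.5, 0.3, 1 / 252, 50_000
a = np.exp(-k_true * dt)
sd = np.sqrt(sigma_true**2 / (2 * k_true) * (1 - a**2))
x = np.empty(n)
x[0] = theta_true
for i in range(1, n):
    x[i] = x[i - 1] * a + theta_true * (1 - a) + sd * rng.normal()

opt = minimize(ou_neg_log_likelihood, x0=[1.0, 0.0, 0.5], args=(x, dt),
               bounds=[(1e-4, None), (None, None), (1e-4, None)])
print(opt.x)  # estimates of (k, theta, sigma)
```

As usual for OU, the mean-reversion speed k is the hardest parameter to pin down: its estimation error shrinks with the total time span of the data, not just the number of observations.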

Maximum Likelihood Estimator

Joint Probability Density Function or Likelihood Function:

$f(x_1, x_2, …, x_n|\theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n| \theta) $

$f(x_1, x_2, …, x_n|\theta) = \prod_{i=1}^n f(x_i | \theta) =L(\theta)$

A likelihood function is the density function regarded as a function of \theta.

$ L(\theta |x) = \prod_{i=1}^n f(x_i|\theta) $, $\theta \in \Theta$

Since we know the sample, we need to ‘guess’ the parameter \theta that gives the joint probability distribution its maximum value.

The Maximum Likelihood Estimator (MLE) is :

$ \hat{\theta}(x) = arg\max_{\theta} L(\theta|x) $

To simplify the calculation, we normally apply a logarithm transformation,

$$ l(\theta)= \log L $$

$$\log L(\theta |x) =\log \prod_{i=1}^n f(x_i|\theta)= \sum_{i=1}^n \log f_{\theta} (x_i) $$

As the logarithm transformation is monotonic,

$$ \hat{\theta}(x) = \arg\max_{\theta} L(\theta|x) \Leftrightarrow \hat{\theta}(x) = \arg\max_{\theta} l(\theta|x) $$

To review: we aim to estimate the parameters. The logic of the MLE is that we posit a probability distribution for our data (the sample), then find the parameter values under which the joint probability density (the likelihood function) of the observed sample achieves its maximum.

In other words, we choose the distribution and its parameters to make the observed sample as probable as possible.

Finding \arg\max_{\theta} l(\theta) is equivalent to finding \arg\min_{\theta} -l(\theta)
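As a minimal illustration of this equivalence (with hypothetical coin-flip data): for Bernoulli samples the closed-form MLE of the success probability is the sample mean, and numerically minimising the negative log-likelihood recovers the same value.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical coin-flip data: for Bernoulli samples the MLE of p
# maximises L(p) = p^k (1-p)^(n-k); minimising -l(p) numerically
# recovers the closed-form answer p_hat = k/n (the sample mean).
rng = np.random.default_rng(3)
x = rng.binomial(1, 0.7, size=10_000)

def neg_log_likelihood(p):
    # negative Bernoulli log-likelihood of the sample
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

opt = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                      method='bounded')
print(opt.x, x.mean())  # numerical MLE vs closed-form sample mean
```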

A Normal Distribution Example

A normal distribution example, for a random sample x_i, i=1, 2, 3, …, n

$$f_{\theta}(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

We substitute f_{\theta}(x) into the log-likelihood function; then we find,

$$ l(\mu, \sigma^2)=log\ L(\mu,\sigma^2) $$

$$= -\frac{n}{2}\big( log2\pi + log\sigma^2 \big) -\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i -\mu)^2 $$

Taking the First Order Conditions, F.O.C.

By setting \frac{\partial l}{\partial \mu}= 0 and \frac{\partial l}{\partial \sigma^2}= 0, we solve \mu_{MLE} and \sigma^2_{MLE}.

$$ \mu_{MLE} =\bar{x}$$

$$\sigma^2_{MLE} =\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$$

P.S. We can verify that this is a maximum, because the second-order partial derivatives are negative at the critical point.

import numpy as np

def MLE_norm(x):
    # closed-form MLE for a normal sample: sample mean and (biased) sample variance
    mu_hat = np.mean(x)
    sigma2_hat = np.var(x)  # np.var uses ddof=0 by default, i.e. the 1/n MLE form
    return mu_hat, sigma2_hat


mu = 5
sigma = 2.5
N = 10000

x = np.random.normal(mu, sigma, size=(N,))


mu_hat, sigma2_hat = MLE_norm(x)

for_mu_hat = r'\hat{\mu} = ' + format(round(mu_hat, 2))
for_sigma2_hat = r'\hat{\sigma} = ' + format(round(np.sqrt(sigma2_hat), 2))
print('The MLE for the data is:')
print(for_mu_hat)
print(for_sigma2_hat)

By our calculation, we know that mathematically the MLE estimators of \mu and \sigma^2 are the sample mean and the (biased) sample variance.

Perform MLE Numerically

What if the log-likelihood function is not continuous or differentiable? We can still solve the problem numerically, as an optimisation problem whose objective function is the negative log-likelihood.

In other words, sometimes we cannot derive the MLE estimator by hand, because the log-likelihood function might not be differentiable. What can we do?

We let the computer iteratively ‘guess’ parameter values (an optimiser) until it arrives at the maximum log-likelihood (or, equivalently, a minimum of the negative log-likelihood).

The maximum likelihood estimator (MLE):

$$\hat{\theta}(x) = \arg\max_{\theta} L(\theta|x)$$

$$L(\theta) = \prod_{i=1}^N f(X_i | \theta) $$

$$ \log L(\theta)=\sum_{i=1}^N \log f(X_i|\theta)$$

$$ arg\max_{\theta} l(\theta) \Leftrightarrow arg\min_{\theta} -l(\theta) $$

Why do we use a minimiser instead? Only minimisation routines are available in the scipy package, and the maximisation and minimisation problems are equivalent up to a sign.

import numpy as np
import scipy as sc
import scipy.stats
import scipy.optimize

def log_likelihood(theta, x):
    mu = theta[0]
    sigma = theta[1]
    # logpdf is numerically safer than taking log(pdf)
    l_theta = np.sum(sc.stats.norm.logpdf(x, loc=mu, scale=sigma))
    # there is a negative sign in front; we minimise it later
    return -l_theta

# Constraint function that restricts sigma to be positive.
# ('ineq' means the optimiser keeps fun(theta) >= 0.)
def sigma_pos(theta):
    return theta[1]


cons_set = {'type': 'ineq', 'fun': sigma_pos}

theta0 = [2, 3]  # initial guess; almost any values work, as this is just the starting point fed into the optimiser
# find the minimum of the negative log-likelihood function
opt = sc.optimize.minimize(fun=log_likelihood, x0=theta0, args=(x,), constraints=cons_set)

for_mu_hat = r'\hat{\mu} = ' + format(round(opt.x[0], 2))
for_sigma2_hat = r'\hat{\sigma} = ' + format(round(opt.x[1], 2))

print('The MLE for the data is:')
print(for_mu_hat)
print(for_sigma2_hat)

Risk-Neutral Pricing

Our basic logic is,

$$ \text{Option Value} =\text{Discounted Expectation of the Payoff} $$

However, there is not a perfect correlation between the option and the underlying asset. That fact encourages us to use the risk-neutral method for valuing an option.

Geometric Brownian Motion

  • Non-dividend paying stock, (S_0>0)

$$ dS_t = \mu S_t\ dt+\sigma S_t \ dW_t $$

  • Money market, or bank account, (B_0 = 1)

$$ dB_t = r B_t\ dt $$
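Since the money-market account has no diffusion term, this is an ordinary differential equation; with B_0 = 1 it integrates to

$$ B_t = e^{rt} $$

(or B_t = \exp\{ \int_0^t r_s\, ds \} when the rate is time-varying).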


$$ \{ W_t \}_{t\in [0,T]} \ \text{is a Brownian Motion under} \ \mathbb{P} $$

$ \{S_t\} $ is not a martingale under \mathbb{P} because the drift term \mu S_t is not zero.

So, we apply a transformation to S_t to make it a martingale (eliminate the drift term).

We now look at \{ \frac{S_t}{B_t} \},

$$ d\big( \frac{S_t}{B_t} \big)=-r\frac{S_t}{B_t}dt +\frac{1}{B_t} dS_t + dS_t \big(-r\frac{1}{B_t}dt\big) $$

$$ d\big( \frac{S_t}{B_t} \big)=-r\frac{S_t}{B_t}dt +\bigg( \frac{1}{B_t}-r\frac{1}{B_t}dt \bigg)\bigg( \mu S_t\ dt +\sigma S_t \ dW_t \bigg) $$

The cross terms involving dt \cdot dt and dt \cdot dW_t are of higher order and vanish in the stochastic integration. We therefore get,

$$ d\big( \frac{S_t}{B_t} \big) = \sigma ( \frac{S_t}{B_t} )\underbrace{\bigg( \frac{\mu - r}{\sigma} dt + dW_t \bigg)}_{d\tilde{W_t}} $$

The drift disappears. Writing

$$ d\tilde{W_t} = \frac{\mu -r}{\sigma} dt +dW_t $$

, the dynamics of the discounted stock become

$$ d\big( \frac{S_t}{B_t} \big) = \sigma \frac{S_t}{B_t} \ d\tilde{W_t} $$

By Girsanov’s Theorem: let \{a_t \}_{t\in [0,T]} be an adapted \{ \mathcal{F}_t \}_{t\in [0,T]} process (here a_t = \frac{\mu - r}{\sigma}); then there exists a measure \mathbb{Q}, equivalent to \mathbb{P}, under which \tilde{W_t} = W_t +\int_0^t a_u du is a Brownian Motion.

$$ dS_t = rS_t \ dt +\sigma S_t \ d\tilde{W_t} $$


The implication is that,

We can construct a replicating portfolio in (S_t, B_t) with value process \{ V_t \}_{t\in[0,T]}. Under \mathbb{Q}, the discounted value process \{ e^{-rt}V_t \} is a martingale.

$$ e^{-rt} C_t =\mathbb{E}_{\mathbb{Q}}[ e^{-rT} C_T | \mathcal{F}_t ] $$

$$ =\mathbb{E}_{\mathbb{Q}}[ e^{-rT} V_T | \mathcal{F}_t ] = e^{-rt} V_t $$

Risk Neutral Expectation Pricing Formula

We therefore get,

$$ \frac{C_t}{B_t} = \mathbb{E}_{\mathbb{Q}}\bigg[ \frac{C_T}{B_T} \bigg| \mathcal{F}_t \bigg] $$

$$ B_0=1,\quad and \quad B_t=exp\{ \int_0^t r_s ds \} $$
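A sketch of this pricing formula in action (the parameter values below are illustrative): simulate S_T under \mathbb{Q}, where S_T = S_0 \exp\{(r-\frac{\sigma^2}{2})T + \sigma\sqrt{T}Z\}, take the discounted expectation of a European call payoff, and compare with the Black-Scholes closed form.

```python
import numpy as np
from scipy.stats import norm

# Illustrative contract and market parameters
S0, K, r, sigma, T = 100.0, 105.0, 0.05, 0.2, 1.0

# Monte Carlo: discounted expectation of the payoff under Q,
# using the exact lognormal terminal distribution of GBM.
rng = np.random.default_rng(2)
Z = rng.standard_normal(1_000_000)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
mc_price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

# Black-Scholes closed form for comparison
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
bs_price = S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(mc_price, bs_price)  # the two prices agree up to Monte Carlo error
```

The two numbers agree up to Monte Carlo error, which is the content of the risk-neutral expectation pricing formula: the option value is the discounted \mathbb{Q}-expectation of the payoff.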