Least Squares Method – Intro to Kalman Filter

Consider a Linear Equation,

$$ y_i = \sum_{j=1}^n C_{i,j} x_j +v_i,\quad i=1,2,…$$

, where $C_{i,j}$ are scalars and $v_i\in \mathbb{R}$ is the measurement noise. The noise is unknown, while we assume it follows certain patterns (the assumptions are due to some statistical properties of the noise). We assume $v_i, v_j$ are independent for $i\neq j$ . Properties are mean of zero, and variance equals sigma squared.

$$\mathbb{E}(v_i)=0$$

$$\mathbb{E}(v_i^2) = \sigma_i^2$$

We can rewrite $y_i = \sum_{j=1}^n C_{i,j} x_j +v_i$ as,

$$ \begin{pmatrix} y_1 \ y_2 \ \vdots\ y_s\end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \ C_{21} & C_{22}& \cdots & C_{2n} \ \vdots & \vdots & \cdots & \vdots \ C_{s1} & C_{s2} & \cdots & C_{sn}\end{pmatrix} \begin{pmatrix} x_1 \ x_2 \ \vdots\ x_n\end{pmatrix} + \begin{pmatrix} v_1 \ v_2 \ \vdots\ v_s\end{pmatrix} $$

, in a matrix form,

$$ \vec{y} = C \vec{x} + \vec{v} $$

, but I would write in a short form,

$$ y= C x +v$$

We solve for the least squared estimator from the optimisation problem, (there is a squared L2 norm)

$$ \min_x || y-Cx ||_2^2 $$

Recursive Least Squared Method

The classic least squared estimator might not work well when data evolving. So, there emerges a Recursive Least Squared Method to deal with the discrete-time instance. Let’s say, for a discrete-time instance $k$ , $y_k \in \mathbb{R}’$ is within a set of measurements group follows,

$$y_k = C_k x + v_k$$

, where $C_k \in \mathbb{R}^{l\times n}$ , and $v_k \in \mathbb{R}^l$ is the measurement noise vector. We assume that the covariance of the measurement noise is given by,

$$ \mathbb{E}[v_k v_k^T] = R_k$$

, and

$$\mathbb{E}[v_k]=0$$

The recursive least squared method has the following form in this section,

$$\hat{x}k = \hat{x}{k-1} + K_k (y_k – C_k \hat{x}_{k-1})$$

, where $\hat{x}k$ and $\hat{x}{k-1}$ are the estimates of the vector $x$ at the discrete-time instants $k$ and $k-1$ , and $K_k \in \mathbb{R}^{n\times l}$ is the gain matrix that we need to determine. $K_k$ is coined the ‘Gain Matrix’

The above equation updates the estimate of $x$ at the time instant $k$ on the basis of the estimate $\hat{x}_{k-1}$ at the previous time instant $k-1$ and on the basis of the measurement $y_k$ obtained at the time instant $k$ , as well as on the basis of the gain matrix $K_k$ computed at the time instant $k$ .

Notation

$\hat{x}$ is the estimate.

$$ \hat{x}k = \begin{pmatrix} \hat{x}{1,k} \ \hat{x}{2,k} \ \vdots \\hat{x}{n,k} \end{pmatrix} $$

, which is corresponding with the true vector $x$ .

$$x = \begin{pmatrix} x_1 \ x_2 \ \vdots \ x_n \end{pmatrix}$$

The estimation error, $\epsilon_{i,k} = x_i – \hat{x}_{i,k} \quad i=1,2,…,n$ .

$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \ \epsilon_{2,k} \ \vdots \\epsilon_{n,k} \end{pmatrix} = x – \hat{x}_k = \begin{pmatrix} x_1-\hat{x}_{1,k} \ x_2 – \hat{x}_{2,k} \ \vdots \x_n-\hat{x}_{n,k} \end{pmatrix} $$

The gain $K_k$ is computed by minimising the sum of variances of the estimation errors,

$$ W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2) $$

Next, let’s show the cost function could be represented as follows, ( $tr(.)$ is the trace of a matrix)

$$ W_k = tr(P_k) $$

, and $P_k$ is the estimation error covariance matrix defined by

$$ P_k = \mathbb{E}(\epsilon_k \epsilon_k^T )$$

Or, says,

$$ K_k = arg\min_{K_k} W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg)$$

Why is that?

$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \ \epsilon_{2,k} \\vdots \ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$

$$ = \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \ \vdots & \epsilon_{i,k}^2 & \vdots \ \epsilon_{1,k}\epsilon_{n,k} & \cdots & \epsilon_{n,k}^2\end{pmatrix} $$

So,

$$ P_k = \mathbb{E}[\epsilon_k \epsilon_k^T] $$

$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$

Optimisation

$$ K_k = arg\min_{K_k} W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg) = tr(P_k)$$

Let’s derive the optimisation problem.

$$\epsilon_k = x-\hat{x}_k$$

$$ =x-\hat{x}{k-1} – K_k(y_k – C_k \hat{x}{k-1}) $$

$$ = x- \hat{x}{k-1} – K_k (C_k x + v_k – C_k \hat{x}{k-1}) $$

$$ = (I – K_k C_k)(x-\hat{x}_{k-1}) – K_k v_k $$

$$ =(I-K_k C_k )\epsilon_{k-1} – K_k v_k $$

Recall $y_k = C_k x + v_k$ and $\hat{x}k = \hat{x}{k-1} + K_k (y_k – C_k \hat{x}_{k-1})$

So, $\epsilon_k \epsilon_k^T$ would be,

$$\epsilon_k \epsilon_k^T = \bigg((I-K_k C_k )\epsilon_{k-1} – K_k v_k\bigg)\bigg((I-K_k C_k )\epsilon_{k-1} – K_k v_k\bigg)^T$$

$P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$, and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.

$\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) =0$ by the white noise property of $\epsilon$ and $v$. However, $\mathbb{E}(v_k v_k^T) = R_k$. Substituting all those into $P_k$, we would get,

$$P_k = (I – K_k C_k)P_{k-1}(I – K_k C_k)^T + K_k R_k K_k^T$$

$$ P_k = P_{k-1} – P_{k-1} C_k^T K_k^T – K_k C_k P_{k-1} + K_k C_k P_{k-1}C_k^T K_k^T + K_k R_k K_k^T $$

$$W = tr(P_k)= tr(P_{k-1}) – tr(P_{k-1} C_k^T K_k^T) – tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1}C_k^T K_k^T) + tr(K_k R_k K_k^T) $$

We take F.O.C. to solve for $K_k = arg\min_{K_k} W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg) = tr(P_k)$ , by letting $\frac{\partial W_k}{\partial K_k} = 0$ . See the Matrix Cookbook and find how to do derivatives w.r.t. $K_k$ .

$$\frac{\partial W_k}{\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$

We solve for $K_k$ ,

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

, we let $L_k = R_k + C_k P_{k-1} C_k^T$ , and $L_k$ has the following property $L_k = L_k^T$ and $L_k^{-1} = (L_k^{-1})^T$

$$ K_k = P_{k-1} C_k^T L_k^{-1} $$

Plug $K_k = P_{k-1} C_k^T K_k^{-1}$ back into $P_k$ .

$$ P_k = P_{k-1} – K_kC_k P_{k-1} = (I-K_k C_k)P_{k-1} $$

Summary

In the end, the Recursive Least Squared Method could be summarised as the following three equations.

1. Update the Gain Matrix.

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

2. Update the Estimate.

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k – C_k \hat{x}_{k-1})$$

3. Propagation of the estimation error covariance matrix by using this equation.

$(I-K_k C_k)P_{k-1}$

Reference

Introduction to Kalman Filter: Derivation of the Recursive Least Squares Method

Sigmoid & Logistic

Sigmoid function is largely used for the binary classification, in either machine learning algorithm or econometrics.

Why the Sigmoid Function shapes in this form?

Firstly, let’s introduce the odds.

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of outcomes that produce that outcome to the number that do not.

Odds also have a simple relation with probability: the odds of an outcome are the ratio of the probability that the outcome occurs to the probability that the outcome does not occur. In mathematical terms, p is the probability of the outcome, and 1-p is the probability of not occurring.

$$ odds = \frac{p}{1-p} $$

Odd and Probability

Let’s find some insights behind the probability and the odd. Probability links with the outcomes in that for each outcomes, the probability give its specific corresponding probability. $Pr(Y)$ , where $Y$ is the outcome, and $Pr(\cdot)$ is the probability density function that project outcomes to it’s prob.

What about the odds? Odds is more like a ratio that is calculated by the probability as the formula says.

Implication: Compared to the probability, odds provide more about how the binary classification is balanced or not, but the probability distribution.

Example

Rolling a six-side die. The probability of rolling 6 is $1/6$ , but the odd is $1/5.

Formula

$$ odd = \frac{Pr(Y)}{1-Pr(Y)} $$

, where $Y$ is the outcomes.

Logit

As the probability $Pr(Y)$ is always between $[0,1]$ , the odds must be non-negative, $odd \in [0,\infty]$ . We may want to apply a monotonic transformation to re-gauge that range of odds. We will apply on the logarithm.

$$ Sigmoid/Logistic := log(odds) =log\bigg( \frac{Pr(Y)}{1-Pr(Y)} \bigg) $$

We then get the Sigmoid function.

As the transformation we apply on is monotonic, the Sigmoid function remains the similar properties as the odd. The Sigmoid function keeps the similar implication, representing the balance of the binary outcomes.

Then, we bridge $Y = f(X)$ , the outcome $Y$ is a function of events $X$ . Here, we assume a linear form as $Y = X\beta$ . The sigmoid function would then become a function of $X$ .

$$g(X) = log\bigg( \frac{Pr(X\beta)}{1-Pr(X\beta)} \bigg) $$

$$ e^g = \frac{p}{1-p} $$

$$ p = \frac{e^g}{e^g+1}=\frac{1}{1+e^{-g}}$$

$$ p = \frac{1}{1+e^{-X\beta}}$$

We finally get out logistic sigmoid function as above.

Dirac Delta Function

The Dirac Delta Function could be applied to simplify the differential equation. There are three main properties of Dirac Delta Function.

$$\delta (x-x’) =\lim_{\tau\to0}\delta (x-x’)$$

such that,

$$ \delta (x-x’) = \begin{cases} \infty & x= x’ \ 0 & x\neq x’ \end{cases} $$

$$\int_{-\infty}^{\infty} \delta (x-x’)\ dx =1$$

Three Properties:

Property 1:

$$\delta(x-x’)=0 \quad \quad ,x\neq x’ $$

Property 2:

$$ \int_{x’-\epsilon}^{x’+\epsilon} \delta (x-x’)dx =1\quad \quad ,\epsilon >0 $$

Property 3:

$$\int_{x’-\epsilon}^{x’+\epsilon} f(x)\ \delta (x-x’)dx = f(x’)$$

At $x=x’$ the Dirac Delta function is sometimes thought of has having an “infinite” value. So, the Dirac Delta function is a function that is zero everywhere except one point and at that point it can be thought of as either undefined or as having an “infinite” value.

Girsanov’s Theorem

Statement

We can change the probability measure, and then make a random variable follows a certain probability measure.

Radon-Nikodym Derivative:

$$Z(\omega) = \frac{\tilde{P}(\omega)}{P(\omega)}$$

$\tilde{P}(\omega)$ is the risk-neutral probability measure.
${P}(\omega)$ is the actual probability measure.
Properties:
- $Z(\omega)>0$
- $\mathbb{E}(Z)=1$
- As $\tilde{P}(\omega) = Z(\omega) P(\omega)$ , so if $Z(\omega)$ , then $\tilde{P}(\omega)>P(\omega)$ . vice versa.

We can calculate that,

$$ \underbrace{\tilde{\mathbb{E}}(X)}_{\text{Expectation under Risk-neutral Probability Measure}} = \underbrace{\mathbb{E}(ZX)}_{\text{Expectation under Actual Probability Measure}} $$

Proof & Example

Under $(\Omega,\mathcal{F},P)$ , $A\in \mathcal{F}$ , let $X$ be a random variable $X\sim N(0,1)$ . $\mathbb{E}(X)=0$ , and $\mathbb{Var}(X)=1$ .

$Y=X+\theta$, $\mathbb{E}(Y)=\theta$, and $\mathbb{Var}(Y)=1$.

$X$ here is s.d. normal under the actual probability measure.

However, $Y$ here is not standard normal under the current probability $P(.)$ , because $\mathbb{E}(Y)\neq0$ .

What do we do?

We change the probability measure from $P(.)\to\tilde{P}(.)$ to let $Y$ be standard normal under the new probability measure!

We set the Radon-Nikodym Derivative,

$$Z(\omega) = exp\{ -\theta\ X(\omega) – \frac{1}{2}\theta^2 \}$$

Now, we can create the probability measure $\tilde{P}(A)$ , $A={ \omega;Y(\omega)\leq b) }$

$$\tilde{P}(A) = \int_A Z(\omega)\ dP(\omega)$$

such that $Y=X+\theta$ would be standard normal distributed under the new probability measure $\tilde{P}(A)$ .

$$\tilde{P}(A) = \tilde{P}(Y(\omega \leq b)$$

$$ = \int_{{ Y(\omega)\leq b } } exp{ -\theta\ X(\omega) – \frac{1}{2}\theta^2 } \ dP(\omega)$$

, then change the integral range from the set $A$ to $\Omega$ by multiplying that indicator.

$$ = \int_{\Omega }\mathbb{1}_{ Y(\omega)\leq b }\ exp{ -\theta\ X(\omega) – \frac{1}{2}\theta^2 } \ dP(\omega)$$

, change from $dP$ to $dX$ ,

$$ = \int_{-\infty }^{\infty }\mathbb{1}_{ b-\theta}\ exp{ -\theta\ X(\omega) – \frac{1}{2}\theta^2 } \ \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}X^2(\omega)} \ dX(\omega)$$

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b-\theta}\ exp{ -\theta\ X(\omega) – \frac{1}{2}\theta^2- \frac{1}{2}X^2(\omega)} \ dX(\omega)$$

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b-\theta}\ exp\Bigg\{ -\frac{1}{2}\bigg(\theta+ X(\omega)\bigg)^2\Bigg\} \ dX(\omega)$$

, as $Y=X+\theta$ , $dY = dX$ , we now change $dX$ to $dY$ ,

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b}\ exp\big\{ -\frac{1}{2}Y(\omega)^2\big\} \ dY(\omega)$$

, the above is now a standard normal distribution for $Y(\omega)$ .

Value at Risk & Expected Shortfalls

Value at Risk – VaR

VaR is a probability statement about the potential change in the value of a portfolio.

Notation

$$Porb(x\leq VaR(X))= 1-c$$

$$ Prob\bigg(z \leq \frac{VaR(X)-\mu}{\sigma}\bigg)=1-c $$

$c$ – confidence interval, i.e. $c=99\%$. Then $1-c = 1\% $
$\mu$ and $\sigma$ are for $X$.
- For Example, if $X$ is yearly return, then $\mu_{252days}=252\cdot\mu_{1day}$ , and $\sigma_{252days}=\sqrt{252}\cdot\sigma_{1day}$
$x$ here is the return. So, $c$ is the confidence interval, i.e. 99%.
- VaR focus on the tail risks. If $x$ stands for return, then tail risk is on the left tail, $z_{1-c}$ .
If $x$ is the loss, the tail risk is on the right tail. $z_c$

$$VaR(X) = \mu + \sigma\cdot \Phi^{-1}(1-c)$$

$$VaR(X) = \mu + \sigma\cdot z_{1-c}$$

I.E.

If $c=99\%$ , then $1-c=1\%$ , so $z_{1-c}=z_{0.01} \approx -2.33$

$VaR(X) = \mu – 2.33\cdot \sigma$

P.S.

The unit of VaR is the amount of loss, so it should be monetary amount. For example, if the total amount of portfolio is USD 1 million, then $VaR = \$1m \cdot (\mu – 2.33\cdot \sigma)$ .

Loss Distribution

Remember $X$ is a distribution of loss. If we know the distribution of Portfolio Return $R$ , $R\sim N(\mu, \sigma^2)$ , then what is the dist for $X$ ?

$$X \sim N(-\mu, \sigma^2)$$

Right! Loss is just the negative return. Also, the volatility would not be affected by plus / minus.

Expected Shortfall (ES)

Expected Shortfall states the Expected Loss during time T conditional on the loss being greater than the $c^{th}$ percentile of the loss distribution.

Notation

$$ ES_c (X) = \mathbb{E}\bigg[ X|X\leq VAR_c(X) \bigg] $$

Be attention here, $X$ is a r.v., and $x$ stands for return here! while the only variable in the $ES_c(X)$ is $c$ , the confidence level, instead of $X$ .
$c$ is the confidence level, i.e. $c$ = 99%.
If $x$ stands for return, then the VaR is the left-tail, $z_{1-c}$ .

$$ ES_c (X) = \mathbb{E}\bigg[ X|X\geq VAR_c(X) \bigg] $$

If $x$ stands for loss (, which is the negative of return ), then the VaR is the right-tail, $z_{c}$ .

Derivation

Notation Form

Consider, $x$ is the return, then $ES_c (X) = \mathbb{E}\bigg[ X|X\leq VAR_c(X) \bigg]$ , and $VaR_c(x)= \mu + z_{1-c}\sigma$ , where $c$ is the confidence level $c=99\%$ for example.

$$ES_c(X) = \frac{\int_{-\infty}^{VaR} xf(x)dx }{\int_{-\infty}^{VaR} f(x)dx } = \frac{\int_{-\infty}^{VaR} x \phi(x)dx }{\int_{-\infty}^{VaR} \phi(x)dx } =\frac{\int_{-\infty}^{VaR} x \phi(x)dx }{ \Phi(VaR) – \Phi(-\infty)} $$

$$= \frac{1}{ \Phi(VaR) – \Phi(-\infty) }\int_{-\infty}^{VaR}x \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx $$

Replace $z = \frac{x-\mu}{\sigma}$ , then $x = \mu + z \sigma$ , and $dx = \sigma dz$

$$ = \frac{1}{\Phi(VaR)} \int_{-\infty}^{VaR}(\mu + z\sigma) \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{z^2}{2}}\sigma dz $$

$$ = \frac{1}{\Phi(VaR)}\mu \int_{-\infty}^{VaR}\frac{1}{\sqrt{2\pi }} e^{-\frac{z^2}{2}} dz + \sigma^2\int_{-\infty}^{VaR} z \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{z^2}{2}} dz $$

$$ = \frac{1}{\Phi(VaR)}\mu \Phi(VaR) – \frac{\sigma^2}{\Phi(VaR)}\int_{-\infty}^{VaR} \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{z^2}{2}} d(-\frac{z^2}{2}) $$

$$ = \mu – \frac{\sigma^2}{\Phi(VaR)} \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{VaR} e^{-\frac{z^2}{2}} d(-\frac{z^2}{2}) $$

$$ = \mu – \frac{\sigma}{\Phi(VaR)} \frac{1}{\sqrt{2\pi }} e^{-\frac{z^2}{2}} |_{-\infty}^{VaR} $$

$$ = \mu – \frac{\sigma}{\Phi(VaR)} \frac{1}{\sqrt{2\pi }} e^{-\frac{VaR^2}{2}}= \mu – \frac{\sigma}{\Phi(VaR)} \phi(VaR)$$

Recall, $VaR_c(x)= \mu + z_{1-c}\sigma$ , so $\phi(VaR_c(x))= \phi(\mu + z_{1-c}\sigma) \leftrightarrow \phi(z_{1-c}) = \phi\bigg( \Phi^{-1}(1-c) \bigg)$ , and $\Phi(VaR_c(x))= \Phi(\mu + z_{1-c}\sigma) \leftrightarrow \phi(z_{1-c}) = \Phi\bigg( \Phi^{-1}(1-c) \bigg) = 1-c$ .

Thus,

$$ ES_c(X) =\mu – \frac{\sigma}{\Phi(VaR)} \phi(VaR)=\mu -\sigma \frac{\phi\big( \Phi^{-1}(1-c) \big)}{1-c}$$

VaR Form

we ‘sum up’ (integrate) the VaR from $c$ to $1$ , conditional on $1-c$ .

$$ES_c(X) = \frac{1}{1-c} \int_c^1 VaR_u(X)du$$

$$ ES_c(X) = \frac{1}{1-c} \int_c^1 \bigg( \mu + \sigma\cdot \Phi^{-1}(1-u) \bigg) du $$

$$ =\mu + \frac{\sigma}{1-c} \int^1_c \Phi^{-1}(1-u) du $$

We let $u = \Phi(Z)$ , where $Z \sim N(0,1)$ . Then,

$du =d(\Phi(z)) =\phi(z) dz$.
$u\in (c,1)$, so $z = \Phi^{-1}(u)\in (z_c \ , \infty)$

Thus,

$$ ES_c(X) =\mu + \frac{\sigma}{1-c} \int^{\infty}_{z_c} \Phi^{-1}\big(1-\Phi(z)\big)\phi(z) dz $$

As $1-\Phi(z) = \Phi(-z)$

$$ ES_c(X) =\mu + \frac{\sigma}{1-c} \int^{\infty}_{z_c} \Phi^{-1}(\Phi(-z))\phi(z) dz = \mu – \frac{\sigma}{1-c} \int^{\infty}_{z_c} z\phi(z) dz $$

$ \int_{z_c}^{\infty} z \phi(z)dz = \int_{z_c}^{\infty} z \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz = -\frac{1}{\sqrt{2\pi}} \int_{z_c}^{\infty} -e^{\frac{z^2}{2}}d(e^{-\frac{z^2}{2}})$

$=\frac{1}{\sqrt{2\pi}}e^{-\frac{z_c^2}{2}}=\phi(z_c)=\phi\big(\Phi^{-1}(c)\big)$, bring it back to $ES_c(X)$

$$ES_c(X) = \mu – \sigma\frac{ \phi\big(\Phi^{-1}(c)\big)}{1-c}$$

Morden Portfolio Theory

$x$ – vector weights
$R$ – vector of all assets’ returns
$\mu = \mathbb{E}(R)$ – mean return of all assets
$\Sigma = \mathbb{E}\bigg[ (R-\mu)(R-\mu)^T \bigg]$ – var-cov matrix of all assets

So,

$\mu_x = x^T \mu$ – becomes a scalar now
$\sigma^2 = x^T \Sigma x$ – collapse to be a scalar

Optimisation

Maximise Expected Return s.t. volatility constraint.

$$ \max_{x} \mu_x \quad s.t. \quad \sigma_x \leq \sigma^* $$

Minimise Volatility s.t. return constraint.

$$ \min_{x} \sigma_x \quad s.t. \quad \mu_x \geq \mu^* $$

Portfolio Risk Measures

By definition, the loss of a portfolio is the negative of return, $L(x) = -R(x)$ .

The Loss distribution becomes the same normal distribution with x-axis reversed.

Volatility of Loss: $\sigma(L(x)) = \sigma_x$ , the minus does not matter in the s.d.
Standard Deviation-based risk measure: $=\mathbb{E}(L(x)) + cz_{c}\sigma(L(x))$ , x-axis is revered, so $z_{1-c}$ for return becomes $z_c$ for loss.
VaR: $VaR_{\alpha}(x)=inf\bigg{ \mathscr{l}:Prob\big[ L(x)\leq \mathscr{l} \geq\big] \alpha \bigg}$
Expected Shortfall: $ES_{\alpha}(x) = \frac{1}{1-\alpha} \int_{\alpha}^1 VaR_u(x) du$ . In other form, $ES_{\alpha}(x)=\mathbb{E}\bigg( L(x)| L(x)\geq VaR_{\alpha}(x) \bigg)$

As $R \sim N(\mu, \Sigma)$ ,

for our portfolio with weights $x$ , $mean = \mu$ , and $\sigma_x = \sqrt{x^T \Sigma x}$ .
for the loss, $mean = -\mu$ , and $\sigma_x = \sqrt{x^T \Sigma x}$ .

Taylor Series and Transition Density Functions

See my Github repo for full details.

https://github.com/eightsmile/cqf

1. Trinomial Random Walk

2. Transition Probability Density Function

The transition probability density function, $p(y,t;y’,t’)$ , is defined by,

$$ Prob(a<y'<b, at\ time \ t’ | y \ at \ time\ t) = \int_a^b p(yet;y’,t’)dy’$$

In words this is “the probability that the random variable y ′ lies between a and b at time t ′ in the future, given that it started out with value y at time t.”

Think of y and t as being current values with y ′ and t ′ being future values. The transition probability density function can be used to answer the question,

“What is the probability of the variable y ′ being in a speciﬁed range at time t ′ in the future given that it started out with value y at time t?”

Our Goal is to find the transition probability p.d.f., and so we find the relationship between $p(y,t;y’,t’)$ , and $p(y,t;y’,t’-\delta t)$ ,

3. From the Trinomial model to the Transition Probability Density function

The variable y can either rise, fall or take the same value after a time step δt. These movements have certain probabilities associated with them.

We are going to assume that the probability of a rise and a fall are both the same, $\alpha<\frac{1}{2}$ . (But, of course, this can be generalized. Why would we want to generalize this?)

3.1 The Forward Equation

Given ${y,t}$ , or says ${y,t}$ the current and previous. ${y’,t’}$ are variate in the future time.

The probability of being at $y’$ at time $t’$ is related to the probabilities of being at the previous three values and moving in the right direction:

$$ p(y,t;y’,t’) = \alpha \ p(y,t;y’+\delta y,t’-\delta t) + \ (1-2\alpha) \ p(y,t;y’,t’-\delta t) + \alpha \ p(y,t;y’-\delta y,t’-\delta t) $$

Given ${y,t}$ , we find relationship between ${y’,t’}$ and ${y’\pm \delta y,t’-\delta t}$ that is y’ and t’ a bit time previously.

Remember, our goal is to find a solution of $p(.)$ , we try to solve the above equation.

3.2 Taylor Series Expansion

We expand each term of the equation.

$$ p(y,t;y’,t’) = \alpha \ p(y,t;y’+\delta y,t’-\delta t) + \ (1-2\alpha) \ p(y,t;y’,t’-\delta t) + \alpha \ p(y,t;y’-\delta y,t’-\delta t) $$

Why we do that? Because there are too many variables in it, hard to solve it. We have to reduce the dimension.

$$ p(y,t;y’+\delta y,t’-\delta t)\approx \ p(y,t;y’,t) – \delta t \frac{\partial p}{\partial t’} +\delta y \frac{\partial p}{\partial y’} + \frac{1}{2}\delta y^2 \frac{\partial^2 p}{\partial y’^2} + O(\frac{\partial^2 p}{\partial t’^2}) $$

$$ p(y,t;y’,t’-\delta t)\approx \ p(y,t;y’,t) – \delta t \frac{\partial p}{\partial t’} + O(\frac{\partial^2 p}{\partial t’^2}) $$

$$ p(y,t;y’-\delta y,t’-\delta t)\approx \ p(y,t;y’,t) – \delta t \frac{\partial p}{\partial t’} -\delta y \frac{\partial p}{\partial y’} + \frac{1}{2}\delta y^2 \frac{\partial^2 p}{\partial y’^2} + O(\frac{\partial^2 p}{\partial t’^2}) $$

Plug them back into that equation, and after cancel out terms repeated we would left with,

$$ \frac{\partial p}{\partial t’} =\alpha \frac{\delta y^2}{\delta t} \frac{\partial^2 p}{\partial y’^2} + O(\frac{\partial^2 p}{\partial t’^2})$$

We drop those derivative terms with order greater and equal than $O(\frac{\partial^2 p}{\partial t’^2})$ .

$$ \frac{\partial p}{\partial t’} =\alpha \frac{\delta y^2}{\delta t} \frac{\partial^2 p}{\partial y’^2} $$

In the RHS, we focus on $\alpha \frac{\delta y^2}{\delta t}$ , firstly. The denominator and numerator have to be in the same order to make that term definite. Or, say $\delta y \sim O(\sqrt{\delta t})$ .

We thus let $c^2 = \alpha \frac{\delta y^2}{\delta t}$

$$ \frac{\partial p}{\partial t’} =c^2 \frac{\partial^2 p}{\partial y’^2} $$

The above equation is also named Fokker–Planck or forward Kolmogorov equation.

Now, we have a partial differential equation. Solve it, we can get the form of $p$ .

3.3 Backward Equation works similar.

$$ p(y,t;y’,t’) = \alpha \ p(y+\delta y,t+\delta t;y’,t’) + \ (1-2\alpha) \ p(y,t+\delta t;y’,t’) + \alpha \ p(y-\delta y,t+\delta t;y’,t’) $$

the dimension-reduced result is the blow, and it is called the backward Kolmogorov equation.

$$ \frac{\partial p}{\partial t’} + c^2 \frac{\partial^2 p}{\partial y’^2} =0 $$

4. Solve the Forward Kolmogorov Equation

We will solve for $p$ right now! However, we will solve it by assuming similarity solution.

$$ \frac{\partial p}{\partial t’} =c^2 \frac{\partial^2 p}{\partial y’^2} $$

This equation has an inﬁnite number of solutions. It has diﬀerent solutions for diﬀerent initial conditions and diﬀerent boundary conditions. We need only a special solution here. The detailed process of finding that solution is showing as the following,

4. 1 Assume a Solution Form

$$ p=t’^a f(\frac{y’}{t’^b}) = t’^a f(\xi)$$

$$ \xi = \frac{y’}{t’^b}$$

, where $a$ , and $b$ are indefinite variables.

Again, don’t ask why it is in this form, because it is a special solution!

4.2 Derivation

$$\frac{\partial p}{\partial y’}=t’^{a-b}\frac{df}{d\xi}$$

$$\frac{\partial^2 p}{\partial y’^2}=t’^{a-2b}\frac{d^2f}{d\xi^2}$$

$$\frac{\partial p}{\partial t’}=at’^{a-1}f(\xi)+by’t’^{a-b-1}\frac{df}{d\xi}$$

Substitue back into the forward Kolmogorov equation (remember $y’ = t’^b \xi$ ), we get,

$$ af(\xi) – b\xi \frac{df}{d\xi} = c^2 t’^{-2b+1} \frac{d^2f}{d\xi^2}$$

4.3 Choose $b$

As we need the RHS to be independent of $t’$ , we could choose the value of $b=\frac{1}{2}$ , to let the $t’$ has a power of 0. Why we do that? Because we aim to reduce the partial differential equation to be a ordinary differential equation, in which the only variable is $\xi$ , and $t’$ disappear.

By assuming the special form of $p$ , and letting $b= 1/2$ , our forward Kolmogorov becomes,

$$ af(\xi) – \frac{1}{2}\xi \frac{df}{d\xi} = c^2 \frac{d^2f}{d\xi^2}$$

$$ p=t’^a f(\frac{y’}{\sqrt{t’}}) = t’^a f(\xi)$$

$$ \xi = \frac{y’}{\sqrt{t’}}$$

4.4 Choose $a$

$$p=t’^a f(\frac{y’}{\sqrt{t’}}) $$

We know that $p$ is the transition p.d.f., its integral must be equal to ‘1’. $t’$ is independent by the definition of random walk behaviour, so we do only integrate $p$ , w.r.t. $y’$ .

$$\int_{\mathbb{R}}p\ dy’ = \int_{\mathbb{R}} t’^a f(\frac{y’}{\sqrt{t’}})\ dy’ = 1$$

$$ \int_{\mathbb{R}} t’^a f(\frac{y’}{\sqrt{t’}})\ dy’ = 1 $$

, by replace $x = \frac{y’}{\sqrt{t’}}$ ,

$$ \int_{\mathbb{R}} t’^{a+1/2} f(x)\ dx =t’^{a+1/2} \int_{\mathbb{R}} f(x)\ dx= 1 $$

$t’$ is independent, so the above equation would be equal to ‘1’ regardless the power of $t’$. Thus, $a = -\frac{1}{2}$ for sure.

Also, we get $\int_{\mathbb{R}} f(x)\ dx= 1$ .

4.5 Integrate! Solve it!

By assuming the special form of $p$ , and letting $a=-1/2$ , $b=1/2$ , we get,

$$ -\frac{1}{2}f(\xi) – \frac{1}{2}\xi \frac{df}{d\xi} = c^2 \frac{d^2f}{d\xi^2}$$

$$ p=\frac{1}{\sqrt{t’}} f(\frac{y’}{\sqrt{t’}}) = \frac{1}{\sqrt{t’}}f(\xi)$$

$$ \xi = \frac{y’}{\sqrt{t’}}$$

The forward Kolmogorov equation becomes,

$$ -\frac{1}{2}\bigg(f(\xi) – \xi \frac{df}{d\xi} \bigg)= c^2 \frac{d^2f}{d\xi^2}$$

$$ -\frac{1}{2}\bigg( \frac{d \xi f(\xi)}{d \xi} \bigg)= c^2 \frac{d^2f}{d\xi^2}$$

, as $f(\xi) – \xi \frac{df}{d\xi} = \frac{d \xi f(\xi)}{d \xi}$ .

Integrate 1st Time

$$ -\frac{1}{2}\xi f(\xi)= c^2 \frac{df}{d\xi} + constant$$

There’s an arbitrary constant of integration that could go in here but for the answer we want this is zero. We need only a special solution, so we can set that arbitrary constant term be zero.

So, the eq could be rewritten as,

$$ -\frac{1}{2c^2}\xi d\xi = \frac{1}{f(\xi)}df $$

Integrate 2nd Time

$$ ln\ f(\xi) = -\frac{\xi^2}{4c^2} + C$$

Take exponential, $f(\xi) = e^C e^{-\frac{\xi^2}{4c^2}} = A e^{-\frac{\xi^2}{4c^2}}$ .

Find $A$

The Last Step here is to find the exact value of $A$ . $A$ is chosen such that the integral of $f$ is one.

$$\int_{\mathbb{R}}f(\xi)\ d\xi =1$$

$$ \int_{\mathbb{R}}A e^{-\frac{\xi^2}{4c^2}} \ d\xi = 2cA\int_{\mathbb{R}} e^{-\frac{\xi^2}{4c^2}} \ d\big(\frac{\xi}{2c}\big) =1 $$

$$ 2cA \sqrt{\pi} = 1 $$

, so we get $A = \frac{1}{2c\sqrt{\pi}}$

Plug $f(\xi), a, b, A$ back into $p = t^a f(\xi)$ .

$$ p(y’)=\frac{1}{2c\sqrt{\pi \ t’}}e^{-\frac{\xi^2}{4c^2}} =\frac{1}{2c\sqrt{\pi \ t’}}e^{-\frac{y’^2}{4c^2t’}} $$

$p(.)$ now is normal like distributed.

$$N(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

So, we may say $\mu_{y’}=0$ , and $\sigma^2_{y’}=2c^2t’$ . Or, $y’ \sim N(0, 2c^2t’)$ .

5. Summary

$$p(y’)=\frac{1}{2c\sqrt{\pi \ t’}}e^{-\frac{\xi^2}{4c^2}} =\frac{1}{2c\sqrt{\pi \ t’}}e^{-\frac{y’^2}{4c^2t’}} $$

Finally, we solved the transition probability density function $p(.)$ . By assuming the forward or backward type of trinomial model, we find a partial differential relationship. Then, assuming a special form of $p(.)$ by similarity method, we solve it.

The meaning is that $p(y’)=\frac{1}{2c\sqrt{\pi \ t’}}e^{-\frac{\xi^2}{4c^2}} =\frac{1}{2c\sqrt{\pi \ t’}}e^{-\frac{y’^2}{4c^2t’}}$ is one of the transition probability density function that can satisfy the trinomial random walk.

Also, we find that $p(.)$ is normally liked distributed.

Black and Scholes, 1973

See A bit Stochastic Calculus .

For,

$$ dS_t = \mu S_t \ dt +\sigma S_t \ dW_t $$

In calculating $d f(S_t)$ , we would get, (by Taylor Expansion)

$$ df(S_t) = \bigg( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 f}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial f}{\partial S_t}\sigma S_t \ dW_t $$

A Special Form of $f(\cdot)$ is $f(S) = log(S)$ ,

$$ d\ log(S_t) = \bigg( \mu – \frac{1}{2}\sigma^2 \bigg)dt + \sigma \ dW_t $$

We get $Y_t = log S_t$ is the price of a derivative security with respect to $S_t$ and $t$ and then,

$$ dY_t= \bigg( \frac{\partial Y_t}{\partial t} + \frac{\partial Y_t}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 Y_t}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial Y_t}{\partial S_t}\sigma S_t \ dW_t $$

Consider a portfolio $\Pi$ is constructed with (1) short one derivative, and (2) long some fraction of stocks, $\Delta$ , such that the portfolio is risk natural. ( $\Delta = \frac{\partial Y}{\partial S}$ )

$$ \Pi_t = -Y +\Delta \ S_t $$

Differentiate it,

$$ d\Pi_t = -dY_t +\frac{dY}{dS}dS_t $$

Subtitute $dY_t$ and $dS_t$ into the above equation, we would then get the stochastic process of portfolio, by Ito’s Lemma.

$$ d\Pi_t =-\bigg( \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 \bigg) dt $$

The diffusion term $dW_t$ disappears, and that means the portfolio is riskless during the interval $dt$ . Under a no arbitrage assumption, this portfolio can only earn the riskless return, $r$ .

$$ d\Pi_t =r\Pi_t \ dt $$

Subtitute $d\Pi_t$ and $\Pi_t$ into, we would get the Partial Differential Equation (PDE) / Black-Scholes equation,

$$ – \bigg( \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 \bigg) dt = r\bigg(- Y_t + \frac{\partial Y}{\partial S}S_t \bigg)dt $$

$$ rY_t = \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 + \frac{\partial Y}{\partial S}S_t $$

Then, we guess (where $U(.)$ is a function of $S_t$ at time t=T),

$$ Y_t = e^{-r(T-t)} U(S_T) $$

For a European call with strike price, K, $U(S_T)$ would be the payoff at maturity,

$$ U(S_T) = max( S – K , 0 ) $$

Finally, through a series of process to find a specific solution of the PDE, we can solve the value of call, ( $\Phi(\cdot)$ is the cumulative standard normal distribtion)

$$ c = S_t\Phi(d_1) – K e^{-r (T-t)}\Phi(d_2) $$

with,

$$ d_1 =\frac{ log(S_t/K) + (r-\frac{1}{2}\sigma^2)(T-t)}{\sigma \sqrt{T-t}} $$

$$ d_1 = \sigma \sqrt{T-t} $$

42-2010_Handbook Download

Volatility Forecast – Ornstein Uhlenbeck Process

VolatilityForecast_OrnsteinUhlenbeck Download

See notes and the realisation.

We assume the volatility of returns follows a stochastic process. Define it as the Ornstein-Uhlenbeck process,

$$dX_t = \kappa (\theta – X_t)dt +\sigma \ dW_t$$

What’s the distribution function of the Ornstein-Uhlenbeck process?

We could apply the MLE estimation to find the parameter of the above random process.

Recall our Vasicek Form Ornstein-Uhlenbeck process is like the following:

$$ dX_t = \kappa (\theta – X_t)dt +\sigma \ dW_t $$

Multiply both sides by $e^{kt}$ , then we get,

$$ e^{kt}dX_t = \kappa e^{kt} \theta \ dt – \kappa e^{kt} X_t \ dt +\sigma e^{kt}\ dW_t $$

We know that $d( e^{kt} X_t )=e^{kt}dX_t + k e^{kt}X_t dt$ , and substitute it inside. Then, we get,

$$ d(e^{kt}X_t)=e^{kt}\theta \ dt + e^{kt}\sigma \ dW_t $$

Take an integral from [0,T],

$$ \int_0^T d(e^{kt}X_t)= \int_0^T e^{kt}\theta \ dt + \int_0^T e^{kt}\sigma \ dW_t $$

$$ X_T = X_0 e^{-kT} +\theta (1-e^{-kT}) + \int_0^T e^{-k(T-t)}\sigma \ dW_t $$

$\int_0^T e^{-k(T-t)}\sigma \ dW_t \sim N(0, \sigma^2\int_0^T e^{-2k(T-t)}dt)$

We then find $\mathbb{E}(X_T)$ and $Var(X_T)$ .

$ \mathbb{E}(X_T) =\mathbb{E}\bigg( X_0 e^{-kT} +\theta (1-e^{-kT}) + \int_0^T e^{-k(T-t)}\sigma \ dW_t \bigg)$

$ \mathbb{E}(X_T)= X_0 e^{-kT} +\theta (1-e^{-kT}) $

$Var(X_T) = \mathbb{E}\bigg( \big( X_T – \mathbb{E}(X_T) \big)^2 \bigg)$

$Var(X_T)= \mathbb{E}\bigg( \big( \int_0^T e^{-k(T-t)}\sigma \ dW_t \big)^2 \bigg) $

By Ito’s Isometry: $I(t)=\int_0^t \Delta(s)dW_s$ , then

$Var[I(t)]=\mathbb{E}[(I^2(t))]=\int_0^t \Delta^2(s)ds$

Then,

$ Var(X_T)= \int_0^T e^{-2k(T-t)}\sigma^2 \ dt $

$ Var(X_T)= \frac{\sigma^2}{2k}\big( 1-e^{-2kT} \big) $

Therefore, we finally get,

$ X_T \sim N\bigg(X_0 e^{-kT} +\theta (1-e^{-kT}), \frac{\sigma^2}{2k}\big( 1-e^{-2kT} \big) \bigg)$

MLE of Ornsterin-Uhlenbeck Process

$X_t \sim N(\cdot,\cdot)$

,where $\mathbb{E}(X_{t+\delta t})=X_0 e^{-k\ \delta t} +\theta (1-e^{-k\ \delta t})$ , and $Var(X_{t+\delta t})= \frac{\sigma^2}{2k}(1-e^{-2k\ \delta t})$ .

$$ f_{\theta}(x_{t+\delta t}|\theta)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

We replace $\mu = \mathbb{E}(X_{t+\delta t})$ and $\sigma^2 = Var(X_{t+\delta t})$ insider.

Code realisation could be found in notes.

Maximum Likelihood Estimator

Joint Probability Density Function or Likelihood Function:

$f(x_1, x_2, …, x_n|\theta) = f(x_1|\theta) f(x_2|\theta) … f(x_N| \theta) $

$f(x_1, x_2, …, x_n|\theta) = \prod_{i=2}^n f(X_i | \theta) =L(\theta)$

A likelihood function is the density function regarded as a function of $\theta$ .

$ L(\theta |x) = \prod f(x|\theta) $, $\theta \in \Theta$

As we know the sample, we need to ‘guess’ the parameter $\theta$ to let the joint prob distribution have a maximum probability.

The Maximum Likelihood Estimator (MLE) is :

$ \hat{\theta}(x) = arg\max_{\theta} L(\theta|x) $

To simplify the calculation, we normally apply a logarithm transformation,

$$ l(\theta)= log\ L $$

$$log L(\theta |x) =log \prod f(x|\theta)= \sum_{i=1}^N log\ f_{\theta} (y_i) $$

As the logarithm transformation is monotonous,

$$ \hat{\theta}(x) = arg\max_{\theta} L(\theta|x) \Leftrightarrow $$

$$\hat{\theta}(x) = arg\max_{\theta} l(\theta|x) $$

To review it, we aim to estimate the parameters. The logic of the MLE estimator is that we guess a probability distribution, fitting into our data (sample). Then, we find or guess the value of parameters such that the joint probability function (likelihood function) achieves the maximum value.

In another word, we guess the probability dist and parameters to make sample data achieve highest possibility.

To find the $argmax_{\theta} l(\theta)$ is equivalent to $argmin_{\theta} -l(\theta)$

A Normal Distribution Example

A normal distribution example,for random sample $x_i$ , i=1, 2, 3, …, N

$$f_{\theta}(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

We substitute $f_{\theta}(x)$ into the log-likelihood distribution, then we find,

$$ l(\mu, \sigma^2)=log\ L(\mu,\sigma^2) $$

$$= -\frac{n}{2}\big( log2\pi + log\sigma^2 \big) -\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i -\mu)^2 $$

Taking the First Order Conditions, F.O.C.

By setting $\frac{\partial l}{\partial \mu}= 0$ and $\frac{\partial l}{\partial \sigma^2}= 0$ , we solve $\mu_{MLE}$ and $\sigma^2_{MLE}$ .

$$ \mu_{MLE} =\bar{x}$$

$$\sigma^2_{MLE} =\frac{1}{n} \sum_{i=1}^n (x_i – \bar{x})$$

P.S. We can prove that there are the local maximum, coz the second partial derivatives are negative.

def MLE_norm(x):
    mu_hat = np.mean(x)
    sigma2_hat = np.var(x)
    return mu_hat, sigma2_hat

#----------

mu = 5
sigma = 2.5
N = 10000

np.random.seed(0)
x = np.random.normal( mu, sigma, size=(N,) )

#----------

mu_hat, sigma2_hat = MLE_norm(x)

for_mu_hat = ' $\hat{\mu} = '+format(round(mu_hat,2))+'$ '
for_sigma2_hat = ' $\hat{\sigma} = '+format(round(np.sqrt(sigma2_hat),2))+'$ '
print('The MLE for data is:')
display(Latex(for_mu_hat))
display(Latex(for_sigma2_hat))

By our calculation, we know that mathematically the MLE estimator of mu and sigma are the mean and variance.

Perform MLE Numerically

If the log-likelihood function wasn’t continuous or differentiable. Can solve numerically through an optimisation problem where the objective function is …

In other words, sometimes we cannot calculate the MLE estimator by hand, because the log-likelihood function might not be differentiable. What can we do?

We apply the computer to interactively ‘guess’ values (Optimizer) that can end up with the maximum log-likelihood (or a minimum of negative log-likelihood).

The maximum likelihood estimator (MLE):

$$\hat{\theta}(x) = arg max_{\theta} L(\theta|x)$$

$$L(\theta) = \prod_{i=1}^N f(X_i | \theta) $$

$$ log\ f(X|\theta)=\sum_{i=1}^N log\ f(x_i|\theta)$$

$$ arg\max_{\theta} l(\theta) \Leftrightarrow arg\min_{\theta} -l(\theta) $$

Why we use the minimizer instead? There is only minimizer available in the scipy package, and those mini&max are basically equivalent.

def log_likelihood(theta, x):
    mu = theta[0]
    sigma = theta[1]
    l_theta = np.sum( np.log( sc.stats.norm.pdf(x, loc=mu, scale=sigma) ) )
    # there is a negative sign in front. We minimise it later.
    return -l_theta 

# Constraint Function that Restrict sigma to be positive.
def sigma_pos(theta):
    sigma = np.abs(theta[1])
    return sigma

#--------

cons_set = {'type':'ineq', 'fun': sigma_pos}

theta0 = [2,3] # inital guess. there could be any random numbers, coz it is just the intital value inputed into the optimiser
# find the minimum value of the log likelihood function
opt = sc.optimize.minimize(fun=log_likelihood, x0=theta0, args=(x,), constraints=cons_set)

for_mu_hat = ' $\hat{\mu} = '+format(round(opt.x[0],2))+'$ '
for_sigma2_hat = ' $\hat{\sigma} = '+format(round(opt.x[1],2))+'$ '

print('The MLE for data is:')
display(Latex(for_mu_hat))
display(Latex(for_sigma2_hat))

Maximum-Likelihood-Estimator Download

Risk-Neutral Pricing

Our basic logic is,

$$ \text{Option Value} =\text{Discounted Expectation of the Payoff} $$

However, there is not a perfect correlation between the Option and the Underlying Asset. That fact encourage us to do the risk-netural method for valuing an option.

Geometric Brownian Motion

Non-dividend paying stock, ( $S_0>0$ )

$$ dS_t = \mu S_t\ dt+\sigma S_t \ dW_t $$

Money market, or bank account, ( $B_0 = 1$ )

$$ dB_t = r B_t\ d_t $$

and,

$$ \{ W_t \}_{t\in [0,T]} : \mathbb{P} \sim Brownian \ Motion $$

$ \{S_t\} $ is not a martingals under $\mathbb{P}$ because $\mu S_t$ is not zero.

So, we apply a transformation to $S_t$ and to make it be a martingal (eliminate the drift term).

We here look at $\{ \frac{S_t}{B_t} \}$ ,

$$ d\big( \frac{S_t}{B_t} \big)=-r\frac{S_t}{B_t}dt +\frac{1}{B_t} dS_t +\frac{1}{2}dS_t (-r\frac{1}{B_t}dt) $$

$$ d\big( \frac{S_t}{B_t} \big)=-r\frac{S_t}{B_t}dt +\bigg( \frac{1}{B_t}-\frac{1}{2}r\frac{1}{B_t}dt \bigg)\bigg( \mu S_t\ dt +\sigma S_t \ dW_t \bigg) $$

The cross terms dt & dW would decay to zero quickly in the stochastic integration. We therefore would get,

$$ d\big( \frac{S_t}{B_t} \big) = \sigma ( \frac{S_t}{B_t} )\underbrace{\bigg( \frac{\mu – r}{\sigma} dt + dW_t \bigg)}_{d\tilde{W_t}} $$

The drift disappears, instead we get $d\big( \frac{S_t}{B_t} \big)d\tilde{W_t}$

$$ d\big( \frac{S_t}{B_t} \big)d\tilde{W_t} $$

$$ d\tilde{W_t} = \frac{\mu -r}{\sigma} dt +dW_t $$

by Girsanov’s Theorem that $\{a_t \}_{t\in [0,T]}$ be and adapted $\{ \mathcal{F} \}_{t\in [0,T]}$ Ito Process, so that $\mathbb{Q}$ is an equivalent measure on $\mathbb{P}$ such that $\tilde{W_t}$ is a Brownian Motion, $\tilde{W_t} = W_t +\int_0^t a_u du$ .

$$ dS_t = rS_t \ d_t +\sigma S_t \ d\tilde{W_t} $$

Implication

The implication is that,

We can construct a relicating portfolio $(S_t, B_t)$ with value process $\{ V_t \}_{t\in[0,T]}$ . Under $\mathbb{Q}$ the discounted value process exists that is a martingale $\{ e^{-rt}V_t \}$ .

$$ e^{-rt} C_t =\mathbb{E}_{\mathbb{Q}}[ e^{-rT} C_T | \mathcal{F}_t ] $$

$$ =\mathbb{E}_{\mathbb{Q}}[ e^{-rT} V_T | \mathcal{F}_t ] = e^{-rt} V_t $$

Risk Neutral Expectation Pricing Formula

We therefore get,

$$ \frac{C_t}{B_t} = \mathbb{E}_{\mathbb{Q}}\bigg[ \frac{C_T}{B_t} \bigg| \mathcal{F}_t \bigg] $$

$$ B_T=1,\quad and \quad B_t=exp\{ \int_0^t r_s ds \} $$

Recursive Least Squared Method

Notation

Optimisation

Summary

Reference

Firstly, let’s introduce the odds.

Odd and Probability

Example

Formula

Logit

Statement

Proof & Example

What do we do?

Value at Risk – VaR

Notation

Loss Distribution

Expected Shortfall (ES)

Notation

Derivation

Notation Form

VaR Form

Morden Portfolio Theory

Optimisation

Portfolio Risk Measures

1. Trinomial Random Walk

2. Transition Probability Density Function

3. From the Trinomial model to the Transition Probability Density function

3.1 The Forward Equation

3.2 Taylor Series Expansion

3.3 Backward Equation works similar.

4. Solve the Forward Kolmogorov Equation

4. 1 Assume a Solution Form

4.2 Derivation

4.3 Choose b

4.4 Choose a

4.5 Integrate! Solve it!

Integrate 1st Time

Integrate 2nd Time

Find A

5. Summary

What’s the distribution function of the Ornstein-Uhlenbeck process?

MLE of Ornsterin-Uhlenbeck Process

A Normal Distribution Example

Perform MLE Numerically

Geometric Brownian Motion

Implication

Risk Neutral Expectation Pricing Formula

4.3 Choose $b$

4.4 Choose $a$

Find $A$