Simulating Geometric Brownian Motion (GBM)

Here is a realisation of the previous blog, A bit Stochastic Calculus.


  • 1. The Stochastic Integral is different from the Ordinary Integral, and so is stochastic differentiation. The Stochastic Differential Equation, S.D.E., we normally use to mimic the movement of a stock is the following,

$$ dS=\mu S \ dt + \sigma S \ dW$$

, where W follows a Brownian Motion.

The Brownian Motion also has the following critical properties:

  1. Martingale: E(S_{t+k}|F_t)=S_t for k\geq 0.
  2. Quadratic Variation: \sum_{i=1}^n (S_{t_i}-S_{t_{i-1}})^2 \to t
  3. Normality: An increment S_{t_i} - S_{t_{i-1}} is normally distributed, with mean zero and variance t_i - t_{i-1}
  • 2. What does Ito’s Lemma tell us?

If the stock price, S_t, follows a stochastic process, then the financial contracts F(S_t), as a function of the underlying stock price, follow another stochastic process.

  • 3. Taylor Expansion helps

See notes about the relevant paragraph in Paul Wilmott’s book.

  • 4. Different Forms of Stochastic Processes could be applied.

Example 1: dS=\mu \ dt + \sigma \ dX

Example 2: dS=\mu S \ dt +\sigma S \ dX

For this, dV = \frac{\partial V}{\partial t}dt+\frac{\partial V}{\partial S}dS+\frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}dt by Itô’s lemma. The process in Example 2 is called Geometric Brownian Motion (GBM). If we take the specific value function V=F(S)=ln(S), then dF = (\mu -\frac{1}{2}\sigma^2)dt + \sigma \ dX. See derivation in attached notes.

Example 3: A mean-reverting random walk. dS=(v-\mu S)dt +\sigma dX

Example 4: Another mean-reverting r.w. dS=(v-\mu S)dt +\sigma S^{1/2}dX

The stochastic term is altered compared with Example 3: now if S ever gets close to zero, the randomness decreases.

A bit Stochastic Calculus

See the HTML file for full details.

Key Takeaways

  • Property of \{W_t\}:
  1. $ W_t - W_s \sim N(0, t-s) $
  2. $(W_t - W_{t-1})$ and $(W_{t-i} - W_{t-i+1})$ are uncorrelated. So, $\int_0^t dW_u = \sum dW_u = W_t$
  • For,

$$ dS_t = \mu S_t \ dt +\sigma S_t \ dW_t $$

Why is the Geometric Brownian Motion of \{S_t\} designed in that form?

The answer might be,

$$ dS_t /S_t = \mu \ dt +\sigma \ dW_t $$

$$\int_0^t dS_u /S_u = \int_0^t \mu \ du + \int_0^t \sigma \ dW_u $$

$$ log(S_t) - log(S_0) = \mu \ t +\sigma \ W_t $$

(Treating dS/S as d\,log(S) here is only a heuristic; the Itô correction derived later adds the -\frac{1}{2}\sigma^2 term.) Taking the first difference is similar to differentiation: d(log(S_t)) = log(S_t/S_{t-1}) = log(1+r_t) \approx r_t.

$$ r_t = \mu + \sigma \ \Delta W_t $$

The return, \{r_t\}, is equal to a mean, \mu, plus a stochastic term. That is a random walk.

  • We apply a transformation, if S_t follows a Geometric Brownian Motion, then f(S_t) follows another,

In calculating d f(S_t), we would get, (by Taylor Expansion)

$$ df(S_t) = \bigg( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 f}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial f}{\partial S_t}\sigma S_t \ dW_t $$

  • A Special Form of f(\cdot) is f(S) = log(S),

$$ d\ log(S_t) = \bigg( \mu - \frac{1}{2}\sigma^2 \bigg)dt + \sigma \ dW_t $$

Integrate the above equation from time 0 to time t, then we would get,

$$ log(S_t) = log(S_0) + (\mu - \frac{1}{2}\sigma^2)t + \sigma W_t $$
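The closed form above can be simulated directly. Here is a minimal sketch (not from the original notes; the values of mu, sigma, S0, T and the step count are illustrative assumptions):

```python
# Simulate a GBM path via the exact solution
# log(S_t) = log(S_0) + (mu - sigma^2/2) t + sigma W_t.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, S0, T, n = 0.05, 0.2, 100.0, 1.0, 252   # assumed parameters

dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)       # Brownian increments ~ N(0, dt)
W = np.concatenate(([0.0], np.cumsum(dW)))      # W_t on the time grid
t = np.linspace(0.0, T, n + 1)

# Exact GBM solution, evaluated on the grid
S = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)
```

Because the closed form is exact, there is no discretisation error in the path values themselves, only in how finely the grid samples them.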

Brownian Motion to Normal Distribution

Codes are shown in the HTML file.
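The code itself lives in the HTML file; as a stand-in, here is a hedged sketch (all numbers are arbitrary choices) showing that a scaled coin-toss walk ends up approximately normal:

```python
# A coin-toss random walk with steps of size +/- sqrt(T/n) has terminal
# values that are approximately N(0, T) for large n.
import numpy as np

rng = np.random.default_rng(1)
T, n, paths = 1.0, 500, 20000

# Each path is the sum of n independent +/- sqrt(T/n) steps
steps = rng.choice([-1.0, 1.0], size=(paths, n)) * np.sqrt(T / n)
W_T = steps.sum(axis=1)

mean_WT, var_WT = W_T.mean(), W_T.var()   # should be close to 0 and T
```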

Markov Property

The Markov Property states that the expected value of the random variable \(S_i\) conditional upon all of the past events only depends on the previous value \(S_{i-1}\).

Martingale Property

The implication of the Markov Property is that the best expectation of the future price, \(S_{t+k}\), is the current price, \(S_t\). The current price contains all the information until now.

$$ \mathbb{E}(S_{t+k}|F_t)=S_t, \text{ for } k\geq 0 $$

That is called the Martingale Property.

Quadratic Variation

The Quadratic Variation is defined as,

$$ \sum_{i=1}^k (S_i – S_{i-1})^2 $$

In the coin toss case, each movement must be either “1” or “-1”, so \(S_i – S_{i-1} = \pm 1\). Thus, \((S_i – S_{i-1})^2 =1\) for all “i”. Therefore, the Quadratic Variation equals “k”.
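The coin-toss case can be checked numerically in a couple of lines (a sketch; k = 10000 is an arbitrary choice):

```python
# Quadratic variation of a coin-toss walk equals the number of steps k.
import numpy as np

rng = np.random.default_rng(2)
k = 10000

# Every increment S_i - S_{i-1} is +1 or -1
increments = rng.choice([-1, 1], size=k)

# Sum of squared increments: each square is exactly 1, so the QV is k
qv = int(np.sum(increments**2))
```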

A Realisation

Here below is the Code Realisation of a Brownian motion, from Chapter 5.6 of Paul Wilmott Introduces Quantitative Finance.
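The original listing is not reproduced here, so below is a hedged sketch in its spirit: an Euler–Maruyama discretisation of dS = μS dt + σS dW (the parameter values are illustrative assumptions, not Wilmott's):

```python
# Euler-Maruyama scheme for dS = mu*S*dt + sigma*S*dW.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, S0, T, n = 0.05, 0.2, 100.0, 1.0, 1000   # assumed parameters
dt = T / n

S = np.empty(n + 1)
S[0] = S0
for i in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))               # Brownian increment
    S[i + 1] = S[i] + mu * S[i] * dt + sigma * S[i] * dW
```

Unlike the exact log-normal solution, this scheme has O(dt) discretisation error, which is why a fairly fine grid is used.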


The notes are not only a review for quant preparation but also, hopefully, a learning note for my babe.

Key Points

1. Preliminaries

Firstly and most importantly, I need to declare what statistics is and why we should learn statistics. The following is only based on my own understanding. My understanding is pretty limited (I hold only a master’s degree, so I am definitely not an expert in Statistics) and subjective, so please share your suggestions and even your criticisms. I would be glad to hear your ideas.

1.1 What is Statistics?

From my understanding, Statistics is a tool, characterized by mathematics, to explain the world. (What nonsense am I talking!)

To be serious, I may say that statistics is the process of estimating a population from samples.

Studying the whole population is always costly, and pretty much impractical. For example, to run tests at the individual level, we would have to collect data from all the people. A population census can only be done at a national level and conducted by the government. Even so, the census cannot be performed on a year-by-year basis, and there are always measurement errors. Thus, a more cost-effective way is to estimate the population through data from a small set of people who are randomly selected.

Another example could be the weather forecast, which is similar to doing a time-series or panel-data analysis. The forecast will most likely be biased because things change unpredictably and irregularly. So we may say it is impossible to obtain the full data to estimate the population (the factors related to weather in this case). Thus, a simpler way might be to collect different factors and historical data such as temperature, because we may assume temperature changes are consistent over a short period of time.

However, there are gaps between the population and the sample. How can we bridge those gaps? The answer is Statistics. Statistics provides mathematically proven methods that let the sample better capture the population, based on assumptions.

Let’s begin our study.

2. Probability

2.1 Conditional Probability

P(A|B)=\frac{P(A\cap B)}{P(B)}\\ \\ P(A\cap B)=P(A|B)\times P(B)

2.2 Mutually Exclusive and Independent Events

P(A\cap B)=0 \\ \\ P(A\cup B)=P(A)+P(B)

If two events are independent, then

P(A|B)=P(A) \\ \\ so, \quad P(A\cap B)=P(A)\times P(B)

3. Random Variables – r.v.

3.1 Definition

Random Variables: X, Y, Z

Observations: x,y,z

3.2 Probability Mass/Density Function – p.d.f. (For Discrete r.v. or Continuous r.v.)
3.2.1 Definition

The p.d.f. captures the probability that a r.v. X takes a given value x.


3.2.2 Properties of p.d.f.
  1. \(f(x)\geq 0\), since probability is always positive.
  2. \(\int_{-\infty}^{+\infty} f(x)\ dx=1\)
  3. \(P(a<X<b)=\int_a^b f(x) \ dx\)

Replace the integral with summation for discrete r.v.

P.S. For a continuous r.v. X, P(X=x)=0. That means, for a continuous r.v., any single point on the p.d.f. has zero probability.

For example, the probability of picking exactly the number 3 from the continuous interval [1, 10] is zero.

3.3 Cumulative Distribution Function – c.d.f

F(x)=P(X\leq x) \\ \\ f(x)=\frac{d}{dx} F(x)

3.4 Expectation

E(X)=\mu \\ \\ E(X)=\int_{\text{Dom}(X)}x\cdot f(x)\ dx=\sum_x x\cdot P(X=x) \\

3.5 Variance and Standard Deviation

Var(X)=\sigma^2\\ \\ Var(X)=E[(X-E(X))^2]=E(X^2)-(E(X))^2 \\ =\frac{\sum (x-\mu)^2}{n}=\frac{\sum x^2}{n}-\mu^2

3.6 Moments

The first moment, \(E(X)=\mu\).

The n^{th} moment, \(E(X^n)=\int_x x^n\ f(x)\ dx\).

The second central moment is about the mean: \(E[(X-E(X))^2]=\sigma^2\), the Variance.

The third central moment is \(E[(X-E(X))^3]\). Skewness \(= \frac{E[(X-E(X))^3]}{\sigma^3}\). A normal dist has a Skewness of 0. (Right or Left Tails)

The fourth central moment is \(E[(X-E(X))^4]\). Kurtosis \(= \frac{E[(X-E(X))^4]}{\sigma^4}\). A normal dist has a Kurtosis of 3. (Fat or Thin, Tall or Short).

3.7 Covariance

Cov(X,Y)=E[(X-E(X))(Y-E(Y))]\\ =E(XY)-E(X)E(Y)

4. Distribution

The meaning of distributions, and the properties (mean & var).

4.1 Bernoulli Dist
4.2 Binomial Dist
4.3 Poisson Dist

X \sim Poisson(\lambda)\\ \\p.d.f. \quad P(X=x)=\frac{e^{-\lambda}\lambda^x}{x!}\\ E(X)=\lambda,\quad Var(X)=\lambda

4.4 Normal Dist & Standard Normal

X\sim N(\mu,\sigma^2)\\ \\ p.d.f. \quad f(x)=\frac{1}{\sqrt{2\pi \sigma^2}}exp\big(-\frac{(x-\mu)^2}{2\sigma^2}\big)

For a standard normal dist,

X\sim N(0,1)\\ \\ E(X)=0,\quad Var(X)=1

4.4.1 Standardisation


4.4.2 Properties of Normal Dist

One / Two /Three standard deviation regions.

5. Central Limit Theorem – CLT

i.i.d. – independent and identically distributed

Suppose X_1,X_2,…,X_n are n independent r.v.s, each with the same distribution. As n increases, the distribution of

X_1+X_2+…+X_n\\\\ \text{and,}\\\\ \frac{X_1+X_2+…+X_n}{n}

would behave like a normal distribution.

Key facts:

  1. The distribution of X is not specified. We do not have to restrict the distribution of the r.v.s, as long as they share the same dist.
  2. If X is a r.v. with mean \(\mu\) and standard deviation \(\sigma\) from some distribution, the CLT tells us that the distribution of the sample mean, \(\bar{X}\), is approximately normal.

$$E(\bar{X})=E\Big(\frac{\sum X}{n}\Big)=\frac{\sum E(X)}{n}=\frac{n\mu}{n}=\mu$$

$$Var(\bar{X})=Var\Big(\frac{\sum X}{n}\Big)=\frac{\sum Var(X)}{n^2}=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}$$

Therefore, we would get the distribution of \bar{X},

\bar{X}\sim N(\mu,\frac{\sigma^2}{n})

By standardising it,

\frac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}}\sim N(0,1)

Also, for S_n=X_1+X_2+…+X_n.

S_n \sim N(n\mu,n\sigma^2)\\\\ \frac{S_n-n\mu}{\sqrt{n}\sigma}\sim N(0,1)

The more observations there are, the closer the distribution is to normal. Also, a smaller standard deviation means the estimate varies less and is more accurate.
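A small simulation illustrates the statement (a sketch; the Exponential(1) choice is just an arbitrary non-normal distribution with mu = 1 and sigma = 1):

```python
# CLT demo: sample means of a skewed distribution behave like N(mu, sigma^2/n).
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 50000

# Exponential(1) is clearly non-normal, with mean 1 and standard deviation 1
samples = rng.exponential(1.0, size=(reps, n))
xbar = samples.mean(axis=1)

# Standardised sample means should be approximately N(0, 1)
z = (xbar - 1.0) / (1.0 / np.sqrt(n))
```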

Why is CLT important?

It is important because it provides a way to use repeated observations to estimate the whole population, which is impossible to observe in full.

6. A Few Notations

Recall, our aim in using statistics is to learn about the true population. We may assume the true population follows a distribution, and that distribution has some parameters. What we are doing is using the sample data (which is feasible to collect) to infer the population parameters.

  • Estimator: a function, using sample or available data, to estimate the population, e.g. \(\bar{x}\) and \(S^2\).
  • Estimate: the value/figure we actually calculate. By inputting data into the estimator, the output is the estimate.
Population (Population Parameters that we want to get but can never get)

Population Mean: \(\mu=\frac{\sum x_i}{N}\).

Population Variance: \(\sigma^2=\frac{\sum (x_i-\mu)^2}{N}\).

Sample Estimator

Sample Mean: \(\bar{x}=\frac{\sum x_i}{n}\).

Sample Variance: \(\hat{\sigma}^2=\frac{\sum (x_i-\bar{x})^2}{n}\).

Feeding data into the sample estimators yields the estimates, and those estimates are then used to infer the population parameters.

Remember that the sample is only part of the population; we collect data from the sample because it is more accessible and feasible to obtain. Still, we need our sample data to be representative of the population, or in other words, to give us some foresight about the whole population. Therefore, we use different notation for sample statistics.

An important aspect is that we need our sample to represent the population well. There are some criteria for this.

6.1 Unbiasedness

If \(E(\bar{X})=\mu \ \text{or} \ E(S^2)=\sigma^2 \) (the expectation of our sample estimator equals the population parameter), then we say the estimator is unbiased.

The unbiased estimator of sample variance is \(S^2=\frac{\sum (x_i-\bar{x})^2}{n-1}\).


Why is the denominator “n-1”?

There could be a long discussion about that. A simple way to understand the “-1” is as an adjustment for the \(\bar{x}\) in the numerator: because \(\bar{x}\) is itself calculated from the same data to stand in for the unknown population mean \(\mu\), one degree of freedom is used up, and the denominator is reduced accordingly.

In sum, \(S^2\) is an unbiased estimator of the population variance, \(\sigma^2\). Relatedly, the standard deviation of an estimator (e.g. of the sample mean) has a special name: the Standard Error, s.e..
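A quick simulation check (a sketch with assumed parameters) that dividing by n-1 removes the bias while dividing by n underestimates the variance:

```python
# Compare the n and n-1 denominators across many small samples.
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 4.0, 10, 200000   # assumed: N(0, 4), samples of size 10

x = rng.normal(0.0, 2.0, size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)

biased = ((x - xbar) ** 2).sum(axis=1) / n          # divide by n
unbiased = ((x - xbar) ** 2).sum(axis=1) / (n - 1)  # divide by n - 1

# E[biased] = sigma^2 * (n-1)/n = 3.6, while E[unbiased] = sigma^2 = 4.0
```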

6.2 Consistency

If an estimator gets closer and closer to the population parameter as \(n\rightarrow \infty\), we say that estimator is consistent.

For example, although \(\hat{\sigma}^2=\frac{\sum (x_i-\bar{x})^2}{n}\) is biased, it is consistent as the number of observations keeps increasing.

Flaws remain in this section’s discussion, awaiting updates.

7. Estimation

7.1 Maximum Likelihood Estimation – MLE

By assuming a probability distribution for the r.v. X, we fit the sample observations and look for the parameter values that maximise the joint probability (the likelihood function).

To state the problem: we need to find the parameter \(\lambda\) that maximises the likelihood function,

\lambda_0=\text{arg}\max_{\lambda}\ L(\lambda;x)

The value \(\lambda_0\) is our MLE estimate. (Remember what an estimator is? See section 6.)

For example

Assume r.v. \(X\sim N(\mu,\sigma^2)\). Let \(x_1,x_2,…,x_n\) be a random sample of i.i.d. observations. We use MLE to find the values of \(\mu\) and \(\sigma^2\). So, we need to maximise the log-likelihood function (instead of using the likelihood function directly, we apply a logarithm transformation for easier calculation; because the log transformation is monotonic, this is valid).

\begin{align*} l(\mu,\sigma^2;x_1,x_2,…,x_n)&=f(x_1;\mu,\sigma^2)f(x_2;\mu,\sigma^2)…f(x_n;\mu,\sigma^2)\\ \text{Let}\quad L(\mu,\sigma^2;x_1,x_2,…,x_n)&=log \ l(\mu,\sigma^2;x_1,x_2,…,x_n)\\ &=log\ f(x_1;\mu,\sigma^2)+log\ f(x_2;\mu,\sigma^2)+…+log\ f(x_n;\mu,\sigma^2)\\ &=\sum_{i=1}^n log\ f(x_i;\mu,\sigma^2) \end{align*}

Plug in \(f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi \sigma^2}}exp\big(-\frac{(x-\mu)^2}{2\sigma^2}\big)\),

\begin{align*} L(\mu,\sigma^2;x_1,…,x_n)&=\sum_{i=1}^n log\ \Big[\frac{1}{\sqrt{2\pi \sigma^2}}exp\big(-\frac{(x_i-\mu)^2}{2\sigma^2}\big)\Big] \\ &=-\frac{n}{2}log\ (2\pi)-n\cdot log\ (\sigma)-\frac{1}{2\sigma^2}\sum (x_i-\mu)^2 \end{align*}

Setting the first-order conditions \(\frac{\partial L}{\partial \mu}=0\) and \(\frac{\partial L}{\partial \sigma^2}=0\) gives,

\hat{\mu}_{MLE}=\frac{1}{n}\sum x_i \\ \hat{\sigma}^2_{MLE}=\frac{1}{n}\sum (x_i-\hat{\mu}_{MLE})^2

We would find the MLE estimators are the same as the OLS estimator in the following section.
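The closed-form MLEs above can be checked numerically (a sketch; the "true" parameter values are assumptions made for the test):

```python
# Normal MLE: sample mean and the n-denominator variance.
import numpy as np

rng = np.random.default_rng(6)
mu_true, sigma_true, n = 2.0, 1.5, 100000   # assumed true parameters

x = rng.normal(mu_true, sigma_true, size=n)

# Closed-form normal MLEs from the derivation above
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()   # note: divides by n, not n - 1
```

For large n the bias of the n-denominator is negligible, so both estimates land very close to the true values.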

7.2 Regression

Assume a linear model, and choose the coefficients that minimise the sum of squared residuals.

\hat{\beta}_{all}=arg\min_{\beta_{all}}\sum(y_i-\hat{y_i})^2\\ \Leftrightarrow\\ \hat{\beta}=arg\min_{\beta}(Y-\hat{Y})'(Y-\hat{Y})

, where

\hat{y}=\hat{\beta_0}+\hat{\beta_1}x_1+…+\hat{\beta_k}x_k\\ or,\quad \hat{Y}=X\hat{\beta}
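The minimisation can be solved numerically with a least-squares routine (a minimal sketch on made-up data; the true coefficients 1 and 2 are assumptions for the demo):

```python
# OLS via numpy: beta_hat solves min ||y - X beta||^2.
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Simulated data: y = 1 + 2*x1 + small noise
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(0.0, 0.1, size=n)

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones(n), x1])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```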




For the preparation of Quants

1. Functions Definition

1.1 Each x has only one y

A function, denoted f(x), of a single variable x is a rule that assigns each element x of a set X (written x \in X) to exactly one element y of a set Y (y\in Y):
y=f(x)\quad or \quad x\rightarrow f(x)

1.2 Domain of f

$Dom f$ Domain of Function

$Im f$ Image of Function

For a given value of x, there should be at most one value of y.

1.3 Implicit Form f(x,y)=0

For example,

1.4 Polynomials


2. Implicit Differentiation

For example, take y=a^x. There are mainly two ways to take the derivative,
ln(y)=ln(a^x)=x\ ln(a) \\
\frac{1}{y}\frac{dy}{dx}=ln(a)\quad\text{by differentiating with respect to x}\\
\Rightarrow \frac{dy}{dx}=y\cdot ln(a) \\
and plugging y=a^x in,
\frac{dy}{dx}=a^x\cdot ln(a)
Or, we simply apply the exponential transformation and take derivatives afterwards,
y=e^{ln(a^x)}=e^{x\cdot ln(a)}
However, for a polynomial, we normally have to do the implicit differentiation.
4y^4-2y^2x^2-yx^2+x^2+3=0 \\
16y^3y'-(4y'yx^2+4y^2x)-(y'x^2+2yx)+2x=0 \\
(16y^3-4yx^2-x^2)y'=-2x+4y^2x+2xy \\
\Rightarrow y'=\frac{-2x+4y^2x+2xy}{16y^3-4yx^2-x^2}

3. L'Hôpital's Rule & Limits

If a limit takes an indeterminate form,
\lim_{x \rightarrow a} \frac{f(x)}{g(x)}\equiv \frac{0}{0} \ or \ \frac{\infty}{\infty}
then it could be calculated as,
\lim_{x \rightarrow a} \frac{f(x)}{g(x)}=\lim_{x \rightarrow a} \frac{f'(x)}{g'(x)}=\lim_{x \rightarrow a} \frac{f''(x)}{g''(x)}=…=\lim_{x \rightarrow a} \frac{f^{(n)}(x)}{g^{(n)}(x)}
For example, \frac{sin(x)}{x}, as x \rightarrow 0.

4. Taylor Series

Approximate a function at a certain point by a series of terms. (For a detailed explanation, see Blog Section 6.)

We use the 1st, 2nd, 3rd, 4th, … n^{th} derivatives, etc., to approximate the function at a certain value.
f(x)\approx f(x_0)+(x-x_0)f'(x)|_{x=x_0}+\frac{1}{2}(x-x_0)^2f''(x)|_{x=x_0}+…+\frac{1}{n!}f^{(n)}(x)|_{x=x_0}(x-x_0)^n

For example, e^x at x=0.
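A sketch of the e^x example: partial sums of the Taylor series around x_0 = 0 converge quickly to the true value (the number of terms is an arbitrary choice).

```python
# Partial sums of the Taylor series of e^x around x0 = 0:
# e^x ~ sum_{k=0}^{n-1} x^k / k!
import math

def taylor_exp(x, n_terms):
    """Partial sum of the Taylor series of e^x around x0 = 0."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))

approx = taylor_exp(1.0, 10)        # 10 terms evaluated at x = 1
error = abs(approx - math.exp(1.0)) # remainder after 10 terms is tiny
```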

5. Integration

5.1 Integration by Parts

y=u(x)v(x) \\
y'=u'\cdot v +u\cdot v' \\
u'v=y'-uv' \\

and then integrate both sides,
\int u'v\ dx=\int y'\ dx-\int uv'\ dx
as \int y'\ dx = y+C, so we would get,
\int u'v\cdot dx=\int v\cdot du=y-\int u\cdot dv +C
For example,
\int xe^x\cdot dx=\int x\cdot de^x \\
=xe^x-\int e^x\cdot dx +C \\
=xe^x-e^x+C

5.2 Reduction Formula

We define an integral (\(I_n\) is the Gamma Function evaluated at n+1, i.e. \(I_n=\Gamma(n+1)\)),
\int_0^{\infty}e^{-t}t^n\cdot dt= I_n
$n$ is determined by the subscript of $I_n$, and could be treated as a constant in that integral.

Integrating by parts, we would get,
n\int_0^{\infty}e^{-t}t^{n-1}dt=I_n \\
n\cdot I_{n-1}=I_n
If we keep doing that, we would get,
I_n= n\cdot I_{n-1}=n(n-1)I_{n-2}=…=n!\cdot I_0
I_0=\int_0^{\infty}e^{-t}\cdot dt=1
so we get,
I_n=n!\cdot I_0=n!
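The result I_n = n! can be checked by direct numerical integration (a sketch; the truncation point and step count are arbitrary choices, good enough for small n):

```python
# Midpoint-rule approximation of I_n = integral_0^inf e^{-t} t^n dt,
# truncated at t = upper (the tail beyond is negligible for small n).
import math

def I(n, upper=60.0, steps=100000):
    h = upper / steps
    return sum(math.exp(-(i + 0.5) * h) * ((i + 0.5) * h) ** n
               for i in range(steps)) * h
```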

5.3 Other Tips

5.3.1 ln|f(x)|

\int \frac{f'(x)}{f(x)}dx=ln|f(x)|+C

For example,
\int \frac{x}{1+x^2}dx =\frac{1}{2}\int\frac{1}{1+x^2}dx^2=\frac{1}{2}\int\frac{1}{1+x^2}d(1+x^2)=\frac{1}{2}ln(1+x^2)+C

5.3.2 Decompose the Fraction – Factorisation

For example,
A=\frac{1}{5},\quad B=-\frac{1}{5}
The further implication is that any rational expression \frac{f(x)}{g(x)} (with degree of f(x) < degree of g(x)) could be rewritten as,
\frac{f(x)}{g(x)}\equiv F_1+F_2 +…+F_k
, where each F_i is,
F_i=\frac{A}{(px+q)^m}\quad or\quad \frac{Ax+B}{(px^2+qx+r)^m}

6. Complex Number – i

6.1 Definition

i=\sqrt{-1}\quad, i^2=-1

and a complex number z=x+iy could be expressed in polar co-ordinate form as,
z=r(cos \theta+i\ sin\theta)
, where
x=r\ cos\theta \quad, y=r\ sin\theta
The set of all complex numbers is denoted \mathbb{C}; and for any complex number z, we could write z \in \mathbb{C}. ( \mathbb{R} \subset \mathbb{C} ).

6.2 Modulus

The modulus of z, denoted |z|, is defined as,
|z|=\sqrt{x^2+y^2}

6.3 Complex Conjugate


The complex conjugate \(\bar{z}\) flips the sign of the imaginary part: if z=x+iy, then \bar{z}=x-iy.

6.4 Polar Form

z=r(cos\ \theta+i\ sin \ \theta)=re^{i\theta}

by Euler’s Identity,
e^{i\theta}=cos\ \theta+i\ sin\ \theta \\
e^{-i\theta}=cos\ \theta-i\ sin\ \theta \\
|z|=r,\quad arg\ z=\theta

6.5 Euler’s Formula

Euler’s Formula is obtained by applying the Taylor Expansion to e^{i\theta} and using i^2=-1,
$$e^{i\theta}=(1-\frac{\theta^2}{2!}+\frac{\theta^4}{4!}-…)+i\times(\theta-\frac{\theta^3}{3!}+\frac{\theta^5}{5!}-…)$$
$$=cos\ \theta +i\ sin\ \theta$$
Plug \(\theta = \pi\) into Euler’s Formula,
e^{i\pi}=cos\ \pi+i\ sin\ \pi=-1

7. Higher Derivatives

\frac{\partial^2 f}{\partial x^2}=f_{xx}=\frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial x}\Big) \\
\frac{\partial^2 f}{\partial x \partial y}=f_{xy}=\frac{\partial}{\partial y}\Big(\frac{\partial f}{\partial x}\Big)
\(f_{xy}=f_{yx}\): the order does not matter if the second derivatives exist and are continuous.


Hodrick Prescott Filter / HP Filter

We decompose a time series into two parts: one is the trend, and the other is the cyclical component.

$$ y_t=g_t+c_t $$

, where \(g_t\) is the trend, and \(c_t\) represents the cyclical component. Or, one can understand those two components as a low-frequency part and a high-frequency part.

The filter tells that,

$$\min_{g} \sum_i^N (y_i-g_i)^2+\lambda \sum_i^{N-2} (g_i-2g_{i+1}+g_{i+2})^2$$

$$\min_{g} \sum_i^N (y_i-g_i)^2+\lambda \sum_i^{N-2} [(g_i-g_{i+1})-(g_{i+1}-g_{i+2})]^2$$

,which can be also written as,

$$\min_{g} || y-g||^2+\lambda||\nabla^2 g||^2$$

We can see the first term represents how far the trend term \(g\) is away from the original series \(y\), and the second term means to smooth the trend term \(g\).

$$ g=argmin_g || y-g||^2+\lambda||\nabla^2 g||^2$$

We replace \( \nabla^2 g \) by \(Dg\).

$$ || y-g||^2+\lambda||D g||^2$$

$$ (y-g)^T (y-g) +\lambda (Dg)^T(Dg)$$

We take the first gradient (f.o.c.) with respect to \(g\), divide by 2, and solve for the trend term \(g\).

$$ -(y-g)+\lambda D^TDg =0$$


$$ y=(I+\lambda D^T D)g $$


$$ g=(I+\lambda D^T D)^{-1}y $$
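The closed-form solution above translates directly into code. Here is a minimal sketch (the λ value and the noisy-line test series are illustrative assumptions; for quarterly data λ = 1600 is the conventional choice):

```python
# HP filter: trend g = (I + lam * D'D)^{-1} y, where D is the
# second-difference operator (rows [1, -2, 1]).
import numpy as np

def hp_filter(y, lam=1600.0):
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]   # row i computes g_i - 2g_{i+1} + g_{i+2}
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Illustrative data: a noisy linear trend; the filter should track the line
rng = np.random.default_rng(8)
t = np.arange(100, dtype=float)
y = 0.5 * t + rng.normal(0.0, 1.0, size=100)
g = hp_filter(y)
```

Note that a purely linear series is a fixed point of the filter: D annihilates it, so the penalty term vanishes and g = y exactly.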