Duality

$$\min_{x \in \mathbb{R}^n} f_0(x)$$

$$\text{s.t. } f_i(x)\leq 0, \quad i = 1, 2, …, m$$

$$\quad h_i(x)=0, \quad i = 1, 2, …, q$$

  • That is, in Lagrangian form,

$$L(x,\lambda,\gamma)=f_0(x)+\sum_{i=1}^m \lambda_i f_i (x) +\sum_{i=1}^q \gamma_i h_i(x)$$

$$ \min_{x} \max_{\lambda,\gamma} L(x,\lambda, \gamma) $$

$$\text{s.t. } \lambda \geq 0$$

  • The dual problem is,

$$g(\lambda, \gamma) = \min_{x} L(x,\lambda, \gamma)$$

$$\max_{\lambda, \gamma} g(\lambda, \gamma) $$

$$\text{s.t. } \nabla_x L(x,\lambda, \gamma)=0$$

$$\quad \lambda \geq 0$$
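As a toy illustration (my own example, not from the original notes): for the primal problem \min x^2 s.t. 1-x \leq 0, the Lagrangian and dual function are

$$ L(x,\lambda) = x^2 + \lambda (1-x), \qquad g(\lambda)=\min_x L(x,\lambda) = \lambda - \frac{\lambda^2}{4} $$

since \nabla_x L = 2x-\lambda = 0 gives x = \lambda/2. Maximising the concave g over \lambda \geq 0 gives \lambda^* = 2 and g(2)=1, which matches the primal optimum f_0(1)=1.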

Why Duality?

We transform the original (primal) problem into its dual, which is always a convex optimisation problem: the dual function g(\lambda, \gamma) is concave even when the primal is not.

Convex Optimisation

  • The objective function is convex (or the negative of a concave function, when maximising).
  • The feasible set is a convex set.

See further study.

KKT for Optimisation

Optimisation

Suppose f, g_1, g_2, …, g_m are differentiable.

$$ \max_{x_1, x_2,…, x_n} f(x_1, x_2, …, x_n) $$

$$\text{s.t. } g_1(x_1, x_2, …, x_n)\geq 0 $$

$$\quad g_2(x_1, x_2, …, x_n)\geq 0 $$

$$\quad \vdots$$

$$\quad g_m(x_1, x_2, …, x_n)\geq 0 $$

There are m constraints g_i, i = 1, 2, …, m, one objective function f to maximise, and n variables.

KKT Conditions

Suppose the constraint qualification is satisfied. A local maximiser \bar{x} = (\bar{x}_1, \bar{x}_2,…, \bar{x}_n) of the above problem, together with some \bar{\lambda}_1, \bar{\lambda}_2, …, \bar{\lambda}_m \geq 0, satisfies the following:

$$ \frac{\partial f}{\partial x_i}(\bar{x}) + \bar{\lambda}_1 \frac{\partial g_1}{\partial x_i}(\bar{x})+ \bar{\lambda}_2 \frac{\partial g_2}{\partial x_i}(\bar{x}) +…+ \bar{\lambda}_{m} \frac{\partial g_m}{\partial x_i}(\bar{x}) = 0$$

for i=1,2,…,n, and

$$\bar{\lambda}_k g_k(\bar{x})=0$$

for k=1,2,…,m

  • We stated “Suppose the constraint qualification is satisfied” at the beginning. What is this qualification?

See step 4 in the next section.

Solve the Maximisation Problem

  • Step 1. Set up the Lagrangian.
  • Step 2. Write down the KKT conditions and constraints.
  • Step 3. Solve the system of equations. (P.S. Discuss the complementary slackness cases and confirm the candidate solutions.)
  • Step 4. Check the constraint qualification, as follows.

Constraint Set: C=\{ (x_1, x_2, …, x_n)\in \mathbb{R}^n \mid g_k(x_1, x_2, …, x_n) \geq 0, \ k=1,2,…,m \}.

The constraint qualification is satisfied at \bar{x} \in C if the gradients of the constraints that hold at \bar{x} with equality are linearly independent. That is, if the vectors

$$ \nabla g_k(\bar{x})=\bigg( \frac{\partial g_k}{\partial x_1}(\bar{x}), \frac{\partial g_k}{\partial x_2}(\bar{x}),…,\frac{\partial g_k}{\partial x_n}(\bar{x}) \bigg)' $$

for the constraints k that bind at \bar{x} are linearly independent.
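To make the four steps concrete, here is a minimal numerical sketch with scipy; the problem max x_1 x_2 s.t. 1 - x_1 - x_2 \geq 0, x_1 \geq 0, x_2 \geq 0 is a toy example of my own, not from the notes.

import numpy as np
import scipy.optimize

# Toy problem: max f(x) = x1*x2  s.t.  1 - x1 - x2 >= 0, x1 >= 0, x2 >= 0.
# scipy only minimises, so we minimise -f. Analytic KKT solution: x1 = x2 = 1/2.
f_neg = lambda x: -(x[0] * x[1])
cons = [{'type': 'ineq', 'fun': lambda x: 1 - x[0] - x[1]},
        {'type': 'ineq', 'fun': lambda x: x[0]},
        {'type': 'ineq', 'fun': lambda x: x[1]}]

opt = scipy.optimize.minimize(fun=f_neg, x0=[0.1, 0.1], constraints=cons)
print(opt.x)  # approximately [0.5, 0.5]

# Stationarity recovers the multiplier of the binding constraint 1 - x1 - x2 >= 0:
# df/dx1 - lambda = x2 - lambda = 0, so lambda = 0.5 > 0 (complementary slackness holds).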

Black and Scholes, 1973

See A bit Stochastic Calculus.

For,

$$ dS_t = \mu S_t \ dt +\sigma S_t \ dW_t $$

  • In calculating d f(S_t), we would get, (by Taylor Expansion)

$$ df(S_t) = \bigg( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 f}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial f}{\partial S_t}\sigma S_t \ dW_t $$

  • A Special Form of f(\cdot) is f(S) = log(S),

$$ d\ log(S_t) = \bigg( \mu - \frac{1}{2}\sigma^2 \bigg)dt + \sigma \ dW_t $$


Now let Y_t be the price of a derivative security, written as a function of S_t and t. Then,

$$ dY_t= \bigg( \frac{\partial Y_t}{\partial t} + \frac{\partial Y_t}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 Y_t}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial Y_t}{\partial S_t}\sigma S_t \ dW_t $$

Consider a portfolio \Pi constructed with (1) a short position in one derivative, and (2) a long position in some fraction of the stock, \Delta, such that the portfolio is riskless. (\Delta = \frac{\partial Y}{\partial S})

$$ \Pi_t = -Y_t +\Delta \ S_t $$

Differentiate it,

$$ d\Pi_t = -dY_t +\frac{\partial Y}{\partial S}dS_t $$

Substituting dY_t and dS_t into the above equation, we then get the stochastic process of the portfolio, by Ito’s Lemma.

$$ d\Pi_t =-\bigg( \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 \bigg) dt $$

The diffusion term dW_t disappears, which means the portfolio is riskless during the interval dt. Under a no-arbitrage assumption, this portfolio can only earn the riskless return, r.

$$ d\Pi_t =r\Pi_t \ dt $$

  • Substituting d\Pi_t and \Pi_t into this equation, we get the Partial Differential Equation (PDE) / Black-Scholes equation,

$$ - \bigg( \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 \bigg) dt = r\bigg(- Y_t + \frac{\partial Y}{\partial S}S_t \bigg)dt $$

$$ rY_t = \frac{\partial Y}{\partial t} + \frac{1}{2}\frac{\partial^2 Y}{\partial S^2} \sigma^2 S_t^2 + r\frac{\partial Y}{\partial S}S_t $$

Then, we guess a solution of the form (where U(\cdot) is the payoff function at maturity t=T),

$$ Y_t = e^{-r(T-t)} U(S_T) $$

For a European call with strike price, K, U(S_T) would be the payoff at maturity,

$$ U(S_T) = \max( S_T - K , 0 ) $$

Finally, through a series of steps to find the specific solution of the PDE, we can solve for the value of the call (\Phi(\cdot) is the cumulative standard normal distribution),

$$ c = S_t\Phi(d_1) - K e^{-r (T-t)}\Phi(d_2) $$

with,

$$ d_1 =\frac{ log(S_t/K) + (r+\frac{1}{2}\sigma^2)(T-t)}{\sigma \sqrt{T-t}} $$

$$ d_2 = d_1 - \sigma \sqrt{T-t} $$
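As a quick check of the formula, here is a minimal Python sketch (the parameter values are arbitrary illustrations):

import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    # Black-Scholes European call price; T here denotes time to maturity (T - t).
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(bs_call(S=100, K=100, T=1.0, r=0.05, sigma=0.2))  # about 10.45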

Volatility Forecast – Ornstein-Uhlenbeck Process

See notes and the realisation.

We assume the volatility of returns follows a stochastic process. Define it as the Ornstein-Uhlenbeck process,

$$dX_t = \kappa (\theta - X_t)dt +\sigma \ dW_t$$

What’s the distribution function of the Ornstein-Uhlenbeck process?

We can apply MLE to estimate the parameters of the above random process.

Recall that our Vasicek-form Ornstein-Uhlenbeck process is the following:

$$ dX_t = \kappa (\theta - X_t)dt +\sigma \ dW_t $$

Multiply both sides by e^{\kappa t}; then we get,

$$ e^{\kappa t}dX_t = \kappa \theta e^{\kappa t} \ dt - \kappa e^{\kappa t} X_t \ dt +\sigma e^{\kappa t}\ dW_t $$

We know that d( e^{\kappa t} X_t )=e^{\kappa t}dX_t + \kappa e^{\kappa t}X_t \ dt, and substitute it in. Then, we get,

$$ d(e^{\kappa t}X_t)=\kappa \theta e^{\kappa t} \ dt + \sigma e^{\kappa t} \ dW_t $$

Integrate from 0 to T,

$$ \int_0^T d(e^{\kappa t}X_t)= \int_0^T \kappa \theta e^{\kappa t} \ dt + \int_0^T \sigma e^{\kappa t} \ dW_t $$

$$ X_T = X_0 e^{-\kappa T} +\theta (1-e^{-\kappa T}) + \int_0^T \sigma e^{-\kappa (T-t)} \ dW_t $$

$\int_0^T \sigma e^{-\kappa(T-t)} \ dW_t \sim N\big(0, \ \sigma^2\int_0^T e^{-2\kappa(T-t)}dt\big)$

We then find \mathbb{E}(X_T) and Var(X_T).

$ \mathbb{E}(X_T) =\mathbb{E}\bigg( X_0 e^{-\kappa T} +\theta (1-e^{-\kappa T}) + \int_0^T \sigma e^{-\kappa(T-t)} \ dW_t \bigg)$

$ \mathbb{E}(X_T)= X_0 e^{-\kappa T} +\theta (1-e^{-\kappa T}) $

$Var(X_T) = \mathbb{E}\bigg( \big( X_T - \mathbb{E}(X_T) \big)^2 \bigg)$

$Var(X_T)= \mathbb{E}\bigg( \big( \int_0^T \sigma e^{-\kappa(T-t)} \ dW_t \big)^2 \bigg) $

By Ito’s Isometry: if I(t)=\int_0^t \Delta(s)dW_s, then

$Var[I(t)]=\mathbb{E}[I^2(t)]=\int_0^t \Delta^2(s)ds$

Then,

$ Var(X_T)= \int_0^T \sigma^2 e^{-2\kappa(T-t)} \ dt $

$ Var(X_T)= \frac{\sigma^2}{2\kappa}\big( 1-e^{-2\kappa T} \big) $

Therefore, we finally get,

$ X_T \sim N\bigg(X_0 e^{-\kappa T} +\theta (1-e^{-\kappa T}), \ \frac{\sigma^2}{2\kappa}\big( 1-e^{-2\kappa T} \big) \bigg)$
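As a sanity check on this distribution, we can simulate the SDE with an Euler-Maruyama scheme and compare the sample moments of X_T with the formulas above (a minimal sketch; the parameter values are arbitrary):

import numpy as np

np.random.seed(0)
kappa, theta, sigma = 2.0, 1.0, 0.5
X0, T, n_steps, n_paths = 0.0, 1.0, 500, 20000
dt = T / n_steps

# Euler-Maruyama: X_{t+dt} = X_t + kappa*(theta - X_t)*dt + sigma*sqrt(dt)*Z
X = np.full(n_paths, X0)
for _ in range(n_steps):
    X += kappa * (theta - X) * dt + sigma * np.sqrt(dt) * np.random.normal(size=n_paths)

mean_theory = X0 * np.exp(-kappa * T) + theta * (1 - np.exp(-kappa * T))
var_theory = sigma**2 / (2 * kappa) * (1 - np.exp(-2 * kappa * T))
print(X.mean(), mean_theory)  # both about 0.865
print(X.var(), var_theory)    # both about 0.061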

MLE of Ornstein-Uhlenbeck Process

$X_{t+\delta t} \sim N(\cdot,\cdot)$

where \mathbb{E}(X_{t+\delta t}|X_t)=X_t e^{-\kappa\ \delta t} +\theta (1-e^{-\kappa\ \delta t}), and Var(X_{t+\delta t}|X_t)= \frac{\sigma^2}{2\kappa}(1-e^{-2\kappa\ \delta t}).

$$ f_{\theta}(x_{t+\delta t}|x_t)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

We replace \mu = \mathbb{E}(X_{t+\delta t}|X_t) and \sigma^2 = Var(X_{t+\delta t}|X_t) inside.

Code realisation could be found in notes.
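Since that realisation is external, here is a minimal sketch of the estimation, assuming equally spaced observations with spacing \delta t; the simulation parameters and function names below are my own.

import numpy as np
import scipy.optimize

def ou_neg_log_likelihood(params, x, dt):
    # Exact Gaussian transition density of the OU process:
    # X_{t+dt} | X_t ~ N( X_t e^{-kappa dt} + theta (1 - e^{-kappa dt}),
    #                     sigma^2 (1 - e^{-2 kappa dt}) / (2 kappa) )
    kappa, theta, sigma = params
    mu = x[:-1] * np.exp(-kappa * dt) + theta * (1 - np.exp(-kappa * dt))
    var = sigma**2 * (1 - np.exp(-2 * kappa * dt)) / (2 * kappa)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (x[1:] - mu)**2 / var)

# Simulate a path (Euler-Maruyama), then fit by minimising the negative log-likelihood.
np.random.seed(1)
kappa, theta, sigma, dt, n = 2.0, 1.0, 0.5, 1/252, 50000
dW = np.sqrt(dt) * np.random.normal(size=n-1)
x = np.empty(n); x[0] = theta
for i in range(1, n):
    x[i] = x[i-1] + kappa * (theta - x[i-1]) * dt + sigma * dW[i-1]

# Nelder-Mead is derivative-free; a sensible initial guess keeps kappa and sigma positive.
opt = scipy.optimize.minimize(ou_neg_log_likelihood, x0=[1.0, 0.5, 0.3],
                              args=(x, dt), method='Nelder-Mead')
print(opt.x)  # estimates of (kappa, theta, sigma)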

Maximum Likelihood Estimator

Joint Probability Density Function or Likelihood Function:

$f(x_1, x_2, …, x_n|\theta) = f(x_1|\theta) f(x_2|\theta) … f(x_n| \theta) $

$f(x_1, x_2, …, x_n|\theta) = \prod_{i=1}^n f(x_i | \theta) =L(\theta)$

A likelihood function is the density function regarded as a function of \theta.

$ L(\theta |x) = \prod_{i=1}^n f(x_i|\theta) $, $\theta \in \Theta$

As we know the sample, we need to ‘guess’ the parameter \theta that gives the joint probability distribution its maximum value.

The Maximum Likelihood Estimator (MLE) is :

$ \hat{\theta}(x) = \arg\max_{\theta} L(\theta|x) $

To simplify the calculation, we normally apply a logarithm transformation,

$$ l(\theta)= \log L $$

$$\log L(\theta |x) =\log \prod_{i=1}^n f(x_i|\theta)= \sum_{i=1}^n \log f_{\theta} (x_i) $$

As the logarithm transformation is monotonic,

$$ \hat{\theta}(x) = \arg\max_{\theta} L(\theta|x) \Leftrightarrow \hat{\theta}(x) = \arg\max_{\theta} l(\theta|x) $$

To review: we aim to estimate the parameters. The logic of the MLE is that we posit a probability distribution to fit our data (sample). Then, we find the values of the parameters such that the joint probability function (likelihood function) achieves its maximum value.

In other words, we guess the probability distribution and parameters that give the observed sample the highest probability.

Finding \arg\max_{\theta} l(\theta) is equivalent to finding \arg\min_{\theta} -l(\theta).

A Normal Distribution Example

A normal distribution example, for a random sample x_i, i=1, 2, 3, …, n:

$$f_{\theta}(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

We substitute f_{\theta}(x) into the log-likelihood function; then we find,

$$ l(\mu, \sigma^2)=\log L(\mu,\sigma^2) $$

$$= -\frac{n}{2}\big( \log 2\pi + \log \sigma^2 \big) -\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i -\mu)^2 $$

Taking the First Order Conditions, F.O.C.

By setting \frac{\partial l}{\partial \mu}= 0 and \frac{\partial l}{\partial \sigma^2}= 0, we solve \mu_{MLE} and \sigma^2_{MLE}.

$$ \mu_{MLE} =\bar{x}$$

$$\sigma^2_{MLE} =\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$$

P.S. We can prove that this is a local maximum, since the second-order partial derivatives are negative.

import numpy as np

def MLE_norm(x):
    # Closed-form MLE for a normal sample: sample mean and 1/n sample variance.
    mu_hat = np.mean(x)
    sigma2_hat = np.var(x)  # np.var defaults to ddof=0, i.e. the 1/n MLE variance
    return mu_hat, sigma2_hat

#----------

mu = 5
sigma = 2.5
N = 10000

np.random.seed(0)
x = np.random.normal( mu, sigma, size=(N,) )

#----------

from IPython.display import display, Latex

mu_hat, sigma2_hat = MLE_norm(x)

# format the estimates as LaTeX strings (raw strings keep the backslashes intact)
for_mu_hat = r'$\hat{\mu} = ' + format(round(mu_hat, 2)) + '$'
for_sigma2_hat = r'$\hat{\sigma} = ' + format(round(np.sqrt(sigma2_hat), 2)) + '$'
print('The MLE for data is:')
display(Latex(for_mu_hat))
display(Latex(for_sigma2_hat))

By our calculation, we know that mathematically the MLE estimators of \mu and \sigma^2 are the sample mean and the (1/n) sample variance.

Perform MLE Numerically

If the log-likelihood function is not continuous or differentiable, we can solve numerically through an optimisation problem whose objective function is the negative log-likelihood.

In other words, sometimes we cannot derive the MLE estimator by hand, because the log-likelihood function might not be differentiable. What can we do?

We let the computer iteratively ‘guess’ values (an optimiser) until it reaches the maximum log-likelihood (or the minimum of the negative log-likelihood).

The maximum likelihood estimator (MLE):

$$\hat{\theta}(x) = \arg\max_{\theta} L(\theta|x)$$


$$L(\theta) = \prod_{i=1}^n f(x_i | \theta) $$

$$ l(\theta) = \log L(\theta|x)=\sum_{i=1}^n \log f(x_i|\theta)$$

$$ \arg\max_{\theta} l(\theta) \Leftrightarrow \arg\min_{\theta} -l(\theta) $$

Why do we use a minimiser instead? Only a minimiser is available in the scipy package, and the maximisation and minimisation problems are equivalent.

import numpy as np
import scipy as sc
import scipy.stats
import scipy.optimize

def log_likelihood(theta, x):
    mu = theta[0]
    sigma = theta[1]
    # logpdf is numerically safer than taking the log of pdf values near zero
    l_theta = np.sum( sc.stats.norm.logpdf(x, loc=mu, scale=sigma) )
    # there is a negative sign in front. We minimise it later.
    return -l_theta

# Constraint function that restricts sigma to be positive
# (for 'ineq' constraints, scipy requires fun(theta) >= 0).
def sigma_pos(theta):
    return theta[1]

#--------

cons_set = {'type':'ineq', 'fun': sigma_pos}

theta0 = [2, 3]  # initial guess; almost any values work, since this is just the starting point fed to the optimiser
# find the minimum of the negative log-likelihood (x is the sample simulated above)
opt = sc.optimize.minimize(fun=log_likelihood, x0=theta0, args=(x,), constraints=cons_set)

for_mu_hat = r'$\hat{\mu} = ' + format(round(opt.x[0], 2)) + '$'
for_sigma2_hat = r'$\hat{\sigma} = ' + format(round(opt.x[1], 2)) + '$'

print('The MLE for data is:')
display(Latex(for_mu_hat))
display(Latex(for_sigma2_hat))

Risk-Neutral Pricing

Our basic logic is,

$$ \text{Option Value} =\text{Discounted Expectation of the Payoff} $$

However, the option and the underlying asset are not perfectly correlated. That fact encourages us to use the risk-neutral method for valuing an option.

Geometric Brownian Motion

  • Non-dividend paying stock, (S_0>0)

$$ dS_t = \mu S_t\ dt+\sigma S_t \ dW_t $$

  • Money market, or bank account, (B_0 = 1)

$$ dB_t = r B_t\ dt $$

and,

$$ \{ W_t \}_{t\in [0,T]} : \text{a Brownian Motion under } \mathbb{P} $$

$ \{S_t\} $ is not a martingale under \mathbb{P}, because the drift \mu S_t is not zero.

So, we apply a transformation to S_t to make it a martingale (eliminate the drift term).

We here look at \{ \frac{S_t}{B_t} \},

$$ d\big( \frac{S_t}{B_t} \big)=-r\frac{S_t}{B_t}dt +\frac{1}{B_t} dS_t + dS_t \big(-r\frac{1}{B_t}dt\big) $$

$$ d\big( \frac{S_t}{B_t} \big)=-r\frac{S_t}{B_t}dt +\bigg( \frac{1}{B_t}-r\frac{1}{B_t}dt \bigg)\bigg( \mu S_t\ dt +\sigma S_t \ dW_t \bigg) $$

The cross terms in dt\,dt and dt\,dW_t vanish in the stochastic integration (they are of higher order). We therefore get,

$$ d\big( \frac{S_t}{B_t} \big) = \sigma ( \frac{S_t}{B_t} )\underbrace{\bigg( \frac{\mu - r}{\sigma} dt + dW_t \bigg)}_{d\tilde{W_t}} $$

The drift disappears; instead, we get

$$ d\big( \frac{S_t}{B_t} \big) = \sigma \big( \frac{S_t}{B_t} \big) \ d\tilde{W_t} $$

$$ d\tilde{W_t} = \frac{\mu -r}{\sigma} dt +dW_t $$

By Girsanov’s Theorem: let \{a_t \}_{t\in [0,T]} be a process adapted to \{ \mathcal{F}_t \}_{t\in [0,T]} (here a_t = \frac{\mu - r}{\sigma}); then there exists a measure \mathbb{Q} equivalent to \mathbb{P} such that \tilde{W_t} = W_t +\int_0^t a_u du is a Brownian Motion under \mathbb{Q}. Under \mathbb{Q},

$$ dS_t = rS_t \ dt +\sigma S_t \ d\tilde{W_t} $$

Implication

The implication is that,

We can construct a replicating portfolio (S_t, B_t) with value process \{ V_t \}_{t\in[0,T]}. Under \mathbb{Q}, the discounted value process \{ e^{-rt}V_t \} is a martingale.

$$ e^{-rt} C_t =\mathbb{E}_{\mathbb{Q}}[ e^{-rT} C_T | \mathcal{F}_t ] $$

$$ =\mathbb{E}_{\mathbb{Q}}[ e^{-rT} V_T | \mathcal{F}_t ] = e^{-rt} V_t $$

Risk Neutral Expectation Pricing Formula

We therefore get,

$$ \frac{C_t}{B_t} = \mathbb{E}_{\mathbb{Q}}\bigg[ \frac{C_T}{B_T} \bigg| \mathcal{F}_t \bigg] $$

$$ B_0=1,\quad \text{and} \quad B_t=\exp\bigg\{ \int_0^t r_s ds \bigg\} $$
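Here is a minimal Monte Carlo sketch of this pricing formula for a European call with constant r (so B_T = e^{rT}); the parameter values are arbitrary and chosen to match the Black-Scholes example above:

import numpy as np

np.random.seed(0)
S0, K, T, r, sigma, n_paths = 100.0, 100.0, 1.0, 0.05, 0.2, 1_000_000

# Under Q: S_T = S_0 * exp((r - sigma^2/2) T + sigma * W~_T), with W~_T ~ N(0, T).
Z = np.random.normal(size=n_paths)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

# Risk-neutral pricing: C_0 = E_Q[ e^{-rT} max(S_T - K, 0) ].
call_mc = np.exp(-r * T) * np.mean(np.maximum(ST - K, 0.0))
print(call_mc)  # about 10.45, close to the closed-form Black-Scholes value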

Simulating Geometric Brownian Motion (GBM)

Here is a realisation of the previous blog, A bit Stochastic Calculus.
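For readers without access to that notebook, a minimal GBM path simulation might look like the following (the parameter values are my own illustration):

import numpy as np

np.random.seed(0)
mu, sigma, S0, T, n_steps = 0.1, 0.2, 100.0, 1.0, 252
dt = T / n_steps

# Exact GBM update: S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma * sqrt(dt) * Z)
Z = np.random.normal(size=n_steps)
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
S = np.concatenate(([S0], S0 * np.exp(np.cumsum(log_increments))))
print(S[:3], S[-1])  # one simulated daily path over a year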

Implications

  • 1. A Stochastic Integral is different from an Ordinary Integral, and so is differentiation. The Stochastic Differential Equation (S.D.E.) we normally use to mimic the movement of a stock is the following,

$$ dS=\mu S \ dt + \sigma S \ dW$$

where dW is the increment of a Brownian Motion.

The Brownian Motion also has the following critical properties:

  1. Martingale: \mathbb{E}(W_{t+k}|\mathcal{F}_t)=W_t for k>0.
  2. Quadratic Variation: \sum_{i=1}^n (W_{t_i}-W_{t_{i-1}})^2 \to t
  3. Normality: An increment W_{t_i} - W_{t_{i-1}} is normally distributed, with mean zero and variance t_i - t_{i-1}
  • 2. What does Ito’s Lemma tell us?

If the stock price, S_t, follows a stochastic process, then the price of a financial contract on it, F(S_t), as a function of the underlying stock price, follows another stochastic process.

  • 3. Taylor Expansion helps

See the notes on the relevant paragraph of Paul Wilmott’s book.

  • 4. Different Forms of Stochastic Processes could be applied.

Example 1: dS=\mu \ dt + \sigma \ dX

Example 2: dS=\mu S \ dt +\sigma S \ dX

For this, Ito’s Lemma gives dV = \frac{\partial V}{\partial t}dt+\frac{\partial V}{\partial S}dS+\frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}dt; this process for S is also called Geometric Brownian Motion (GBM). If the value function takes the specific form V=F(S)=ln(S), then dF = (\mu -\frac{1}{2}\sigma^2)dt + \sigma \ dX. See derivation in attached notes.

Example 3: A mean-reverting random walk. dS=(v-\mu S)dt +\sigma dX

Example 4: Another mean-reverting r.w. dS=(v-\mu S)dt +\sigma S^{1/2}dX

The stochastic term is altered compared with Example 3: now, if S ever gets close to zero, the randomness decreases.

A bit Stochastic Calculus

See the HTML file for full details.

Key Takeaways

  • Property of \{W_t\}:
  1. $ W_t - W_s \sim N(0, t-s) $
  2. Non-overlapping increments, such as $(W_t - W_{t-1})$ and $(W_{t-i-1} - W_{t-i})$ for $i \geq 1$, are independent and hence uncorrelated. So, $\int_0^t dW_u = \sum dW_u = W_t$
  • For,

$$ dS_t = \mu S_t \ dt +\sigma S_t \ dW_t $$

Why is the Geometric Brownian Motion of \{S_t\} designed in that form?

The answer might be,

$$ dS_t /S_t = \mu \ dt +\sigma \ dW_t $$

$$\int_0^t dS_t /S_t = \int_0^t \mu \ dt + \int_0^t \sigma \ dW_t $$

$$ log(S_t) = \mu \ t +\sigma \ W_t $$

(heuristically treating \int_0^t dS_u/S_u as log(S_t/S_0) with S_0=1; the Ito correction -\frac{1}{2}\sigma^2 t is ignored here)

Taking the first difference is similar to differentiation. (d\ log(S_t) = log(S_t/S_{t-1}) = log(1+r_t) \approx r_t)

$$ r_t = \mu + \sigma \ \Delta W_t $$

The return, \{r_t\}, is equal to a mean, \mu, plus a stochastic term. That is a random walk.

  • We apply a transformation: if S_t follows a Geometric Brownian Motion, then f(S_t) follows another stochastic process.

In calculating d f(S_t), we would get, (by Taylor Expansion)

$$ df(S_t) = \bigg( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial S_t}\mu S_t +\frac{1}{2}\frac{\partial^2 f}{\partial S_t^2}\sigma^2 S_t^2 \bigg)dt + \frac{\partial f}{\partial S_t}\sigma S_t \ dW_t $$

  • A Special Form of f(\cdot) is f(S) = log(S),

$$ d\ log(S_t) = \bigg( \mu - \frac{1}{2}\sigma^2 \bigg)dt + \sigma \ dW_t $$

Integrating the above equation from time 0 to time t, we get,

$$ log(S_t) = log(S_0) + (\mu – \frac{1}{2}\sigma^2)t + \sigma W_t $$
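Exponentiating both sides gives the closed-form solution, which shows that S_t is lognormally distributed:

$$ S_t = S_0 \, e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma W_t}, \qquad \log S_t \sim N\Big( \log S_0 + \big(\mu - \tfrac{1}{2}\sigma^2\big)t,\ \sigma^2 t \Big) $$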