Never Split the Difference

by Chris Voss

A former FBI Top Hostage Negotiator’s Field-Tested Tools for Talking

Chapter 1 The New Rules

One core assumption is that feeling is a form of thinking. Following Daniel Kahneman and Amos Tversky, the book holds that people are neither fully rational nor completely selfish, and that their tastes are anything but stable. So do not assume that people make rational decisions, especially when they are negotiating.

  • Humans are subject to several behavioural phenomena and theories, including cognitive bias, the framing effect, prospect theory, and loss aversion.
  • System 1 (fast, instinctive, and emotional) is far more influential than System 2 (slow, deliberative, and logical): it guides and steers our rational thoughts.
  • Tactical Empathy. When individuals feel listened to, they tend to listen to themselves more carefully and to openly evaluate and clarify their own thoughts and feelings. Listening is a martial art.
  • Negotiation serves two distinct functions: (1) information gathering and (2) behaviour influencing.

Chapter 2 Be A Mirror

A negotiator should engage the process with a mindset of discovery. The goal is to extract and observe as much information as possible. We start by assuming we know nothing, and explore as the negotiation unfolds.

  • Don’t commit to assumptions; instead, view them as hypotheses and use the negotiation to test them rigorously. Negotiation is not an act of battle; it’s a process of discovery.

  • Slow down, and put together all the puzzle pieces.

  • Use the Late-Night FM DJ Voice: deep, soft, slow, and reassuring. Calm the other side down. It’s the voice of an easygoing, good-natured person. The attitude is light and encouraging. Relax and smile; smiling has an impact on your tone.

  • Mirroring, also called isopraxism, is essentially imitation. It’s another neurobehaviour humans display, in which we copy each other to comfort each other. It establishes trust. Use mirrors to encourage the other side to empathise and bond with you, keep people talking, buy your side time to regroup, and encourage your counterparts to reveal their strategy.

    • Repeat the last three words of what someone has just said.
    1. Start with “I’m sorry …”
    2. Mirror. Repeat the last three words (or the critical one to three words).
    3. Silence. At least four seconds, to let the mirror work its magic on your counterpart.
    4. Repeat

Chapter 3 Don’t Feel Their Pain, Label It

Negotiation is about emotions and feelings. How can one separate people from the problem when the emotions are the problem?

Instead of denying or ignoring emotions, good negotiators identify and influence them. Emotion is a tool.

  • Tactical Empathy. The ability to recognise the perspective of a counterpart, and the vocalisation of that recognition.
    • That’s an academic way of saying that empathy is paying attention to another human being, asking what they are feeling and making a commitment to understanding their world.
    • Understand the feelings and mindset of another in the moment, and also hear what is behind those feelings, so that you increase your influence in all the moments that follow.
    • Labeling means spotting their feelings, turning them into words, and then very calmly and respectfully repeating their emotions back to them.
    • Labeling is a way of validating someone’s emotion by acknowledging it. Give someone’s emotion a name and you show you identify with how the person feels.
    • Open a label with roughly the same wording each time: “It looks like you are …”, “It seems you don’t want to go back to jail”.
    • When you phrase a label as a neutral statement of understanding, it encourages your counterpart to be responsive.
    • The last rule of labeling is silence. Once you have thrown out a label, be quiet and listen.
    • Label your counterpart’s fears to diffuse their power. You are also dealing with a person who wants to be appreciated and understood, so use labels to reinforce and encourage positive perceptions and dynamics.
    • When people are shown photos of faces expressing strong emotion, the brain shows greater activity in the amygdala, the part that generates fear. But when they are asked to label the emotion, the activity moves to the areas that govern rational thinking. In other words, labeling an emotion (applying rational words to a fear) disrupts its raw intensity.
    • List the worst things that the other party could say about you, and say them before the other person can.

Chapter 4 Beware “Yes” – Master “No”

“Yes” is often a meaningless answer that hides deeper objections (and “Maybe” is even worse). Pushing hard for “Yes” doesn’t get a negotiator any closer to a win; it just angers the other side. “No” is pure gold. That negative provides a great opportunity for you and the other party to clarify what you really want by eliminating what you don’t want. “No” is not failure; it leads to “Yes”, which is the final goal. Don’t push for “Yes” before the end. “No” makes people feel safe; “Yes” makes them guarded.

  • “No” can stand for one of several alternatives, e.g. I am not yet ready to agree; I don’t understand; I don’t think I can afford it.
  • After getting a “No”, ask solution-based questions or simply label their effect, e.g. “What about this doesn’t work for you?” or “What would you need to make it work?”
  • Every “No” gets me closer to “Yes”. But how do you lead the other party to a “No”? There are two ways:
    • Mislabel one’s emotions or desires. Say something that you know is totally wrong, e.g. “So it seems that you really are eager to leave your job”. That forces them to listen and makes them comfortable correcting you by saying “No”.
    • Ask the other party what they don’t want. People are comfortable saying “No” because it feels like self-protection. And once you’ve gotten them to say “No”, people are much more open to moving forward to new options and ideas.
  • In email, how never to be ignored again: provoke a “No” with a one-sentence email.

Chapter 5 Trigger The Two Words That Immediately Transform Any Negotiation

Never treat “Yes” as the end point. “Yes” is nothing without “how”. In business negotiation, “That’s right” often leads to the best outcomes. “That’s right” is great; if you hear “You’re right”, however, nothing changes. Consider this: whenever someone is bothering you, and they just won’t let up, and they won’t listen to anything you have to say, what do you tell them to get them to shut up and go away? The answer is “You’re right”.

  • The more a person feels understood, and positively affirmed in that understanding, the more likely it is that the urge for constructive behaviour will take hold.
  • “That’s right” is better than “Yes”. Strive for it. Reaching “That’s right” in a negotiation creates breakthroughs.
  • Use a summary to trigger a “That’s right”. The building blocks of a good summary are a label combined with paraphrasing. Identify, rearticulate, and emotionally affirm “the world according to…”

Chapter 6 Bend Their Reality

People are emotional, irrational beasts who are emotional and irrational in predictable, pattern-filled ways. Using that knowledge and these tools to bend reality is rational, not cheating. The tools are:

  • Don’t let yourself be fooled by the surface.
  • Do not compromise by splitting the difference.
    • The win-win mindset pushed by so many negotiation experts is usually ineffective and often disastrous.
    • Compromise is often a ‘bad deal’. No deal is better than a bad deal.
  • Approaching deadlines. Deadlines regularly make people say and do impulsive things that are against their best interest, because we all have a natural tendency to rush as a deadline approaches. Hiding a deadline pushes you to speed up your concessions, while the other side, thinking that it has time, will just hold out for more. So revealing your deadline to the counterpart can reduce the risk of impasse and lead to the quickest concessions.
  • The F-word, “Fair”, is an emotional term people usually exploit to put the other side on the defensive and gain concessions. When your counterpart drops the F-bomb, don’t get suckered into a concession. Instead, ask them to explain how you’re mistreating them.
  • Bend the counterpart’s reality by anchoring one’s starting point.
  • People will take more risks to avoid a loss than to realise a gain. Make sure your counterpart sees that there is more to lose by inaction (prospect theory).

Club de Paris

The Paris Club (Club de Paris, 巴黎俱乐部) has reached 478 agreements with 102 different debtor countries. Since 1956, the debt treated in the framework of Paris Club agreements amounts to $614 billion.

Low-income countries generally do not have access to international financial markets. The assistance from bilateral and multilateral donors remains vital for them. Non-Paris Club creditors are becoming an increasingly important source of financing for these countries. Yet despite the fact that Paris Club creditors now have to deal with far more complex and diverse debt situations than in 1956, their original principles still stand.


Duty of Members

Ad Hoc Participants & 6 principles

Permanent Members

The 22 Paris Club permanent members are countries with large exposure to other States worldwide and that agree on the main principles and rules of the Paris Club. The claims may be held directly by the government or through its appropriate institutions, especially export credit agencies. These creditor countries have constantly applied the terms defined in the Paris Club Agreed Minutes to their bilateral claims and have settled any bilateral disputes or arrears with Paris Club countries, if any. The following countries are permanent Paris Club members:


Ad Hoc Members

Other official creditors can also actively participate in negotiation sessions or in monthly “Tours d’Horizon” discussions, subject to the agreement of permanent members and of the debtor country. When participating in Paris Club discussions, invited creditors act in good faith and abide by the practices described in the table below. The following creditors have participated as creditors in some Paris Club agreements or Tours d’Horizon in an ad hoc manner:

Abu Dhabi
Czech Republic
New Zealand
Saudi Arabia
South Africa * prospective member on 8 July 2022
Trinidad and Tobago

Development and History of Paris Club

Early Stage

In 1956, the world economy was emerging from the aftermath of the Second World War. The Bretton Woods institutions were in the early stages of their existence, international capital flows were scarce, and exchange rates were fixed. Few African countries were independent and the world was divided along Cold War lines. Yet there was a strong spirit of international cooperation in the Western world and, when Argentina voiced the need to meet its sovereign creditors to prevent a default, France offered to host an exceptional three-day meeting in Paris that took place from 14 to 16 May 1956.

Dealing with the Debt Crisis (1981-1996)

1981 marked a turning point in Paris Club activity. The number of agreements concluded per year rose to more than ten and even to 24 in 1989. This was the famous “debt crisis” of the 1980s, triggered by Mexico defaulting on its sovereign debt in 1982 and followed by a long period during which many countries negotiated multiple debt agreements with the Paris Club, mainly in sub-Saharan Africa and Latin America, but also in Asia (the Philippines), the Middle East (Egypt and Jordan) and Eastern Europe (Poland, Yugoslavia and Bulgaria). Following the collapse of the Soviet Union in 1992, Russia joined the list of countries that have concluded an agreement with the Paris Club. So by the 1990s, Paris Club activity had become truly international.

Debt Burden Enlarges for some Countries

In 1996, the international financial community realized that the external debt situation of a number of mostly African low-income countries had become extremely difficult. This was the starting point of the Heavily Indebted Poor Countries (HIPC) Initiative.

The HIPC Initiative demonstrated the need for creditors to take a more tailored approach when deciding on debt treatment for debtor countries. Hence in October 2003, Paris Club creditors adopted a new approach to non-HIPCs: the “Evian Approach”.

Evian Approach

General frame of the Evian approach

  1. Analyse the debt sustainability

    When a country approaches the Paris Club, the sustainability of its debt would be examined, before the financing assurances are requested, in coordination with the IMF according to its standard debt sustainability analysis to see whether there might be a sustainability concern in addition to financing needs. Specific attention would be paid to the evolution of debt ratios over time as well as to the debtor country’s economic potential; its efforts to adjust fiscal policy; the existence, durability and magnitude of an external shock; the assumptions and variables underlying the IMF baseline scenario; the debtor’s previous recourse to Paris Club and the likelihood of future recourse. If a sustainability issue is identified, Paris Club creditors will develop their own view on the debt sustainability analysis in close coordination with the IMF.

  2. If the country faces a liquidity problem

    For countries who face a liquidity problem but are considered to have sustainable debt going forward, the Paris Club would design debt treatments on the basis of the existing terms. However, Paris Club creditors agreed that the rationale for the eligibility to these terms would be carefully examined, and that all the range built-into the terms including through shorter grace period and maturities, would be used to adapt the debt treatment to the financial situation of the debtor country. Countries with the most serious debt problems will be dealt with more effectively under the new options for debt treatments. For other countries, the most generous implementation of existing terms would only be used when justified.

  3. If the debt is not sustainable or needs special treatment

    For countries whose debt has been agreed by the IMF and the Paris Club creditor countries to be unsustainable, who are committed to policies that will secure an exit from the Paris Club in the framework of their IMF arrangements, and who will seek comparable treatment from their other external creditors, including the private sector, Paris Club creditors agreed that they would participate in a comprehensive debt treatment. However, according to usual Paris Club practices, eligibility to a comprehensive debt treatment is to be decided on a case-by-case basis.

    In such cases, debt treatment would be delivered according to a specific process designed to maintain a strong link with economic performance and public debt management. The process could have three stages. In the first stage, the country would have a first IMF arrangement and the Paris Club would grant a flow treatment. This stage, whose length could range from one to three years according to the past performance of the debtor country, would enable the debtor country to establish a satisfactory track record in implementing an IMF program and in paying Paris Club creditors. In the second stage, the country would have a second arrangement with the IMF and could receive the first phase of an exit treatment granted by the Paris Club. In the third stage, the Paris Club could complete the exit treatment based on the full implementation of the successor IMF program and a satisfactory payment record with the Paris Club. The country would thus only fully benefit from the exit treatment if it maintains its track record over time.


Year-over-year data are available on the website.


According to Horn et al. (2021), Figure 9 on page 13, the Paris Club seems to have played an important role during the 2010s.



About Gold


There seem to have been some “irrational” movements in the gold price since the end of 2023.

How should gold be priced? What factors affect the pricing of gold? Below are some of my readings and insights.

Typical Determinants

Typically, the gold price is considered to be correlated with a list of factors:

  1. Inflation

    Gold is regarded as a hedge against inflation.

  2. Long-term Real Interest Rate

    The yield on TIPS (Treasury Inflation-Protected Securities) is taken as the real interest rate, since inflation is already netted out. The long-term rate, specifically the 10-year rate, is preferred because we generally assume that gold is held over a long-term investment horizon. The long-term real interest rate is regarded as the opportunity cost of holding gold: the higher the rate, the greater the cost of holding gold and the lower the demand for it, so the price falls.

  3. US Dollar

    The Bretton Woods system pegged gold to the US dollar at 35 USD per ounce. Since the collapse of the Bretton Woods system, there has been no fixed exchange rate between the USD and gold. However, the USD remains the most important determinant of the gold price, because gold is still quoted in USD per ounce. The quote works like an exchange rate: the more an ounce of gold is worth, the higher USD/ounce is; conversely, the weaker the USD, the higher USD/ounce is.

  • US Dollar Index

    The US Dollar Index (USDX) can be considered a proxy for the strength or weakness of the US dollar. It is the weighted geometric mean of the US dollar’s exchange rates against six major currencies:

    • Euro (EUR) – 57.6% weight
    • Japanese yen (JPY) – 13.6% weight
    • British pound (GBP) – 11.9% weight
    • Canadian dollar (CAD) – 9.1% weight
    • Swedish krona (SEK) – 4.2% weight
    • Swiss franc (CHF) – 3.6% weight

      The USD Index is thus a weighted composite of the currencies listed above. An increase in the index means that the USD has appreciated against them; e.g. if the USD appreciates against the EUR, the USDX is likely to increase.

      Therefore, an increase in the USDX means an appreciation of the USD, so USD/ounce should fall and the gold price decreases.
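To make the geometric-mean construction concrete, here is a minimal sketch using the weights listed above. The scaling constant is the one commonly published for the ICE index and is an assumption here, and the quotes are hypothetical values chosen only for illustration, not market data. All quotes are expressed as units of foreign currency per 1 USD, so a stronger dollar raises every term.

```python
# Weights taken from the list above (they sum to 1).
WEIGHTS = {"EUR": 0.576, "JPY": 0.136, "GBP": 0.119,
           "CAD": 0.091, "SEK": 0.042, "CHF": 0.036}

# Scaling constant as commonly published for the ICE index (assumption).
USDX_CONSTANT = 50.14348112


def usd_index(fx_per_usd):
    """Weighted geometric mean of foreign-currency-per-USD quotes."""
    index = USDX_CONSTANT
    for ccy, weight in WEIGHTS.items():
        index *= fx_per_usd[ccy] ** weight
    return index


# Hypothetical quotes (foreign currency per 1 USD), for illustration only.
base = {"EUR": 0.92, "JPY": 150.0, "GBP": 0.79,
        "CAD": 1.36, "SEK": 10.5, "CHF": 0.88}

# Same quotes, except the USD buys 5% more EUR (USD appreciates vs EUR).
stronger_usd = dict(base, EUR=base["EUR"] * 1.05)

index_base = usd_index(base)
index_stronger = usd_index(stronger_usd)
print(round(index_base, 2), round(index_stronger, 2))
```

Because the EUR carries a 57.6% weight, a move in EUR/USD alone shifts the index far more than the same percentage move in any other component.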

  4. Risks / Uncertainty

  5. Demand and Supply from Central Banks

    We ignore the demand and supply from individuals and industry, and focus on the demand from central banks. Purchases such as those made by central banks during February and March 2024 increase the demand for gold and hence its price.

Empirical Research

A research report from CICC establishes a four-factor model. The authors specify the relationship between gold and each of the four factors, one by one.

  • The dependent variable: gold price. (Note that they model the price level, not the return, as machine learning approaches usually do.)

  • Explanatory Variables:

    • US Real Interest Rate

      Captures the opportunity cost of holding gold, as explained in the section above.

    • US Dollar Index

      As explained in the section above.

    • Central Bank Net Gold Purchases

      The supply side is limited; demand is mainly driven by the central banks of the US, China, the EU, Japan, etc.

    • US Government Debt Level

      This factor represents the credibility of the US dollar or the US government. The greater the US debt level, the less creditworthy the US government is, and the stronger the desire to hold gold as a counterweight to the US dollar’s credibility.

    The statistics table is shown below.


The authors argue that spurious regression is not a concern even though the R-squared is extremely high. Their reasoning is that the model is effectively a co-integration model: they tested the residual term for a unit root and found it to be stationary. A high R-squared then simply means that little is left in the residual.

Their explanation reads like nonsense to me. However, let us set aside the econometric modelling and the statistics in the table above, as this is not an academic exercise, and consider the model’s predictive power and implications.

Below are their simulated result and the actual gold price movement. Let’s investigate whether their model performs as well as their report states, and see how an ML approach would perform.


Code Example
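As a minimal sketch of the spurious-regression concern discussed above (this is not the CICC model and uses simulated data only), the snippet below regresses one random walk on another, independent one. The function names are hypothetical, and the lag-1 autocorrelation of the residuals is only a crude stand-in for a formal unit-root test: a value close to 1 suggests the residual is not stationary, in which case a high R-squared in levels would not be trustworthy.

```python
import random


def ols_r2_and_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares; return (R^2, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sum(e * e for e in resid)
    return 1.0 - ssr / sst, resid


def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a series."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den


def random_walk(rng, n):
    """Cumulative sum of standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(42)
x = random_walk(rng, 500)
y = random_walk(rng, 500)  # independent of x by construction

r2_levels, resid = ols_r2_and_residuals(x, y)
rho_resid = lag1_autocorr(resid)

# The same regression on first differences removes the common trending behaviour.
dx = [x[t] - x[t - 1] for t in range(1, len(x))]
dy = [y[t] - y[t - 1] for t in range(1, len(y))]
r2_diff, _ = ols_r2_and_residuals(dx, dy)

print("R^2 in levels:", r2_levels)
print("lag-1 autocorrelation of residuals:", rho_resid)
print("R^2 in first differences:", r2_diff)
```

If the report's residuals are genuinely stationary, their autocorrelation should decay quickly instead of staying near 1, which is the property the authors rely on.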


CFA Learning Notes and Materials

11th April 2024

I have passed the CFA Level III exam and been granted the charter.

For any errata or insights, please feel free to contact me.

Here are my learning footprints for CFA Level III. All files are converted to .html, as you will find in the list below. If you need the raw Markdown source, please see my GitHub repo.

P.S. There are typos and mistakes in the notes. You are welcome to point them out and help me correct them. Otherwise, I will probably update them only if I fail the Level III exam (since I would then review these notes). 🙂

Best Wishes

  1. CME
    P.S. BehaviouralFinance
  2. AssetAllocation
  3. Derivatives&Exchange
  4. Fixed-Income
  5. Equity
    P.S. Equity-Active
  6. Fixed-Income
  7. Alternatives
  8. PrivateWealthManagement
  9. InstitutionalInvestors
  10. TradingEvaluationManagerSelection
    P.S. TradingAdditional
  11. Ethics
    P.S. Ethics_from_Level_II_Code_n_Standards

Two Approaches for Forecasting Exchange Rate

In the first approach, analysts focus on flows of exports and imports to establish what the net trade flows are and how large they are relative to the economy and to other, potentially larger, financing and investment flows. The approach also considers differences between domestic and foreign inflation rates, which relate to the concept of purchasing power parity (PPP). Under PPP, the expected percentage change in the exchange rate should equal the difference between inflation rates. The approach also considers the sustainability of current account imbalances, reflecting the difference between national saving and investment.
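The PPP relation in the first approach can be illustrated with a small sketch. It assumes the exchange rate is quoted as units of domestic currency per unit of foreign currency, and the inflation figures and spot rate are hypothetical. The exact relative-PPP change is compared with the simple inflation differential.

```python
def ppp_expected_change(infl_domestic, infl_foreign):
    """Exact relative-PPP change in the domestic-per-foreign exchange rate."""
    return (1.0 + infl_domestic) / (1.0 + infl_foreign) - 1.0


# Hypothetical inputs: 5% domestic inflation, 2% foreign inflation.
infl_d, infl_f = 0.05, 0.02
spot = 1.10  # hypothetical spot rate, domestic currency per foreign unit

exact_change = ppp_expected_change(infl_d, infl_f)
approx_change = infl_d - infl_f  # the inflation differential
expected_spot = spot * (1.0 + exact_change)

print(f"exact: {exact_change:.4%}, approx: {approx_change:.4%}")
print(f"PPP-implied rate in one year: {expected_spot:.4f}")
```

With higher domestic inflation, the domestic currency is expected to depreciate, so the domestic-per-foreign rate rises by roughly the inflation differential.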

The second approach is that the analysis focuses on capital flows and the degree of capital mobility. It assumes that capital seeks the highest risk-adjusted return. The expected changes in the exchange rate will reflect the differences in the respective countries’ assets’ characteristics such as relative short-term interest rates, term, credit, equity and liquidity premiums. The approach also considers hot money flows and the fact that exchange rates provide an across the board mechanism for adjusting the relative sizes of each country’s portfolio of assets.

Source: CFA reading materials

Least Squares Method – Intro to Kalman Filter

Consider a Linear Equation,

$$ y_i = \sum_{j=1}^n C_{i,j} x_j +v_i,\quad i=1,2,\dots,s $$

, where $C_{i,j}$ are scalars and $v_i\in \mathbb{R}$ is the measurement noise. The noise is unknown, but we assume it has certain statistical properties: $v_i$ and $v_j$ are independent for $i\neq j$, each $v_i$ has mean zero, and its variance is $\sigma_i^2$.

$$\mathbb{E}(v_i) = 0$$

$$\mathbb{E}(v_i^2) = \sigma_i^2$$

We can rewrite $y_i = \sum_{j=1}^n C_{i,j} x_j +v_i$ as,

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_s\end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22}& \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{s1} & C_{s2} & \cdots & C_{sn}\end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n\end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_s\end{pmatrix} $$

, in a matrix form,

$$ \vec{y} = C \vec{x} + \vec{v} $$

, but I would write in a short form,

$$ y= C x +v$$

We solve for the least-squares estimator from the following optimisation problem (note the squared L2 norm),

$$ \min_x || y-Cx ||_2^2 $$
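As a small illustration of this minimisation, the sketch below solves the normal equations $C^T C\,x = C^T y$ for two unknowns in plain Python. It assumes $C$ has full column rank; the function name, the straight-line model and the data are hypothetical.

```python
def least_squares_2d(C, y):
    """Solve min_x ||y - C x||^2 for x in R^2 via the normal equations."""
    # Entries of the symmetric 2x2 matrix C^T C.
    a11 = sum(row[0] * row[0] for row in C)
    a12 = sum(row[0] * row[1] for row in C)
    a22 = sum(row[1] * row[1] for row in C)
    # Entries of the vector C^T y.
    b1 = sum(row[0] * yi for row, yi in zip(C, y))
    b2 = sum(row[1] * yi for row, yi in zip(C, y))
    # Closed-form inverse of the 2x2 matrix.
    det = a11 * a22 - a12 * a12
    x1 = (a22 * b1 - a12 * b2) / det
    x2 = (a11 * b2 - a12 * b1) / det
    return x1, x2


# Hypothetical model: y_i = x_1 + x_2 * t_i + v_i with x = (2, 3).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.0]  # made-up measurement noise
C = [[1.0, t] for t in times]
y_clean = [2.0 + 3.0 * t for t in times]
y_noisy = [yc + v for yc, v in zip(y_clean, noise)]

x_clean = least_squares_2d(C, y_clean)
x_noisy = least_squares_2d(C, y_noisy)
print(x_clean, x_noisy)
```

Without noise the estimator recovers the true parameters, and with small noise it stays close to them.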

Recursive Least Squares Method

The classic least-squares estimator might not work well when the data keep arriving over time. The Recursive Least Squares Method handles this discrete-time setting. At a discrete-time instant $k$, the measurement $y_k \in \mathbb{R}^l$ is given by,

$$y_k = C_k x + v_k$$

, where $C_k \in \mathbb{R}^{l\times n}$, and $v_k \in \mathbb{R}^l$ is the measurement noise vector. We assume that the covariance of the measurement noise is given by,

$$ \mathbb{E}[v_k v_k^T] = R_k$$

, and that the noise has zero mean,

$$ \mathbb{E}[v_k] = 0 $$

The recursive least-squares method has the following form in this section,

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

, where $\hat{x}_k$ and $\hat{x}_{k-1}$ are the estimates of the vector $x$ at the discrete-time instants $k$ and $k-1$, and $K_k \in \mathbb{R}^{n\times l}$ is the gain matrix that we need to determine.

The above equation updates the estimate of $x$ at the time instant $k$ on the basis of the estimate $\hat{x}_{k-1}$ at the previous time instant $k-1$, the measurement $y_k$ obtained at the time instant $k$, and the gain matrix $K_k$ computed at the time instant $k$.


$\hat{x}_k$ is the estimate,

$$ \hat{x}_k = \begin{pmatrix} \hat{x}_{1,k} \\ \hat{x}_{2,k} \\ \vdots \\ \hat{x}_{n,k} \end{pmatrix} $$

, which corresponds to the true vector $x$,

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

The estimation error is $\epsilon_{i,k} = x_i - \hat{x}_{i,k}$, $i=1,2,\dots,n$.

$$\epsilon_k = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} = x - \hat{x}_k = \begin{pmatrix} x_1-\hat{x}_{1,k} \\ x_2 - \hat{x}_{2,k} \\ \vdots \\ x_n-\hat{x}_{n,k} \end{pmatrix} $$

The gain $K_k$ is computed by minimising the sum of variances of the estimation errors,

$$ W_k = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2) $$

Next, let’s show that the cost function can be represented as follows ($tr(\cdot)$ is the trace of a matrix),

$$ W_k = tr(P_k) $$

, and $P_k$ is the estimation error covariance matrix defined by

$$ P_k = \mathbb{E}(\epsilon_k \epsilon_k^T )$$

In other words,

$$ K_k = \arg\min_{K_k} W_k, \quad W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg)$$

Why is that?

$$\epsilon_k \epsilon_k^T = \begin{pmatrix} \epsilon_{1,k} \\ \epsilon_{2,k} \\ \vdots \\ \epsilon_{n,k} \end{pmatrix} \begin{pmatrix} \epsilon_{1,k} & \epsilon_{2,k} & \cdots & \epsilon_{n,k} \end{pmatrix}$$

$$ = \begin{pmatrix} \epsilon_{1,k}^2 & \cdots & \epsilon_{1,k}\epsilon_{n,k} \\ \vdots & \epsilon_{i,k}^2 & \vdots \\ \epsilon_{n,k}\epsilon_{1,k} & \cdots & \epsilon_{n,k}^2\end{pmatrix} $$

Taking expectations,

$$ P_k = \mathbb{E}[\epsilon_k \epsilon_k^T] $$

and summing the diagonal entries gives

$$tr(P_k) = \mathbb{E}(\epsilon_{1,k}^2) + \mathbb{E}(\epsilon_{2,k}^2) + \cdots + \mathbb{E}(\epsilon_{n,k}^2)$$

so that

$$ K_k = \arg\min_{K_k} W_k, \quad W_k = tr\bigg( \mathbb{E}(\epsilon_k \epsilon_k^T ) \bigg) = tr(P_k)$$

Let’s derive the optimisation problem.

$$\epsilon_k = x-\hat{x}_k$$

$$ =x-\hat{x}_{k-1} - K_k(y_k - C_k \hat{x}_{k-1}) $$

$$ = x- \hat{x}_{k-1} - K_k (C_k x + v_k - C_k \hat{x}_{k-1}) $$

$$ = (I - K_k C_k)(x-\hat{x}_{k-1}) - K_k v_k $$

$$ =(I-K_k C_k )\epsilon_{k-1} - K_k v_k $$

Here we used $y_k = C_k x + v_k$ and $\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$.

So, $\epsilon_k \epsilon_k^T$ would be,

$$\epsilon_k \epsilon_k^T = \bigg((I-K_k C_k )\epsilon_{k-1} - K_k v_k\bigg)\bigg((I-K_k C_k )\epsilon_{k-1} - K_k v_k\bigg)^T$$

$P_k = \mathbb{E}(\epsilon_k \epsilon_k^T)$, and $P_{k-1} = \mathbb{E}(\epsilon_{k-1} \epsilon_{k-1}^T)$.

$\mathbb{E}(\epsilon_{k-1} v_k^T) = \mathbb{E}(\epsilon_{k-1}) \mathbb{E}(v_k^T) =0$, because $\epsilon_{k-1}$ and $v_k$ are independent and $v_k$ has zero mean (the white-noise property). However, $\mathbb{E}(v_k v_k^T) = R_k$. Substituting all those into $P_k$, we get,

$$P_k = (I - K_k C_k)P_{k-1}(I - K_k C_k)^T + K_k R_k K_k^T$$

$$ P_k = P_{k-1} - P_{k-1} C_k^T K_k^T - K_k C_k P_{k-1} + K_k C_k P_{k-1}C_k^T K_k^T + K_k R_k K_k^T $$

$$W_k = tr(P_k)= tr(P_{k-1}) - tr(P_{k-1} C_k^T K_k^T) - tr(K_k C_k P_{k-1}) + tr(K_k C_k P_{k-1}C_k^T K_k^T) + tr(K_k R_k K_k^T) $$

We take the first-order condition (F.O.C.) to solve for $K_k = \arg\min_{K_k} W_k$, where $W_k = tr(P_k)$, by setting $\frac{\partial W_k}{\partial K_k} = 0$. See the Matrix Cookbook for how to take derivatives w.r.t. $K_k$.

$$\frac{\partial W_k}{\partial K_k} = -2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k = 0$$
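For reference, the derivative above can be checked term by term with three standard trace identities. The symbols $A$ and $B$ are placeholders introduced here, and the step relies on $P_{k-1}$ and $R_k$ being symmetric, which holds because they are covariance matrices.

$$ \frac{\partial\, tr(A K_k^T)}{\partial K_k} = A, \qquad \frac{\partial\, tr(K_k A)}{\partial K_k} = A^T, \qquad \frac{\partial\, tr(K_k B K_k^T)}{\partial K_k} = K_k (B + B^T) $$

Applying them to each term of $W_k$ (the term $tr(P_{k-1})$ does not depend on $K_k$):

$$ \frac{\partial\, tr(P_{k-1} C_k^T K_k^T)}{\partial K_k} = P_{k-1} C_k^T, \qquad \frac{\partial\, tr(K_k C_k P_{k-1})}{\partial K_k} = (C_k P_{k-1})^T = P_{k-1} C_k^T $$

$$ \frac{\partial\, tr(K_k C_k P_{k-1} C_k^T K_k^T)}{\partial K_k} = 2 K_k C_k P_{k-1} C_k^T, \qquad \frac{\partial\, tr(K_k R_k K_k^T)}{\partial K_k} = 2 K_k R_k $$

Combining these with the signs in $W_k$ gives $-2P_{k-1} C_k^T + 2K_k C_k P_{k-1} C_k^T + 2K_k R_k$.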

We solve for $K_k$,

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

, where we let $L_k = R_k + C_k P_{k-1} C_k^T$, and $L_k$ has the properties $L_k = L_k^T$ and $L_k^{-1} = (L_k^{-1})^T$, so that

$$ K_k = P_{k-1} C_k^T L_k^{-1} $$

Plug $K_k = P_{k-1} C_k^T L_k^{-1}$ back into $P_k$.

$$ P_k = P_{k-1} - K_kC_k P_{k-1} = (I-K_k C_k)P_{k-1} $$


In the end, the Recursive Least Squares Method can be summarised as the following three equations.

  1. Update the gain matrix.

$$ K_k = P_{k-1} C_k^T (R_k + C_k P_{k-1} C_k^T)^{-1}$$

  2. Update the estimate.

$$\hat{x}_k = \hat{x}_{k-1} + K_k (y_k - C_k \hat{x}_{k-1})$$

  3. Propagate the estimation error covariance matrix.

$$ P_k = (I-K_k C_k)P_{k-1} $$
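The three update equations can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: each measurement is a scalar ($l = 1$), so the matrix inverse reduces to a division, and the function name, the straight-line model, the noise level and the initial values are all hypothetical.

```python
import random


def rls_step(x_hat, P, c, y, r):
    """One recursive least-squares update for a scalar measurement.

    x_hat: previous estimate (list of n floats)
    P:     previous error covariance (n x n list of lists, symmetric)
    c:     measurement row C_k (list of n floats)
    y:     scalar measurement y_k
    r:     measurement-noise variance R_k
    """
    n = len(x_hat)
    # P_{k-1} C_k^T, a column vector.
    Pc = [sum(P[i][j] * c[j] for j in range(n)) for i in range(n)]
    # L_k = R_k + C_k P_{k-1} C_k^T, a scalar when l = 1.
    L = r + sum(c[i] * Pc[i] for i in range(n))
    # 1. Gain: K_k = P_{k-1} C_k^T L_k^{-1}.
    K = [Pc[i] / L for i in range(n)]
    # 2. Estimate: x_k = x_{k-1} + K_k (y_k - C_k x_{k-1}).
    innovation = y - sum(c[i] * x_hat[i] for i in range(n))
    x_new = [x_hat[i] + K[i] * innovation for i in range(n)]
    # 3. Covariance: P_k = (I - K_k C_k) P_{k-1}; C_k P_{k-1} = (P c)^T by symmetry.
    P_new = [[P[i][j] - K[i] * Pc[j] for j in range(n)] for i in range(n)]
    return x_new, P_new


rng = random.Random(0)
x_true = [2.0, 3.0]   # hypothetical true parameters
sigma = 0.1           # hypothetical noise standard deviation
x_hat = [0.0, 0.0]    # initial guess
P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial uncertainty

for k in range(200):
    t = 0.05 * k
    c = [1.0, t]
    y = x_true[0] + x_true[1] * t + rng.gauss(0.0, sigma)
    x_hat, P = rls_step(x_hat, P, c, y, sigma ** 2)

print(x_hat)
```

Starting from a large $P_0$ expresses little confidence in the initial guess, so the first few measurements move the estimate strongly and later ones refine it.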


Sigmoid & Logistic

The sigmoid function is widely used for binary classification, in both machine learning algorithms and econometrics.

Why does the sigmoid function take this form?

Firstly, let’s introduce the odds.

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not.

Odds also have a simple relation with probability: the odds of an outcome are the ratio of the probability that the outcome occurs to the probability that the outcome does not occur. In mathematical terms, $p$ is the probability of the outcome, and $1-p$ is the probability of it not occurring.

$$ odds = \frac{p}{1-p} $$

Odds and Probability

Let’s look for some insight into probability and odds. Probability is tied to outcomes: it assigns each outcome its own probability, $Pr(Y)$, where $Y$ is the outcome and $Pr(\cdot)$ is the probability function that maps an outcome to its probability.

What about the odds? Odds are a ratio calculated from the probability, as the formula says.

Implication: compared with the probability, the odds say more about how balanced the binary classification is than about the probability distribution itself.


Example: rolling a six-sided die. The probability of rolling a 6 is $1/6$, but the odds are $1/5$.


$$ odds = \frac{Pr(Y)}{1-Pr(Y)} $$

, where $Y$ is the outcome.
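The die example can be checked with exact fractions. The helper names below are hypothetical; the second function simply inverts the odds formula to recover the probability.

```python
from fractions import Fraction


def odds(p):
    """Odds of an outcome with probability p (requires p < 1)."""
    return p / (1 - p)


def prob_from_odds(o):
    """Invert the odds formula: p = odds / (1 + odds)."""
    return o / (1 + o)


p_six = Fraction(1, 6)           # probability of rolling a 6
odds_six = odds(p_six)           # one favourable face against five others
p_back = prob_from_odds(odds_six)
odds_coin = odds(Fraction(1, 2)) # a fair coin is perfectly balanced

print(odds_six, p_back, odds_coin)
```

Odds of exactly 1 mark a perfectly balanced binary outcome, which is the sense in which odds describe balance.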


As the probability $Pr(Y)$ always lies in $[0,1]$, the odds must be non-negative, $odds \in [0,\infty)$. We may want to apply a monotonic transformation to rescale that range. We take the logarithm, which gives the log-odds (the logit), with range $(-\infty,\infty)$.

$$ logit := log(odds) =log\bigg( \frac{Pr(Y)}{1-Pr(Y)} \bigg) $$

The sigmoid (logistic) function is the inverse of this transformation, as derived below.

As the transformation is monotonic, the log-odds preserve the ordering of the odds and keep a similar implication, representing the balance of the binary outcomes.

Then, we link the outcome to the explanatory variables $X$ by assuming that the log-odds take a linear form, $g(X) = X\beta$, where $p$ is the probability of the outcome given $X$.

$$g(X) = log\bigg( \frac{p}{1-p} \bigg) = X\beta $$

$$ e^g = \frac{p}{1-p} $$

$$ p = \frac{e^g}{e^g+1}=\frac{1}{1+e^{-g}}$$

$$ p = \frac{1}{1+e^{-X\beta}}$$

We finally get our logistic sigmoid function as above.
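A short numerical check of the derivation: the sigmoid maps log-odds back to a probability, so applying it to the log of the odds should return the original $p$. The function names and the coefficient values are hypothetical.

```python
import math


def log_odds(p):
    """g = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(g):
    """p = 1 / (1 + exp(-g))."""
    return 1.0 / (1.0 + math.exp(-g))


p = 0.3
g = log_odds(p)
p_recovered = sigmoid(g)

# With a linear index g = X * beta (hypothetical numbers):
x_row = [1.0, 2.5]     # intercept term and one feature
beta = [-1.0, 0.4]     # made-up coefficients
index = sum(xi * bi for xi, bi in zip(x_row, beta))
p_model = sigmoid(index)

print(p_recovered, p_model)
```

Here the index is $-1.0 + 0.4 \times 2.5 = 0$, so the odds are even and the predicted probability sits at one half.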

Dirac Delta Function

The Dirac delta function can be applied to simplify differential equations. There are three main properties of the Dirac delta function. It can be viewed as a limit,

$$\delta (x-x') =\lim_{\tau\to0}\delta_\tau (x-x')$$

, where $\delta_\tau$ denotes a unit-area pulse of width $\tau$, such that,

$$ \delta (x-x') = \begin{cases} \infty & x= x' \\ 0 & x\neq x' \end{cases} $$

$$\int_{-\infty}^{\infty} \delta (x-x')\ dx =1$$

Three Properties:

  • Property 1:

$$\delta(x-x')=0 \quad \quad ,x\neq x' $$

  • Property 2:

$$ \int_{x'-\epsilon}^{x'+\epsilon} \delta (x-x')dx =1\quad \quad ,\epsilon >0 $$

  • Property 3:

$$\int_{x'-\epsilon}^{x'+\epsilon} f(x)\ \delta (x-x')dx = f(x')$$

At $x=x'$ the Dirac delta function is sometimes thought of as having an “infinite” value. So, the Dirac delta function is zero everywhere except at one point, and at that point it can be thought of as either undefined or as having an “infinite” value.
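Properties 2 and 3 can be checked numerically by replacing the delta function with a very narrow unit-area pulse. The sketch below uses a Gaussian of width $\tau$, which is only one possible choice and an assumption here; the function names, the test function $\cos$ and the point $x' = 0.5$ are likewise hypothetical.

```python
import math


def narrow_pulse(u, tau):
    """Unit-area Gaussian of width tau, used as a stand-in for delta(u)."""
    return math.exp(-0.5 * (u / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))


def integrate_against_pulse(f, x_prime, tau, eps=1.0, n=20000):
    """Midpoint-rule integral of f(x) * pulse(x - x') over [x' - eps, x' + eps]."""
    h = 2.0 * eps / n
    total = 0.0
    for i in range(n):
        x = x_prime - eps + (i + 0.5) * h
        total += f(x) * narrow_pulse(x - x_prime, tau) * h
    return total


x_prime = 0.5
tau = 0.01

area = integrate_against_pulse(lambda x: 1.0, x_prime, tau)  # Property 2
sifted = integrate_against_pulse(math.cos, x_prime, tau)     # Property 3

print(area, sifted, math.cos(x_prime))
```

As $\tau$ shrinks, the integral picks out the value of $f$ at $x'$ alone, which is why Property 3 is often called the sifting property.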

Girsanov’s Theorem


We can change the probability measure so that a random variable follows a desired distribution under the new measure.

  • Radon-Nikodym Derivative:

$$Z(\omega) = \frac{\tilde{P}(\omega)}{P(\omega)}$$

  • $\tilde{P}(\omega)$ is the risk-neutral probability measure.
  • ${P}(\omega)$ is the actual probability measure.
  • Properties:
    • $Z(\omega)>0$
    • $\mathbb{E}(Z)=1$
    • As $\tilde{P}(\omega) = Z(\omega) P(\omega)$, if $Z(\omega)>1$ then $\tilde{P}(\omega)>P(\omega)$, and vice versa.

We can calculate that,

$$ \underbrace{\tilde{\mathbb{E}}(X)}_{\text{Expectation under Risk-neutral Probability Measure}} = \underbrace{\mathbb{E}(ZX)}_{\text{Expectation under Actual Probability Measure}} $$

Proof & Example

Under $(\Omega,\mathcal{F},P)$, with events $A\in \mathcal{F}$, let $X$ be a random variable with $X\sim N(0,1)$, so $\mathbb{E}(X)=0$ and $\mathrm{Var}(X)=1$.

$Y=X+\theta$, $\mathbb{E}(Y)=\theta$, and $\mathrm{Var}(Y)=1$.

$X$ here is standard normal under the actual probability measure.

However, $Y$ is not standard normal under the current probability measure $P(\cdot)$, because $\mathbb{E}(Y)\neq0$ (for $\theta \neq 0$).

What do we do?

We change the probability measure from $P(\cdot)$ to $\tilde{P}(\cdot)$ so that $Y$ is standard normal under the new probability measure!

We set the Radon-Nikodym Derivative,

$$Z(\omega) = \exp\Big\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \Big\}$$

Now, we can define the new probability measure $\tilde{P}$ on events such as $A=\{ \omega: Y(\omega)\leq b \}$,

$$\tilde{P}(A) = \int_A Z(\omega)\ dP(\omega)$$

such that $Y=X+\theta$ is standard normal under the new probability measure $\tilde{P}$.

$$\tilde{P}(A) = \tilde{P}(Y(\omega) \leq b)$$

$$ = \int_{\{ Y(\omega)\leq b \} } \exp\Big\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \Big\} \ dP(\omega)$$

, then change the integration range from the set $A$ to $\Omega$ by multiplying by the indicator of $A$,

$$ = \int_{\Omega }\mathbb{1}_{\{ Y(\omega)\leq b \}}\ \exp\Big\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \Big\} \ dP(\omega)$$

, change from $dP$ to $dX$ using the standard normal density of $X$ under $P$, and note that $Y\leq b$ is equivalent to $X\leq b-\theta$,

$$ = \int_{-\infty }^{\infty }\mathbb{1}_{\{ X(\omega)\leq b-\theta \}}\ \exp\Big\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2 \Big\} \ \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}X^2(\omega)} \ dX(\omega)$$

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b-\theta}\ \exp\Big\{ -\theta\ X(\omega) - \frac{1}{2}\theta^2- \frac{1}{2}X^2(\omega)\Big\} \ dX(\omega)$$

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b-\theta}\ \exp\Bigg\{ -\frac{1}{2}\bigg(\theta+ X(\omega)\bigg)^2\Bigg\} \ dX(\omega)$$

, as $Y=X+\theta$ and $dY = dX$, we now change $dX$ to $dY$,

$$ =\frac{1}{\sqrt{2\pi}} \int_{-\infty }^{b}\ \exp\big\{ -\frac{1}{2}Y(\omega)^2\big\} \ dY(\omega)$$

, the integrand above is the standard normal density, so $Y$ is standard normal under $\tilde{P}$.
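The result can be sanity-checked by simulation. The sketch below draws $X$ under the actual measure $P$, weights each draw by $Z$, and compares the weighted probability of $\{Y \leq b\}$ with the standard normal CDF at $b$. The values of $\theta$ and $b$, the sample size, and the function names are hypothetical choices for illustration, and Monte Carlo error means the match is only approximate.

```python
import math
import random


def std_normal_cdf(b):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))


def girsanov_check(theta, b, n, seed=0):
    """Estimate E[Z], P~(Y <= b) = E[Z 1{Y <= b}] and P(Y <= b) by simulation."""
    rng = random.Random(seed)
    sum_z = 0.0
    sum_z_indicator = 0.0
    count_actual = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # X ~ N(0, 1) under P
        y = x + theta                                # Y = X + theta
        z = math.exp(-theta * x - 0.5 * theta ** 2)  # Radon-Nikodym derivative
        sum_z += z
        if y <= b:
            sum_z_indicator += z
            count_actual += 1
    return sum_z / n, sum_z_indicator / n, count_actual / n


theta, b = 0.7, 0.3
mean_z, p_tilde, p_actual = girsanov_check(theta, b, n=200_000)

print("E[Z]            :", mean_z)
print("P~(Y <= b)      :", p_tilde, "vs Phi(b)        :", std_normal_cdf(b))
print("P(Y <= b)       :", p_actual, "vs Phi(b - theta):", std_normal_cdf(b - theta))
```

Under $P$ the event has probability $\Phi(b-\theta)$, while the $Z$-weighted estimate lands near $\Phi(b)$, consistent with $Y$ being standard normal under $\tilde{P}$.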