The sigmoid function is widely used for binary classification, both in machine learning algorithms and in econometrics.
Why does the sigmoid function take this form?
Firstly, let’s introduce the odds.
Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number of events that do not.
Odds also have a simple relation with probability: the odds of an outcome are the ratio of the probability that the outcome occurs to the probability that it does not occur. In mathematical terms, if p is the probability that the outcome occurs, then 1-p is the probability that it does not, and
$$ odds = \frac{p}{1-p} $$
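To make the relation concrete, here is a minimal sketch of the conversion in both directions. The function names `odds_from_probability` and `probability_from_odds` are made up for illustration, and the inverse formula p = odds / (1 + odds) follows from rearranging the equation above.

```python
def odds_from_probability(p: float) -> float:
    """Return the odds p / (1 - p). The odds are undefined at p == 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Invert the relation: p = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


print(odds_from_probability(0.5))   # 1.0: both outcomes equally likely
print(odds_from_probability(0.75))  # 3.0: three to one in favour
print(probability_from_odds(3.0))   # 0.75
```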
Odds and Probability
Let’s look at what lies behind probability and odds. Probability is tied directly to the outcomes: it assigns each outcome its own probability. We write $Pr(Y)$, where $Y$ is the outcome and $Pr(\cdot)$ is the probability function that maps each outcome to its probability.
What about the odds? Odds are a ratio computed from the probability, as the formula above shows.
Implication: Compared to the probability, the odds say more about whether the two outcomes of a binary classification are balanced or not, rather than describing the probability distribution itself. Odds of 1 mean the outcome is as likely to occur as not.
Example
Rolling a six-sided die: the probability of rolling a 6 is 1/6, but the odds are 1/5.
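The die example can be checked by counting faces directly. The sketch below uses Python's `fractions.Fraction` to keep the arithmetic exact; the variable names are illustrative only.

```python
from fractions import Fraction

faces = range(1, 7)                          # a fair six-sided die
favourable = [f for f in faces if f == 6]    # faces that give the outcome
unfavourable = [f for f in faces if f != 6]  # faces that do not

probability = Fraction(len(favourable), len(faces))
odds_by_counting = Fraction(len(favourable), len(unfavourable))
odds_from_p = probability / (1 - probability)

print(probability)       # 1/6
print(odds_by_counting)  # 1/5
print(odds_from_p)       # 1/5
```

Both routes to the odds, counting events and dividing p by 1-p, agree.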
Formula
$$ odds = \frac{Pr(Y)}{1-Pr(Y)} $$
where Y is the outcome.
Logit
Since the probability $Pr(Y)$ always lies in $[0,1]$, the odds must be non-negative: $odds \in [0,\infty)$, growing without bound as $Pr(Y)$ approaches 1. We may want to apply a monotonic transformation to rescale that range. Here we apply the logarithm.
$$ logit := \log(odds) = \log\bigg( \frac{Pr(Y)}{1-Pr(Y)} \bigg) $$
We then get the logit (log-odds) function. The sigmoid function derived below is its inverse.
As the transformation we apply is monotonic, the logit preserves the ordering of the odds and keeps the same interpretation: it represents the balance of the binary outcomes. Its range is now $(-\infty, \infty)$, and a value of 0 means the two outcomes are equally likely.
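As a quick illustration of how the logarithm rescales the odds, the sketch below tabulates probability, odds, and log-odds for a few values. The helper name `logit` and the chosen probabilities are only for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of p; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    odds = p / (1.0 - p)
    print(f"p = {p:.2f}   odds = {odds:.4f}   log-odds = {logit(p):+.4f}")
```

The log-odds increase with p, equal 0 at p = 0.5, and are symmetric around that point: the value at 0.9 is the negative of the value at 0.1.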
Then we connect the outcome Y to the observed features X. Here we assume the log-odds are linear in X, with $p = Pr(Y \mid X)$ the probability of the outcome given X. The log-odds then become a function of X.
$$ g(X) = \log\bigg( \frac{p}{1-p} \bigg) = X\beta $$
$$ e^g = \frac{p}{1-p} $$
$$ p = \frac{e^g}{e^g+1}=\frac{1}{1+e^{-g}}$$
$$ p = \frac{1}{1+e^{-X\beta}}$$
We finally obtain our logistic sigmoid function, as shown above.
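The derivation can be checked numerically with a small sketch. The coefficients `beta` and the feature row `x` below are hypothetical values picked for illustration, and the two-branch form of `sigmoid` is just one common way to avoid overflow in the exponential, not the only one.

```python
import math


def sigmoid(g: float) -> float:
    """p = 1 / (1 + e^{-g}), written so the exponential never overflows."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)          # safe: g < 0, so e lies in (0, 1)
    return e / (1.0 + e)     # the equivalent form e^g / (e^g + 1)


def logit(p: float) -> float:
    """Log-odds, the inverse of the sigmoid on (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and one row of features (first entry = intercept).
beta = [0.5, -1.2, 2.0]
x = [1.0, 0.3, 0.8]

g = sum(x_i * b_i for x_i, b_i in zip(x, beta))  # linear predictor X * beta
p = sigmoid(g)

print(f"g = {g:.4f}, p = {p:.4f}")
print(f"logit(p) recovers g: {logit(p):.4f}")
```

Applying `logit` to the output of `sigmoid` returns the linear predictor, which is the round trip that the algebra above describes.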