{"id":4704,"date":"2022-11-14T09:31:58","date_gmt":"2022-11-14T01:31:58","guid":{"rendered":"https:\/\/fanyuzhao.com\/?p=4704"},"modified":"2022-11-14T13:25:51","modified_gmt":"2022-11-14T05:25:51","slug":"maximum-likelihood-estimator","status":"publish","type":"post","link":"https:\/\/fanyuzhao.com\/?p=4704","title":{"rendered":"Maximum Likelihood Estimator"},"content":{"rendered":"\n<p>Joint probability density function, or likelihood function, of an i.i.d. sample:<\/p>\n\n\n\n<p>$f(x_1, x_2, \u2026, x_n|\\theta) = f(x_1|\\theta) f(x_2|\\theta) \u2026 f(x_n|\\theta) $<\/p>\n\n\n\n<p>$f(x_1, x_2, \u2026, x_n|\\theta) = \\prod_{i=1}^n f(x_i|\\theta) = L(\\theta)$<\/p>\n\n\n\n<p>The likelihood function is the joint density regarded as a function of <span class=\"katex math inline\">\\theta<\/span>, with the sample held fixed.<\/p>\n\n\n\n<p>$ L(\\theta|x) = \\prod_{i=1}^n f(x_i|\\theta) $, $\\theta \\in \\Theta$<\/p>\n\n\n\n<p>Since the sample is known, we need to &#8216;guess&#8217; the parameter <span class=\"katex math inline\">\\theta<\/span> that makes the joint density of the observed sample as large as possible.<\/p>\n\n\n\n<p>The Maximum Likelihood Estimator (MLE) is:<\/p>\n\n\n\n<p>$ \\hat{\\theta}(x) = \\arg\\max_{\\theta} L(\\theta|x) $<\/p>\n\n\n\n<p>To simplify the calculation, we normally apply a logarithm transformation,<\/p>\n\n\n\n<p>$$ l(\\theta|x) = \\log L(\\theta|x) $$<\/p>\n\n\n\n<p>$$ \\log L(\\theta|x) = \\log \\prod_{i=1}^n f(x_i|\\theta) = \\sum_{i=1}^n \\log f(x_i|\\theta) $$<\/p>\n\n\n\n<p>As the logarithm is strictly increasing, both functions are maximised by the same <span class=\"katex math inline\">\\theta<\/span>:<\/p>\n\n\n\n<p>$$ \\hat{\\theta}(x) = \\arg\\max_{\\theta} L(\\theta|x) \\Leftrightarrow $$<\/p>\n\n\n\n<p>$$\\hat{\\theta}(x) = \\arg\\max_{\\theta} l(\\theta|x) $$<\/p>\n\n\n\n<p><strong>To recap, we aim to estimate the parameters. The logic of the MLE is that we first assume a probability distribution that fits our data (sample). 
Then, we find the parameter values at which the joint probability function (likelihood function) achieves its maximum value.<\/strong><\/p>\n\n\n\n<p><strong>In other words, we choose the probability distribution and its parameters so that the observed sample data are as likely as possible.<\/strong><\/p>\n\n\n\n<p>Finding <span class=\"katex math inline\">\\arg\\max_{\\theta} l(\\theta)<\/span> is equivalent to finding <span class=\"katex math inline\">\\arg\\min_{\\theta} -l(\\theta)<\/span>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">A Normal Distribution Example<\/h4>\n\n\n\n<p>Consider a random sample <span class=\"katex math inline\">x_i<\/span>, i = 1, 2, 3, \u2026, n, drawn from a normal distribution with density<\/p>\n\n\n\n<p>$$f_{\\theta}(x) = \\frac{1}{\\sqrt{2\\pi \\sigma^2}}e^{-\\frac{(x-\\mu)^2}{2\\sigma^2}} $$<\/p>\n\n\n\n<p>We substitute <span class=\"katex math inline\">f_{\\theta}(x)<\/span> into the log-likelihood function and find,<\/p>\n\n\n\n<p>$$ l(\\mu, \\sigma^2) = \\log L(\\mu,\\sigma^2) $$<\/p>\n\n\n\n<p>$$= -\\frac{n}{2}\\big( \\log 2\\pi + \\log\\sigma^2 \\big) -\\frac{1}{2\\sigma^2}\\sum_{i=1}^n (x_i -\\mu)^2 $$<\/p>\n\n\n\n<p>Taking the first-order conditions (F.O.C.):<\/p>\n\n\n\n<p>By setting <span class=\"katex math inline\">\\frac{\\partial l}{\\partial \\mu}= 0<\/span> and <span class=\"katex math inline\">\\frac{\\partial l}{\\partial \\sigma^2}= 0<\/span>, we solve for <span class=\"katex math inline\">\\mu_{MLE}<\/span> and <span class=\"katex math inline\">\\sigma^2_{MLE}<\/span>.<\/p>\n\n\n\n<p>$$ \\mu_{MLE} =\\bar{x}$$<\/p>\n\n\n\n<p>$$\\sigma^2_{MLE} =\\frac{1}{n} \\sum_{i=1}^n (x_i - \\bar{x})^2$$<\/p>\n\n\n\n<p>P.S. 
We can verify that this solution is a local maximum, because the second partial derivatives are negative there.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nfrom IPython.display import display, Latex\n\ndef MLE_norm(x):\n    # closed-form MLE: sample mean and the divide-by-n variance (np.var uses ddof=0 by default)\n    mu_hat = np.mean(x)\n    sigma2_hat = np.var(x)\n    return mu_hat, sigma2_hat\n\n#----------\n\n# true parameters and sample size of the simulated data\nmu = 5\nsigma = 2.5\nN = 10000\n\nnp.random.seed(0)\nx = np.random.normal( mu, sigma, size=(N,) )\n\n#----------\n\nmu_hat, sigma2_hat = MLE_norm(x)\n\n# report sigma (the square root of the variance estimate) so it compares directly with the true sigma\nfor_mu_hat = r'$\\hat{\\mu} = '+format(round(mu_hat,2))+'$'\nfor_sigma2_hat = r'$\\hat{\\sigma} = '+format(round(np.sqrt(sigma2_hat),2))+'$'\nprint('The MLE for data is:')\ndisplay(Latex(for_mu_hat))\ndisplay(Latex(for_sigma2_hat))<\/code><\/pre>\n\n\n\n<p>By our calculation, the MLEs of <strong>mu<\/strong> and <strong>sigma squared<\/strong> are the sample <strong>mean<\/strong> and the sample <strong>variance<\/strong> (with divisor n).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Perform MLE Numerically<\/h4>\n\n\n\n<p>If the log-likelihood function <strong>isn\u2019t continuous or differentiable<\/strong>, we can solve for the MLE numerically as an optimisation problem where the objective function is \u2026<\/p>\n\n\n\n<p>In other words, sometimes we cannot calculate the MLE by hand, because the log-likelihood function might not be differentiable. 
What can we do?<\/p>\n\n\n\n<p>We let the computer iteratively &#8216;guess&#8217; values <strong>(Optimizer)<\/strong> until it reaches the maximum log-likelihood (or, equivalently, the minimum of the negative log-likelihood).<\/p>\n\n\n\n<p>The maximum likelihood estimator (MLE):<\/p>\n\n\n\n<p>$$\\hat{\\theta}(x) = \\arg\\max_{\\theta} L(\\theta|x)$$<\/p>\n\n\n\n<p>$$L(\\theta|x) = \\prod_{i=1}^n f(x_i|\\theta) $$<\/p>\n\n\n\n<p>$$ l(\\theta) = \\log L(\\theta|x) = \\sum_{i=1}^n \\log f(x_i|\\theta)$$<\/p>\n\n\n\n<p>$$ \\arg\\max_{\\theta} l(\\theta) \\Leftrightarrow \\arg\\min_{\\theta} -l(\\theta) $$<\/p>\n\n\n\n<p>Why do we use a minimizer instead? The scipy.optimize module offers a minimize function but no corresponding maximizer, and maximising a function is equivalent to minimising its negative.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nimport scipy as sc\nimport scipy.stats, scipy.optimize  # load the submodules used via sc.stats and sc.optimize\n\ndef log_likelihood(theta, x):\n    mu = theta&#91;0]\n    sigma = theta&#91;1]\n    # logpdf is numerically safer than np.log(pdf) when a density value underflows\n    l_theta = np.sum( sc.stats.norm.logpdf(x, loc=mu, scale=sigma) )\n    # return the negative log-likelihood, because we minimise it later.\n    return -l_theta\n\n# Constraint function that restricts sigma: 'ineq' requires fun(theta) to be non-negative.\ndef sigma_pos(theta):\n    # return sigma itself; np.abs(sigma) is never negative, so it would not constrain anything\n    sigma = theta&#91;1]\n    return sigma\n\n#--------\n\ncons_set = {'type':'ineq', 'fun': sigma_pos}\n\ntheta0 = &#91;2,3] # initial guess. 
other starting values would also work, since this is only the initial value passed to the optimiser\n# find the minimum of the negative log-likelihood function\nopt = sc.optimize.minimize(fun=log_likelihood, x0=theta0, args=(x,), constraints=cons_set)\n\nfor_mu_hat = r'$\\hat{\\mu} = '+format(round(opt.x&#91;0],2))+'$'\nfor_sigma2_hat = r'$\\hat{\\sigma} = '+format(round(opt.x&#91;1],2))+'$'\n\nprint('The MLE for data is:')\ndisplay(Latex(for_mu_hat))\ndisplay(Latex(for_sigma2_hat))<\/code><\/pre>\n\n\n\n<div class=\"wp-block-file\"><a id=\"wp-block-file--media-8cc534b2-2596-494f-a725-2f159a198038\" href=\"https:\/\/fanyuzhao.com\/wp-content\/uploads\/2022\/11\/Maximum-Likelihood-Estimator.html\">Maximum-Likelihood-Estimator<\/a><a href=\"https:\/\/fanyuzhao.com\/wp-content\/uploads\/2022\/11\/Maximum-Likelihood-Estimator.html\" class=\"wp-block-file__button\" download aria-describedby=\"wp-block-file--media-8cc534b2-2596-494f-a725-2f159a198038\">Download<\/a><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Joint Probability Density Function or Likelihood Function: $f(x_1, x_2, \u2026, x_n|\\theta) = f(x_1|\\theta) f(x_2|\\theta) \u2026 f(x_n|\\theta) $ $f(x_1, x_2, \u2026, x_n|\\theta) = \\prod_{i=1}^n f(x_i|\\theta) = L(\\theta)$ A likelihood function is the density function regarded as a function of \\theta. 
$ L(\\theta |x) = \\prod f(x|\\theta) $, $\\theta \\in \\Theta$ As we know the &hellip; <a href=\"https:\/\/fanyuzhao.com\/?p=4704\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Maximum Likelihood Estimator<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,8,18,26],"tags":[],"_links":{"self":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts\/4704"}],"collection":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4704"}],"version-history":[{"count":14,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts\/4704\/revisions"}],"predecessor-version":[{"id":4729,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=\/wp\/v2\/posts\/4704\/revisions\/4729"}],"wp:attachment":[{"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4704"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4704"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fanyuzhao.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4704"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}