
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model from observations: it finds the parameter values that maximize the likelihood function given those observations. The resulting estimate is called a maximum likelihood estimate, which is also abbreviated as MLE.

In data science terms, maximum likelihood estimation is a method that determines values for the parameters of a model. The parameter values are chosen so as to maximize the likelihood that the process described by the model produced the data that were actually observed.

The distinction between probability and likelihood is fundamentally important: Probability attaches to possible results; likelihood attaches to hypotheses.

Definition

Let X1, X2,..., Xn be a random sample from a distribution that depends on one or more unknown parameters θ1, θ2,..., θm, with probability density (or mass) function f(xi; θ1, θ2,..., θm). Suppose that (θ1, θ2,..., θm) is restricted to a given parameter space Ω. Then:

(1) When regarded as a function of θ1, θ2,..., θm, the joint probability density (or mass) function of X1, X2,..., Xn:

                                           L(θ1, θ2, …, θm) = ∏_{i=1}^{n} f(xi; θ1, θ2, …, θm)

((θ1, θ2,..., θm) in Ω) is called the likelihood function.

(2) If:

                                    [u1(x1,x2,…,xn),u2(x1,x2,…,xn),…,um(x1,x2,…,xn)]

is the m-tuple that maximizes the likelihood function, then:

                                      θ̂i = ui(X1, X2, …, Xn)

is the maximum likelihood estimator of θi, for i = 1, 2, ..., m.

(3) The corresponding observed values of the statistics in (2), namely:

                                      [u1(x1,x2,…,xn),u2(x1,x2,…,xn),…,um(x1,x2,…,xn)]

are called the maximum likelihood estimates of θi, for i = 1, 2, ..., m.
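
To make the definition concrete, here is a minimal numerical sketch, assuming an exponential model f(x; θ) = θ e^(−θx) with a single parameter; the five data values and the use of SciPy's bounded optimizer are illustrative choices, not part of the definition.

    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([0.8, 1.3, 0.2, 2.1, 0.9])   # the observed sample x1, ..., xn

    def neg_log_likelihood(theta):
        # L(theta) = prod_i theta * exp(-theta * xi); minimize -log L for numerical stability
        return -(len(x) * np.log(theta) - theta * x.sum())

    result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
    print(result.x)       # numerical maximizer of the likelihood
    print(1 / x.mean())   # closed-form MLE for this model: theta_hat = 1 / sample mean

The numerical and closed-form answers agree, which is the whole point of the definition: the estimate is whichever parameter value maximizes L.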

Properties of Maximum-Likelihood Estimators

Under very broad conditions, maximum-likelihood estimators have the following general properties:

  • Maximum-likelihood estimators are consistent. 
  • They are asymptotically unbiased, although they may be biased in finite samples (a small simulation sketch of this follows the list).
  • They are asymptotically efficient — no asymptotically unbiased estimator has a smaller asymptotic variance.
  • They are asymptotically normally distributed.
  • If there is a sufficient statistic for a parameter, then the maximum-likelihood estimator of the parameter is a function of a sufficient statistic.
  • A sufficient statistic is a statistic that exhausts all of the information in the sample about the parameter of interest.
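
As a small simulation sketch of the first two properties, consider an exponential model with rate θ: the MLE is 1 divided by the sample mean, which is biased upward in small samples but converges to the true value as the sample size grows. The true rate, sample sizes, and replication count below are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    theta_true = 2.0

    for n in (5, 50, 5000):
        # 10,000 replications: the exponential-rate MLE is 1 / (sample mean)
        samples = rng.exponential(scale=1.0 / theta_true, size=(10_000, n))
        estimates = 1.0 / samples.mean(axis=1)
        print(n, round(estimates.mean(), 3))   # the average estimate approaches 2.0 as n grows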

Think of a coin toss with a biased coin. No one knows the degree of bias; it could range from 0 (all tails) to 1 (all heads), with a fair coin at 0.5 (heads and tails equally likely). If you do 10 tosses and observe 7 heads, then the MLE is the degree of bias that is most likely to have produced the observed fact of 7 heads in 10 tosses.
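
A minimal sketch of the coin example, scanning a grid of candidate bias values (the grid resolution is an arbitrary choice); the answer is, of course, simply the observed proportion 7/10.

    import numpy as np

    heads, tosses = 7, 10
    p_grid = np.linspace(0.01, 0.99, 99)   # candidate values for the degree of bias

    # Likelihood of 7 heads in 10 tosses as a function of the bias p
    # (the constant binomial coefficient is dropped; it does not affect the maximizer)
    likelihood = p_grid**heads * (1 - p_grid)**(tosses - heads)
    print(p_grid[likelihood.argmax()])     # 0.7, the observed proportion of heads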

Think of a stock price, say that of British Petroleum. BP was selling at $59.88 on April 23, 2010; by June 25, 2010 the price was down to $27.02. There could be several reasons for this fall, but the most likely reason could be the BP oil spill and the public sentiment it caused. The stock price is the observed fact; MLE estimates the most likely underlying reason.

Maximum likelihood estimators are the (estimated) parameter values that make the researcher's model explain the data at hand as well as possible. The model is the result of the researcher's thinking about the variables, i.e., it reflects the researcher's subjective beliefs about the relationships between them, and MLE gives that model its maximum explanatory power given the available data.

Advantages

  • Maximum likelihood provides a consistent approach to parameter estimation problems. This means that maximum likelihood estimates can be developed for a large variety of estimation situations. For example, they can be applied in reliability analysis to censored data under various censoring models.
  • Maximum likelihood methods have desirable mathematical and optimality properties. Specifically,
    1. They become minimum variance unbiased estimators as the sample size increases. By unbiased, we mean that if we take (a very large number of) random samples with replacement from a population, the average value of the parameter estimates will be theoretically exactly equal to the population value. By minimum variance, we mean that the estimator has the smallest variance, and thus the narrowest confidence interval, of all estimators of that type.
    2. They have approximate normal distributions and approximate sample variances that can be used to generate confidence bounds and hypothesis tests for the parameters (a minimal sketch follows this list).
  • Several popular statistical software packages provide excellent algorithms for maximum likelihood estimates for many of the commonly used distributions. This helps mitigate the computational complexity of maximum likelihood estimation.
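
As a minimal sketch of points 1 and 2, assuming a simulated exponential sample and SciPy's general-purpose optimizer: the approximate standard error comes from the observed information (the curvature of the log-likelihood at the maximum), which for this model is n/θ̂². The simulated data, sample size, and 1.96 multiplier for a roughly 95% interval are illustrative choices.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=0.5, size=200)   # simulated sample, true rate = 2

    def neg_log_likelihood(params):
        theta = params[0]
        return -(len(x) * np.log(theta) - theta * x.sum())

    fit = minimize(neg_log_likelihood, x0=[1.0], bounds=[(1e-6, None)])
    theta_hat = fit.x[0]

    se = theta_hat / np.sqrt(len(x))           # sqrt of 1 / observed information, n / theta_hat**2
    print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)   # approximate 95% confidence bounds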

Disadvantages

  • The likelihood equations need to be specifically worked out for a given distribution and estimation problem. The mathematics is often non-trivial, particularly if confidence intervals for the parameters are desired.
  • The numerical estimation is usually non-trivial. Except for a few cases where the maximum likelihood formulas are in fact simple, it is generally best to rely on high quality statistical software to obtain maximum likelihood estimates. Fortunately, high quality maximum likelihood software is becoming increasingly common.
  • Maximum likelihood estimates can be heavily biased for small samples. The optimality properties may not apply for small samples.
  • Maximum likelihood can be sensitive to the choice of starting values (see the sketch below).
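
A small sketch of the last point, using a Cauchy location model whose likelihood is bimodal for this made-up three-point sample: starting the optimizer in different places leads to different local maxima. A common remedy is to try several starting values and keep the fit with the largest likelihood.

    import numpy as np
    from scipy.optimize import minimize

    x = np.array([0.0, 6.0, 7.0])   # a small sample for which the Cauchy likelihood is bimodal

    def neg_log_likelihood(theta):
        # Cauchy location model, dropping constants that do not depend on theta
        return np.sum(np.log(1.0 + (x - theta[0]) ** 2))

    for start in (0.0, 5.0):
        fit = minimize(neg_log_likelihood, x0=[start])
        print(start, fit.x[0], -fit.fun)   # different starts end at different local maxima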