In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model from observations. MLE finds the parameter values that maximize the likelihood function given those observations. The resulting estimate is called a maximum likelihood estimate, also abbreviated as MLE.
In data science terms, maximum likelihood estimation determines values for the parameters of a model: the parameter values are chosen so as to maximize the likelihood that the process described by the model produced the data that were actually observed.
The distinction between probability and likelihood is fundamentally important: Probability attaches to possible results; likelihood attaches to hypotheses.
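That distinction can be made concrete with a small binomial example in Python (the 10-toss coin model and the observed count of 7 heads are illustrative choices, not anything fixed by the definition):

```python
import numpy as np
from scipy.stats import binom

# Probability: fix the hypothesis (a fair coin, p = 0.5) and vary the result.
# The probabilities of all possible results sum to 1.
probs = binom.pmf(np.arange(11), n=10, p=0.5)
print(probs.sum())                  # 1.0

# Likelihood: fix the observed result (7 heads) and vary the hypothesis p.
# L(p) is a function of the hypothesis and does not integrate to 1 over p.
ps = np.linspace(0, 1, 101)
L = binom.pmf(7, n=10, p=ps)
print((L * (ps[1] - ps[0])).sum())  # ~0.09, not 1
```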
Formally, let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that depends on one or more unknown parameters $\theta_1, \theta_2, \ldots, \theta_m$, with probability density (or mass) function $f(x_i; \theta_1, \theta_2, \ldots, \theta_m)$. Suppose that $(\theta_1, \theta_2, \ldots, \theta_m)$ is restricted to a given parameter space $\Omega$. Then:
(1) When regarded as a function of $\theta_1, \theta_2, \ldots, \theta_m$, the joint probability density (or mass) function of $X_1, X_2, \ldots, X_n$:
$$L(\theta_1, \theta_2, \ldots, \theta_m) = \prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_m), \qquad (\theta_1, \theta_2, \ldots, \theta_m) \in \Omega,$$
is called the likelihood function.
(2) If:
$$[u_1(x_1, x_2, \ldots, x_n),\, u_2(x_1, x_2, \ldots, x_n),\, \ldots,\, u_m(x_1, x_2, \ldots, x_n)]$$
is the $m$-tuple that maximizes the likelihood function, then:
$$\hat{\theta}_i = u_i(X_1, X_2, \ldots, X_n)$$
is the maximum likelihood estimator of $\theta_i$, for $i = 1, 2, \ldots, m$.
(3) The corresponding observed values of the statistics in (2), namely:
$$[u_1(x_1, x_2, \ldots, x_n),\, u_2(x_1, x_2, \ldots, x_n),\, \ldots,\, u_m(x_1, x_2, \ldots, x_n)],$$
are called the maximum likelihood estimates of $\theta_i$, for $i = 1, 2, \ldots, m$.
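Maximizing the product in (1) directly is numerically awkward, since multiplying many small densities underflows, so in practice one maximizes the log-likelihood $\log L$, which has the same maximizer. Below is a minimal sketch in Python, assuming an exponential model with a single scale parameter $\theta$ and simulated data standing in for the observed sample (the true value $\theta = 2$ and the sample size are arbitrary choices for the demo):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

# Hypothetical data standing in for the observed x_1, ..., x_n.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)  # true theta = 2, assumed for the demo

def neg_log_likelihood(theta):
    # log L(theta) = sum_i log f(x_i; theta); minimizing the negative of the
    # log-likelihood is equivalent to maximizing L itself.
    return -np.sum(expon.logpdf(x, scale=theta))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(result.x)   # numeric MLE of theta
print(x.mean())   # closed-form MLE for the exponential scale: the sample mean
```

For this model the maximization can also be done analytically: setting the derivative of the log-likelihood to zero gives $\hat{\theta} = \bar{x}$, which is why the two printed values agree.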
Under very broad conditions, maximum likelihood estimators have the following general properties: they are consistent (they converge to the true parameter values as the sample size grows), asymptotically normal, asymptotically efficient (no other estimator achieves a smaller asymptotic variance), and invariant (the MLE of a function $g(\theta)$ is $g(\hat{\theta})$).
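The consistency property is easy to see in a simulation. A minimal sketch, assuming a normal model with known $\sigma$, for which the MLE of the mean $\mu$ is the sample mean (the true value $\mu = 3$ and the replication counts are arbitrary choices):

```python
import numpy as np

# 1000 replications of the experiment at each sample size n.
rng = np.random.default_rng(1)
for n in (10, 100, 10_000):
    samples = rng.normal(loc=3.0, scale=1.0, size=(1000, n))
    mle = samples.mean(axis=1)  # one MLE of mu per replication
    # The estimates centre on the true mu = 3 and their spread
    # shrinks like 1/sqrt(n) -- consistency and asymptotic normality at work.
    print(n, round(mle.mean(), 3), round(mle.std(), 4))
```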
Think of a coin toss with a biased coin, where no one knows the degree of bias. It could range from 0 (all tails) to 1 (all heads); a fair coin would sit at 0.5 (heads and tails equally likely). If you toss the coin 10 times and observe 7 heads, the MLE of the bias is the value under which that observed fact, 7 heads in 10 tosses, is most probable.
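That intuition is easy to verify numerically. The sketch below evaluates the binomial likelihood $L(p) = \binom{10}{7} p^7 (1-p)^3$ on a grid of candidate biases and recovers the closed-form answer $\hat{p} = 7/10$:

```python
import numpy as np

# L(p) = C(10,7) * p^7 * (1 - p)^3 on a grid of candidate biases.
ps = np.linspace(0, 1, 1001)
L = 120 * ps**7 * (1 - ps)**3  # C(10,7) = 120
print(ps[np.argmax(L)])        # 0.7 -- the observed proportion 7/10
```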
Think of a stock price, say British Petroleum's. BP was selling at $59.88 on April 23, 2010; by June 25, 2010 the price was down to $27.02. There could be several reasons for this fall, but the most likely one is the BP oil spill and the resulting public sentiment. The stock price is the observed fact, and MLE, by analogy, estimates the underlying cause that makes that fact most likely.
Maximum likelihood estimates are the parameter values that make the researcher's model explain the data at hand as well as possible. The model itself is the product of the researcher's thinking about the variables, i.e., it reflects the researcher's subjective beliefs about the relationships between them; MLE then gives that model its maximum explanatory power given the available data.