
Chapter 11 MAXIMUM LIKELIHOOD ESTIMATION

Let the observations be $x = (x_1, x_2, \ldots, x_n)$, and let the likelihood function be denoted by $L(\theta) = d(x;\theta)$, $\theta \in \Theta \subset \mathbb{R}^k$. Then the Maximum Likelihood Estimator (MLE) is

$$\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta).$$

If $\hat{\theta}$ is unique, then the model is identified. Let $\ell(\theta) = \ln d(x;\theta)$. For local identification we have the following Lemma.

Lemma 35 The model is locally identified iff the Hessian is negative definite with probability 1, i.e.

$$\Pr\left[\frac{\partial^2 \ell(\hat{\theta})}{\partial\theta\,\partial\theta'} \text{ is negative definite}\right] = 1.$$

Assume that the log-likelihood function can be written in the following form:

$$\ell(\theta) = \ell(x;\theta) = \ln d(x;\theta).$$

Then we usually make the following assumptions:

Assumption A1. The range of the random variable $x$, say $C$, i.e.

$$C = \{x \in \mathbb{R}^n : d(x;\theta) > 0\},$$

is independent of the parameter $\theta$.


Assumption A2. The log-likelihood function $\ln d(x;\theta)$ has partial derivatives with respect to $\theta$ up to third order, which are bounded and integrable with respect to $x$.

The vector

$$s(x;\theta) = \frac{\partial \ell(x;\theta)}{\partial\theta},$$

which is $k \times 1$, is called the score vector. We can state the following Lemma.

Lemma 36 Under the assumptions A1. and A2. we have that

$$E(s) = 0 \quad \text{and} \quad E(ss') = -E\left(\frac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\right).$$

Proof: As $L(x,\theta)$ is a density function it follows that

$$\int_C L(x,\theta)\,dx = 1,$$

where $C = \{x \in \mathbb{R}^n : L(x,\theta) > 0\} \subset \mathbb{R}^n$. Under A1., $C$ is independent of $\theta$. Consequently, the derivative, with respect to $\theta$, of the integral is equal to the integral of the derivative.* Taking derivatives of the above integral we have:

$$\int_C \frac{\partial L}{\partial\theta}\,dx = 0 \;\Rightarrow\; \int_C sL\,dx = 0 \;\Rightarrow\; E(s) = 0.$$

* In case $C$ were dependent on $\theta$, we would have to apply the Second Fundamental Theorem of Analysis to find the derivative of the integral.


Hence we also have that $E(s) = \int_C sL\,dx = 0$, and taking derivatives with respect to $\theta'$ we have

$$\int_C \frac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\,L\,dx = -\int_C \frac{\partial \ell}{\partial\theta}\frac{\partial \ell}{\partial\theta'}\,L\,dx \;\Rightarrow\; E(ss') = -E\left(\frac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\right),$$

which is the second result. $\blacksquare$
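As an illustrative check, not part of the original argument, the two identities of Lemma 36 can be verified by simulation. The sketch below uses Python with the model $x_i \sim N(\mu, 1)$, for which the per-observation score is $x_i - \mu$ and the Hessian is $-1$; the model choice and all names are assumptions made for illustration.

    import numpy as np

    # Model: x_i ~ N(mu0, 1) i.i.d. (assumed for illustration).
    # Per-observation log-density: -0.5*ln(2*pi) - 0.5*(x - mu)^2,
    # so the score is s = x - mu and the Hessian is h = -1.
    rng = np.random.default_rng(0)
    mu0 = 2.0
    x = rng.normal(mu0, 1.0, size=1_000_000)

    s = x - mu0           # score evaluated at the true parameter
    h = -np.ones_like(x)  # second derivative of each log-density

    print(s.mean())                  # ~ 0 : E(s) = 0
    print((s**2).mean(), -h.mean())  # both ~ 1 : E(ss') = -E(Hessian)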

The matrix

$$J(\theta) = E(ss') = -E\left(\frac{\partial^2 \ell(x;\theta)}{\partial\theta\,\partial\theta'}\right)$$

is called the (Fisher) Information Matrix and is a measure of the information that the sample $x$ contains about the parameters in $\theta$. In case $\ell(x;\theta)$ can be written as $\ell(x;\theta) = \sum_{i=1}^n \ell(x_i;\theta)$ we have that

$$J(\theta) = -\sum_{i=1}^n E\left(\frac{\partial^2 \ell(x_i;\theta)}{\partial\theta\,\partial\theta'}\right).$$

Consequently the Information matrix is proportional to the sample size. Furthermore, by Assumption A2., $E\left(\frac{\partial^2 \ell(x_i;\theta)}{\partial\theta\,\partial\theta'}\right)$ is bounded. Hence

$$J(\theta) = O(n).$$

  Now we can state the following Lemma that will be needed in the sequel.


Lemma 37 Under assumptions A1. and A2. we have that for any unbiased estimator of $\theta$, say $\tilde{\theta}$,

$$E(\tilde{\theta}\,s') = I_k,$$

where $I_k$ is the identity matrix of order $k$.

Proof: As $\tilde{\theta}$ is an unbiased estimator of $\theta$, we have that

$$\int_C \tilde{\theta}(x)\,L(x;\theta)\,dx = \theta.$$

Taking derivatives, with respect to $\theta'$, we have that

$$\int_C \tilde{\theta}(x)\,\frac{\partial L(x;\theta)}{\partial\theta'}\,dx = I_k \;\Rightarrow\; \int_C \tilde{\theta}(x)\,s'\,L(x;\theta)\,dx = I_k \;\Rightarrow\; E(\tilde{\theta}\,s') = I_k,$$

which is the result. $\blacksquare$

We can now prove the Cramér-Rao Theorem.

Theorem 38 Under the regularity assumptions A1. and A2., for any unbiased estimator $\tilde{\theta}$ we have that

$$J^{-1}(\theta) \le V(\tilde{\theta}).$$

Proof: Let us define the following $2k \times 1$ vector $\xi = (\delta', s')'$, where $\delta = \tilde{\theta} - \theta$. Now we have that

$$V(\xi) = \begin{pmatrix} V(\tilde{\theta}) & I_k \\ I_k & J(\theta) \end{pmatrix}.$$

For the above result we took into consideration that $\tilde{\theta}$ is unbiased, i.e. $E(\tilde{\theta}) = \theta$, the above Lemma, i.e. $E(\tilde{\theta}\,s') = I_k$, $E(s) = 0$, and $E(ss') = J(\theta)$. It is known that all variance-covariance matrices are positive semi-definite. Hence $V(\xi) \ge 0$. Let us define the following matrix:

$$B = \begin{pmatrix} I_k & -J^{-1}(\theta) \end{pmatrix}.$$

The matrix $B$ is $k \times 2k$ and of rank $k$. Consequently, as $V(\xi) \ge 0$, we have that $B\,V(\xi)\,B' \ge 0$. Hence, as $I_k$ and $J^{-1}(\theta)$ are symmetric, we have

$$B\,V(\xi)\,B' = V(\tilde{\theta}) - J^{-1}(\theta) \ge 0.$$

Consequently, $V(\tilde{\theta}) \ge J^{-1}(\theta)$, in the sense that $V(\tilde{\theta}) - J^{-1}(\theta)$ is a positive semi-definite matrix. $\blacksquare$

The matrix $J^{-1}(\theta)$ is called the Cramér-Rao Lower Bound, as it is the lower bound for the variance of any unbiased estimator (either linear or non-linear).
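As an illustrative sketch, not part of the notes, the bound can be checked by Monte Carlo. For $x_i \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, $J(\mu) = n/\sigma^2$, so no unbiased estimator of $\mu$ can have variance below $\sigma^2/n$; the sample mean attains the bound. The model and all names below are assumed for illustration.

    import numpy as np

    # Cramer-Rao check: x_i ~ N(mu, sigma2) with sigma2 known, so
    # J(mu) = n/sigma2 and the bound for unbiased estimators is sigma2/n.
    rng = np.random.default_rng(1)
    n, mu, sigma2, reps = 50, 1.0, 4.0, 20_000

    samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    xbar = samples.mean(axis=1)  # unbiased estimator of mu

    print(xbar.var())   # Monte Carlo variance of the sample mean
    print(sigma2 / n)   # Cramer-Rao lower bound; both ~ 0.08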

Let, now, $\theta_0$ denote the true parameter values of $\theta$ and $d(x;\theta_0)$ be the likelihood function evaluated at the true parameter values. Then for any function $f(x)$ we define

$$E_0[f(x)] = \int f(x)\,d(x;\theta_0)\,dx.$$

Let $n^{-1}\ell(\theta)$ be the average log-likelihood and define a function $z : \Theta \to \mathbb{R}$ as

$$z(\theta) = E_0\left[n^{-1}\ell(\theta)\right].$$

Then we can state the following Lemma:

Lemma 39 $\forall\, \theta \in \Theta$ we have that

$$z(\theta) \le z(\theta_0),$$

with strict inequality if

$$\Pr\left[x \in S : d(x;\theta) \ne d(x;\theta_0)\right] > 0.$$


Proof: From the definition of $z(\theta)$ we have that

$$z(\theta) - z(\theta_0) = E_0\left[n^{-1}\ln\frac{d(x;\theta)}{d(x;\theta_0)}\right] \le n^{-1}\ln E_0\left[\frac{d(x;\theta)}{d(x;\theta_0)}\right] = n^{-1}\ln\int d(x;\theta)\,dx = 0,$$

where the inequality is due to Jensen. The inequality is strict when the ratio $\frac{d(x;\theta)}{d(x;\theta_0)}$ is non-constant with probability greater than 0. $\blacksquare$
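The content of Lemma 39 is easy to see numerically. The following sketch, an addition for illustration with an assumed $N(\theta, 1)$ model, estimates $z(\theta) = E_0[\ln d(x;\theta)]$ by Monte Carlo on a grid and shows that it peaks at $\theta_0$.

    import numpy as np

    # z(theta) = E_0[ln d(x; theta)] for x ~ N(theta0, 1) is maximised
    # at theta = theta0 (Lemma 39).
    rng = np.random.default_rng(5)
    theta0 = 1.0
    x = rng.normal(theta0, 1.0, size=500_000)

    def z(theta):
        # Monte Carlo estimate of the expected log-density of N(theta, 1)
        return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

    for theta in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(theta, z(theta))  # largest value at theta = theta0 = 1.0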

When the observations $x = (x_1, x_2, \ldots, x_n)$ are randomly sampled, we have that

$$\ell(\theta) = \sum_{i=1}^n z_i, \quad \text{where } z_i = \ln d(x_i;\theta),$$

and the $z_i$ random variables are independent and have the same distribution, with mean $E(z_i) = z(\theta)$. Then from the Weak Law of Large Numbers we have that

$$n^{-1}\ell(\theta) \xrightarrow{p} z(\theta).$$

However, the above is true under weaker assumptions, e.g. for dependent observations or for non-identically distributed random variables, etc. To avoid a lengthy exposition of the various cases we make the following assumption:

Assumption A3. $\Theta$ is a compact subset of $\mathbb{R}^k$, and

$$\operatorname{plim}_{n \to \infty} n^{-1}\ell(\theta) = z(\theta) \quad \forall\, \theta \in \Theta.$$

Recall that a closed and bounded subset of $\mathbb{R}^k$ is compact.

Theorem 40 Under the above assumption, and if the statistical model is identified, we have that

$$\hat{\theta} \xrightarrow{p} \theta_0,$$

where $\hat{\theta}$ is the MLE and $\theta_0$ the true parameter values.


Proof: Let $N$ be an open sphere with centre $\theta_0$ and radius $\varepsilon$, i.e. $N = \{\theta \in \Theta : \|\theta - \theta_0\| < \varepsilon\}$. Then its complement $\bar{N}$ is closed, and consequently $A = \bar{N} \cap \Theta$ is closed and bounded, i.e. compact. Hence

$$\max_{\theta \in A} z(\theta)$$

exists and we can define

$$\delta = z(\theta_0) - \max_{\theta \in A} z(\theta) > 0, \qquad (11.1)$$

which is positive by Lemma 39 and identification. Let $T_\delta \subset S$ be the event (a subset of the sample space) defined by

$$\left| n^{-1}\ell(\theta) - z(\theta) \right| < \delta/2 \quad \forall\, \theta \in \Theta. \qquad (11.2)$$

Hence (11.2) applies for $\theta = \hat{\theta}$ as well. Hence

$$z(\hat{\theta}) > n^{-1}\ell(\hat{\theta}) - \delta/2.$$

Given now that $\ell(\hat{\theta}) \ge \ell(\theta_0)$ we have that

$$z(\hat{\theta}) > n^{-1}\ell(\theta_0) - \delta/2.$$

Furthermore, as $\theta_0 \in \Theta$, we have that

$$n^{-1}\ell(\theta_0) > z(\theta_0) - \delta/2,$$

from the relationship that $|x| < d \Rightarrow -d < x < d$. Adding the above two inequalities we have that

$$\text{for } T_\delta: \quad z(\hat{\theta}) > z(\theta_0) - \delta.$$

Substituting out $\delta$, employing (11.1), we get

$$z(\hat{\theta}) > \max_{\theta \in A} z(\theta) \;\Rightarrow\; \hat{\theta} \notin A \;\Rightarrow\; \hat{\theta} \notin \bar{N} \cap \Theta \;\Rightarrow\; \hat{\theta} \in N \cap \Theta \;\Rightarrow\; \hat{\theta} \in N.$$


Consequently we have shown that when $T_\delta$ is true then $\hat{\theta} \in N$. This implies that

$$\Pr(T_\delta) \le \Pr(\hat{\theta} \in N),$$

and taking limits, as $n \to \infty$, we have

$$\lim_{n \to \infty} \Pr(\hat{\theta} \in N) = 1,$$

by the definition of $N$ and by Assumption A3., which gives $\lim_{n \to \infty} \Pr(T_\delta) = 1$. Hence, as $\varepsilon$ is any small positive number, we have

$$\forall\, \varepsilon > 0 \quad \lim_{n \to \infty} \Pr\left(\|\hat{\theta} - \theta_0\| < \varepsilon\right) = 1,$$

which is the definition of the probability limit. $\blacksquare$
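The consistency result can be illustrated by simulation. The sketch below, my addition under an assumed exponential model with rate $\lambda_0$ (whose MLE is $\hat{\lambda} = 1/\bar{x}$), shows the probability that $\hat{\lambda}$ lies within a fixed $\varepsilon$ of $\lambda_0$ approaching 1 as $n$ grows.

    import numpy as np

    # Consistency check: x_i ~ Exponential(rate lambda0); MLE = 1/xbar.
    rng = np.random.default_rng(2)
    lam0, reps, eps = 2.0, 1_000, 0.1

    for n in (10, 100, 1_000, 10_000):
        x = rng.exponential(1.0 / lam0, size=(reps, n))
        mle = 1.0 / x.mean(axis=1)
        # Fraction of replications with |mle - lam0| < eps rises towards 1
        print(n, np.mean(np.abs(mle - lam0) < eps))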

When the observations $x = (x_1, x_2, \ldots, x_n)$ are randomly sampled, we have:

$$s(\theta) = \sum_{i=1}^n \frac{\partial \ln d(x_i;\theta)}{\partial\theta} \quad \text{and} \quad H(\theta) = \sum_{i=1}^n \frac{\partial^2 \ln d(x_i;\theta)}{\partial\theta\,\partial\theta'}.$$

Given now that the observations $x_i$ are independent and have the same distribution, with density $d(x_i;\theta)$, the same is true for the vectors $\frac{\partial \ln d(x_i;\theta)}{\partial\theta}$ and the matrices $\frac{\partial^2 \ln d(x_i;\theta)}{\partial\theta\,\partial\theta'}$. Consequently we can apply a Central Limit Theorem to get:

$$n^{-1/2}\, s(\theta) \xrightarrow{d} N\left(0, \bar{J}(\theta)\right),$$

and from the Law of Large Numbers

$$n^{-1} H(\theta) = n^{-1} \sum_{i=1}^n \frac{\partial^2 \ln d(x_i;\theta)}{\partial\theta\,\partial\theta'} \xrightarrow{p} -\bar{J}(\theta),$$

where

$$\bar{J}(\theta) = n^{-1} J(\theta) = -n^{-1} E\left(\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta'}\right),$$

i.e., the average Information matrix. However, the two asymptotic results apply even if the observations are dependent or non-identically distributed. To avoid a lengthy exposition of the various cases we make the following assumptions. As $n \to \infty$ we have:

Assumption A4.

$$n^{-1/2}\, s(\theta_0) \xrightarrow{d} N\left(0, \bar{J}(\theta_0)\right),$$

and

Assumption A5.

$$n^{-1} H(\theta_n) \xrightarrow{p} -\bar{J}(\theta_0) \quad \text{for any sequence } \theta_n = \theta_0 + o_p(1).$$

We can now state the following Theorem.

Theorem 41 Under Assumptions A2. and A3., the above two assumptions (A4. and A5.), and identification, we have:

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \bar{J}^{-1}(\theta_0)\right),$$

where $\hat{\theta}$ is the MLE and $\theta_0$ the true parameter values.

Proof: As $\hat{\theta}$ maximises the likelihood function, we have from the first order conditions that

$$s(\hat{\theta}) = \frac{\partial \ell(\hat{\theta})}{\partial\theta} = 0.$$

From the Mean Value Theorem, around $\theta_0$, we have that

$$0 = s(\hat{\theta}) = s(\theta_0) + H(\theta^*)\left(\hat{\theta} - \theta_0\right), \qquad (11.3)$$

where $\theta^* \in \left[\hat{\theta}, \theta_0\right]$, i.e. $\|\theta^* - \theta_0\| \le \|\hat{\theta} - \theta_0\|$.


Now from the consistency of the MLE we have that

$$\hat{\theta} = \theta_0 + o_p(1),$$

where $o_p(1)$ is a random variable that goes to 0 in probability as $n \to \infty$. As now $\theta^* \in \left[\hat{\theta}, \theta_0\right]$, we have that

$$\theta^* = \theta_0 + o_p(1)$$

as well. Hence from (11.3) we have that

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) = -\left[n^{-1} H(\theta^*)\right]^{-1} n^{-1/2}\, s(\theta_0).$$

As now $\theta^* = \theta_0 + o_p(1)$, under the second assumption (A5.) we have that

$$n^{-1} H(\theta^*) \xrightarrow{p} -\bar{J}(\theta_0).$$

Under now the first assumption (A4.), the above equation implies that

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \bar{J}^{-1}(\theta_0)\right). \qquad \blacksquare$$
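Theorem 41 can also be visualised by simulation. In the assumed exponential model with rate $\lambda_0$ the average information is $\bar{J}(\lambda) = 1/\lambda^2$, so $\sqrt{n}(\hat{\lambda} - \lambda_0)$ should be approximately $N(0, \lambda_0^2)$; the sketch below, my addition for illustration, checks the first two moments.

    import numpy as np

    # Asymptotic normality check: x_i ~ Exponential(rate lambda0).
    # Average information is 1/lambda^2, so
    # sqrt(n)*(mle - lambda0) ~ N(0, lambda0^2) for large n.
    rng = np.random.default_rng(3)
    lam0, n, reps = 2.0, 2_000, 5_000

    x = rng.exponential(1.0 / lam0, size=(reps, n))
    mle = 1.0 / x.mean(axis=1)
    z = np.sqrt(n) * (mle - lam0)

    print(z.mean(), z.var())  # ~ 0 and ~ lambda0^2 = 4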

Example: Let $y_t \sim N(\mu, \sigma^2)$ i.i.d. for $t = 1, \ldots, T$. Then

$$\ell(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^T (y_t - \mu)^2,$$

where $\theta = \begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix}$. Now

$$\frac{\partial \ell(\theta)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{t=1}^T (y_t - \mu)$$

and

$$\frac{\partial \ell(\theta)}{\partial\sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^T (y_t - \mu)^2.$$

Now the first order conditions

$$\frac{\partial \ell(\hat{\theta})}{\partial\mu} = 0 \quad \text{and} \quad \frac{\partial \ell(\hat{\theta})}{\partial\sigma^2} = 0$$

give

$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^T y_t \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T (y_t - \hat{\mu})^2,$$


and consequently, evaluating $H(\theta)$ at $\hat{\theta}$, we have

$$H(\hat{\theta}) = \begin{pmatrix} -\frac{T}{\hat{\sigma}^2} & 0 \\ 0 & -\frac{T}{2\hat{\sigma}^4} \end{pmatrix},$$

which is clearly negative definite. Now the Information matrix is:

$$J(\theta) = -E\left(H(\theta)\right) = E\begin{pmatrix} \frac{T}{\sigma^2} & \frac{1}{\sigma^4}\sum_{t=1}^T (y_t - \mu) \\ \frac{1}{\sigma^4}\sum_{t=1}^T (y_t - \mu) & -\frac{T}{2\sigma^4} + \frac{1}{\sigma^6}\sum_{t=1}^T (y_t - \mu)^2 \end{pmatrix} = \begin{pmatrix} \frac{T}{\sigma^2} & 0 \\ 0 & \frac{T}{2\sigma^4} \end{pmatrix},$$

since $E\left[\sum_{t=1}^T (y_t - \mu)\right] = 0$ and $E\left[\sum_{t=1}^T (y_t - \mu)^2\right] = T\sigma^2$.
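A short Python sketch, added for illustration, implements this example: it computes the closed-form MLEs $\hat{\mu}$ and $\hat{\sigma}^2$ from the first order conditions and cross-checks them against a generic numerical maximiser of the log-likelihood (scipy is assumed to be available; the data are simulated).

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    T, mu0, sigma2_0 = 500, 1.5, 2.0
    y = rng.normal(mu0, np.sqrt(sigma2_0), size=T)

    # Closed-form MLEs from the first order conditions above
    mu_hat = y.mean()
    sigma2_hat = ((y - mu_hat) ** 2).mean()

    # Negative log-likelihood of the N(mu, sigma2) sample
    def negloglik(theta):
        mu, sigma2 = theta
        return 0.5 * T * np.log(2 * np.pi * sigma2) \
               + ((y - mu) ** 2).sum() / (2 * sigma2)

    res = minimize(negloglik, x0=[0.0, 1.0],
                   bounds=[(None, None), (1e-8, None)])

    print(mu_hat, sigma2_hat)  # closed-form estimates
    print(res.x)               # numerical optimum: agrees closely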

