Chapter 11: Maximum Likelihood Estimation
Let the observations be $x = (x_1, x_2, \ldots, x_n)$, and let the likelihood function be denoted by $L(\theta) = d(x; \theta)$, $\theta \in \Theta \subset \Re^k$. Then the Maximum Likelihood Estimator (MLE) is
$$\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta).$$
If $\hat{\theta}$ is unique, then the model is identified. Let $\ell(\theta) = \ln d(x; \theta)$; then for local identification we have the following Lemma.

Lemma 35 The model is locally identified iff the Hessian $\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta'}$ is negative definite at $\hat{\theta}$ with probability 1.
Assume that the log-likelihood function can be written as $\ell(\theta) = \ln d(x; \theta)$. Then we usually make the following assumptions:
Assumption A1. The range of the random variable $x$, say $C$, i.e.
$$C = \{x \in \Re^n : d(x; \theta) > 0\},$$
is independent of the parameter $\theta$.
Assumption A2. The log-likelihood function $\ln d(x; \theta)$ has partial derivatives with respect to $\theta$ up to third order, which are bounded and integrable with respect to $x$.
The $k \times 1$ vector
$$s(x; \theta) = \frac{\partial \ell(x; \theta)}{\partial \theta}$$
is called the score vector. We can state the following Lemma.
Lemma 36 Under the assumptions A1. and A2. we have that
$$E(s) = 0 \quad \text{and} \quad E(ss') = -E\left(\frac{\partial^2 \ell(x;\theta)}{\partial\theta\,\partial\theta'}\right).$$
Proof: As $L(x, \theta)$ is a density function it follows that
$$\int_C L(x, \theta)\, dx = 1,$$
where $C = \{x \in \Re^n : L(x, \theta) > 0\}$. Under A1., $C$ is independent of $\theta$. Consequently, the derivative, with respect to $\theta$, of the integral is equal to the integral of the derivative.* Taking derivatives of the above integral we have:
$$\int_C \frac{\partial L}{\partial \theta}\, dx = 0 \;\Rightarrow\; \int_C s L\, dx = 0 \;\Rightarrow\; E(s) = 0.$$

* In case $C$ were dependent on $\theta$, we would have to apply the Second Fundamental Theorem of Analysis to find the derivative of the integral.
Hence, we also have that
$$E(s) = \int_C s L\, dx = 0,$$
and taking derivatives with respect to $\theta'$ we have
$$\int_C \frac{\partial^2 \ell}{\partial\theta\,\partial\theta'} L\, dx = -\int_C s s' L\, dx \;\Rightarrow\; E(ss') = -E\left(\frac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\right),$$
which is the second result. $\blacksquare$
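As a numerical illustration of Lemma 36, the following sketch checks that the score has mean zero and that $E(ss') = -E\left(\frac{\partial^2\ell}{\partial\theta\,\partial\theta'}\right)$ for a single observation; the $N(\mu, \sigma^2)$ model, the parameter values and the Monte Carlo settings are illustrative choices, not taken from the text.

```python
import numpy as np

# Monte Carlo check of Lemma 36 for one N(mu, sig2) observation,
# theta = (mu, sig2)'. All numbers below are illustrative choices.
rng = np.random.default_rng(1)
mu, sig2 = 1.0, 4.0
x = rng.normal(mu, np.sqrt(sig2), size=1_000_000)

# Score s(x; theta): derivatives of ln d(x; theta) w.r.t. mu and sig2.
s = np.stack([(x - mu) / sig2,
              -0.5 / sig2 + (x - mu) ** 2 / (2 * sig2 ** 2)])

mean_score = s.mean(axis=1)          # should be close to (0, 0)

# Information equality: E(ss') should match -E(Hessian).
E_ss = s @ s.T / x.size
minus_E_H = np.array([[1 / sig2, 0.0],
                      [0.0, 1 / (2 * sig2 ** 2)]])
print(mean_score)
print(E_ss)
print(minus_E_H)
```

With a large simulated sample the two matrices agree to two decimal places, which is exactly the information equality of the Lemma.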
The matrix
$$J(\theta) = E(ss') = -E\left(\frac{\partial^2 \ell(x;\theta)}{\partial\theta\,\partial\theta'}\right)$$
is called the (Fisher) Information Matrix and is a measure of the information that the sample $x$ contains about the parameters in $\theta$. In case $\ell(x; \theta)$ can be written as $\ell(x; \theta) = \sum_{i=1}^n \ell(x_i; \theta)$, we have that
$$\frac{\partial^2 \ell(x;\theta)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^n \frac{\partial^2 \ell(x_i;\theta)}{\partial\theta\,\partial\theta'}.$$
Consequently the Information matrix is proportional to the sample size. Furthermore, by Assumption A2., $E\left(\frac{\partial^2 \ell(x_i;\theta)}{\partial\theta\,\partial\theta'}\right)$ is bounded. Hence
$$J(\theta) = O(n).$$
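The order result $J(\theta) = O(n)$ can be made concrete with a simple model; the Bernoulli distribution and the value $p = 0.3$ below are illustrative assumptions, not from the text.

```python
# For n i.i.d. Bernoulli(p) observations, -E[d^2 ell / dp^2] works out to
# n (1/p + 1/(1-p)) = n / (p (1 - p)): the information grows linearly in n.
p = 0.3

def information(n):
    # Each observation contributes E[x_i / p**2 + (1 - x_i) / (1 - p)**2].
    return n * (1.0 / p + 1.0 / (1.0 - p))

per_obs = [information(n) / n for n in (10, 100, 1000)]
print(per_obs)  # constant: J(p) / n = 1 / (p (1 - p))
```

The per-observation information is constant in $n$, which is precisely what $J(\theta) = O(n)$ asserts.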
Now we can state the following Lemma that will be needed in the sequel.
Lemma 37 Under assumptions A1. and A2. we have that for any unbiased estimator of $\theta$, say $\tilde{\theta}$,
$$E(\tilde{\theta} s') = I_k,$$
where $I_k$ is the identity matrix of order $k$.
Proof: As $\tilde{\theta}$ is an unbiased estimator of $\theta$, we have that
$$\int_C \tilde{\theta}(x) L(x; \theta)\, dx = \theta.$$
Taking derivatives, with respect to $\theta'$, we have that
$$\int_C \tilde{\theta}(x) \frac{\partial L(x;\theta)}{\partial \theta'}\, dx = I_k \;\Rightarrow\; \int_C \tilde{\theta}(x) s' L(x;\theta)\, dx = I_k \;\Rightarrow\; E(\tilde{\theta} s') = I_k,$$
which is the result. $\blacksquare$
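Lemma 37 can also be checked by simulation. In the sketch below the unbiased estimator is the sample mean of an $N(\mu, \sigma^2)$ sample with $\sigma^2$ known, so $k = 1$; the model and all numerical settings are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check that E(theta_tilde * s') = I_k (here k = 1) when
# theta_tilde = x-bar estimates mu in N(mu, sig2) with sig2 known.
rng = np.random.default_rng(2)
mu, sig2, n, reps = 0.5, 2.0, 20, 200_000

x = rng.normal(mu, np.sqrt(sig2), size=(reps, n))
theta_tilde = x.mean(axis=1)               # unbiased estimator of mu
score = (x - mu).sum(axis=1) / sig2        # full-sample score for mu

cross = np.mean(theta_tilde * score)       # should be close to I_1 = 1
print(cross)
```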
We can now prove the Cramér-Rao Theorem.
Theorem 38 Under the regularity assumptions A1. and A2., for any unbiased estimator $\tilde{\theta}$ we have that
$$J^{-1}(\theta) \leq V(\tilde{\theta}).$$
Proof: Let us define the following $2k \times 1$ vector $\xi = (\delta', s')'$, where $\delta = \tilde{\theta} - \theta$. Now we have that
$$V(\xi) = \begin{pmatrix} V(\tilde{\theta}) & I_k \\ I_k & J(\theta) \end{pmatrix}.$$
For the above result we took into consideration that $\tilde{\theta}$ is unbiased, i.e. $E(\tilde{\theta}) = \theta$, the above Lemma, i.e. $E(\tilde{\theta} s') = I_k$, $E(s) = 0$, and $E(ss') = J(\theta)$. It is known
that all variance-covariance matrices are positive semi-definite. Hence $V(\xi) \geq 0$. Let us define the following matrix
$$B = \begin{pmatrix} I_k & -J^{-1}(\theta) \end{pmatrix}.$$
The matrix $B$ is $k \times 2k$ and of rank $k$. Consequently, as $V(\xi) \geq 0$, we have that $B V(\xi) B' \geq 0$. Hence, as $I_k$ and $J^{-1}(\theta)$ are symmetric, we have
$$B V(\xi) B' = V(\tilde{\theta}) - J^{-1}(\theta) \geq 0.$$
Consequently, $V(\tilde{\theta}) \geq J^{-1}(\theta)$, in the sense that $V(\tilde{\theta}) - J^{-1}(\theta)$ is a positive semi-definite matrix. $\blacksquare$
The matrix $J^{-1}(\theta)$ is called the Cramér-Rao Lower Bound as it is the lower bound for the variance of any unbiased estimator (either linear or non-linear).
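A quick simulation makes the bound concrete: in the $N(\mu, \sigma^2)$ model with $\sigma^2$ known, $J^{-1}(\mu) = \sigma^2/n$, and the sample mean attains it exactly. The model and numerical settings below are illustrative assumptions, not from the text.

```python
import numpy as np

# The variance of x-bar equals the Cramer-Rao bound sig2 / n in the
# N(mu, sig2) model with sig2 known (all numbers are illustrative).
rng = np.random.default_rng(3)
mu, sig2, n, reps = 1.0, 2.0, 25, 200_000

xbar = rng.normal(mu, np.sqrt(sig2), size=(reps, n)).mean(axis=1)
crlb = sig2 / n                            # J^{-1}(mu) = sig2 / n
print(xbar.var(), crlb)
```

The simulated variance of $\bar{x}$ matches the bound, so here the MLE of $\mu$ is efficient in finite samples, not just asymptotically.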
Let, now, $\theta_0$ denote the true parameter values of $\theta$ and $d(x; \theta_0)$ be the likelihood function evaluated at the true parameter values. Then for any function $f(x)$ we define
$$E_0[f(x)] = \int f(x)\, d(x; \theta_0)\, dx.$$
Let $n^{-1}\ell(\theta)$ be the average log-likelihood and define a function $z: \Theta \rightarrow \Re$ as
$$z(\theta) = E_0\left[n^{-1}\ell(\theta)\right].$$
Then we can state the following Lemma:

Lemma 39 $\forall \theta \in \Theta$ we have that
$$z(\theta) \leq z(\theta_0),$$
with strict inequality if
$$\Pr\left[x \in S : d(x; \theta) \neq d(x; \theta_0)\right] > 0.$$
Proof: From the definition of $z(\theta)$ we have that
$$z(\theta) - z(\theta_0) = E_0\left[n^{-1}\ln\frac{d(x;\theta)}{d(x;\theta_0)}\right] \leq n^{-1}\ln E_0\left[\frac{d(x;\theta)}{d(x;\theta_0)}\right] = n^{-1}\ln \int d(x;\theta)\, dx = 0,$$
where the inequality is due to Jensen. The inequality is strict when the ratio $\frac{d(x;\theta)}{d(x;\theta_0)}$ is non-constant with probability greater than 0. $\blacksquare$
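The content of Lemma 39, that the expected average log-likelihood is maximised at the true value, can be seen numerically. The $N(\theta, 1)$ model, the value $\theta_0 = 1.5$ and the grid are illustrative assumptions:

```python
import numpy as np

# z(theta) = E_0[ln d(x; theta)] estimated by Monte Carlo for N(theta, 1);
# it should peak at the true value theta_0 (illustrative setup).
rng = np.random.default_rng(4)
theta0 = 1.5
x = rng.normal(theta0, 1.0, size=500_000)

def z(theta):
    # Average log-density of N(theta, 1) under the true distribution.
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

grid = np.linspace(-1.0, 4.0, 101)
values = np.array([z(t) for t in grid])
best = grid[values.argmax()]
print(best)   # close to theta0
```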
When the observations $x = (x_1, x_2, \ldots, x_n)$ are randomly sampled, we have that
$$\ell(\theta) = \sum_{i=1}^n z_i, \quad \text{where } z_i = \ln d(x_i; \theta),$$
and the $z_i$ random variables are independent and have the same distribution with mean $E(z_i) = z(\theta)$. Then from the Weak Law of Large Numbers we have that
$$\operatorname{plim}_{n\to\infty} n^{-1}\ell(\theta) = z(\theta).$$
However, the above is true under weaker assumptions, e.g. for dependent observations or for non-identically distributed random variables, etc. To avoid a lengthy exposition of the various cases we make the following assumption:
Assumption A3. $\Theta$ is a compact subset of $\Re^k$, and
$$\operatorname{plim}_{n\to\infty} n^{-1}\ell(\theta) = z(\theta) \quad \forall \theta \in \Theta.$$
Recall that a closed and bounded subset of $\Re^k$ is compact.
Theorem 40 Under the above assumption and if the statistical model is identified we have that
$$\hat{\theta} \xrightarrow{p} \theta_0,$$
where $\hat{\theta}$ is the MLE and $\theta_0$ the true parameter values.
Proof: Let $N$ be an open sphere with centre $\theta_0$ and radius $\varepsilon$, i.e. $N = \{\theta \in \Theta : \|\theta - \theta_0\| < \varepsilon\}$. Then the complement $N^c$ is closed and consequently $A = N^c \cap \Theta$ is closed and bounded, i.e. compact. Hence
$$\max_{\theta \in A} z(\theta)$$
exists and we can define
$$\delta = z(\theta_0) - \max_{\theta \in A} z(\theta) > 0, \tag{11.1}$$
which is positive by Lemma 39 and identification, as $\theta_0 \notin A$. Let $T_\delta \subset S$ be the event (a subset of the sample space) defined by
$$T_\delta = \left\{x \in S : \left|n^{-1}\ell(\theta) - z(\theta)\right| < \delta/2 \;\; \forall \theta \in \Theta\right\}. \tag{11.2}$$
Hence (11.2) applies for $\theta = \hat{\theta}$ as well. Hence
$$\text{for } T_\delta: \quad z(\hat{\theta}) > n^{-1}\ell(\hat{\theta}) - \delta/2.$$
Given now that $\ell(\hat{\theta}) \geq \ell(\theta_0)$ we have that
$$\text{for } T_\delta: \quad z(\hat{\theta}) > n^{-1}\ell(\theta_0) - \delta/2.$$
Furthermore, as $\theta_0 \in \Theta$, we have that
$$\text{for } T_\delta: \quad n^{-1}\ell(\theta_0) > z(\theta_0) - \delta/2,$$
from the relationship that $|x| < d \Rightarrow -d < x < d$. Adding the above two inequalities we have that
$$\text{for } T_\delta: \quad z(\hat{\theta}) > z(\theta_0) - \delta.$$
Substituting out $\delta$, employing (11.1), we get
$$z(\hat{\theta}) > \max_{\theta \in A} z(\theta) \;\Rightarrow\; \hat{\theta} \notin A \;\Rightarrow\; \hat{\theta} \notin N^c \cap \Theta \;\Rightarrow\; \hat{\theta} \in N.$$
Consequently we have shown that when $T_\delta$ occurs then $\hat{\theta} \in N$. This implies that
$$\Pr(T_\delta) \leq \Pr(\hat{\theta} \in N),$$
and taking limits, as $n \rightarrow \infty$, we have
$$\lim_{n\to\infty} \Pr(\hat{\theta} \in N) = 1,$$
by the definition of $N$ and by assumption A3. Hence, as $\varepsilon$ is any small positive number, we have
$$\forall \varepsilon > 0 \quad \lim_{n\to\infty} \Pr\left(\|\hat{\theta} - \theta_0\| < \varepsilon\right) = 1,$$
which is the definition of probability limit. $\blacksquare$
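Theorem 40 can be illustrated by simulation: for an Exponential model with rate $\theta_0$ the MLE is $1/\bar{x}$ (a standard closed form, not derived in the text), and its error shrinks as $n$ grows. The distribution, seed and sample sizes are illustrative assumptions.

```python
import numpy as np

# Consistency illustration: |theta_hat - theta_0| for the Exponential-rate
# MLE 1 / x-bar at increasing sample sizes (illustrative setup).
rng = np.random.default_rng(5)
theta0 = 2.0

errors = []
for n in (100, 10_000, 1_000_000):
    x = rng.exponential(scale=1.0 / theta0, size=n)
    errors.append(abs(1.0 / x.mean() - theta0))
print(errors)   # typically shrinks as n grows
```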
When the observations $x = (x_1, x_2, \ldots, x_n)$ are randomly sampled, we have:
$$s(\theta) = \sum_{i=1}^n \frac{\partial \ln d(x_i;\theta)}{\partial \theta} \quad \text{and} \quad H(\theta) = \sum_{i=1}^n \frac{\partial^2 \ln d(x_i;\theta)}{\partial\theta\,\partial\theta'}.$$
Given now that the observations $x_i$ are independent and have the same distribution, with density $d(x_i; \theta)$, the same is true for the vectors $\frac{\partial \ln d(x_i;\theta)}{\partial \theta}$ and the matrices $\frac{\partial^2 \ln d(x_i;\theta)}{\partial\theta\,\partial\theta'}$. Consequently we can apply a Central Limit Theorem to get:
$$n^{-1/2} s(\theta) \xrightarrow{d} N\left(0, \bar{J}(\theta)\right),$$
and from the Law of Large Numbers
$$n^{-1} H(\theta) = n^{-1}\sum_{i=1}^n \frac{\partial^2 \ln d(x_i;\theta)}{\partial\theta\,\partial\theta'} \xrightarrow{p} -\bar{J}(\theta),$$
where
$$\bar{J}(\theta) = n^{-1} J(\theta) = -n^{-1} E\left(\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta'}\right),$$
i.e., the average Information matrix. However, the two asymptotic results apply even if the observations are dependent or non-identically distributed. To avoid a lengthy exposition of the various cases we make the following assumptions. As $n \rightarrow \infty$ we have:
Assumption A4.
$$n^{-1/2} s(\theta) \xrightarrow{d} N\left(0, \bar{J}(\theta)\right)$$
and Assumption A5.
$$n^{-1} H(\theta_n) \xrightarrow{p} -\bar{J}(\theta_0) \quad \text{for any sequence } \theta_n \text{ with } \theta_n \xrightarrow{p} \theta_0.$$
We can now state the following Theorem.

Theorem 41 Under assumptions A2. and A3., the above two assumptions (A4. and A5.), and identification we have:
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \bar{J}^{-1}(\theta_0)\right),$$
where $\hat{\theta}$ is the MLE and $\theta_0$ the true parameter values.
Proof: As $\hat{\theta}$ maximises the likelihood function, we have from the first order conditions that
$$s(\hat{\theta}) = \frac{\partial \ell(\hat{\theta})}{\partial \theta} = 0.$$
From the Mean Value Theorem, around $\theta_0$, we have that
$$0 = s(\hat{\theta}) = s(\theta_0) + H(\theta^*)\left(\hat{\theta} - \theta_0\right), \tag{11.3}$$
where $\theta^* \in \left[\hat{\theta}, \theta_0\right]$, i.e. $\|\theta^* - \theta_0\| \leq \|\hat{\theta} - \theta_0\|$.
Now from the consistency of the MLE we have that
$$\hat{\theta} = \theta_0 + o_p(1),$$
where $o_p(1)$ is a random variable that converges to 0 in probability as $n \rightarrow \infty$. As now $\theta^* \in \left[\hat{\theta}, \theta_0\right]$, we have that
$$\theta^* = \theta_0 + o_p(1)$$
as well. Hence from (11.3) we have that
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) = -\left[n^{-1} H(\theta^*)\right]^{-1} n^{-1/2} s(\theta_0).$$
As now $\theta^* = \theta_0 + o_p(1)$, under the second assumption (A5.) we have that
$$n^{-1} H(\theta^*) \xrightarrow{p} -\bar{J}(\theta_0).$$
Under the first assumption (A4.) the above equation then implies that
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N\left(0, \bar{J}^{-1}(\theta_0)\right). \;\blacksquare$$
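A simulation sketch of the asymptotic normality result, again in an illustrative Exponential($\theta_0$) model where the MLE is $1/\bar{x}$ and the average information is $\bar{J}(\theta) = 1/\theta^2$ (standard results, not derived in the text): the statistic $\sqrt{n}(\hat{\theta} - \theta_0)$ should be roughly centred at zero with variance near $\bar{J}^{-1}(\theta_0) = \theta_0^2$.

```python
import numpy as np

# sqrt(n) (theta_hat - theta_0) for the Exponential-rate MLE should have
# variance near theta_0^2 = Jbar^{-1}(theta_0) (illustrative settings).
rng = np.random.default_rng(6)
theta0, n, reps = 2.0, 500, 20_000

x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
stat = np.sqrt(n) * (1.0 / x.mean(axis=1) - theta0)
print(stat.mean(), stat.var())   # roughly 0 and theta_0**2 = 4
```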
Example: Let $y_t \sim N(\mu, \sigma^2)$ i.i.d. for $t = 1, \ldots, T$. Then
$$\ell(\theta) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^T (y_t - \mu)^2,$$
where $\theta = \begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix}$. Now
$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^T (y_t - \mu)$$
and
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^T (y_t - \mu)^2.$$
Now the first order conditions
$$\frac{\partial \ell(\hat{\theta})}{\partial \mu} = 0 \quad \text{and} \quad \frac{\partial \ell(\hat{\theta})}{\partial \sigma^2} = 0$$
give
$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^T y_t \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^T (y_t - \hat{\mu})^2,$$
and consequently, evaluating $H(\theta)$ at $\hat{\theta}$, we have
$$H(\hat{\theta}) = \begin{pmatrix} -\frac{T}{\hat{\sigma}^2} & 0 \\ 0 & -\frac{T}{2\hat{\sigma}^4} \end{pmatrix},$$
which is clearly negative definite. Now the Information matrix is:
$$J(\theta) = -E\left(H(\theta)\right) = E\begin{pmatrix} \frac{T}{\sigma^2} & \frac{1}{\sigma^4}\sum_{t=1}^T (y_t - \mu) \\ \frac{1}{\sigma^4}\sum_{t=1}^T (y_t - \mu) & -\frac{T}{2\sigma^4} + \frac{1}{\sigma^6}\sum_{t=1}^T (y_t - \mu)^2 \end{pmatrix} = \begin{pmatrix} \frac{T}{\sigma^2} & 0 \\ 0 & \frac{T}{2\sigma^4} \end{pmatrix}.$$
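The example can be verified numerically. The sketch below simulates an illustrative $N(\mu, \sigma^2)$ sample (the data, seed and parameter values are assumptions, not from the text), computes $\hat{\mu}$, $\hat{\sigma}^2$ and the Hessian at $\hat{\theta}$, and confirms that it is negative definite with the diagonal form derived above.

```python
import numpy as np

# Evaluate the Hessian of the normal log-likelihood at the MLE and check
# it equals diag(-T/sig2_hat, -T/(2 sig2_hat^2)) (illustrative data).
rng = np.random.default_rng(7)
T, mu, sig2 = 1_000, 1.0, 4.0
y = rng.normal(mu, np.sqrt(sig2), size=T)

mu_hat = y.mean()
sig2_hat = ((y - mu_hat) ** 2).mean()

off_diag = -(y - mu_hat).sum() / sig2_hat ** 2   # zero at the MLE
H_hat = np.array([
    [-T / sig2_hat, off_diag],
    [off_diag,
     T / (2 * sig2_hat ** 2) - ((y - mu_hat) ** 2).sum() / sig2_hat ** 3],
])
eigs = np.linalg.eigvalsh(H_hat)
print(H_hat)
print(eigs)   # both negative: H(theta_hat) is negative definite
```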