Statistical models for small area estimation


The 3rd International Conference on Mathematics and Statistics (ICoMS-3)
Institut Pertanian Bogor, Indonesia, 5-6 August 2008

STATISTICAL MODELS FOR SMALL AREA ESTIMATION

¹Khairil A Notodiputro, ²Anang Kurnia, and ³Kusman Sadik

¹,²,³Department of Statistics, Bogor Agricultural University, Jl. Meranti, Wing 22 Level 4
Kampus IPB Darmaga, Bogor – Indonesia 16680
e-mail: ¹khairiln@bima.ipb.ac.id, ²anangk@ipb.ac.id, ³kusmansadik@yahoo.com

Abstract. Small Area Estimation (SAE) is a statistical technique for estimating, with adequate
precision, the parameters of sub-populations for which only small samples are available. The
development of this technique is important because of the increasing need for statistics on small
domains, such as districts or villages. Some SAE techniques have been developed in Canada, the USA,
and the EU based on real data. We adapted these techniques to produce small area statistics in
Indonesia based on national data collected by Badan Pusat Statistik. In this paper we
propose a class of generalized additive mixed models to improve the modeling of auxiliary data in
small area estimation. Moreover, since some surveys are carried out periodically, the
estimation can be improved by incorporating both area and time random effects, so we also
propose a state space model that accounts for the two random effects.
Keywords: small area estimation, generalized additive mixed model, block diagonal
covariance, Kalman filter, state space model

1. Introduction
Small Area Estimation (SAE) is an important concept in survey sampling, especially for indirect
parameter estimation from relatively small samples. This method can be used to estimate the parameters
of a sub-population (a domain that is smaller than the population). Direct estimation for a
sub-population fails to provide enough precision because the sample size is small.
Higher precision in small area estimation may be obtained by linking information from a
particular area with that from other areas through an appropriate model. This procedure, which
involves data from other domains, is called indirect estimation. In other words, a small area
estimation model borrows strength from sample observations of related areas through auxiliary data
(recent census and current administrative records) to increase the effective sample size
(Rao, 2003).
In this paper we discuss small area estimation through the indirect, or model-based, method.
One of the problems found in using this procedure is the low precision of the linear model used for
the auxiliary data. We therefore propose a class of generalized additive mixed models to improve the
modeling of auxiliary data in small area estimation. Moreover, since some surveys are carried out
periodically, the estimation can be improved by incorporating both area and time random
effects, so we also propose a state space model that accounts for the two random effects.

2. Brief review of related topics
2.1 Small area estimation based on linear mixed model
There are essentially two types of models in small area estimation. The first is the area level model,
which relates the small area direct estimator to area-specific auxiliary data xi = (x1i, x2i, …, xpi).
We assume that the parameter of interest satisfies θi = xi’β + υi, where υi ~ N(0, A), and that the
direct estimator satisfies θ̂i = θi + ei, where ei|θi ~ N(0, Di) with Di known. Combining the two gives
θ̂i = xi’β + υi + ei, which is a special case of the generalized linear mixed model. The second is the
unit level model. In this model the information is available at the sampling unit level and modeling
is based on individual data xij = (x1ij, x2ij, …, xpij), which gives the more complex model
yij = xij’β + υi + eij.
We consider the following Fay-Herriot model (see Fay and Herriot, 1979) as the basic area level
model:
yi = xi’β + υi + ei
where υi and ei are independent with υi ~ N(0, A) and ei ~ N(0, Di) for i = 1, 2, ..., k. We assume that
β and A are unknown but that Di (i = 1, 2, ..., k) are known.
The best predictor (BP) of θi = xi’β + υi when β and A are known is given by

$$\hat\theta_i^{BP} = \hat\theta_i(y_i \mid \beta, A) = x_i'\beta + (1 - B_i)(y_i - x_i'\beta)$$

where Bi = Di/(A + Di) for i = 1, 2, ..., k. The predictor θ̂i^BP = θ̂i(yi|β, A) is also the Bayes
estimator of θi under the following Bayesian model:
(i) yi|θi ~ N(θi, Di)
(ii) θi ~ N(xi’β, A) is the prior distribution for θi, i = 1, 2, ..., k.
The Bayes estimator is obtained from the posterior distribution

$$(\theta_i \mid y_i, \beta, A) \sim N\!\left(\left(\frac{1}{D_i} + \frac{1}{A}\right)^{-1}\left(\frac{y_i}{D_i} + \frac{x_i'\beta}{A}\right),\ \left(\frac{1}{D_i} + \frac{1}{A}\right)^{-1}\right) = N\!\left(x_i'\beta + \frac{A}{A + D_i}(y_i - x_i'\beta),\ \frac{A D_i}{A + D_i}\right)$$

It follows that

$$\hat\theta_i^{EB} = E(\theta_i \mid y_i, \beta, A) = x_i'\beta + (1 - B_i)(y_i - x_i'\beta)$$

where

$$MSE(\hat\theta_i^{EB}) = Var(\theta_i \mid y_i, \beta, A) = \frac{A D_i}{A + D_i} = (1 - B_i) D_i = g_{1i}(A).$$

The estimator θ̂i^BP is therefore equivalent to θ̂i^EB in the normally distributed case.
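
As a rough illustration of the shrinkage form above, the following sketch computes the best predictor xi’β + (1 – Bi)(yi - xi’β) and its posterior variance g1i(A) for known β and A. The function name `best_predictor` and the three-area toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def best_predictor(y, X, beta, A, D):
    """Best (Bayes) predictor of theta_i when beta and A are known."""
    B = D / (A + D)                  # shrinkage factor B_i = D_i / (A + D_i)
    synthetic = X @ beta             # regression part x_i' beta
    theta_bp = synthetic + (1.0 - B) * (y - synthetic)
    g1 = (1.0 - B) * D               # posterior variance A * D_i / (A + D_i)
    return theta_bp, g1

# Hypothetical toy data: three areas, an intercept and one covariate.
y = np.array([10.0, 12.0, 9.0])
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 0.2]])
beta = np.array([8.0, 3.0])
A = 1.0
D = np.array([1.0, 3.0, 0.25])

theta_bp, g1 = best_predictor(y, X, beta, A, D)
print(theta_bp)
print(g1)
```

Areas with a large sampling variance Di relative to A are pulled strongly towards the regression part, while areas with a small Di stay close to the direct estimate.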
When A is known, β can be estimated by the weighted maximum likelihood method, using the
log-likelihood (up to an additive constant)

$$\log L(\beta, V) = -\frac{1}{2}\log|V| - \frac{1}{2}(Y - X\beta)' V^{-1} (Y - X\beta)$$

where V = Diag(A + D1, A + D2, ..., A + Dk).
Let β* = β̂(A) = (X’V-1X)-1 X’V-1Y. By replacing β with β* in θ̂i^BP, we get the best linear
unbiased predictor (BLUP) of θi, given by

$$\hat\theta_i^{BLUP} = \hat\theta_i(y_i \mid A) = x_i'\beta^* + (1 - B_i)(y_i - x_i'\beta^*)$$

Ghosh and Rao (1994) give MSE(θ̂i^BLUP) = g1i(A) + g2i(A), where

$$g_{1i}(A) = \frac{A D_i}{A + D_i} = (1 - B_i) D_i$$

and

$$g_{2i}(A) = \frac{D_i^2}{(A + D_i)^2}\, x_i'(X'V^{-1}X)^{-1}x_i = B_i^2\, x_i'(X'V^{-1}X)^{-1}x_i, \quad i = 1, 2, \ldots, k.$$
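
The BLUP and its two MSE terms can be sketched numerically as follows, assuming A is known and using g2i(A) = Bi² xi’(X’V-1X)-1xi. The function name `fay_herriot_blup` and the toy data are hypothetical; this is an illustration of the formulas rather than the authors' implementation.

```python
import numpy as np

def fay_herriot_blup(y, X, A, D):
    """BLUP of theta_i under the Fay-Herriot model with known A."""
    v = A + D                                  # diagonal of V
    XtVinv = X.T / v                           # X' V^{-1}, shape (p, k)
    cov_beta = np.linalg.inv(XtVinv @ X)       # (X' V^{-1} X)^{-1}
    beta_star = cov_beta @ (XtVinv @ y)        # weighted (GLS) estimate of beta
    B = D / v                                  # shrinkage factors B_i
    fitted = X @ beta_star
    theta_blup = fitted + (1.0 - B) * (y - fitted)
    g1 = (1.0 - B) * D                         # term for known beta and A
    # x_i' (X' V^{-1} X)^{-1} x_i for every area i
    leverage = np.einsum("ij,jk,ik->i", X, cov_beta, X)
    g2 = B ** 2 * leverage                     # term due to estimating beta
    return theta_blup, beta_star, g1, g2

# Hypothetical toy data: four areas, intercept-only model, equal variances.
y = np.array([1.0, 2.0, 3.0, 6.0])
X = np.ones((4, 1))
D = np.ones(4)
theta_blup, beta_star, g1, g2 = fay_herriot_blup(y, X, A=1.0, D=D)
print(theta_blup, beta_star, g1 + g2)
```

With equal Di and an intercept only, β* is the plain mean of the direct estimates and every area is shrunk halfway towards it.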
In practice, however, both β and A are unknown. To estimate A, we can use maximum likelihood
(ML), restricted/residual maximum likelihood (REML), or the method of moments (MM). If we replace β by
β̂ and A by Â in the BLUP estimator θ̂i^BLUP, we get the empirical best linear unbiased predictor
(EBLUP)

$$\hat\theta_i^{EBLUP} = \hat\theta_i(y_i \mid \hat A) = x_i'\hat\beta + (1 - \hat B_i)(y_i - x_i'\hat\beta)$$

Defining the MSE of θ̂i^EBLUP as MSE(θ̂i^EBLUP) = E(θ̂i^EBLUP - θi)² = Var(θ̂i^EBLUP) + (Bias θ̂i^EBLUP)²,
Kackar and Harville (1984) reformulated it as

$$MSE(\hat\theta_i^{EBLUP}) = MSE(\hat\theta_i^{BLUP}) + E(\hat\theta_i^{EBLUP} - \hat\theta_i^{BLUP})^2 = H_{1i}(A) + H_{2i}(A)$$

where H1i(A) = MSE(θ̂i^BLUP) = g1i(A) + g2i(A) and H2i(A) = E(θ̂i^EBLUP - θ̂i^BLUP)². The leading term
g1i(A) = (1 – Bi)Di can be substantially smaller than Di, the MSE of the direct estimator; g2i(A) is
due to the estimation of β, and H2i(A) is due to the estimation of A.
Prasad and Rao (1990) used the Taylor series method to estimate g1i(A), g2i(A) and H2i(A). The
resulting MSE estimator of θ̂i^EBLUP is

$$MSE(\hat\theta_i^{EBLUP})_{PR} = g_{1i}(\hat A) + g_{2i}(\hat A) + 2 g_{3i}(\hat A)$$

where

$$g_{3i}(\hat A) = \frac{2 D_i^2}{k^2 (\hat A + D_i)^3} \sum_{j=1}^{k} (\hat A + D_j)^2 .$$

The estimator MSE(θ̂i)PR is identical to the Bayes risk as defined by Butar and Lahiri (2001).
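
The Prasad-Rao estimator g1i(Â) + g2i(Â) + 2 g3i(Â) can be evaluated as in the sketch below. The estimate Â is taken as an input, since the paper does not fix the estimation method; the function name `prasad_rao_mse` and the eight-area toy data are hypothetical.

```python
import numpy as np

def prasad_rao_mse(X, A_hat, D):
    """Prasad-Rao type MSE estimate of the EBLUP for each area."""
    k = len(D)
    v = A_hat + D                              # diagonal of V at A = A_hat
    B = D / v
    cov_beta = np.linalg.inv((X.T / v) @ X)    # (X' V^{-1} X)^{-1}
    leverage = np.einsum("ij,jk,ik->i", X, cov_beta, X)
    g1 = (1.0 - B) * D
    g2 = B ** 2 * leverage
    g3 = 2.0 * D ** 2 / (k ** 2 * v ** 3) * np.sum(v ** 2)
    return g1 + g2 + 2.0 * g3

# Hypothetical toy data: eight areas, intercept-only model, D_i = 1.
X = np.ones((8, 1))
D = np.ones(8)
mse_pr = prasad_rao_mse(X, A_hat=1.0, D=D)
print(mse_pr)
```

In this toy case each entry is below Di = 1, so the estimated MSE of the EBLUP is smaller than the variance of the direct estimator even after accounting for the estimation of β and A.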
2.2 Generalized additive (mixed) model
Multiple regression analysis is one of the most widely used statistical techniques. It is a powerful tool
when its assumptions are met, including the assumption that the relationships between the predictors and
the response are well described by a specified function (e.g., a straight line, a polynomial, or an
exponential). In many applications, however, such a function is hard to justify, because many phenomena
do not follow a relationship that can be easily specified.
To overcome these difficulties, Stone (1985) proposed the additive model. These
models estimate an additive approximation to the multivariate regression function. The advantages of this
approximation are at least twofold. First, since each of the individual additive terms is estimated using a
univariate smoother, the curse of dimensionality is avoided, at the cost of not being able to approximate
universally. Second, estimates of the individual terms explain how the dependent variable changes with
the corresponding independent variables.
More generally, generalized additive models (GAM) relax the assumption of a specified functional form
by replacing it with a non-parametric smoother that uncovers existing relationships. Smoothing is a
method that highlights a trend by separating it from variability due to noise. Several different
smoothers are available, the most commonly used being splines and loess. Smoothers have a parameter
that controls the closeness of the fit of the trend to the data. For details about GAM, see
Hastie and Tibshirani (1990).
GAMs are additive models because they simultaneously fit the distinct effects of each independent
variable. Each effect can be estimated using either a smoother or a specified function, which is why
GAMs are described as semiparametric. GAMs are appropriate under the assumption that there are no
interaction effects.
GAMs also offer the added flexibility of permitting non-normal error distributions. This allows
modeling of response variables with distributions such as the binomial and the Poisson. Generalized
additive mixed models (GAMM), which incorporate random effects, have also been developed recently as an
additive extension of the generalized linear mixed model (GLMM) in the spirit of Hastie and Tibshirani
(1990).
Let Y be a response random variable and X1, X2, ..., Xp be a set of predictor variables. A regression
procedure can be viewed as a method for estimating the expected value of Y given the values of
X1, X2, ..., Xp. The standard linear regression model assumes a linear form for the conditional expectation
E(Y | X1, X2, …, Xp) = β0 + β1X1 + β2X2 + … + βpXp
Given a sample, estimates of β0, β1, β2, …, βp are usually obtained by the least squares method. The
additive model generalizes the linear model by modeling the conditional expectation as
E(Y | X1, X2, …, Xp) = s0 + s1(X1) + s2(X2) + … + sp(Xp)
where si(Xi), i = 1, 2, ..., p, are smooth functions.
In order to be estimable, the smooth functions si have to satisfy standardizing conditions such as
E[sj(Xj)] = 0. These functions are not given a parametric form but instead are estimated in a
nonparametric fashion. While traditional linear models and additive models can be used in most
statistical data analyses, there are types of problems for which they are not appropriate. For example,
the normal distribution may not be adequate for modeling discrete responses such as counts or bounded
responses such as proportions.

Generalized additive models address these difficulties by extending additive models to many
distributions besides the normal. Thus, generalized additive models can be applied to a much wider
range of data analysis problems. Similar to generalized linear models, generalized additive models consist
of a random component, an additive component, and a link function relating the two components. The
response Y, the random component, is assumed to have an exponential family density. The mean µ of the
response variable is related to the set of covariates X1, X2, ..., Xp by a link function g. The quantity

$$\eta = s_0 + \sum_{i=1}^{p} s_i(X_i)$$

defines the additive component, where s1(·), ..., sp(·) are smooth functions, and the relationship between µ
and η is defined by g(µ) = η. The most commonly used link function is the canonical link, for which
η = θ, where θ is the natural parameter of the exponential family.
Furthermore, Lin and Zhang (1999) proposed generalized additive mixed models (GAMM) for
overdispersed and correlated data. They explored the generalized linear mixed model (GLMM)
representation of the smoothing spline estimators and estimated the smoothing parameters using REML.
Following Breslow and Clayton (1993), Lin and Zhang (1999) used double penalized quasi-likelihood
to estimate β, while REML was used to estimate the variance components.
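
To make the additive model E(Y | X1, …, Xp) = s0 + s1(X1) + … + sp(Xp) of this section more concrete, the sketch below fits it by backfitting with a Gaussian kernel smoother. Backfitting is the classical fitting scheme described by Hastie and Tibshirani (1990); it is used here only as an illustrative assumption and is not the double penalized quasi-likelihood procedure of Lin and Zhang (1999). The function names, the bandwidth, and the simulated data are all hypothetical.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson smoother of r against x with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit_additive(X, y, bandwidth=0.3, n_iter=20):
    """Fit y = s0 + sum_j s_j(X_j) + error by backfitting."""
    n, p = X.shape
    s0 = y.mean()
    s = np.zeros((n, p))                  # s[:, j] holds s_j at the data points
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove s0 and every component except the j-th.
            partial = y - s0 - s.sum(axis=1) + s[:, j]
            s[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            s[:, j] -= s[:, j].mean()     # centre so that each s_j averages to 0
    return s0, s

# Hypothetical simulated data with two nonlinear effects.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1.0 + X[:, 0] ** 2 + np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.1, 200)

s0, s = backfit_additive(X, y)
fitted = s0 + s.sum(axis=1)
print(np.mean((y - fitted) ** 2))
```

Each component is re-estimated by a univariate smoother applied to partial residuals, which is how the additive structure sidesteps multivariate smoothing; the centring step plays the role of the standardizing condition E[sj(Xj)] = 0.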

3. The GAMM approach for small area estimation
Rao (2003) provided an extensive review of the most commonly used estimators, including synthetic and
composite estimators, empirical best linear unbiased predictors, and empirical Bayes and hierarchical
Bayes approaches. These estimators are based on a parametric approach. We propose a class of
nonparametric approaches, the generalized additive mixed model (GAMM). The GAMM approach has significant
advantages over its parametric counterpart for modeling auxiliary variables, and its adoption in small
area estimation is straightforward.
Consider the Fay-Herriot model for the basic area level model
yi = xi’β + υi + ei , i = 1, 2, ..., k
where β is the vector of regression coefficients, υi are the area random effects, and ei are the sampling
errors. Also assume that ei ~ (0, Di) and υi ~ (0, A) and that they are independent. Di is usually assumed
to be known; see Rao (2003).
To extend this model, we assume that yi and xi are related by a smooth function m(.). Let X be the
random vector of predictors; thus
yi = m(xi) + υi + ei , i = 1, 2, ..., k
where υi|X ~ (0, υ(xi)), ei ~ (0, Di), and ei and υi are independent. The small area mean function
θi(xi) = m(xi) + υi
is the sum of the mean m(xi) and the random effect υi. The mean function can be estimated using a
linear smoother such as smoothing splines, regression splines, or local polynomial
regression. For a detailed discussion of these methods, see Hastie and Tibshirani (1990).
If we use a kernel smoother to estimate m(xi), the best predictor of the small area mean θi can
be written as
E(θi|yi) = γi yi + (1 - γi) m̂h(xi)
where γi = υ(xi) / (υ(xi) + Di). To approximate the MSE, we substitute m̂h(xi) for xi’β in the linear
mixed model:

$$mse(\hat\theta_i) = \frac{D_i \hat\sigma_u^2}{D_i + \hat\sigma_u^2} + (1 - \hat\gamma_i)^2\, mse(\hat m_h(x_i)) + 2 D_i^2 (\hat\sigma_u^2 + D_i)^{-3}\, mse(\hat\sigma_u^2)$$
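
The predictor γi yi + (1 - γi) m̂h(xi) can be sketched with a Nadaraya-Watson kernel smoother as below. For simplicity the sketch assumes a constant area variance υ(xi) = σu² that is supplied by the user; the bandwidth h, the function names, and the simulated 32-area data are hypothetical and do not reproduce the study in this paper.

```python
import numpy as np

def nadaraya_watson(x, y, h):
    """Kernel estimate of m(x_i) at the observed points (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def kernel_sae_predictor(x, y, D, sigma2_u, h):
    """Composite predictor gamma_i * y_i + (1 - gamma_i) * m_hat(x_i)."""
    m_hat = nadaraya_watson(x, y, h)
    gamma = sigma2_u / (sigma2_u + D)     # weight on the direct estimate
    theta_hat = gamma * y + (1.0 - gamma) * m_hat
    return theta_hat, m_hat, gamma

# Hypothetical simulated data: 32 areas, quadratic mean function,
# unit variances for the area effects and the sampling errors.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 32)
D = np.full(32, 1.0)
y = x ** 2 + rng.normal(0.0, 1.0, 32) + rng.normal(0.0, 1.0, 32)

theta_hat, m_hat, gamma = kernel_sae_predictor(x, y, D, sigma2_u=1.0, h=0.3)
print(theta_hat[:5])
```

The predictor always lies between the direct estimate yi and the smoothed value m̂h(xi), and it moves towards yi as Di becomes small relative to σu².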

4. Evaluation and application of GAMM
We illustrate the GAMM approach using two data sets. The first is a hypothetical data set for 32
small areas in which υi and ei are normally distributed with mean 0 and variance 1. The variable of
interest Y is defined as a function of X2, where X is the auxiliary data. The GAMM approach gives better
predictions than the EBLUP estimator. The mean absolute relative error (MARE) of the GAMM approach
is 0.0193, compared with 0.0212 for the EBLUP estimator. Further, the relative root mean square error
(RRMSE) of the GAMM approach is 0.0289, compared with 0.0327 for the EBLUP estimator.
The second data set is real data taken from PODES 2005 and SUSENAS 2005 for Bogor
Municipality. Both data sets were collected by BPS (Statistics Indonesia). Y is the unemployment level,
expressed as the percentage of unemployed people in the working-age group of each village in Bogor
Municipality. The percentage of men (X2), the percentage of non-permanent housing (X5), the percentage of
poverty statement letters (X7), and the percentage of pre-prosperous and prosperous-1 families (X8) are
used as auxiliary variables.
Table 1. Estimates of the Unemployment Level (%) in Bogor Municipality

Code   Village           Direct    GAMM   EBLUP
1002   Pamoyanan          13.04   12.64   13.03
1005   Kertamaya           8.42    8.86    8.43
1006   Rancamaya          25.00   23.36   24.94
1009   Muarasari           1.85    1.97    1.85
1013   Batutulis           6.38    6.46    6.39
1015   Empang              3.33    3.42    3.34
1016   Cikaret             9.80    9.74    9.80
2002   Sindangrasa         1.67    1.75    1.67
2006   Sukasari            8.33    8.21    8.33
3001   Bantarjati          5.45    5.56    5.46
3002   Tegalgundil         6.90    6.98    6.90
3004   Cimahpar            3.28    3.59    3.29
3006   Cibuluh            10.53   10.91   10.53
3007   Kedunghalang        9.09    8.94    9.09
3008   Ciparigi            4.88    5.16    4.88
4002   Gudang             14.81   14.48   14.79
4004   Tegallega           2.27    2.53    2.28
4006   Sempur             10.94   10.38   10.93
4010   Kebonkelapa        12.07   12.06   12.07
5002   Pasirkuda          20.00   17.60   19.95
5003   Pasirjaya          13.51   12.91   13.49
5004   Gunungbatu         10.64   10.31   10.63
5006   Menteng            10.91   10.91   10.90
5008   Cilendek Barat     16.67   15.81   16.64
5009   Sindangbarang       6.38    6.72    6.39
5012   Situgede            4.00    4.24    4.00
5015   Curugmekar         10.42   10.25   10.41
6001   Kedungwaringin      6.38    6.33    6.39
6003   Kebonpedes          9.43    9.55    9.44
6004   Tanahsareal        11.54   10.92   11.53
6005   Kedungbadak         6.38    6.35    6.38
6007   Sukadamai          12.50   11.99   12.49
6009   Kayumanis           5.45    5.56    5.47
6011   Kencana             6.25    6.57    6.26

Figure 1. Scatter plot of auxiliary variable

Table 1 shows the estimates of the unemployment level in Bogor Municipality obtained from each method.
The RRMSEs of the direct estimator, the GAMM approach, and the EBLUP are 0.0361, 0.0326, and 0.0335,
respectively. In fact, all of the model-based estimates stay close to the direct estimates. A possible
explanation is that the variance between small areas was higher than the sampling error variance within
small areas. Nevertheless, the GAMM approach was able to reduce the influence of the auxiliary variables
that were not linearly related to the response. Figure 1 shows the scatter plots of the auxiliary
variables; X2 and X7 are not linearly related to the response of interest.
Our study shows that the generalized additive mixed model outperforms the generalized linear mixed
model used in EBLUP in at least two respects. First, the generalized additive mixed model relaxes the
assumption of linearity between the predictors and the response and avoids the problem of model
misspecification that often occurs in EBLUP. Second, by incorporating nonlinear effects, the generalized
additive mixed model helps to discover hidden patterns in the predictors and therefore improves
predictive performance.

5. State space models
Many sample surveys are repeated in time with partial replacement of the sample elements. For such
repeated surveys, considerable gain in efficiency can be achieved by borrowing strength across both small
areas and time. One such model consists of a sampling error model

$$\hat\theta_{it} = \theta_{it} + e_{it}, \quad t = 1, \ldots, T; \; i = 1, \ldots, m$$

$$\theta_{it} = z_{it}^T \beta_{it}$$

where the coefficients βit = (βit0, βit1, …, βitp)T are allowed to vary cross-sectionally and over time, and
the sampling errors eit for each area i are assumed to be serially uncorrelated with mean 0 and variance
ψit. The variation of βit over time is specified by the following model:

$$\begin{pmatrix} \beta_{itj} \\ \beta_{ij} \end{pmatrix} = T_j \begin{pmatrix} \beta_{i,t-1,j} \\ \beta_{ij} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} v_{itj}, \quad j = 0, 1, \ldots, p$$
It is a special case of the general state-space model, which may be expressed in the form

$$y_t = Z_t\alpha_t + \varepsilon_t; \quad E(\varepsilon_t) = 0, \quad E(\varepsilon_t\varepsilon_t^T) = \Sigma_t$$

$$\alpha_t = H_t\alpha_{t-1} + A\eta_t; \quad E(\eta_t) = 0, \quad E(\eta_t\eta_t^T) = \Gamma$$

where εt and ηt are uncorrelated contemporaneously and over time. The first equation is known as the
measurement equation, and the second equation is known as the transition equation. This model is a
special case of the general linear mixed model, but the state-space form permits updating of the estimates
over time, using the Kalman filter equations, and smoothing of past estimates as new data become
available, using an appropriate smoothing algorithm.
The vector αt is known as the state vector. Let α̃t-1 be the BLUP estimator of αt-1 based on all data
observed up to time (t-1), so that

$$\tilde\alpha_{t|t-1} = H_t\tilde\alpha_{t-1}$$

is the BLUP of αt at time (t-1). Further,

$$P_{t|t-1} = H_t P_{t-1} H_t^T + A\Gamma A^T$$

is the covariance matrix of the prediction errors α̃t|t-1 - αt, where

$$P_{t-1} = E(\tilde\alpha_{t-1} - \alpha_{t-1})(\tilde\alpha_{t-1} - \alpha_{t-1})^T$$

is the covariance matrix of the prediction errors at time (t-1). At time t, the predictor of αt and its
covariance matrix are updated using the new data (yt, Zt). We have

$$y_t - Z_t\tilde\alpha_{t|t-1} = Z_t(\alpha_t - \tilde\alpha_{t|t-1}) + \varepsilon_t$$

which has the linear mixed model form with y = yt - Zt α̃t|t-1, Z = Zt, v = αt - α̃t|t-1, G = Pt|t-1 and
V = Ft, where Ft = ZtPt|t-1ZtT + Σt. Therefore, the BLUP estimator ṽ = GZTV-1y reduces to

$$\tilde\alpha_t = \tilde\alpha_{t|t-1} + P_{t|t-1} Z_t^T F_t^{-1}(y_t - Z_t\tilde\alpha_{t|t-1})$$
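
One prediction and update cycle of the Kalman filter equations above can be sketched as follows. The state update follows the formulas in the text; the covariance update P_t = P_{t|t-1} - P_{t|t-1} ZtT Ft-1 Zt P_{t|t-1} is the standard Kalman filter result and is not written out in this paper. The function name `kalman_step` and the scalar toy model are hypothetical.

```python
import numpy as np

def kalman_step(alpha_prev, P_prev, y_t, Z_t, H_t, A, Gamma, Sigma_t):
    """One Kalman filter cycle for the state-space model of this section."""
    # Prediction from time t-1.
    alpha_pred = H_t @ alpha_prev
    P_pred = H_t @ P_prev @ H_t.T + A @ Gamma @ A.T
    # Update with the new data (y_t, Z_t).
    F_t = Z_t @ P_pred @ Z_t.T + Sigma_t
    gain = P_pred @ Z_t.T @ np.linalg.inv(F_t)
    alpha_new = alpha_pred + gain @ (y_t - Z_t @ alpha_pred)
    # Standard covariance update (not shown in the text above).
    P_new = P_pred - gain @ Z_t @ P_pred
    return alpha_new, P_new

# Hypothetical scalar example: random-walk state observed with noise.
one = np.array([[1.0]])
alpha_new, P_new = kalman_step(
    alpha_prev=np.array([0.0]), P_prev=one,
    y_t=np.array([3.0]), Z_t=one, H_t=one,
    A=one, Gamma=one, Sigma_t=one,
)
print(alpha_new, P_new)
```

Repeating this step for t = 1, …, T gives the filtered state estimates, which is what allows the estimates to be updated as each new survey round arrives.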

6. Application of state space models
The small area estimation model can be applied to estimate the average household expenditure per
month for each of m = 37 counties in East Java, Indonesia. We used Susenas data (National Economic
and Social Survey, BPS 2003-2005) to demonstrate the performance of the EBLUP resulting from state space
models.
Table 2. Design Based and Model Based Estimates of County Means and Estimated Standard Errors

                      Design Based          Model Based (Indirect Estimator)
                      (Direct Estimator)    EBLUP                EBLUP(state space)
County                µ̂i      s(µ̂i)        µ̂iH     s(µ̂iH)      µ̂iss    s(µ̂iss)
Pacitan               4.89    0.086         3.89    0.062        5.23    0.038
Ponorogo              5.5     0.148         5.83    0.149        5.73    0.132
Trenggalek            5.3     0.135         6.89    0.155        5.65    0.161
Tulungagung           6.78    0.229         7.06    0.215        7.05    0.172
Blitar                5.71    0.132         5.74    0.198        6.12    0.141
Kediri                5.62    0.105         7.09    0.110        6.45    0.091
Malang                5.94    0.128         6.58    0.112        5.19    0.109
Lumajang              5.07    0.119         4.75    0.118        5.74    0.081
Jember                4.65    0.090         4.96    0.126        5.28    0.113
Banyuwangi            5.98    0.142         5.55    0.124        6.15    0.131
Bondowoso             4.53    0.127         4.64    0.092        5.43    0.105
Situbondo             4.67    0.104         5.89    0.085        4.44    0.074
Probolinggo           5.54    0.154         6.07    0.184        7.34    0.186
Pasuruan              6.31    0.151         4.95    0.121        6.39    0.109
Sidoarjo              9.33    0.169         9.46    0.177        8.32    0.123
Mojokerto             6.91    0.160         6.55    0.135        8.25    0.107
Jombang               6.09    0.131         5.06    0.130        5.96    0.091
Nganjuk               5.56    0.125         4.40    0.041        4.87    0.029
Madiun                5.5     0.139         5.16    0.116        5.46    0.121
Magetan               5.52    0.161         4.84    0.145        4.16    0.132
Ngawi                 4.89    0.102         4.61    0.097        4.15    0.086
Bojonegoro            5.06    0.093         5.25    0.067        4.50    0.047
Tuban                 6.02    0.114         5.75    0.061        6.47    0.046
Lamongan              6.29    0.106         6.47    0.123        5.69    0.065
Gresik                8.49    0.186         9.07    0.167        9.01    0.198
Bangkalan             6.61    0.140         5.69    0.091        7.00    0.076
Sampang               6.32    0.158         7.20    0.150        6.85    0.182
Pamekasan             5.78    0.107         6.10    0.126        5.93    0.109
Sumenep               5.48    0.108         5.76    0.077        5.09    0.032
Kota Kediri           8.01    0.159         7.60    0.157        7.11    0.144
Kota Blitar           7.98    0.191         7.63    0.159        8.51    0.182
Kota Malang          11.14    0.298        12.63    0.273       11.61    0.225
Kota Probolinggo      9.1     0.183         7.68    0.140       10.50    0.153
Kota Pasuruan         7.75    0.149         8.09    0.085        8.41    0.072
Kota Mojokerto        9.45    0.204         9.51    0.235        9.01    0.211
Kota Madiun           8.4     0.162         8.33    0.150        7.62    0.196
Kota Surabaya        11.45    0.328        11.81    0.353       11.16    0.321
Mean                          0.149                 0.138                0.124

Table 2 shows the design based and model based estimates. The design based estimates are direct
estimates based on the sampling design. The EBLUP estimates, µ̂iH, used a small area model with area
effects (Susenas 2005 data), whereas the EBLUP(ss) estimates, µ̂iss, used a small area model with area and
time effects (Susenas 2003 to 2005 data). The estimated standard errors are denoted by s(µ̂i), s(µ̂iH), and
s(µ̂iss). It is clear from Table 2 that the mean estimated standard errors of the model based estimates
(0.138 and 0.124) are smaller than that of the design based estimates (0.149), and that the mean
estimated standard error of EBLUP(ss) is smaller than that of EBLUP.


7. Conclusion
Small area estimation can be used to increase the effective sample size and thus decrease the standard
error. The methods above showed that a gain in efficiency can be achieved by borrowing strength across
small areas as well as over time. The availability of good auxiliary data and the determination of
suitable linking models are crucial to the formation of indirect estimators.

8. Acknowledgements
This work was supported by a research grant from DGHE, Ministry of National Education, Republic of
Indonesia: Development of Small Area Estimation and Its Application for BPS’ Data, Batch IV, 2nd year
(2007).

9. References
Butar, F.B. and Lahiri, P. 2001. On measure of uncertainty of empirical Bayes small area estimator.
Journal of Statistical Planning and Inference.
Breslow, N.E. and Clayton, D.G. 1993. Approximate inference in generalized linear mixed models.
Journal of the American Statistical Association, Vol. 88, pp. 9-25.
Fahrmeir, L. and Lang, S. 2001. Bayesian inference for generalized additive mixed models based on
Markov random field priors. Applied Statistics, Vol. 50, part 2, pp. 201-220.
Fay, R.E. and Herriot, R.A. 1979. Estimates of income for small places: an application of James-Stein
procedures to census data. Journal of the American Statistical Association, Vol. 74, pp. 269-277.
Ghosh, M. and Rao, J.N.K. 1994. Small area estimation: an appraisal. Statistical Science, Vol. 9, No. 1,
pp. 55-93.
Jiang, J. 1996. REML estimation: asymptotic behavior and related topics. Annals of Statistics, Vol. 24,
pp. 255-286.
Jiang, J., Lahiri, P. and Wan, S.M. 2002. A unified jackknife theory. Annals of Statistics, Vol. 30.
Hastie, T. and Tibshirani, R. 1990. Generalized Additive Models. London: Chapman and Hall.
Lin, X. and Zhang, D. 1999. Inference in generalized additive mixed models by using smoothing splines.
Journal of the Royal Statistical Society, Series B, Vol. 61, part 2, pp. 381-400.
McCulloch, C. 1997. Maximum likelihood algorithms for generalized linear mixed models. Journal of
the American Statistical Association, Vol. 92, pp. 162-190.
Prasad, N.G.N. and Rao, J.N.K. 1990. The estimation of mean squared errors of small area
estimators. Journal of the American Statistical Association, Vol. 85, pp. 163-171.
Rao, J.N.K. 1999. Some recent advances in model-based small area estimation. Survey Methodology,
Vol. 25, No. 2, pp. 175-186.
Rao, J.N.K. 2003. Small Area Estimation. New York: John Wiley and Sons.
Rao, J.N.K. 2005. Inferential issues in small area estimation: some new developments. Statistics in
Transition, Vol. 7, No. 3, pp. 513-526.
Zeger, S.L. and Karim, M.R. 1991. Generalized linear model with random effects: a Gibbs sampling
approach. Journal of the American Statistical Association, Vol. 86, pp. 79-86.
