The Determinants of Birthweight: Addressing Potential Sample Selection Bias from Babies Who Are Not Weighed at Birth

Heni Wahyuni

Introduction

The empirical literature on the infant health production function has focused on endogeneity and
sample selection biases, caused by unobserved health heterogeneity and the pregnancy-resolution
decision (Liu 1998; Rous, Jewell & Brown 2004). The first bias relates to the endogeneity of
prenatal care, while the second, in existing studies, arises from a given woman's decision to
abort or continue her pregnancy. Specifically, unobservable factors that may influence a woman's
decision to proceed with the pregnancy or abort are also likely to influence her use of prenatal
care and birth outcomes, particularly birthweight. Sample selection bias relating to the decision
to abort is unlikely to be a problem in Indonesia, where abortion is socially unacceptable and only
conducted for medical reasons. There is a potential for selection bias, however, due to non-random
missing information on birthweight.
The potential sample selection bias that arises from birthweight being missing for some babies
(those not weighed at birth) is a common issue in developing countries and generally does not
occur in studies of birthweight in developed countries. If the birthweight information in the
sample is not missing at random, however, an analysis of the determinants of birthweight that
ignores unreported birthweight will be biased (Heckman 1979). This represents a possible sample
selection issue, given that the data on a key variable (birthweight) are available only for a
subset of the population, namely babies who were weighed at birth. This is often referred to as
incidental truncation (Wooldridge 2002, p. 552).

1 This paper was presented at the National Seminar of the 60th anniversary of the Faculty of
Economics and Business UGM, Balancing Indonesian Economy: Governance and Accountability, Ethics,
and Strategy toward Inclusive Growth, 19 September 2015.
2 Heni Wahyuni is a lecturer and researcher at the Faculty of Economics and Business, Universitas
Gadjah Mada. Email: [email protected]

Relatively few studies, however, have investigated the potential for sample selection bias from
unweighed babies in the relationship between prenatal care and infant health in developing
countries, including Indonesia. Two significant studies have considered this issue (Habibov & Fan
2011; Mwabu 2009). Habibov and Fan (2011) used the 73 percent of all live births in Azerbaijan
that had a reported birthweight to analyze the effect of prenatal care on birthweight. Mwabu
(2009) reports that only 17 percent of babies delivered at home and 75 percent of babies delivered
at modern facilities in Kenya have a reported birthweight. Both studies tested for potential bias
due to unweighed babies and found no evidence of selection bias in their data. One study has
examined the impact of the village midwife program on birthweight in Indonesia (Frankenberg &
Thomas 2001), but it does not take into account the selection problem that arises because some
babies are not weighed at birth.
This study uses data from the Indonesia Family Life Survey (IFLS), specifically IFLS3 and IFLS4.
The focus of the study is to test whether there is a sample selection bias in the determinants
of birthweight. Specifically, birthweight is the outcome of interest, but this outcome is
observed only if the baby was weighed at birth. The IFLS data for live births during 2002-08,
inclusively, indicate that approximately 11 percent of babies were not weighed. It is not
appropriate to eliminate unweighed babies from the sample and analyze the pregnancy outcome
(birthweight) only for the subset of mothers whose babies were weighed, unless birthweight and
whether or not the baby was weighed are independent. Otherwise, the estimates could be biased.
Furthermore, in Indonesia, the IFLS data show that babies not weighed at birth are more likely to
be born at home or in the office of a midwife with a traditional birth attendant, as well as to
low-income and less educated mothers.
Previous studies, for example Mwabu (2009), use various instruments such as money prices,
time prices, household assets and income, environmental factors (rainfall), and interaction terms
between land and mean long-term rainfall, and between cattle and mean long-term rainfall.
However, these are not available in the IFLS data.

A Modeling Framework

Heckman Sample Selection Model

In an attempt to analyze the potential selection problem from unweighed babies, I have used a
Heckman selection model. The model consists of two equations. The selection equation (1)
represents whether the baby was weighed and the outcome equation (2) relates to birthweight:

$$ z_i^* = w_i \gamma + u_i, \qquad z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases} \tag{1} $$

$$ y_i = \begin{cases} x_i \beta + \varepsilon_i & \text{if } z_i^* > 0 \\ \text{unobserved} & \text{if } z_i^* \le 0 \end{cases} \tag{2} $$

where $z_i^*$ is a latent variable measuring the propensity of a baby to be weighed at birth;
$w_i$ contains a vector of factors known to influence a baby's propensity to be weighed at birth
that includes prenatal care usage, as well as the age of the mother when pregnancy ended, per
capita expenditure, years of schooling, household characteristics (HH index), health condition
(general health and BMI), whether she smoked before or during pregnancy, whether she experienced
any complication during the pregnancy, baby-specific characteristics (singleton or multiple birth,
gender, order of pregnancy, year of birth), and the cost of the delivery; $u_i$ contains any
unmeasured factors in equation (1).

We do not actually observe $z_i^*$. All we observe is a dichotomous variable $z_i$, which is equal
to 1 if the baby was weighed and zero otherwise; however, there is only a selected (censored)
sample for the birthweight equation, $y_i$. The dependent variable in the selection equation (1)
is a dummy variable indicating whether the baby was weighed or not, and the dependent variable in
the outcome equation (2) is the baby's birthweight (in kilograms). The independent variables in
the selection equation (1) are also included, as independent variables ($x_i$), in the outcome
equation (2). In order for the model to be identified, the selection equation should also have at
least one independent variable that is not included in the outcome equation. Otherwise, the model
is identified only by functional form, and the coefficients have no structural interpretation (see
identification section).
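The disturbance terms are assumed to be normal, with correlation $\rho$:

$$ u_i \sim N(0, 1), \qquad \varepsilon_i \sim N(0, 1), \qquad \operatorname{corr}(u_i, \varepsilon_i) = \rho $$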

When $\rho \neq 0$, where $\rho$ is the correlation between the disturbances of the selection and
outcome equations, standard OLS techniques applied to the birthweight equation will yield biased
results. If $\rho = 0$, then there is no selection problem and the standard OLS model is
appropriate. It is important to test whether there is a potential selection problem, that is,
whether babies that were not weighed at birth were born to mothers whose characteristics differ
from those of mothers whose babies were weighed. The null hypothesis of no selection bias is
$H_0: \rho = 0$. The Heckman selection model is used to determine whether there is a selection
problem or not. The sample is split between babies that were or were not weighed directly after
delivery.
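The role of rho can be illustrated with a small simulation. The sketch below is not the estimation used in this paper; it assumes unit-variance disturbances, a hypothetical constant selection index of 0.5, and made-up function names. It compares the average outcome disturbance among selected ("weighed") observations with the textbook value, rho times the inverse Mills ratio, which is the term that biases OLS when rho is not zero.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills(c: float) -> float:
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def mean_selected_error(rho: float, index: float,
                        n: int = 50_000, seed: int = 0) -> float:
    """Draw (u, e) with corr(u, e) = rho and unit variances, keep the
    draws with index + u > 0 (the selected sample), and return the
    average outcome disturbance e among them."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        # Build e so that corr(u, e) = rho and Var(e) = 1.
        e = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if index + u > 0:
            kept.append(e)
    return fmean(kept)


# Hypothetical values chosen only for illustration.
for rho in (0.0, 0.8):
    simulated = mean_selected_error(rho, index=0.5)
    theory = rho * inverse_mills(0.5)
    print(f"rho={rho}: simulated={simulated:.3f}, theory={theory:.3f}")
```

With rho equal to zero the selected disturbances average out to roughly zero, so dropping unweighed babies would not shift the birthweight equation; with a nonzero rho the omitted term moves with the selection index and contaminates the OLS coefficients.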
Identification
Identification is very important in the system of equations. As explained previously, most
unweighed babies were delivered at home with midwives or traditional birth attendants. One
reason for mothers to choose to deliver the baby at home is that it may be less costly than
doing so in a modern health facility with professional birth attendants. Any variable, therefore,
that represents the cost of delivery may potentially become an identifier for whether the baby is
delivered at home or at a modern facility. This identifier serves as an exclusion restriction in
the selection equation, which models whether the baby was weighed or not. In this analysis, the
cost of delivery is included as an identifier in the selection equation and excluded from the
birthweight equation, since it is unlikely that any measure of delivery cost will influence
birthweight.
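The exclusion restriction can be stated as a simple check on the two regressor lists. The sketch below is only illustrative: the variable names are hypothetical labels, not IFLS variable names, and the list is abbreviated.

```python
# Hypothetical, abbreviated regressor names (not actual IFLS variable names).
OUTCOME_VARS = [
    "prenatal_visits", "household_index", "years_education",
    "mother_age_group", "bmi", "smoking", "male_baby", "singleton",
]

# The selection equation uses the same regressors plus the identifier.
SELECTION_VARS = OUTCOME_VARS + ["cost_of_delivery"]


def excluded_variables(selection_vars, outcome_vars):
    """Return variables that appear in the selection equation only."""
    return [v for v in selection_vars if v not in outcome_vars]


identifiers = excluded_variables(SELECTION_VARS, OUTCOME_VARS)
if not identifiers:
    raise ValueError("model would be identified by functional form only")
print("Exclusion restriction:", identifiers)
```

An empty result would mean the model relies on functional form alone for identification, which is the situation the exclusion restriction is meant to avoid.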


Data
The sample for the empirical analysis is restricted to live births for pregnancies which ended
between the years 2002 and 2008, inclusively, for ever-married women in IFLS4 (2007-08). A
pooled cross-section approach is used, but it utilizes the panel nature of the IFLS data to provide
information on a range of explanatory variables. After excluding observations with missing
responses (see footnote 3), the sample consists of 4,436 live births in which the birthweight is
observed (the baby is weighed). Including live births in which the birthweight is not observed
(the baby is not weighed), the number of observations is 5,023.
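As a quick arithmetic check on these counts, the sketch below derives the number and share of unweighed babies from the two totals reported above. The variable names are illustrative only.

```python
# Sample counts reported in the text.
weighed = 4436          # live births with observed birthweight
total = 5023            # all live births in the estimation sample

not_weighed = total - weighed
share_not_weighed = 100 * not_weighed / total

print(f"Not weighed: {not_weighed} births "
      f"({share_not_weighed:.1f} percent of the sample)")
```

The resulting share, a little under 12 percent, is in line with the "approximately 11 percent" of unweighed babies cited in the text.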

Descriptive Statistics

Table 1 presents the summary statistics for the sample used in the empirical analysis, separately
for mothers of babies that were weighed and not weighed. The average number of prenatal care
visits for mothers of babies weighed is 9.19, which is much greater than the 6.13 visits for those
unweighed. Similarly, the percentage of mothers who followed the WHO recommendation for
prenatal care visits is much higher for the weighed babies, approximately 84.49 percent,
compared with only about 54.86 percent for unweighed babies. The household index, as an
indicator of economic background, is higher for mothers of babies that were weighed than for
mothers of babies that were unweighed. The average number of years of schooling was about 12
years, or senior high school level, for mothers whose babies were weighed, but only about 7 years,
or roughly primary school level, for those whose babies were not weighed. The lower level of
maternal education relating to unweighed babies may indicate a limited knowledge about health,
which may negatively impact pregnancy outcomes. Moreover, more than 55 percent of mothers whose
babies were weighed lived in urban areas, compared to approximately 17 percent of those with
unweighed babies. The average age at the end of the pregnancy was about the same for both
groups: 27 years of age. Similarly, there was not much difference in the BMI of mothers and the
general health condition of the mothers. There were very low rates of smoking behavior and
pregnancy complications for both categories of mothers.

3 Responses coded as missing are those with an illogical answer, those for which the surveyor
could not meet the respondent, as in the case of BMI (height and weight measurement), and those
where the respondent refused to answer.

The inclusion of unweighed babies is important because not only is there a significant
percentage (approximately 11 percent) of babies not weighed in the sample, but the observed
characteristics of mothers of babies not weighed are different from those whose babies were
weighed, thus increasing the potential for bias. More specifically, these mothers are from a lower
socioeconomic background, compared to those whose babies were weighed, in terms of per
capita expenditure, years of education, and owning a television, using electricity, and having
good drinking water (the household index), and they are more likely to live in rural areas.

Table 1: Descriptive Statistics of Variables per Category of Birthweight and Definitions
(in percent, unless otherwise indicated)

                                                                             Birthweight
Variable                                                                      Not weighed     Weighed
Total number of prenatal care visits (visits)                                        6.13        9.19
WHO recommendation (note 4)                                                         54.86       84.49
Household index                                                                      3.71        5.45
Years of education (years)                                                           7.29       11.99
Age of mother (years)                                                               27.47       27.71
Body mass index (index)                                                             21.45       22.53
Healthy mother (general health condition)                                           87.56       88.89
First birth                                                                         55.54       64.29
Per capita expenditure below 25th percentile of population level                    57.92       29.13
Per capita expenditure between 25th and 50th percentile of population level         24.19       27.71
Per capita expenditure above 50th percentile of population level                    17.89       43.17
Smoking behavior                                                                     1.24        0.68
Male baby                                                                           52.47       50.97
Singleton baby                                                                      92.84       94.21
Having pregnancy complication                                                       14.31       15.89
Living in urban area                                                                17.38       55.95
Cost of delivery of the baby (rupiahs)                                         126,338.20  892,835.30
Number of observations                                                                587        4436

Source: IFLS3 (2000) and IFLS4 (2007-08).

4 WHO recommends that the minimum number of prenatal care visits during pregnancy be four, with at
least one visit in the first trimester of pregnancy, at least one in the second trimester, and at
least two in the third trimester (World Health Organization 2005).
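The WHO rule in note 4 can be written as a small indicator function. This is a sketch of the rule as stated in the note; how the indicator was actually constructed from the IFLS responses is not described here, so the function name and its per-trimester inputs are assumptions.

```python
def meets_who_recommendation(first: int, second: int, third: int) -> bool:
    """Return True when prenatal visits per trimester satisfy the rule:
    at least one visit in the first trimester, at least one in the
    second, and at least two in the third (at least four in total)."""
    return first >= 1 and second >= 1 and third >= 2


# Hypothetical visit patterns (first, second, third trimester).
for pattern in [(1, 1, 2), (0, 2, 2), (2, 1, 1), (3, 3, 4)]:
    print(pattern, meets_who_recommendation(*pattern))
```

Note that a mother with four or more visits in total can still fail the recommendation if the visits are not spread across trimesters as required.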


The delivery expense is the variable that differentiates mothers whose babies were and were not
weighed at birth. In general, the difference in delivery cost between the two groups of mothers
is large: 892,835.30 rupiahs (about A$100) for weighed babies and 126,338.20 rupiahs (about A$14)
for unweighed babies. This is a key variable that will be included in the selection equation
(selection for weighed or not weighed), but will not be included in the outcome equation for
birthweight (in kilograms). Thus, it is assumed that the cost of delivery is an instrument in the
selection equation that will not influence birthweight.

Results and Discussion

Estimation results of the Heckman model with the selection problem are provided in Table 2.
The selection equation represents whether the baby is weighed or not weighed immediately
following delivery. The estimate for rho indicates a weak correlation between the selection and
the birth outcome ($\hat{\rho} = -0.0614$). The negative estimate of rho may appear
counterintuitive; however, it is not statistically significant. The associated Wald test of
independence of equations is not statistically significant (chi2 = 0.63, p-value = 0.4289).
This suggests that the Heckman model with selection may not be appropriate in this case or, in
other words, there is no sample selection bias problem due to unweighed babies.
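The reported test statistic and p-value can be cross-checked with a few lines of arithmetic. The sketch below assumes the Wald test has one degree of freedom (a single restriction, rho equal to zero) and uses the standard error of rho reported in Table 2 (0.0775). Treating the statistic as the squared ratio of the estimate to its standard error is only an approximation, since estimation software may test a transformed version of rho.

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of
    freedom, using P(chi2_1 > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


rho_hat, se_rho = -0.0614, 0.0775   # estimate and standard error of rho

z_ratio = rho_hat / se_rho          # approximate z-statistic
approx_stat = z_ratio ** 2          # approximate Wald statistic

print(f"approximate chi2: {approx_stat:.2f}")
print(f"p-value for chi2 = 0.63: {chi2_1df_pvalue(0.63):.4f}")
```

Both numbers land close to the reported chi2 of 0.63 and p-value of 0.4289, with the small gap attributable to rounding of the published figures, so the null hypothesis of independent equations is not rejected at conventional levels.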
Although the Heckman model is not appropriate in this case, it is interesting to observe the
results of the selection equation. The estimated coefficient of money spent on delivery, as a key
variable, is significant in explaining whether the baby was weighed or not. Birthweights also are
more likely to be recorded for mothers who have more years of schooling, who are from households
with a higher household socioeconomic index, and whose per capita expenditure is between the 25th
and 50th percentiles of the population level. However, as there is no evidence of a selection
bias, the analysis can be continued on a single structural equation for birthweight, using
Ordinary Least Squares (OLS) regression.

Table 2: Heckman Selection Model for Weighed and Not Weighed Babies

Variable                                                                     Outcome                   Selection
                                                                             Coefficient (SE)          Coefficient (SE)
Total number of prenatal care visits                                         0.0117 (0.0059)**         0.1290 (0.0135)***
Total number of prenatal care visits squared                                 -0.0002 (0.0002)          -0.0034 (0.0005)***
Household index                                                              -0.0064 (0.0072)          0.1848 (0.0229)***
Years of education                                                           0.0017 (0.0022)           0.0497 (0.0075)***
Age of mother less than 25 yrs (baseline)
Age of mother between 25-34 yrs                                              0.0452 (0.0198)**         0.1166 (0.0683)*
Age of mother 35 and older                                                   0.0832 (0.0304)***        0.1391 (0.0997)
Body mass index                                                              0.0435 (0.0126)***        0.0145 (0.0638)
Body mass index squared                                                      -0.0006 (0.0002)**        0.0003 (0.0013)
Dummy if body mass index is imputed                                          -0.0810 (0.1691)          5.2062 (0.1852)***
Healthy mother (general health status)                                       -0.0098 (0.0296)          0.0573 (0.0923)
First birth                                                                  -0.0735 (0.0180)***       0.1096 (0.063)*
Per capita expenditure below 25th percentile of population level (baseline)
Per capita expenditure between 25th and 50th percentile of population level  -0.0203 (0.0242)          0.1377 (0.0722)*
Per capita expenditure above 50th percentile of population level             -0.0349 (0.0237)          0.0977 (0.0781)
Smoking behavior                                                             0.0931 (0.0948)           0.5679 (0.3416)*
Male baby                                                                    0.0838 (0.0169)***        0.0003 (0.0544)
Singleton baby                                                               0.2911 (0.0489)***        0.102 (0.1392)
Having pregnancy complication                                                0.0365 (0.0250)           -0.0456 (0.0886)
Year of baby's birth                                                         -0.0158 (0.0051)***       -0.0009 (0.0181)
Cost of delivery of the baby                                                                           0.0000015 (0.0000005)***
Constant                                                                     2.2066 (0.1719)***        -2.1115 (0.7648)***
Rho                                                                          -0.0614 (0.0775)

Wald test of independent equations: chi2 = 0.63, Probability > chi2 = 0.4289
Number of observations: 4436 (weighed), 587 (not weighed)
*** Significant at a 1 percent level; ** at a 5 percent level; * at a 10 percent level.

Additional (Sensitivity) Analysis
Variables such as years of education, BMI, general health status, household index, and per
capita expenditure have missing information. Observations with missing information were
excluded from the regression analyses. As a sensitivity analysis, the model was re-estimated
using imputed values for the missing data. The median value for continuous variables and zero
values for dummy variables were applied to replace the missing values. Dummy variables indicating
that the data were missing were included as additional explanatory variables. The results of these
sensitivity analyses were qualitatively the same as those reported. The estimate for rho indicates
a weak correlation between selection and birth outcome ($\hat{\rho} = -0.0718814$). The associated
Wald test of independence of equations is not statistically significant (p-value = 0.3297). This
indicates that the reported outcome equations are robust to missingness and that no evidence of a
sample selection problem has been found.
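The imputation rule described above (median for continuous variables, zero for dummy variables, plus a missing-data indicator) can be sketched as follows. The function name and the example values are hypothetical, and missing responses are assumed to be represented as None.

```python
from statistics import median


def impute_with_indicator(values, is_dummy=False):
    """Replace missing entries (None) with the median of the observed
    values for a continuous variable, or with zero for a dummy variable.
    Returns the filled values and a 0/1 flag marking imputed entries."""
    observed = [v for v in values if v is not None]
    fill = 0 if is_dummy else median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


# Hypothetical years-of-education responses with one missing value.
schooling, schooling_missing = impute_with_indicator([6, None, 12, 9])
print(schooling, schooling_missing)
```

Including the flag as an additional regressor lets the model absorb any systematic difference between observations with imputed and observed values, rather than treating the imputed value as if it had been reported.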

Conclusion

This study has analyzed the potential selection problem that arises when some babies are not
weighed immediately after delivery in Indonesia. In other countries, there is a sample selection
bias because some pregnancies do not end in live birth due to abortion decisions
(pregnancy-resolution bias), which are not random. In Indonesia, religious and cultural views
indicate that the pregnancy-resolution decision is less of a concern, but there is a similar
sample selection issue for unweighed babies.

In this study, I use the Heckman selection model to test whether there is a potential selection
bias from babies that are not weighed. The results show no evidence of a selection problem from
some babies not having been weighed in Indonesia. The analyses, therefore, can be continued on the
sub-sample of live-birth babies with reported birthweight.

One limitation has been noted in the analysis. The cost of delivery here is reported by the women.
It is difficult to use the cost of delivery that is measured at the community level because the
data on the community level is based on the sample of communities from IFLS 1993 (IFLS1). However,
I use the data from IFLS 2007-08. There are many missing community data for new respondents (not
panel respondents) in IFLS 2007. Therefore, it is difficult to use any variables from the
community module and so the analysis uses the information of cost of delivery that is reported by
individual mothers.

References
Frankenberg, E. & Thomas, D. 2001, 'Women's health and pregnancy outcomes: do services
make a difference?', Demography, vol. 38, no. 2, pp. 253-65.
Habibov, N.N. & Fan, L. 2011, 'Does prenatal healthcare improve child birthweight outcomes in
Azerbaijan? Results of the national demographic and health survey', Economics and
Human Biology, vol. 9, no. 1, pp. 56-65.
Heckman, J.J. 1979, 'Sample selection bias as a specification error', Econometrica, vol. 47, no.
1, pp. 153-61.
Liu, G.G. 1998, 'Birth outcomes and the effectiveness of prenatal care', Health Services
Research, vol. 32, no. 6, pp. 805-23.
Mwabu, G. 2009, 'The production of child health in Kenya: a structural model of birth weight',
Journal of African Economies, vol. 18, no. 2, pp. 212-60.
Rous, J.J., Jewell, R.T. & Brown, R.W. 2004, 'The effect of prenatal care on birthweight: a
full-information maximum likelihood approach', Health Economics, vol. 13, no. 3, pp. 251-64.
Wooldridge, J.M. 2002, Econometric analysis of cross section and panel data, The MIT Press,
Cambridge, Massachusetts, London.
World Health Organization 2005, The World Health Report 2005: Make every mother and child
count, World Health Organization, Geneva.
