Effect of Remediating MCAR Data on Least Squares Estimates (LSE) of Non-Normal Data
MIRALUNA L. HERRERA
Caraga State University (CSU)

Missing Data Analysis
(Hair et al., 2007)
• What is missing data?
• What is the impact of missing data?
• How to identify missing data?
– Nature of the missing data process
– Extent of missingness
– Randomness of the missing data process
– Method of remediating missing data

Objectives
• To report the nature of missingness of the samples with varying sample sizes in terms of:
– p-value deviation of sample sizes from the MCAR missingness of the simulated data, and
– rate of missingness across varying sample sizes.
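The rate of missingness named above can be tabulated in a few lines of R. This is a minimal sketch on made-up values, not the study data; the data frame `dat` and the three summary names are assumptions introduced for illustration.

```r
# Hypothetical toy data: NA marks a missing value.
dat <- data.frame(
  age  = c(5, 12, NA, 30, 8, NA, 21, 17, 9, 14),
  days = c(3, NA, 4, 6, 2, 5, NA, 4, 3, 5)
)

# Proportion of missing values in each variable.
miss_by_var <- colMeans(is.na(dat))

# Proportion of cases (rows) with at least one missing value.
miss_by_case <- mean(!complete.cases(dat))

# Overall proportion of missing cells.
miss_overall <- mean(is.na(dat))
```

Reporting the rate per variable, per case, and overall is a design choice: the three can differ considerably when missing values fall on different rows.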

Objectives
• To compare the methods of remediating missing data in terms of the bias of the regression coefficient and standard error across varying sample sizes:
– compare correlated normal data and correlated non-normal data
– compare uncorrelated and correlated non-normal data
– compare mean substitution, expectation maximization, and multiple imputation
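The slides do not state how bias is computed, so the sketch below assumes one common definition: the percent difference between the estimate obtained after remediation and the estimate from the complete data. The function `percent_bias` and the numbers are hypothetical.

```r
# Percent bias of an estimate relative to a reference value (assumed definition).
percent_bias <- function(estimate, reference) {
  100 * (estimate - reference) / reference
}

# Hypothetical numbers for illustration only.
b_complete <- 0.50   # slope from the complete data
b_imputed  <- 0.45   # slope after remediating missing values
bias_b <- percent_bias(b_imputed, b_complete)
```

With these made-up inputs the bias is about -10%, meaning the remediated slope underestimates the reference slope. The same function could be applied to the standard error.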

The Data
• Work of Burdeos and Herrera (2011)
• 629 dengue cases recorded at the Butuan Medical Centre from June 2000 to July 2010
• Variables – age of patient and number of days confined in the hospital
• Simulated data – n = 10, 20, 30, 50, 100, with 100 runs per n, using R
• Regression coefficient and SE – using SPSS

Methodology
• Generating 20% MCAR
• Randomization of missing values
• Little's MCAR test
• Computing b and SE (in SPSS 15.0)
– correlated/uncorrelated variables
• Computing % of bias (in MS Excel)
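The first two steps, generating 20% MCAR and randomizing the missing positions, might look like the following in R. This is only a sketch: the helper `make_mcar`, the seed, and the toy data are assumptions, and the slides do not say whether the 20% applies to each variable or to the data set as a whole (the sketch uses each variable).

```r
set.seed(1)  # arbitrary seed, for reproducibility

# Hypothetical complete paired data (not the dengue records).
n <- 20
dat <- data.frame(age = round(runif(n, 1, 60)), days = rpois(n, 4) + 1)

# Set 20% of each variable to NA at positions chosen completely at random.
make_mcar <- function(x, rate = 0.20) {
  k <- round(rate * length(x))
  x[sample(length(x), k)] <- NA
  x
}
dat_mcar <- as.data.frame(lapply(dat, make_mcar))

# Check the resulting rate of missingness per variable.
colMeans(is.na(dat_mcar))
```

Because the positions come from `sample()` and do not depend on the observed or unobserved values, the resulting missingness is MCAR by construction.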

Simulating 100 MCAR data sets (n = 10, 20, 30, 50, 100)
• Data processing in R-2.10.1-win32
• Little's MCAR test in SPSS

Remediating Missing Values
• Data processing in SPSS 15.0
– Mean substitution
– Expectation maximization
– Multiple imputation
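The study carried out the remediation in SPSS 15.0. As an illustration of the simplest of the three methods, here is a rough R analogue of mean substitution followed by a least squares fit. The toy values, the helper `mean_sub`, and the choice of `days` as the response are assumptions, not taken from the slides.

```r
# Hypothetical toy data with NA for missing values.
dat <- data.frame(
  age  = c(5, 12, NA, 30, 8, NA, 21, 17, 9, 14),
  days = c(3, NA, 4, 6, 2, 5, NA, 4, 3, 5)
)

# Mean substitution: replace each NA with the observed mean of its variable.
mean_sub <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
dat_ms <- as.data.frame(lapply(dat, mean_sub))

# Least squares fit of days on age; slope b and its standard error.
fit <- lm(days ~ age, data = dat_ms)
b_se <- coef(summary(fit))["age", c("Estimate", "Std. Error")]
```

Mean substitution keeps every case but shrinks the variance of the filled variable, which is one reason its standard errors are compared against expectation maximization and multiple imputation.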

Data Processing in R-2.10.1-win32
Data Entry
Note: Enter NA for each missing value in the data set so that R can execute the commands.
> age <- ...
> days <- ...
> mat <- ...
> mat[ ,1] <- ...
> mat[ ,2] <- ...
> mat
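For readers who want a complete example of the data entry step, the sketch below uses made-up values; the actual vectors are not recoverable from the slide. It assumes, based on the object names shown, that `mat` is a two-column matrix holding `age` and `days`.

```r
# Hypothetical values; NA marks a missing observation.
age  <- c(5, 12, NA, 30, 8)
days <- c(3, NA, 4, 6, 2)

# Two-column matrix: column 1 is age, column 2 is days.
mat <- matrix(NA, nrow = length(age), ncol = 2)
mat[, 1] <- age
mat[, 2] <- days
mat
```

R treats NA as a missing-value marker, so functions such as `mean()` need `na.rm = TRUE` to skip it.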


Simulating 100 runs of n paired data (n = 10, 20, 30, 50, 100)
> # number of simulations
> nsim <- 100
> # number of values per simulation
> nval <- ...