Generalized linear mixed models of ordinal poverty response in nested area


GENERALIZED LINEAR MIXED MODELS

OF ORDINAL POVERTY RESPONSE

IN NESTED AREA

YEKTI WIDYANINGSIH

SCHOOL OF GRADUATE STUDIES BOGOR AGRICULTURAL UNIVERSITY

BOGOR 2012

THE STATEMENT OF DISSERTATION AND SOURCES OF INFORMATION

I hereby declare that the dissertation entitled "Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area" is my own work, carried out under the direction of the supervisory committee, and has not been submitted in any form to any university. Sources of information derived or quoted from the published or unpublished work of other authors are mentioned in the text and listed in the Bibliography (References) at the end of this dissertation.

Bogor, July 2012

Yekti Widyaningsih
NIM G161070011

STATEMENT CONCERNING THE DISSERTATION AND SOURCES OF INFORMATION

I hereby state that the dissertation entitled "Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area" is my own work under the direction of my supervisors and has not been submitted in any form to any university. Sources of information originating or quoted from the published or unpublished works of other authors have been mentioned in the text and listed in the References at the end of this dissertation.

Bogor, July 2012

Yekti Widyaningsih
NIM G161070011

ABSTRACT

YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.

The Generalized Linear Mixed Model in this study is a development of the Spatial Generalized Linear Mixed Model proposed by Zhang and Lin (2008). As in Zhang and Lin's model, the spatial (regional) data in this study are related to hotspot detection. Zhang and Lin used the Circle Based Scan Statistic (SS) method of Kulldorff (1997), whereas this dissertation uses the Upper Level Set Scan Statistic (ULS) hotspot detection method of Patil and Taillie (2004). The two methods are first compared through simulation on 14 performance criteria, and the comparison indicates that the ULS method performs better than SS. The ULS method is then used to detect hotspots of bad nutrition in several districts, and the results are used as a covariate in the modeling. This study focuses on the development of models for nested regional data, viewed from the proximity of the observations. According to Cressie (1993), adjacent observations tend to be more strongly correlated than distant observations. In statistical terms, the variation among individuals within a group differs from the variation among individuals from different groups. This condition must be considered in the modeling. The generalized estimating equation (GEE) is a parameter estimation method that accounts for the correlation between observations. The working correlation matrix (WCM) is an important part of the GEE estimation process. Three correlation structures are studied and implemented to determine which is most appropriate for the data. The parameter estimates of the Nested GLM and Nested GLMM, obtained from combinations of several WCMs and parameter estimation techniques, are compared.
The response variable used in the model is on an ordinal scale, which adds complexity to the modeling and is also a focus of this research, whereas the response variable in Zhang and Lin's model is a count variable with a Poisson distribution. The ordinal response is obtained by grouping the ranking produced by the ORDIT (Ordering Dually in Triangles) ranking method of Myers and Patil (2010). The model developed in this study, which involves nested spatial data, provides better results, especially when a diagonal working correlation matrix is used.

Keywords: ranking, hotspot, scan statistics, upper level set, nested generalized linear mixed model, working correlation matrix, ordinal poverty response.

ABSTRACT

YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.

The Generalized Linear Mixed Model in this research is a development of the Spatial Generalized Linear Mixed Model (Spatial GLMM) of Zhang and Lin (2008). As in the model of Zhang and Lin, the spatial data used in this research are related to the results of hotspot detection. Zhang and Lin (2008) used the Circle Based Scan Statistic (SS) hotspot detection method of Kulldorff (1997), whereas the research in this dissertation uses the Upper Level Set Scan Statistic (ULS) hotspot detection method of Patil and Taillie (2004). The application of the hotspot detection method begins with a comparison of the two methods, SS and ULS, through simulation to obtain 14 performance criteria. The simulation results lead to the conclusion that the ULS hotspot detection method is better. Hotspots of bad nutrition are then detected in several districts, and the results are used as a covariate in the modeling. This research focuses on the development of models for nested spatial data viewed from the proximity of the observations. According to Cressie (1993), adjacent observations tend to have a stronger correlation than distant observations. Statistically, it can also be said that the variation among individuals within one group differs from the variation among individuals from different groups. This condition must be taken into account in the modeling. The Generalized Estimating Equation (GEE) is a parameter estimation method that takes this condition into account. Working correlation matrices (WCM), an important part of parameter estimation with the GEE method, are discussed and applied for several correlation matrix structures to determine which WCM structure best suits the data. The parameter estimates of the Nested GLM and Nested GLMM under combinations of several WCMs and parameter estimation techniques are compared.
The response variable used in the model is an ordinal-scale response, which is a fairly complex part of the theory and is also a focus of the research, whereas the response variable used in the model of Zhang and Lin is a count variable with a Poisson distribution. The ordinal response variable is obtained by grouping the results of the ORDIT (Ordering Dually in Triangles) ranking method of Myers and Patil (2010). Through the model development in this research, modeling that involves location (spatial) data as a random factor gives better results, especially when the working correlation matrix is diagonal (independent WCM).

Keywords: ranking, hotspot, scan statistics, upper level set, nested generalized linear mixed model, working correlation matrix, ordinal poverty response.

SUMMARY

YEKTI WIDYANINGSIH. Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area. ASEP SAEFUDDIN, KHAIRIL A. NOTODIPUTRO, AJI HAMIM WIGENA.

Ranking, hotspot detection, and modeling are important techniques in almost all fields of study. They play important roles for decision makers in business, education, ecology, and socio-economics, and especially in government, where they can increase the transparency of decision making. Every country maintains policies to manage its various affairs. Because resources are limited, making the right and timely decision is both important and urgent, and these techniques are needed to support such decisions in every area. It is hoped that this dissertation can contribute ideas to the government and its ministries in decision making related to poverty reduction.

The focus of this dissertation is modeling with the Nested Generalized Linear Model (NGLM) and the Nested Generalized Linear Mixed Model (NGLMM) as an expansion of the model of Zhang and Lin (2008), which serves as a strategy to detect hotspots through parameter estimates of spatial association in a non-nested study area using a count response variable. The models in this study are a GLM and a GLMM with the hotspot detection result as an explanatory variable, applied to a nested area using a multinomial ordinal response variable. Before the modeling, two preliminary topics are studied: a ranking method and hotspot detection methods.

The ORDIT (Ordering Dually in Triangles) ranking method is studied and implemented on poverty data. This method was developed to rank many individuals based on many indicators, which is not an easy task. This study explains how to do so through mathematical concepts such as order theory, duality, and the partially ordered set (poset). Because of data limitations, the method is implemented here to order sub districts by poverty level using only two indicators: surkin (surat miskin), or poverty letters (PL), and askeskin (asuransi kesehatan untuk orang miskin), or health insurance for the poor (HIP). The observation unit of the data is the sub district (kecamatan). In this study, 1679 sub districts on Java Island are ordered by poverty using HIP and PL. As a result, 6 of the 10 most severe sub districts are in Jember district, while 5 and 3 of the 10 least severe sub districts are in Probolinggo city and Surabaya, respectively. Based on the ranking results, it can be concluded overall that the order of the three provinces from least to most severe is West Java, Central Java, and East Java.

The ranking result was then grouped into three parts based on ranking order, giving three poverty levels of sub districts: worst, moderate, and mild. Every sub district receives a grade of 1, 2, or 3, where 1 is the worst, 2 is moderate, and 3 is mild. The grouped ranking is recorded and later used as the response variable in the modeling.
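The grouping step can be sketched as follows. This is a minimal illustration only: it assumes that rank 1 denotes the most severe sub district and that the three groups are of (nearly) equal size, neither of which is stated in the text, and the function name `poverty_grades` is hypothetical.

```python
def poverty_grades(ranks):
    """Map poverty ranks to grades 1 (worst), 2 (moderate), 3 (mild).

    Assumption: rank 1 is the most severe sub district, and the ordered
    list is cut into three groups of (nearly) equal size.
    """
    n = len(ranks)
    # Indices of the sub districts, ordered from most to least severe.
    order = sorted(range(n), key=lambda i: ranks[i])
    grades = [0] * n
    for position, idx in enumerate(order):
        # Positions in the first third get grade 1, the second third 2, the rest 3.
        grades[idx] = position * 3 // n + 1
    return grades


example_grades = poverty_grades([4, 1, 6, 2, 5, 3])
```

With six sub districts, the two most severe receive grade 1, the next two grade 2, and the last two grade 3.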

Furthermore, two hotspot detection methods, Circle Based Scan Statistics (SS) and Upper Level Set Scan Statistics (ULS), are studied. The two methods are compared by simulation based on disease case data, which are assumed to follow a Poisson distribution. Under this assumption, 8 data sets were built and each was simulated 10,000 times to obtain the performance of the methods on 14 criteria. The mean and standard deviation of each criterion for each simulation and each data set are computed, and the 14 criteria are then summarized, analyzed, and compared. The comparison indicates that ULS hotspot detection is better than Circle Based Scan Statistics (SS).
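The Monte Carlo summary described above can be sketched in miniature. The sketch assumes Poisson counts with a known expected count per area and an elevated relative risk inside a true hotspot. The detector is a deliberately simple placeholder (it flags the single area with the largest observed-to-expected ratio) and is neither SS nor ULS; the single criterion, the function names, and all parameter values are illustrative assumptions rather than the settings used in the dissertation.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_criterion(expected, hotspot, relative_risk, n_rep, seed=0):
    """Return the mean and standard deviation of one performance criterion.

    The criterion is 1.0 when the flagged area lies in the true hotspot
    and 0.0 otherwise (an illustrative stand-in for the 14 criteria).
    """
    rng = random.Random(seed)
    hits = []
    for _ in range(n_rep):
        # Counts are Poisson; areas inside the hotspot have an inflated mean.
        counts = [
            poisson(e * (relative_risk if i in hotspot else 1.0), rng)
            for i, e in enumerate(expected)
        ]
        # Placeholder detector: area with the largest observed/expected ratio.
        flagged = max(range(len(expected)), key=lambda i: counts[i] / expected[i])
        hits.append(1.0 if flagged in hotspot else 0.0)
    return statistics.mean(hits), statistics.stdev(hits)


mean_hit, sd_hit = simulate_criterion([5.0] * 10, {2, 3}, 3.0, n_rep=500, seed=1)
```

In a full comparison, the placeholder detector would be replaced by each scan statistic in turn, and the same summary would be computed for every criterion and every data set.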

The research continues with the detection of bad nutrition hotspots in 8 randomly chosen districts. The result is a hotspot status for every sub district in these 8 districts: 0 means the sub district is not in a hotspot area and 1 means it is. This status is included in the modeling as a dichotomous explanatory variable to answer the question of whether a bad nutrition hotspot significantly explains the poverty level in the Nested GLM and GLMM.

Modeling starts with data preparation. Three districts from West Java, 2 districts from Central Java, and 3 districts from East Java are chosen randomly for model implementation: Kuningan, Karawang, Majalengka, Cilacap, Boyolali, Ngawi, Blitar, and Jember. The three poverty levels obtained from the ranking study are used as the ordinal response, and the bad nutrition hotspot status obtained from the ULS hotspot detection study is used as an explanatory variable. The other explanatory variables are the numbers of farmer families, schools, and health personnel; these variables were chosen based on the Bappenas Report 2011. To simplify interpretation, the values of the explanatory variables are divided into three categories (low, moderate, and mild), in accordance with several sources.

The models of Zhang and Lin are modified in two ways: (1) the model is extended to nested data (districts nested in provinces), under the assumption that the correlation among sub districts within a district is higher than the correlation among sub districts from different districts, and (2) an ordinal response variable is used. Modeling was undertaken for both the GLM and the GLMM. In the Nested GLM, the Generalized Estimating Equation (GEE) method is used for parameter estimation to handle clustered and correlated data, while in the Nested GLMM, pseudo-likelihood is used for parameter estimation. In the Nested GLMM, district is a random effect.

Several working correlation matrices (WCM) can be used with the GEE method. Three types are studied: exchangeable, unstructured, and independent. One objective of the modeling is to determine which WCM gives the best results. The poverty data were initially assumed to have an unstructured correlation pattern among sub districts within a district. However, the independent WCM gave the smallest ratio of robust to model-based standard errors, which suggests that the data have an independent correlation structure.
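The three working correlation structures, and the ratio used to compare them, can be illustrated with a small sketch. It assumes a single cluster of n sub districts and uses the standard definitions of the three structures; the function names, the correlation values, and the standard errors in the example are hypothetical and are not taken from the dissertation's results.

```python
def independent_wcm(n):
    """Independent (diagonal) structure: no within-cluster correlation."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable_wcm(n, alpha):
    """Exchangeable structure: one common correlation alpha for every pair."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def unstructured_wcm(n, alphas):
    """Unstructured: a separate correlation for each pair (j, k) with j < k."""
    return [
        [1.0 if j == k else alphas[(min(j, k), max(j, k))] for k in range(n)]
        for j in range(n)
    ]


def mean_se_ratio(robust_se, model_se):
    """Average ratio of robust to model-based standard errors (SER/SEM)."""
    ratios = [r / m for r, m in zip(robust_se, model_se)]
    return sum(ratios) / len(ratios)


R_ind = independent_wcm(3)
R_exc = exchangeable_wcm(3, 0.2)
R_uns = unstructured_wcm(3, {(0, 1): 0.1, (0, 2): 0.2, (1, 2): 0.3})
# Hypothetical standard errors for two parameters.
ratio = mean_se_ratio([0.2, 0.3], [0.1, 0.3])
```

The independent structure has no correlation parameter, the exchangeable structure has one, and the unstructured form has n(n-1)/2.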

According to the cross-classification of the grouped ORDIT ranking (worst, moderate, mild) with hotspot status, 19 sub districts in East Java are bad nutrition hotspots and are also categorized at the worst level of poverty. This is the largest count among all the combinations. Furthermore, the output of the GLM with model-based estimation and an unstructured WCM supports this finding, with a p-value of 0.018. The bad nutrition hotspot is a statistically significant contributor to the poverty level in East Java. This finding links the results of the ORDIT ranking, hotspot detection, and nested modeling.

Quoting part or all of this work without citing the source is prohibited. Quotation is permitted only for the purposes of education, research, scientific writing, report preparation, criticism, or review of an issue, and must not harm the legitimate interests of IPB.

Publishing or reproducing part or all of this work in any form without permission from IPB is prohibited.


Dissertation

Submitted to the School of Graduate Studies of Bogor Agricultural University in partial fulfillment of the requirements for the Doctorate degree in Statistics


Closed Examination (July 7, 2012)
Examiners:

1. Dr. Ir. I Gusti Putu Purnaba, DEA., Department of Mathematics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University

2. Dr. Ir. Hari Wijayanto, M.Si., Department of Statistics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University

Open Examination (July 30, 2012)
Examiners:

1. Prof. Dr. Ir. Dadang Sukandar, M.Sc., Department of Community Nutrition, Faculty of Human Ecology, Bogor Agricultural University

2. Dr. Slamet Sutomo, SE., MS.

Research Title: Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area

Student Name: Yekti Widyaningsih

NRP: G161070011

Approved as to style and content by:

Dr. Ir. Asep Saefuddin, MSc. (Chair of Committee)

Prof. Dr. Ir. Khairil A. Notodiputro, MS. (Member)

Dr. Ir. Aji Hamim Wigena, MSc. (Member)

Acknowledged,

Dr. Ir. Aji Hamim Wigena, MSc. (Head of Study Program)

Dr. Ir. Dahrul Syah, MSc. Agr. (Dean of School of Graduate Studies)

Date of defense:

Date of graduation:

ACKNOWLEDGEMENTS

Praise and thanks to God Almighty for all His grace, through which this scientific work was successfully completed. The research has been conducted since mid-2009 under the title Generalized Linear Mixed Models of Ordinal Poverty Response in Nested Area.

I would never have been able to finish my dissertation without the guidance of my committee members, help from friends, and support from my family, especially my mother and my late father.

There are many people who through their generosity and knowledge have made important contributions to this dissertation. It would be impossible to list everyone who contributed or to adequately list the extent of the contributions for those who are mentioned.

First and foremost, I am extremely grateful to my advisor, Dr. Ir. Asep Saefuddin, MSc. for his guidance and support throughout my graduate study. I especially thank him for giving me the opportunity to participate in several of his research projects which deal with many challenging statistical issues. I wish to thank my committee members Prof. Dr. Ir. Khairil Anwar Notodiputro, MS. and Dr. Ir. Aji Hamim Wigena, MSc. who let me experience the research of data simulation in the field and practical issues beyond the textbooks, and patiently corrected my writing.

I would like to express my deepest gratitude to my advisor at Pennsylvania State University, Prof. Ganapati P. Patil, for his excellent guidance, care, and patience, and for providing me with an excellent atmosphere for doing research. I would like to thank Prof. Wayne L. Myers, who let me experience research on the ranking method and also patiently corrected my writing. Many thanks to Prof. Sharad W. Joshi, who as a good friend was always willing to help and give his best suggestions. It would have been a lonely work room without communicating with him by phone. My research would not have been possible without their help.

Many thanks to the Directorate of Mendepdiknas for financial assistance through BPPS, the S3 Sandwich Program, and Hibah Pasca, and many thanks also to the Department of Statistics, Pennsylvania State University.


Many thanks to the leaders of the School of Graduate Studies, the Chairman of the Statistics Study Program, and the faculty and staff of the School of Graduate Studies, who have provided both teaching and administrative services.

Many thanks to the lecturers of the Department of Statistics, who always took the time to discuss and to provide advice and encouragement, and to the department staff for their help.

Special thanks go to Dr. Ir. I Gusti Putu Purnaba, DEA., Dr. Ir. Hari Wijayanto, M.Si., Prof. Dr. Ir. Dadang Sukandar, M.Sc., and Dr. Slamet Sutomo, SE., MS., who were willing to participate in my final defense committee at the last moment. I would also like to thank Drs. Tjiong Giok Pin, M.Si. for permission to use the map files.

I also wish to acknowledge my friends in the S2 and S3 programs, with whom I shared my joy, complaints, and laughter through these past years.

Finally, I would like to thank my parents, two elder sisters, and elder brother, who always supported and encouraged me with their best wishes.

Even though I have benefited from the help and advice of many people, there are bound to be things I have not grasped, so any remaining mistakes and omissions are my responsibility. I would be grateful for messages pointing out errors in this dissertation.

Bogor, July 2012

CURRICULUM VITAE

Yekti Widyaningsih was born on September 15, 1967 in Bandung, West Java. Her father's name is Prayuto and her mother's name is Ambarwati. Yekti is the youngest of one brother and three sisters. She graduated with a Dra. in Mathematics from the University of Indonesia in 1992 with Dra. Linggawati, M.S. as her advisor, and received a master's degree in statistics in 2002 from Bogor Agricultural University with advisors Dr. Ir. Amril Aman, M.Sc. and Dr. Ir. Hadi Sumarno. To strengthen her knowledge of statistics, she completed the Ph.D. program at the same university in 2012 with a scholarship from BPPS. The advisors of her Ph.D. dissertation were Dr. Ir. Asep Saefuddin, M.Sc., Prof. Dr. Ir. Khairil Anwar Notodiputro, MS., and Dr. Ir. Aji Hamim Wigena, MSc. Part of this dissertation was written at Pennsylvania State University (PSU) with Prof. Ganapati P. Patil as her advisor and Prof. Wayne L. Myers and Prof. Sharad Joshi as her co-advisors. The research at PSU was supported by the Directorate General of Higher Education of Indonesia (DIKTI) through the Doctoral Sandwich Program 2010. She was also involved in the Geoinformatic Research of the Penelitian Hibah Pascasarjana DIKTI. During her Ph.D. studies, she wrote several papers that were published or presented at national and international seminars. The papers are:

1. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2012). Nested Generalized Linear Mixed Model for Correlated Nested Data with Ordinal Response. Jurnal IPTEK ITS Volume 23/No.2/May 2012.

2. Yekti Widyaningsih, Wayne L. Myers, Asep Saefuddin. (2012). Sub Districts Poverty Level Determination using Ordering Dually in Triangle (ORDIT) Ranking Method, Jurnal Math Info Volume 5/No.2/July 2012.

3. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2012). Nested Generalized Linear Mixed Model with Ordinal Response: Simulation and application on Poverty Data in Java Island. AIP Conference Proceedings of The 5-th International Conference on Research and Education in Mathematics, Institut Teknologi Bandung, October 2011.

4. Yekti Widyaningsih, Asep Saefuddin, Khairil Anwar Notodiputro, Aji Hamim Wigena. (2011). Ordering Dually in Triangles (Ordit) and Hotspot Detection in Generalized Linear Model for Poverty and Infant Health in East Java. Paper in The 6-th SEAMS-GMU 2011 International Conference on Mathematics and Its Applications, Universitas Gadjah Mada, Yogyakarta, July 2011.

5. Yekti Widyaningsih. (2010). Pemodelan Bayesian untuk Pemetaan Kasus Penyakit (Bayesian Modelling for Diseases Mapping). Paper in Seminar Nasional Matematika 2010, Universitas Indonesia, 6 Februari 2010.


6. Yekti Widyaningsih and Asep Saefuddin. (2009). Contiguous Diseases Outbreak in Indonesia: The Applications on Spatial Scan Statistics Method. Paper in ICCS-X, Cairo, Egypt, December 2009.

7. Yekti Widyaningsih and Siti Nurrohmah. (2009). The Application of Spatial Scan Statistics on The Tuberculosis Hotspot Detection in Indonesia. Proceeding of IndoMs International Conference on Mathematics and Its Application (IICMA), Yogyakarta.

8. Yekti Widyaningsih and Asep Saefuddin. (2008). Health Profile 2005 and Geoinformatic of Diseases in Indonesia. Paper in the International Workshop on Digital Governance and Hotspot GeoInformatics, Jalgaon, India, March 11-24, 2008.

9. Yekti Widyaningsih and Tjiong Giok Pin. (2008). A Space-Time Scan Statistics to Detect Cluster Alarms of Dengue Mortality in Indonesia 2005. Article in the journal "Makara seri Sains" Volume 12 No.1/April 2008.

10. Yekti Widyaningsih and Asep Saefuddin. (2007). Disease Outbreak in Indonesia: The Application of Scan Statistics. Paper in the 1st International Conference on Theory and Practice of Electronic Governance. Macau Polytechnic Institute, 10-13 December 2007.

11. Yekti Widyaningsih. (2007). Model, Calculations, and Application of Spatial Scan Statistics. Paper in the International Conference on Mathematics and Its Applications, Universitas Gadjah Mada, Yogyakarta, August 2007.

12. Yekti Widyaningsih. (2007). A Space-Time Permutation Scan Statistics for Disease Outbreak Detection. Poster at 1st Joint Seminar UI-UKM 2007, Universitas Indonesia.

13. Yekti Widyaningsih, Siti Nurrohmah and Nurjanah. (2007). A Spatial Scan Statistic with Poisson Process to Detect the Outbreaks of Bird Flu In Indonesia. Poster at 1st Joint Seminar UI-UKM 2007, Universitas Indonesia.


TABLE OF CONTENTS

ABSTRACT
SUMMARY
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDIXES
LIST OF ABBREVIATIONS
GLOSSARY

1 INTRODUCTION
1.1 Background
1.2 The Purpose of Research
1.3 Research Framework
1.4 The Outline of Dissertation
1.5 Novelty

2 SUB DISTRICTS POVERTY LEVEL DETERMINATION USING ORDERING DUALLY IN TRIANGLE (ORDIT) RANKING METHOD
2.1 Introduction
2.2 Theoretical Background
2.2.1 Rating Relations/Rules for Ascribing Advantage
2.2.2 Subordination Schematic and Ordering Dually in Triangles (ORDIT)
2.2.3 Product-order Rating Regime
2.2.4 The Concepts of Askeskin or HIP and Surkin or PL
2.3 Methodology
2.4 Results and Discussion

3 COMPARISON BETWEEN CIRCLE BASED SCAN STATISTICS AND UPPER LEVEL SET SCAN STATISTICS BASED ON SIMULATION STUDY
3.1 Introduction
3.2 Theoretical Study
3.2.1 The Concept of Hotspot Detection
3.2.2 Hypothesis Testing for Comparison between SS and ULS
3.2.3 Circle-based Scan Statistics (SS) Hotspot Detection
3.2.4 Upper Level Set (ULS) Scan Statistics
3.3 The Methods
3.3.1 The Steps of Simulations
3.3.2 The Fourteen Criteria
3.4 The Results and Analysis
3.5 ULS Hotspot Detection for Bad Nutrition Case in Java Island
3.6 The Results of Bad-nutrition Hotspot Detection
3.7 Conclusion

4 NESTED GENERALIZED LINEAR MIXED MODEL FOR CORRELATED DATA
4.1 Introduction
4.2 Theoretical Background
4.2.1 Nested GLM for Ordinal Response
4.2.1.1 Data Layout
4.2.1.2 Model Specification for Nested Correlated Data
4.2.1.3 GEE for Ordinal Response Data
4.2.1.4 Working Correlation Matrices
4.2.1.5 The Algorithm for GEE Parameter Estimation
4.2.1.6 Wald Statistics
4.2.2 Nested Generalized Linear Mixed Model
4.2.2.1 Ordinal Response Model
4.2.2.2 Logistic Response Function
4.2.2.3 Wolfinger and O'Connell Approach
4.3 Methodology
4.3.1 Model Building
4.3.2 Implementation
4.4 Results and Discussion
4.4.1 Standard Error of Parameter Estimates
4.4.2 Significance (p-values)
4.5 Conclusion

5 GENERAL DISCUSSION

6 CONCLUSION AND RECOMMENDATION
6.1 Conclusion
6.2 Recommendation

LIST OF TABLES

1 Entities with 3 indicators
2 Six leading lines of the poverty dataset
3 Description of indicators HIP and PL
4 The first 6 lines of the data: identity number (id), province, district, sub district, indicators values, and sub district's ranking based on indicator
5 The first 6 lines of the result obtained by applying the ProdOrdr function to place poverty measurement rank of sub district
6 The ten most severe sub districts according to ORDIT ranking
7 The ten least severe sub districts according to ORDIT ranking
8 Poverty level of sub districts in the West, Central and East Java
9 Performance criteria comparison of ULS and SS for 5% significance
10 Performance criteria comparison of ULS and SS for 1% significance
11 General structure of data layout
12 General structure of nested data layout
13 Link function name, form, inverse of link function, and range of the predicted mean
14 The first and second derivatives of link function
15 Provinces, districts and number of sub districts
16 Data description
17 Averages of Standard Error of the Nested GLM
18 Averages of Standard Error of the Nested GLMM
19 Averages of SER/SEM
20 Percentages of true classification result of Nested GLM
21 Classification result of Nested GLMM for all WCMs
22 Sub districts in hotspot area of bad nutrition and poverty level

LIST OF FIGURES

1 Un-nested hotspot (dark color areas are the hotspots)
2 Nested hotspots in three provinces
3 Research Diagram
4 The systematic of the research activity
5 Research diagram: relation among chapters
6 X-shaped Hasse diagram of five entities labeled as A, B, C, D and E
7 Subordination schematic with plotted instance dividing a right triangle into two parts, a 'trapezoidal triplet' (of AA, SS and II) below, and a 'topping triangle' (of CCC, SS and II) above
8 The Map of Java Island with districts identity
9 Scatterplot of the indicators: HIP vs PL
10 Boxplot of the indicators, HIP and PL
11 Precedence plot (based on place ranks) of subdistricts from R commands
12 A study area with zone and non zone areas
13 A part of circle based hotspot detection process
14 A map and its adjacent matrix
15 ULS hotspot detection process (dark color is the hotspot)
16 ULS Hotspot of bad nutrition in Kuningan, Karawang, Majalengka, Temanggung, Boyolali, and Cilacap
17 ULS Hotspot of bad nutrition in Blitar, Ngawi, and Jember
18 The scheme of modeling
19 An ordered response and its latent variable
20 Changes in the value of x that cause changes in the magnitude of probability; a1, a2, a3 are the thresholds
21 The effect of a covariate on the transformed cumulative probabilities (pdf of Y for some values of x)
22 Developing of Zhang's and Lin's GLMM
23 Study Area with 3 provinces {s = 1, 2, 3}, 3 districts are randomly chosen from West and East Java {i = 1, 2, 3}, and 2 districts are randomly chosen from Central Java {i = 1, 2}. There are n_si sub districts in district i of province s
24 Districts in Java Island as sample for modeling
25 Standard errors of model based Nested GLM parameters
26 Standard errors of robust Nested GLM parameters
27 Standard error of Nested GLM parameters of exchangeable WCM
28 Standard error of Nested GLM parameters of unstructured WCM
29 Standard error of Nested GLM parameters of independent WCM
30 Standard errors of Nested GLMM parameters
31 SER/SEM ratios for Nested GLM and GLMM

LIST OF APPENDIXES

1 Fourteen criteria of poverty
2 Concept of health insurance for the poor (HIP) or askeskin and certificate of cannot afford (PL) or surkin
3 Sum of Poisson Random Variables
4 Multinomial distribution
5 Maximum Likelihood Estimation
6 Conditional simulation with hotspot z assumed known
7 Output of true hotspot1 simulation Central Java
8 Output of true hotspot2 simulation Central Java
9 Output of true hotspot1 simulation Java Island
10 Output of true hotspot2 simulation Java Island
11 Output of true hotspot1 simulation Map X
12 Output of true hotspot2 simulation Map X
13 Output of true hotspot1 simulation Map Y
14 Output of true hotspot2 simulation Map Y
15 Fourteen criteria of hotspot method for p-value = 0.05
16 Fourteen criteria of hotspot method for p-value = 0.01
17 ULS Hotspot of bad nutrition in Kuningan and Karawang
18 ULS Hotspot of bad nutrition in Majalengka and Temanggung
19 ULS Hotspot of bad nutrition in Boyolali and Cilacap
20 ULS Hotspot of bad nutrition in Blitar and Ngawi
21 ULS Hotspot of bad nutrition in Jember
22 Parameter Estimates and Standard Errors of Nested GLM
23 Significance of Nested GLM parameter estimates
24 Classification result of Nested GLM
25 Parameter Estimates and Standard Errors of Nested GLMM
26 Significance of Nested GLMM parameter estimates
27 Matrix Equation for Nested GLMM (an example)
28 Theorem of Pearson residual Moran's IPR and IaPR
29 Exchangeable Working Correlation Matrix
30 Unstructured Working Correlation Matrix
31 Spearman's rho Correlation Matrix of the data

LIST OF ABBREVIATIONS

CSHD Circle based hotspot detection
GEE Generalized estimating equation
GLM Generalized linear model
GLMM Generalized linear mixed model
ORDIT Ordering dually in triangles
SS Scan statistic
ULS Upper level set
WCM Working correlation matrix
SER Standard error of robust estimation

GLOSSARY

Askeskin: Health insurance for the poor (asuransi kesehatan untuk orang miskin).

Cluster: A grouping containing ‘lower level’ elements. For example, in a survey sample, a district (cluster) consisting of sub-districts.

Explanatory variable: An independent variable, usually denoted by X in the fixed part of the model and by Z in the random part.

Fixed part: The part of a model represented by Xβ, that is, the average relationship. The parameters β are referred to as ‘fixed parameters’.

Hotspot: Unusual phenomenon, anomalies, aberrations, outbreaks, elevated clusters, or critical areas.

Kronecker product: An operation on two matrices of arbitrary size resulting in a block matrix. For A = [a_ij], A ⊗ B = [a_ij B].

Level: A component of hierarchical data.

Nested: The clustering of units into a hierarchy (level).

Random part: The part of a model represented by u or Zu.

Regional: Area (daerah).

Response part: The part of a model represented by Y. Also known as a ‘dependent’ variable.

Spatial: Happening or existing in space.

Study area: An area of examination.

Surkin (surat miskin): Certificate of the Poor and Disadvantaged; Poverty letters; SKTM (surat keterangan tidak mampu).


Chapter 1 INTRODUCTION

1.1 Background

Nowadays, the issues of poverty are often discussed. Although statistics show that the number of poor people in Indonesia decreased from 30.02 million (12.49%) in March 2011 to 29.89 million (12.36%) in September 2011, Indonesia still faces the problem of poverty (BPS 2011).

Poverty alleviation programs require many policy decisions, especially when the government must decide which areas should receive priority for intervention. Ranking and hotspot detection techniques can support these decisions. Furthermore, modeling is important for identifying which factors are related to poverty.

Ranking, hotspot detection, and modeling are three important statistical methods used to evaluate and examine data in everyday life and in many fields of study. The data could be the number of disease cases, people in poverty, particular animals or plants related to biodiversity, environment, and ecology, and many others. As part of efforts to alleviate poverty, SMERU¹ also ranks areas in several regions of Indonesia by poverty level. Ranking in the representation of poverty can support objective decision making and increase the transparency of government decisions. Moreover, a well-defined poverty level can lend credibility to government decision making (Widyanti 2003). In addition, cases of bad nutrition currently occur in nearly all parts of Indonesia. About 4 million children in Indonesia are exposed to the risk of bad nutrition (Yurnaldi 2008). For this problem, hotspot areas need to be identified to support objective decisions in poverty reduction programs. Furthermore, poverty data need to be modeled in a way that takes into account the differences in conditions between regions and the resource constraints. Statistical models that correspond to these conditions should be able to handle the nested and random conditions.

¹ SMERU is an independent institution for research and public policy studies which professionally and proactively provides accurate and timely information, as well as objective analysis on various socioeconomic and poverty issues considered most urgent and relevant for the people of Indonesia.

Based on those thoughts and facts, this dissertation studies ranking and hotspot detection and incorporates the results of these two methods into the development of a Nested Generalized Linear Mixed Model.

Currently, ranking, hotspot detection, and modeling techniques are being developed by experts. A ranking method based on several indicators using ecological and environmental data was developed by Myers and Patil (2010). Hotspot detection methods were developed by Kulldorff (1997), Patil and Taillie (2004), and Duczmal, Tavares, Patil, and Cancado (2010), whereas modeling with fixed factors and hotspots as covariates was developed by Zhang and Lin (2008). This dissertation combines these three approaches, with a focus on developing Zhang and Lin’s model and applying it to poverty data.

The ranking method applied in this study is ORDIT (ORdering Dually In Triangles) used to rank individuals (unit observations) based on several indicators (Myers and Patil 2010). Furthermore, a comparison of the two hotspot detection methods, namely Circle-based scan statistics by Kulldorff (1997) and upper level set scan statistics by Patil and Taillie (2004) has been studied. The development and implementation of the model is based on nested GLM (Generalized Linear Model) and nested GLMM (Generalized Linear Mixed Model) using GEE (Generalized Estimating Equation) method and pseudo likelihood, respectively for parameter estimation.

GLMM is a statistical model that accommodates both fixed effects and random effects, while GLM uses only fixed effects. The distribution of the response variable is not restricted to the normal distribution but may be any distribution within the exponential family. Some GLMM principles used in the formation of spatial models are mentioned by Lawson and Clark (2002) in their discussion of the possibility of risk of non-continuity surface, and Loh and Zhu (2007) calculated the spatial correlation of the scan statistic with the GLMM spatial model in an effort to obtain more accurate analysis results. Furthermore, some researchers have begun to explore geographic and ecologic potentials to be used as explanatory variables to identify the hotspot. For example, after identifying the hotspot of a distance of two breast cancer cases, Roche et al. (2002) compared the two geographic areas, cluster and non-cluster, and found that the two tend to be isolated due to a language factor. This research suggests that identified risk factors may contribute to the observed patterns, but because the cluster detection separates control and non-control factors, it cannot be used as a basis for a statistical conclusion. Zhang and Lin (2008) then sought to improve the predictability of the model by incorporating explanatory variables and the spatial cluster detection process in a frequentist approach. In other words, Zhang and Lin combine hotspots and modeling, where explanatory variables and hotspots are observed simultaneously.

Zhang and Lin (2008) apply a spatial GLMM with the clusters (hotspot areas) of Kulldorff (1997) as explanatory variables. Through this modeling, Zhang and Lin detected the significance of hotspots appearing in the spatial data. The hotspots detected in this model are common hotspots with a single level (not nested), as presented in Figure 1. As geographical and ecological factors also contribute to identifying the hotspot, the model also pays attention to the geographic and ecological components. This research aims to analyze further what happens if the model is applied to data with a spatially nested form. In other words, we develop the model introduced by Zhang and Lin (2008) with respect to nested hotspots, as shown in Figure 2. In the spatially nested GLMM, hotspots can be treated as fixed effects, and the response variable is measured on an ordinal scale. It is also important to note that the estimation takes into account conditions in which a variable acts both as an independent variable and as a random effect.

One aspect that is often ignored in statistical modeling is the variance of the data, which can be viewed as global or local. Global variance is calculated from the variability of the data as a whole, while local variance is calculated from a group of data. Hedeker and Mermelstein (2005) used mixed-effects regression modeling with heterogeneous variances to analyze Ecological Momentary Assessment (EMA) data, with a focus on longitudinal data.

Figure 1 Un-nested hotspot (colored areas are the hotspots)

Data with different variances in different conditions or groups can always arise. For this kind of data, the existence of local variance must be considered and parameter estimation must address it. An appropriate method of parameter estimation is GEE, the generalized estimating equation (Hardin and Hilbe 2003).

The GEE parameter estimation method uses a working correlation matrix to estimate model parameters. The elements of this matrix are the correlations between observations within a cluster. If subject (district) i has n_i sub-districts, the dimension of the working correlation matrix is n_i × n_i, and the correlations are computed from the pairs of observations within each cluster, for all i. Higher correlations correspond to smaller variance within a cluster (subject). Therefore, the GEE method is used to estimate the model parameters for clustered data.
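As an illustration of the three working correlation structures compared in this study (independent, exchangeable, and unstructured), the following minimal sketch builds each matrix for a cluster of size n. The function names are hypothetical, and the unstructured estimate is a simplified moment estimate from standardized residuals of equally sized clusters; it is an assumption made for illustration, not the routine used by any particular statistical package.

```python
import numpy as np

def independent_wcm(n):
    """Independence structure: no correlation within a cluster."""
    return np.eye(n)

def exchangeable_wcm(n, alpha):
    """Exchangeable structure: one common correlation alpha for every pair."""
    return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))

def unstructured_wcm(residuals):
    """Unstructured structure: a separate correlation for every pair.

    residuals has shape (number of clusters, n) and holds standardized
    residuals. Simplified moment estimate (assumption): average the cross
    products over clusters, then rescale to a unit diagonal.
    """
    r = np.asarray(residuals, dtype=float)
    cross = r.T @ r / r.shape[0]
    scale = np.sqrt(np.diag(cross))
    return cross / np.outer(scale, scale)

# Usage with made-up residuals for 50 clusters of 4 observations each.
rng = np.random.default_rng(0)
demo_residuals = rng.normal(size=(50, 4))
R_ind = independent_wcm(4)
R_exch = exchangeable_wcm(4, 0.3)
R_unst = unstructured_wcm(demo_residuals)
```

The exchangeable structure needs only one parameter, while the unstructured form needs n(n − 1)/2, which is why the choice of structure can matter when clusters are small.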

As mentioned before, issues relating to quality levels are often found in everyday life. Ranking is important for problems related to priorities and efficiency. This study also discusses the ranking method, applies it to poverty data, and then categorizes the results into a few levels to be used as an ordinal response variable in modeling. This ordinal response represents the quality level of a subject. Similarly, an outbreak (hotspot) in a particular area is also important and interesting to study. As part of the model development, the hotspot status of a sub-district is also included as an explanatory variable in the model to determine its contribution to the poverty level. This part follows the development of Zhang and Lin’s model.

Zhang and Lin (2008) applied a spatial GLMM with clusters (hotspot areas) to data covering a vast (continental) land area, without a nested-area assumption. This research, however, analyzes spatial data in nested form by taking the local variances into account. The need for nesting can arise from local variance in the data generated by factors such as environmental conditions, history, and the cultural characteristics of individuals in the area, as described in the following four paragraphs, which differentiate among West, Central, and East Java.

According to the research “Civilization Java” by Rahardjo (2011) and the “East Java Survey of Poor Families” by Garner and Amaliah (1999), there are dissimilar characteristics among West, Central, and East Java.

The geographical characteristics make Central Java more closed than East Java. Almost all the main mountains in Central Java are located in the center of the province, and the coastline is like a thick wall that limited access to the outside world in the ancient times. In contrast, the center of civilization in East Java is much more open. Although there are several mountains in East Java, they do not form an impenetrable barrier wall from or to the coast. Two of the largest and longest rivers in Java (Brantas and Solo) can be navigated through to the interior. Industry and trading activities have occurred much earlier in Central Java and East Java. These conditions have formed the culture and characteristics of two peoples who are rather different (Rahardjo 2006).

On the other hand, according to the household research report by Hondai (2006), consumption expenditure varies considerably (significantly) from one province to another. Central Java consists mainly of rural areas, except for a few medium-sized urban areas, and is rather homogeneous compared to West Java. In that analysis, the author investigates changes in inequality in West Java as a representative of a rapidly growing industrial region and in Central Java as a representative of a rather homogeneous rural region of the country.

Furthermore, a recent Survey of Poor Families reported that very high rates of malnutrition were found in East Java, which was considerably higher than national rates and those of other provinces in Java (Gardner 1999).

The purpose of this study is to develop the GLMM of Zhang and Lin for data with an ordinal response variable, with respect to the local variance (nested conditions). Following Hardin and Hilbe, the term nested is synonymous with panel. Hardin and Hilbe (2003) state in their book Generalized Estimating Equations that observations in panel data are correlated. If the common likelihood is used for parameter estimation regardless of this condition, the estimation is incorrect and can lead to a wrong interpretation, because the variance matrix is assumed to have no correlations among observations in the panel. Therefore, it is important to consider the spatial correlation between areas in the nested condition. Many components affect the variance, for example natural resources, climate, environment, language, culture, customs, demographics, and lifestyle.

Related to the spatial modeling, Figure 3 represents the studies of spatial GLM and GLMM developed by researchers, and the position of the current study, which is nested GLMM using spatial data in this dissertation. Figure 3 provides an explanation about the Spatial Model. In general, the spatial model is divided into two parts, that is, the spatial GLM and spatial GLMM.

Spatial GLM has been developed by Schabenberger and Gotway (2005) as Fixed Effects and the Marginal Specification. Parameter estimation in these models can be handled widely based on a specification of the first two moments of the outcome, using Generalized Estimating Equations or the Quasi-likelihood of Wedderburn (1974). Schabenberger and Gotway (2005) also developed a spatial GLMM that they named the Mixed Models and the Conditional Specification, which is a GLM with the conditional approach incorporating the unobserved spatial process as random effects within the mean function. The conditional mean and variance of the outcome are modeled as a function of both fixed covariate effects and random effects deriving from the unobserved spatial process. For the spatial GLMM, Schabenberger and Gotway used the Penalized Quasi-likelihood estimation of Breslow and Clayton (1993) and the Pseudo-likelihood estimation of Wolfinger and O’Connell (1993).

Hao Zhang (2002) developed a spatial GLMM using MMSE prediction and the Metropolis-Hastings algorithm, while a GLMM for point-referenced spatial data was developed by Gamperli and Vounatsou (2004) using the Laplace approximation. Furthermore, approximate Bayesian inference in spatial GLMM was developed by Jo Eidsvik, Sara Martino and Håvard Rue (2007). Other spatial GLMMs include approximate Bayesian inference with skew normal latent variables by Hosseini (2011) and estimation of spatial patterns using GLMM by Kwak et al. (2012). As mentioned before, the spatial GLMM with clusters (hotspot areas) of Zhang and Lin (2008) is one of the spatial GLMMs in this diagram, and it is developed in this study using an ordinal response. The focus of this dissertation is shown by the light blue rectangle of the diagram in Figure 3. Nested GLM and GLMM with spatial data are developed; the GEE method is used for Nested GLM parameter estimation and pseudo-likelihood is used for Nested GLMM parameter estimation. Parameter estimates based on several working correlation matrices are compared, and the model is applied to assess the effect of several covariates on poverty level.

Figure 3 Research Diagram

Other models in the diagram are multilevel models for ordinal data by Kenett (2011) and multilevel models with ordinal outcomes applied to psychology data, with Maximum Likelihood and Penalized Quasi-likelihood as the parameter estimation methods (Bauer and Sterba 2011). An ordinal response was used in all these models.

Based on the explanation above and the literature on spatial GLM and GLMM, it appears that the majority of research over the last few years concerns parameter estimation techniques, while the research in this dissertation concerns correlation within clusters. The focus and novelty of this research is the modification of Zhang and Lin’s model for an ordinal response variable while taking into account clustered nested data. Attention to the correlation matrix addresses the clustered condition of the data.

Research in this dissertation focuses on developing GLM and GLMM for an ordinal response in spatially nested data, with a study of several working correlation matrices, shown in the blue rectangles in Figure 3. Some studies related to this research are developed and discussed in chapters 2 and 3 of this dissertation. As mentioned before, poverty issues are often discussed and poverty is still a major problem in Indonesia. In an effort to contribute ideas and thoughts, this dissertation takes poverty as the main application problem, with ranking, hotspot detection, and modeling as the methods of analysis.

Based on the statements, ideas, thoughts and explanations above, the questions in this study are:

1. How can the level of severity or poverty of sub-districts in Java be determined, and which areas are the most and the least severe in poverty?

2. Which hotspot detection method is best, and which areas are hotspots of a factor related to poverty levels?

3. How can a model be built for clustered nested data with a multinomial ordinal response, and how can the model parameters be estimated?

4. How does the working correlation matrix structure influence the estimation of model parameters for nested correlated data?

5. How do the model parameter estimates differ between Nested GLM and Nested GLMM?

1.2 The Purpose of Research

Based on the initiatives and issues described in section 1.1, the purposes of this dissertation are:

1. To determine the level of severity (poverty) of sub-districts in Java.

2. To obtain the better of the two hotspot detection methods studied and to apply it to a factor used in modeling.

3. To build a model for nested data with a multinomial ordinal response and to estimate the model parameters; furthermore, to obtain the parameter estimates of the explanatory variables in every province used in modeling.

4. To study the influence of the working correlation matrix structure on the estimation of model parameters for data with a clustered nested condition.

5. To study the differences in model parameter estimates between Nested GLM and Nested GLMM.

The purposes of this study have been achieved and are described in chapters 2, 3, and 4. The first purpose is achieved in Chapter 2 through the study of the ORDIT ranking method and its implementation on the poverty data. The second purpose is achieved in Chapter 3 through the study and comparison of two hotspot detection methods; the better method is used to detect bad nutrition hotspot areas. Finally, the third, fourth, and fifth purposes are achieved in Chapter 4, modeling and its implementation on poverty data, which uses the result of Chapter 2 as the dependent variable and the result of Chapter 3 as an independent variable.

Appendix 27 Matrix Equation for Nested GLMM (an example)

[Matrix equation: the layout did not survive extraction. The surviving fragments show a response vector with elements indexed 1111 to 3322, a design matrix of 0/1 entries together with covariates x indexed 111,1 to 332,2, and a vector u.]

Appendix 28 Theorem of Pearson residual Moran’s I_PR and I_aPR

Let z_i be the variable of interest in unit i. Moran’s I statistic is given by

[equation not recoverable from the extracted text]

where [...] ([...] if two units are adjacent, 0 otherwise), [...], [...].

THEOREM 1: Suppose that Y_i for i = 1, ..., m are independent random variables with expected value [...] and variance [...], where [...] is a vector of unknown parameters. Assume that [...] is consistently estimated by [...]. Define I_PR as the Moran’s I statistic by taking [...] in the equation above. Then, [...] as [...], where [...] represents convergence in law or convergence in distribution.

COROLLARY 1: Suppose that β and α are consistently estimated by [...] and [...] as m → ∞. Then, [...] given by equation (5) is also consistently estimated. Define I_aPR as the Moran’s I statistic by taking [...] in equation (7), where [...] is given by equation (6). Then, [...] as [...].
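Since the formulas of this appendix are not legible here, the following minimal sketch shows one common textbook form of Moran’s I with a binary adjacency matrix, applied to Pearson residuals. It is an assumption that this normalization matches the appendix; the function names, the Poisson-type variance used in the example, and the toy data are all hypothetical.

```python
import numpy as np

def morans_i(z, w):
    """Moran's I in a common textbook form (assumed, not taken from the source).

    z : values for m units
    w : m x m weight matrix, w[i, j] = 1 if units i and j are adjacent, else 0
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    d = z - z.mean()                  # deviations from the mean
    s0 = w.sum()                      # sum of all weights
    return (len(z) / s0) * (d @ w @ d) / (d @ d)

def pearson_residuals(y, mu, var):
    """Pearson residuals (y - mu) / sqrt(var) for fitted means and variances."""
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    return (y - mu) / np.sqrt(var)

# Toy example: four units on a line, each adjacent to its neighbours.
w_chain = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
i_trend = morans_i([1.0, 2.0, 3.0, 4.0], w_chain)

# Moran's I of Pearson residuals, here with variance equal to the mean.
y_obs = np.array([3.0, 5.0, 2.0, 6.0])
mu_hat = np.array([4.0, 4.0, 4.0, 4.0])
i_resid = morans_i(pearson_residuals(y_obs, mu_hat, mu_hat), w_chain)
```

Values of I near zero indicate little spatial autocorrelation in the residuals, while clearly positive values indicate that neighbouring units have similar residuals.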

In Excel files:

Appendix 31 Unstructured Working Correlation Matrix

Appendix 32 Spearman’s rho Correlation Matrix of the data

Appendix 29 Exchangeable Working Correlation Matrix

Rows and columns are indexed by sub-district ([ subdist = 1 ] to [ subdist = 16 ]) and, within each sub-district, by measurements 1 and 2 of ordb. Every displayed off-diagonal 2 × 2 block is identical:

Measurement    1       2
1             .161    .061
2             .061    .063

Dependent Variable: ordb

Model: (Threshold), prov, farm(prov), school(prov), medis(prov), ULS(prov)

a. Ridge value was added to the working correlation matrix to make it positive definite.

b. The diagonal blocks differ for each subject and are not displayed.
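The repeated block pattern of this exchangeable matrix can be sketched with a Kronecker product. This is only an illustration: the 2 × 2 block values are the ones printed in this appendix, the count of 16 sub-districts is read from the table labels, and zeros stand in for the diagonal blocks, which the source does not display.

```python
import numpy as np

# Off-diagonal 2 x 2 block (measurements 1 and 2 of ordb) as printed.
block = np.array([[0.161, 0.061],
                  [0.061, 0.063]])

n_sub = 16  # number of sub-districts, read from the table labels

# 1 where two different sub-districts are paired, 0 on the diagonal.
off_diag = np.ones((n_sub, n_sub)) - np.eye(n_sub)

# The Kronecker product repeats the block at every off-diagonal position.
# Diagonal blocks are left as zeros because they are not displayed.
R_off = np.kron(off_diag, block)
```

The result is a 32 × 32 array whose displayed part reproduces the table row by row.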

Appendix 30 Unstructured Working Correlation Matrix

Measurement: ordb. Rows are indexed by sub-district and measurement (1 or 2). Each row lists the 30 correlations with measurements 1 and 2 of the other 15 sub-districts, in sub-district order; the row's own 2 × 2 diagonal block is not displayed.

[ subdist = 1 ] 1: .175 -.019 -.441 -.279 -.382 .170 -.326 .010 -.241 -.074 -.636 .028 .026 -.372 -.270 .155 -.303 .160 -.153 -.324 .008 -.138 -.731 .528 -.310 -.223 .376 -.257 -.500 .391
[ subdist = 1 ] 2: .672 .114 .796 .165 .666 -.086 .559 1.000 .743 .931 .842 .184 .573 .607 .905 -.031 1.000 -.504 1.000 -.200 .454 .934 1.000 -.592 1.000 .041 -.766 1.000 .391 .084
[ subdist = 2 ] 1: .175 .672 .685 -.900 .328 .052 .008 1.000 -.097 1.000 -.753 .446 1.000 -.250 .653 .069 .489 .187 1.000 -.384 .261 .156 1.000 .082 .757 -.373 -.175 1.000 -.711 .933
[ subdist = 2 ] 2: -.019 .114 -.667 .551 -.411 .149 -.227 -.195 .041 -.540 .286 -.407 -.905 .365 -.357 .236 -.025 -.302 -.741 .180 -.520 1.000 -.318 -.394 .022 -.380 -.257 -.870 .410 -.620
[ subdist = 3 ] 1: -.441 .796 .685 -.667 .635 -.612 .402 .098 -.124 .794 -.027 .441 .717 -.414 .714 -.592 .208 -.050 .627 -.500 .120 .095 .641 -.186 .226 -.150 -.015 .819 -.033 .171
[ subdist = 3 ] 2: -.279 .165 -.900 .551 -.175 .356 -.092 -.167 .148 -.295 .549 -.080 -.610 .576 -.249 .321 -.037 .018 -.495 .344 -.216 .146 -.111 -.070 -.008 -.191 .035 -.642 .539 -.297
[ subdist = 4 ] 1: -.382 .666 .328 -.411 .635 -.175 .343 -.123 -.077 .685 .052 .400 .542 -.425 .625 -.581 .196 -.145 .489 -.498 .092 .176 .377 -.141 .187 -.129 .018 .604 .034 .110
[ subdist = 4 ] 2: .170 -.086 .052 .149 -.612 .356 -.453 1.000 .388 -.617 .142 -.395 -.364 .879 -.660 1.000 .171 .239 -.237 .539 .022 -.222 .162 .286 .296 -.458 -.255 -.188 .163 .182
[ subdist = 5 ] 1: -.326 .559 .008 -.227 .402 -.092 .343 -.453 .181 -.075 .417 -.010 -.055 .132 .266 -.432 -.153 .074 .324 -.390 .480 -.259 .025 -.161 -.278 .465 .017 -.046 .083 -.217
[ subdist = 5 ] 2: .010 1.000 1.000 -.195 .098 -.167 -.123 1.000 .046 1.000 -.556 .508 1.000 .579 .560 1.000 1.000 .029 .991 .404 -.399 1.000 1.000 -.120 1.000 -.986 -.406 1.000 -.351 1.000
[ subdist = 6 ] 1: -.241 .743 -.097 .041 -.124 .148 -.077 .388 .181 .046 .495 -.296 -.041 .319 -.162 .392 .472 -.306 .505 -.201 .670 -.163 -.176 .268 .334 .256 -.397 .246 .101 .114
[ subdist = 6 ] 2: -.074 .931 1.000 -.540 .794 -.295 .685 -.617 -.075 1.000 -.483 1.000 1.000 -.528 1.000 -.649 .678 -.282 1.000 -.400 -.295 1.000 1.000 -.201 .985 -.410 .186 1.000 -.364 .798
[ subdist = 7 ] 1: -.636 .842 -.753 .286 -.027 .549 .052 .142 .417 -.556 .495 -.483 -.526 .678 -.137 .132 .026 -.030 -.010 .091 .367 -.191 -.056 -.164 -.076 .379 -.182 -.397 .520 -.418
[ subdist = 7 ] 2: .028 .184 .446 -.407 .441 -.080 .400 -.395 -.010 .508 -.296 1.000 .562 -.352 .658 -.425 .169 -.034 .344 -.294 -.201 .438 .581 -.035 .287 -.336 .264 .453 -.116 .367
[ subdist = 8 ] 1: .026 .573 1.000 -.905 .717 -.610 .542 -.364 -.055 1.000 -.041 1.000 -.526 .562 .846 -.356 .713 -.322 1.000 -.588 .124 .405 .823 .182 .778 -.444 -.112 1.000 -.428 .788
[ subdist = 8 ] 2: -.372 .607 -.250 .365 -.414 .576 -.425 .879 .132 .579 .319 -.528 .678 -.352 -.489 .844 -.206 .506 -.256 .725 .090 -.117 .657 -.288 -.059 .316 -.124 -.687 .394 -.364
[ subdist = 9 ] 1: -.270 .905 .653 -.357 .714 -.249 .625 -.660 .266 .560 -.162 1.000 -.137 .658 .846 -.489 .371 -.214 .794 -.610 .003 .685 .802 -.188 .508 -.257 .062 .891 -.138 .375
[ subdist = 9 ] 2: .155 -.031 .069 .236 -.592 .321 -.581 1.000 -.432 1.000 .392 -.649 .132 -.425 -.356 .844 .191 .213 -.227 .471 .020 -.167 .167 .255 .306 -.531 -.307 -.148 .168 .154
[ subdist = 10 ] 1: -.303 1.000 .489 -.025 .208 -.037 .196 .171 -.153 1.000 .472 .678 .026 .169 .713 -.206 .371 .191 .838 -.322 .204 .629 .549 .139 1.000 -.485 -.501 1.000 -.021 .540
[ subdist = 10 ] 2: .160 -.504 .187 -.302 -.050 .018 -.145 .239 .074 .029 -.306 -.282 -.030 -.034 -.322 .506 -.214 .213 -.415 .295 -.138 -.481 .211 -.085 -.516 -.021 .308 -.527 .035 -.166
[ subdist = 11 ] 1: -.153 1.000 1.000 -.741 .627 -.495 .489 -.237 .324 .991 .505 1.000 -.010 .344 1.000 -.256 .794 -.227 .838 -.415 .760 .349 .801 .215 .826 .172 -.283 1.000 -.404 .734
[ subdist = 11 ] 2: -.324 -.200 -.384 .180 -.500 .344 -.498 .539 -.390 .404 -.201 -.400 .091 -.294 -.588 .725 -.610 .471 -.322 .295 -.599 -.127 .382 -.429 -.093 .723 -.055 -.639 .192 -.482
[ subdist = 12 ] 1: .008 .454 .261 -.520 .120 -.216 .092 .022 .480 -.399 .670 -.295 .367 -.201 .124 .090 .003 .020 .204 -.138 .760 -.599 -.408 .415 -.023 .602 -.157 .265 -.183 .223
[ subdist = 12 ] 2: -.138 .934 .156 1.000 .095 .146 .176 -.222 -.259 1.000 -.163 1.000 -.191 .438 .405 -.117 .685 -.167 .629 -.481 .349 -.127 1.000 -.461 .903 -.648 -.180 .521 .001 .223
[ subdist = 13 ] 1: -.731 1.000 1.000 -.318 .641 -.111 .377 .162 .025 1.000 -.176 1.000 -.056 .581 .823 .657 .802 .167 .549 .211 .801 .382 -.408 1.000 1.000 -.491 -.274 1.000 .028 .365
[ subdist = 13 ] 2: .528 -.592 .082 -.394 -.186 -.070 -.141 .286 -.161 -.120 .268 -.201 -.164 -.035 .182 -.288 -.188 .255 .139 -.085 .215 -.429 .415 -.461 .016 -.201 .077 .168 -.223 .469
[ subdist = 14 ] 1: -.310 1.000 .757 .022 .226 -.008 .187 .296 -.278 1.000 .334 .985 -.076 .287 .778 -.059 .508 .306 1.000 -.516 .826 -.093 -.023 .903 1.000 .016 -.470 1.000 -.045 .597
[ subdist = 14 ] 2: -.223 .041 -.373 -.380 -.150 -.191 -.129 -.458 .465 -.986 .256 -.410 .379 -.336 -.444 .316 -.257 -.531 -.485 -.021 .172 .723 .602 -.648 -.491 -.201 .098 -.622 -.308 -.459
[ subdist = 15 ] 1: .376 -.766 -.175 -.257 -.015 .035 .018 -.255 .017 -.406 -.397 .186 -.182 .264 -.112 -.124 .062 -.307 -.501 .308 -.283 -.055 -.157 -.180 -.274 .077 -.470 .098 -.113 .014
[ subdist = 15 ] 2: -.257 1.000 1.000 -.870 .819 -.642 .604 -.188 -.046 1.000 .246 1.000 -.397 .453 1.000 -.687 .891 -.148 1.000 -.527 1.000 -.639 .265 .521 1.000 .168 1.000 -.622 -.342 .862
[ subdist = 16 ] 1: -.500 .391 -.711 .410 -.033 .539 .034 .163 .083 -.351 .101 -.364 .520 -.116 -.428 .394 -.138 .168 -.021 .035 -.404 .192 -.183 .001 .028 -.223 -.045 -.308 -.113 -.342
[ subdist = 16 ] 2: .391 .084 .933 -.620 .171 -.297 .110 .182 -.217 1.000 .114 .798 -.418 .367 .788 -.364 .375 .154 .540 -.166 .734 -.482 .223 .223 .365 .469 .597 -.459 .014 .862

[ subdist = 12 ]

[ subdist = 13 ]

[ subdist = 14 ]

[ subdist = 16 ]

Dependent Variable: ordb


Spearman's rho

                            subdist1  subdist2  subdist3  subdist4  subdist5  subdist6  subdist7  subdist8  subdist9  subdist10 subdist11 subdist12 subdist13 subdist14 subdist15 subdist16
subdist1   Corr. Coeff.     1.000     .134      0.000     0.000     0.000     .380      0.000     .245      0.000     .267      .127      .653      -.267     .250      .365      .250
           Sig. (2-tailed)            .752      1.000     1.000     1.000     .353      1.000     .559      1.000     .522      .765      .079      .522      .550      .374      .550
subdist2   Corr. Coeff.     .134      1.000     .459      .793*     .757*     .429      .248      .793*     .567      .643      .753*     .538      .643      .252      .586      .193
           Sig. (2-tailed)  .752                .252      .019      .030      .289      .553      .019      .142      .086      .031      .169      .086      .546      .127      .647
subdist3   Corr. Coeff.     0.000     .459      1.000     .897**    .761*     .593      .886**    .524      .469      .723*     .664      .524      .723*     .521      .370      .465
           Sig. (2-tailed)  1.000     .252                .003      .028      .121      .003      .182      .241      .043      .072      .182      .043      .185      .367      .246
subdist4   Corr. Coeff.     0.000     .793*     .897**    1.000     .898**    .635      .731*     .760*     .613      .829*     .842**    .593      .829*     .517      .537      .435
           Sig. (2-tailed)  1.000     .019      .003                .002      .091      .039      .029      .106      .011      .009      .121      .011      .189      .170      .281
subdist5   Corr. Coeff.     0.000     .757*     .761*     .898**    1.000     .761*     .761*     .653      .449      .802*     .761*     .694      .802*     .417      .365      .417
           Sig. (2-tailed)  1.000     .030      .028      .002                .028      .028      .079      .264      .017      .028      .056      .017      .304      .374      .304
subdist6   Corr. Coeff.     .380      .429      .593      .635      .761*     1.000     .736*     .635      .352      .904**    .693      .842**    .497      .747*     .432      .690
           Sig. (2-tailed)  .353      .289      .121      .091      .028                .037      .091      .393      .002      .057      .009      .210      .033      .285      .058
subdist7   Corr. Coeff.     0.000     .248      .886**    .731*     .761*     .736*     1.000     .359      .331      .723*     .464      .524      .723*     .521      .123      .578
           Sig. (2-tailed)  1.000     .553      .003      .039      .028      .037                .383      .423      .043      .246      .182      .043      .185      .771      .134
subdist8   Corr. Coeff.     .245      .793*     .524      .760*     .653      .635      .359      1.000     .853**    .829*     .842**    .593      .567      .762*     .894**    .680
           Sig. (2-tailed)  .559      .019      .182      .029      .079      .091      .383                .007      .011      .009      .121      .142      .028      .003      .063
subdist9   Corr. Coeff.     0.000     .567      .469      .613      .449      .352      .331      .853**    1.000     .611      .524      .213      .611      .680      .775*     .762*
           Sig. (2-tailed)  1.000     .142      .241      .106      .264      .393      .423      .007                .108      .182      .612      .108      .063      .024      .028
subdist10  Corr. Coeff.     .267      .643      .723*     .829*     .802*     .904**    .723*     .829*     .611      1.000     .813*     .720*     .714*     .802*     .586      .713*
           Sig. (2-tailed)  .522      .086      .043      .011      .017      .002      .043      .011      .108                .014      .044      .047      .017      .127      .047
subdist11  Corr. Coeff.     .127      .753*     .664      .842**    .761*     .693      .464      .842**    .524      .813*     1.000     .704      .497      .662      .741*     .408
           Sig. (2-tailed)  .765      .031      .072      .009      .028      .057      .246      .009      .182      .014                .051      .210      .074      .036      .315
subdist12  Corr. Coeff.     .653      .538      .524      .593      .694      .842**    .524      .593      .213      .720*     .704      1.000     .240      .517      .537      .435
           Sig. (2-tailed)  .079      .169      .182      .121      .056      .009      .182      .121      .612      .044      .051                .567      .189      .170      .281
subdist13  Corr. Coeff.     -.267     .643      .723*     .829*     .802*     .497      .723*     .567      .611      .714*     .497      .240      1.000     .356      .195      .445
           Sig. (2-tailed)  .522      .086      .043      .011      .017      .210      .043      .142      .108      .047      .210      .567                .386      .643      .269
subdist14  Corr. Coeff.     .250      .252      .521      .517      .417      .747*     .521      .762*     .680      .802*     .662      .517      .356      1.000     .730*     .889**
           Sig. (2-tailed)  .550      .546      .185      .189      .304      .033      .185      .028      .063      .017      .074      .189      .386                .040      .003
subdist15  Corr. Coeff.     .365      .586      .370      .537      .365      .432      .123      .894**    .775*     .586      .741*     .537      .195      .730*     1.000     .609
           Sig. (2-tailed)  .374      .127      .367      .170      .374      .285      .771      .003      .024      .127      .036      .170      .643      .040                .109
subdist16  Corr. Coeff.     .250      .193      .465      .435      .417      .690      .578      .680      .762*     .713*     .408      .435      .445      .889**    .609      1.000
           Sig. (2-tailed)  .550      .647      .246      .281      .304      .058      .134      .063      .028      .047      .315      .281      .269      .003      .109

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
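
As a reading aid for the Spearman's rho table, the following minimal Python sketch shows one common way such a coefficient can be computed: each series is converted to average ranks (tied values share the mean of their rank positions) and the Pearson correlation of the two rank vectors is taken. The function names `average_ranks` and `spearman_rho` and the example scores are illustrative assumptions; they are not the software or the data used in this dissertation, and the sketch does not compute the two-tailed significance values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical ordinal scores for two subdistricts (not data from the dissertation).
scores_a = [1, 2, 2, 3, 1, 3, 2, 1]
scores_b = [1, 2, 3, 3, 1, 2, 2, 1]
print(round(spearman_rho(scores_a, scores_b), 3))
```

Ranking with averaged ties matters here because an ordinal response with few levels produces many tied values; the simplified formula based on squared rank differences is exact only when there are no ties.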