
The technologies mentioned earlier in this section were listed in order of the amount of manual, heuristic knowledge inherent in each stage. Ideally, tasks are pushed down to where development is automatic, the structure in the data is used to extract domain boundaries, and the information in the data is used to extract the interaction terms. Again, however, the process can be thwarted by small sample sizes and poor signal-to-noise ratios.

7.4. Model development

Once the best performance predictors have been identified, the next step is the development of the nonlinear model, and a considerable portion of this paper is devoted to that topic. Related issues that need to be reconciled are the advantages and disadvantages of the linear and nonlinear paradigms, and the reasons for taking on the complexities of trying to extract nonlinearities.

7.5. Benchmarking and validation

The final step in the model development process is benchmarking and model validation. The latter is part of comparative performance testing and is done iteratively during model development to verify that if the approach adds complexity, it also adds comparable value. 13

It should be clear that the approach is very empirical and that the nature of the problem determines the approach. This is even more apparent in the remainder of the paper, where the details of each of these steps are discussed.

13 The accounting profession refers to this consideration as the "materiality criterion".

8. Data preprocessing

The primary considerations in data preprocessing are to reconcile disparate sources of data, to reduce or eliminate intrinsic data bias, and to aggregate variables, when appropriate. These issues are addressed in this section.

8.1. Reconcile disparate sources of data

Generally, a number of sources of data are needed to develop the model. These might include data from insurers and agencies, household demographics, econometric data, and the client's internal transaction data. Consequently, reconciling these disparate sources of data becomes critical.

8.2. Intrinsic data bias

Another challenge when dealing with data is to reduce some of its internal biases. In the area of consumer behavior, for example, where adverse selection is the issue, the insured database may provide limited guidance in some cases because it contains only insureds; the people that need to be identified on the adverse side already have been selected away. So, strategies need to be developed to compensate for these biases.

8.3. Aggregate variables

One productive approach is to develop a set of aggregate variables that take the raw variables from these sources and bring them together into a concise set of aggregates. A common example of this is the use of residential areas as a proxy for socioeconomic characteristics. 14 Where this is done, that level will typically be used to begin the modeling process.

As discussed by Bishop (1995, Section 8.6.2), one might approach this issue using a kind of neural network architecture that is autoassociative, 15 which tries to reproduce the patterns presented at the input at the output through a narrow layer of hidden units. This results in compression and, at the same time, takes advantage of nonlinearities or interaction terms between the observables.

Another approach would be to use nonlinear compression, which is in a sense the nonlinear correlate of factor analysis 16 or principal components. 17 This can be accomplished, for example, with a four-layer autoassociative network, where the first and third hidden layers have sigmoidal nonlinear activation functions, as sketched below.

14 The use of aggregate information as a proxy for the individual characteristics of interest has to be used with care because it can result in biases. The reason is that aggregate proxies tend to exaggerate the effects of micro-level variables and to do more poorly than micro-level variables at controlling for confounding. This has been found, for example, when socioeconomic characteristics of residential areas, such as the median income associated with a zip code, are used to proxy for individual characteristics. See Geronimus et al. (1996).

15 An autoassociative network is a network whose target data set is identical to its input data set.
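As an illustration of the four-layer autoassociative network just described, a minimal sketch in Python follows. It is not taken from the paper: the layer sizes, learning rate, number of training epochs, and synthetic data are assumptions chosen purely for exposition. Following Bishop's formulation, the first and third hidden layers are sigmoidal and the narrow middle (bottleneck) layer is linear.

```python
# Minimal sketch of a four-layer autoassociative network:
# input -> sigmoidal hidden layer -> narrow linear bottleneck
#       -> sigmoidal hidden layer -> linear reconstruction of the input.
# The bottleneck activations serve as nonlinear "aggregate variables".
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoassociator:
    def __init__(self, n_in, n_hidden, n_bottleneck):
        init = lambda r, c: rng.normal(0.0, 0.1, (r, c))  # small random weights
        self.W1, self.b1 = init(n_in, n_hidden), np.zeros(n_hidden)
        self.W2, self.b2 = init(n_hidden, n_bottleneck), np.zeros(n_bottleneck)
        self.W3, self.b3 = init(n_bottleneck, n_hidden), np.zeros(n_hidden)
        self.W4, self.b4 = init(n_hidden, n_in), np.zeros(n_in)

    def forward(self, X):
        self.h1 = sigmoid(X @ self.W1 + self.b1)        # first hidden layer (sigmoidal)
        self.z = self.h1 @ self.W2 + self.b2            # linear bottleneck (the aggregates)
        self.h3 = sigmoid(self.z @ self.W3 + self.b3)   # third hidden layer (sigmoidal)
        return self.h3 @ self.W4 + self.b4              # linear reconstruction of the input

    def train_step(self, X, lr=0.2):
        Xhat = self.forward(X)
        n = X.shape[0]
        # Backpropagate the mean squared reconstruction error.
        d4 = (Xhat - X) / n
        d3 = (d4 @ self.W4.T) * self.h3 * (1 - self.h3)
        d2 = d3 @ self.W3.T                             # bottleneck is linear
        d1 = (d2 @ self.W2.T) * self.h1 * (1 - self.h1)
        for W, b, d, a in [(self.W4, self.b4, d4, self.h3),
                           (self.W3, self.b3, d3, self.z),
                           (self.W2, self.b2, d2, self.h1),
                           (self.W1, self.b1, d1, X)]:
            W -= lr * a.T @ d
            b -= lr * d.sum(axis=0)
        return float(np.mean((Xhat - X) ** 2))

# Illustrative data: six observed variables driven by two latent factors,
# standing in for raw source variables that are to be aggregated.
latent = rng.normal(size=(500, 2))
X = np.tanh(latent @ rng.normal(size=(2, 6))) + 0.05 * rng.normal(size=(500, 6))

net = Autoassociator(n_in=6, n_hidden=8, n_bottleneck=2)
for epoch in range(2000):
    mse = net.train_step(X)
net.forward(X)       # refresh stored activations on the full data set
codes = net.z        # the two nonlinear "aggregate variables"
print(f"final reconstruction MSE: {mse:.4f}")
```

The activations of the narrow middle layer, codes, play the role of the compressed aggregate variables. With linear units throughout, such a network can recover no more than the principal-component subspace; it is the sigmoidal hidden layers that allow it to exploit nonlinearities and interaction terms among the observables.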

9. Domain segmentation