vant, by using aggregates developed from experience in other applications.
10.2. Primary methodologies

The primary methodologies include rule induction technologies, [19] which encompass such things as CHAID [20] and neuro-fuzzy inferencing (Wang et al., 1995, p. 89), and significance testing using regression and sensitivity analysis. [21] Again, wherever the structure within the domain allows the use of dynamic variable selection based on pruning the parameters or pruning the weights, that approach is adopted. [22] Of course, depending on the domain and the strength of the structure exhibited in the data, that technique may or may not work.

[19] Rule induction comprises a wide variety of technologies, but the basic intent is to take a set of sample data and extract the rules implicit in the data itself. For instance, some technologies that might be considered neuro-fuzzy technologies are essentially kernel-based neural networks in which rules (if-then statements) can be represented in terms of membership functions defined over the ranges of the variables. The model can be set up with both the positions and the boundaries of these membership functions randomized, and the parameters associated with the boundaries can then be adapted by looking at the data itself. Hence, implicit rules can be extracted to help predict the output. An example would be whether an individual with a certain pattern was a high risk, or whether a contract on that individual was likely to be profitable.
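To make the adaptive-membership idea in footnote [19] concrete, here is a minimal sketch (not the authors' implementation; the data, rule, and learning rate are hypothetical) in which one sigmoidal membership function for a HIGH balance-to-credit ratio has its position and boundary adapted to sample data by gradient descent, yielding an implicit if-then rule:

```python
# Minimal sketch (illustrative only): one fuzzy rule of the form
#   IF balance-to-credit ratio is HIGH THEN high risk,
# where HIGH is a sigmoidal membership function whose position (a) and
# boundary width (b) start out arbitrary and are adapted to the data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)                                  # hypothetical balance-to-credit ratios
y = (x + 0.1 * rng.normal(size=x.size) > 0.7).astype(float)     # hypothetical high-risk labels

a, b = 0.5, 0.2                                                 # initial position and width
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(x - a) / b))                      # membership of each case in HIGH
    d = 2.0 * (p - y) * p * (1.0 - p) / x.size                  # chain rule for the mean squared error
    grad_a = np.sum(d * (-1.0 / b))
    grad_b = np.sum(d * (-(x - a) / b ** 2))
    a, b = a - 0.1 * grad_a, max(b - 0.1 * grad_b, 1e-3)        # adapt position and boundary

print(f"extracted rule: IF ratio is HIGH (boundary near {a:.2f}, width {b:.2f}) THEN high risk")
```

A full neuro-fuzzy system would adapt many such memberships over several variables and combine them into a rule base; the sketch only illustrates the boundary-adaptation step.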
[20] CHAID (chi-squared automatic interaction detection) (SPSS, 1993) has been a popular method for segmentation and profiling. It is used when the variable responses are categorical in nature and a relationship is sought between the predictor variables and a categorical outcome measure. It formulates interaction terms between variables and uses a maximum-likelihood-type technique to determine where the boundaries lie along the ranges of the variables. It then builds the result up hierarchically, which allows rules to be extracted.
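As a toy illustration of the chi-squared test at the heart of CHAID (the data, category labels, and outcome rates below are hypothetical, and only the single split test is shown, not the full hierarchical merging procedure):

```python
# Toy illustration of CHAID's core step: test whether a candidate grouping of a
# categorical predictor is significantly related to a categorical outcome.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
employment = rng.choice(["salaried", "self-employed", "retired"], size=1000)
p_lapse = {"salaried": 0.10, "self-employed": 0.25, "retired": 0.15}   # hypothetical lapse rates
lapsed = np.array([rng.random() < p_lapse[e] for e in employment])

# contingency table: employment category vs. lapse outcome
cats = ["salaried", "self-employed", "retired"]
table = np.array([[np.sum((employment == c) & (lapsed == o)) for o in (False, True)]
                  for c in cats])

chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.1f}, dof = {dof}, p = {p_value:.2g}")
# CHAID keeps a split (and recurses within each segment) when p is small,
# and merges categories whose outcome rates are not distinguishable.
```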
[21] These may involve such things as CART (classification and regression trees) (Breiman et al., 1984), which is a procedure for analyzing categorical (classification) or continuous (regression) data, and C4.5 (Quinlan, 1993), which is an algorithm for inducing decision trees from data.
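For concreteness, a short sketch of tree-based rule induction using scikit-learn's CART-style implementation; the tool, features, and data are my illustrative choices, not the software referenced above:

```python
# Sketch of tree-based rule induction in the spirit of CART/C4.5.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(1000, 2))                    # hypothetical features: [ratio, tenure]
y = ((X[:, 0] > 0.7) & (X[:, 1] < 0.4)).astype(int)      # hypothetical "high risk" pattern

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=25).fit(X, y)
print(export_text(tree, feature_names=["balance_ratio", "tenure"]))
# The printed tree reads directly as if-then rules, e.g.
# "if balance_ratio > 0.7 and tenure <= 0.4 then class 1 (high risk)".
```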
[22] This optimization technique is a connectionist architecture which uses gradient descent (Hayes, 1996, p. 499), a conjugate gradient technique (an improved steepest descent approach), or perhaps evolutionary GAs, to optimize the parameters and locate the appropriate boundaries, and thus develop the best set of predictions. These technologies are used primarily to discover domains within the data, but they also provide some insight into which variables are predictive. Moreover, they have the advantage of being able to address joint relationships between variables, as opposed to something like regression, which looks at how significant each predictor is on its own.
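A hedged sketch of the optimization step footnote [22] describes: the same kind of boundary parameters could be fitted with a conjugate-gradient routine rather than plain steepest descent (scipy.optimize is my illustrative choice of tool):

```python
# Sketch: fitting boundary parameters by conjugate gradients (illustrative only).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 500)
y = (x > 0.7).astype(float)                      # hypothetical target pattern

def loss(theta):
    a, log_b = theta                             # position and log-width of a boundary
    p = 1.0 / (1.0 + np.exp(-(x - a) / np.exp(log_b)))
    return np.mean((p - y) ** 2)

result = minimize(loss, x0=np.array([0.5, np.log(0.2)]), method="CG")
print("fitted boundary position:", result.x[0])
```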
10.3. Behavioral changes

Empirical evidence suggests that a very important aspect of predicting behavior is not simply the current status of an individual, but how that status changes over time. So, a number of aggregates have been derived to help capture that characteristic, and some of the predictive variables are sampled over time to monitor the trend. With respect to credit cards, for example, key considerations are the balance-to-credit ratio, patterns of status updates, and the age difference between the primary and secondary household members.
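As a small illustration of such trend aggregates (the column names, snapshot frequency, and values below are hypothetical), the change in the balance-to-credit ratio across monthly snapshots might be derived as follows:

```python
# Sketch of deriving a behavioral-trend aggregate from periodic account snapshots.
import pandas as pd

snapshots = pd.DataFrame({
    "account_id":   [1, 1, 1, 2, 2, 2],
    "month":        ["2000-01", "2000-02", "2000-03"] * 2,
    "balance":      [400.0, 650.0, 900.0, 300.0, 250.0, 200.0],
    "credit_limit": [1000.0] * 6,
})
snapshots["ratio"] = snapshots["balance"] / snapshots["credit_limit"]

# per-account trend of the balance-to-credit ratio (last month minus first month)
trend = (snapshots.sort_values("month")
                  .groupby("account_id")["ratio"]
                  .agg(lambda r: r.iloc[-1] - r.iloc[0])
                  .rename("ratio_trend"))
print(trend)
```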
11. A comparison of linear and nonlinear models
The stage is set to discuss the actual development of the model. Before doing that, however, it is appropriate to digress to compare the linear and nonlinear models and to describe the motivations for the nonlinear approach and some of its shortcomings. The assumption is made a priori that the goal is to extract interactions out of the sample data.
11.1. The linear modeling paradigm

Approaching the world from a linear perspective is a very powerful strategy. This, coupled with the superposition assumption that complex behavior can be modeled as a linear combination of simpler behaviors (Hayes, 1996, p. 10) and the independence of dimensions, provides a powerful set of technologies for analyzing the performance and significance of variables. In practice, this can lead to a tendency to ignore the nonlinear factors, the justification being that the higher order terms add only a slight perturbation to the overall behavior the model is attempting to capture.

Interactions are often extraordinarily important from the perspective of many of the less well understood financial problems. [23] Consequently, if it is assumed that all the variables are independent, the model must involve an enormous number of degrees of freedom to capture that complexity. In due course, since computational complexity scales with model complexity, a threshold is reached with linear systems where the model becomes very brittle [24] as its dimensionality is increased and an attempt is made to capture finer and finer behavior.

Of course, nonlinearities can be accommodated in the linear regime while still taking advantage of many of the powerful technologies available when modeling from a linear perspective. This can be accomplished by making assumptions about the form of the nonlinearity, by explicitly representing higher order statistics (Nikias and Petropulu, 1993), and by using nonlinear basis functions that are orthogonal [25] (Chen et al., 1991). However, trying to represent these interaction terms becomes a combinatorial problem [26] as the dimensionality of the model increases.

So, the linear approach has been very powerful, but it has its limitations when the complexity increases.

[23] This is not a new phenomenon; it also is true from a target recognition perspective.

[24] "Brittle" is a common and descriptive term in engineering, which implies that the output of the model is very sensitive to minor changes in the parameters.

[25] Two random variables are said to be orthogonal if their correlation is zero.

[26] Combinatorial optimization problems present difficulties because they generally cannot be solved in polynomial time; instead, they require times that are exponential functions of the problem size.
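To illustrate the basis-function route described above (accommodating nonlinearity within a model that is still linear in its parameters), the following sketch is my own, with an arbitrary target function, and uses orthogonal Chebyshev polynomials as the basis:

```python
# Sketch: nonlinearity inside the linear regime by regressing on an orthogonal
# (Chebyshev) basis; the model remains linear in its parameters.
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(-1, 1, 200))
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)    # hypothetical nonlinear process

coef = C.chebfit(x, y, deg=7)        # ordinary least squares on the basis expansion
y_hat = C.chebval(x, coef)
print("in-sample RMSE:", float(np.sqrt(np.mean((y - y_hat) ** 2))))
# With several input variables, the number of such basis/interaction terms
# grows combinatorially, which is the difficulty noted in footnote [26].
```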
11.2. The nonlinear modeling paradigm

The nonlinear approach that guides many of the technologies was derived from studies of complex systems, such as neural or evolutionary systems, where complexity grew out of relatively simple components whose interactions were the key to the emergent behavior. Interactions, of course, imply nonlinearity. The nonlinear modeling approach is depicted in Fig. 13.

Fig. 13. Nonlinear modeling approach.

As indicated, the process models complexity without computational complexity. It starts with very simple transforms; in the case of NNs, a weighted sum is developed and passed through a nonlinearity. That is done iteratively through layers, and though each transformation is simple, these simple nonlinearities are combined to approximate very complex nonlinear behavior. As a result, one is forced to approach the problem adaptively, simply because there are no closed form solutions when dealing with nonlinear basis functions that are nonorthogonal.
The only way to estimate the parameters associated with models of this type, where there are hundreds of degrees of freedom, is to adopt some kind of numerical optimization technique involving incremental optimization. From a cost-benefit perspective, the primary reason this would be attempted is that, theoretically at least, if no stringent assumptions are made, one can model an arbitrary nonlinear function to any arbitrary degree of accuracy by overlaying these basis functions. This is an ideal; it obviously is not always the case.
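A minimal numerical sketch of the building block described above: a weighted sum passed through a sigmoid, stacked into a layer, with the parameters adjusted by incremental gradient optimization. The target function, layer size, and learning rate are illustrative only:

```python
# Sketch of the basic nonlinear building block: weighted sums passed through
# sigmoids, combined, and fitted by incremental numerical optimization.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, (200, 1))
y = np.sin(3 * x)                                    # hypothetical nonlinear target

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0, 1, (1, 10)); b1 = np.zeros(10)    # hidden layer: 10 simple units
W2 = rng.normal(0, 1, (10, 1)); b2 = np.zeros(1)     # linear output layer

lr = 0.1
for _ in range(5000):
    h = sigmoid(x @ W1 + b1)                         # weighted sum -> nonlinearity
    y_hat = h @ W2 + b2
    err = y_hat - y                                  # gradient of the squared error
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    dh = (err @ W2.T) * h * (1 - h)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

print("RMSE after fitting:", float(np.sqrt(np.mean((y_hat - y) ** 2))))
```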
11.3. Linear vs. nonlinear models

One example which clearly distinguishes between the two approaches when trying to capture complex behavior involves the determination of the underlying structure of a time series. Resorting to spectrum estimation (Hayes, 1996, Chapter 8), one might try to capture the structure in the time series by building it up from simple sine and cosine functions which serve as orthogonal basis functions. [27] Given a Fourier transform, the parameters associated with that transform can be determined analytically.

It can turn out, however, that although the time series looks periodic, the power spectrum [28] has a very broad band, [29] which is problematic for the Fourier approach, since it indicates the existence of a continuum of frequencies. Typically, in order to capture the complex behavior, it is necessary to sample over a large number of time points and to do the transform with sufficient spectral resolution. That can result in a large number of degrees of freedom, perhaps on the order of 1000 or more, depending on the situation.

[27] The orthogonal functions are cos(2πt/L) and sin(2πt/L), where L is the period.

[28] The power spectrum is the Fourier transform of an autocorrelation sequence. It is a representation of the magnitude of the various frequency components of a signal (or image) that has been transformed with the Fourier transform from the time (or spatial) domain into the frequency domain.

[29] A broad band power spectrum suggests either a purely random or noisy process or chaotic motion. In this instance, since the series looks periodic, we apparently are confronted with a chaotic time series.
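The following sketch illustrates the point with a stand-in series (the chaotic logistic map, used only because it is a convenient chaotic example; the series discussed in the text is not reproduced here): the Fourier route spreads the structure over hundreds of frequency bins.

```python
# Sketch of the linear (spectral) route: build the series up from sines and
# cosines via the FFT and inspect the power spectrum of a chaotic stand-in series.
import numpy as np

n = 1024
x = np.empty(n); x[0] = 0.3
for t in range(1, n):
    x[t] = 3.9 * x[t - 1] * (1.0 - x[t - 1])         # chaotic logistic map

X = np.fft.rfft(x - x.mean())
power = np.abs(X) ** 2                               # power spectrum estimate
print("frequency bins (degrees of freedom):", power.size)
print("fraction of power in the 10 largest bins:",
      float(np.sort(power)[-10:].sum() / power.sum()))
```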
If the nonlinear approach is taken in this case, and this is indicative of many problems (Gorman, 1996), an NN can be used based on only a few of the temporal samples, time delayed. Then the underlying dynamics of a time series that is generated by a nonlinear dynamic system can be rebuilt by taking the time delays and embedding them in a state space. [30]

The details of this process are described by Packard et al. (1980) and expanded upon by Tufillaro et al. (1992, Chapter 3). Briefly, assuming that the time series, x_t, is produced by a deterministic dynamical system that can be modeled by some nth-order ordinary differential equation, the trajectory of the system is uniquely specified at time 0 by its value and its first n−1 derivatives. If the sampling times are evenly spaced, almost all the information about the derivatives is contained in the differences of the original series, and almost all the information about the orbit can be recovered from embedded variables of the form y_i^j = x_{i−r_j}, where j denotes the jth embedded variable, i denotes the ith term of the series, and the time delay, r_j, is unique to each variable.

Of course, since the forms of those nonlinearities are not known a priori, they have to be built up from sigmoids. [31] So, it does take tens of parameters to capture the underlying structure, but not tens of tens. In many cases, once the interaction terms are captured successfully, a much more concise description of the underlying process is obtained.

[30] State (phase) space is an abstract space used to represent all possible states of a system. In state space, the value of each variable is plotted against the values of the other variables at the same time. Conceptually, if one thinks in terms of a bouncing ball, the height of the ball at any time could be represented by a time series, and the state space of the motion of the bouncing ball could be represented by the two dimensions height and velocity.

[31] In the context of NNs, the sigmoid (S-shaped) function is a nonlinear activation function of a neuron (Bishop, 1995, p. 82, pp. 232–234).
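A sketch of the nonlinear route under the same stand-in series: a few delay-embedded variables y_i^j = x_{i−r_j} feed a small sigmoidal network (scikit-learn here, purely for brevity; the delays and network size are illustrative) with only tens of parameters:

```python
# Sketch: delay embedding y_i^j = x_{i - r_j} plus a small sigmoidal network
# to predict the next value of a chaotic stand-in series.
import numpy as np
from sklearn.neural_network import MLPRegressor

n = 2000
x = np.empty(n); x[0] = 0.3
for t in range(1, n):
    x[t] = 3.9 * x[t - 1] * (1.0 - x[t - 1])              # same chaotic stand-in series

delays = (1, 2, 3)                                         # the r_j for the embedded variables
start = max(delays)
X = np.column_stack([x[start - r:n - r] for r in delays])  # columns are y_i^j = x_{i - r_j}
y = x[start:]                                              # value to be predicted

net = MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0).fit(X[:-200], y[:-200])
rmse = np.sqrt(np.mean((net.predict(X[-200:]) - y[-200:]) ** 2))
print("out-of-sample RMSE with ~40 parameters:", round(float(rmse), 3))
```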
11.4. The bias–variance tradeoff

Another dimension to the issue of using ANMs, rather than linear models, has to do with the bias–variance tradeoff (Geman et al., 1992). The field began when Rumelhart et al. (1986) decided, against the consensus of their peers, to try gradient descent in a multilayered NN and found that it converged.

Initially, when these networks were applied, the primary focus was on very high signal-to-noise problems. This situation is depicted in Fig. 14, [32] which shows model performance of the linear and nonlinear processes as a function of the bias–variance tradeoff.

Fig. 14. Model performance (high signal-to-noise case).

As illustrated by the solid line in the figure, a linear high-bias model does not model a nonlinear process very well. This is a consequence of the many assumptions made in a linear model about the underlying structure. In contrast, as the solution tends to the standard canonical nonlinear architectures, where fewer and fewer assumptions are made, the ability to capture the nonlinearities in the problem is improved. This was an important result, since relatively simple components can be pieced together to capture very nonlinear behavior.

[32] Adapted from Gorman (1996, Slide 14).
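For reference, the decomposition underlying this discussion can be written, in the standard squared-error form given by Geman et al. (1992), as follows (D denotes the training sample, f̂ the fitted model, and E[y | x] the target regression):

```latex
% Standard bias-variance decomposition of the expected squared estimation error
\mathrm{E}_{D}\!\left[\big(\hat{f}(x;D) - \mathrm{E}[y \mid x]\big)^{2}\right]
  = \underbrace{\big(\mathrm{E}_{D}[\hat{f}(x;D)] - \mathrm{E}[y \mid x]\big)^{2}}_{\text{bias}^{2}}
  \;+\;
  \underbrace{\mathrm{E}_{D}\!\left[\big(\hat{f}(x;D) - \mathrm{E}_{D}[\hat{f}(x;D)]\big)^{2}\right]}_{\text{variance}}
```

The linear models of Fig. 14 sit at the high-bias, low-variance end of this decomposition; the canonical nonlinear architectures sit at the low-bias, high-variance end.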
11.5. Financial models

Fig. 15 portrays the complication that occurs when the foregoing technologies are applied to low signal-to-noise situations, such as those that often accompany financial modeling. Now the nonlinear model does not capture the nonlinear process (the solid line) very well. The reason is that, while NNs and architectures of that type have low bias, very few assumptions are made and there is a tendency to overfit. This, coupled with finite sample data, leads to significant problems with the variance. So, depending on the initial conditions and the sample drawn from the overall population, widely varying solutions can be obtained. In many cases, linear solutions are better, in the sense that they capture the underlying structure, at least the first-order structure, much better than the high-dimensional low-bias models. This follows because of the bias imposed on the solution in the linear technique.

Fig. 15. Model performance (low signal-to-noise case).

The foregoing anomaly arose because of the enormous change in the underlying characteristics of the problem. Initially, the problem involved improving classification or decision performance from the 80% range up to the 95–98% range. When it came to financial issues, however, the problem became one of achieving one or two percentage points over chance, and it was clear that if this was to be accomplished, the high variance issue had to be addressed. Part of the solution involved domain segmentation, variable selection, the use of aggregates, and so forth. In addition, embedded expert knowledge was used to impose constraints on the solution of these low-bias models in order to avoid the problem of overfitting. This is a very heuristic and ad hoc approach but, to date, there is no satisfactory alternative.
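The constraints used here are heuristic and domain-specific; as a generic numerical illustration of the underlying principle (accepting a little bias to control the variance of an over-flexible, low-bias model fitted to small noisy samples), consider the following sketch, which is not the authors' method:

```python
# Generic illustration of trading a little bias for a large reduction in variance:
# an over-flexible degree-9 polynomial fitted to small noisy samples, with and
# without an L2 penalty (a simple stand-in for imposing constraints).
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(-1, 1, 200)
truth = np.sin(2 * grid)

def fit(x, y, lam):
    A = np.vander(x, 10)                          # degree-9 polynomial design matrix
    coef = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y)
    return np.vander(grid, 10) @ coef

preds = {0.0: [], 1e-2: []}
for _ in range(200):                              # many small, noisy samples
    x = rng.uniform(-1, 1, 25)
    y = np.sin(2 * x) + 0.3 * rng.normal(size=25)
    for lam in preds:
        preds[lam].append(fit(x, y, lam))

for lam, p in preds.items():
    p = np.array(p)
    bias2 = np.mean((p.mean(axis=0) - truth) ** 2)
    var = np.mean(p.var(axis=0))
    print(f"lambda={lam:>5}: bias^2={bias2:.4f}, variance={var:.4f}")
```

Weight pruning and embedded domain knowledge, as described earlier, play the analogous constraining role for the adaptive nonlinear models.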
12. Model constraints