Another approach would be to use nonlinear compression, the nonlinear counterpart of factor analysis [16] or principal components. [17] This can be accomplished, for example, with a four-layer autoassociative network, where the first and third hidden layers have sigmoidal nonlinear activation functions.
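As a minimal sketch of such a network, assuming a PyTorch implementation with illustrative layer widths (the paper itself specifies no toolkit):

```python
import torch
import torch.nn as nn

# Sketch of a four-layer autoassociative (autoencoder) network for
# nonlinear compression. Layer widths are illustrative assumptions;
# the linear bottleneck holds the compressed representation.
class Autoassociator(nn.Module):
    def __init__(self, n_inputs=20, n_hidden=10, n_bottleneck=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),      # first hidden layer
            nn.Sigmoid(),                       # sigmoidal nonlinearity
            nn.Linear(n_hidden, n_bottleneck),  # linear bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, n_hidden),  # third hidden layer
            nn.Sigmoid(),
            nn.Linear(n_hidden, n_inputs),      # reconstruct the inputs
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train the network to reproduce its own inputs; the bottleneck
# activations then serve as the compressed (nonlinear) factors.
model = Autoassociator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 20)  # stand-in for the real data
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)
    loss.backward()
    opt.step()
codes = model.encoder(x)  # compressed representation
```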
9. Domain segmentation
Domain segmentation involves the identification of segments within the decision space where the implicit relationship between variables is constant. It is a very important step and has been demonstrated to provide substantial value-added performance (Kelly et al., 1995). Fig. 12 exemplifies the situation. [18]
Traditionally, two approaches have been used for segmentation. One is to try to find segments in the population that have relatively constant behavior within a group and then to assign either a score or an output to that entire group, assuming they have uniform behavior. Another approach is to ignore the segmentation altogether and attempt to fit a model to the entire decision space.
Fig. 12. Domain segmentation.

Fig. 12 demonstrates that neither approach really gets at the underlying structure in the data since, in both approaches, much of the resolution in the model is lost. Typically, what needs to be done is to isolate the unique domains and model within them. This has the advantage of improving the ability of these adapted technologies to extract the structure. In essence, the technique is allowed to focus in on relatively stationary behavior, so that it has a better opportunity to extract the information.
[16] In the current context, factor analysis may be thought of as a technique which uses measures of association (correlations) to extract patterns (latent structure in the data) of association (a dependence on common processes) in complex data sets. To be really useful and valid, factor analysis needs large data arrays, as correlations can be found for spurious reasons.
[17] Principal component analysis is a methodology for finding the structure of a cluster located in multidimensional space. Conceptually, it is equivalent to choosing that rotation of the cluster which best depicts its underlying structure.
[18] Adapted from Gorman (1996), Slide 9.
9.1. Isotropic subdomains

Domains that may be used in the development of insurance models include the insurance companies themselves. In the area of consumer behavior models, for example, it may turn out, as it has with credit bureaus (Gorman, 1996), that each insurer actually reflects relatively distinct characteristics of individual consumer behavior, and so modeling within companies rather than across them may have some advantages. Depending on the inquiry, one would also expect that there are geographic regions that should be cordoned off. So, those groups can be isolated and modeled within those domains to get a better resolution with regard to that behavior.
It also generally makes sense to classify clients by adverse selection characteristics and to model within each of those classes. Moreover, as discussed below, certain aspects of temporal behavior are likely to be much more important than cross-sectional behavior. Since the goal is to refine the detection of these types of behavior, the data could be segregated accordingly.
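A minimal sketch of this segment-wise modeling, assuming synthetic data and a hypothetical "segment" key standing in for the company, region, and risk-class splits described above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-in for policy data; all field names are illustrative.
df = pd.DataFrame({
    "segment": rng.choice(["A", "B", "C"], n),
    "age": rng.uniform(20, 70, n),
    "tenure": rng.uniform(0, 30, n),
    "lapsed": rng.integers(0, 2, n),
})
predictors = ["age", "tenure"]

# Fit one model per isotropic subdomain rather than a single model
# across the whole decision space.
models = {
    key: LogisticRegression().fit(g[predictors], g["lapsed"])
    for key, g in df.groupby("segment")
}

# Score each record with the model for its own segment.
df["score"] = 0.0
for key, g in df.groupby("segment"):
    df.loc[g.index, "score"] = models[key].predict_proba(g[predictors])[:, 1]
```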
10. Variable selection and derivation

The next step in the process of developing a model is to select and aggregate raw variables to obtain the most concise representation of the information content within the data.
10.1. Concise representation

The objective here is to identify a set of predictors (predictive variables) within domain segments that allow the reduction of the overall dimensionality of the problem. The goal is to reduce the number of variables, particularly those that are only marginally relevant, by using aggregates developed from experience in other applications.
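A minimal sketch of such a screen, assuming synthetic data and a simple univariate correlation test as the relevance measure (the paper does not prescribe a specific test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500
# Synthetic segment data: a couple of informative variables among
# several marginal ones (names and counts are illustrative).
X = rng.normal(size=(n, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n)

# Univariate significance screen: keep only variables whose
# correlation with the outcome is clearly nonzero, dropping the
# marginally relevant ones and reducing dimensionality.
keep = [j for j in range(X.shape[1])
        if stats.pearsonr(X[:, j], y)[1] < 0.01]
X_reduced = X[:, keep]
```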
10.2. Primary methodologies

The primary methodologies include rule induction technologies, [19] which encompass such things as CHAID [20] and neuro-fuzzy inferencing (Wang et al., 1995, p. 89), and significance testing using regression and sensitivity analysis. [21]
Again, wherever the structure within the domain allows the use of dynamic variable selection based on pruning the parameters or pruning the weights, that approach is adopted. [22] Of course, depending on the domain and the strength of the structure exhibited in the data, that technique may or may not work.
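A minimal sketch of weight-based pruning as a variable selector, assuming a linear model trained by gradient descent (see footnote 22); the data, threshold, and dimensions are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 12
X = torch.randn(n, d)
# Only the first three inputs actually drive the target (illustrative).
w_true = torch.zeros(d)
w_true[:3] = torch.tensor([2.0, -1.5, 1.0])
y = X @ w_true + 0.1 * torch.randn(n)

model = nn.Linear(d, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)  # gradient descent
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X).squeeze(), y)
    loss.backward()
    opt.step()

# Prune: inputs whose weights stay near zero are dropped; the
# surviving inputs are the dynamically selected variables.
with torch.no_grad():
    keep = model.weight.abs().squeeze() > 0.1  # illustrative threshold
selected = keep.nonzero().squeeze().tolist()
```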
[19] Rule induction comprises a wide variety of technologies, but the basic intent is to take a set of sample data and extract the implicit rules in the data itself. For instance, some neuro-fuzzy technologies, which are really kernel-based neural networks, represent rules (if-then statements) in terms of membership functions that are defined over ranges of variables. The model can be set up with both the positions and the boundaries of these membership functions randomized, and the parameters associated with the boundaries can then be adapted by looking at the data itself. Hence, implicit rules can be extracted to help predict the output. An example would be whether an individual with a certain pattern was a high risk or whether a contract on that individual was likely to be profitable.
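As a minimal sketch of the adaptive-membership idea in footnote 19, assuming Gaussian membership functions and synthetic data (neither is specified in the original):

```python
import torch

torch.manual_seed(0)
x = torch.linspace(0, 1, 200).unsqueeze(1)
y = (x > 0.6).float().squeeze()  # stand-in for a "high risk" label

# Three fuzzy membership functions over the range of x; both the
# centers (positions) and widths (boundaries) start randomized and
# are then adapted to the data, as the footnote describes.
centers = torch.rand(3, requires_grad=True)
widths = torch.rand(3, requires_grad=True)
rule_out = torch.rand(3, requires_grad=True)  # consequent of each rule

opt = torch.optim.Adam([centers, widths, rule_out], lr=0.05)
for _ in range(400):
    opt.zero_grad()
    mu = torch.exp(-((x - centers) ** 2) / (widths ** 2 + 1e-6))
    pred = (mu * rule_out).sum(1) / (mu.sum(1) + 1e-6)  # weighted rules
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()
    opt.step()

# Each (center, width, rule_out) triple now reads as an implicit rule:
# "if x is near `center`, then the output is about `rule_out`".
```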
[20] CHAID (chi-squared automatic interaction detection) (SPSS, 1993) has been a popular method for segmentation and profiling, which is used when the variable responses are categorical in nature and a relationship is sought between the predictor variables and a categorical outcome measure. It seeks to formulate interaction terms between variables and uses a maximum-likelihood-type technique to determine where the boundaries lie along the ranges of variables. It then builds these up hierarchically, which allows rules to be extracted.
[21] These may involve such things as CART (classification and regression trees) (Breiman et al., 1984), which is a procedure for analyzing categorical (classification) or continuous (regression) data, and C4.5 (Quinlan, 1993), which is an algorithm for inducing decision trees from data.
[22] This optimization technique is a connectionist architecture which uses gradient descent (Hayes, 1996, p. 499), a conjugate gradient technique (an improved steepest-descent approach), or perhaps evolutionary GAs, to optimize the parameters and locate the appropriate boundaries, and thus develop the best set of predictions. These technologies are used primarily to discover domains within the data, but they also provide some insight into which variables are predictive. Moreover, they have the advantage of being able to address joint relationships between variables, as opposed to something like regression, which looks at how significant predictors are independently.
10.3. Behavioral changes

Empirical evidence suggests that a very important aspect of predicting behavior is not simply the current status of an individual, but how that status changes over time. Accordingly, a number of aggregates have been derived to help capture that characteristic, and some of the predictive variables are sampled over time to monitor the trend. With respect to credit cards, for example, key considerations are the balance-to-credit ratio, patterns of status updates, and the age difference between the primary and secondary household members.
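A minimal sketch of deriving such a trend aggregate, assuming synthetic monthly snapshots and the balance-to-credit ratio named above; field names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Synthetic monthly snapshots per client (fields are illustrative).
df = pd.DataFrame({
    "client": np.repeat(np.arange(50), 12),
    "month": np.tile(np.arange(12), 50),
    "balance": rng.uniform(0, 5000, 600),
    "credit_limit": rng.uniform(5000, 10000, 600),
})
df["bal_to_credit"] = df["balance"] / df["credit_limit"]

# Trend aggregate: the slope of the balance-to-credit ratio over time
# captures how status changes, not just its current level.
def slope(g):
    return np.polyfit(g["month"], g["bal_to_credit"], 1)[0]

trend = df.groupby("client")[["month", "bal_to_credit"]].apply(slope)
trend.name = "bal_to_credit_trend"
```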
11. A comparison of linear and nonlinear models