ers who are unfamiliar with the area. Following that, the remainder of this section discusses the grouping
of ANM technologies into functional classes and the team approach.
3. The soft computing technologies
This section gives a cursory overview of the soft computing technologies: NNs, FL and GAs. The reader
is referred to the references for more detail on each of these technologies.
3.1. Neural networks

NNs (Bishop, 1995) are software programs that emulate the biological structure of the human brain and its associated neural complex, and are used for pattern classification, prediction and financial analysis, and control and optimization. The core of an NN is the neural processing unit, a representation of which is shown in Fig. 2.

Fig. 2. Neural processing unit.
The inputs to the neuron, x_j, are multiplied by their respective weights, w_j, and aggregated. The weight w_0 serves the same function as the intercept in a regression formula. The weighted sum is then passed through an activation function, F, to produce the output of the unit. Often, the activation function takes the form of the logistic function F(z) = (1 + e^{-z})^{-1}, where z = Σ_j w_j x_j, as shown in the figure.
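To make this concrete, a minimal Python sketch of the processing unit is given below; the numerical values are illustrative and not from the paper.

```python
import math

def neuron_output(x, w):
    """Single neural processing unit: weighted sum of the inputs
    passed through the logistic activation F(z) = 1 / (1 + e^(-z))."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values; x[0] = 1 acts as the bias (intercept) input,
# so w[0] plays the role of the regression intercept.
x = [1.0, 0.5, -0.2]
w = [0.1, 0.4, 0.3]
print(neuron_output(x, w))  # a value in (0, 1)
```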
NNs can be either supervised or unsupervised. The distinguishing feature of a supervised NN is that both its inputs and outputs are known, and its objective is to discover a relationship between the two. Insurance applications using supervised NNs include Tu (1993), who compared NNs and logistic regression models for predicting length of stay in the intensive care unit following cardiac surgery, and Brockett et al. (1994), who sought to improve the early warning signals associated with property-liability insurance company insolvency. The distinguishing feature of an unsupervised NN is that only the input is known and the goal is to uncover patterns in the features of the input data. Insurance applications involving unsupervised NNs include Jang (1997), who investigated insolvencies in the life insurance industry, and Brockett et al. (1998), who investigated automobile bodily injury claims fraud. The remainder of this section is devoted to an overview of supervised and unsupervised NNs.
3.1.1. Supervised neural networks

A sketch of the operation of a supervised NN is shown in Fig. 3. Since supervised learning is involved, the system will attempt to match a known output, such as firms that have become insolvent or claims which are fraudulent. The process begins by assigning random weights to the connections between each pair of neurons in the network. These weights represent the intensity of the connection between any two neurons and contain the memory of the network. Given the weights, the intermediate values (a hidden layer) and the output of the system are computed. If the output is optimal, the process is halted; if not, the weights are adjusted and the process is continued until an optimal solution is obtained or an alternate stopping rule is reached.
If the flow of information through the network is from the input to the output, it is known as a feedforward network. The NN is said to involve back-propagation since inadequacies in the output are fed back through the network so that the weights can be improved.
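The following skeleton sketches this train-until-adequate loop; the forward, adjust, and stopping_rule helpers are hypothetical placeholders supplied by the caller, and the demo values are invented.

```python
import random

def train_network(data, forward, adjust, stopping_rule, n_weights, seed=0):
    """Generic supervised-training skeleton. forward(w, x) computes the
    network output, adjust(w, x, error) returns updated weights, and
    stopping_rule(errors) decides when the outputs match well enough."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_weights)]  # random initial weights
    while True:
        errors = []
        for x, target in data:
            output = forward(w, x)      # compute hidden values and output
            error = target - output     # inadequacy in the output
            w = adjust(w, x, error)     # fed back to adjust the weights
            errors.append(error)
        if stopping_rule(errors):       # optimal output or alternate stopping rule
            return w

# Minimal usage with a linear "network" and a fixed error tolerance:
data = [([1.0, 0.0], 0.3), ([1.0, 1.0], 0.8)]
w = train_network(
    data,
    forward=lambda w, x: sum(wi * xi for wi, xi in zip(w, x)),
    adjust=lambda w, x, e: [wi + 0.1 * e * xi for wi, xi in zip(w, x)],
    stopping_rule=lambda errs: max(abs(e) for e in errs) < 1e-3,
    n_weights=2,
)
print(w)
```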
Fig. 3. The operation of a supervised NN.
Fig. 4. Three-layer neural network.
3.1.2. A three-layer neural network

An NN is composed of layers of neurons, an example of which is the three-layer NN depicted in Fig. 4. Extending the notation associated with Fig. 2, the first layer, the input layer, has three neurons labeled x_{0j}, j = 0, 1, 2; the second layer, the hidden processing layer, has three neurons labeled x_{1j}, j = 0, 1, 2; and the third layer, the output layer, has one neuron labeled x_{21}. There are two inputs, I1 and I2.

The neurons are connected by the weights w_{ijk}, where the subscripts i, j, and k refer to the ith layer, the jth node of the ith layer, and the kth node of the (i+1)th layer, respectively. Thus, for example, w_{021} is the weight connecting node 2 of the input layer (layer 0) to node 1 of the hidden layer (layer 1). It follows that the aggregation in the neural processing associated with the hidden neuron x_{11} results in z = x_{00} w_{001} + x_{01} w_{011} + x_{02} w_{021}, which is the input to the activation function.
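A small sketch of this aggregation for the hidden neuron x_11 follows; the weights and inputs are made up, and it is assumed (as the x_00 labeling suggests) that node 0 of each layer is a bias input fixed at 1.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Layer-0 node values: x0[0] is the bias node (fixed at 1),
# x0[1] and x0[2] carry the inputs I1 and I2 (illustrative values).
x0 = {0: 1.0, 1: 0.7, 2: 0.3}

# Weights w[(i, j, k)]: from node j of layer i to node k of layer i+1
# (illustrative values).
w = {(0, 0, 1): 0.2, (0, 1, 1): -0.5, (0, 2, 1): 0.8}

# Aggregation for x_11: z = x00*w001 + x01*w011 + x02*w021.
z = sum(x0[j] * w[(0, j, 1)] for j in range(3))
x11 = logistic(z)  # z is the input to the activation function
print(x11)
```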
3.1.3. The learning rules

The weights of the network serve as its memory, and so the network “learns” when its weights are updated. The updating is done using a learning rule, a common example of which is the Delta rule (Shepherd, 1997, p. 15), under which the weight change is the product of a learning rate, which controls the speed of convergence, an error signal, and the value associated with the jth node of the ith layer. The choice of the learning rate is critical: if its value is too large, the error term may not converge at all, and if it is too small, the weight updating process may get stuck in a local minimum and/or be extremely time intensive.
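As a sketch, the Delta rule update for a single weight is simply the stated product; the numerical values below are illustrative.

```python
def delta_rule_update(w, learning_rate, error_signal, node_value):
    """Delta rule: the weight change is the product of the learning rate,
    the error signal, and the value of the node feeding the weight."""
    return w + learning_rate * error_signal * node_value

# Illustrative values: too large a learning rate risks non-convergence,
# too small a one makes training extremely slow.
print(delta_rule_update(w=0.4, learning_rate=0.05, error_signal=0.2, node_value=0.7))
```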
3.1.4. The learning strategy of a neural network
The characteristic feature of NNs is their ability to learn, and the strategy by which this takes place involves training, testing, and validation. Briefly, the clean and scrubbed data is randomly subdivided into three subsets: T1, which is used for training the network; T2, which is used for testing the stopping rule; and T3, which is used for validating the resulting network. For example, T1, T2 and T3 may be 50%, 25% and 25% of the database, respectively. The stopping rule reduces the likelihood that the network will become overtrained, by stopping the training on T1 when the predictive ability of the network, as measured on T2, is no longer improved.
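A minimal sketch of the 50/25/25 split and of one possible stopping rule (the patience window is an invented detail):

```python
import random

def split_data(records, seed=0):
    """Randomly subdivide the data into training (T1, 50%),
    testing (T2, 25%) and validation (T3, 25%) subsets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    return (shuffled[: n // 2],            # T1
            shuffled[n // 2 : 3 * n // 4], # T2
            shuffled[3 * n // 4 :])        # T3

def should_stop(t2_error_history, patience=5):
    """Stopping rule: halt training on T1 once the error measured on T2
    has not improved for `patience` consecutive epochs."""
    if len(t2_error_history) <= patience:
        return False
    best_so_far = min(t2_error_history[:-patience])
    return min(t2_error_history[-patience:]) >= best_so_far
```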
3.1.5. Unsupervised neural networks

This section discusses one of the most common unsupervised NNs, the Kohonen network (Kohonen, 1988), which often is referred to as a self-organizing feature map (SOFM). The purpose of the network is to emulate our understanding of how the brain uses spatial mappings to model complex data structures. Specifically, the learning algorithm develops a mapping from the input patterns to the output units that embodies the features of the input patterns.
In contrast to the supervised network, where the neurons are arranged in layers, in the Kohonen network they are arranged in a planar configuration and the inputs are connected to each unit in the network. The configuration is depicted in Fig. 5.
Fig. 5. Two-dimensional Kohonen network.
Fig. 6. Operation of a 2D Kohonen network.
As indicated, the Kohonen SOFM is a two-layered network consisting of a set of input units in the input layer and a set of output units arranged in a grid called a Kohonen layer. The input and output layers are totally interconnected and there is a weight associated with each link, which is a measure of the intensity of the link.

A sketch of the operation of an unsupervised NN is shown in Fig. 6.
The first step in the process is to initialize the parameters and organize the data. This entails setting the iteration index, t, to 0, the interconnecting weights to small positive random values, and the learning rate to a value smaller than but close to 1. Each unit has a neighborhood of units associated with it, and empirical evidence suggests that the best approach is to have the neighborhoods fairly broad initially and then to have them decrease over time. Similarly, the learning rate is a decreasing function of time.

Each iteration begins by randomizing the training sample, which is composed of P patterns, each of which is represented by a numerical vector. For example, the patterns may be composed of solvent and insolvent insurance companies and the input variables may be financial ratios. Until the number of patterns used, p, exceeds the number available, P, the patterns are presented to the units on the grid, each of which is assigned the Euclidean distance between its connecting weights and the value of the input. This distance is given by [Σ_j (x_j − w_{ij})²]^{0.5}, where w_{ij} is the connecting weight between the jth input unit and the ith unit on the grid and x_j is the input from unit j. The unit which is the best match to the pattern, the winning unit, is used to adjust the weights of the units in its neighborhood. The process continues until the number of iterations exceeds some predetermined value, T.
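A compact sketch of this training process follows; the square neighborhood and the linear decay schedules are illustrative choices, not the paper's specification.

```python
import math
import random

def train_sofm(patterns, grid_size=5, dim=3, T=100, lr0=0.9, seed=0):
    """Kohonen SOFM sketch: find the winning unit by Euclidean distance,
    then pull that unit and its grid neighbors toward the input pattern.
    The learning rate and neighborhood radius both shrink over time."""
    rng = random.Random(seed)
    # Small positive random weights for each unit on the grid.
    w = {(r, c): [rng.random() * 0.1 for _ in range(dim)]
         for r in range(grid_size) for c in range(grid_size)}
    for t in range(T):
        lr = lr0 * (1 - t / T)                             # decreasing learning rate
        radius = max(1, int(grid_size / 2 * (1 - t / T)))  # shrinking neighborhood
        rng.shuffle(patterns)                              # randomize the training sample
        for x in patterns:
            # Winning unit: smallest Euclidean distance to the input.
            win = min(w, key=lambda u: math.dist(w[u], x))
            for u in w:                                    # adjust the winner's neighborhood
                if abs(u[0] - win[0]) <= radius and abs(u[1] - win[1]) <= radius:
                    w[u] = [wi + lr * (xi - wi) for wi, xi in zip(w[u], x)]
    return w
```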
In the foregoing training process, the winning units in the Kohonen layer develop clusters of neighbors which represent the class types found in the training patterns. As a result, patterns associated with each other in the input space will be mapped onto output units which also are associated with each other. Since the class of each cluster is known, the network can be used to classify the inputs.
3.2. Fuzzy logic

FL⁶ was developed as a response to the fact that most of the parameters we encounter in the real world are not precisely defined. For example, a particular investor may have a “high risk capacity” or the rate of return on an investment might be “around 6%”; the first of these is known as a linguistic variable while the second is known as a fuzzy number. These concepts and the structure of an FL system are discussed in this section.
3.2.1. The structure of a fuzzy logic system

The essential structure of an FL system is depicted in the flow chart shown in Fig. 7, which was adapted from Von Altrock (1997, p. 37).

Fig. 7. An FL system.
⁶ Following Zadeh (1994, p. 192), in this paper the term FL is used in the broad sense where it is essentially synonymous with fuzzy set theory.
Fig. 8. Fuzzy set of clients with high risk capacity.
In the figure, numerical variables are the input of the system. These variables are passed through a fuzzification stage, where they are transformed to linguistic variables and subjected to inference rules. The linguistic results are then transformed by a defuzzification stage into numerical values which become the output of the system.
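A toy end-to-end sketch of this fuzzification, inference, and defuzzification flow; the membership shapes, the rule base, and the output values are invented solely for illustration.

```python
def fuzzify(x):
    """Map a numerical input to grades of membership in linguistic values."""
    return {"low": max(0.0, min(1.0, (50 - x) / 50)),
            "high": max(0.0, min(1.0, (x - 50) / 30))}

def infer(grades):
    """Toy rule base, e.g. 'if risk capacity is high, equity share is 80'."""
    return {80.0: grades["high"], 20.0: grades["low"]}

def defuzzify(conclusions):
    """Centroid-style defuzzification: membership-weighted average."""
    total = sum(conclusions.values())
    return sum(v * m for v, m in conclusions.items()) / total if total else 0.0

print(defuzzify(infer(fuzzify(65))))  # numerical output of the system
```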
3.2.2. Linguistic variables

A linguistic variable (Zadeh, 1975a,b, 1981) is a variable whose values are expressed as words or sentences. Risk capacity, for example, may be viewed both as a numerical value ranging over the interval [0, 100], and a linguistic variable that can take on values like high, not very high, and so on. Each of these linguistic values may be interpreted as a label of a fuzzy subset of the universe of discourse X = [0, 100], whose base variable, x, is the generic numerical value risk capacity. Such a set, an example of which is shown in Fig. 8, is characterized by a membership function, µ_high(x), which assigns to each object a grade of membership ranging between zero and one. In this case, which represents the set of clients with a high risk capacity, individuals with a risk capacity of 50, or less, are assigned a membership grade of zero and those with a risk capacity of 80, or more, are assigned a grade of one. Between those risk capacities, (50, 80), the grade of membership is fuzzy.
Fuzzy sets are implemented by extending many of the basic identities that hold for ordinary sets. Thus, for example, the union of fuzzy sets A and B is the smallest fuzzy set containing both A and B, and the intersection of A and B is the largest fuzzy set which is contained in both A and B.
Representative insurance papers involving linguistic variables include DeWit (1982), the first FL paper in the area, which dealt with individual underwriting, and Young (1993, 1996), who modeled the selection and rate-changing process in group health insurance.
3.2.3. Fuzzy numbers

The general characteristic of a fuzzy number (Zadeh, 1975a,b; Dubois and Prade, 1980) is represented in Fig. 9. This shape of fuzzy number is referred to as a “flat” fuzzy number; if m_2 were equal to m_3, it would be referred to as a “triangular” fuzzy number. The points m_j, j = 1, 2, 3, 4, and the functions f_j(y|M), j = 1, 2, M a fuzzy number, which are inverse functions mapping the membership function onto the real line, characterize the fuzzy number. As indicated, a fuzzy number is usually taken to be a convex fuzzy subset of the real line.
As one would anticipate, fuzzy arithmetic can be applied to fuzzy numbers. Using the extension principle (Zadeh, 1975a,b), the nonfuzzy arithmetic operations can be extended to incorporate fuzzy sets and fuzzy numbers. Briefly, if ∗ is a binary operation such as addition (+) or min (∧), the fuzzy number z, defined by z = x ∗ y, is given as a fuzzy set by

µ_z(w) = ∨_{u,v} [µ_x(u) ∧ µ_y(v)],  u, v, w ∈ R,

subject to the constraint that w = u ∗ v, where µ_x, µ_y, and µ_z denote the membership functions of x, y, and z, respectively, and ∨_{u,v} denotes the supremum over (u, v).
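A sketch of the extension principle for fuzzy addition on discrete supports, where the supremum reduces to a maximum over all (u, v) with w = u + v; the fuzzy numbers below are illustrative.

```python
from collections import defaultdict

def extend(mu_x, mu_y, op):
    """Extension principle on discrete fuzzy sets: for each w = op(u, v),
    mu_z(w) is the max over (u, v) of min(mu_x(u), mu_y(v))."""
    mu_z = defaultdict(float)
    for u, mx in mu_x.items():
        for v, my in mu_y.items():
            w = op(u, v)
            mu_z[w] = max(mu_z[w], min(mx, my))
    return dict(mu_z)

# "About 2" + "about 3" yields a fuzzy set concentrated around 5.
about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_3 = {2: 0.5, 3: 1.0, 4: 0.5}
print(extend(about_2, about_3, lambda u, v: u + v))
```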
Representative insurance papers that focused on fuzzy numbers include Lemaire (1990), who showed how to compute a fuzzy premium for a pure endowment policy; Ostaszewski (1993), who extended Lemaire; and Cummins and Derrig (1997), who addressed the financial pricing of property-liability insurance contracts.
Fig. 9. A fuzzy number.
A large number of potential FL applications in insurance are mentioned in Ostaszewski (1993). Readers interested in a grand tour of the first 30 years of FL are urged to read the collection of Zadeh’s papers contained in Yager et al. (1987) and Klir and Yuan (1996).
3.3. Genetic algorithms

GAs are automated heuristics that perform optimization by emulating biological evolution. They are particularly well suited for solving problems that involve loose constraints, such as discontinuity, noise, high dimensionality, and multimodal objective functions. Examples of GA applications in the insurance area include Wendt (1995), who used a GA to build a portfolio efficient frontier (a set of portfolios with optimal combinations of risk and returns), and Tan (1997), who developed a flexible framework to measure the profitability, risk, and competitiveness of insurance products.
GAs can be thought of as an automated, intelligent approach to trial and error, based on principles of natural selection. In this sense, they are modern successors to Monte Carlo search methods. The flow chart in Fig. 10 gives a representation of the process.
As indicated, GAs are iterative procedures, where each iteration g represents a generation. The process starts with an initial population of solutions, P(0), which are randomly generated. From this initial population, the best solutions are “bred” with each other and the worst are discarded. The process ends when the termination criterion is satisfied.

Fig. 10. Flow chart of GA.

For a simple example, suppose that the problem is to find, by trial and error, the value of x, x = 0, 1, . . . , 31, which maximizes f(x), where f(x) is the output of a black box. Using the methodology of Holland (1975), an initial population of potential solutions {y_j | j = 1, . . . , N} would be randomly generated, where each solution would be represented in binary form. Thus, if 0 and 31 were in this initial population of solutions, they would be represented as 00000 and 11111, respectively.⁷

⁷ 31 = 1×2⁴ + 1×2³ + 1×2² + 1×2¹ + 1×2⁰.
A simple measure of the fitness of y_j is p_j = f(y_j) / Σ_j f(y_j), and the solutions with the highest p_j’s would be bred with one another.

There are three ways to develop a new generation of solutions: reproduction, crossover and mutation. Reproduction adds a copy of a fit individual to the next generation. In the previous example, reproduction would take place by randomly choosing a solution from the population, where the probability a given solution would be chosen depends on its p_j value. Crossover emulates the process of creating children, and involves the creation of new individuals (children) from two fit parents by a recombination of their genes (parameters). In the example, crossover would take place in two steps: first, the fit parents would be randomly chosen on the basis of their p_j values; second, there would be a recombination of their genes. If, for example, the fit parents were 11000 and 01101, crossover might result in the two children 11001 and 01100. Under mutation, there is a small probability that some of the gene values in the population will be replaced with randomly generated values. This has the potential effect of introducing good gene values that may not have occurred in the initial population or which were eliminated during the iterations. In this illustration, the process is repeated until the new generation has the same number of individuals, N, as the current one.
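A condensed sketch of this GA on 5-bit strings; the black-box f here is an invented stand-in, and the population size and mutation rate are illustrative.

```python
import random

def f(x):
    """Stand-in for the black box whose output is to be maximized."""
    return x * (31 - x)  # illustrative; peaks at x = 15 or 16

def run_ga(n=20, generations=50, p_mutate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [rng.randint(0, 31) for _ in range(n)]   # random initial 5-bit solutions
    for _ in range(generations):
        fits = [f(y) for y in pop]
        total = sum(fits)
        # p_j = f(y_j) / sum_j f(y_j); fall back to uniform if all zero.
        weights = [fit / total for fit in fits] if total else None
        children = []
        while len(children) < n:                   # breed a full new generation
            a, b = rng.choices(pop, weights=weights, k=2)  # fitness-proportional parents
            cut = rng.randint(1, 4)                # crossover point inside the 5 bits
            mask = (1 << cut) - 1
            child = (a & ~mask) | (b & mask)       # recombine the parents' genes
            for bit in range(5):                   # mutation: rarely flip a gene
                if rng.random() < p_mutate:
                    child ^= 1 << bit
            children.append(child)
        pop = children
    return max(pop, key=f)

print(run_ga())  # a (near-)maximizer of f
```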
3.4. Hybrid systems

While the foregoing discussions focused on each technology separately, a natural evolution in soft computing has been the emergence of hybrid systems, where the technologies are used simultaneously. FL-based technologies can be used to design NNs or GAs, with the effect of increasing their capability to display good performance across a wide range of complex problems with imprecise data. Thus, for example, a fuzzy NN can be constructed where the NN possesses fuzzy signals and/or has fuzzy weights. Conversely,
FL can use technologies from other fields, like NNs or GAs, to deduce or to tune, from observed data, the
membership functions in fuzzy rules, and may also structure or learn the rules themselves.
4. Functional classes