generates $Q$ as

$$Q_j = f\!\left(\bar{w}_{jk} + \sum_{i=1}^{I} w_{ijk}\, x_i\right), \quad k = 1 \qquad (2a)$$
Second, ANN presents Q to the output layer to generate y as
$$y_j = f\!\left(\bar{w}_{jk} + \sum_{i=1}^{J} w_{ijk}\, Q_i\right), \quad k = 2 \qquad (2b)$$
Eqn (2b) gives the ANN response $y$ to $x$, and $y$ may be expressed as
$$y_j = G_j(\bar{w}, w, x) \qquad (3a)$$

$$G(\cdot) = \left[ G_1(\cdot), G_2(\cdot), \ldots, G_K(\cdot) \right]^{T} \qquad (3b)$$
where $G(\cdot)$ = underlying $x$–$y$ response vector approximated by ANN; and $G_j(\cdot)$ = $j$th component of $G(\cdot)$. Thus, ANN may be viewed as following the belief that intelligence manifests itself from the communication of a large number of simple processing elements.[7]
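The two-step response of eqns (2a) and (2b) can be sketched as a feed-forward pass. This is an illustrative sketch only: the layer sizes $I$, $J$, $K$, the array names (`b1`, `W1`, `b2`, `W2` for the bias weights $\bar{w}$ and connection weights $w$), and the random initialization are assumptions, not values from the paper.

```python
import numpy as np

def feed_forward(x, b1, W1, b2, W2, f=np.tanh):
    """Two-layer feed-forward pass, y = G(w-bar, w, x).
    x: (I,) input; b1: (J,) layer-1 biases; W1: (J, I) layer-1 weights;
    b2: (K,) layer-2 biases; W2: (K, J) layer-2 weights."""
    Q = f(b1 + W1 @ x)   # eqn (2a): hidden-layer output, layer k = 1
    y = f(b2 + W2 @ Q)   # eqn (2b): output-layer response, layer k = 2
    return y

# Toy dimensions and randomly initialized weights (assumed for illustration).
rng = np.random.default_rng(0)
I, J, K = 3, 5, 2
x = rng.normal(size=I)
y = feed_forward(x, rng.normal(size=J), rng.normal(size=(J, I)),
                 rng.normal(size=K), rng.normal(size=(K, J)))
print(y.shape)  # (2,)
```

With `f = np.tanh`, every component of `y` lies in (−1, 1), matching the bounded transfer functions discussed next.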
The transfer function, $f(\cdot)$, is usually selected to be a nonlinear, smooth, and monotonically increasing function, and the two common forms are

$$f(\cdot) = \mathrm{sgm}(\cdot) = \frac{1}{1 + \exp(-\,\cdot)} \qquad (4a)$$

$$f(\cdot) = \tanh(\cdot) = \frac{\exp(\cdot) - \exp(-\,\cdot)}{\exp(\cdot) + \exp(-\,\cdot)} \qquad (4b)$$
where $\mathrm{sgm}(\cdot)$ = sigmoid function and $\tanh(\cdot)$ = hyperbolic tangent function. Fig. 1(b) shows these functions, which are generally assumed to be equally applicable.[2]
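The two transfer functions of eqns (4a) and (4b) are direct to implement; a minimal sketch (function names `sgm` and `tanh` follow the paper's notation):

```python
import math

def sgm(v):
    """Sigmoid, eqn (4a): smooth, monotonically increasing, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def tanh(v):
    """Hyperbolic tangent, eqn (4b): same shape, range (-1, 1)."""
    return (math.exp(v) - math.exp(-v)) / (math.exp(v) + math.exp(-v))

print(sgm(0.0), tanh(0.0))  # 0.5 0.0
```

The two forms are related by $\tanh(v) = 2\,\mathrm{sgm}(2v) - 1$, i.e. one is a rescaled and shifted version of the other, which is one way to see why they are generally assumed to be equally applicable.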
3 ARTIFICIAL NEURAL NETWORK DEVELOPMENT
ANN training is performed to determine the weights, $\bar{w}$ and $w$, optimally, using an appropriate algorithm. Several algorithms exist to perform the training, and each is based on one of three approaches: unsupervised training, supervised training, or reinforced training; for example, the Hebbian, back-propagation, and genetic algorithms are based on the unsupervised, supervised, and reinforced approaches, respectively.[12]
In the next section, the ANN training procedure and two training
algorithms are discussed.
3.1 Training procedure
ANN is developed in two phases: a training (accuracy or calibration) phase and a testing (generalization or validation) phase. In general, a subset of patterns is first sampled from the domain into a combined training and testing subset, $S$, and $S$ is then exhausted by allocating its patterns to a training subset, $S_1$, and a testing subset, $S_2$. The notations used are
$$d = [d_1, d_2, \ldots, d_K] \qquad (5a)$$

$$S_1 = \{(x_1, d_1), (x_2, d_2), \ldots, (x_{P_1}, d_{P_1})\} \qquad (5b)$$

$$S_2 = \{(x_1, d_1), (x_2, d_2), \ldots, (x_{P_2}, d_{P_2})\} \qquad (5c)$$
where $d$ = desired output vector of $K$ components corresponding to $x$; $S_1$ = training subset with $P_1$ patterns; and $S_2$ = testing subset with $P_2$ patterns. As $S_2$ represents the domain only partially, testing as a synonym for validation should be viewed with caution.

In the ANN training phase, the objective is to determine the $\bar{w}$ and $w$ that minimize a specific error criterion defined to measure an average difference between the desired responses and the ANN responses for the $P_1$ patterns contained in $S_1$. As such, ANN training becomes an unconstrained, nonlinear optimization problem in the weight space, and an appropriate algorithm may be used to solve this problem. In the ANN testing phase, the objective is to determine the acceptability of the $\bar{w}$ and $w$, thus obtained, in minimizing the same error criterion for the $P_2$ patterns contained in $S_2$. As the $S_2$ set is not used in determining $\bar{w}$ and $w$, ANN testing assesses the domain generalization achieved by the trained ANN and, thereby, builds the confidence levels expected from the trained ANN for future predictions. In the next section, the two most commonly used training algorithms, the back-propagation algorithm (BPA) and the genetic algorithm (GA), are discussed.
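The partition of $S$ into $S_1$ and $S_2$ described above can be sketched as follows; the 80/20 allocation, the helper name `split_patterns`, and the toy patterns are assumptions for illustration only.

```python
import random

def split_patterns(S, P1, seed=0):
    """Partition a sampled pattern set S of (x, d) pairs into a training
    subset S1 with P1 patterns and a testing subset S2 with the rest,
    as in eqns (5b)-(5c)."""
    S = list(S)
    random.Random(seed).shuffle(S)   # random allocation of patterns
    return S[:P1], S[P1:]            # S exhausted: every pattern lands in S1 or S2

# Toy (x, d) patterns; an 80/20 split is an arbitrary illustrative choice.
S = [((i, i + 1), (2 * i,)) for i in range(10)]
S1, S2 = split_patterns(S, P1=8)
print(len(S1), len(S2))  # 8 2
```

Because $S_2$ is withheld from training, its error measures generalization; but as noted above, $S_2$ covers the domain only partially, so "testing" should not be read as full validation.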
3.2 Back-propagation algorithm
Rumelhart et al.[7] presented the standard BPA, a gradient-based algorithm. Since then, BPA has undergone many modifications to overcome its limitations, and NeuralWare, Inc.[12] presents many such modifications. In general, BPA determines $\bar{w}$ and $w$ in two steps. First, BPA initializes $\bar{w}$ and $w$ with small, random values. Second, BPA starts updating $\bar{w}$ and $w$ using $S_1$. During an update from $m - 1$ to $m$ ($m$ = updating index), BPA selects a random integer $p \in [1, P_1]$ and uses $(x_p, d_p)$ to minimize the mean squared error function defined as
$$E_p^m = \frac{1}{2} \sum_{i=1}^{K} \left( y_{pi}^{m-1} - d_{pi} \right)^2 \qquad (6a)$$

$$y_{pi}^m = G_i\!\left( \bar{w}^m, w^m, x_p \right) \qquad (6b)$$
where $E_p^m$ = mean squared error of $(x_p, d_p)$ after the $m$th update; $\bar{w}^m$ = $\bar{w}$ after the $m$th update; $w^m$ = $w$ after the $m$th update; and $d_{pj} = d_j$ for $(x_p, d_p)$. As such, the updating equations are written as
$$\bar{w}_{jk}^m = \bar{w}_{jk}^{m-1} - \mu \frac{\partial E_p^m}{\partial \bar{w}_{jk}^{m-1}} + \gamma \left( \bar{w}_{jk}^{m-1} - \bar{w}_{jk}^{m-2} \right), \quad \mu \in (0, 1),\; \gamma \in [0, 1] \qquad (7a)$$

$$w_{ijk}^m = w_{ijk}^{m-1} - \mu \frac{\partial E_p^m}{\partial w_{ijk}^{m-1}} + \gamma \left( w_{ijk}^{m-1} - w_{ijk}^{m-2} \right), \quad \mu \in (0, 1),\; \gamma \in [0, 1] \qquad (7b)$$

148 J. Morshed, J. J. Kaluarachchi
where $\mu$ = training rate and $\gamma$ = momentum factor. Note that $\gamma = 0$ for $m = 1$. BPA continues updating $\bar{w}$ and $w$ until $m = \bar{M}$, where $\bar{M}$ is a user-defined number of updates.
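The update rule of eqns (7a) and (7b) is gradient descent with a momentum term, and its mechanics can be sketched on a toy problem. The function name, the values of $\mu$ and $\gamma$, and the quadratic error used here are illustrative assumptions, not choices from the paper.

```python
def bpa_update(w, w_prev, grad, mu, gamma):
    """One update m-1 -> m per eqns (7a)-(7b):
    w_m = w_{m-1} - mu * dE/dw_{m-1} + gamma * (w_{m-1} - w_{m-2})."""
    return w - mu * grad + gamma * (w - w_prev)

# Minimize the toy error E(w) = 0.5 * (w - 3)^2, whose gradient is (w - 3).
w_prev, w = 0.0, 0.0
for m in range(1, 200):
    gamma_m = 0.0 if m == 1 else 0.5   # gamma = 0 for m = 1, as stated above
    w, w_prev = bpa_update(w, w_prev, grad=w - 3.0, mu=0.1, gamma=gamma_m), w
print(round(w, 4))  # 3.0
```

The momentum term $\gamma (w^{m-1} - w^{m-2})$ reuses the previous step direction, which smooths the stochastic, per-pattern gradients of eqn (6a) and typically speeds convergence.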
3.3 Genetic algorithm