
generates Q as

$$Q_j = f\!\left(\bar{w}_{jk} + \sum_{i=1}^{I} w_{ijk}\, x_i\right), \quad k = 1 \qquad (2a)$$

Second, ANN presents Q to the output layer to generate y as

$$y_j = f\!\left(\bar{w}_{jk} + \sum_{i=1}^{J} w_{ijk}\, Q_i\right), \quad k = 2 \qquad (2b)$$

Eqn (2b) is the ANN response y to x, and y may be expressed as

$$y_j = G_j(\bar{w}, w, x) \qquad (3a)$$

$$G(\cdot) = \left[G_1(\cdot), G_2(\cdot), \ldots, G_K(\cdot)\right]^{\mathrm{T}} \qquad (3b)$$

where $G(\cdot)$ = underlying x–y response vector approximated by ANN; and $G_j(\cdot)$ = jth component of $G(\cdot)$. Thus, ANN may be viewed as following the belief that intelligence manifests itself from the communication of a large number of simple processing elements.7

The transfer function, $f(\cdot)$, is usually selected to be a nonlinear, smooth, and monotonically increasing function, and the two common forms are

$$f(\cdot) = \mathrm{sgm}(\cdot) = \frac{1}{1 + \exp(-\,\cdot)} \qquad (4a)$$

$$f(\cdot) = \tanh(\cdot) = \frac{\exp(\cdot) - \exp(-\,\cdot)}{\exp(\cdot) + \exp(-\,\cdot)} \qquad (4b)$$

where $\mathrm{sgm}(\cdot)$ = sigmoid function and $\tanh(\cdot)$ = hyperbolic tangent function. Fig. 1b shows these functions, which are generally assumed to be equally applicable.2

3 ARTIFICIAL NEURAL NETWORK DEVELOPMENT

ANN training is performed to determine the weights, $\bar{w}$ and $w$, optimally, and training is performed using an appropriate algorithm. Several algorithms exist to perform the training, and each algorithm is based on one of three approaches: unsupervised training, supervised training, or reinforced training; for example, the Hebbian, back-propagation, and genetic algorithms are based on the unsupervised, supervised, and reinforced approaches, respectively.12 In the sections that follow, the ANN training procedure and two training algorithms are discussed.
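As a point of reference for the training discussion below, the following sketch (in Python with NumPy; the variable names, array shapes, and random initialization are assumptions for illustration, not part of the original formulation) evaluates the feedforward response $y = G(\bar{w}, w, x)$ of eqns (2a) and (2b) using the sigmoid transfer function of eqn (4a).

```python
import numpy as np

def sgm(z):
    """Sigmoid transfer function, eqn (4a)."""
    return 1.0 / (1.0 + np.exp(-z))

def ann_response(x, w_bar, w):
    """Feedforward response y = G(w_bar, w, x) of eqns (2a)-(2b).

    x      : input vector, shape (I,)
    w_bar  : bias vectors [shape (J,), shape (K,)] for layers k = 1, 2
    w      : weight matrices [shape (I, J), shape (J, K)] for layers k = 1, 2
    """
    # Eqn (2a): hidden-layer response Q, layer k = 1
    Q = sgm(w_bar[0] + x @ w[0])
    # Eqn (2b): output-layer response y, layer k = 2
    y = sgm(w_bar[1] + Q @ w[1])
    return y

# Illustrative case with I = 3 inputs, J = 4 hidden nodes, K = 2 outputs
rng = np.random.default_rng(0)
w_bar = [rng.normal(size=4), rng.normal(size=2)]
w = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
print(ann_response(np.array([0.2, 0.5, 0.8]), w_bar, w))
```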

3.1 Training procedure

ANN is developed in two phases: a training (accuracy or calibration) phase and a testing (generalization or validation) phase. In general, a subset of patterns, S, is first sampled from the domain, and S is then exhausted by allocating its patterns to a training subset, $S_1$, and a testing subset, $S_2$. The notations used are

$$d = [d_1, d_2, \ldots, d_K] \qquad (5a)$$

$$S_1 = \left\{(x_1, d_1), (x_2, d_2), \ldots, (x_{P_1}, d_{P_1})\right\} \qquad (5b)$$

$$S_2 = \left\{(x_1, d_1), (x_2, d_2), \ldots, (x_{P_2}, d_{P_2})\right\} \qquad (5c)$$

where $d$ = desired output vector of K components corresponding to $x$; $S_1$ = training subset with $P_1$ patterns; and $S_2$ = testing subset with $P_2$ patterns. As $S_2$ represents the domain only partially, the use of testing as a synonym for validation should be viewed with caution. In the ANN training phase, the objective is to determine the $\bar{w}$ and $w$ that minimize a specific error criterion defined to measure the average difference between the desired responses and the ANN responses for the $P_1$ patterns contained in $S_1$. As such, ANN training becomes an unconstrained, nonlinear optimization problem in the weight space, and an appropriate algorithm may be used to solve this problem. In the ANN testing phase, the objective is to determine the acceptability of the $\bar{w}$ and $w$, thus obtained, in minimizing the same error criterion for the $P_2$ patterns contained in $S_2$. As the $S_2$ set is not used in determining $\bar{w}$ and $w$, ANN testing assesses the domain generalization achieved by the trained ANN and, thereby, establishes the confidence levels expected from the trained ANN for future predictions. In the next sections, the two most commonly used training algorithms, the back-propagation algorithm (BPA) and the genetic algorithm (GA), are discussed.
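A minimal sketch of the pattern allocation of eqns (5b) and (5c) and of an average error criterion of the kind described above is given next; the split fraction, the function names, and the `respond` callable (standing in for the ANN response $G(\bar{w}, w, x)$) are assumptions for illustration.

```python
import numpy as np

def split_patterns(X, D, p1_fraction=0.7, seed=0):
    """Allocate the sampled patterns (x_p, d_p) to a training subset S1
    and a testing subset S2, as in eqns (5b) and (5c)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    p1 = int(p1_fraction * len(X))          # P1 patterns go to S1, the rest to S2
    S1 = [(X[i], D[i]) for i in order[:p1]]
    S2 = [(X[i], D[i]) for i in order[p1:]]
    return S1, S2

def mean_error(S, respond):
    """Average of 0.5 * sum_i (y_i - d_i)^2 over the patterns in S,
    where respond(x) is the ANN response to the input vector x."""
    return np.mean([0.5 * np.sum((respond(x) - d) ** 2) for x, d in S])
```

Evaluating `mean_error` on $S_1$ corresponds to the training criterion, and evaluating the same function on $S_2$ with the trained weights corresponds to the testing phase.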

3.2 Back-propagation algorithm

Rumelhart et al.7 presented the standard BPA, a gradient-based algorithm. Since then, BPA has undergone many modifications to overcome its limitations, and NeuralWare, Inc.12 presents many such modifications. In general, BPA determines $\bar{w}$ and $w$ in two steps. First, BPA initializes $\bar{w}$ and $w$ with small, random values. Second, BPA starts updating $\bar{w}$ and $w$ using $S_1$. During an update from $m - 1$ to $m$ ($m$ = updating index), BPA selects a random integer $p \in [1, P_1]$ and uses $(x_p, d_p)$ to minimize the mean squared error function defined as

$$E_p^m = \frac{1}{2} \sum_{i=1}^{K} \left( y_{pi}^{m-1} - d_{pi} \right)^2 \qquad (6a)$$

$$y_{pi}^{m} = G_i\!\left(\bar{w}^{m}, w^{m}, x_p\right) \qquad (6b)$$

where $E_p^m$ = mean squared error of $(x_p, d_p)$ after the mth update; $\bar{w}^{m}$ = $\bar{w}$ after the mth update; $w^{m}$ = $w$ after the mth update; and $d_{pj} = d_j$ for $(x_p, d_p)$. As such, the updating equations are written as

$$\bar{w}_{jk}^{m} = \bar{w}_{jk}^{m-1} - \mu \frac{\partial E_p^m}{\partial \bar{w}_{jk}^{m-1}} + \psi \left( \bar{w}_{jk}^{m-1} - \bar{w}_{jk}^{m-2} \right), \quad \mu \in (0, 1), \; \psi \in [0, 1] \qquad (7a)$$

$$w_{ijk}^{m} = w_{ijk}^{m-1} - \mu \frac{\partial E_p^m}{\partial w_{ijk}^{m-1}} + \psi \left( w_{ijk}^{m-1} - w_{ijk}^{m-2} \right), \quad \mu \in (0, 1), \; \psi \in [0, 1] \qquad (7b)$$

where $\mu$ = training rate and $\psi$ = momentum factor. Note that $\psi = 0$ for $m = 1$. BPA continues updating $\bar{w}$ and $w$ until $m = \bar{M}$, where $\bar{M}$ is a user-defined number.
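A minimal sketch of this update rule follows, assuming for simplicity that the weights of one layer are held in a single NumPy array and that a routine `gradient(w, x_p, d_p)` returning $\partial E_p / \partial w$ is available; computing that gradient by back-propagating the output error through eqns (2a) and (2b) is the part omitted here, and the values of `mu`, `psi`, and `M_bar` are illustrative, not recommendations.

```python
import numpy as np

def bpa_update(w, w_prev, grad_E, mu=0.1, psi=0.5):
    """One weight update of eqns (7a)/(7b): a gradient-descent step on E_p
    plus a momentum term built from the previous change in the weights."""
    return w - mu * grad_E + psi * (w - w_prev)

def train(weights, S1, gradient, M_bar=1000, mu=0.1, psi=0.5):
    """Repeat the update for m = 1, ..., M_bar, each time drawing a random
    pattern (x_p, d_p) from S1; psi = 0 on the first update, as noted above."""
    rng = np.random.default_rng(0)
    prev = weights
    for m in range(1, M_bar + 1):
        x_p, d_p = S1[rng.integers(len(S1))]   # random p in [1, P1]
        grad = gradient(weights, x_p, d_p)     # dE_p/dw (assumed routine)
        new = bpa_update(weights, prev, grad, mu, psi if m > 1 else 0.0)
        weights, prev = new, weights
    return weights
```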

3.3 Genetic algorithm