differ in their output ranges (Fig. 1b). The output range of sgm(·) is (0, 1), while the output range of tanh(·) is (−1, +1). As BPA uses the output of a transfer function as a multiplier in the weight update, sgm(·) produces a small multiplier when the summation is small, and vice versa. Therefore, there is a bias towards training higher desired outputs. In contrast, tanh(·) produces multipliers of equal magnitude whether the summation is small or large and, therefore, tanh(·) leads to no bias towards training lower or higher desired outputs.
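As a minimal numerical sketch of this asymmetry (an illustration added here, not part of the original study), the following Python snippet evaluates both transfer functions at summations of equal magnitude and opposite sign:

    import numpy as np

    def sgm(x):
        # Logistic sigmoid: output range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    # Summations of equal magnitude but opposite sign.
    s = np.array([-4.0, 4.0])

    # sgm(.) yields a small multiplier for the negative summation and a
    # large one for the positive summation, biasing the weight updates.
    print("sgm :", sgm(s))      # approx. [0.018, 0.982]

    # tanh(.) yields multipliers of equal magnitude for both summations,
    # so neither low nor high desired outputs are favoured.
    print("tanh:", np.tanh(s))  # approx. [-0.999, 0.999]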
The ANN results are further studied to determine the effect of the number of hidden nodes on the weight distribution. Fig. 5 shows the weight histograms of the 1-J-1 ANN for J = {3, 15, 30}, and the weights are observed to decrease with increasing J. As J is increased, more terms are considered in the argument of an output neuron (eqn (2b)), and its weighted sum becomes very small or very large if the weights do not decrease. At a very small or a very large sum, the output of a transfer function is essentially independent of the sum (Fig. 1b), and the ANN may fail to approximate the input–output response robustly. As such, BPA decreases the weights with increasing J to train the ANN robustly, which makes the ANN insensitive to J over a wide range. However, this observation contradicts recent observations by others suggesting poor ANN testing performance as J increases beyond an optimal value.4,5 As such, the present example suggests that the ANN is sufficiently, but not completely, insensitive to J around the optimal value.
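The need for the weights to shrink can be seen in a small numerical sketch (hidden-layer outputs fixed at a hypothetical value of 0.5; an illustration added here, not the paper's computation):

    import numpy as np

    for J in (3, 15, 30):
        h = np.full(J, 0.5)           # hypothetical hidden outputs
        s_fixed = np.sum(1.0 * h)     # weights held at 1 regardless of J
        s_scaled = np.sum(h / J)      # weights shrinking as ~1/J
        # tanh saturates once |s| is large, so the output neuron becomes
        # insensitive to its inputs unless the weights decrease with J.
        print(f"J={J:2d}  tanh(fixed)={np.tanh(s_fixed):.4f}  "
              f"tanh(scaled)={np.tanh(s_scaled):.4f}")

With fixed weights the summation grows with J and tanh(·) saturates near 1, whereas the scaled weights keep the summation within the sensitive range of the transfer function.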
6.2 Example 2: simulation of BTC with nonlinear adsorption
In the previous example, the applicability of ANN in predicting C was studied when the mass transport simulation is simple due to linear solid phase adsorption. In the present example, the applicability of ANN in predicting C is studied when the mass transport is complicated due to nonlinear solid phase adsorption. As such, the ANN is used to simulate C with n = {0.5, 1, 2} (Fig. 2). In this case, C is a nonlinear, nonmonotonic function of continuous t and discrete n, which helps to assess the applicability of ANN in simulating such a function. As the inputs are only t and n (I = 2), a hidden layer of five neurons (J = 5) is used,3 and a 2-5-1 ANN is used to simulate C. In predicting C for 160 days, HYDRUS generated a total of 5405 patterns of (C, t, n). These patterns are included in S, and the allocation method with f̃ = 0.5 and r = 1 was used for allocating S to S_1 and S_2. The ANN is trained using the tanh(·) transfer function.
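A minimal sketch of such a seeded random allocation is given below; it assumes each pattern joins the training set S_1 with probability f̃ (the paper's allocation method is defined in an earlier section and may differ in detail):

    import random

    def allocate(S, f_expected=0.5, seed=1):
        # Allocate each pattern to training set S1 with probability
        # f_expected, otherwise to testing set S2, using seed r.
        rng = random.Random(seed)
        S1, S2 = [], []
        for pattern in S:
            (S1 if rng.random() < f_expected else S2).append(pattern)
        return S1, S2

    # e.g. 5405 (C, t, n) placeholder patterns split roughly in half
    S = [(0.0, float(t), 1.0) for t in range(5405)]
    S1, S2 = allocate(S, f_expected=0.5, seed=1)
    print(len(S1), len(S2))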
Fig. 6 shows the predicted C, and the results show that the ANN failed to predict C satisfactorily. For n = {0.5, 1.0}, the ANN introduces large errors in the predicted spread and maximum concentration and predicts the desired responses only qualitatively. However, for n = 2, the ANN prediction is poor both qualitatively and quantitatively. In the present example, C varies more rapidly with t than with n, and the ANN finds it difficult to cope with these different paces of variation. In order to improve the robustness of the ANN, its sensitivity to f̃ = {0.25, 0.5, 0.75}, r = {1, 5, 257}, and J = {3, 5, 10} was analyzed, and the results are summarized in Table 2. For f̃ = {0.25, 0.5, 0.75}, ANN performance increases with increasing f̃. As f̃ increases, the ANN receives more patterns for training, acquires a clearer picture of the domain, and improves its generalization. For r = {1, 5, 257}, ANN performance is intermediate for r = 1, worst for r = 5, and best for r = 257. As r varies, the ANN acquires different weight configurations due to the entrapment of BPA at different local optimal solutions.
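This sensitivity study has the structure of a one-at-a-time sweep around the base case (f̃ = 0.5, r = 1, J = 5). A sketch of such a driver loop is shown below; train_and_score is a hypothetical stand-in for training the ANN with the given settings and returning the correlation R:

    import numpy as np

    def train_and_score(f_expected, seed, J):
        # Hypothetical placeholder: train the ANN with these settings
        # and return the correlation R between desired and predicted C.
        rng = np.random.default_rng(seed + J)
        return 0.85 + 0.07 * rng.random()   # dummy value, not real data

    base = dict(f_expected=0.5, seed=1, J=5)
    sweeps = {"f_expected": [0.25, 0.5, 0.75],
              "seed": [1, 5, 257],
              "J": [3, 5, 10]}
    for name, values in sweeps.items():
        for v in values:
            params = dict(base, **{name: v})
            print(name, v, round(train_and_score(**params), 3))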
Fig. 5. Weight histograms of the 1-J-1 ANN (J = 3, 15, 30) for predicting the breakthrough concentration of Example 1 (relative frequency versus weight).
Fig. 6. BTCs of Example 2 using HYDRUS and the 2-5-1 ANN for different n values (n = 0.5, 1, 2); concentration (µg/L) versus time (days), with the MCL indicated.
Table 2. Sensitivity of the 2-5-1 ANN to expected fraction f̃, seed r, and hidden nodes J for Example 2

f̃      r      J     R
0.25   1      5     0.875
0.50   1      5     0.896
0.75   1      5     0.914
0.50   1      5     0.896
0.50   5      5     0.867
0.50   257    5     0.901
0.50   1      3     0.867
0.50   1      5     0.896
0.50   1      10    0.912
For J = {3, 5, 10}, ANN performance increases with increasing J. As J increases, the ANN acquires more freedom for approximating C. This observation also suggests that J_opt > 10 > J_HN (where J_opt is the J at optimal performance and J_HN is the J recommended by Hecht-Nielsen3) for the present example.
In investigating this problem, an interesting phenomenon is noted; i.e. t changes more rapidly than n. In these scenarios, t is the primary independent variable and is supposed to change more rapidly than other parameters such as n, and the weights may fail to adjust to inputs with such disproportionate variations. Although this problem may be handled to some extent by increasing J, this approach leads to a larger ANN, a larger optimization problem, and greater difficulty in training. Alternatively, an innovative ANN architecture may be considered. Instead of simulating C as a continuous function of t, the ANN may be used to simulate C as a discrete function of time. Notationally, the continuous function C = C(t, n) may be replaced by the discrete function C = {C(t_i, n), i = 1, 2, …, T}, and a 1-3-T ANN may be used to simulate this discrete function. As such, t is distributed across the output layer, and this approach resembles the concept of distributed input–output representation.
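A minimal forward-pass sketch of this 1-3-T idea is given below (untrained, with assumed sizes: T = 160 outputs, one per simulated day, and a linear output layer; these choices are illustrative, not taken from the study):

    import numpy as np

    rng = np.random.default_rng(0)

    T = 160                              # one output per simulated day
    W1 = rng.normal(0.0, 0.5, (3, 1))    # input n -> 3 hidden neurons
    b1 = np.zeros((3, 1))
    W2 = rng.normal(0.0, 0.5, (T, 3))    # hidden -> T concentration outputs
    b2 = np.zeros((T, 1))

    def forward(n_value):
        # Return the T-vector of concentrations C(t_i, n), i = 1..T,
        # so time is distributed across the output layer rather than
        # entering as a continuous input.
        h = np.tanh(W1 * n_value + b1)   # hidden layer, tanh transfer
        return (W2 @ h + b2).ravel()     # linear output layer (assumed)

    print(forward(0.5).shape)            # (160,): the whole BTC at once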
6.3 Example 3: use of ANN to simulate BTC parameters