
differ in their output ranges (Fig. 1(b)). The output range of sgm(·) is (0, 1), while the output range of tanh(·) is (−1, +1). As BPA uses the output of a transfer function as a multiplier in the weight update, sgm(·) produces a small multiplier when the summation is small and a large multiplier when it is large; there is therefore a bias towards training higher desired outputs. In contrast, tanh(·) produces multipliers of equal magnitude whether the summation is small or large, and therefore introduces no bias towards training lower or higher desired outputs.

The ANN results are further studied to determine the effect of the number of hidden nodes on the weight distribution. Fig. 5 shows the weight histograms of the 1-J-1 ANN for J = {3, 15, 30}, and the weights are observed to decrease with increasing J. As J is increased, more terms are considered in the argument of the output neuron (eqn (2b)), and its weighted sum becomes very small or very large if the weights do not decrease. At a very small or very large sum, the output of a transfer function is essentially independent of the sum (Fig. 1(b)), and the ANN may fail to approximate the input–output response robustly. As such, BPA decreases the weights with increasing J to train the ANN robustly, which makes the ANN insensitive to J over a wide range. However, this observation contradicts recent observations by others suggesting poor ANN testing performance as J increases beyond an optimal value.4,5 As such, the present example suggests that the ANN is sufficiently, but not completely, insensitive to J around the optimal value.
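As a minimal numerical sketch of this bias (a hypothetical illustration, not the paper's implementation): in BPA the update to a weight feeding a downstream neuron is proportional to the upstream neuron's output, so the magnitude of that output acts as the multiplier discussed above.

```python
import numpy as np

# Minimal sketch: compare the magnitude of the transfer-function output
# (the multiplier in a BPA weight update, delta_w ~ eta * delta * output)
# for small (negative) and large (positive) weighted sums.
def sgm(x):
    return 1.0 / (1.0 + np.exp(-x))   # output range (0, 1)

for s in (-4.0, -1.0, 1.0, 4.0):
    print(f"sum = {s:+.1f}   |sgm| = {abs(sgm(s)):.3f}   |tanh| = {abs(np.tanh(s)):.3f}")

# sgm(-4) ~ 0.018 while sgm(+4) ~ 0.982: updates are ~50x larger for high
# desired outputs. |tanh(+/-4)| ~ 0.999 in both cases, so tanh shows no bias.
```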

6.2 Example 2: simulation of BTC with nonlinear adsorption

In the previous example, the applicability of the ANN in predicting C was studied when the mass transport simulation is simple due to linear solid-phase adsorption. In the present example, the applicability of the ANN in predicting C is studied when the mass transport is complicated by nonlinear solid-phase adsorption. As such, the ANN is used to simulate C with n = {0.5, 1, 2} (Fig. 2). In this case, C is a nonlinear, nonmonotonic function of continuous t and discrete n, which helps to assess the applicability of the ANN in simulating such a function. As the inputs are only t and n (I = 2), a hidden layer of five neurons (J = 5) is used,3 and a 2-5-1 ANN simulates C. In predicting C for 160 days, HYDRUS generated a total of 5405 patterns of (C, t, n). These patterns are included in S, and the allocation method with f̃ = 0.5 and r = 1 was used to allocate S to S1 and S2. The ANN is trained using the tanh(·) transfer function. Fig. 6 shows the predicted C, and the results show that the ANN failed to predict C satisfactorily. For n = {0.5, 1.0}, the ANN introduces large errors in the predicted spread and maximum concentration, although it predicts the desired responses qualitatively. For n = 2, however, the ANN prediction is poor both qualitatively and quantitatively. In the present example, C varies more rapidly with t than with n, and the ANN finds it difficult to cope with these different paces of variation.
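A minimal sketch of one plausible reading of the allocation step above (an assumption: each pattern is assigned to the training set S1 with expected fraction f̃ under random seed r, with the remainder going to the testing set S2; the paper's actual allocation method is defined earlier in the article and may differ):

```python
import numpy as np

def allocate(S, f=0.5, r=1):
    """Split pattern set S into training (S1) and testing (S2) subsets.

    Each pattern joins S1 with expected fraction f; the seed r makes
    the split reproducible, so different r values give different splits.
    """
    rng = np.random.RandomState(r)
    mask = rng.rand(len(S)) < f
    S1 = [p for p, m in zip(S, mask) if m]
    S2 = [p for p, m in zip(S, mask) if not m]
    return S1, S2

# Placeholder (C, t, n) patterns standing in for the 5405 HYDRUS patterns.
patterns = [(0.0, t, n) for n in (0.5, 1, 2) for t in np.linspace(0, 160, 100)]
S1, S2 = allocate(patterns, f=0.5, r=1)
print(len(S1), len(S2))  # roughly a 50/50 split
```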

In order to improve the robustness of the ANN, its sensitivity to f̃ = {0.25, 0.5, 0.75}, r = {1, 5, 257}, and J = {3, 5, 10} was analyzed, and the results are summarized in Table 2. For f̃ = {0.25, 0.5, 0.75}, ANN performance increases with increasing f̃. As f̃ increases, the ANN receives more patterns for training, acquires a clearer picture of the domain, and improves its generalization. For r = {1, 5, 257}, ANN performance is intermediate for r = 1, worst for r = 5, and best for r = 257. As r varies, the ANN acquires different weight configurations due to the entrapment of BPA at different local optimal solutions. For J = {3, 5, 10}, ANN performance increases with increasing J. As J increases, the ANN acquires more freedom for approximating C. This observation also suggests that J_opt > 10 > J_HN (where J_opt is the J at optimal performance and J_HN is the J recommended by Hecht-Nielsen3) for the present example.

Fig. 5. Weight histogram of 1-J-1 ANN for predicting the breakthrough concentration of Example 1 (relative frequency vs. weight, for J = 3, 15, 30).

Fig. 6. BTC of Example 2 using HYDRUS and 2-5-1 ANN with different n values (concentration, µg/L, vs. time, days, for n = 0.5, 1, 2, with the MCL indicated).

Table 2. Sensitivity of 2-5-1 ANN to expected fraction f̃, seed r, and hidden nodes J for Example 2

f̃       r      J      R
0.25     1      5      0.875
0.50     1      5      0.896
0.75     1      5      0.914
0.50     1      5      0.896
0.50     5      5      0.867
0.50     257    5      0.901
0.50     1      3      0.867
0.50     1      5      0.896
0.50     1      10     0.912

In investigating this problem, an interesting phenomenon is noted: t changes more rapidly than n. In such scenarios, t is the primary independent variable and is expected to change more rapidly than other parameters such as n, and the weights may fail to adjust to inputs with such disproportionate variations. Although this problem may be handled to some extent by increasing J, that approach leads to a larger ANN, a larger optimization problem, and greater difficulty in training. Alternatively, an innovative ANN architecture may be considered, as sketched below. Instead of simulating C as a continuous function of t, the ANN may be used to simulate C as a discrete function of time. Notationally, the continuous function C = C(t, n) may be replaced by the discrete function C = {C(t_i, n), i = 1, 2, …, T}, and a 1-3-T ANN may be used to simulate this discrete function. As such, t is distributed across the output layer, and this approach resembles the concept of distributed input–output representation.
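A minimal sketch of this distributed-output idea (a hypothetical illustration under stated assumptions: a feedforward 1-3-T network with n as the single input, three tanh hidden units, and T linear outputs, one per discretized time t_i; the paper does not give an implementation, and the weights below are untrained):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 160  # number of discretized times t_i (e.g. daily over 160 days)

# 1-3-T feedforward network: input n -> 3 tanh hidden units -> T outputs,
# so output i approximates C(t_i, n) and t is distributed across the layer.
W1 = rng.normal(scale=0.1, size=(3, 1)); b1 = np.zeros((3, 1))
W2 = rng.normal(scale=0.1, size=(T, 3)); b2 = np.zeros((T, 1))

def forward(n_value):
    """Return the T-vector of concentrations [C(t_1, n), ..., C(t_T, n)]."""
    h = np.tanh(W1 @ np.array([[n_value]]) + b1)  # hidden layer, shape (3, 1)
    return (W2 @ h + b2).ravel()                  # linear outputs, shape (T,)

btc = forward(0.5)  # one forward pass yields a whole BTC for n = 0.5
print(btc.shape)    # (160,)
```

Under this design the network no longer has to track the rapid variation of C with t through a single shared output; each output unit specializes in one t_i, at the cost of a larger output layer.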

6.3 Example 3: use of ANN to simulate BTC parameters