Methodology and Software

5.2 Methodology and Software

5.2.1 Case Study Sites

The integrated constructed wetland (ICW) study presented in this section covers 13 ICW systems, which were constructed to treat farmyard runoff and wastewater within the Anne Valley near Waterford in Ireland. The farmyard runoff and waste entering an ICW typically consist of yard and dairy washings, rainfall on open yards and roofed farmyard areas, silage effluent (usually only spillages), and manure effluent (occasional droppings). Construction of the ICW systems began in 2000 and was followed by commissioning in February 2001. Scholz et al. (2007) describe these systems and their catchments in detail.

The ICW systems 3, 9, and 11 were built on dairy farms with 50, 55, and 77 cows, respectively. The corresponding wetland sizes were 10,288, 7,964, and 7,676 m². The ICW 9 and 11 had four cells, while ICW 3 had five cells; all wetland cells had a linear sequential arrangement. The mean ICW size was approximately 1.7 times the size of the farmyard areas. The primary vegetation types planted in the ICW systems were emergent species (helophytes). Figures 5.1 and 5.2 show the ICW 11 system in winter.

Figure 5.1 Sedimentation tank of representative integrated constructed wetland system 11 in winter 2006

Figure 5.2 Inlet arrangement of the first cell of the representative integrated constructed wetland system 11 in winter 2006

5.2.2 Data and Variables

The ICW data were collected by monitoring the inflow and outflow water qualities of all 13 ICW systems for more than 6 years (August 2001 to December 2007). However, this section is based on only a fraction of the overall data set. Only data obtained from the representative ICW system sites 3, 9, and 11 (characterized by Scholz et al. (2007)) were combined and subsequently used in this section because these systems have linear sequential cell configurations and single influent entry points. In contrast, the other ICW systems have either multiple influent entry points or parallel treatment cells. All three selected ICW sites are typical FCW (a specific application of ICW to treat farmyard runoff), as previously defined by Carty et al. (2008).

Water samples were analyzed for ammonia–nitrogen, soluble reactive phosphorus (SRP), dissolved oxygen (DO), temperature, pH, chloride, and conductivity according to standard methods (Allen 1974; APHA 1998). Ammonia–nitrogen and chloride were determined using automated colorimetry. SRP was determined as molybdate reactive phosphorus (MRP) with an auto-analyzer (Method 2540-D; APHA 1998). DO, temperature, pH, and conductivity were measured in the field with portable meters. Scholz et al. (2007) provide a detailed description of the water quality analysis.

The inexpensive and easy-to-measure SOM input water quality variables of the outflow were DO (mg/l), temperature (°C), pH (–), chloride (mg/l), and conductivity (μS). The corresponding expensive and time-consuming-to-measure model output parameters were outflow ammonia–nitrogen (mg/l) and SRP (mg/l).

5.2.3 Statistical Analyses

All statistical analyses were performed using the standard software packages Origin 7.0, Matlab 7.0, and Econometrics Views 5.0. Significant differences (usually p < 0.05, unless stated otherwise) between data sets are indicated where appropriate.

5.2.4 Self-organizing Map

The self-organizing map (SOM) is a neural network model and algorithm that implements a characteristic non-linear projection from the high-dimensional space of sensory or other input signals onto a low-dimensional array of neurons. It has been widely applied to the visualization of high-dimensional systems and to data mining (Kohonen et al. 1996). The SOM is a competitive learning neural network based on unsupervised learning, which means that no human intervention is required during the learning process and that little needs to be known about the characteristics of the input data (Alhoniemi et al. 1999).

In the SOM algorithm, the topological relations and the number of neurons or nodes are fixed from the beginning. Each neuron i is represented by an n-dimensional weight, or model vector $m_i = [m_{i1}, \ldots, m_{in}]$, where n is the dimension of the input vectors. At the start of the model, the weight vectors are initialized to random values. During training, the distance between the current input vector and each weight vector is calculated using a distance measure such as the Euclidean distance, which is defined in Equation 5.1.

$$D_i = \sqrt{\sum_{j=1}^{n} (x_{ij} - m_{ij})^2}, \quad i = 1, 2, \ldots, M, \tag{5.1}$$

where

$D_i$ = Euclidean distance between the input vector and the weight vector $m_i$; $x_{ij}$ = jth element of the current input vector; $m_{ij}$ = jth element of the weight vector $m_i$; M = number of neurons in the SOM; and n = dimension of the input vectors.

Node c (Equation 5.2), whose weight vector is closest to the input vector, is chosen as the best matching unit (BMU). When the BMU is found, the weight vectors m i are updated. The BMU and its topological neighbors are moved closer to the input vector. The update rule of the weight vector is shown in Equation 5.3.

$$\lVert x - m_c \rVert = \min_i \{ \lVert x - m_i \rVert \}, \tag{5.2}$$

where x = input vector; m = weight vector; and $\lVert \cdot \rVert$ = a distance measure.
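The distance calculation and the search for the BMU can be illustrated with a minimal Python sketch. The function names and the small three-neuron codebook are hypothetical and chosen only for illustration; the study itself used the SOM toolbox for Matlab.

```python
import math

def euclidean_distance(x, m):
    """Equation 5.1: Euclidean distance between input vector x and weight vector m."""
    return math.sqrt(sum((xj - mj) ** 2 for xj, mj in zip(x, m)))

def find_bmu(x, weights):
    """Equation 5.2: index c of the neuron whose weight vector is closest to x."""
    distances = [euclidean_distance(x, m) for m in weights]
    return min(range(len(weights)), key=distances.__getitem__)

# Hypothetical codebook: three neurons with two-dimensional weight vectors.
weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]]
x = [0.9, 1.2]
c = find_bmu(x, weights)  # the second neuron (index 1) is closest to x
```

In practice, the input variables are usually scaled before the distances are computed so that no single variable dominates the sum.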

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x(t) - m_i(t)], \tag{5.3}$$

where $m_i(t)$ = weight vector indicating the output unit's location in the data space at time t; $\alpha(t)$ = learning rate at time t; $h_{ci}(t)$ = neighborhood function centered in the winner unit c at time t; and $x(t)$ = input vector drawn from the input data set at time t.
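A single application of the update rule can be sketched as follows. The Gaussian neighborhood function, the rectangular grid coordinates, and all numerical values are assumptions made for illustration only; the text does not state which neighborhood function was used, and the study's map was hexagonal.

```python
import math

def gaussian_neighborhood(pos_c, pos_i, radius):
    """h_ci: Gaussian kernel on the map grid (an assumed, commonly used choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_c, pos_i))
    return math.exp(-d2 / (2.0 * radius ** 2))

def update_weights(weights, positions, x, c, alpha, radius):
    """Apply Equation 5.3 once to every neuron i, given the BMU index c."""
    updated = []
    for m_i, pos_i in zip(weights, positions):
        h = gaussian_neighborhood(positions[c], pos_i, radius)
        updated.append([m + alpha * h * (xj - m) for m, xj in zip(m_i, x)])
    return updated

# Hypothetical one-dimensional map with three neurons.
positions = [(0, 0), (1, 0), (2, 0)]
weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]]
x = [2.0, 2.0]
new_weights = update_weights(weights, positions, x, c=1, alpha=0.5, radius=1.0)
```

The BMU moves halfway toward the input vector because its neighborhood value is 1, whereas its neighbors move by a smaller fraction that depends on their grid distance from the BMU.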

After this competitive learning exercise, the clusters corresponding to characteristic features can be shown on the map. The quality of the mapping is usually measured with a quantization error and a topographic error. The learning rate and neighborhood radius were set to their default values. The default number of neurons was determined by the heuristic Equation 5.4. The ratio between side lengths of the map grid was set to the square root of the ratio of the two highest eigenvalues of the data sample (Vesanto et al. 2000).

$$M \approx 5\sqrt{n}, \tag{5.4}$$

where M = number of neurons and n = total number of data samples.
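The sizing heuristic can be illustrated with a short sketch. The sample count and eigenvalues below are hypothetical and are not taken from the study, and the rounding scheme is only one plausible way to turn the heuristic into integer side lengths; the toolbox's own procedure may differ in detail.

```python
import math

def default_neuron_count(n_samples):
    """Equation 5.4: approximate number of neurons for n_samples data samples."""
    return 5.0 * math.sqrt(n_samples)

def side_lengths(n_neurons, eig1, eig2):
    """Integer side lengths whose ratio is close to sqrt(eig1 / eig2).

    eig1 and eig2 are the two highest eigenvalues of the data sample (eig1 >= eig2).
    """
    ratio = math.sqrt(eig1 / eig2)
    short_side = max(1, round(math.sqrt(n_neurons / ratio)))
    long_side = max(1, round(n_neurons / short_side))
    return long_side, short_side

# Hypothetical inputs for illustration only.
m = default_neuron_count(225)           # 5 * 15 = 75 neurons
grid = side_lengths(m, eig1=9.0, eig2=1.0)  # side ratio of 3 gives a 15 x 5 grid
```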

A 2-D lattice with a map size of M = 14 × 7 hexagonal units was used for both ammonia–nitrogen and SRP modeling. The final quantization and topographic errors were 8.852 and 0.096 for ammonia–nitrogen, and 6.541 and 0.123 for SRP, respectively. These values were relatively low compared with the error values obtained with other parameter settings, indicating that the quality of the mappings was relatively good.
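For orientation, the quantization error is commonly computed as the mean distance between each data sample and its BMU. The sketch below follows that common definition with hypothetical data; it does not reproduce the values reported above, and the topographic error, which additionally depends on the map topology, is not covered.

```python
import math

def euclidean_distance(x, m):
    """Euclidean distance between a data sample x and a weight vector m."""
    return math.sqrt(sum((xj - mj) ** 2 for xj, mj in zip(x, m)))

def quantization_error(data, weights):
    """Mean distance between each sample and its best matching unit."""
    total = 0.0
    for x in data:
        total += min(euclidean_distance(x, m) for m in weights)
    return total / len(data)

# Hypothetical codebook and data set.
weights = [[0.0, 0.0], [3.0, 4.0]]
data = [[0.0, 0.0], [3.0, 0.0]]
qe = quantization_error(data, weights)  # distances 0 and 3, so the mean is 1.5
```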

Since the codebook vectors of the SOM represent the local mean of the input vector, the SOM can be used for the prediction of missing components of an input vector. A prediction can be made by seeking the BMU for a vector with unknown components. The predicted values can be obtained from the BMU. The application of the SOM for prediction purposes is illustrated in Figure 5.3.

Figure 5.3 Predicting missing components of the input vector using a self-organizing map

The model is first trained using the training data set. To predict a set of variables that form part of an input vector, these variables are removed from the vector. The depleted vector is subsequently presented to the SOM to identify its BMU. The values for the missing variables are then obtained from their corresponding values in the BMU (Rustum et al. 2008).
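The prediction procedure can be sketched in Python as follows. The codebook, the variable ordering, and the use of None to mark missing components are hypothetical choices made for illustration; the BMU search simply restricts the distance calculation to the known components.

```python
import math

def find_bmu_partial(x, weights):
    """BMU search using only the known (non-None) components of x."""
    known = [j for j, v in enumerate(x) if v is not None]

    def distance(m):
        return math.sqrt(sum((x[j] - m[j]) ** 2 for j in known))

    return min(range(len(weights)), key=lambda i: distance(weights[i]))

def predict_missing(x, weights):
    """Fill the missing components of x with the corresponding BMU values."""
    c = find_bmu_partial(x, weights)
    return [weights[c][j] if v is None else v for j, v in enumerate(x)]

# Hypothetical trained codebook: two easy-to-measure inputs and one target variable.
weights = [[1.0, 10.0, 0.2], [5.0, 20.0, 1.5], [9.0, 30.0, 4.0]]
depleted = [5.5, 21.0, None]  # target variable removed from the input vector
completed = predict_missing(depleted, weights)
```

Here the second neuron is the BMU for the two known components, so its third component is returned as the predicted value.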

Lee and Scholz (2006) applied an SOM model to elucidate heavy metal removal mechanisms and to predict heavy metal concentrations in experimental constructed wetlands. The results demonstrated that heavy metals could be efficiently estimated by utilizing the SOM model.

The SOM toolbox (version 2) for Matlab 7.0 developed by the Laboratory of Computer and Information Science at Helsinki University of Technology was used in this study. The toolbox is available online at http://www.cis.hut.fi/projects/somtoolbox (Vesanto et al. 1999). The SOM model was applied to ammonia–nitrogen and SRP removal data to better understand the corresponding removal mechanisms in ICW.