Aquaculture, Vol. 187, Issues 1-2, July 2000

species, and farming techniques were collected. The third section identified problems related to water and sediment, diseases, and their consequences. The economic section gathered information about inputs, costs, revenue, and production and profit trends. The final section identified social aspects of conflict and resolution. This analysis uses data from 480 shrimp farms in Vietnam: 86 semi-intensive and 394 extensive farms. Because the purpose is to analyze the cause–effect relationship between environmental and management factors and aquaculture disease outbreaks, only the information in the first three sections of the questionnaire (site description, farming system, and problem analysis) is used. The data were randomly divided into two sets: an estimation set with 369 observations (about three-quarters of the whole data set), used to develop the logistic regression model and the PNN model, and a validation set with 111 observations. The partition of the data was arbitrary, balancing the need for enough data for parameter estimation in the estimation set against the need to keep a reasonable number of observations for validation.
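A random partition of this kind can be sketched as follows. This is only an illustration: the function name, the seed, and the use of integer farm identifiers are assumptions, and the text does not say how the authors actually drew their split.

```python
import random


def split_estimation_validation(records, n_validation, seed=0):
    """Randomly partition records into (estimation, validation) lists."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


# Integer identifiers 0..479 stand in for the 480 surveyed farms.
farm_ids = range(480)
estimation, validation = split_estimation_validation(farm_ids, n_validation=111)
print(len(estimation), len(validation))  # 369 111
```

Holding out 111 of the 480 farms leaves 369 for estimation, which is about three-quarters of the data.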

3. Results

3.1. Logistic regression

The logistic regression was estimated using both forward and backward stepwise procedures with 68 variables, consisting of 16 continuous and 52 categorical variables (a complete listing of all the variables can be found in the Appendix). Each categorical variable with n attributes was converted into n − 1 binary variables for estimation, and these were forced into or out of the regression collectively in one step. The Wald statistic was used for selecting variables to enter and leave the regression. The significance level was set at 0.05 for entry and at 0.10 for deletion. The backward procedure is generally considered preferable, since the forward approach might exclude some important variables from the model. In our case, however, the backward and forward approaches gave similar results in terms of the variables selected and the predictive accuracy. In the end, we decided to use the results of the forward approach, as several variables selected with the backward approach exhibited the wrong signs and were not easily interpretable.

Six variables were chosen in the final model (Table 1). All of them are categorical variables. The model χ² value of 204.42 is statistically significant (P = 0.0000), implying that the estimated model, containing the constant and the six explanatory variables, fits the data. In other words, there is a significant relationship between the logarithm of the odds of a disease occurrence and the explanatory variables. The coefficients of all six selected variables are significant at the 1% level, with one exception within WATER-SOURCE: farms whose water came directly from the sea or through a canal from the sea do not differ significantly from farms whose water came from a saltwater creek.
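The conversion of a categorical variable with n attributes into n − 1 binary variables can be illustrated with WATER-SOURCE, which has five categories and uses the saltwater creek as the reference category (Table 1). The helper name `dummy_code` is hypothetical; this is a minimal sketch of the coding scheme, not the software the authors used.

```python
def dummy_code(value, categories, reference):
    """Encode one categorical value as n - 1 binary indicators.

    The reference category is represented by all indicators being 0.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(c == value) for c in categories if c != reference}


# The five WATER-SOURCE categories; 'Saltwater creek' is the reference.
WATER_SOURCES = ["Saltwater creek", "Estuary/River", "Direct-from-sea",
                 "Canal-from-sea", "Other"]

print(dummy_code("Estuary/River", WATER_SOURCES, reference="Saltwater creek"))
print(dummy_code("Saltwater creek", WATER_SOURCES, reference="Saltwater creek"))
```

Because the reference category maps to all zeros, each estimated coefficient measures the effect of a category relative to the saltwater creek.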
The parameter estimates also suggest that, as expected, the effects of POLYCULTURE, DRY POND, and SITE-SELECTION on the logarithm of the odds of a disease occurrence are negative, and the effects of SILT-DEPOSIT and I/D-CANAL are positive. I/D-CANAL and SILT-DEPOSIT are the two most influential positive variables affecting the odds of a disease occurrence. After controlling for the effects of the other variables, the logarithm of the odds of a disease occurrence increases by 1.40 for farms that discharge water into an intake/drainage canal and by 1.06 for farms that deposit silt on-farm. Restated, after controlling for all other variables, the odds of a disease occurrence increase by a factor of 4.05 for farms that discharge water into an intake/drainage canal and by a factor of 2.90 for farms that deposit silt on-farm.

Table 1
Results of the logistic regression model (a)

Variables (b)         Estimate of β   Standard error   P-value   Exp(β)   Probability
POLYCULTURE           −1.0961         0.3055           0.0003    0.3342   0.250
DRY POND              −1.0393         0.3957           0.0086    0.3537   0.261
I/D-CANAL              1.3984         0.3560           0.0001    4.0486   0.802
WATER-SOURCE:                                          0.0002
  Estuary/River       −2.1598         0.5288           0.0000    0.1154   0.103
  Direct-from-sea      0.0606         0.6094           0.9208    1.0625   0.515
  Canal-from-sea      −0.0479         0.3884           0.9018    0.9532   0.488
  Other               −1.3885         0.5887           0.0184    0.2495   0.200
SITE-SELECTION        −1.6936         0.4026           0.0000    0.1839   0.155
SILT-DEPOSIT           1.0638         0.3001           0.0004    2.8975   0.743
Constant               1.1428         0.5079           0.0244

(a) Model χ² = 204.42.
(b) Variables:
POLYCULTURE: yes = 1, 0 otherwise.
DRY POND: yes = 1, 0 otherwise.
I/D-CANAL: water discharge into intake/drainage canal; yes = 1, 0 otherwise.
WATER-SOURCE: the main salt/brackish water source. The effects of the four categories in the table are compared to the category 'Saltwater creek'.
SITE-SELECTION: site selection to avoid impacts of other users; yes = 1, 0 otherwise.
SILT-DEPOSIT: deposit silt on-farm; yes = 1, 0 otherwise.
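The Exp(β) and Probability columns of Table 1 can be recomputed from the coefficient estimates. The sketch below assumes the Probability column is exp(β) / (1 + exp(β)) evaluated on each coefficient alone. That formula is inferred from the tabulated numbers, which it reproduces, and is not stated in the text. The function names are illustrative.

```python
import math

# Coefficient estimates (beta) for the binary variables, as reported in Table 1.
BETAS = {
    "POLYCULTURE": -1.0961,
    "DRY POND": -1.0393,
    "I/D-CANAL": 1.3984,
    "SITE-SELECTION": -1.6936,
    "SILT-DEPOSIT": 1.0638,
}


def odds_ratio(beta):
    """Multiplicative change in the odds when the variable goes from 0 to 1."""
    return math.exp(beta)


def probability(beta):
    """exp(beta) / (1 + exp(beta)); assumed form of the Probability column."""
    return 1.0 / (1.0 + math.exp(-beta))


for name, beta in BETAS.items():
    print(f"{name:15s} {odds_ratio(beta):7.4f} {probability(beta):6.3f}")
```

For I/D-CANAL, for example, this gives an odds ratio of about 4.05 and a probability of about 0.80, in line with the values discussed in the text.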
Table 1 also provides the estimated probability of disease occurrence for each explanatory variable when all the other variables are set at 0. For example, the chance of a disease occurrence for farms that discharge water into an intake or drainage canal is about 80% if the farms do not practice polyculture, do not dry ponds, do not exercise careful site selection, do not deposit silt on-farm, and obtain their water from a saltwater creek. The estimated probability will be higher or lower depending on the combination of values of all the other explanatory variables. Similarly, the chance of a disease occurrence is about 74% for farms depositing silt on-farm. On the other hand, the chance of a disease occurrence is quite low (16%, 25%, and 26%) for farms that exercise careful site selection, practice polyculture, and dry ponds, respectively. Farms that obtain their water from a river or estuary seem to have a lower chance of disease occurrence than those obtaining their water from a saltwater creek, directly from the sea, or through a canal from the sea.

The estimated logistic regression model was then applied to the estimation and validation data sets. The predictive accuracy for each data set is shown in Table 2.

Table 2
Classification accuracy of the logistic regression model
0 denotes "no disease occurrence", 1 denotes "disease occurrence".

                  Estimation subset                          Validation subset
                  Predicted 0  Predicted 1  Percent correct  Predicted 0  Predicted 1  Percent correct
Observed 0        162          33           83.08            45           13           77.59
Observed 1        30           144          82.76            9            44           83.02
Overall                                     82.93                                      80.18

Table 3
Classification accuracy of the PNN model, using full set of input variables
0 denotes "no disease occurrence", 1 denotes "disease occurrence".

                  Estimation subset                          Validation subset
                  Predicted 0  Predicted 1  Percent correct  Predicted 0  Predicted 1  Percent correct
Observed 0        179          16           91.79            50           8            86.21
Observed 1        17           157          90.23            7            46           86.79
Overall                                     91.06                                      86.49
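The percent-correct figures in Tables 2 and 3 follow directly from the classification counts. A minimal sketch of that arithmetic is shown below, assuming that rows are the observed classes and columns are the predicted classes. The function name is illustrative.

```python
def classification_accuracy(matrix):
    """Per-class and overall percent correct.

    matrix[i][j] is the number of farms observed in class i and predicted in class j.
    """
    n_classes = len(matrix)
    per_class = [100.0 * matrix[i][i] / sum(matrix[i]) for i in range(n_classes)]
    correct = sum(matrix[i][i] for i in range(n_classes))
    total = sum(sum(row) for row in matrix)
    return per_class, 100.0 * correct / total


# Counts for the logistic regression model (Table 2).
logit_estimation = [[162, 33], [30, 144]]
logit_validation = [[45, 13], [9, 44]]

for label, counts in (("estimation", logit_estimation), ("validation", logit_validation)):
    per_class, overall = classification_accuracy(counts)
    print(label, [f"{p:.2f}" for p in per_class], f"{overall:.2f}")
```

Applied to the estimation counts, this gives 83.08 and 82.76 for the two classes and 82.93 overall, matching Table 2. The same function applied to the PNN counts reproduces Table 3.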
Table 2 shows the number of farms predicted to have a disease outbreak, i.e., farms with an estimated probability of disease occurrence of more than 0.5. The estimated model appears to have good predictive power, correctly classifying 82.93% and 80.18% of the observations in the estimation and validation subsets, respectively.

3.2. Probabilistic neural network (PNN)

First, we constructed a PNN model using the same estimation data set as in the logistic regression procedure. Then the PNN model was applied to the estimation and validation subsets. Its prediction accuracy is shown in Table 3.

Recall that only six variables were chosen in the final logistic regression model. These same six variables were used to build another PNN model. Table 4 shows the classification accuracy of this PNN model on the estimation and validation subsets.

Table 4
Classification accuracy of the PNN model, using the same six input variables as in the final logistic regression model
0 denotes "no disease occurrence", 1 denotes "disease occurrence".

                  Estimation subset                          Validation subset
                  Predicted 0  Predicted 1  Percent correct  Predicted 0  Predicted 1  Percent correct
Observed 0        145          50           74.36            41           17           70.69
Observed 1        24           150          86.21            12           41           77.36
Overall                                     79.95                                      73.87
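This section does not describe the internal structure of the PNN. For readers unfamiliar with the technique, the sketch below shows the textbook form of a probabilistic neural network: a Parzen-window classifier that averages a Gaussian kernel over the training points of each class and picks the class with the largest average. The smoothing parameter `sigma`, the equal treatment of classes, and the toy data are all assumptions made for illustration. They are not the authors' settings or survey data.

```python
import math


def pnn_classify(x, training_points, training_labels, sigma=0.5):
    """Assign x to the class with the largest average Gaussian kernel activation."""
    sums, counts = {}, {}
    for point, label in zip(training_points, training_labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, point))
        # Pattern-layer activation for this training point.
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + activation
        counts[label] = counts.get(label, 0) + 1
    # Summation layer: average activation per class; decision layer: argmax.
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get)


# Toy binary inputs (hypothetical): class 0 = no disease, class 1 = disease.
train_x = [(1, 0), (1, 1), (0, 0), (0, 1)]
train_y = [0, 0, 1, 1]
print(pnn_classify((1, 0), train_x, train_y))
```

Because every training point becomes a kernel centre, a PNN needs no iterative weight fitting. The main tuning choice is the kernel width.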

4. Discussion