
Two-Dimensional Probabilistic Risk Assessment Model 1515

Table I. Input Variables and Corresponding Mean Values, 95% Probability Ranges, and Distribution Types in the Growth Estimation Part of the E. coli Model (8)

| Variable | Factor | Mean Value^a | 95% Probability Range^a | Distribution Type: Variability | Distribution Type: Uncertainty^b | Unit |
|---|---|---|---|---|---|---|
| Storage temperature, retail | Temp1 | 8.7 | (7.7, 13.1) | EDF^c | Beta | °C |
| Storage temperature, transportation | Temp2 | 9.4 | (7.7, 15.5) | EDF^c | Beta | °C |
| Storage temperature, home | Temp3 | 9.1 | (7.7, 19) | EDF^c | Beta | °C |
| Storage time, retail | Time1 | 23.4 | (0.5, 92.6) | Exponential | Uniform | hr |
| Storage time, transportation | Time2 | 1.0 | (0, 2.0) | EDF^c | Beta | hr |
| Storage time, home | Time3 | 25.2 | (0.6, 99.0) | Exponential | Uniform | hr |
| Maximum density | MD | 7.5 | (5.4, 9.6) | Triangle^d | Uniform | log10 |
| Lag period, retail | LP1 | 73.5 | (19.6, 136.1) | Exponential | Normal | hr |
| Lag period, transportation | LP2 | 64.9 | (12.5, 131.5) | Exponential | Normal | hr |
| Lag period, home | LP3 | 72.0 | (8.2, 137.9) | Exponential | Normal | hr |
| Generation time, retail | GT1 | 9.9 | (2.8, 15.9) | Exponential | Normal | hr |
| Generation time, transportation | GT2 | 8.7 | (1.9, 15.5) | Exponential | Normal | hr |
| Generation time, home | GT3 | 9.8 | (1.3, 16.0) | Exponential | Normal | hr |

^a Values are based on the comingled analysis of variability and uncertainty (see Section 3.4.2).
^b Uncertainty distributions are defined for parameters of variability distributions.
^c Empirical distribution function (EDF) based on FSIS data.(8) A beta distribution is used to define the corresponding cumulative frequency at each temperature as:

$$CF_i = \mathrm{BetaINV}(\alpha, \beta)$$

where
$i$ = cumulative rank of data associated with a temperature;
$CF_i$ = cumulative frequency at value $i$ of the empirical distribution;
BetaINV = inverse of a beta distribution;
$\alpha$ = parameter $\alpha$ of the beta distribution: $\alpha = \sum_{k=1}^{i} n_k$;
$\beta$ = parameter $\beta$ of the beta distribution: $\beta = n_T - \sum_{k=1}^{i} n_k + 1$;
$n_i$ = number of available data at the $i$th value of storage temperature;
$n_T$ = total number of available survey data.
^d Uncertainty is defined for the most likely value of the triangular distribution.

the main effect of the jth level of Factor B, and the interaction effect between the two factors. If additional factors are involved in the analysis, the concept will be the same. ANOVA uses the F-test to determine whether a statistically significant difference exists among mean responses for main effects or interactions between factors. The F-test is used to test the significance of each main and interaction effect. For the example of Equation (2), the estimators of F values for main and interaction effects are given in Table II.(35) The relative magnitude of F values can be used to rank the factors in sensitivity analysis.(37) The higher the F value, the more sensitive the response variable is to the factor. Therefore, factors with higher F values are given higher rankings.

The coefficient of determination, R², is used to judge the adequacy of the ANOVA model. R² measures the proportionate reduction of total variation in the response variable Y associated with the use of selected main and interaction effects of factors in the ANOVA model.(33–35) Although the F values calculated for each effect indicate the statistical significance of the corresponding effect, the coefficient of determination indicates whether the selected effects adequately capture variability in the output. Generally, a high R² value implies that results are not compromised by incomplete specification of effects or by inappropriate definition of the levels for a factor. When ANOVA is applied to two-dimensional simulations of variability and uncertainty, a CDF graph for R² indicates uncertainty in the adequacy of the ANOVA model for multiple realizations of the model.
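As a minimal sketch of the ranking procedure described above, the following Python code computes the F values and R² for a balanced two-factor design, following the standard estimators of Table II. The function names (`two_factor_anova`, `rank_effects`) and the small data set are hypothetical and only illustrate the calculation; they are not taken from the case study, and the sketch assumes equal numbers of replicates in every cell and a model that includes both main effects and the interaction.

```python
from itertools import product


def two_factor_anova(y):
    """F values and R^2 for a balanced two-factor ANOVA.

    y[i][j] is the list of n replicate responses observed at level i of
    Factor A and level j of Factor B (hypothetical data layout).
    """
    a, b, n = len(y), len(y[0]), len(y[0][0])

    def mean(values):
        return sum(values) / len(values)

    cells = list(product(range(a), range(b)))
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]      # cell means
    row = [mean(cell[i]) for i in range(a)]                           # Factor A level means
    col = [mean([cell[i][j] for i in range(a)]) for j in range(b)]    # Factor B level means
    grand = mean(row)                                                 # overall mean (balanced design)

    # Sums of squares as laid out in Table II.
    ssa = n * b * sum((r - grand) ** 2 for r in row)
    ssb = n * a * sum((c - grand) ** 2 for c in col)
    ssab = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2 for i, j in cells)
    sse = sum((v - cell[i][j]) ** 2 for i, j in cells for v in y[i][j])
    ssto = ssa + ssb + ssab + sse

    mse = sse / (a * b * (n - 1))
    f_values = {
        "A": (ssa / (a - 1)) / mse,
        "B": (ssb / (b - 1)) / mse,
        "AB": (ssab / ((a - 1) * (b - 1))) / mse,
    }
    # R^2 of the model with both main effects and the interaction.
    r_squared = 1.0 - sse / ssto
    return f_values, r_squared


def rank_effects(f_values):
    """Order effects from most to least sensitive (largest F value first)."""
    return sorted(f_values, key=f_values.get, reverse=True)


# Illustrative data: 2 levels of A, 2 levels of B, 2 replicates per cell.
example = [[[1, 3], [2, 4]], [[5, 7], [6, 8]]]
f_values, r_squared = two_factor_anova(example)
print(rank_effects(f_values), round(r_squared, 3))
```

In a two-dimensional simulation, a function of this kind would be called once per uncertainty realization, and the resulting R² values could then be collected into the CDF mentioned above.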

3.2. Correlation Coefficients

Correlation-based sensitivity analysis methods are widely used in practice, and thus are considered as a useful benchmark for comparison with ANOVA. The Pearson correlation coefficient (PCC), for instance, can be used to characterize the degree of linear relationship between the output values and sampled values of individual inputs. If the relationship between an input and an output is nonlinear but monotonic, Spearman correlation coefficients (SCC) provide better performance.(38–40) SCCs are based on ranks, not sample values, of each input and output. Neither PCCs nor SCCs can provide insight regarding possible interaction effects between inputs. The magnitude of a PCC or SCC is typically used as a basis to rank order model inputs based on their influence on the output.

1516 Mokhtari and Frey

Table II. ANOVA Table for a Two-Factor Model

| Source of Variation | Sum of Squares^a | Degrees of Freedom^b | Mean Square | F Value^c |
|---|---|---|---|---|
| Factor A | $SSA = nb \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$ | $a - 1$ | $MSA = \dfrac{SSA}{a - 1}$ | $F_A = \dfrac{MSA}{MSE}$ |
| Factor B | $SSB = na \sum_{j=1}^{b} (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$ | $b - 1$ | $MSB = \dfrac{SSB}{b - 1}$ | $F_B = \dfrac{MSB}{MSE}$ |
| Interaction | $SSAB = n \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$ | $(a - 1) \times (b - 1)$ | $MSAB = \dfrac{SSAB}{(a - 1) \times (b - 1)}$ | $F_{AB} = \dfrac{MSAB}{MSE}$ |
| Error | $SSE = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (Y_{ijk} - \bar{Y}_{ij\cdot})^2$ | $a \times b \times (n - 1)$ | $MSE = \dfrac{SSE}{a \times b \times (n - 1)}$ | |
| Total | $SSTO = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2$ | $n \times a \times b - 1$ | | |

^a $\bar{Y}_{i\cdot\cdot} = \dfrac{\sum_{j=1}^{b}\sum_{k=1}^{n} Y_{ijk}}{b \times n}$, $\bar{Y}_{\cdot j\cdot} = \dfrac{\sum_{i=1}^{a}\sum_{k=1}^{n} Y_{ijk}}{a \times n}$, $\bar{Y}_{\cdot\cdot\cdot} = \dfrac{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} Y_{ijk}}{a \times b \times n}$.
^b a = number of levels for Factor A; b = number of levels for Factor B; n = number of values for the response variable.
^c A 5% significance level is considered for statistically significant F values.
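The difference between the PCC and the SCC discussed in this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names (`pearson`, `average_ranks`, `spearman`, `rank_inputs`) and the toy samples are hypothetical, the SCC is computed as the Pearson coefficient of rank-transformed values with tied values given their average rank, and inputs are ordered by the absolute value of the coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation coefficient (SCC): the PCC applied to ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_inputs(inputs, output, coefficient=spearman):
    """Order input names by decreasing magnitude of correlation with the output."""
    scores = {name: abs(coefficient(sample, output)) for name, sample in inputs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: the output is a monotonic but nonlinear function of x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
output = [v ** 3 for v in x1]
print(pearson(x1, output), spearman(x1, output))
print(rank_inputs({"x1": x1, "x2": x2}, output))
```

For the monotonic but nonlinear relationship in the toy example, the rank-based coefficient reaches its maximum while the PCC does not, which is the behavior the text attributes to the SCC. Neither coefficient says anything about interactions between `x1` and `x2`.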

3.3. Top-Down Correlation for Comparison of Sensitivity Analysis Methods

When applying different sensitivity analysis methods to a case study, an important question is whether different techniques agree in their identification of important inputs. The so-called top-down correlation method is useful for this purpose. Details of the method are available elsewhere.(36,41) This method gives greater weight to agreement or disagreement in rank ordering of the most important inputs and gives less weight to comparisons in ranks of inputs with low importance. Large positive values for the top-down correlation result indicate agreement between two sets of ranks for the most important inputs.
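The text does not spell out the formula, so the following Python sketch rests on an assumption: it uses the commonly cited Savage-score formulation of the top-down correlation, in which each rank is replaced by a Savage score and the Pearson correlation of the scores is taken. The function names are hypothetical, rank 1 is taken to denote the most important input, and ties are assumed not to occur.

```python
def savage_scores(ranks):
    """Savage score for each rank r in 1..n: the sum of 1/j for j = r..n.

    Assumes integer ranks with no ties, where rank 1 is the most important
    input, so the top-ranked input receives the largest score.
    """
    n = len(ranks)
    tail = [0.0] * (n + 2)
    for r in range(n, 0, -1):
        tail[r] = tail[r + 1] + 1.0 / r
    return [tail[r] for r in ranks]


def top_down_correlation(ranks_1, ranks_2):
    """Top-down correlation between two rankings of the same n >= 2 inputs.

    Equivalent to the Pearson correlation of the Savage scores, written in
    closed form: (sum of score products - n) / (n - largest Savage score).
    """
    n = len(ranks_1)
    scores_1 = savage_scores(ranks_1)
    scores_2 = savage_scores(ranks_2)
    largest = sum(1.0 / j for j in range(1, n + 1))
    cross = sum(u * v for u, v in zip(scores_1, scores_2))
    return (cross - n) / (n - largest)


# Two methods that disagree only on the top two inputs, versus two methods
# that disagree only on the bottom two inputs (illustrative rankings).
reference = [1, 2, 3, 4]
print(top_down_correlation(reference, [2, 1, 3, 4]))
print(top_down_correlation(reference, [1, 2, 4, 3]))
```

Under this formulation, a disagreement among the top-ranked inputs lowers the coefficient more than the same disagreement among the bottom-ranked inputs, which matches the weighting described above.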

3.4. Probabilistic Analysis Scenarios