
C.W. Rougoor et al. / Livestock Production Science 66 (2000) 71–83

‘Age at Calving’ was not used by the model because all path coefficients to and from this factor were smaller than 0.20. Table 3 and Fig. 2 show that milk production was higher on farms with managers who thought that ‘milk production per cow’ was a CSF for their farm. At these farms the breeding value for conformation was higher. The breeding goal of the producer indicated, however, that these producers put relatively much emphasis on the quality of the udder and less on the kg of milk.

3.3. Partial Least Squares (PLS)

Table 4 provides the factor loadings for each of the measures. The R² of each synthetic factor, the variance extracted for each variable, and the average variance extracted for each synthetic factor are given. The factor loadings show that the variable ‘Winter milk’ is the most important variable of the synthetic variable ‘Critical Success Factors’. The positive and negative signs of the two variables in the synthetic variable ‘Breeding Goal Producer’ show that a farmer who has a high score on this synthetic factor has said that the udder is an important breeding goal at his farm, whereas kg of milk is not. In this model, the age at calving was also not used, because here too the path coefficient was lower than 0.20. The R² of the synthetic factor ‘Milk Production’ shows that the model explained 47% of the differences in milk production.

Fig. 3 gives a graphical representation of the PLS-model with the inner path coefficients. Contrary to the PCR-model, no significance values are given here, because traditional statistical testing methods are not well suited. The Stone–Geisser test criterion Q² was used as an alternative method to evaluate the model. It had a value of 0.31, indicating that the model had predictive relevance, because it was greater than zero. The same main results as with PCR were found with PLS. Small differences were found in the relation between the synthetic factors ‘Natural Service Sires’ and ‘Breeding Value Conformation’. PCR found a path coefficient of 0.25, whereas in the PLS-model it was smaller than 0.20 and therefore deleted. This indicates that a high percentage of natural services at the farm has a relatively strong negative effect on the breeding value for production and a smaller negative effect on breeding value conformation. Besides that, in the PLS-model, direct effects of the synthetic factor ‘Critical Success Factors’ on ‘Breeding Value Production’ and ‘Natural Service Sires’ were found, whereas in the PCR-model these path coefficients were too small.

Fig. 3. Structural path coefficients for PLS-modelling.

4. Discussion

4.1. Breeding management

The path coefficient diagrams (Figs. 2 and 3) showed the same main effects. Milk production per cow was inversely related to farm size (the regression coefficients and loadings were negative for this synthetic factor). Milk production per cow was directly positively related to breeding value for conformation, and to breeding value for production. These variables, in turn, were related to goals and CSFs of the producer, indicating that milk production is not only related to technical parameters, but also to the attitude of the producer. So, with respect to the aim of the data collection (to determine the relationship between breeding management and 305-day milk production), it can be concluded that the producers’ breeding management was related to the 305-day milk production. Surprisingly, it was found that farmers who stated that they focused mainly on ‘kg of milk’ as a breeding goal had a lower breeding value for milk production and realised a lower 305-day production than producers who stated that they also took ‘udder’ into account in their breeding strategy. A second aspect that comes forward is the use of natural service sires. Table 1 shows that natural services were rarely used by the producers in the research group: only 3% of the cows were inseminated with natural service sires. However, it still was related to the breeding value. Producers who made more use of artificial insemination had cows with a higher breeding value and, related to that, a higher 305-day milk production.

The CSFs of the producer were related to the breeding goal of the producer, which in turn was related to the breeding value for production through the selection decision. Differences between PCR and PLS came out for the synthetic factor ‘Breeding Value Conformation’. The underlying variables of this factor were highly related to each other (correlations between 0.65 and 0.94). PLS deals with that by making one synthetic factor out of them, which has a high loading on all these variables. PCR, in turn, tries to minimize multicollinearity by taking one variable more into account than the other one. Table 2 shows that especially ‘Breeding Value Legs’ and ‘Breeding Value Type’ were included in this factor in the PCR. Because the factor ‘Breeding Value Conformation’ was built up differently in the two models, the relationships towards the other synthetic factors were also different. The positive relation between ‘CSF’ and ‘Breeding Value Conformation’ in the PCR-model indicated that farmers who stated that milk production per cow was a major critical success factor for the farm had a higher breeding value for type and udder.

4.2. Comparing the methods

Wold (1985) states that PLS is useful when the main focus of the study shifts from individual variables and parameters to packages of variables and aggregate parameters. He stated that ‘in large, complex models with latent variables PLS is virtually without competition’. Rossa (1982) showed a map of statistical methods with regard to the complexity of the problem and their degree of prior information and concluded that PCR and PLS are both useful for complex problems. However, for PLS-modelling more prior information is needed, because the researcher has to design a path diagram with expected relationships beforehand.

Helland and Almøy (1994) compared PCR and PLS and concluded that there is not one method that dominates the other, and that the difference between the methods is typically small when the number of observations is large. PCR does well when the eigenvalues from the irrelevant components are extremely small or extremely large. PLS does well for intermediate irrelevant eigenvalues (Helland and Almøy, 1994). In case of multicollinearity, the eigenvalues might not be dominating ones. In that case PLS becomes closer to ordinary least squares, which is a desirable property of PLS. Garthwaite (1994) compared PLS with four other methods, including PCR, and concluded that PLS is a useful method for forming prediction equations when there are a large number of explanatory variables.

The R² of the milk production models differed considerably between the two methodologies: 0.36 for the PCR-model and 0.47 for the PLS-model. This can be explained by differences in the optimizing techniques employed in deriving the synthetic factors. PLS forms the synthetic factors by using the covariance between the X- and Y-variables from the start, whereas with PCR the PCs are formed based on the X-variables only. As a result, the synthetic factors in PLS explain differences in the Y-variable better than those in PCR. In the current PCR, 14 PCs were eliminated based on their low eigenvalue. Another option is to eliminate components that have low correlation with the response variable. This results in a larger R² (0.45 in this case) when the five PCs with the highest correlation with 305-day milk production were selected. However, the elimination procedure that was used in this study guarantees variance reduction in the X-variables, but using the alternative method does not (Mason and Gunst, 1985), and the alternative method gives less stable results (Xie and Kalivas, 1997).

The results of the two analyses showed some advantages and disadvantages of both methodologies. PLS has the clear advantage that it optimizes towards the Y-variable right from the beginning, whereas with PCR some variance in the data set might be left out that still has a reasonable effect on the Y-variable. As a result, the percentage of variance that can be explained with the model is bigger for PLS. PCR, on the contrary, has a well-developed theory, which makes it possible to estimate P-values within the model. This makes the model statistically more attractive than PLS, which lacks a good statistical inferential base. This could probably be overcome by using data permutation to generate distributions under the null hypothesis (Churchill and Doerge, 1994). Besides that, the regression coefficients of PCR on the original scale can be interpreted more easily. In PCR the synthetic [...] certain data set, other aspects of PLS and PCR have to be compared as well. Advantages that did not come out of the current analyses but which are useful to take into account are that in PLS the investigator is free to define more than one Y-variable, that the number of variables can be large compared to the number of observations, and that no distributional assumptions are made. This last aspect makes more data sets suitable for PLS-analysis. However, at the same time it implies the disadvantage that significance values cannot be calculated. A disadvantage of PLS is that it is a partial procedure in the sense that each step of the estimation minimizes a residual variance with respect to a subset of X-, Y- and synthetic variables (Steenkamp and Van Trijp, 1996). So, there is no total residual variance or other overall optimum criterion that is strictly optimized (Jöreskog and Wold, 1982). The requirements, advantages and disadvantages of both methodologies are summarized in Table 5.

Table 5
Requirements and disadvantages of Principal Component Regression (PCR) and Partial Least Squares (PLS)

                                          PCR                      PLS
Requirements
  Possibilities complexity path-analysis  Not complex              Very complex
  Degree of prior information required    Not much                 Much
  # cases : # variables                   # cases >> # variables   # cases <, =, or > # variables
  Assumption on distribution variables    Normal distribution      Distribution-free
  Number of Y-variables                   = 1                      ≥ 1
Disadvantages
  Multicollinearity                       Accounted for            Accounted for
  Analysis                                Complete                 Partial
  Y-variable included in optimisation     No                       Yes
  Calculation P-values                    Possible                 Not possible
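
The difference in how the two methods build their synthetic factors can be illustrated with a small sketch. It is not the analysis of this study: the data are made up, there are only two X-variables, and each method extracts a single component (all function and variable names are hypothetical). The PCR direction is taken from the X-variables alone (first principal component, found here by power iteration), whereas the PLS direction is taken from the covariance between the X-variables and Y.

```python
import math

def center(column):
    """Subtract the mean from a list of numbers."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]

def normalize(w):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def scores(X, w):
    """Component scores t = Xw for a row-major matrix X."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]

def xt_times(X, v):
    """Return X'v for a row-major matrix X and a vector v (one entry per row)."""
    return [sum(X[i][j] * v[i] for i in range(len(X))) for j in range(len(X[0]))]

def pcr_weight(X, iters=200):
    """First principal component of X via power iteration on X'X (uses X only)."""
    w = normalize([1.0] * len(X[0]))
    for _ in range(iters):
        w = normalize(xt_times(X, scores(X, w)))
    return w

def pls_weight(X, y):
    """First PLS weight vector: proportional to X'y (uses X and y together)."""
    return normalize(xt_times(X, y))

def r_squared(X, y, w):
    """R-squared of regressing centered y on the single component Xw."""
    t = scores(X, w)
    slope = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    sse = sum((yi - slope * ti) ** 2 for ti, yi in zip(t, y))
    sst = sum(yi * yi for yi in y)
    return 1.0 - sse / sst

# Made-up data: x1 has a large variance but hardly affects y;
# x2 has a small variance and drives y.
x1 = center([-4.0, 4.0, -4.0, 4.0, -4.0, 4.0, -4.0, 4.0])
x2 = center([-1.0, -1.0, 1.0, 1.0, -0.5, -0.5, 0.5, 0.5])
y = center([2.0 * b + 0.01 * a for a, b in zip(x1, x2)])
X = [list(row) for row in zip(x1, x2)]

w_pcr = pcr_weight(X)
w_pls = pls_weight(X, y)
r2_pcr = r_squared(X, y, w_pcr)
r2_pls = r_squared(X, y, w_pls)
print("PCR weights:", w_pcr, "R2:", r2_pcr)
print("PLS weights:", w_pls, "R2:", r2_pls)
```

In this constructed case the first principal component follows the high-variance but irrelevant variable, so the one-component PCR explains almost none of the variation in Y, while the PLS component leans towards the variable that covaries with Y. Real data sets will rarely be this extreme.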

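The idea of using data permutation to obtain a null distribution when no distributional assumptions are made can be sketched as follows. This is a generic permutation test, not the specific procedure of Churchill and Doerge (1994) and not something carried out in this study. The Pearson correlation stands in for a path coefficient, and the data, function names and number of permutations are all assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the correlation between x and y.

    The y-values are shuffled to break any relation with x; the P-value is
    the share of shuffles whose |correlation| is at least the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up example: y follows x closely, so the P-value should be small.
x = [float(i) for i in range(12)]
y = [xi + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
p_value = permutation_p_value(x, y)
print("permutation P-value:", p_value)
```

In a PLS setting the same scheme would require re-estimating the whole model for every permutation, which is more costly than this single-statistic sketch suggests.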
5. Conclusions