C.W. Rougoor et al. / Livestock Production Science 66 (2000) 71–83
‘Age at Calving’ was not used by the model because all path coefficients to and from this factor were smaller than 0.20. Table 3 and Fig. 2 show that milk production was higher on farms with managers who thought that ‘milk production per cow’ was a CSF for their farm. At these farms the breeding value for conformation was higher. The breeding goal of the producer indicated, however, that these producers put relatively strong emphasis on the quality of the udder and less on the kg of milk.

3.3. Partial Least Squares (PLS)

Table 4 provides the factor loadings for each of the measures. The R² of each synthetic factor, the variance extracted for each variable, and the average variance extracted for each synthetic factor are given. The factor loadings show that the variable ‘Winter milk’ is the most important variable of the synthetic variable ‘Critical Success Factors’. The positive and negative signs of the two variables in the synthetic variable ‘Breeding Goal Producer’ show that a farmer who has a high score on this synthetic factor has said that the udder is an important breeding goal at his farm, whereas kg of milk is not. In this model, the age at calving was also not used, because here too the path coefficient was lower than 0.20. The R² of the synthetic factor ‘Milk Production’ shows that the model explained 47% of the differences in milk production.

Fig. 3 gives a graphical representation of the PLS-model with the inner path coefficients. Contrary to the PCR-model, no significance values are given here, because traditional statistical testing methods are not well suited. The Stone–Geisser test criterion Q² was used as an alternative method to evaluate the model. It had a value of 0.31, indicating that the model had predictive relevance, because it was bigger than zero. The same main results as with PCR were found with PLS. Small differences were found in the relation between the synthetic factors ‘Natural Service Sires’ and ‘Breeding Value Conformation’. PCR found a path coefficient of 0.25, whereas in the PLS-model it was smaller than 0.20 and therefore deleted. This indicates that a high percentage of natural services at the farm has a relatively strong negative effect on the breeding value for production and a smaller negative effect on breeding value conformation. Besides that, in the PLS-model, direct effects of the synthetic factor ‘Critical Success Factors’ on ‘Breeding Value Production’ and ‘Natural Service Sires’ were found, whereas in the PCR-model these path coefficients were too small.

Fig. 3. Structural path coefficients for PLS-modelling.

4. Discussion

4.1. Breeding management

The path coefficient diagrams (Figs. 2 and 3) showed the same main effects. Milk production per cow was inversely related to farm size (the regression coefficients and loadings were negative for this synthetic factor). Milk production per cow was
directly positively related to breeding value for conformation, and to breeding value for production. These variables, in turn, were related to goals and CSFs of the producer, indicating that milk production is not only related to technical parameters, but also to the attitude of the producer. So, with respect to the aim of the data collection, to determine the relationship between breeding management and 305-day milk production, it can be concluded that the producers’ breeding management was related to the 305-day milk production. Surprisingly, it was found that farmers who stated that they focused mainly on ‘kg of milk’ as a breeding goal had a lower breeding value for milk production and realised a lower 305-day production than producers who stated that they also took ‘udder’ into account in their breeding strategy. A second aspect that comes forward is the use of natural service sires. Table 1 shows that natural services were rarely used by the producers in the research group: only 3% of the cows were inseminated with natural service sires. However, it was still related to the breeding value. Producers who made more use of artificial insemination had cows with a higher breeding value and, related to that, a higher 305-day milk production. The CSFs of the producer were related to the breeding goal of the producer, which in turn was related to the breeding value for production through the selection decision. Differences between PCR and PLS came out for the synthetic factor ‘Breeding Value Conformation’. The underlying variables of this factor were highly related to each other (correlations between 0.65 and 0.94). PLS deals with that by making one synthetic factor out of them, which has a high loading on all these variables. PCR, in turn, tries to minimize multicollinearity by taking one variable more into account than the other one. Table 2 shows that especially ‘Breeding Value Legs’ and ‘Breeding Value Type’ were included in this factor in the PCR. Because the factor ‘Breeding Value Conformation’ was built up differently in the two models, the relationships towards the other synthetic factors were also different. The positive relation between ‘CSF’ and ‘Breeding Value Conformation’ in the PCR-model indicated that farmers who stated that milk production per cow was a major critical success factor for the farm had a higher breeding value for type and udder.

4.2. Comparing the methods

Wold (1985) states that PLS is useful when the main focus of the study shifts from individual variables and parameters to packages of variables and aggregate parameters. He stated that ‘in large, complex models with latent variables PLS is virtually without competition’. Rossa (1982) showed a map of statistical methods with regard to the complexity of the problem and their degree of prior information and concluded that PCR and PLS are both useful for complex problems. However, for PLS-modelling more prior information is needed, because the researcher has to design a path diagram with expected relationships beforehand.

Helland and Almøy (1994) compared PCR and PLS and concluded that there is not one method that dominates the other, and that the difference between the methods is typically small when the number of observations is large. PCR does well when the eigenvalues from the irrelevant components are extremely small or extremely large. PLS does well for intermediate irrelevant eigenvalues (Helland and Almøy, 1994). In case of multicollinearity, the eigenvalues might not be dominating ones. In that case PLS becomes closer to ordinary least squares, which is a desirable property of PLS. Garthwaite (1994) compared PLS with four other methods, including PCR, and concluded that PLS is a useful method for forming prediction equations when there are a large number of explanatory variables.

The R² of the milk production models differed considerably between the two methodologies: 0.36 for the PCR-model and 0.47 for the PLS-model. This can be explained by differences in the optimizing techniques employed in deriving the synthetic factors. PLS forms the synthetic factors by already using the covariance between the X- and Y-variables, whereas with PCR the PCs are formed based on the X-variables only. As a result, the synthetic factors in PLS explain differences in the Y-variable better than those in PCR. In the current PCR, 14 PCs were eliminated based on their low eigenvalues. Another option is to eliminate components that have low correlation with the response variable. This results in a larger R² (0.45 in this case) when five PCs with the highest correlation with 305-day milk production were selected. However, the elimination
procedure that was used in this study guarantees variance reduction in the X-variables, but using the alternative method does not (Mason and Gunst, 1985), and the alternative method gives less stable results (Xie and Kalivas, 1997).

The results of the two analyses showed some advantages and disadvantages of both methodologies. PLS has the clear advantage that it optimizes towards the Y-variable right from the beginning, whereas with PCR some variance in the data set might be left out that still has a reasonable effect on the Y-variable. As a result, the percentage of variance that can be explained with the model is bigger for PLS. PCR, on the contrary, has a well-developed theory, which makes it possible to estimate P-values within the model. This makes the model statistically more attractive than PLS, which lacks a good statistical inferential base. This could probably be overcome by using data permutation to generate distributions under the null hypothesis (Churchill and Doerge, 1994). Besides that, the regression coefficients of PCR on the original scale can be interpreted more easily. In PCR the synthetic [...]

[...] certain data set, other aspects of PLS and PCR have to be compared as well. Advantages that did not come out of the current analyses but which are useful to take into account are that in PLS the investigator is free to define more than one Y-variable, that the number of variables can be large compared to the number of observations, and that no distributional assumptions are made. This last aspect makes more data sets suitable for PLS-analysis. However, at the same time it implies the disadvantage that significance values cannot be calculated. A disadvantage of PLS is that it is a partial procedure in the sense that each step of the estimation minimizes a residual variance with respect to a subset of X-, Y- and synthetic variables (Steenkamp and Van Trijp, 1996). So, there is no total residual variance or other overall optimum criterion that is strictly optimized (Jöreskog and Wold, 1982). The requirements, advantages and disadvantages of both methodologies are summarized in Table 5.

Table 5
Requirements and disadvantages of Principal Component Regression (PCR) and Partial Least Squares (PLS)

                                          PCR                       PLS
Requirements
  Possibilities complexity path-analysis  Not complex               Very complex
  Degree of prior information required    Not much                  Much
  # cases : No. of variables              # cases >> # variables    # cases <, =, or > # variables
  Assumption on distribution variables    Normal distribution       Distribution-free
  Number of Y-variables                   = 1                       ≥ 1
Disadvantages
  Multicollinearity                       Accounted for             Accounted for
  Analysis                                Complete                  Partial
  Y-variable included in optimisation     No                        Yes
  Calculation P-values                    Possible                  Not possible
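The contrast drawn in Section 4.2 can be illustrated with a minimal sketch. It is not the analysis used in this study. It assumes simulated data in which the Y-variable depends on a low-variance X-variable, a single Y-variable, and a textbook NIPALS version of PLS rather than the path-modelling software behind Figs. 2 and 3. All function names (`pcr_fitted`, `pls1_fitted`, `permutation_p_value`, `demo`) are hypothetical. The sketch compares the R² of PCR with components kept by eigenvalue, PCR with components kept by correlation with the response, and PLS. It also shows one simple way in which permuting the Y-variable could give a null distribution for the PLS R², in the spirit of the permutation idea mentioned above.

```python
import numpy as np


def r_squared(y, y_hat):
    """Share of the variance in y reproduced by the fitted values."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def pcr_fitted(X, y, n_comp, select="eigenvalue"):
    """Fitted values from regressing y on n_comp principal components of X."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s  # PC scores, columns ordered by decreasing eigenvalue
    if select == "eigenvalue":
        keep = np.arange(n_comp)  # keep the PCs with the largest eigenvalues
    else:
        # Alternative rule: keep the PCs most correlated with the response.
        corr = np.array([abs(np.corrcoef(scores[:, j], yc)[0, 1])
                         for j in range(scores.shape[1])])
        keep = np.argsort(corr)[::-1][:n_comp]
    T = scores[:, keep]
    coef = np.linalg.lstsq(T, yc, rcond=None)[0]
    return y.mean() + T @ coef


def pls1_fitted(X, y, n_comp):
    """Fitted values from a single-response PLS fit (NIPALS with deflation)."""
    Xk, yc = X - X.mean(axis=0), y - y.mean()
    yk = yc.copy()
    scores = []
    for _ in range(n_comp):
        w = Xk.T @ yk                      # weights proportional to cov(X, y)
        w /= np.linalg.norm(w)
        t = Xk @ w                         # synthetic factor (score vector)
        p = Xk.T @ t / (t @ t)             # loadings of X on the factor
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - t * (t @ yk) / (t @ t)   # deflate y
        scores.append(t)
    T = np.column_stack(scores)
    coef = np.linalg.lstsq(T, yc, rcond=None)[0]
    return y.mean() + T @ coef


def permutation_p_value(X, y, n_comp, n_perm=200, seed=0):
    """Share of permuted-y fits whose R2 reaches the observed PLS R2."""
    rng = np.random.default_rng(seed)
    observed = r_squared(y, pls1_fitted(X, y, n_comp))
    hits = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        if r_squared(y_perm, pls1_fitted(X, y_perm, n_comp)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def demo(seed=1):
    """Simulated example (assumed data, not the farm data of this study)."""
    rng = np.random.default_rng(seed)
    n = 500
    sd = np.array([3.0, 2.5, 2.0, 1.5, 1.0, 0.5])
    X = rng.normal(size=(n, sd.size)) * sd
    y = 2.0 * X[:, 4] + rng.normal(size=n)  # y driven by a low-variance column
    return {
        "pcr_eigenvalue": r_squared(y, pcr_fitted(X, y, 2, "eigenvalue")),
        "pcr_correlation": r_squared(y, pcr_fitted(X, y, 2, "correlation")),
        "pls": r_squared(y, pls1_fitted(X, y, 2)),
        "pls_permutation_p": permutation_p_value(X, y, 2),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Because the PC scores are mutually uncorrelated, keeping the PCs with the highest correlation with the response can never give a lower in-sample R² than keeping the same number of PCs by eigenvalue. That gain comes from letting the Y-variable influence the selection, which is the step PLS builds in from the start.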
5. Conclusions