Chapter 17
Partial Least Squares Regression
The Partial Least Squares Regression procedure estimates partial least
squares (PLS, also known as "projection to latent structure") regression
models. PLS is a predictive technique that is an alternative to ordinary
least squares (OLS) regression, canonical correlation, or structural
equation modeling, and it is particularly useful when predictor variables
are highly correlated, or when the number of predictors exceeds the
number of cases.
PLS combines features of principal components analysis and multiple
regression. It first extracts a set of latent factors that explain as much of
the covariance as possible between the independent and dependent
variables. Then a regression step predicts values of the dependent
variables using the decomposition of the independent variables.
17.1. Using Partial Least Squares Regression to Model Vehicle Sales
An automotive industry group keeps track of the sales for a variety of
personal motor vehicles. In an effort to be able to identify over- and
underperforming models, you want to establish a relationship between
vehicle sales and vehicle characteristics.
Information concerning different makes and models of cars is contained
in car_sales.sav. See the topic Sample Files for more information. Since vehicle characteristics are correlated, partial least squares regression should be a good alternative to ordinary least squares regression.
17.1.1. Running the Analysis
1. To run a Partial Least Squares Regression analysis, from the menus
choose:
Analyze > Regression > Partial Least Squares...


Figure 269 Partial Least Squares Regression Variables tab

2. Select Log-transformed sales [lnsales] as a dependent variable.
Select Vehicle type [type] through Fuel efficiency [mpg] as independent
variables.
3. Click the Options tab.


Figure 270 Options tab


4. Select Save estimates for individual cases and type indvCases as the
name of the dataset.
5. Select Save estimates for latent factors and type latentFactors as the
name of the dataset.
6. Select Save estimates for independent variables and type indepVars as the name of the dataset.
7. Click OK.
17.1.2. Proportion of Variance Explained
Figure 271 Proportion of variance explained

The proportion of variance explained table shows the contribution of each latent factor to the model.


The first factor explains 20.9% of the variance in the predictors and
40.3% of the variance in the dependent variable.
The second factor explains 55.0% of the variance in the predictors
and 2.9% of the variance in the dependent.
The third factor explains 5.3% of the variance in the predictors and
4.3% of the variance in the dependent. Together, the first three factors
explain 81.3% of the variance in the predictors and 47.4% of the
variance in the dependent.
Though the fourth factor adds very little to the Y variance explained, it
contributes more to the X variance than the third factor, and its
adjusted R-square value is higher than that for the third factor.
The fifth factor contributes the least of any factor to both
the X and Y variance explained, and the adjusted R-square dips
slightly. There is no compelling evidence for choosing a four-factor
solution over five in this table.
17.1.3. Output for Independent Variables

Figure 272 Parameters

The parameters table shows the estimated regression coefficients for
each independent variable for predicting the dependent variable. Instead
of the typical tests of model effects, look to the variable importance in the
projection table for guidance on which predictors are most useful.


Figure 273 Variable importance in the projection

The variable importance in the projection (VIP) represents the
contribution of each predictor to the model, cumulative by the number of
factors in the model. For example, in the one-factor model, price loads
heavily on the first factor and has a VIP of 2.088. As more factors are
added, the cumulative VIP for price slowly drops to 1.946, presumably
because it does not load very heavily on those factors. By
contrast, engine_s has a VIP of 0.512 in the one-factor model, which
rises to 0.932 in the five-factor model.
Figure 274 indepVars dataset


The parameter coefficients and VIP information are also saved to the indepVars dataset and can be used in further analysis of the data.
The cumulative variable importance chart, for example, is created using
this dataset.


Figure 275 Cumulative variable importance chart

The cumulative variable importance chart provides a visualization of the
variable importance in the projection table. For information on the
contribution of predictors to individual factors instead of the cumulative
model, see the output for latent factors.
17.1.4. Output for Latent Factors
Figure 276 Weights

The predictor weights represent the association between the predictors and the Y scores, by latent factor. Likewise, the weights for the dependent variable lnsales represent the association between lnsales and the X scores. As expected from the VIP table, the weight for price is largest on the first latent factor and relatively small on the others, while the weight for engine_s is relatively small on the first factor. What becomes clear from this table is which factors engine_s contributes to most: it has the largest weight of any predictor on the third factor and the second largest on the fourth. Its relatively small weight on the fifth factor explains the slight dip in cumulative importance from the four-factor model to the five-factor model.
Figure 277 latentFactors dataset

The weights, along with the loadings (which are similar to the weights and are not discussed here), are saved to the latentFactors dataset and can be used in further analysis of the data. The factor weights charts, for example, are created using this dataset.


Figure 278 Factor weights 2 vs. 1


The factor weights charts provide a visualization of the pairwise comparison of factor weights for the first three factors. In the two-dimensional space defined by the first two factor weights, you can see that price, horsepow, and [type=Automobile] appear negatively correlated with lnsales, since they point in opposite directions. length, wheelbase, and mpg are somewhat positively correlated with lnsales, and the others are at best weakly correlated with lnsales because they point perpendicularly to lnsales.

Figure 279 Factor weights 3 vs. 1

In the space defined by factor weights 3 and 1, fuel_cap, which was
positively correlated with engine_s in the 2 vs. 1 plot, is negatively
correlated on factor 3.
Figure 280 Factor weights 3 vs. 2


In the space defined by factor weights 3 and 2, lnsales appears more
strongly correlated with mpg, engine_s, and fuel_cap than in previous
plots, illustrating the importance of multiple points of view.
17.1.5. Output for Individual Cases

Figure 281 indvCases dataset

There is no tabular output for individual cases; however, a wealth of casewise information is written to the indvCases dataset, including the original values of the variables in the model, model-predicted values for the predictors, model-predicted values for lnsales, residuals for the predictors and lnsales, X scores, Y scores, and the X and Y distances to the model (the PRESS statistic is simply the sum of the squared Y distances to the model). This dataset is used to create the Y scores vs. X scores plot and the X scores vs. X scores plot.


Figure 282 Y scores vs. X scores

This scatterplot matrix should show high correlations in the first couple of
factors (plots in the upper left of the matrix), gradually diffusing to very
little correlation. It can be useful for identifying potential outliers for
further investigation.
Figure 283 X scores vs. X scores


Plotting the X scores against themselves is a useful diagnostic. There shouldn't be any patterns, groupings, or outliers.
Outliers are potentially influential cases; there are a few to investigate in this plot.
Patterns and groupings indicate that a more complex model, or separate analyses of groups, may be necessary. The near-separation of Automobiles and Trucks on X-Score 4 is somewhat troubling, especially in the plot of X-Score 2 vs. X-Score 4, where the two groups appear to lie along parallel lines. Separate analyses of autos and trucks are something to consider in further work.

