Multivariate Tables, Scatter Plots and 3D Plots

2.2.3 Multivariate Tables, Scatter Plots and 3D Plots

Multivariate tables display the frequencies of multivariate data. Figure 2.19 shows the format of a bivariate table displaying the counts n_ij corresponding to the several combinations of categories of two random variables. Such a bivariate table is called a cross table or contingency table.

When dealing with continuous variables, one can also build cross tables using categories that correspond to the bins that would be assigned to a histogram representation of the variables.
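The binning just described can be sketched in R as follows. This is only an illustration under stated assumptions: the variables x and y are simulated (they do not come from any dataset of the book), and the bin limits are taken from the default breaks that hist would choose.

```r
## Hypothetical continuous data (simulated; not from the book's datasets)
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(200, sd = 5)

## Bin limits that a default histogram of each variable would use
bx <- hist(x, plot = FALSE)$breaks
by <- hist(y, plot = FALSE)$breaks

## Convert each variable into categories defined by those bins
xb <- cut(x, breaks = bx, include.lowest = TRUE)
yb <- cut(y, breaks = by, include.lowest = TRUE)

## Cross table of the binned variables
ct <- table(xb, yb)
print(ct)
```

With these settings the row totals of ct coincide with the bin counts of the histogram of x, since cut and hist then use the same interval convention.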

y_1    n_11   n_12   ...   n_1c   r_1
y_2    n_21   n_22   ...   n_2c   r_2
...    ...    ...    ...   ...    ...
y_r    n_r1   n_r2   ...   n_rc   r_r
       c_1    c_2    ...   c_c

Figure 2.19. An r × c contingency table with the observed absolute frequencies (counts n_ij). The row and column totals are r_i and c_j, respectively.

Example 2.3

Q: Consider the variables SEX and Q4 (4th enquiry question) of the Freshmen dataset (see Appendix E). Determine the cross table for these two categorical variables.

2.2 Presenting the Data 53

A: The cross table can be obtained with the commands listed in Commands 2.4. Table 2.4 shows the counts and frequencies for each pair of values of the two categorical variables. Note that the variable Q4 can be considered an ordinal variable if we assign ordered scores, e.g. from 1 to 5, to the answers ranging from “fully disagree” to “fully agree”.

A cross table is an estimate of the respective bivariate probability or density function. Notice the total percentages across columns (last row in Table 2.4) and across rows (last column in Table 2.4), which are estimates of the respective marginal probability functions (see section A.8.1).
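As an illustration of these estimates, the following R sketch enters the counts of Table 2.4 by hand and derives the joint and marginal relative frequencies. The column labels 1 to 5 are an assumption that follows the score assignment suggested above; the object names are hypothetical.

```r
## Counts of Table 2.4 (rows: SEX; columns: Q4, scored 1 to 5 by assumption)
counts <- matrix(c(3, 8, 18, 37, 31,
                   1, 2,  4, 13, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(SEX = c("male", "female"),
                                 Q4 = as.character(1:5)))
tab <- as.table(counts)

joint <- prop.table(tab)          # estimates of the joint probabilities
p_sex <- margin.table(joint, 1)   # marginal estimates for SEX (row totals)
p_q4  <- margin.table(joint, 2)   # marginal estimates for Q4 (column totals)

## Percentages with both margins appended
print(round(100 * addmargins(joint), 1))
```

The marginal estimate for male students, 97/132, reproduces the 73.5% shown in the Total column of Table 2.4.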

Table 2.4. Cross table (obtained with SPSS) of variables SEX and Q4 of the Freshmen dataset.

                                                     Q4
                                 Fully                    No               Fully
                              disagree  Disagree   comment     Agree     agree     Total
SEX     male    Count                3         8        18        37        31        97
                % of Total                                                   23.5%     73.5%
        female  Count                1         2         4        13        15        35
                % of Total                                                   11.4%     26.5%
Total           Count                4        10        22        50        46       132
                % of Total

Table 2.5. Trivariate cross table (obtained with SPSS) of variables SEX, LIKE and DISPL of the Freshmen dataset.

                                                        LIKE
DISPL                                       like     dislike  no comment       Total
yes     SEX     male    Count                 25                                  25
                        % of Total                                             67.6%
                female  Count                 10                       2          12
                        % of Total                                  5.4%       32.4%
        Total           Count                 35                       2          37
                        % of Total                                  5.4%      100.0%
no      SEX     male    Count                 64           1           6          71
                        % of Total                                  6.4%       75.5%
                female  Count                 21                       2          23
                        % of Total                                  2.1%       24.5%
        Total           Count                 85           1           8          94
                        % of Total

54 2 Presenting and Summarising the Data

Example 2.4

Q: Determine the trivariate table for the variables SEX, LIKE and DISPL of the Freshmen dataset.

A: In order to represent cross tables for more than two variables, one builds sub-tables for each value of one of the variables in excess of 2, as illustrated in Table 2.5.

Commands 2.4. SPSS, STATISTICA, MATLAB and R commands used to obtain cross tables.

SPSS          Analyze; Descriptive Statistics; Crosstabs
STATISTICA    Statistics; Basic Statistics and Tables; Descriptive Statistics; (Tables and banners | Multiple Response Tables)
MATLAB        crosstab(x,y)
R             table(x,y) | xtabs(~x+y)

The MATLAB function crosstab and the R functions table and xtabs generate cross-tabulations of the variables passed as arguments. Supposing that the dataset Freshmen has been read into the R data frame freshmen, one would obtain Table 2.4 as follows (the ## symbol denotes an R user comment):

> attach(freshmen)
> table(SEX,Q4)    ## or xtabs(~SEX+Q4)
   Q4
SEX 1 2 3 4 5
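The same R functions accept more than two variables, in which case one sub-table is produced for each value of the last variable, much as in Table 2.5. The sketch below assumes a small hypothetical data frame whose values are invented for illustration (they are not the Freshmen data); ftable is used to print the counts in a single flat layout.

```r
## Hypothetical data (invented values; not the Freshmen dataset)
d <- data.frame(
  SEX   = c("male", "male", "female", "female",
            "male", "female", "male", "male"),
  LIKE  = c("like", "like", "like", "no comment",
            "dislike", "like", "like", "no comment"),
  DISPL = c("yes", "no", "yes", "no", "no", "no", "yes", "no")
)

## One SEX x LIKE sub-table for each value of DISPL
t3 <- with(d, table(SEX, LIKE, DISPL))
print(t3)

## The same counts in one flat table, with DISPL and SEX as row variables
print(ftable(t3, row.vars = c("DISPL", "SEX")))
```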

Commands 2.5. SPSS, STATISTICA, MATLAB and R commands used to obtain scatter plots and 3D plots.

SPSS          Graphs; Scatter; Simple
              Graphs; Scatter; 3-D
STATISTICA    Graphs; Scatterplots
              Graphs; 3D XYZ Graphs; Scatterplots
MATLAB        scatter(x,y,s,c)
              scatter3(x,y,z,s,c)
R             plot.default(x,y)

The s, c arguments of MATLAB scatter and scatter3 are the size and colour of the marks, respectively.

The plot.default function is the x-y scatter plot function of R and has several configuration parameters available (colours, type of marks, etc.). The R graphics package has no 3D scatter plot function available.
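A minimal sketch of a plot.default call with some of these configuration parameters is given below. The data are simulated and the variable names are hypothetical (this is not the cork stopper dataset).

```r
## Hypothetical bivariate data (simulated; not the cork stopper dataset)
set.seed(2)
x <- runif(100, min = 0, max = 10)
y <- 2 * x + rnorm(100, sd = 3)

## x-y scatter plot: pch selects the type of mark, col its colour
## and cex its relative size
plot.default(x, y, pch = 19, col = "blue", cex = 0.8,
             xlab = "x", ylab = "y", main = "Scatter plot")
```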

Figure 2.20. Scatter plot (obtained with STATISTICA) of the variables ART and PRT of the cork stopper dataset.

Figure 2.21. 3D plot (obtained with STATISTICA) of the variables ART, PRT and N of the cork stopper dataset.

The most popular graphical tools for multivariate data are the scatter plots for bivariate data and the 3D plots for trivariate data. Examples of these plots, for the cork stopper data, are shown in Figures 2.20 and 2.21. As a matter of fact, the 3D plot is often not so easy to interpret (as in Figure 2.21); therefore, in normal practice, one often inspects multivariate data graphically through scatter plots of the variables grouped in pairs.
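In R, scatter plots of all pairs of variables can be drawn at once with the pairs function. The following sketch assumes three simulated variables with hypothetical names v1, v2 and v3 (not the cork stopper variables).

```r
## Hypothetical trivariate data (simulated; not the cork stopper dataset)
set.seed(3)
n  <- 150
v1 <- rnorm(n)
v2 <- 0.8 * v1 + rnorm(n, sd = 0.5)   # built to be correlated with v1
v3 <- rnorm(n)                        # generated independently of v1 and v2
d  <- data.frame(v1, v2, v3)

## Matrix of scatter plots, one panel for each pair of variables
pairs(d, pch = 20)

## Correlations, for comparison with the visual impression of each panel
print(round(cor(d), 2))
```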

Besides scatter plots and 3D plots, it may be convenient to inspect bivariate histograms or bar plots (such as the one shown in Figure A.1, Appendix A). STATISTICA can produce such bivariate histograms from within the Frequency Tables window of the Descriptive Statistics menu.