2.2.3 Multivariate Tables, Scatter Plots and 3D Plots
Multivariate tables display the frequencies of multivariate data. Figure 2.19 shows the format of a bivariate table displaying the counts n_ij corresponding to the several combinations of categories of two random variables. Such a bivariate table is called a cross table or contingency table.
When dealing with continuous variables, one can also build cross tables by using as categories the bins that would be assigned to a histogram representation of the variables.
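As an illustration of the binning idea, the following R sketch cross-tabulates two continuous variables using the bin edges that hist would choose. The data are simulated and the names x, y, bx, by, xb, yb and ct are hypothetical; none of them come from the book's datasets.

```r
# Simulated continuous data (an assumption; not from the book's datasets)
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(200, sd = 5)

# Bin edges that hist() would use, computed without drawing the histograms
bx <- hist(x, plot = FALSE)$breaks
by <- hist(y, plot = FALSE)$breaks

# Turn each variable into a factor of bins, then cross-tabulate the bins
xb <- cut(x, breaks = bx, include.lowest = TRUE)
yb <- cut(y, breaks = by, include.lowest = TRUE)
ct <- table(xb, yb)
print(ct)
```

Each cell of ct counts the cases falling simultaneously in one bin of x and one bin of y, so the table plays the role of the n_ij counts for binned data.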
y_1 | n_11  n_12  ...  n_1c | r_1
y_2 | n_21  n_22  ...  n_2c | r_2
... | ...   ...   ...  ...  | ...
y_r | n_r1  n_r2  ...  n_rc | r_r
    | c_1   c_2   ...  c_c  |

Figure 2.19. An r × c contingency table with the observed absolute frequencies (counts n_ij). The row and column totals are r_i and c_j, respectively.
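The totals mentioned in the caption can be written out explicitly. In the sketch below, the symbol n for the grand total is an assumption introduced here for convenience; the remaining symbols follow Figure 2.19.

```latex
r_i = \sum_{j=1}^{c} n_{ij}, \qquad
c_j = \sum_{i=1}^{r} n_{ij}, \qquad
n = \sum_{i=1}^{r} r_i = \sum_{j=1}^{c} c_j
```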
Example 2.3
Q: Consider the variables SEX and Q4 (4th enquiry question) of the Freshmen dataset (see Appendix E). Determine the cross table for these two categorical variables.
2.2 Presenting the Data 53
A: The cross table can be obtained with the commands listed in Commands 2.4. Table 2.4 shows the counts and frequencies for each pair of values of the two categorical variables. Note that the variable Q4 can be considered an ordinal variable if we assign ordered scores, e.g. from 1 to 5, corresponding to “fully disagree” through “fully agree”, respectively.
A cross table is an estimate of the respective bivariate probability or density function. Notice the total percentages across columns (last row in Table 2.4) and across rows (last column in Table 2.4), which are estimates of the respective marginal probability functions (see section A.8.1).
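Table 2.4. Cross table (obtained with SPSS) of variables SEX and Q4 of the Freshmen dataset.

SEX    |            | Q4: Fully disagree | Disagree | No comment | Agree | Fully agree | Total
male   | Count      | 3   | 8   | 18  | 37  | 31    | 97
       | % of Total | ... | ... | ... | ... | 23.5% | 73.5%
female | Count      | 1   | 2   | 4   | 13  | 15    | 35
       | % of Total | ... | ... | ... | ... | 11.4% | 26.5%
Total  | Count      | 4   | 10  | 22  | 50  | 46    | 132
       | % of Total | ... | ... | ... | ... | ...   | ...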
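The "% of Total" entries and the marginal percentages can be recomputed from the counts alone. The following R sketch does so for the counts of Table 2.4; the object names counts, pct and pct_full are hypothetical, and labelling the Q4 columns by the scores 1 to 5 is an assumption based on the scoring suggested in Example 2.3.

```r
# Counts transcribed from Table 2.4 (rows: SEX; columns: Q4 scores 1 to 5)
counts <- as.table(matrix(c(3, 8, 18, 37, 31,
                            1, 2,  4, 13, 15),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(SEX = c("male", "female"),
                                          Q4  = as.character(1:5))))

# Joint relative frequencies in percent: each cell divided by the grand total
pct <- 100 * prop.table(counts)

# Append row and column sums; these estimate the marginal probability functions
pct_full <- addmargins(pct)
print(round(pct_full, 1))
```

The "Sum" column and "Sum" row of pct_full correspond to the last column and last row of the SPSS table, respectively.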
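Table 2.5. Trivariate cross table (obtained with SPSS) of variables SEX, LIKE and DISPL of the Freshmen dataset.

DISPL = yes
SEX    |            | LIKE: like | dislike | no comment | Total
male   | Count      | 25  |     |      | 25
       | % of Total | ... |     |      | 67.6%
female | Count      | 10  |     | 2    | 12
       | % of Total | ... |     | 5.4% | 32.4%
Total  | Count      | 35  |     | 2    | 37
       | % of Total | ... |     | 5.4% | 100.0%

DISPL = no
SEX    |            | LIKE: like | dislike | no comment | Total
male   | Count      | 64  | 1   | 6    | 71
       | % of Total | ... | ... | 6.4% | 75.5%
female | Count      | 21  |     | 2    | 23
       | % of Total | ... |     | 2.1% | 24.5%
Total  | Count      | 85  | 1   | 8    | 94
       | % of Total | ... | ... | ...  | ...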
54 2 Presenting and Summarising the Data
Example 2.4
Q: Determine the trivariate table for the variables SEX, LIKE and DISPL of the Freshmen dataset.
A: In order to represent cross tables for more than two variables, one builds sub-tables for each value of one of the variables in excess of 2, as illustrated in Table
Commands 2.4. SPSS, STATISTICA, MATLAB and R commands used to obtain cross tables.

SPSS         Analyze; Descriptive Statistics; Crosstabs
STATISTICA   Statistics; Basic Statistics and Tables; Descriptive Statistics; (Tables and banners | Multiple Response Tables)
MATLAB       crosstab(x,y)
R            table(x,y) | xtabs(~x+y)
The MATLAB function crosstab and the R functions table and xtabs generate cross-tabulations of the variables passed as arguments. Supposing that the dataset Freshmen has been read into the R data frame freshmen, one would obtain Table 2.4 as follows (the ## symbol denotes an R user comment):
> attach(freshmen)
> table(SEX,Q4)    ## or xtabs(~SEX+Q4)
   Q4
SEX 1 2 3 4 5
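The same R functions extend to more than two variables, which is one way of producing sub-tables of the kind discussed in Example 2.4. The sketch below uses a small invented data frame (toy) standing in for the Freshmen data; its values are assumptions made purely for illustration. It also shows the data argument of xtabs as an alternative to attach.

```r
# Toy data frame standing in for the Freshmen data (values are invented)
toy <- data.frame(
  SEX   = c("male", "male", "female", "female", "male", "female"),
  LIKE  = c("like", "dislike", "like", "like", "like", "no comment"),
  DISPL = c("yes", "no", "no", "yes", "no", "no")
)

# Two-way table without attach(): the formula interface takes a data argument
t2 <- xtabs(~ SEX + LIKE, data = toy)

# Three-way table: one SEX x LIKE sub-table for each value of DISPL
t3 <- xtabs(~ SEX + LIKE + DISPL, data = toy)

# ftable() flattens the three-way array into a single printed table
print(ftable(t3, row.vars = c("DISPL", "SEX")))
```

Printing t3 directly shows one sub-table per DISPL value, whereas ftable arranges the same counts in a layout closer to the stacked SPSS output.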
Commands 2.5. SPSS, STATISTICA, MATLAB and R commands used to obtain scatter plots and 3D plots.

SPSS         Graphs; Scatter; Simple
             Graphs; Scatter; 3-D
STATISTICA   Graphs; Scatterplots
             Graphs; 3D XYZ Graphs; Scatterplots
MATLAB       scatter(x,y,s,c)
             scatter3(x,y,z,s,c)
R            plot.default(x,y)
The s, c arguments of MATLAB scatter and scatter3 are the size and colour of the marks, respectively.
The plot.default function is the x-y scatter plot function of R and has several configuration parameters available (colours, type of marks, etc.). The R graphics package has no 3D scatter plot function available.
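A minimal sketch of the configuration parameters of plot.default follows. The vectors ART and PRT are simulated stand-ins sharing the names of the cork stopper variables; they are not the actual dataset, and the chosen colour and mark type are arbitrary.

```r
# Simulated stand-ins for two cork stopper variables (not the real data)
set.seed(2)
ART <- runif(50, min = 100, max = 900)
PRT <- 0.8 * ART + rnorm(50, sd = 60)

# Scatter plot with explicit mark type, colour and axis labels
plot.default(ART, PRT,
             pch  = 19,            # filled circles as marks
             col  = "steelblue",   # mark colour
             xlab = "ART", ylab = "PRT",
             main = "Scatter plot (simulated data)")
```

Calling plot(ART, PRT, ...) on numeric vectors dispatches to the same function.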
Figure 2.20. Scatter plot (obtained with STATISTICA) of the variables ART and PRT of the cork stopper dataset.
Figure 2.21. 3D plot (obtained with STATISTICA) of the variables ART, PRT and N of the cork stopper dataset.
The most popular graphical tools for multivariate data are scatter plots for bivariate data and 3D plots for trivariate data. Examples of these plots, for the cork stopper data, are shown in Figures 2.20 and 2.21. In fact, a 3D plot is often difficult to interpret (as in Figure 2.21); therefore, in normal practice, one usually inspects multivariate data graphically through scatter plots of the variables grouped in pairs.
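Scatter plots of all variable pairs can be drawn in one call with the R function pairs. The sketch below again uses simulated stand-ins named after the cork stopper variables ART, PRT and N; the data frame name cork and all values are assumptions, not the actual dataset.

```r
# Simulated stand-ins for three cork stopper variables (not the real data)
set.seed(3)
n_obs <- 60
ART <- runif(n_obs, min = 100, max = 900)
PRT <- 0.8 * ART + rnorm(n_obs, sd = 60)
N   <- round(0.1 * ART + rnorm(n_obs, sd = 8))
cork <- data.frame(ART, PRT, N)

# Matrix of scatter plots, one panel for each pair of variables
pairs(cork, pch = 19, col = "steelblue")
```

Each off-diagonal panel is an ordinary bivariate scatter plot, so the three variables can be inspected pairwise without resorting to a 3D view.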
Besides scatter plots and 3D plots, it may be convenient to inspect bivariate histograms or bar plots (such as the one shown in Figure A.1, Appendix A). STATISTICA can produce such bivariate histograms from within the Frequency Tables window of the Descriptive Statistics menu.