
2.2.1 Counts and Bar Graphs

Tables of counts and bar graphs are used to present discrete data. Denoting by X the discrete random variable associated with the data, the table of counts (also known as a tally sheet) gives us:

– The absolute frequencies (counts), $n_k$;
– The relative frequencies (or simply, frequencies) of occurrence, $f_k = n_k/n$,

for each discrete value (category), $x_k$, of the random variable X (n is the total number of cases).
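The definitions above can be sketched in a few lines of R. The sample x below is a small hypothetical vector chosen only for illustration, and the names x_k, n_k and f_k simply mirror the notation of the text.

```r
# Hypothetical sample of a discrete variable X (illustrative values only)
x <- c(2, 1, 2, 3, 2, 2, 1, 2)

n   <- length(x)                              # total number of cases
x_k <- sort(unique(x))                        # distinct values (categories)
n_k <- sapply(x_k, function(v) sum(x == v))   # absolute frequencies (counts)
f_k <- n_k / n                                # relative frequencies, f_k = n_k/n

# Tally sheet: one row per category
tally <- data.frame(x_k, n_k, f_k)
print(tally)
```

The relative frequencies always sum to 1, which is a quick check that no case has been left out of the tally.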

Example 2.1

Q: Consider the Meteo dataset (see Appendix E). We assume that the data have already been read into SPSS, STATISTICA, MATLAB or R. Obtain a tally sheet showing the counts of maximum precipitation categories (discrete variable PClass). Which category has the highest frequency?

A: The tally sheet can be obtained with the commands listed in Commands 2.1. Table 2.1 shows the results obtained with SPSS. The category with the highest rate of occurrence is category 2 (64%). The Valid Percent column differs from the Percent column only when there are missing data, which Valid Percent excludes from the computations.

Table 2.1. Frequency table for the discrete variable PClass, obtained with SPSS.

                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1.00         6        24.0         24.0              24.0
        2.00        16        64.0         64.0              88.0
        3.00         3        12.0         12.0             100.0
        Total       25       100.0        100.0

2.2 Presenting the Data 41

In Table 2.1 the counts are shown in the column headed by Frequency, and the frequencies, given as percentages, are in the column headed by Percent. The latter are unbiased and consistent point estimates of the corresponding probability values $p_k$. For more details see A.1 and Appendix C.
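The difference between the Percent and Valid Percent columns mentioned in Example 2.1 only shows up when some cases are missing. The following R sketch illustrates the two computations on a hypothetical vector with two missing values (it is not the Meteo data, and the variable names are chosen here for illustration only).

```r
# Hypothetical variable with two missing cases (NA); not the Meteo data
x <- c(1, 2, 2, NA, 3, 2, 1, NA, 2, 2)

counts  <- table(x, useNA = "no")   # counts of the valid (non-missing) cases
n_all   <- length(x)                # all cases, missing ones included
n_valid <- sum(!is.na(x))           # valid cases only

# Percentages relative to all cases (analogous to the SPSS Percent column)
percent <- 100 * as.vector(counts) / n_all
# Percentages relative to valid cases (analogous to Valid Percent)
valid_percent <- 100 * as.vector(counts) / n_valid

print(data.frame(value = names(counts), count = as.vector(counts),
                 percent, valid_percent))
```

With no missing data n_all equals n_valid, and the two columns coincide, as they do in Table 2.1.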

Commands 2.1. SPSS, STATISTICA, MATLAB and R commands used to obtain frequency tables. For SPSS and STATISTICA the semicolon separates menu options that must be used in sequence.

SPSS         Analyze; Descriptive Statistics; Frequencies
STATISTICA   Statistics; Basic Statistics and Tables; Descriptive Statistics; Frequency Tables
MATLAB       tabulate(x)
R            table(x); prop.table(x)

When using SPSS or STATISTICA, one has to specify, in appropriate windows, the variables used in the statistical analysis. Figure 2.8 shows the windows used for that purpose in the present “Descriptive Statistics” case.

With SPSS the variable specification window pops up immediately after choosing Frequencies in the menu Descriptive Statistics. Using a button that toggles between select and remove, one can specify which variables to use in the analysis. The frequency table is written to the output sheet, which constitutes a session logbook that can be saved (*.spo file) and opened at a later session. From the output sheet the frequency table can be copied to the clipboard in the usual way (e.g., using the CTRL+C keys) by first selecting it with the mouse (just click the left mouse button over the table).

Figure 2.8. Variable specification windows for descriptive statistics: a) SPSS; b) STATISTICA.

42 2 Presenting and Summarising the Data

With STATISTICA, the variable specification window pops up when clicking the Variables tab in the Descriptive Statistics window. One can select variables with the mouse or enter their identification numbers in a text box. For instance, entering “2-4” means that one wishes the analysis to be performed starting from variable v2 up to variable v4. There is also a Select All variables button. The frequency table is written to a specific scroll-sheet that is part of a session workbook file, which constitutes a session logbook that can be saved (*.stw file) and opened at a later session. The entire scroll-sheet (or any part of the screen) can be copied to the clipboard (from where it can be pasted into a document in the normal way), using the Screen Catcher tool of the Edit menu. As an alternative, one can also copy the contents of the table alone in the normal way.

The MATLAB tabulate function computes a 3-column matrix in which the first column contains the distinct values of the argument, the second column contains the absolute frequencies (counts), and the third column contains these frequencies as percentages. For the PClass example we have:

» t=tabulate(PClass)
t =
     1     6    24
     2    16    64
     3     3    12

Text output of MATLAB can be copied and pasted in the usual way.

The R table function (table(PClass) for the example) computes the counts. The function prop.table(x) divides each element of the vector x by the sum of all elements, giving proportions. To obtain the information in the last column of the MATLAB output above, as proportions rather than percentages, one should therefore use prop.table(table(PClass)). Text output of the R console can be copied and pasted in the usual way.
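As an illustration, the three-column layout produced by the MATLAB tabulate function can be approximated in R by combining table and prop.table. In the sketch below PClass is a stand-in vector rebuilt from the counts implied by Example 2.1 (25 cases, 64% in category 2, 3 cases in category 3); the order of the cases is arbitrary, and the name tab is chosen here only for illustration.

```r
# Stand-in for PClass: 25 cases with counts 6, 16 and 3, as implied by the
# figures quoted in Example 2.1 (the order of the cases is arbitrary)
PClass <- rep(1:3, times = c(6, 16, 3))

counts <- table(PClass)        # absolute frequencies
props  <- prop.table(counts)   # relative frequencies (they sum to 1)

# Three columns: value, count, percentage
tab <- cbind(value   = as.numeric(names(counts)),
             count   = as.vector(counts),
             percent = 100 * as.vector(props))
print(tab)
```

Note that prop.table is applied to the table of counts, not to PClass itself; applied to the raw vector it would divide each observation by the sum of all observations, which is not a frequency.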

Figure 2.9. Bar graph, obtained with SPSS, representing the frequencies (in percentage values) of PClass.


With SPSS, STATISTICA, MATLAB and R one can also obtain a graphic representation of a tally sheet, which constitutes for the example at hand an estimate of the probability function of the associated random variable $X_{PClass}$, in the form of a bar graph (see Commands 2.2). Figure 2.9 shows the bar graph obtained with SPSS for Example 2.1. The heights of the bars represent estimates of the discrete probabilities (see Appendix B for examples of bar graph representations of discrete probability functions).

Commands 2.2. SPSS, STATISTICA, MATLAB and R commands used to obtain bar graphs. The “|” symbol separates alternative options or functions.

SPSS         Graphs; Bar Charts
STATISTICA   Graphs; Histograms
MATLAB       bar(f) | hist(y,x)
R            barplot(x) | hist(x)

With SPSS, after selecting the Simple option of Bar Charts one proceeds to choose the variable (or variables) to be represented graphically in the Define Simple Bar window by selecting it for the Category Axis, as shown in Figure 2.10. For the frequency bar graph one must check the “% of cases” option in this window. The graph output appears in the SPSS output sheet in the form of a resizable object, which can be copied (select it first with the mouse) and pasted in the usual way. By double clicking over this object, the SPSS Chart Editor pops up (see Figure 2.11), with many options for the user to tailor the graph to his/her personal preferences.

With STATISTICA one can obtain a bar graph using the Histograms option of the Graphs menu. A 2D Histograms window pops up, where the user must specify the variable (or variables) to be represented graphically (using the Variables button), and, in this case, the Regular type for the bar graph. The user must also select the Codes option, and specify the codes for the variable categories (clicking in the respective button), as shown in Figure 2.12. In this case, the Normal fit box is left unchecked. Figure 2.13 shows the bar graph obtained with STATISTICA for the PClass variable.

Any graph in STATISTICA is a resizable object that can be copied (and pasted) in the usual way. One can also completely customise the graph by clicking over it and modifying the required specifications in the All Options window, shown in Figure 2.14. For instance, the bar graph of Figure 2.13 was obtained by: choosing the white background in the Graph Window sub-window; selecting black hatched fill in the Plot Bars sub-window; leaving the Gridlines box unchecked in the Axis Major Units sub-window (shown in Figure 2.14).

MATLAB has a routine for drawing histograms (to be described in the following section) that can also be used for obtaining bar graphs. The routine, hist(y,x), plots a bar graph of the y frequencies, using a vector x with the categories. For the PClass variable one would write the following commands:

» cat=[1 2 3]; %vector with categories
» hist(PClass,cat)

Figure 2.10. SPSS Define Simple Bar window, for specifying bar charts.

Figure 2.11. The SPSS Chart Editor, with which the user can configure the graphic output (in the present case, Figure 2.9). For instance, by using Color from the Format menu one can modify the bar colour.


Figure 2.12. Specification of a bar chart for variable PClass (Example 2.1) using STATISTICA. The category codes can be filled in directly or by clicking the All button.

Figure 2.13. Bar graph, obtained with STATISTICA, representing the frequencies (counts) of variable PClass (Example 2.1).

If the vector with the counts (or frequencies) is available, it is also possible to use the bar command. In the present case, after obtaining the previously mentioned matrix t with tabulate (see Commands 2.1), one would obtain the bar graph corresponding to column 3 of t with:

» colormap([.5 .5 .5]); bar(t(:,3))


Figure 2.14. The STATISTICA All Options window that allows the user to completely customise the graphic output. This window has several sub-windows that can be opened with the left tabs. The sub-window corresponding to the axis units is shown.

The colormap command determines which colour will be used for the bars. Its argument is a vector containing the composition rates (between 0 and 1) of the red, green and blue colours. In the above example the three colours are used in equal proportion, so the bars appear grey.

Figures in MATLAB are displayed in specific windows, as exemplified in Figure 2.15. They can be customised using the available options in Tools. The user can copy a resizable figure using the Copy Figure option of the Edit menu.

The R hist function, when applied to a discrete variable, plots its bar graph. Instead of providing graphical editing operations in the graphical window, as in the previous software products, R graphical functions have a whole series of configuration arguments. Figure 2.16a was obtained with hist(PClass, col="gray"). The argument col determines the filling colour of the bars. There are arguments for specifying shading lines, the border colour of the bars, the labels, and so on. For instance, Figure 2.16b was obtained with hist(PClass, density = 10, angle = 30, border = "black", col = "gray", labels = TRUE). From now on we assume that the reader will browse through the on-line help of the graphical functions in order to obtain the proper guidance on how to set argument values. Graphical plots in R can be copied as bitmaps or metafiles using the menu options that pop up with the right mouse button.
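Commands 2.2 also lists barplot as an alternative to hist. A minimal sketch of its use for a frequency bar graph is given below; it is not the code used for Figure 2.16. PClass is again a stand-in vector rebuilt from the counts implied by Example 2.1, and the axis labels are chosen here for illustration.

```r
# Stand-in for PClass (counts 6, 16 and 3; the order of the cases is arbitrary)
PClass <- rep(1:3, times = c(6, 16, 3))

# Frequencies in percentage, one value per category
pct <- 100 * prop.table(table(PClass))

# barplot draws one bar per element of pct and returns the bar midpoints
mid <- barplot(pct, col = "gray", xlab = "PClass", ylab = "Percent")
```

Unlike hist, which bins the raw data, barplot takes the bar heights directly, so the same call can plot counts (table(PClass)) or percentages as needed.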


Figure 2.15. MATLAB figure window, containing the bar graph of PClass. The graph itself can be copied to the clipboard using the Copy Figure option of the Edit menu.

Figure 2.16. Bar graphs of PClass obtained with R: a) Using grey bars; b) Using dashed grey lines and count labels.