Calculators, Computers, and Software Systems

even though these averages are meaningless. Used intelligently, these packages are convenient, powerful, and useful, but be sure to examine the output from any computer run to make certain the results make sense. Did anything go wrong? Was something overlooked? In other words, be skeptical. One of the important acronyms of computer technology still holds; namely, GIGO: garbage in, garbage out.

Throughout the textbook, we will use computer software systems to do most of the more tedious calculations of statistics after we have explained how the calculations can be done. Used in this way, computers and associated graphical and statistical analysis packages will enable us to spend additional time on interpreting the results of the analyses rather than on doing the analyses.

3.3 Describing Data on a Single Variable: Graphical Methods

After the measurements of interest have been collected, ideally the data are organized, displayed, and examined by using various graphical techniques. As a general rule, the data should be arranged into categories so that each measurement is classified into one, and only one, of the categories. This procedure eliminates any ambiguity that might otherwise arise when categorizing measurements.

For example, suppose a sex discrimination lawsuit is filed. The law firm representing the plaintiffs needs to summarize the salaries of all employees in a large corporation. To examine possible inequities in salaries, the law firm decides to summarize the 2005 yearly income, rounded to the nearest dollar, for all female employees into the categories listed in Table 3.2.

TABLE 3.2 Format for summarizing salary data

Income Level    Salary
1               less than 20,000
2               20,000 to 39,999
3               40,000 to 59,999
4               60,000 to 79,999
5               80,000 to 99,999
6               100,000 or more

The yearly salary of each female employee falls into one, and only one, income category. However, if the income categories had been defined as shown in Table 3.3, then there would be confusion as to which category should be checked. For example, an employee earning 40,000 could be placed in either category 2 or 3.

TABLE 3.3 Format for summarizing salary data

Income Level    Salary
1               less than 20,000
2               20,000 to 40,000
3               40,000 to 60,000
4               60,000 to 80,000
5               80,000 to 100,000
6               100,000 or more

To reiterate: If the data are organized into categories, it is important to define the categories so that a measurement can be placed into only one category.

When data are organized according to this general rule, there are several ways to display the data graphically. The first and simplest graphical procedure for data organized in this manner is the pie chart. It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partitioning a circle, similar to slicing a pie.
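The one-and-only-one rule for the salary categories can be checked mechanically. The following is a minimal Python sketch, not part of the original text: the function names `income_level` and `matching_levels` are hypothetical, and salaries are assumed to be whole dollars, as in the example. The first function assigns a salary to its Table 3.2 level; the second lists every Table 3.3 level whose range contains the salary, which exposes the overlap at the boundaries.

```python
import bisect

# Lower bounds of income levels 2 through 6 in Table 3.2.
LOWER_BOUNDS = [20_000, 40_000, 60_000, 80_000, 100_000]

def income_level(salary):
    """Return the single Table 3.2 income level for a whole-dollar salary."""
    # bisect_right counts how many lower bounds are <= salary,
    # so each salary lands in exactly one level.
    return bisect.bisect_right(LOWER_BOUNDS, salary) + 1

# Table 3.3 ranges with both endpoints included; None marks an open end.
TABLE_3_3 = [
    (1, None, 19_999),
    (2, 20_000, 40_000),
    (3, 40_000, 60_000),
    (4, 60_000, 80_000),
    (5, 80_000, 100_000),
    (6, 100_000, None),
]

def matching_levels(salary):
    """Return every Table 3.3 level whose range contains the salary."""
    return [
        level
        for level, low, high in TABLE_3_3
        if (low is None or salary >= low) and (high is None or salary <= high)
    ]

print(income_level(40_000))     # 3
print(matching_levels(40_000))  # [2, 3]
```

A salary of 40,000 receives exactly one level under Table 3.2 but matches two levels under Table 3.3, which is the ambiguity the general rule is meant to prevent.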
The data of Table 3.4 represent a summary of a study to determine which types of employment may be the most dangerous to their employees. Using data from the National Safety Council, it was reported that in 1999, approximately 3,240,000 workers suffered disabling injuries (an injury that results in death, some degree of physical impairment, or renders the employee unable to perform regular activities for a full day beyond the day of the injury). Each of the 3,240,000 disabled workers was classified according to the industry group in which they were employed.

Although you can scan the data in Table 3.4, the results are more easily interpreted by using a pie chart. From Figure 3.1, we can make certain inferences about which industries have the highest number of injured employees and thus may require a closer scrutiny of their practices. For example, the services industry had nearly one-quarter, 24.3%, of all disabling injuries during 1999, whereas government employees constituted only 14.9%. At this point, we must carefully consider what is being displayed in both Table 3.4 and Figure 3.1. These are the number of disabling injuries, and these figures do not take into account the number of workers employed in the various industry groups. To realistically reflect the risk of a disabling injury to the employees in each of the industry groups, we need to take into account the total number of employees in each of the industries. A rate of disabling injury could then be computed that would be a more informative index of the risk to a worker employed in each of the groups.
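One way to write such a rate is sketched below. The notation is ours rather than the text's, and expressing the rate per 1,000 workers is an assumed convention; the text does not specify a scaling.

```latex
\text{rate}_g \;=\; \frac{\text{number of disabling injuries in industry group } g}
                         {\text{number of workers employed in industry group } g} \times 1000
```

Two groups with the same number of injuries can have very different rates if their workforces differ in size, which is why the counts alone do not measure risk.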
For example, although the services group had the highest percentage of workers with a disabling injury, it had also the

TABLE 3.4 Disabling injuries by industry group

Industry Group                  Number of Disabling      Percent
                                Injuries (in 1,000s)     of Total
Agriculture                     130                      3.4
Construction                    470                      12.1
Manufacturing                   630                      16.2
Transportation and utilities    300                      9.8
Trade                           380                      19.3
Services                        750                      24.3
Government                      580                      14.9

Source: Statistical Abstract of the United States: 2002, 122nd Edition.

FIGURE 3.1 Pie chart for the data of Table 3.4
[Pie chart. Slices, largest to smallest: Services 24.3%, Trade 19.3%, Manufacturing 16.2%, Government 14.9%, Construction 12.1%, Transportation and utilities 9.8%, Agriculture 3.4%]
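To make the construction of Figure 3.1 concrete, the sketch below converts each category's share into the central angle of its slice, since a full circle of 360 degrees represents 100% of the measurements. It uses the percent-of-total column of Table 3.4 as printed; the function name `slice_angles` is hypothetical, and this is an illustration of the idea rather than the procedure used to draw the figure in the text.

```python
# Percent-of-total column of Table 3.4, as printed in the text.
PERCENT_OF_TOTAL = {
    "Agriculture": 3.4,
    "Construction": 12.1,
    "Manufacturing": 16.2,
    "Transportation and utilities": 9.8,
    "Trade": 19.3,
    "Services": 24.3,
    "Government": 14.9,
}

def slice_angles(percents):
    """Return the central angle, in degrees, of each category's pie slice."""
    total = sum(percents.values())
    # Each slice takes the same fraction of 360 degrees
    # that its category takes of the total.
    return {name: 360.0 * pct / total for name, pct in percents.items()}

angles = slice_angles(PERCENT_OF_TOTAL)

# List the slices from largest to smallest, as in the figure legend.
for name, angle in sorted(angles.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s}{angle:6.1f} degrees")
```

The services slice, at 24.3% of the total, spans about 87.5 degrees, just under a quarter of the circle.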