2.3 VARIABLES AND DATA
We have used the term data to mean numbers or measurements obtained from sampling units. In the previous chapter, sampling units included Wednesdays in January, engines, and time intervals. More formally, data are numbers that represent particular characteristics of sampling units. The characteristics themselves are called variables.
Income is a variable and, if you are a sampling unit, your particular income is a measurement or data. Gender is a variable and, if you are a sampling unit, your gender is data. The previous examples indicate that variables are of two types:
Quantitative variable: A variable that is naturally numerical, such as income
Qualitative variable: A variable whose values are categories, such as gender
The values of a quantitative variable fall on some scale of measurement. Qualitative variables are somewhat different. The variables “gender,” “employment status,” and “Moody’s bond rating” are not naturally numerical. The “values” for these variables are categories such as male or female, employed or unemployed, and Aaa, Aa, . . . , C, respectively. We can make variables like these numerical by assigning numbers to the categories, and sometimes it is convenient to do so.
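As a minimal sketch of assigning numbers to categories, the Python fragment below codes a small sample of the variable "gender." The five sample values are hypothetical, and the assignment 1 = male, 2 = female is only one possible choice; any other pair of distinct numbers would distinguish the categories equally well.

```python
from collections import Counter

# Hypothetical sample of one qualitative variable (not taken from the text).
genders = ["female", "male", "female", "female", "male"]

# One possible assignment of numbers to categories: 1 = male, 2 = female.
codes = {"male": 1, "female": 2}
labels = {number: category for category, number in codes.items()}

# Replace each category by its assigned number.
coded = [codes[g] for g in genders]

# The numbers only distinguish categories, so counting is the natural summary.
counts = Counter(coded)

# Decoding recovers the original verbal descriptions exactly.
decoded = [labels[c] for c in coded]

print(coded)
print(dict(counts))
print(decoded)
```

Keeping the reverse lookup (`labels`) alongside the codes makes it easy to return to the verbal descriptions when they are more convenient.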
Nominal data: The numbers assigned to distinguish the separate categories of a qualitative variable
The number 1 assigned to “male” and the number 2 assigned to “female” are nominal data. Sometimes, however, it is useful to retain the original verbal descriptions of the categories.
If the outcomes for qualitative variables are ordered, that is, if there is an implied hierarchy of categories, an increasing or decreasing set of numbers can be assigned to represent the ordered categories.
Ordinal data: An increasing or decreasing set of numbers assigned to the ordered categories of a qualitative variable
For example, Moody’s has nine categories of bond ratings ranging from C (extremely poor in investment quality) to Aaa (a “gilt-edge” security). We might code these categories using the integers 1 through 9, with category C assigned the number 1 and category Aaa assigned the number 9. The increasing order of the integers matches the increasing order, from worst (extremely risky) to best (virtually no risk), of the bond categories. A group of 10 bonds might yield the data 4, 9, 9, 5, 6, 8, 2, 7, 7, 6, where the numbers correspond to the Moody’s ratings.
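The coding just described can be sketched in a few lines of Python. The text names only the endpoints C and Aaa; the seven intermediate labels below are taken from Moody's long-term rating scale and are an assumption about which nine categories are meant. The variable names are illustrative.

```python
# Nine rating categories, listed from worst to best. Only C and Aaa are named
# in the text; the intermediate labels are assumed from Moody's rating scale.
RATINGS_WORST_TO_BEST = ["C", "Ca", "Caa", "B", "Ba", "Baa", "A", "Aa", "Aaa"]

# Assign the integers 1 through 9 in increasing order of quality.
rating_to_code = {r: i for i, r in enumerate(RATINGS_WORST_TO_BEST, start=1)}
code_to_rating = {c: r for r, c in rating_to_code.items()}

# The ten coded bonds from the example.
bond_codes = [4, 9, 9, 5, 6, 8, 2, 7, 7, 6]

# Decode back to the verbal categories.
bond_ratings = [code_to_rating[c] for c in bond_codes]

# Order-based summaries are meaningful for ordinal data.
best = code_to_rating[max(bond_codes)]
worst = code_to_rating[min(bond_codes)]

print(bond_ratings)
print("best:", best, "worst:", worst)
```

Under this assumed scale, the largest code in the sample decodes to Aaa and the smallest to Ca, which uses only the ordering of the numbers and not their arithmetic.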
The magnitudes of the numbers we have assigned have meaning: 7 is a better (less risky) bond than 6 because 7 is larger than 6. But arithmetic operations performed on
these numbers have no meaning, since there is no well-defined origin and no natural unit of measurement. For example, we can compute the difference 9 - 8 = 1, but we
cannot say that this is the difference between Aaa bonds and Aa bonds. We could just as well have assigned the number 20 to Aaa bonds, 15 to Aa bonds, and so forth.
In this case, the implied difference between the two highest-rated bond groups is 5. The differences or sums or products or ratios could be anything we want them to
be, because there is no natural unique choice for an increasing set of numbers to represent the bond categories.
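A short Python sketch makes the point concrete by comparing two increasing codings of the same ten bonds. The values 20 for Aaa and 15 for Aa come from the text; the remaining values in the second coding, and the intermediate rating labels, are hypothetical choices made only so that the example runs.

```python
# Rating labels from worst to best (intermediate labels are an assumption).
ratings = ["C", "Ca", "Caa", "B", "Ba", "Baa", "A", "Aa", "Aaa"]

# Coding A: the integers 1 through 9.
coding_a = {r: i for i, r in enumerate(ratings, start=1)}

# Coding B: another increasing set of numbers. Only Aaa = 20 and Aa = 15
# appear in the text; the other values are hypothetical.
coding_b = dict(zip(ratings, [1, 2, 4, 6, 8, 10, 12, 15, 20]))

# The ten bonds from the example, written as categories.
bonds = ["B", "Aaa", "Aaa", "Ba", "Baa", "Aa", "Ca", "A", "A", "Baa"]

def ranking(coding):
    """Positions of the bonds sorted from worst to best under a coding."""
    return sorted(range(len(bonds)), key=lambda i: coding[bonds[i]])

# Both codings are increasing, so they order the bonds identically.
same_order = ranking(coding_a) == ranking(coding_b)

# The difference between the two top categories depends on the coding.
diff_a = coding_a["Aaa"] - coding_a["Aa"]
diff_b = coding_b["Aaa"] - coding_b["Aa"]

print(same_order, diff_a, diff_b)
```

The ranking is the same under both codings, while the computed difference between the two highest categories changes from 1 to 5, so only the order carries information.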
For a qualitative variable with two unordered categories, it is often helpful to assign the number 0 to one category and the number 1 to the other. With employment status, we might make the assignment
employed = 1    unemployed = 0
and, if several people were involved, the data would consist of a sequence of 0’s and 1’s, where a single digit corresponds to a specific person.
Binary coding: Assigning 0 and 1 to the only two unordered categories of a qualitative variable
Using binary coding, a group of five people, three of them employed, could yield the data 0, 1, 1, 0, 1. For binary coding, summing the data gives the count in the category designated by 1. There are 0 + 1 + 1 + 0 + 1 = 3 people employed in our group of five. Dividing the sum by the total number of items gives the proportion of items in the 1 category. For our five people, a proportion 3/5 = .60 are employed.
Because of the interpretation of these specific arithmetic operations, binary, or 0-1, coding is useful.
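The two arithmetic operations can be checked directly. This is a minimal Python sketch using the five data values from the example; the variable names are illustrative.

```python
# Binary-coded employment status for five people: 1 = employed, 0 = unemployed.
employed = [0, 1, 1, 0, 1]

# Summing the data counts the items in the category coded 1.
count_employed = sum(employed)

# Dividing by the number of items gives the proportion in the 1 category.
proportion_employed = count_employed / len(employed)

print(count_employed, proportion_employed)
```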
Closer scrutiny of quantitative variables reveals two distinct types. American shoe sizes, such as 7, 7 1/2, 8, 8 1/2, proceed in steps of 1/2. Stock prices are expressed in steps of 1/8’s of a dollar. Counts such as number of directors or vote tallies are, of course, integers.
Discrete variable: A quantitative variable whose values are distinct numbers with gaps between them
Shoe size, stock price, number of directors, and vote tally are examples of discrete variables.
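One way to picture the gaps is to check whether a value is a whole number of steps. The Python sketch below uses exact fractions; the helper `on_grid` and the particular stock price are hypothetical and serve only as illustration.

```python
from fractions import Fraction

def on_grid(value, step):
    """Return True if value is a whole number of steps of the given size."""
    return (Fraction(value) / Fraction(step)).denominator == 1

half = Fraction(1, 2)
eighth = Fraction(1, 8)

# Shoe sizes proceed in steps of 1/2.
shoe_ok = on_grid(Fraction(15, 2), half)      # size 7 1/2
shoe_gap = on_grid(Fraction(29, 4), half)     # 7 1/4 falls in a gap

# A hypothetical stock price of 42 3/8 dollars, quoted in eighths.
stock_ok = on_grid(Fraction(339, 8), eighth)

# Counts are integers, that is, steps of 1.
count_ok = on_grid(12, 1)

print(shoe_ok, shoe_gap, stock_ok, count_ok)
```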
Continuous variable: