2.4 Variable Labels
Analyzing the data values of a variable requires a reference to the variable by its name. These names are kept as short as is practical, usually about 10 characters or fewer. Long variable names are more difficult to type, and the spelling of a variable name must be exact, including any capitalization, for the variable to be properly referenced. Shorter names simplify these references. The output also tends to be more concise: with shorter names, the names and values of many more variables can be listed in columns across a computer screen or a printed page.
The disadvantage is that a short name may not adequately describe the meaning of the variable. Consider the sixth item on the Mach IV scale (Table 1.2, p. 26).
Honesty is the best policy in all cases.
The corresponding variable name is m06, which is efficiently compact and communicates the location of the item on the Mach IV scale. The disadvantage is that by itself this name carries no indication as to the content of the item.
46 Read/Write Data
A variable label is a longer, more descriptive label that more fully describes the variable's meaning than its usually shorter name. For an item on a survey, an appropriate variable label is the content of the item. Reference the variables in the subsequent function calls with the shorter variable names, but the more descriptive variable labels appear on the output to help interpret the results.
variable label: A description of a variable's meaning.
Unlike standard R, lessR has a provision for variable labels, which are stored with the data in the data frame.
Scenario: Read the variable labels into R. Increase the interpretability of the graphics and text output of each data analysis procedure by displaying variable labels.
When using lessR functions, the existence of any variable labels in the data frame triggers their automatic use on the text or graphics output. For example, the label on the horizontal axis of the histogram for Variable m06 is ordinarily the variable name, m06. With a variable label present, the axis label is the corresponding variable label, here the content of the item. When using standard R functions the variable labels can still be employed, though they must be manually invoked, such as in the specification of an axis label for a histogram. The details follow.
2.4.1 The Variable Labels File
The variable labels are read into R with the data. For native SPSS and R files, these labels would already have been included in the respective data files. The SPSS user would have defined the variable labels within SPSS before writing the .sav data file. The R user would already have read the variable labels into the data frame and then written the data frame that included these labels to an external R data file. Reading either the resulting SPSS or the R native data file yields a data frame that includes both the data and any variable labels stored with it.
How are the variable labels read when the data are in a text file, such as in csv or fwd format? Variable labels can be read from one of two text files. One possibility is a file dedicated specifically to variable labels, which is a csv file (Section 1.6.4, p. 24). Each row of the file consists of a variable name, a comma, and then the corresponding variable label. If there is a comma in the variable label, then the entire label must be enclosed in quotes, which a worksheet application such as Excel provides automatically when the file is saved to the csv format.
The example in Figure 2.1 shows how the Mach IV items are entered into Excel, one column for the variable names and one column for the corresponding variable labels. Save the worksheet in csv format so that the labels file is just another csv file.
1   m01   Never tell anyone the real reason you did something unless it is useful to do so.
2   m02   The best way to handle people is to tell them what they want to hear.
3   m03   One should take action only when sure it is morally right.
4   m04   Most people are basically good and kind.
Figure 2.1 The Excel representation of the variable labels for Variables m01 to m04, the first four Mach IV items.
The organization of the information in this labels file is flexible. There is no need to list the variables in any particular order, nor do all variables in the corresponding data file need to have a variable label in the label file.
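As a rough illustration of this file layout, the following sketch uses only standard R, not lessR, to write a small labels file and read it back into a lookup by variable name. The temporary file location, the variable age, and its label are hypothetical, added only to show how a label that contains a comma is quoted.

```r
# Hypothetical labels file, written to a temporary directory for illustration.
labels_path <- file.path(tempdir(), "varlabels.csv")

# One row per variable: name, comma, label.
# The last label contains a comma, so the whole label is enclosed in quotes.
writeLines(c(
  'm01,Never tell anyone the real reason you did something unless it is useful to do so.',
  'm04,Most people are basically good and kind.',
  'age,"Age, in years"'
), labels_path)

# The file has no header row, so supply the column names explicitly.
labels_in <- read.csv(labels_path, header = FALSE,
                      col.names = c("name", "label"),
                      stringsAsFactors = FALSE)

# Named character vector: look up a label by its variable name.
var_labels <- setNames(labels_in$label, labels_in$name)

print(var_labels[["age"]])
print(var_labels[["m04"]])
```

Because the lookup is by name, the rows can appear in any order, and a variable without a row in the file simply has no entry.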
The variable labels are read with the lessR function Read at the same time the data values are read. To read the accompanying variable labels, specify the location of the file of variable labels with the labels option as part of the same Read instruction for reading the data. If the labels csv file is located in the same file directory (folder) as the data file, then just the file name in quotes need be specified. If the labels file is located somewhere else then the full path name needs to be provided.
labels option: Location of the variable labels.
In the following example the location of the data file is not specified, so the Read function will prompt the user to browse the file directory for the location of the file. The labels option in this example specifies just the file name of the corresponding variable labels file, so that file is presumed to be in the same directory as the data file.
lessR Input: Read variable labels from the file called varlabels.csv
> mydata <- Read(labels="varlabels.csv")
As usual with the Read function, if a full path name or web address (URL) is provided as the first argument of the function call, within quotes, then the file is read directly from the specified location.
2.4.2 Variable Labels in the Data File
A second option for providing variable labels is to include the labels in the main data file formatted as a csv or tab-delimited text file. Just place the labels in the second line of the data file, so that the first line contains the variable names and the data itself then begins on the third line of the file. Figure 2.2 provides an example.
[Worksheet image. Row 1 holds the variable names (Mach_1, Mach_2, ...). Row 2 holds the variable labels, cut off by the column width in the image ("Never tell anyone", "The best way to ha", ...). The data rows follow.]
Figure 2.2 A Mach IV data file with variable labels and the first four rows of data, opened in Excel.
The format of the information in Figure 2.2 is the format provided by the popular Qualtrics on-line survey software (www.qualtrics.com). This combined data/labels file can be constructed and/or modified in a worksheet application such as Excel. Or the file can be obtained directly from the Qualtrics on-line survey software with a download of the survey responses in csv format. To read the combined data/labels file, use Read with labels="row2".
labels="row2" option: Read variable labels from the second row of a csv data file.
lessR Input: Read data and variable labels from the same file
> mydata <- Read(labels="row2")
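To make the layout of the combined file concrete, here is a minimal sketch in standard R, not lessR, that reads the names from the first line, the labels from the second line, and the data from the third line onward. The temporary file and the data values are hypothetical, and the two labels are borrowed from Figure 2.1. This is not a description of how Read itself is implemented.

```r
# Hypothetical combined data/labels file, written to a temporary directory.
data_path <- file.path(tempdir(), "mach_row2.csv")
writeLines(c(
  "Mach_1,Mach_2",
  paste("Never tell anyone the real reason you did something unless it is useful to do so.",
        "The best way to handle people is to tell them what they want to hear.",
        sep = ","),
  "4,2",   # made-up responses
  "1,5"
), data_path)

# Lines 1 and 2: variable names and variable labels, read as plain text.
first_two  <- read.csv(data_path, header = FALSE, nrows = 2,
                       stringsAsFactors = FALSE)
var_names  <- unlist(first_two[1, ], use.names = FALSE)
var_labels <- setNames(unlist(first_two[2, ], use.names = FALSE), var_names)

# Line 3 onward: the data values, with the names from line 1 applied.
mydata <- read.csv(data_path, header = FALSE, skip = 2, col.names = var_names)

print(mydata)
print(var_labels[["Mach_2"]])
```

Reading the labels separately keeps them out of the data columns, which would otherwise be read as text rather than as numbers.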
Now that the variable labels have been read with the data values, they can be saved with the data to an external data file with the lessR function Write (Section 2.5.2, p. 50). Future Read statements to read the native R data file will automatically read the variable labels as well.
2.4.3 Using Variable Labels with Standard R Functions
If variable labels are present for an existing data frame, such as mydata, then standard R functions can also access the labels with the lessR function label. The argument of the function is the corresponding variable name.
label function: Use variable labels with R functions.
For example, the option to specify the label of the horizontal axis for a histogram, and virtually every other R graph, is xlab. The following function call is to the standard R histogram function, hist. The histogram is constructed for variable m06, the sixth item on the Mach IV scale. The label function accesses the corresponding variable label in the data frame, here for display as the axis label.
Unlike lessR functions, R functions cannot reference the variables directly by their name. Instead, each standard R function also needs the name of the data frame that contains the data. One way to provide this name is to precede the variable name with the name of the data frame that contains the variable and a $ sign ($ notation, Section 1.3.5, p. 14), as follows.
hist function: Standard R histogram.
> hist(mydata$m06, xlab=label(m06))
The result is that whatever variable label is in the data frame for variable m06 is displayed as the label for the horizontal or x-axis of the resulting histogram. In general, however, the lessR histogram function Histogram (Section 5.2, p. 100) provides more pleasing aesthetics, more information regarding the underlying distribution, improved error diagnostics, and directly references the variable by name.
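For comparison, the same idea can be sketched without lessR by keeping the labels in a named character vector and passing the looked-up text to xlab. The helper get_label, the vector var_labels, and the simulated responses are all hypothetical names and values introduced for this illustration. The fallback to the variable name is a choice made here, not documented behavior of the lessR label function.

```r
# Labels kept in a named character vector (hypothetical stand-in for
# labels stored with the data frame).
var_labels <- c(m06 = "Honesty is the best policy in all cases.")

# Return the label for a variable, or the variable name if no label exists.
get_label <- function(name, labels) {
  if (name %in% names(labels)) labels[[name]] else name
}

# Simulated responses, for illustration only.
set.seed(1)
mydata <- data.frame(m06 = sample(0:5, 50, replace = TRUE))

# Standard R histogram with the variable label on the horizontal axis.
hist(mydata$m06, xlab = get_label("m06", var_labels), main = "")
```

A variable without a label, such as get_label("m07", var_labels), falls back to its name, so the axis is never left blank.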