3.4 Recode Data

Transformations change all of the values of a variable according to a specified formula, such as standardization or taking the logarithm of each value. Transformations can be applied to continuous variables with many different values. Categorical variables, or continuous variables with a small number of possible data values, can have their values changed by specifying how each existing value is converted to a new value, what is called a recode.

[recode: Map each value of a data variable into a new value.]
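The difference between a transformation and a recode can be sketched in a few lines of base R, without lessR. The variables income and response, and all of their values, are hypothetical and invented only for this illustration.

```r
# Hypothetical data, invented for this illustration
income   <- c(32000, 45000, 58000, 120000)   # continuous, many distinct values
response <- c("no", "yes", "yes", "no")      # categorical, two distinct values

# Transformation: one formula applied to every value
income_log <- log(income)
income_z   <- (income - mean(income)) / sd(income)   # standardization

# Recode: an explicit map from each existing value to a new value
map <- c(no = 0, yes = 1)
response_num <- unname(map[response])
```

The transformation needs only a formula, whereas the recode needs every existing value to be listed with its replacement, which is practical only when the number of distinct values is small.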

Edit Data 63

3.4.1 Reverse Score Items

A common recode application relates to the computation of a scale score from a survey questionnaire. Consider the 20-item Mach IV scale. Some of the items are written so that an Agree response indicates a Machiavellian attitude, a pro-attitude item. The 15th item on the scale provides an example.

[Mach IV scale, Table 1.2, p. 26]

15. It is wise to flatter important people.

Other items are written so that Agreement with the item indicates the opposite of a Machiavellian attitude, a reversed-attitude item. An example here is the 3rd Mach IV item.

3. One should take action only when sure it is morally right.

To encourage the respondent to read and comprehend the meaning of each item, up to around half of the items on a scale are written such that agreement indicates the opposite of the attitude of interest. Otherwise, as applied to the Mach IV scale, a Machiavellian individual would respond Agree or Strongly Agree to each of the 20 items, perhaps then reading each item less critically.

To calculate a total score on the scale, all the item responses for a respondent must be scored so that summing over them consistently indicates endorsement for the attitude of interest. In the case of the Mach IV data, code a reverse-scored item such that a 0 indicates Strongly Agree, whereas for a pro-attitude item a 0 indicates a Strongly Disagree. Every Strongly Agree response is initially coded as a 5. In this situation the reverse scoring must adjust the coding of the relevant items before this analysis of the data values.

[reverse score: Reverse the scoring of an item so that high values indicate Disagreement.]

Scenario: Reverse score Likert data
The responses to an attitude survey are coded as integers 0 through 5, which correspond to the response categories of Strongly Disagree through Strongly Agree. However, about half of the items are written so that disagreement with the item indicates agreement with the attitude of interest. Reverse score these items so that, for example, a response of 0 that corresponds to Strongly Disagree is transformed to a 5 to indicate Strongly Agree.

To implement this reverse scoring, use the lessR function Recode, abbreviated rec. The first argument is a list of one or more variables, such as items, that are to be recoded. The old parameter specifies the list of existing values. The new parameter specifies the corresponding list of recoded values listed in the same order as the values in the old list. The optional argument is new.var, the name of a newly created variable to contain the recoded values. If omitted, then the recoded values are written over the original values of the variable of interest. If the variable to be recoded is not in the mydata data frame, then specify the data frame with the data option.

[Recode function: Accomplish a recode of designated variables.]
[old argument: List of existing values.]
[new argument: List of new values.]
[new.var option: Name of the new variable.]

[data option: Data frame with the variable to be recoded, default is mydata.]

In the case of the Mach IV data with responses to a 6-point Likert scale, the recoded values are written over the original values. The responses here are coded from 0 to 5. As always for a list of multiple values with the values in the list separated by commas, use the combine function c to combine the values together. This list could be written as c(0,1,2,3,4,5), or, without the commas in the abbreviated form using the equivalent expression 0:5. The reverse scoring flips this coding, so a 5 goes to a 0, a 4 goes to a 1, and so forth. This recoded list could be specified as c(5,4,3,2,1,0), or, more simply, 5:0. This recoding for the reversed-attitude item m03 follows.

[c function, Section 1.3.6, p. 15]

lessR Input: Reverse score a Likert item
> mydata <- Recode(m03, old=0:5, new=5:0)

The variable m03 could be replaced with a list of variables to be recoded.

[list of variables, Section 8.3, p. 195]

The Recode function first lists the first several values of the variable or variables to be recoded, as in Listing 3.6 .

First four rows of data to recode for data frame: mydata
--------------------------------------------------------

  m03
1   4
2   1
3   5
4   3

Listing 3.6 Some data from the variable to be recoded.

Next, Recode displays the implemented recode specification, as shown in Listing 3.7 . Also provided are the total number of rows of data to be recoded, and also a note that the values of the original variable are replaced with the new values.

Recoding Specification
----------------------

   0 --> 5
   1 --> 4
   2 --> 3
   3 --> 2
   4 --> 1
   5 --> 0

Number of cases (rows) to recode: 351
Replace existing values of each specified variable, no value for option: new.var

Listing 3.7 The mapping of existing values to recoded values.

Recode then reports, for each variable to be recoded, the number of unique values in the data and the number of values that are to be recoded, as shown in Listing 3.8.

As a final verification of the recode, in Listing 3.9 , the first several data values of the recoded variable are displayed. These values can be compared to the values before the recode to verify that it worked as intended.

--- Recode: m03 ---------------------------------
Number of unique values of m03 in the data: 6
Number of values of m03 to recode: 6

Listing 3.8 Overview of recoding for Variable (Item) m03.

First four rows of recoded data for data frame: mydata
------------------------------------------------------

  m03
1   1
2   4
3   0
4   2

Listing 3.9 First four recoded values of m03.
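The before and after values in Listings 3.6 and 3.9 can be cross-checked with a minimal base R sketch. This is not how Recode is implemented internally, which the text does not describe. It only applies the same old-to-new mapping by position, with the helper names m03_old and m03_new chosen here for illustration.

```r
# Values of m03 before the recode, copied from Listing 3.6
m03_old <- c(4, 1, 5, 3)

old <- 0:5   # existing codes
new <- 5:0   # recoded values, in the same order as old

# match() finds the position of each value in old;
# indexing new by that position applies the mapping
m03_new <- new[match(m03_old, old)]
m03_new
```

The result, 1 4 0 2, agrees with Listing 3.9. Because the codes here run from 0 to 5, the same result follows from the arithmetic shortcut 5 - m03_old.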

The Recode function also allows multiple variables to be specified in the recoding. Again, the items could be listed individually, or a range of consecutive items in the data frame could be specified with the colon notation for a variable list. The complete list of Mach IV items to be recoded follows.

> mydata <- Recode(c(m03,m04,m06,m07,m09,m10,m11,m14,m16,m17,m19), old=0:5, new=5:0)

Because Items m09 , m10 , and m11 occur consecutively in the data frame, this expression could be shortened a little by using the : notation to specify the sequence of these three items, replacing m09,m10,m11 with m09:m11 .
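To see why the reversed items must be recoded before summing, the following base R sketch reverse scores two items and then computes a total score. It does not use lessR. The data frame d and its values are hypothetical, limited to four of the 20 items, with m03 and m04 taken from the list of reversed-attitude items above.

```r
# Hypothetical responses of three people to four items, coded 0 to 5
d <- data.frame(
  m01 = c(5, 2, 0),
  m03 = c(4, 1, 5),
  m04 = c(0, 3, 5),
  m15 = c(5, 1, 0)
)

reversed <- c("m03", "m04")   # reversed-attitude items in this toy example

# With codes 0 to 5, the mapping 0:5 -> 5:0 is the same as 5 - value
d[reversed] <- 5 - d[reversed]

# Total score: after reverse scoring, a higher value on every item
# indicates more endorsement of the attitude
d$total <- rowSums(d[c("m01", "m03", "m04", "m15")])
d
```

The first respondent endorses the attitude on every item and receives the highest total, which would not be the case if the original codes of m03 and m04 were summed directly.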

3.4.2 Missing Data

By default missing data in the data file indicates data values that are literally not present. In addition, the missing argument of the Read function informs R of missing data codes such as -99, which although physically present in the data, indicate that the corresponding value is missing. It is also possible to assign these codes after the data values are read.

[missing option, Section 2.2.5, p. 36]

Scenario: Assign missing data codes after the data values have been read
After the data have been read, assign new codes to data that have been previously defined as missing, or, redefine specific existing values as missing.

To define missing data after the data values are read, use Recode with the value of "missing" for either the old or new specifications. The Recode function can assign missing data from a given data value or values, or it can assign missing data to a specific data value or values.

For example, returning to the Employee data set, suppose the value of 1 for Years indicates a missing data value. If this coding was not specified when the data values were read, then use Recode to assign the missing value.

lessR Input: Recode an existing value to missing
> mydata <- Recode(Years, old=1, new="missing")

Or suppose values originally coded as missing now should be assigned some other code. Here, assign missing values for the variables Years and Salary a value of 99.

lessR Input: Recode missing data to another value
> mydata <- Recode(c(Years, Salary), old="missing", new=99)
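For comparison, both directions of the missing data recode can be sketched in base R, where a missing value is represented by NA. This sketch does not use lessR, and the values of Years and Salary below are hypothetical stand-ins rather than the actual Employee data.

```r
# Hypothetical values; NA is R's marker for a missing value
Years  <- c(7, 1, 15, NA, 1)
Salary <- c(43788, NA, 84494, 57139, NA)

# Existing value -> missing: here 1 is treated as a missing data code.
# which() skips values that are already NA.
Years_a <- Years
Years_a[which(Years_a == 1)] <- NA

# Missing -> another value: every NA becomes 99
Years_b  <- replace(Years,  is.na(Years),  99)
Salary_b <- replace(Salary, is.na(Salary), 99)
```

Testing for missing values requires is.na, because a comparison such as Years == NA itself evaluates to NA rather than TRUE or FALSE.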

Note that these assignments cannot be done with variables that are factors. The reason for this is that applying Recode to a factor removes the factor attribute, so that after the transformation the variable is no longer a factor. Manipulating the values of a factor should be instead accomplished with the factor function.

[factor function, Section 3.3.2, p. 59]
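A minimal sketch of changing the values of a factor with the base R factor function follows. The variable plan, its codes, and the new labels are hypothetical, chosen only to show that the result remains a factor.

```r
# Hypothetical factor with three levels
plan <- factor(c("1", "2", "3", "2", "1"))

# Rebuild the factor: levels lists the existing values,
# labels lists the new values in the same order
plan_new <- factor(plan, levels = c("1", "2", "3"),
                   labels = c("Basic", "Standard", "Premium"))

is.factor(plan_new)   # the result is still a factor
levels(plan_new)
```

As with the old and new lists of Recode, the correspondence between levels and labels is by position, so the two lists must be given in the same order.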