5.6 Time Plot

The previous analyses in this chapter applied to the distribution of values for a variable. The values were ordered from the smallest value to the largest with the frequencies of similar values noted. Here we continue to consider the distribution of the values of a variable, but now focus on the order of the values as they appear in the data table. When the entered order of the data values is of interest it is usually because the order reflects the time that each value was generated. The focus on time leads to a consideration of two different kinds of data.

Cross-sectional data values are measurements of the same variable at about the same time over different people, or whatever is the unit of analysis. Administering an attitude survey one time to many different people generates cross-sectional data. The analysis of the frequencies of the data values ordered from smallest to largest is a primary analysis for these data. The graphs presented previously in this chapter all provide this type of analysis: a histogram, one-variable scatter plot, box plot, and density plot.

cross-sectional data: Data for a variable collected at about the same time.

Longitudinal data values are measurements of the same variable with variation obtained over different time periods. Data collected over time can, and usually should, be analyzed with graphs such as histograms. The frequency of different ranges of values is usually of interest. Longitudinal data, however, also present the additional dimension of time for analysis.

longitudinal data: Data for a variable collected at different times.
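The distinction between the two kinds of analysis can be illustrated with a minimal base R sketch. The two vectors below are hypothetical values invented for the illustration. They contain exactly the same values, so any frequency-based graph such as a histogram is the same for both, yet their plots against order of occurrence differ.

```r
# Hypothetical measurements, listed in the order they were recorded
y <- c(3, 4, 4, 5, 6, 6, 7, 8, 8, 9)            # rises steadily over time
y_shuffled <- c(8, 3, 6, 9, 4, 7, 4, 8, 5, 6)   # same values, different order

# Same set of values, so the frequency distributions are the same
same_values <- identical(sort(y), sort(y_shuffled))

# Plotting each value against its position shows the difference
op <- par(mfrow = c(1, 2))                      # two panels side by side
plot(y, type = "b", xlab = "index", main = "Recorded order")
plot(y_shuffled, type = "b", xlab = "index", main = "Shuffled order")
par(op)                                         # restore graphics settings
```

Only the plot of the values in their recorded order reveals the upward trend, which is the information that the time dimension adds.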

To plot longitudinal data use the lessR function LineChart, abbreviated lc. The values of the variable appear on the vertical or y-axis, and some indicator of time defines the horizontal or x-axis. LineChart generates the indicator of time that scales the horizontal axis, so the only data passed to the function are the values of the variable to be plotted.

LineChart function: Plot points over time with a line segment that joins consecutive points.

5.6.1 Run Chart

One version of a time oriented plot is the run chart, in which the horizontal axis lists the ordinal position of each value, from 1 to the last value. The default label for the horizontal axis is index. A primary purpose of the run chart is to understand the performance of an on-going process over time. For example, consider the student ratings of overall teaching performance for a professor who is about to be evaluated for tenure. The data set is on the web.

run chart: Values plotted and identified in the order that they occur.

http://lessRstats.com/data/Ratings.csv

For the last 5 years the professor taught the same course once a term, four quarters a year. First the Dean examines the scatter plot of the mean ratings from these 20 classes. The variable name is Rating.

> ScatterPlot(Rating)

The result appears in Figure 5.10. The ratings are on a 10-point scale, where the most favorable rating possible is a 10. The Dean concludes that the ratings are generally favorable, though in some terms the students were relatively displeased. Are those less-well-received terms earlier in the professor’s employment, or are they more recent, which would indicate a diminishing of performance in terms of student ratings? The run chart provides the visual answer to this question.

Continuous Variables 119

Figure 5.10 Distribution of the average student ratings over the 20 terms.

lessR Input Run chart of the average teacher ratings for the last 20 terms

> LineChart(Rating)

Figure 5.11 Run chart of the mean rating each term for a course.

Obtain the run chart with the lessR function LineChart, abbreviated lc. To facilitate comparison of values at different time periods, if the run chart does not exhibit pronounced trends up or down, then LineChart adds a horizontal dotted line at the median. For Figure 5.11, comparing the ratings against the plotted median line, the professor’s ratings may exhibit a mild trend upwards, suggesting some improvement over time. The worst ratings occurred toward the beginning of the professor’s employment. For example, the worst mean rating, close to 7.0, occurred for the sixth term. The best ratings occurred for Terms 15 and 19.

There are also several available options. As usual, to obtain the complete list of options enter ?LineChart. As with all lessR graphic functions, the usual color choices apply. Use the center.line option to specify the reference line as the mean, center.line="mean"; the median, center.line="median"; turned off, center.line="off"; or through zero, center.line="zero".

color choices, Section 1.1, p. 17

center.line option: Specify the type of center line drawn, if any.
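To make the construction of a run chart with a median center line concrete, the following is a minimal sketch in base R graphics rather than lessR. The ratings are hypothetical values invented for the illustration, not the contents of Ratings.csv, and the object names rating, index, and center are assumptions of the sketch.

```r
# Hypothetical mean ratings for 20 terms (illustrative only, not Ratings.csv)
rating <- c(8.1, 7.9, 8.4, 7.6, 8.0, 7.1, 8.3, 8.2, 7.8, 8.5,
            8.4, 8.0, 8.6, 8.3, 9.1, 8.2, 8.7, 8.5, 9.2, 8.8)

# The horizontal axis is the ordinal position of each value: 1, 2, ..., n
index <- seq_along(rating)

# type = "b" draws both the points and the connecting line segments
plot(index, rating, type = "b", pch = 19,
     xlab = "index", ylab = "Rating")

# Dotted horizontal reference line at the median of the plotted values
center <- median(rating)
abline(h = center, lty = "dotted")
```

Replacing median with mean in the last step would give the analogue of a mean center line, and omitting the abline call gives a chart with no center line.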

By default the LineChart function displays both the points for each individual data value and the line segment that connects each adjacent pair of points. To display only the line use the type="l" option. By default, the plotted point is a filled circle. Access alternative shapes that R provides with the shape.points option according to numeric codes. Enter ?points to view all the possibilities. For example, shape.points=23 provides diamonds as the plotted points.

type option: Specify to plot line segments (l), points (p), or the default both (b).

shape.points option: Specify the type of point to plot.
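The numeric point codes listed by ?points belong to base R graphics, so their effect can be previewed directly with the base R plot function, whose type and pch arguments play roles analogous to the options described above. The sketch below uses base R only and hypothetical values. It is not a demonstration of how LineChart itself is implemented.

```r
# Hypothetical values, for illustration only
y <- c(5, 7, 6, 8, 7, 9, 8, 10)
x <- seq_along(y)

op <- par(mfrow = c(1, 3))                     # three panels side by side
plot(x, y, type = "l", main = 'type = "l"')    # line segments only
plot(x, y, type = "p", main = 'type = "p"')    # points only
plot(x, y, type = "b", pch = 23, bg = "gray",  # both, with point code 23,
     main = 'type = "b", pch = 23')            # a diamond filled with bg
par(op)                                        # restore graphics settings
```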

5.6.2 Time Series

Similar to a run chart, a time series chart also plots the values of a variable over time. The distinction is that for the time series chart the horizontal axis is labeled with times or dates that indicate when each data value was generated. To illustrate, the US Census Bureau provides historical estimates of the world population from 1950 and before, and results based on actual counts after that time (US Census Bureau, 2012).

Some of the data from the US Census Bureau website is extracted and reformatted, and available at the book’s web site.

> mydata <- Read("http://lessRstats.com/data/WorldPopulation.csv")

The name of the variable of interest is Population. Inform LineChart of the variable to analyze and then specify the time periods from which LineChart will scale the horizontal axis. Specify the periods with the time.start and time.by options.

time.start option, Starting time of the data values.

time.by option, Time increment with units of "days", "weeks", "months", or "years" in the singular or plural.

lessR Input Plot the time series of world population from 1900

> LineChart(Population, time.start="1900/1/1", time.by="10 years", xlab="Year", ylab="World Population (millions)")

Figure 5.12 shows the explosive growth of the world population during the 20th century and into the 21st century.

Figure 5.12 World population according to the US Census Bureau (2012).
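The idea of scaling the horizontal axis from a starting time and a time increment can be sketched in base R with seq applied to a Date. The values below are a synthetic, steadily growing series generated for the illustration. They are not the Census Bureau estimates, and the names series and dates are assumptions of the sketch.

```r
# Synthetic series that grows 15% per period (NOT the Census Bureau data)
series <- round(1600 * 1.15^(0:11))

# One date per value: start at 1900/1/1 and step forward 10 years at a time
dates <- seq(as.Date("1900-01-01"), by = "10 years",
             length.out = length(series))

# The dates, not the ordinal positions, now label the horizontal axis
plot(dates, series, type = "l",
     xlab = "Year", ylab = "Synthetic value")
```

The only inputs needed to build the time axis are the starting date, the increment, and the number of data values, which parallels the information supplied by time.start and time.by.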

If the data values are systematically changing over time, such as the dramatic increases in world population exhibited in Figure 5.12, then LineChart does not plot a center line by default but does fill in the area under the curve. To remove the filled-in area under the curve, set col.area="transparent" in the call to LineChart.

The data values for the time series chart need to be ordered from the earliest to the last data value. If they are ordered in the opposite direction, one possibility is to use the lessR function Sort to re-order the entire data frame. Another possibility is to set the time.reverse option to TRUE as part of the call to LineChart.

Sort function, Section 3.5, p. 66

time.reverse option: Applies to data values listed from last to earliest.
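For readers who want to see what the re-ordering amounts to, the following base R sketch reverses a vector and sorts a data frame by a time variable. It uses the base R functions rev and order instead of the lessR Sort function, and the data frame d with its variables Year and Value is hypothetical.

```r
# Hypothetical values listed from the most recent period back to the earliest
latest_first <- c(50, 42, 37, 31, 24)

# rev() reverses a vector so that the earliest value comes first
earliest_first <- rev(latest_first)

# For a data frame, re-order the rows by the time variable instead
d <- data.frame(Year = c(2004, 2003, 2002, 2001, 2000),
                Value = latest_first)
d_sorted <- d[order(d$Year), ]   # rows now run from 2000 to 2004
```

Sorting the whole data frame keeps each value paired with its own time period, which is why re-ordering the rows is safer than reversing a single column.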

Worked Problems

1 Consider the Cars93 data set, available from within the lessR package.

> mydata <- Read("Cars93", format="lessR")

?dataCars93 for more information.

One of the variables in this data set is MPGcity.

(a) What are the sample mean and standard deviation of city MPG?

(b) Provide the histogram of city MPG.

(c) Provide the density curve of city MPG.

(d) Provide a box plot of city MPG. Are there any outliers or potential outliers?

(e) Provide a scatter plot (dot plot) of city MPG.

(f) Describe the distribution of city MPG.

2 Consider the Mach4 data set, available from within the lessR package.

> mydata <- Read("Mach4", format="lessR")

?Mach4 for more information.

Now consider the ninth item, named m09, scored on a 6-point Likert scale from 0 for Strongly Disagree to 5 for Strongly Agree.

9. All in all, it is better to be humble and honest than to be important and dishonest.

(a) What are the sample mean and standard deviation of m09?

(b) Provide the histogram of m09.

(c) Provide the density curve of m09.

(d) Provide a box plot of m09. Are there any outliers or potential outliers?

(e) Provide a scatter plot (or dot plot) of m09.

(f) Describe the distribution of m09.

3 R provides several functions for simulating data. The function rnorm generates n simulated data values randomly sampled from a normal distribution with a specified population mean and population standard deviation.

> rnorm(n= , mean= , sd= )

When using the function, fill in the three blanks for the three respective values.

(a) Generate 20 randomly sampled values from a normal distribution with µ = 50 and σ = 10.

(b) Generate another 20 values and store them into a data vector for later analysis with the R assignment statement. Call the data vector Y.

> Y <- rnorm( ... )

(c) List the simulated data values with

> Y

(d) Display their histogram. Describe the result.

(e) Compare the sample mean and sample standard deviation to their population counterparts. Are the sample values equal to the corresponding population values? Why or why not?

(f) Display their run chart. Describe the result.