Analyzing and Interpreting Data
After morbidity, mortality, and other relevant data about a health problem have been gathered and compiled, the data should be analyzed by time, place, and person. Different types of data are used for surveillance, and different types of analyses might be needed for each. For example, data on individual cases of disease are analyzed differently than data aggregated from multiple records; data received as text must be sorted, categorized, and coded for statistical analysis; and data from surveys might need to be weighted to produce valid estimates for sampled populations.
For analysis of the majority of surveillance data, descriptive methods are usually appropriate. The display of frequencies (counts) or rates of the health problem in simple tables and graphs, as discussed in Lesson 4, is the most common method of analyzing data for surveillance. Rates are useful, and frequently preferred, for comparing occurrence of disease for different geographic areas or periods because they take into account the size of the population from which the cases arose. One critical step before calculating a rate is constructing a denominator from appropriate population data. For state- or countywide rates, general population data are used. These data are available from the U.S. Census Bureau or from a state planning agency. For other calculations, the population at risk can dictate an alternative denominator. For example, an infant mortality rate uses the number of live-born infants; rates of surgical wound infections in a hospital require the number of such procedures performed. In addition to calculating frequencies and rates, more sophisticated methods (e.g., space-time cluster analysis, time series analysis, or computer mapping) can be applied.
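The rate calculation described above can be illustrated with a minimal Python sketch. The function name, the county names, and all counts are hypothetical and chosen only for illustration; the multiplier of 100,000 is a common convention, not a requirement.

```python
def rate(cases: int, population_at_risk: int, per: int = 100_000) -> float:
    """Return the number of cases per `per` persons in the population at risk."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return cases * per / population_at_risk


# Hypothetical counties: similar case counts, very different populations.
counties = {
    "County A": {"cases": 12, "population": 60_000},
    "County B": {"cases": 15, "population": 300_000},
}

rates = {
    name: rate(data["cases"], data["population"])
    for name, data in counties.items()
}

for name, value in rates.items():
    print(f"{name}: {value:.1f} cases per 100,000 population")
```

In this made-up example County B reports more cases, but County A has the higher rate once population size is taken into account. For a measure such as infant mortality, the same function would be called with live births as the denominator instead of the general population.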
To determine whether the incidence or prevalence of a health problem has increased, data must be compared either over time or across areas. The selection of data for comparison depends on the health problem under surveillance and what is known about its typical temporal and geographic patterns of occurrence.
For example, data for diseases that indicate a seasonal pattern (e.g., influenza and mosquito-borne diseases) are usually compared with data for the corresponding season from past years. Data for diseases without a seasonal pattern are commonly compared with data for previous weeks, months, or years, depending on the nature of the disease. Surveillance for chronic diseases typically requires data covering multiple years. Data for acute infectious diseases might only require data covering weeks or months, although data
Analyzing by time
Basic analysis of surveillance data by time is usually conducted to characterize trends and detect changes in disease incidence. For notifiable diseases, the first analysis is usually a comparison of the number of case reports received for the current week with the number received in the preceding weeks. These data can be organized into a table, a graph, or both (Table 5.5 and Figures 5.2 and 5.3). An abrupt increase or a gradual buildup in the number of cases can be detected by looking at the table or graph. For example, health officials reviewing the data for Clark County in Table 5.5 and Figures 5.2 and 5.3 will have noticed that the number of cases of hepatitis A reported during week 4 exceeded the numbers in the previous weeks. This method works well when new cases are reported promptly.
Table 5.5 Reported Cases of Hepatitis A, by County and Week of Report, 1991
Week of report County 1 2 3 4 5 6 7 8 9
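The comparison of the current week with the preceding weeks can be sketched in Python as follows. The weekly counts are hypothetical and are not the values from Table 5.5. The second check, which flags a week above the mean plus two standard deviations of the preceding weeks, is an illustrative assumption rather than a rule stated in this lesson.

```python
from statistics import mean, stdev


def exceeds_all_previous(previous: list[int], current: int) -> bool:
    """True if the current week's count is higher than every preceding week."""
    return current > max(previous)


def exceeds_baseline(previous: list[int], current: int, k: float = 2.0) -> bool:
    """True if the current count is above mean + k * SD of the preceding weeks.

    The choice of k = 2 is an assumption made for illustration only.
    """
    if len(previous) < 2:
        raise ValueError("need at least two preceding weeks")
    threshold = mean(previous) + k * stdev(previous)
    return current > threshold


# Hypothetical weekly case reports for one county (NOT the Table 5.5 data).
weekly_counts = [1, 0, 2, 1, 1, 6]
previous, current = weekly_counts[:-1], weekly_counts[-1]

flag_simple = exceeds_all_previous(previous, current)
flag_baseline = exceeds_baseline(previous, current)
print(f"Current week: {current} cases; exceeds all previous weeks: {flag_simple}; "
      f"exceeds baseline threshold: {flag_baseline}")
```

Either check depends on prompt reporting: if case reports arrive late, the most recent weeks will look artificially low.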
Another common analysis is a comparison of the number of cases during the current period to the number reported during the same period for the last 2–10 years (Table 5.6). For example, health officials will have noted that the 11 cases reported for Clark County during weeks 1–4 during 1991 exceeded the numbers reported during the same 4-week period during the previous 3 years. A related method involves comparing the cumulative
Table 5.6 Reported Cases of Hepatitis A, by County for Weeks 1–4, 1988–1991
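A comparison of the current period with the same period in previous years might be sketched as below. Only the current count of 11 cases comes from the text; the counts for 1988–1990 are hypothetical placeholders, not the values in Table 5.6, and the function name and the ratio-to-mean summary are assumptions for illustration.

```python
from statistics import mean


def compare_with_history(current: int, history: dict[int, int]) -> dict:
    """Summarize how the current period's count compares with past years.

    `history` maps each past year to the count for the same weeks of that year.
    """
    if not history:
        raise ValueError("at least one past year is required")
    past_counts = list(history.values())
    baseline = mean(past_counts)
    return {
        "historical_mean": baseline,
        "ratio_to_mean": current / baseline if baseline > 0 else None,
        "exceeds_all_past_years": current > max(past_counts),
    }


# Weeks 1-4: 11 cases in 1991 (from the text); earlier years are hypothetical.
history = {1988: 4, 1989: 6, 1990: 5}
result = compare_with_history(11, history)
print(result)
```

For seasonal diseases, the same function applies as long as each past-year count covers the corresponding weeks of the season.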
Analysis of long-term time trends, also known as secular trends, usually involves graphing occurrence of disease by year. Figure 5.1 illustrates the rate of reported cases of malaria for the United States during 1932–2003. Graphs can also indicate the occurrence of events thought to have an impact on the secular trend (e.g., implementation or cessation of a control program or a change in the method of conducting surveillance). Figure 5.2 illustrates reported morbidity from malaria for 1932–1962, along with events and control activities that influenced its incidence. 2
Figure 5.1 Rate (per 100,000 Persons) of Reported Cases of Malaria, By Year — United States, 1932–2003
Adapted from: Centers for Disease Control and Prevention. Summary of notifiable diseases, United States, 1993b. MMWR 1993; 42(53):38, and Langmuir AD. The surveillance of communicable diseases of national importance. N Engl J Med 1963;268:182–92.
Figure 5.2 Reported Malaria Morbidity — United States, 1932–1962
Adapted from: Centers for Disease Control and Prevention. Summary of notifiable diseases, United States, 1993b. MMWR 1993;42(53):38; National Notifiable Diseases Surveillance System [Internet]. Atlanta: CDC [updated 2005 Oct 14; cited 2005 Nov 16]. Available from: http://www.cdc.gov/mmwr/preview/mmwrhtml/00035381.htm ; and Langmuir AD. The surveillance of communicable diseases of national importance. N Engl J Med 1963;268:184.
Statistical methods can be used to detect changes in disease occurrence. The Early Aberration Reporting System (EARS) is a package of statistical analysis programs for detecting aberrations or deviations from the baseline, by using either long- (3–5 years) or short-term (as short as 1–6 days) baselines. 16
Analyzing by place
The analysis of cases by place is usually displayed in a table or a map. State and local health departments usually analyze surveillance data by neighborhood or by county. CDC routinely analyzes surveillance data by state. Rates are often calculated by adjusting for differences in the size of the population of different counties, states, or other geographic areas. Figure 5.3 illustrates lung cancer mortality rates for white males for all U.S. counties for 1998–2002. To deal with county-to-county variations in population size and age distribution, age-adjusted rates are displayed.
Figure 5.3 Age-Adjusted Lung and Bronchus Cancer Mortality Rates (per 100,000 Population) By State — United States, 1998–2002
Data Source: National Cancer Institute [Internet] Bethesda: NCI [cited 2006 Mar 22] Surveillance Epidemiology and End Results (SEER). Available from: http://seer.cancer.gov/faststats/ .
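Age adjustment of the kind displayed in Figure 5.3 can be illustrated with direct standardization, one common approach; the lesson does not state which method produced the figure. In this minimal sketch, the age groups, death counts, populations, and standard population are all hypothetical.

```python
def crude_rate(strata: dict[str, tuple[int, int]], per: int = 100_000) -> float:
    """Crude rate: total deaths divided by total population."""
    deaths = sum(d for d, _ in strata.values())
    population = sum(p for _, p in strata.values())
    return deaths * per / population


def age_adjusted_rate(
    strata: dict[str, tuple[int, int]],
    standard_population: dict[str, int],
    per: int = 100_000,
) -> float:
    """Directly standardized rate.

    Each age-specific rate (deaths / population) is weighted by that age
    group's share of the standard population, and the weighted rates are summed.
    """
    if set(strata) != set(standard_population):
        raise ValueError("age groups must match the standard population")
    total_standard = sum(standard_population.values())
    adjusted = 0.0
    for group, (deaths, population) in strata.items():
        weight = standard_population[group] / total_standard
        adjusted += (deaths / population) * weight
    return adjusted * per


# Hypothetical area: (deaths, population) for each age group.
area = {"<65 years": (10, 100_000), ">=65 years": (100, 20_000)}
# Hypothetical standard population used as the common weighting scheme.
standard = {"<65 years": 80_000, ">=65 years": 20_000}

crude = crude_rate(area)
adjusted = age_adjusted_rate(area, standard)
print(f"Crude rate: {crude:.1f}; age-adjusted rate: {adjusted:.1f} per 100,000")
```

Because every area is weighted by the same standard population, differences between adjusted rates are not driven by differences in age distribution.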
The advent of geographic information systems (GIS) allows more robust analysis of data by place and has extended spot maps and shaded, or choropleth, maps into much more sophisticated applications. 17 Using GIS is particularly effective when different types of information about place are combined to identify or clarify geographic relationships. For example, in Figure 5.4, the absence or presence of the tick that transmits Lyme disease, Ixodes scapularis, is illustrated superimposed over habitat suitability. 18 Such software packages as SaTScan™ (Martin Kulldorff, Harvard University and Information Management System, Inc., Silver Spring, Maryland), Epi Info™ (CDC, Atlanta, Georgia), and Health Mapper (World Health Organization, Geneva, Switzerland) provide GIS functionality and can be useful when analyzing surveillance data. 19-21
Figure 5.4 Predictive Risk Map of Habitat Suitability for Ixodes scapularis in Wisconsin and Illinois
Source: Guerra M, Walker E, Jones C, Paskewitz S, Cortinas MR, Stancil A, Beck L, Bobo M, Kitron U. Predicting the risk of Lyme disease: habitat suitability for Ixodes scapularis in the north central United States. Emerg Infect Dis. 2002;8:289–97.
Analyzing by time and place
As a practical matter, disease occurrence is often analyzed by time and place simultaneously. An analysis by time and place can be organized and presented in a table or in a series of maps highlighting different periods or populations (Figures 5.5 and 5.6).
Figure 5.5 Age-Adjusted Colon Cancer Mortality Rates* for White Females by State — United States, 1950–1954, 1970–1974, and 1990–1994
*Scale based on 1950–1994 rates (per 100,000 person years). Data Source: Customizable Mortality Maps [Internet] Bethesda: National Cancer Institute [cited 2006 Mar 22]. Available from: http://ratecalc.cancer.gov/ratecalc//.
Figure 5.6 Age-Adjusted Colon Cancer Mortality Rates* for White Males by State — United States, 1950–1954, 1970–1974, and 1990–1994
*Scale based on 1950–1994 rates (per 100,000 person years). Data Source: Customizable Mortality Maps [Internet] Bethesda: National Cancer Institute [cited 2006 Mar 22]. Available from: http://ratecalc.cancer.gov/ratecalc// .
Analyzing by person
The most commonly collected and analyzed person characteristics are age and sex. Data regarding race and ethnicity are less consistently available for analysis. Other characteristics (e.g., school or workplace, recent hospitalization, and the presence of such risk factors for specific diseases as recent travel or history of cigarette smoking) might also be available and useful for analysis, depending on the health problem.
Age
Meaningful age categories for analysis depend on the disease of interest. Categories should be mutually exclusive and all-inclusive. Mutually exclusive means the end of one category cannot overlap with the beginning of the next category (e.g., 1–4 years and 5–9 years rather than 1–5 and 5–9). All-inclusive means that the categories should include all possibilities, including the extremes
Standard age categories for childhood illnesses are usually <1 year and ages 1–4, 5–9, 10–14, 15–19, and ≥20 years. For pneumonia and influenza mortality, which usually disproportionately affects older persons, the standard categories are <1 year and 1–24, 25–44, 45–64, and ≥65 years. Because two-thirds of all deaths in the United States occur among persons aged ≥65 years, researchers often divide the last category into ages 65–74, 75–84, and ≥85 years.
The characteristic age distribution of a disease should be used in deciding the age categories — multiple narrow categories for the peak ages, broader categories for the remainder. If the age distribution changes over time or differs geographically, the categories can be modified to accommodate those differences.
To use data in the calculation of rates, the age categories must be consistent with the age categories available for the population at risk. For example, census data are usually published as <5 years, 5–9, 10–14, and so on in 5-year age groups. These denominators could not be used if the surveillance data had been categorized in different 5-year age groups (e.g., 1–5 years, 6–10, 11–15, and so forth).
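The requirements that age categories be mutually exclusive and all-inclusive can be checked mechanically. The following Python sketch assumes ages are recorded in completed years and represents each category as a (lowest age, highest age) pair, with None marking an open-ended top category; the function names and this representation are illustrative assumptions.

```python
from typing import Optional

Category = tuple[int, Optional[int]]  # (lowest age, highest age); None = open-ended

# Standard childhood-illness categories from the text: <1, 1-4, 5-9, 10-14, 15-19, >=20.
CHILDHOOD: list[Category] = [(0, 0), (1, 4), (5, 9), (10, 14), (15, 19), (20, None)]


def check_categories(cats: list[Category]) -> list[str]:
    """Return a list of problems; an empty list means the categories are
    mutually exclusive and all-inclusive."""
    problems = []
    if cats[0][0] != 0:
        problems.append(f"ages below {cats[0][0]} are not covered")
    for (low1, high1), (low2, _) in zip(cats, cats[1:]):
        if high1 is None:
            problems.append(f"open-ended category starting at {low1} is not last")
        elif low2 <= high1:
            problems.append(f"categories overlap at age {low2}")
        elif low2 > high1 + 1:
            problems.append(f"ages {high1 + 1} to {low2 - 1} are not covered")
    if cats[-1][1] is not None:
        problems.append(f"ages above {cats[-1][1]} are not covered")
    return problems


def categorize(age: int, cats: list[Category]) -> str:
    """Return the label of the category that contains `age`."""
    for low, high in cats:
        if age >= low and (high is None or age <= high):
            if high is None:
                return f">={low}"
            return "<1" if (low, high) == (0, 0) else f"{low}-{high}"
    raise ValueError(f"age {age} does not fall in any category")


print(check_categories(CHILDHOOD))         # no problems expected
print(check_categories([(1, 5), (5, 9)]))  # the overlapping example from the text
print(categorize(4, CHILDHOOD), categorize(37, CHILDHOOD))
```

Before rates are calculated, the same category list can be compared with the age groups published for the denominator population, because mismatched groupings (e.g., 1–5 years against census groups of <5 years and 5–9) cannot be paired.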