
Journal of Applied Psychology
1988, Vol. 73, No. 2, 146-153
Copyright 1988 by the American Psychological Association, Inc.
0021-9010/88/$00.75

Effects of Training and Information on the Accuracy and
Reliability of Job Evaluations
David C. Hahn
ENSERCH Corporation, Houston, Texas

Robert L. Dipboye
Rice University

Subjects evaluated 23 benchmark jobs on 10 dimensions after either receiving training or no training in job evaluation and after being given only a title, only a job description, or both a title and a job description. The amount of information affected primarily the accuracy and reliability of the ratings. Those given both a title and a job description were generally more accurate and reliable in their ratings than those given only a title. Trained subjects demonstrated less leniency and greater dispersion than did those who were untrained. Although training affected primarily the distribution of ratings, the most accurate and reliable subjects tended to be those who received training as well as full information in the form of a job description and title.

This study is partially based on a dissertation completed by the first author at Rice University.
The authors appreciate the helpful suggestions of William Howell, R. J. Harvey, and Ronald Taylor.
Correspondence concerning this article should be addressed to Robert L. Dipboye, Department of Psychology, Rice University, Houston, Texas 77251.

The primary function of job evaluation is to determine the
relative worth of jobs in order to maintain equity in a compensation system. Although it is seldom the only basis for setting
wages, the role of job evaluation in salary administration has
grown in significance as more organizations, particularly in the
public sector, have attempted to implement comparable worth
policies. It is surprising then that so little is known about the
factors influencing the quality of these ratings. The present experiment examined two such factors: the amount of information that job analysts have on the jobs they are rating and their
training in job evaluation procedures.
The quality of job evaluations has been assessed in past research most frequently in terms of interrater reliability, or the
extent to which different evaluators give similar ratings to the
same jobs. The results have varied considerably across studies
(Ash, 1948; Doverspike & Barrett, 1984; Doverspike, Carlisi,
Barrett, & Alexander, 1983; Fraser, Cronshaw, & Alexander,
1984; Gomez-Mejia, Page, & Tornow, 1982; Lawshe & Wilson,
1947; Madigan, 1985; Schwab & Heneman, 1986), but reliabilities in the .70s and .80s appear to be the norm. Although quite acceptable for some uses, these reliabilities may be lower than needed for setting wages (Treiman, 1979). Unfortunately, the crucial issue of how to increase the reliability with which job analysts evaluate jobs has been largely ignored. Additionally, none of the research has dealt with how to improve accuracy, or the extent to which ratings on job evaluation dimensions correspond to criterion or target scores on those dimensions (Cronbach, 1955).
One means of improving the accuracy and reliability of evaluations might be to increase the amount of information provided on the jobs being rated. As obvious as this may seem, the research so far is quite mixed in showing effects of information.
Two studies have shown that job analysts who have a large
amount of job information differ very little from those given
only a small amount of job information. Smith and Hakel
(1979) found that college students given only the titles of jobs
did not appreciably differ on interrater agreement (r = .51)
from the agreement obtained with students given job specifications (r = .49), job incumbents (r = .59), job supervisors (r =
.63), and job analysts (r = .63). Similarly, Arvey, McGowen, and Dipboye (1982) found differences due to the amount of information on only 2 of 13 Position Analysis Questionnaire (PAQ) dimensions, even though the high-information group of job analysts was given twice as many task statements. In contrast to the Smith and Hakel (1979) and
Arvey et al. (1982) studies, other research has provided more support for the notion that increasing the amount of job information available to raters not only changes but improves job ratings (Cornelius, DeNisi, & Blencoe, 1984; Harvey & Lozada-Larsen, 1987; Madden & Giorgia, 1965). The present experiment extended this previous research by examining the independent effects of job title, job description, and the combination of title and job description on both the accuracy and reliability of job evaluations.
Training in job evaluation techniques is a second factor that
should influence the quality of job evaluations. Trained job analysts were used in most of the previous studies, with the extent
of training ranging from 1- or 2-hr orientations (Arvey et al.,
1982; Doverspike & Barrett, 1984) to intensive sessions lasting
1 or more days (Schwab & Heneman, 1986; Madigan & Hoover, 1986). Whether training job analysts improves job evaluations has not been examined, but the findings of research on training in performance appraisal are probably generalizable to job evaluation. A well-established finding is that raters can be trained
to change the distribution of performance ratings (e.g., halo,
leniency, and range). Unfortunately, such training is ineffective
in improving accuracy or reliability (Bernardin & Buckley,
1981) and can even reduce the accuracy of the ratings (Bernardin & Pence, 1980; Cooper, 1981). More recent research has
shown that training that is focused on improving the accuracy of ratings, rather than changing response distributions, can improve both the accuracy (Pulakos, 1984, 1986) and reliability (Pulakos, 1986) of appraisals. Consistent with these recent findings, the training in the present investigation focused on increasing accuracy.
In summary, we examined the extent to which training in job
evaluation and job information could increase the accuracy and
reliability of job ratings. An attempt was made to improve on

previous research in several respects. First, much of the aforementioned research was concerned with inventory methods of
job analysis, such as the PAQ. These methods can be used for
job evaluation but are more task-specific than the generic scales
typically used as the basis for wage and salary structures. We
examined the effects of training and information on scales more
representative of those used in job evaluation. Second, many of
the past comparisons of expert and nonexpert raters have been
difficult to interpret because they have confounded experience
and training in the use of the rating system with prior familiarity with the jobs. The experts in our experiment were persons
who were more knowledgeable and experienced in job evaluation but were no more familiar with the specific jobs being evaluated than were the lay raters. By holding familiarity constant,
we hoped to determine the effects of experience and training in
job evaluation, independent of experts' knowledge of the idiosyncratic features of specific jobs. Third, individual studies
have focused on the distribution of ratings, reliability, or accuracy, but the present research is the first to look at all three of
these in the context of the same study.
In the absence of any clear evidence on the effects of training
and information on job evaluations, our two hypotheses largely
derive from common sense:
Hypothesis 1. Persons given only job titles will be less accurate and reliable in their evaluations of jobs than will persons
given more complete information on the tasks performed in the
jobs.

Hypothesis 2. Persons who are trained in how to make accurate and reliable job evaluations will be more accurate and reliable than persons who are untrained.

Method
Summary of Design
The experiment used a 3 × 2 × 2 factorial design in which the factors were the amount of information provided on the jobs, the training provided to raters, and the sex of the rater. In the manipulation
of amount of information, one third of the subjects were given only a
job title, another third only a job description, and the remainder a job
title and a description. In the manipulation of the second factor, one
half of the subjects received training in job evaluation procedures,
whereas the other half did not. The third factor was included to assess
the effects of an imbalance across conditions in the proportion of male
and female subjects.

Subjects
Undergraduate college students served as our nonexpert raters. They were mostly White (86%) and ranged in age from 17 to 27 years, with slightly more men than women (55% men). Subjects participated in return for course credit in psychology or economics and were randomly assigned to one of the six conditions: trained/job title only (n = 23, 17 men and 6 women), trained/job description only (n = 21, 14 men and 7 women), trained/job title and job description (n = 22, 11 men and 11 women), not trained/job title only (n = 24, 12 men and 12 women), not trained/job description only (n = 23, 11 men and 12 women), or not trained/job title and job description (n = 24, 10 men and 14 women).

Procedure
Jobs rated. The jobs subjects evaluated were from a 3,800-employee
health science organization that performed a variety of functions related to medical care, research, and training. A total of 23 jobs were
selected from a larger pool of 343 titles.
Job information manipulation. Written job descriptions are the most
common source of information in job evaluations, according to Schwab
and Grams (1985), and were the primary source of information in the
present study. A job description was prepared for each of the 23 jobs on
the basis of intensive interviews with job incumbents and a review of
available information. Each job description included a narrative summary of the important aspects of the job, a listing of the 10 most important tasks (as determined through interviews with incumbents), and
personal requirements such as education, experience, and certification
or licenses. (See Appendix for information provided in full-information
condition on 1 of the 23 jobs.) Three booklets of job information were
compiled; each corresponded to one of the three conditions of the information manipulation.
Training manipulation. Subjects were randomly assigned to the two
training conditions. Of the subjects, 66 were trained in job evaluation
procedures and in the specific scales used in the experiment. The remaining 71 subjects evaluated the jobs without any prior training.
Two trainers were used, both of whom were senior PhD students in industrial psychology, with previous experience and training in job
analysis and job evaluation. Nine sessions in all were conducted in
which subjects were trained in job evaluation procedures. To keep the
training standard across these sessions, both trainers followed the same
lecture outline and used the same exercises, audiovisuals, and examples.
Also, one of the two trainers attended the first three sessions of the other
trainer to further ensure that they would conduct the training in the
same manner. The training sessions lasted 1 hr and were divided into
two major sections. The first section consisted of a 5-min lecture on
basic concepts and definitions pertaining to job analysis, job evaluation,
and rating accuracy. The major objective of the remaining 55 min was
to define the rating dimensions so that subjects could make accurate
ratings on these dimensions. First, the trainer described the 10 scales in
detail, providing examples of the anchors on each scale. Next, subjects
evaluated a sample job on the 10 dimensions and received group feedback on their ratings.
Rating instructions. Subjects in all of the conditions were given a
description of the organization and were told that they would rate 23
jobs to determine the salaries of the persons in these positions. Subjects
were instructed to take as much time as they needed to complete the
task and to refer back to the description of the dimension anchors frequently. The amount of time actually spent on the task ranged from 1½ to 3 hr.

Dependent Measures
Job evaluation dimensions. Subjects evaluated the 23 jobs on 10 job
evaluation dimensions. Of the dimensions, 6 were drawn from Fine and
Wiley's (1971) Functional Job Analysis system. Subjects rated the level
of complexity with which an incumbent must deal with data, people,
and things, as well as the complexity of reasoning, language, and math
required to perform the job adequately. The number of response alternatives ranged from 9 in the case of the people dimension (1 = taking
instructions-helping; 9 = mentoring) to 5 in the case of the language,
math, and reasoning dimensions. Subjects also used 5-point scales to
rate each job on the Hay (Milkovich & Newman, 1984) dimensions of
know-how, problem solving, and accountability. Finally, the subjects


used another 5-point scale to rate the jobs on Jacques's (1961) time span
of discretion (TSD). Time span of discretion is basically a measure of

the amount of time it would take before inadequate job performance
becomes evident. Jobs with longer TSD are viewed as being more critical to the organization than are those with shorter TSD. Given that
calculation of several of our dependent measures required that all of the
dimensions have the same number of scale intervals, a linear transformation was used to convert responses on each dimension to a 9-point
scale.
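The article does not spell out the transformation itself; a minimal sketch of one standard linear rescaling, assuming each dimension's responses run from 1 to the scale's maximum (the function name is illustrative, not from the article), might look like this:

    def rescale_to_nine_point(rating, scale_max, scale_min=1):
        """Linearly map a rating from a (scale_min..scale_max) scale onto 1-9."""
        return 1 + (rating - scale_min) * 8.0 / (scale_max - scale_min)

    # Example: the midpoint of a 5-point scale maps to the midpoint of a 9-point scale.
    assert rescale_to_nine_point(3, scale_max=5) == 5.0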
Reliability. The average correlation among raters in their evaluations
of a job was used as the measure of reliability. In the analysis of variance
(ANOVA) conducted to test hypotheses pertaining to reliability, jobs
were treated as a random effect, with one reliability coefficient being
contributed by each of the 12 experimental groups for each of the 23
jobs. To obtain the reliability coefficient for a job within a condition,
each subject's ratings of a job on the 10 dimensions were correlated
with the ratings on the same dimensions of every other subject in that
condition. After all correlations among the n(n − 1)/2 pairs of raters in a condition were calculated, these correlations were transformed to Fisher's z equivalents, the average of the zs was computed, and the average z was converted back to a correlation. This method of estimating interrater reliability focuses on the extent of agreement among raters in the
use of rating dimensions. Note that there are several other approaches
to estimating reliability (e.g., intraclass correlation, test-retest, and internal consistency) but that interrater agreement has been the most frequent method used in research on job analysis and job evaluation (e.g.,
Cain & Green, 1983; DeNisi, Cornelius, & Blencoe, 1987; Friedman &

Harvey, 1986).
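As a concrete illustration, a minimal sketch of this estimate is given below, assuming each rater's evaluations of a job are stored as one row of dimension ratings; the function name and array layout are illustrative and not taken from the article:

    import itertools
    import numpy as np

    def interrater_agreement(ratings):
        """Average interrater correlation for one job within one condition.

        ratings: array of shape (n_raters, n_dimensions), one row per rater.
        Pairwise Pearson correlations across the dimensions are averaged
        via Fisher's r-to-z transformation and converted back to r.
        """
        zs = []
        for i, j in itertools.combinations(range(len(ratings)), 2):
            r = np.corrcoef(ratings[i], ratings[j])[0, 1]
            zs.append(np.arctanh(r))        # Fisher r-to-z
        return np.tanh(np.mean(zs))         # back-transform the mean z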
Rating accuracy. The effects of training and information on accuracy
were evaluated in two ways. One measure consisted of the discrepancy
between each rater's evaluations of the jobs and target scores obtained
by averaging the ratings of 10 experts on each of the 10 dimensions for
each job. The target scores should be viewed as referent scores that were still subject to error but came closer to representing the hypothetical true values than did the nonexpert ratings. The 10 experts ranged in age
from 22 to 39 years. Two were employed in the personnel department
of a small local organization and had recently completed a job evaluation of 123 positions. Of the remaining experts, 7 were graduate students in psychology and the other was a final-semester undergraduate.
Consistent with our intention to hold familiarity with jobs constant,
the experts reported no more familiarity with the 23 jobs than did the
nonexpert subjects, F(6, 140) = 0.71, p > .05.
As a check on the convergent and discriminant validity with which
our experts evaluated the jobs, we analyzed their ratings in a Job X
Dimension X Rater design described by Kavanagh, MacKinney, and
Wolins (1971). Significant effects were found for job, F(22, 1782) = 107.21, p < .001; the Job × Dimension interaction, F(198, 1782) = 8.67, p < .001; and the Job × Rater interaction, F(198, 1782) = 3.17, p < .05. The intraclass indices were .52, .43, and .18 for the effects of job, Job × Dimension, and Job × Rater, respectively. The convergent and discriminant validity demonstrated in the ratings of our experts were comparable with those shown by experts in research on the accuracy of performance appraisals (Murphy & Balzer, 1986).
To test the hypotheses pertaining to accuracy, an examination was
made of the differences between subjects' ratings of the jobs and the
experts' ratings. Cronbach (1955) noted that global indexes of judgmental accuracy, in which one simply sums the squared differences between a judge's ratings and target values, can be highly misleading. As
an alternative, Cronbach suggested decomposing global scores and analyzing the separate components. We examined four of Cronbach's (1955) accuracy components as dependent measures in the present experiment: (a) elevation, the accuracy of the grand mean of a rater's evaluations across all 23 jobs and 10 dimensions; (b) differential elevation, the accuracy of the mean rating given to each job across all 10 of the dimensions; (c) stereotype accuracy, the accuracy of the mean rating given on each of the 10 dimensions across all 23 jobs; and (d) differential accuracy, accuracy in discriminating among the 23 jobs on each of the 10 job evaluation dimensions.
For each of the accuracy components, lower scores reflect higher accuracy. The formulas for each of these components can be found in
Cronbach (1955) and in Pulakos (1986). Cronbach (1958) later elaborated on the 1955 article and noted that a difference between two groups on a global dyadic difference score could be attributed to no fewer than 243 different patterns of results on five parameters! Nevertheless, the 1955
accuracy component model has been the more frequently used model
in personnel research (Nathan & Alexander, 1985) and appeared best
suited to analyzing the data of the present experiment.
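For readers who want to compute these components, the sketch below follows the general Cronbach (1955) decomposition under a root-mean-square scaling; the exact scaling the authors used may differ, and the function and variable names are illustrative rather than taken from the article:

    import numpy as np

    def cronbach_components(ratings, targets):
        """Cronbach (1955) accuracy components for one rater.

        ratings, targets: arrays of shape (n_jobs, n_dimensions);
        lower scores indicate higher accuracy.
        """
        x, t = np.asarray(ratings, float), np.asarray(targets, float)
        xg, tg = x.mean(), t.mean()                  # grand means
        xj, tj = x.mean(axis=1), t.mean(axis=1)      # per-job means
        xd, td = x.mean(axis=0), t.mean(axis=0)      # per-dimension means

        elevation = abs(xg - tg)
        diff_elevation = np.sqrt(np.mean(((xj - xg) - (tj - tg)) ** 2))
        stereotype_acc = np.sqrt(np.mean(((xd - xg) - (td - tg)) ** 2))
        # Differential accuracy uses the job-by-dimension interaction residuals.
        resid_x = x - xj[:, None] - xd[None, :] + xg
        resid_t = t - tj[:, None] - td[None, :] + tg
        diff_accuracy = np.sqrt(np.mean((resid_x - resid_t) ** 2))
        return elevation, diff_elevation, stereotype_acc, diff_accuracy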
Another index of the accuracy of job evaluation ratings was the correlation between each subject's ratings and the actual job evaluation used
in the administration of the compensation system. These scores had
been obtained by a professional consulting firm using the Hay point-method system and job evaluation committees. Data were available on
three of the dimensions used in this system: know-how, problem solving, and accountability. The measure of rating accuracy was the correlation between each subject's ratings of the 23 jobs on a dimension and
the actual rating of the 23 jobs used in the compensation system of the
organization. Although job evaluations such as these can be biased by a
variety of factors, including market wages and other exogenous factors
(Treiman, 1979), they provided a unique opportunity to assess the extent to which information and training could allow subjects to capture
the current policy of the company.
In addition to the job evaluations, experts and subjects estimated the
minimum and maximum salaries of incumbents in each of the 23 jobs.
As an exploratory analysis, we compared the correlations between these
estimates and the actual minimum and maximum salaries.
Rating distribution measures. Three characteristics of the rating distribution were included in the analyses for exploratory purposes. The
two dispersion measures were concerned with the extent to which a rater
showed variation in his or her ratings across dimensions or jobs. The
dimension dispersion measure was computed for each rater by first taking the average of the rater's ratings across the 23 jobs on each of the
10 dimensions and then computing the standard deviation of these 10
dimension ratings. The job dispersion measure was computed by taking
the average of a rater's evaluations across the 10 dimensions for each of
the 23 jobs and computing a standard deviation of these average job
ratings. Finally, the leniency of each rater's ratings was computed by
averaging all of the evaluations across all 10 dimensions for all 23 jobs.
In contrast to much of the past research, the distribution measures were
viewed in the present study as rating tendencies rather than as rating
errors (Cooper, 1981; Pulakos, Schmitt, & Ostroff, 1986).
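A minimal sketch of these three distribution measures is shown below, assuming one rater's evaluations form a 23 × 10 array; the use of a sample standard deviation is an assumption, since the article does not specify:

    import numpy as np

    def distribution_measures(ratings):
        """Rating-distribution measures for one rater.

        ratings: array of shape (n_jobs, n_dimensions), e.g. (23, 10).
        Returns dimension dispersion, job dispersion, and leniency as
        described in the text; names are illustrative.
        """
        r = np.asarray(ratings, float)
        dim_means = r.mean(axis=0)    # mean rating on each dimension across jobs
        job_means = r.mean(axis=1)    # mean rating for each job across dimensions
        dimension_dispersion = dim_means.std(ddof=1)   # sample SD is an assumption
        job_dispersion = job_means.std(ddof=1)
        leniency = r.mean()           # grand mean of all ratings
        return dimension_dispersion, job_dispersion, leniency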

Results
Effects of Training on Knowledge of Job Evaluation
As a check on whether subjects who were trained differed
from those who were untrained on their knowledge of job evaluation, subjects were given a 10-item multiple-choice quiz. Some
questions dealt with general definitions and concepts related to
evaluating jobs, whereas others were more specific to the scales
that they would be using in the experiment. Subjects who were
trained obtained higher scores on the test than did subjects who
were not trained, F(1, 131) = 70.19, p < .001.

Interrater Reliability in Evaluating Jobs
Effects of training and information on reliability. To test the effects of the manipulations on reliability, a 3 × 2 × 2 (Information × Training × Sex) ANOVA was conducted. As previously mentioned, the ANOVA model was a completely replicated design in which the three factors were fixed effects and job was a random effect.


Table 1
Interrater Reliabilities in Each Condition for Jobs and for Evaluation Dimensions

Measure                          Experts   NT/TO   NT/DO   NT/FI   T/TO    T/DO    T/FI
Jobs
  Average across 23 jobs          .720      .490    .510    .560    .440    .530    .610
  Range
    Low                           .160      .242    .224    .233    .180    .259    .239
    High                          .904      .701    .739    .777    .603    .736    .760
Dimensions
  Average across 10 dimensions    .670      .430    .540    .510    .460    .550    .600
  Range
    Low                           .389      .181    .259    .217    .349    .242    .248
    High                          .792      .606    .647    .651    .545    .707    .770

Note. NT = no training; T = training; TO = title only; DO = description only; FI = full information.

The data analyzed were 23 interrater correlations (one for each of the 23 jobs) in each of the 12 conditions. To reduce the risk of a Type I error resulting from the repeated measures nature of the design, an alpha of p < .001 was used as the criterion for rejection of the null hypothesis. Using this criterion, three effects were found to be significant: a main effect for information, the interaction of information and sex, and the interaction of information and training. The strongest finding was the main effect for information, F(2, 44) = 22.61, p < .001, ω² = .18, in which more information was related to higher levels of reliability (title only, r = .46; description only, r = .51; and full information, r = .57). Newman-Keuls post hoc tests revealed that the differences among the three conditions were all statistically significant. Also significant were the interactions of Information × Training, F(2, 44) = 9.70, p < .001, . . .

. . . measure and two correlational components, DE1 and DA1 (Cronbach, 1955). Significant effects for information were found for all three of these measures. Subjects with only the title were less accurate (M = 0.37 for DA1, 0.69 for DE1, and 4.182 for DA2) than those with a description only (M = 0.40 for DA1, 0.83 for DE1, and 3.434 for DA2) and full information (M = 0.44 for DA1, 0.83 for DE1, and 3.283 for DA2). The only other effect was a significant Training × Information interaction found for DE1 and DA2, in which trained subjects were more accurate than untrained subjects in the full-information condition, but the reverse held true in the title-only condition.

Information × Training interactions were found for both differential accuracy, F(2, 125) = 5.47, p < .01, ω² = .06, and differential elevation, F(2, 125) = 3.00, p < .05, ω² = .02. These interactions revealed patterns similar to those reported for interrater agreement. The effects of information on differential elevation appeared stronger when raters were trained (Ms = 1.25, 0.746, and 0.61 for title only, description only, and full information, respectively) than when they were untrained (Ms = 1.04, 0.73, and 0.77 for title only, description only, and full information, respectively). Similarly, the effects of information on differential accuracy were somewhat more pronounced when raters were trained (Ms = 2.07, 1.96, and 1.60 for the title only, description only, and full information, respectively) than when they were untrained (Ms = 1.91, 1.57, and 1.84 for the title only, description only, and full information, respectively). Stronger effects were found for differential accuracy than for differential elevation. Although the former is often viewed as the most important accuracy component in interpersonal perception, one could argue that differential elevation (i.e., accuracy of raters in differentiating among jobs on all dimensions combined) is . . .

Training effects were found for the standard deviation of the ratings across jobs, ω² = .02, and the standard deviation of the ratings across dimensions, F(1, 125) = 3.04, p < .10, ω² = .01. Subjects who were trained tended to show more dispersion than those who were untrained on jobs (Ms = 2.11 vs. 2.04) and on dimensions (Ms = 1.85 vs. 1.78).

Correlations between job evaluation ratings and actual ratings and salaries. Also related to the hypotheses was the question of how well subjects in the various conditions could estimate the job evaluations actually used by the organization and the salaries of present employees. Correlations of the subjects' and experts' ratings with the actual ratings, which had been obtained by a professional consulting firm on three dimensions, are reported in Table 3. Experts' ratings tended to correspond more closely to actual ratings than did the subjects' ratings. However, the combination of training and full information elevated the correlations of the nonexperts almost to the level of experts. A Dunnett test (Winer, 1971) was computed comparing the expert group and each of the experimental conditions. For problem solving and know-how, the expert group's correlations were significantly higher than the correlations achieved in the training/title-only, no-training/title-only, and the no-training/full-information groups. No significant differences were found for accountability.

The correlations between estimates of the minimum and maximum job salaries and the actual minimum and maximum salaries are also reported in Table 3. Dunnett tests revealed that the experts were better at estimating minimum salary than were subjects in the training/title-only, no-training/title-only, no-training/description-only, and no-training/full-information conditions. Also, the experts were better at estimating maximum salary than were subjects in all except the training/description-only condition.


Table 3
Correlations Between the Actual Hay Ratings and Salaries and the Subjects' Estimates

Measure              Experts   NT/TO   NT/DO   NT/FI   T/TO    T/DO    T/FI
Hay dimension
  Know-how             .76       .44     .58     .44     .47     .65     .70
  Problem solving      .72       .57     .63     .56     .50     .63     .66
  Accountability       .53       .35     .39     .32     .36     .37     .45
Salary
  Minimum              .77       .60     .65     .64     .57     .67     .67
  Maximum              .80       .64     .71     .68     .62     .73     .70

Note. All the correlations significantly differed from zero, p < .01, where n = 23, corresponding to the number of jobs used in the study. For each rater within a condition, a correlation was computed between that rater's evaluations or salary estimates on each of the 23 jobs and the actual evaluations or salaries. Consequently, there was one correlation for each subject in a condition, and the values reported in the table represent the average of the correlations. NT = no training; T = training; TO = title only; DO = description only; FI = full information.

Discussion

The present experiment tested the hypotheses that training raters and providing them with information on the tasks in the jobs increases the accuracy and reliability of their job evaluations. Data provided stronger support for the hypotheses with respect to the effects of information than the effects of training. Generally, subjects who received both a title and a description of the job were more reliable than subjects receiving a title only or a description only. More information also led to higher accuracy, as measured with differential elevation, differential accuracy, and the correlation between subjects' ratings and the organization's actual evaluations.

The effects of information on accuracy and reliability are not as obvious as they may appear at first glance. The dimensions found in job evaluation measures are typically rather abstract (e.g., know-how and accountability). If, as suggested by Smith and Hakel (1979), people share stereotypes regarding differences among jobs on these dimensions and if these stereotypes are relatively accurate, then increased information on the job may do little to increase the accuracy and reliability of ratings. Contrary to Smith and Hakel's hypothesis, even the small increment in information represented by the full-information condition boosted the accuracy and reliability of job evaluations above the baseline levels obtained in the title-only condition.

Training had stronger effects on the distribution of job evaluations than on the reliability and accuracy of these ratings. Despite the admonition to focus on accuracy of ratings, the strategy subjects apparently derived from their training was to increase the variance and severity of their ratings. Although we cannot claim strong support for the hypothesized main effects of training, training did appear to moderate the effects of information on accuracy and reliability. The most reliable judgments tended to be obtained when a description and a title were accompanied by training. Similarly, increased information was more likely to increase differential elevation and differential accuracy when raters were trained. Finally, correlations between the organization's ratings and the subjects' ratings were at their highest levels when full information was combined with training. Higher reliabilities were found for the experts across most jobs and dimensions, providing further evidence of the possible benefits of training. Given that the experts were no more familiar with the jobs than were the subjects in the full-information condition, these differences seem attributable to their greater experience and training in job evaluation procedures.

At the same time that the findings show some positive effects for training, other aspects of the data suggest circumstances in which training raters can be dysfunctional. Specifically, untrained raters in the title-only condition tended to show higher levels of agreement and greater accuracy than did the trained raters. One possible explanation is that untrained raters tended to rely more on shared conceptions of the relative position of the jobs on the dimensions in the title-only condition, whereas trained raters tended to avoid stereotypic evaluations in this same condition and relied on more idiosyncratic conceptions of the jobs. Thus, training raters who have information on jobs can bring them close to the level of experts on their reliability and accuracy, but training raters who lack sufficient information might do more harm than good.

The usual precautions should be observed in generalizing these findings to organizational contexts. Although we found that a small amount of information and minimal training improved the ratings of inexperienced raters, this does not mean that organizations should seek the nearest introductory psychology class when looking for job analysts. Even if no differences were found between experts and nonexperts, compelling legal reasons would still exist for relying on the opinion of professionally trained and knowledgeable raters. Moreover, the utility of a job evaluation system depends not only on its psychometric characteristics but also on how well it is accepted by management and employees (Gomez-Mejia et al., 1982). What the findings of this study do suggest is that when inexperienced persons are given the responsibility of evaluating jobs, they may not need massive amounts of training or job information for the reliability and accuracy of their ratings to rise substantially above baseline levels of accuracy and reliability. Just how far above the baseline their ratings can be improved, and at what point a diminishing return may be found, are likely to be influenced by a number of factors that will need to be investigated in future research. Attention also needs to be given to the effects of factors such as the political machinations that frequently accompany job evaluations, market wages (Rynes & Milkovich, 1986; Schwab & Wichern, 1983), the analyst's knowledge of the current pay level (Grams & Schwab, 1985; Schwab & Grams, 1985), and ratings by committee.
In conclusion, the present findings are encouraging insofar as
they show that relatively small amounts of training and information can improve the quality of job evaluation ratings. More
important, however, the present research encourages further research on interventions to improve the accuracy and reliability
of job evaluation measures.

References
Arvey, R. D., McGowen, S. L., & Dipboye, R. L. (1982). Potential sources of bias in job analytic processes. Academy of Management Journal, 25, 618-629.
Ash, P. (1948). The reliability of job evaluation rankings. Journal of Applied Psychology, 32, 313-320.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6, 205-212.
Bernardin, H. J., & Pence, E. C. (1980). Effects of rater training: Creating new response sets and decreasing accuracy. Journal of Applied Psychology, 65, 60-66.
Cain, P. S., & Green, B. F. (1983). Reliabilities of selected ratings available from the Dictionary of Occupational Titles. Journal of Applied Psychology, 68, 155-165.
Cooper, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90, 218-224.
Cornelius, E. T., III, DeNisi, A. S., & Blencoe, A. G. (1984). Expert and naive raters using the PAQ: Does it matter? Personnel Psychology, 37, 453-464.
Cronbach, L. J. (1955). Processes affecting scores on "understanding of others" and "assumed similarity." Psychological Bulletin, 52, 177-193.
Cronbach, L. J. (1958). Proposals leading to analytic treatment of social perception scores. In R. Tagiuri & L. Petrullo (Eds.), Person perception and interpersonal behavior (pp. 353-379). Stanford, CA: Stanford University Press.
DeNisi, A. S., Cornelius, E. T., III, & Blencoe, A. G. (1987). Further investigation of common knowledge effects on job analysis ratings. Journal of Applied Psychology, 72, 262-268.
Doverspike, D., & Barrett, G. V. (1984). An internal bias analysis of a job evaluation instrument. Journal of Applied Psychology, 69, 648-662.
Doverspike, D., Carlisi, A. M., Barrett, G. V., & Alexander, R. A. (1983). Generalizability analysis of a point-method job evaluation instrument. Journal of Applied Psychology, 68, 476-483.
Fine, S. A., & Wiley, W. W. (1971). An introduction to functional job analysis (Methods for Manpower Analysis, Monograph No. 4). Kalamazoo, MI: W. E. Upjohn Institute.
Fraser, S. L., Cronshaw, S. F., & Alexander, R. A. (1984). Generalizability analysis of a point-method job evaluation instrument: A field study. Journal of Applied Psychology, 69, 643-647.
Friedman, L., & Harvey, R. J. (1986). Can raters with reduced job descriptive information provide accurate Position Analysis Questionnaire (PAQ) ratings? Personnel Psychology, 39, 779-790.
Gomez-Mejia, L. R., Page, R. C., & Tornow, W. W. (1982). A comparison of the practical utility of traditional, statistical, and hybrid job evaluation approaches. Academy of Management Journal, 25, 790-809.
Grams, R., & Schwab, D. (1985). An investigation of systematic gender-related error in job evaluation. Academy of Management Journal, 28, 279-290.
Harvey, R. J., & Lozada-Larsen, S. R. (1987). Influence of amount of job descriptive information on job analysts' rating accuracy. Unpublished manuscript, Rice University, Houston, TX.
Jacques, E. (1961). Equitable payment. New York: Wiley.
Kavanagh, M., MacKinney, A., & Wolins, L. (1971). Issues in managerial performance: Multitrait-multimethod analysis of ratings. Psychological Bulletin, 75, 34-49.
Lawshe, C. H., & Wilson, R. F. (1947). Studies in job evaluation: VI. The reliability of two-point rating systems. Journal of Applied Psychology, 31, 355-365.
Madden, J. M., & Giorgia, M. J. (1965). Identification of job requirement factors by use of simulated jobs. Personnel Psychology, 18, 321-331.
Madigan, R. M. (1985). Comparable worth judgments: A measurement properties analysis. Journal of Applied Psychology, 70, 137-147.
Madigan, R. M., & Hoover, D. J. (1986). Effects of alternative job evaluation methods on decisions involving pay equity. Academy of Management Journal, 29, 84-100.
Milkovich, G. T., & Newman, J. M. (1984). Compensation. Plano, TX: Business Publications.
Murphy, K. R., & Balzer, W. K. (1986). Systematic distortions in memory-based behavior ratings and performance evaluation: Consequences for rating accuracy. Journal of Applied Psychology, 71, 39-44.
Nathan, B. R., & Alexander, R. A. (1985). The role of inferential accuracy in performance rating. Academy of Management Review, 10, 109-115.
Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 69, 581-588.
Pulakos, E. D. (1986). The development of training programs to increase accuracy with different rating tasks. Organizational Behavior and Human Decision Processes, 38, 76-91.
Pulakos, E. D., Schmitt, N., & Ostroff, C. (1986). A warning about the use of a standard deviation across dimensions within ratees to measure halo. Journal of Applied Psychology, 71, 29-32.
Rynes, S. L., & Milkovich, G. T. (1986). Wage surveys: Dispelling some myths about the "market wage." Personnel Psychology, 39, 71-90.
Schwab, D. P., & Grams, R. (1985). Sex-related errors in job evaluation: A "real-world" test. Journal of Applied Psychology, 70, 533-539.
Schwab, D. P., & Heneman, H. G., III. (1986). Assessment of a consensus-based multiple information source job evaluation system. Journal of Applied Psychology, 71, 354-356.
Schwab, D. P., & Wichern, D. W. (1983). Systematic bias in job evaluation and market wages: Implications for the comparable worth debate. Journal of Applied Psychology, 68, 60-69.
Smith, J. E., & Hakel, M. D. (1979). Convergence among data sources, response bias, and reliability and validity of a structured job analysis questionnaire. Personnel Psychology, 32, 677-692.
Treiman, D. J. (1979). Job evaluation: An analytical review (Interim report to the Equal Employment Opportunity Commission). Washington, DC: National Academy of Sciences.
Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.


Appendix
Assistant Chief Radiologic Technologist

Position component: Definition

Job description: Provides supervision and performs a variety of tasks related to the operation of diagnostic radiology.

Education: Graduate of an approved 2-yr program in radiologic technology including an internship.

Experience: Three years experience as a radiologic technologist.

Certification and licenses: Registered with the American Registry of Radiologic Technologists.

Supervisory duties: Supervises 4 radiologic technologists and/or clerks.

Job duties:
  Obtains information from patients concerning medical and personal history.
  Prepares reports on operations and services rendered.
  Explains X-ray procedure to patients.
  Positions patient under X-ray machine and takes X rays.
  Evaluates clinic operations to assess need for improvement.
  Supervises and assigns tasks to staff personnel.
  Develops X rays manually or with the aid of an automatic film processor.
  Counsels and comforts apprehensive patients.
  Maintains a file of patient records and related reports.
  Attends to patients who become ill during the examination.

Received October 9, 1986
Revision received October 30, 1987
Accepted November 19, 1987
