
Journal of Education for Business

ISSN: 0883-2323 (Print) 1940-3356 (Online) Journal homepage: http://www.tandfonline.com/loi/vjeb20

Peer Evaluation in the Classroom: A Check for Sex
and Race/Ethnicity Effects
Jai Ghorpade & James R. Lackritz
To cite this article: Jai Ghorpade & James R. Lackritz (2001) Peer Evaluation in the Classroom: A
Check for Sex and Race/Ethnicity Effects, Journal of Education for Business, 76:5, 274-281, DOI:
10.1080/08832320109599648
To link to this article: http://dx.doi.org/10.1080/08832320109599648

Published online: 31 Mar 2010.



Peer Evaluation in the Classroom:
A Check for Sex and
Race/Ethnicity Effects
JAI GHORPADE
JAMES R. LACKRITZ


San Diego State University
San Diego, California


ABSTRACT. Using a sample of 221 undergraduate students enrolled in human resource management courses taught in a business school, this study checked for sex and race/ethnicity effects in peer ratings of classroom presentations. Student age and student presenter frequency of participation in the classroom were the control variables. Our primary goal was to find out whether peer ratings were susceptible to the same-group preference bias. Results showed no consistent tendency by students to favor student presenters from their own groups. Frequency of participation by presenters in classroom discussions turned out to be a better predictor of student ratings of presentations by peers than any of the other factors studied.

Traditionally, college students' performance in the classroom has been assessed unilaterally by professors. In recent years, however, there has been a move to include students in the appraisal process, as raters of their own and their peers' performance. This move has been driven by the expectation that it would provide more complete information on student performance and improve the process generally (Henderson, Rada, & Chen, 1998; Johnson & Smith, 1997). The scope of such involvement is extensive, with students being asked to participate in appraisal and learning facilitation activities involving a wide range of student performance dimensions. These areas include enhancement of writing performance of students in general (Topping, 1998) and of students with learning disabilities (Ammer, 1998), grading of term papers (Haaga, 1993), selection of candidates for Best Student Awards (Khoon, Abd-Shukor, Deraman, & Othman, 1994), and assessment of peer ability (Norton, 1992). So pervasive is this practice that one study has found that peer evaluations may replace some or all of the grading traditionally done by professors (Henderson, Rada, & Chen, 1998).

The move to include students in this evaluation process inevitably raises questions about students' ability to perform as raters. For example, can they be impartial when rating their peers? Do the demographics of student raters and ratees affect the ratings given and received (e.g., do females favor females when rating)? Are students more prone to rate peers of other races differently than members of their own race?

There is reason for concern about the behavior of students as raters because peer ratings have been known to be influenced by personal characteristics of raters and ratees, both at work and in university contexts. At work, a number of studies have claimed that race and sex have played a part in ratings given, and received, by coworkers serving as peer raters (Bass & Turner, 1973; Die, Debbs, & Walker, 1990; Jordan, 1989; Kraiger & Ford, 1985; Pulakos, Oppler, White, & Borman, 1989; Sackett & DuBois, 1991).

Research on peer ratings in the university classroom context is relatively sparse. But the little that is available suggests that student raters are not free from the influence of personal characteristics in assigning ratings to their peers. For example, Sherrard, Raafat, and Weaver (1994) claimed that women in a class gave student groups higher ratings than the men did. With regard to race, Harber (1998) used a college sample to test the prediction that Whites would supply more lenient feedback to African Americans than to their fellow Whites. As predicted, the feedback was less critical when the supposed feedback recipient was African American rather than White.

The need for understanding the behavior of students as raters is now urgent, as institutions of higher education in the United States are becoming more and more integrated by gender and race. Women now constitute a majority of the student body at the university level of education (Koerner, 1999; Sommers, 2000), and the demographic composition of the students is now mixed, particularly in the urbanized areas and on the West Coast. In California, for example, the majority of students at 10


of the 21 campuses of the California
State University are now non-Whites
(California State University, 1999).
Thus, it is vitally important that student
involvement in peer appraisal yield
results that are not tainted, and that do
not have the appearance of being tainted, by sex and race preferences.
The fact that ratings given and
received by students serving as peer
raters may be influenced by their personal characteristics or those of the ratees raises a serious problem of interpretation. For example, suppose that for
a given sample of men and women, it is
found that women gave higher ratings to
women than they gave to men in a contest involving essay writing. Does this
automatically mean that the women
were biased in favor of women? An
alternative explanation might be that the
women writers were in fact better essay
writers and judges of essays than men.
Still another explanation might be that
the women had better handwriting and
thus presented more legible essays. In
short, higher ratings received by peers
of their own sex or race/ethnicity group
might either reflect real differences or
be the result of moderating variables.
For example, Waldman and Avolio
(1991) reported that race effects in performance appraisal disappear when
cognitive ability, education, and experience are taken into account (see also
Borman, White, & Dorsey, 1995).

We began our study by investigating
whether ratings given and received by
student raters were related to the sex
and race/ethnicity of the raters and
ratees. The study design consisted of
having students in a classroom rate a
dimension of classroom performance
(verbal presentations of case analysis)
of their peers, and then sorting the ratings (given and received) according to
the sex and race/ethnicity of the participants. Thus, we sought to investigate the
following two primary questions:

1. Do men and women student raters favor members of their own sex in rating classroom presentations given by peers?
2. Do members of different race/ethnicity groups favor members of their own group in rating classroom presentations by peers?

We then introduced two other variables into the investigation: student age and frequency of student participation in classroom discussions and exercises. Age has been identified as a factor in performance ratings in the industrial arena (Rosen & Jerdee, 1979), as well as in the classroom (Crew, 1984). One might expect older presenters to get higher ratings because of a possible perceived maturity level.

Frequency of participation in the classroom refers to participation initiated by the student during lectures and discussions. This typically consists of students raising their hands and asking questions of the professor and other speakers during class meetings and student presentations. To our knowledge, nobody yet has investigated the potential effect that a student's level of participation in classroom discussions has on the ratings that he or she receives from peers for class presentations. We suspected that there might be a connection for two reasons. First, students who participate frequently in general class discussions can be expected to be more at ease when making individual presentations to the class. Second, the "halo effect" (Cascio, 1998) can occur, whereby the raters' knowledge of the ratee's achievement on one dimension (in this case, general classroom participation) might lead some raters to assign higher ratings on another dimension (in this case, the presentation). The halo effect might also work to reduce or enhance the role of sex and race in the ratings assigned.

Those two variables, age and frequency of participation, served as control variables in our study. We investigated whether those two variables play a part in this issue, and what their relative importance is compared with sex and race/ethnicity. Thus, our third question was

3. Do the ages and frequency of class participation of student presenters contribute to the ratings assigned to them by their peers, and if so, how large is their effect relative to those of sex and race/ethnicity?

Before proceeding further, we should acknowledge that our study did not deal directly with the validity of student ratings in the classical sense (Johnson & Smith, 1997). In particular, we did not seek to correlate student peer ratings of classroom presentations with objective indices of learning. First, the type of assignment rated (classroom presentations) does not lend itself readily to objective assessment. Research has shown that control of the peer rating process through structuring reduces subjectivity but does not eliminate it (Johnson & Smith, 1997). Second, this study was done in a natural setting (actual classes taught in a university context) that did not allow for elaborate scientific controls. The current study should thus be viewed as an exploration of the consequences of allowing students of both sexes and multiple race/ethnicity groups to participate in the peer rating process in a relatively unstructured situation that allowed for the expression of rater preferences.
Method

Subjects

The subjects in this study were 221
undergraduate students enrolled in six
sections of a course in human resource
management taught in a business school
of a state university located on the West
Coast. All the sections were taught by
the same professor over 3 semesters.
The average number of students per
class was 36.8, with a range of 24 to 48.
Personal characteristics of students
were obtained through biographical
information sheets administered during
the semester. Approximately 56% were
males. The majority (63.55%) was
White (Euro-Americans), and the rest
were Mexican Americans (11.68%), African Americans (4.21%), Asian Americans (13.55%), and Other
(7.01%). The Asian American category
consisted of students of East Asian origin, including Filipinos. The Other category consisted of international students
from the Middle East and other regions,
students of mixed-race backgrounds,
and others who did not fit into any of the
four race categories. (Note that the
terms race and ethnicity are used interchangeably in discussions of population
subgroups in the United States. In this
study, they were combined with the use
of a slash [e.g., race/ethnicity]).

May/June 2001


Almost 69% of the students were under 25 years of age, and an additional 23% were between 25 and 29. Three students were over 40 years old, and the
mean age of the entire sample was 24.57
years.
Assignment Rated

The rated assignment was a portion
of a classroom presentation required of
all students as members of teams consisting of three to five members. The
presentation consisted of analysis of
cases from a casebook on human
resource management (Nkomo, Fottler,
& McAffee, 1996). The time allotted to
the group was 20 to 25 minutes per presentation, and the duration of the presentations ranged from 5 to 10 minutes,
with each student presenting different
aspects of the case (e.g., introduction,
analysis, conclusions, and recommendations). Before commencing the presentations, all students identified themselves by writing their names on the
chalkboard.
Rating Procedure

All students were expected to participate as raters of presentations and were provided with a rating sheet containing three provisions:

1. a rating scale that ranged from 1 (poor) to 10 (excellent);
2. two rating criteria: (a) content of the presentation, including level and scope of knowledge displayed, reasoning, and supporting details provided, and (b) delivery, covering clarity, organization and flow, coordination, time allocation, and stimulation of discussion; and
3. spaces for recording the group identification numbers and the names of the presenters.

Students serving as raters were asked to provide two ratings per presentation: group and individual. In assigning the group ratings, they were asked to qualitatively combine the two criteria given above. In assigning ratings to individuals, they were to provide an overall assessment of the individual presenters by taking into account the quality of their contributions to the presentation, including facilitation of coordination and discussion. It is this portion of the presentation that provides the data for our study.

All students were provided with copies of the rating form at the beginning of the class. Unfortunately, a number of students either failed to bring rating forms to the class or gave a very small number of ratings. The ratings of those who gave five or fewer ratings were dropped from the analysis. The overall return rate of usable forms was about 85%.

Directions for Raters

As the purpose of this research was to study whether sex and race/ethnicity play a part in student peer appraisals, the question of what directions to give the participants was considered before the initiation of the study. Research suggests that rater training in sex and race/ethnicity issues can control distributional errors (halo, leniency, and central tendency) as well as sex and race/ethnicity effects in performance appraisal (for a summary, see Latham & Wexley, 1994, pp. 137-167). We considered whether such rater training was necessary, and decided not to make an issue of sex and race/ethnicity in the directions, but rather to exhort the raters in general terms to minimize distributional errors by being impartial, conscientious, timely, and detailed in the completion of their ratings. The data presented in our study are thus a relatively raw expression of rater preferences.

Frequency of Class Participation

These data were acquired by asking students at the end of the semester to rate the frequency of their own participation in class on a 5-point scale ranging from 1 (almost never) to 5 (almost always). The scale and results attained are given in Table 1. The self-ratings given by the students were compared with those assigned by the professor. Differences in the two rating sets were found in about 15% of the cases. However, in most of those cases, the differences were within plus or minus 1 point on the 5-point scale. Differences greater than 2 points were obtained in only five cases; in two of these, the difference was 3 points. In those two cases, the professor's ratings were entered because all the objective data (attendance records, ratings by peers in group discussion participation, and observation by the professor) indicated that these self-ratings were inflated. All the other cases were left unchanged. Note that the results given in Table 2 appear to be reasonably free from leniency, a major problem facing self-ratings (Harris & Schaubroeck, 1988).

The column titled "Mean peer ratings" gives the mean of the ratings given by peers to presenters in each of the frequency-of-class-participation categories. The data show that students who participated frequently or always (scale points 4 and 5) received significantly higher mean ratings on their presentations than did students who participated sometimes, seldom, or never (points 1, 2, and 3). Peer ratings on presentations given in class by students are thus at least partially influenced by how frequently the presenters participate in regular class activities. These data were combined with the other three factors (sex, race/ethnicity, and age) later in our study to test for combined effects among the factors.

Results

Four variables were included in the statistical analysis: sex, race/ethnicity, age, and frequency of participation of presenters in class discussions. The results are given in Tables 2 to 5.

In Tables 2 to 4, we provide mean peer ratings of the presentations by sex and race/ethnicity in various combinations. We calculated three types of significance results: (a) t tests for comparing mean scores of men and women, (b) ANOVA F tests for differences in mean rating scores among race/ethnicity categories, and (c) Tukey's Multiple Comparison Procedure for identifying significant group differences in instances where the F test yielded significant results (referred to hereafter as Tukey's MC Test). We also provide the total number of ratings given by all the raters who rated individual presenters.

In Table 5, we give the results of regression analysis, using ratings assigned by students to their peer presenters as the dependent variable, and rater/ratee sex, race/ethnicity, age, and presenter frequency of participation as the independent variables.

Sex Differences

The data relating to peer ratings by gender are given in Table 2. The top part of the table (2a) shows the results of ratings given to the presenters according to the sex of the presenter. The mean ratings received by men and women were not significantly different.

Table 2b contains the mean ratings given to presenters by rater sex groups. Ratings assigned by student raters did not follow the sex group of the raters. The ratings that men gave to men were not statistically significantly different from those that they gave to women. Similarly, the ratings that women raters gave to women presenters were not statistically significantly different from those that they gave to the male presenters. Viewed by itself, sex was not a significant factor in peer ratings, either given or received, in the current study.
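The sex comparisons above rest on two-sample t tests of mean ratings. A minimal pure-Python version of the pooled-variance t statistic, run on hypothetical ratings rather than the study's data, looks like this:

```python
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance (the classical
    equal-variance form used to compare two group means)."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = mean(sample_a), mean(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical ratings for male and female presenters (not the study's data).
male = [8, 9, 8, 7, 9, 8]
female = [8, 8, 9, 7, 8, 9]
t_stat = pooled_t(male, female)  # near zero here: the two means are equal
```

The resulting statistic is then compared against the t distribution with n_a + n_b - 2 degrees of freedom to obtain a p value such as the p = .508 reported for Table 2a.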
TABLE 1. Mean Ratings Given to Presenters, by Their Scores in Frequency of Class Participation

Frequency of class participation index(a)              No. of students   Mean peer rating(b)     SD     No. of ratings given
1. Almost never                                              26                8.181            1.114          562
2. Seldom (a few meetings)                                   55                8.249            1.119        1,090
3. Sometimes (one fourth to one half of meetings)            44                8.215            1.129          729
4. Frequently (one half to three fourths of meetings)        54                8.510            1.089        1,134
5. Almost always (all or almost all meetings)                39                8.521            1.180          694

Significance level, one-way ANOVA F tests, p = .0001
Significant multiple comparisons, Tukey's Procedure, at p < .05: 4, 5 > 1, 2, 3

(a) Students were asked at the end of the semester to rate their own participation in class discussions during lectures and student presentations.
(b) This statistic represents the mean ratings given by peers to presenters in each of the class participation categories.

TABLE 2. Mean Ratings Given to Presenters, by Gender

                              M        SD     No. of ratings given

2a. Mean ratings given to presenters, by presenters' gender
For entire population       8.347    1.136         4,237
Female presenter            8.336    1.143         2,061
Male presenter              8.359    1.132         2,176
Significance level, t test, p = .508

2b. Mean ratings, by gender of presenters and raters
Rating by males
  Male presenter            8.372    1.107         1,169
  Female presenter          8.358    1.143         1,091
Rating by females
  Male presenter            8.340    1.166           993
  Female presenter          8.307    1.148           995

Tests for sex differences (two-way ANOVA)
Between male and female presenters: F = 0.469, p = .494
Between male and female raters: F = 1.387, p = .239
Presenter/rater interaction: F = 0.073, p = .788

Race/Ethnicity Differences

In Tables 3a and 3b, we provide the mean ratings received, by presenter's race/ethnicity and rater's race/ethnicity, for all groups. As can be seen in Table 3a, the differences in the mean ratings broken down by presenter's ethnic background were highly significant (p < .000). The presenting group that received the highest rating was the Other category, and the lowest score was assigned to African Americans. The results of Tukey's MC Test showed that the Asian American, White, and Other categories received significantly higher ratings than did African Americans, and that the White and Other presenters scored significantly higher than did Mexican Americans.

On the face of it, it thus appears that race/ethnicity of the presenter does matter in peer ratings given in the classroom, with Mexican Americans and African Americans receiving significantly lower ratings than two or more of the other groups. But before accepting this as bias, it is essential also to look at the rater behavior data in Tables 3b and 5. The data in Table 3b show that, for the sample as a whole, there were highly significant differences among the mean ratings given by the different groups (p < .000). The Asian American group gave significantly higher ratings than did Whites, Mexican Americans, and the Other category. This raises the possibility of differences in rating tendencies, rather than rating bias, among rater groups, working independently or alongside same-group preferences.

To investigate this further, we extended the analysis in Table 3b to provide the mean peer ratings given to presenters broken down by rater race/ethnicity (see Table 4). In Table 4a, we give the ratings received by Whites according to rater race/ethnicity. Note that the differences in these mean ratings are statistically significant (p = .010). The highest ratings given to Whites were not by Whites, as would be expected if a same-race/ethnicity preference existed within this group, but by Asian Americans and African Americans. The relatively high score received by Whites as presenters from the whole sample (Table 3a) thus cannot be attributed significantly to their fellow Whites (Table 4a).

In Table 4b, we provide the mean ratings received by Mexican American presenters, by raters' ethnicity. Again, the differences among the means were highly significant. The lowest score given to Mexican Americans was given by African American raters; the highest by Asian American raters.

In Table 4c, we provide the mean ratings received by Asian American presenters, by rater race/ethnicity. The differences among the mean ratings were marginally statistically significant (p = .054). The highest ratings given to Asian American presenters were assigned by Asian Americans. But in interpreting this result for bias, it is important to note that the rating assigned by the Asian American raters to their own race/ethnicity group was lower than that assigned by them to the Mexican American group (Table 4b). The underlying force here may be a group rating tendency rather than a same-race/ethnicity preference.

In Table 4d, we provide the mean ratings received by African American presenters, by rater race/ethnicity. Unfortunately, because of the small sample size of this group, the results of tests of significance are not meaningful. It is interesting to note, however, that the mean rating given by the African American raters to peers from their own ethnic group (8.25, Table 4d) was lower than that given by them to Asian Americans (8.5, Table 4c) and to Whites (8.46, Table 4a).

TABLE 3. Mean Ratings by Presenters' Race/Ethnicity: All Groups

                              M        SD     No. of ratings given

3a. Mean ratings given to presenters, by presenters' race/ethnicity
For entire population       8.347    1.136         4,237
White                       8.366    1.129         2,664
Mexican American            8.164    1.227           445
African American            7.929     .914           106
Asian American              8.347    1.148           554
Other                       8.480    1.034           347
Significance level, one-way ANOVA F tests, p = .000
Significant multiple comparisons, Tukey's Procedure, at p < .05:
Asian American, White, Other > African American
White, Other > Mexican American

3b. Mean ratings given to presenters, by raters' ethnicity
White                       8.270    1.211         2,521
Mexican American            8.371     .911           525
African American            8.370     .876           141
Asian American              8.586    1.015           572
Other                       8.294    1.157           371
Significance level, one-way ANOVA F tests, p = .000
Significant multiple comparisons, Tukey's Procedure, at p < .05:
Asian American > White > Mexican American, Other

TABLE 4. Mean Ratings Given to Race/Ethnic Groups by Raters' Race/Ethnicity

                              M        SD     No. of ratings given

4a. Mean ratings given to White presenters, by raters' race/ethnicity
For entire population       8.354    1.135         2,561
White                       8.325    1.198         1,544
Mexican American            8.353     .902           329
African American            8.462     .796            94
Asian American              8.532    1.021           354
Other                       8.233    1.253           240
Significance level, one-way ANOVA F tests, p = .010
Significant multiple comparisons, Tukey's Procedure, at p < .05:
Asian American > White, Other

4b. Mean ratings given to Mexican American presenters, by raters' race/ethnicity
For entire population       8.156    1.222           440
White                       7.975    1.300           278
Mexican American            8.348     .900            46
African American            7.906     .905            18
Asian American              8.823     .984            62
Other                       8.292    1.038            36
Significance level, one-way ANOVA F tests, p = .000
Significant multiple comparisons, Tukey's Procedure, at p < .05:
All groups > Asian American
Asian American > Whites, Other

4c. Mean ratings given to Asian American presenters, by raters' race/ethnicity
For entire population       8.319    1.150           535
White                       8.202    1.229           321
Mexican American            8.427     .976            79
African American            8.500    1.041            13
Asian American              8.599    1.046            76
Other                       8.433     .960            46
Significance level, one-way ANOVA F tests, p = .054
Significant multiple comparisons, Tukey's Procedure, at p < .05: None

4d. Mean ratings given to African American presenters, by raters' race/ethnicity
For entire population       7.929     .914           106
White                       7.856     .976            66
Mexican American            7.765     .666            17
African American            8.250    1.084             6
Asian American              8.364     .809            11
Other                       8.083     .736             6
Significance level, one-way ANOVA F tests, p = .363
Significant multiple comparisons, Tukey's Procedure, at p < .05: None

Regression Analysis Results

The results of a standard linear regression analysis are presented in Table 5. In this model, the dependent variable was ratings assigned to the presenters by the raters, and the independent factors were rater and presenter sex, race/ethnicity, age, and presenter frequency of participation. For race/ethnicity, dummy variables were created to represent the groups, and the model was based on a White presenter and a White rater. It is important to begin this analysis by noting that all factors together explained only 3.4% of the rating variation. There are, however, some interesting variations in the results. Sex, of both presenters and raters, was an insignificant influence in the rating process. Of the total variance in the rating process explained by these factors (3.4%), presenter and rater age contributed 6% and 12%, respectively. Age was thus a player in the peer rating process, with increasing age contributing positively to the ratings given and received. The increase in ratings received by age is understandable, as experience and self-confidence increase with age. However, we were puzzled by the finding that older raters gave higher ratings, which needs to be further researched.

Presenter race/ethnicity, when viewed in combination with the other factors, emerges as a significant player in the rating process. Being African American and Mexican American resulted in lower ratings, with both of these factors jointly accounting for 18% of the explained variance. Asian American raters gave mean ratings that were almost .3 higher than the other raters, accounting for 21% of the explained variation of the ratings.

The single most significant influence behind the ratings process was frequency of participation in classroom discussions by the presenters. That factor accounted for 32% of the explained variance, with higher general participation scores being associated with high peer ratings of presentations. We also tested for potential interaction effects, under the notion that people would give higher ratings to presenters of their own race/ethnic group. However, none of the interaction terms were statistically significant.

Discussion

Using a sample of senior-level college students, in this study we sought
to investigate whether sex and race/eth-

Downloaded by [Universitas Maritim Raja Ali Haji], [UNIVERSITAS MARITIM RAJA ALI HAJI TANJUNGPINANG, KEPULAUAN RIAU] at 17:50 13 January 2016

zyxwvu
zy
zyxwvutsrqpo
zyxwvutsrq
zyxwvutsrq
zyxwvuts

TABLE 4. Mean Ratings Given to Race/Ethnic Groups by Raters’
RacelEthnicity
M

SD

No. of ratings given

4a. Mean ratings given to White presenters by raters’ race/ethnicity

For entire population
White
Mexican American
African American
Asian American
Other

8.354
8.325
8.353
8.462
8.532
8.233

1.135
1.198
.902
.796
1.021
1.253

2,561
1,544
329
94
354
240

Significance level, one-way ANOVA F tests, p = .010

Significant multiple comparisons, Tukey’s Procedure, at p < .05
Asian American >White, Other

4b. Mean ratings given to Mexican American presenters by raters’ race/ethnicity

For entire population
White
Mexican American
African American
Asian American
Other

8.156
7.975
8.348
7.906
8.823
8.292

1.222
1.300
.900
.905
,984
1.038

440
278
46
18
62
36

zyxwvuts

Significance level, one-way ANOVA F tests, p = .OOO

Significant multiple comparisons, Tukey’s Procedure, at p < .05
All groups > Asian American
Asian American > Whites, Other

4c. Mean ratings given to Asian American presenters by raters’ race/ethnicity

For entire population
White
Mexican American
African American
Asian American
Other

8.319
8.202
8.427
8.500
8.599
8.433

1.150
1.229
,976
1.041
1.046
.960

535
32 1
79
13
76
46

Significance level, one-way ANOVA F tests, p = ,054
Significant multiple comparisons, Tukey’s Procedure, at p < .05
None
4d. Mean ratings given to African American presenters by raters’ race/ethnicity

For entire population
White
Mexican American
African American
Asian American
Other

7.929
7.856
7.765
8.250
8.364
8.083

.914
.976
.666
1.084
.809
,736

106
66
17
6
11
6

Significance level, one-way ANOVA F tests, p = .363
Significant multiple comparisons, Tukey’s Procedure, at p < .05
None

nicity played any part in ratings given,
and received, by peer raters within a
classroom situation. Participant age
and frequency of participation in the
classroom were added as control variables. The data presented here rule out

lower mean ratings than two or more of
the other groups (Tables 3a and 5 ) , further analysis revealed that these differences could not be explained by any
simple manifestation of same racelethnicity group preferences. Our data show
inconsistencies in this regard. The highest ratings given by White raters were to
White presenters (Table 4a), but White
presenters received higher ratings from
three of the other race/ethnicity groups.
African American raters gave the
lowest rating to Mexican American presenters (Table 4b) and placed their own
group below the White (Table 4a) and
Asian American group (Table 4c). The
Asian American rater group turned out
to be the most lenient (Table 3b), but
gave the highest rating to the Mexican
American presenters. The Mexican
American raters gave their highest ratings to Asian American presenters
(Table 4c) and their lowest to African
American presenters (Table 4d).
Because race/ethnicity was not a reliable predictor of peer ratings, we introduced two alternative explanations: participant age and frequency of
participation in the classroom. The
influence of both of these factors was
studied in combination with the other
two (Table 5 ) . Both contributed significantly to the variation in the ratings
given, with frequency of presenter participation in the classroom contributing
about 32% of the total variation
explained by the four factors.

sex as a factor in peer ratings (Tables 2
and 5).
With regard to race/ethnicity, the
results were somewhat more complex.
Though African Americans and Mexican Americans received significantly

Suggestions for Teaching
and Research
How should professors cope with
peer ratings in the classroom? Our
research leads us to make three suggestions for faculty interested in researching this subject further and using students as peer raters in the classroom.
First, peer ratings should not be accepted at face value when used for appraisal purposes. Our study clearly demonstrates that student peer ratings are influenced by multiple rater/ratee characteristics and that it would be incorrect to use scores on any one of them individually as the sole basis for any judgment about the meaning and usefulness of the ratings. When race/ethnicity differences are found, faculty should make
May/June 2001

279



TABLE 5. Regression Model for Factors Influencing Ratings

Variable                       β       SE     Standardized β      t      Significance   % variance explained
Constant                     7.197    .175                      41.217       .000
Presenter Sex                -.035    .037        -.016         -0.956       .339
Rater Sex                     .004    .037         .002          0.113       .910
Presenter Age                 .013    .005         .046          2.803       .005              .2
Rater Age                     .022    .005         .070          4.427       .000              .4
Participation Frequency       .083    .016         .095          5.512       .000             1.1
Presenter Mexican American   -.206    .063        -.055         -3.265       .001              .3
Presenter African American   -.380    .112        -.054         -3.391       .001              .3
Presenter Asian American      .063    .058         .019          1.077       .281
Presenter Other               .132    .067         .032          1.978       .048              .1
Rater Mexican American        .050    .057         .014          0.883       .337
Rater African American        .122    .098         .020          1.239       .215
Rater Asian American          .271    .054         .083          4.999       .000              .7
Rater Other                  -.027    .065        -.007         -0.413       .680

R² = .034
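The model in Table 5 is an ordinary least squares regression of peer ratings on dummy-coded rater and presenter characteristics. The sketch below, with entirely hypothetical data and invented variable names, shows that kind of fit in pure Python by solving the normal equations X′Xb = X′y:

```python
# Minimal OLS sketch (hypothetical data): regress peer ratings on an
# intercept, a sex dummy, and participation frequency.

def ols(X, y):
    """Return OLS coefficients for rows X (each starting with 1.0) and y."""
    k = len(X[0])
    # Build the augmented normal-equations matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yv for r, yv in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Toy rows: [intercept, presenter_female, participation_frequency]
X = [[1, 1, 5], [1, 0, 2], [1, 1, 8], [1, 0, 4], [1, 1, 3], [1, 0, 7]]
y = [8.2, 7.1, 8.9, 7.8, 7.6, 8.4]

b = ols(X, y)
pred = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
ybar = sum(y) / len(y)
r2 = 1 - sum((a - p) ** 2 for a, p in zip(y, pred)) / \
        sum((a - ybar) ** 2 for a in y)
print([round(v, 3) for v in b], round(r2, 3))
```

In practice one would add the age variables and one dummy per race/ethnicity group (with one group left out as the reference category), exactly as the table does.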

a good faith effort to check for concomitant differences in age, abilities, and experience of presenters and raters (Waldman & Avolio, 1991).

Second, in both using and researching peer ratings, it is important to be more sensitive to potential bias across groups. Much of the research on this subject, cited earlier, focused on the problem of within- or same-group preferences. The data from the present study do not show any systematic same-group preference, either between sexes or among race/ethnicity groups. However, we wonder about the across-group ratings, in particular the low ratings assigned to the Mexican American and African American presenters. Bias cannot be claimed, because of the absence of validity data and the small size of the African American sample. One alternative explanation is language ability differences, which may account for the low ratings of the Mexican American presenters, many of whom spoke English with an accent. But some Asian Americans had accents too, and their ratings did not suffer. The issue is worthy of additional research.
Our third suggestion is that professors consider the possibility of making post hoc adjustments of the rating results. In other words, when faced with differences in rating tendencies among different student groups, the professor should study the ratings of each group and make corrections to level the playing field. For example, the instructor
could convert such scores to ranks and
use the ranks for grading the ratees. This
practice is now followed in the sport of
figure-skating and has done much to
level the playing field by ironing out
differences in rating tendencies among
judges (International Skating Union,
1993).
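The rank-conversion adjustment can be sketched as follows, with hypothetical scores: each rater's raw scores are replaced by within-rater ranks, so a lenient rater and a severe rater contribute on the same scale, and tied scores receive the average of the rank positions they span.

```python
# Convert one rater's raw scores to ranks (1 = best), averaging over ties,
# so that differences in rater leniency cancel out.

def to_ranks(scores):
    """Rank scores from best (1) to worst, averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# A lenient and a severe rater scoring the same four presenters:
lenient = [9.5, 9.0, 9.8, 9.0]
severe = [6.0, 5.0, 7.5, 4.0]
print(to_ranks(lenient))  # [2.0, 3.5, 1.0, 3.5]
print(to_ranks(severe))   # [2.0, 3.0, 1.0, 4.0]
```

Despite a 3-point gap in raw leniency, both raters agree that presenter 3 was best and presenter 1 second, which is all the ranks retain.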

Conclusion
Students are being recruited to serve
as raters of performance of their peers
for a wide range of classroom assignments. Ultimately, this practice will
have to stand the test of validation studies that correlate peer ratings with some
objective indices of performance.
Though our study did not seek to validate peer ratings in that classical sense, it does provide insights into the role played by four student characteristics (sex, race/ethnicity, age, and frequency of class participation by presenters) in the rating process. Our study's data can be viewed from both statistical and substantive perspectives. Viewed statistically, because these factors explained only 3.4% of the variance in peer ratings, they could be dismissed as inconsequential to the peer rating process. But in the world of higher education, even .2 of a point can make the difference between a B and a C grade, with serious consequences for the student.
Thus there is an urgent need to subject


peer ratings to a validity test by correlating them with other criterion measures.
REFERENCES

Ammer, J. J. (1998). Peer evaluation model for enhancing writing performance of students with learning disabilities. Reading & Writing Quarterly, 14(3), 263-282.
Bass, A. R., & Turner, J. N. (1973). Ethnic group differences in relationships among criteria of job performance. Journal of Applied Psychology, 57, 101-109.
Bernardin, H. J., & Buckley, M. R. (1981). A consideration of strategies in rater training. Academy of Management Review, 6, 205-212.
Borman, W. C., White, L. A., & Dorsey, D. W. (1995). Effects of ratee task performance and interpersonal factors on supervisor and peer performance ratings. Journal of Applied Psychology, 80, 168-177.
California State University. (1999). CSU enrollment by campus and ethnic group [On-line]. Available: http://www.co.calstate.edu/asd/statreports/1999-2000/FETH99-02.HMTL.
Cascio, W. F. (1998). Applied psychology in human resource management. Upper Saddle River, NJ: Prentice Hall.
Crew, J. C. (1984). Age stereotypes as a function of race. Academy of Management Journal, 27(2), 431-435.
Die, A. H., Debbs, T., & Walker, J. L. (1991). Managerial evaluation by men and women managers. Journal of Social Psychology, 130(6), 763-769.
Haaga, D. A. (1993). Peer review of term papers in graduate psychology courses. Teaching of Psychology, 20(1), 28-32.
Harber, K. D. (1998). Feedback to minorities: Evidence of a positive bias. Journal of Personality and Social Psychology, 74(3), 622-628.
Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-supervisor ratings. Personnel Psychology, 41, 43-62.
Henderson, T., Rada, R., & Chen, C. (1997). Quality management of student-student evaluations. Journal of Computing Research, 17(3), 199-215.
International Skating Union. (1993). Judges handbook, single free skating. Lausanne, Switzerland: Author.
Johnson, C. B., & Smith, F. I. (1997). Assessment of a complex peer evaluation instrument for team learning and group processes. Accounting Education, 2(1), 21-32.
Jordan, J. L. (1989). Effects of sex on peer ratings of U.S. Army ROTC cadets. Psychological Reports, 64(3, Pt. 1), 939-944.
Khoon, K. A., Abd-Shukor, R., Deraman, M., & Othman, M. Y. (1994). Validity of peer review in selecting the best physics award. College Student Journal, 28(1), 119-121.
Koerner, B. (1999, February 8). Where the boys aren't. U.S. News & World Report, pp. 46-55.
Kraiger, K., & Ford, J. K. (1985). A meta-analysis of ratee race effects in performance ratings. Journal of Applied Psychology, 70, 56-65.
Latham, G. P., & Wexley, K. N. (1994). Increasing productivity through performance appraisal. Reading, MA: Addison-Wesley.
Nkomo, S. M., Fottler, M. D., & McAfee, R. B. (1996). Applications in human resource management. Cincinnati, OH: South-Western College Publishing.
Norton, S. M. (1992). Peer assessment of performance and ability. Journal of Business and Psychology, 6(3), 387-399.
Pulakos, E. D., Oppler, S. H., White, L. A., & Borman, W. C. (1989). Examination of race and sex effects on performance ratings. Journal of Applied Psychology, 74(5), 770-780.
Rosen, B., & Jerdee, T. H. (1979). The influence of employee age, sex, and job status on managerial recommendations for retirement. Academy of Management Journal, 22, 169-173.
Sackett, P. R., & DuBois, C. L. Z. (1991). Rater-ratee effects on performance evaluation. Journal of Applied Psychology, 76, 873-877.
Sherrard, W. R., Raafat, F., & Weaver, R. R. (1994). An empirical study of peer bias in evaluations: Students rating students. Journal of Education for Business, 70(1), 43-47.
Sommers, C. H. (2000, May). The war against boys. The Atlantic Monthly, 59-74.
Topping, K. (1998). Peer assessment between students in colleges and universities. Review of Educational Research, 68(3), 249-276.
Waldman, D. A., & Avolio, B. J. (1991). Race effects in performance evaluations: Controlling for ability, education, and experience. Journal of Applied Psychology, 76, 897-901.

ERRATUM
On page 136 of the January/February 2001 issue of the Journal of Education for Business, Table 5 of "Educating Future Accountants: Alternatives for Meeting the 150-Hour Requirement," by Celia Renner and Margaret Tanner, was incorrectly printed. The correct Table 5 is as follows:

TABLE 5. Tests of Independence Between Respondents' Characteristics and Educational Choices (Numbers Shown Are Chi-square Statistics)

                         Public accounting   Support 150-hour   Pay more to new hires    Involved    CPA license required   No. of
                         vs. private industry   requirement     with master's degrees    in hiring   for new hires          employees
MIS                      12.8***             0.0                0.0                      2.7*        11.5***                1.8
MBA                      10.9***             1.6                20.7***                  1.3         18.3***                1.4
Master's in accountancy  0.1                 5.0**              12.2***                  0.1         0.7                    1.8
Master's in tax          71.6***             10.4***            1.4                      1.6         22.0***                21.6***
Finance                  15.9***             0.0                1.4                      5.5**       25.3***                0.0
Management               27.8***             0.4                0.9                      6.3**       19.1***                0.5
Economics                8.0**               2.6*               0.4                      4.1**       11.7***                0.0
Marketing                1.2                 2.1                0.4                      0.1         4.5**                  0.2
Business communication   1.0                 0.5                0.0                      0.6         1.5                    0.2
Foreign language         2.1                 1.8                0.0                      0.4         2.1*                   0.1
Liberal arts             0.2                 0.3                0.3                      0.1         0.5                    0.1

***Probability of .001 or less. **Probability of .050 or less. *Probability of .100 or less.
