Journal of Education for Business
ISSN: 0883-2323 (Print) 1940-3356 (Online) Journal homepage: http://www.tandfonline.com/loi/vjeb20
Deciphering Student Evaluations of Teaching: A
Factor Analysis Approach
Michael M. Barth
To cite this article: Michael M. Barth (2008) Deciphering Student Evaluations of Teaching:
A Factor Analysis Approach, Journal of Education for Business, 84:1, 40-46, DOI: 10.3200/
JOEB.84.1.40-46
To link to this article: http://dx.doi.org/10.3200/JOEB.84.1.40-46
Published online: 07 Aug 2010.
Deciphering Student Evaluations of Teaching: A Factor Analysis Approach

MICHAEL M. BARTH
THE CITADEL
CHARLESTON, SOUTH CAROLINA
ABSTRACT. The author examined the student evaluation of teaching instrument used in the College of Business Administration at Georgia Southern University (GSU) to determine which traits have the greatest impact on students' overall rating of individual instructors. Using exploratory factor analysis, the author found that the overall instructor rating is primarily driven by the quality of instruction. Although the results of this research apply specifically to the survey instrument used at GSU, these techniques can be applied to evaluate the instructor rating instruments at other institutions.

Keyword: educational evaluation

Copyright © 2008 Heldref Publications
The purpose of this research was to determine which aspects of the student evaluation of teaching (SET) instrument used at Georgia Southern University (GSU) have the greatest impact on the overall instructor rating. As part of its instructor evaluation process, GSU uses a student survey to evaluate all instructors in every course offered each semester. The questions in the survey instrument ask about the workload, clarity of the materials, the instructor's delivery, prior interest by the student, and other general matters, to elicit more detail from the students about why they liked (or did not like) the course and the instructor. To preserve student anonymity, administrators conduct the surveys during normal class time while the instructor is out of the room, and the results are shared with the instructor only after the semester has ended. Students answer the 20 survey questions on a standard Likert-type scale ranging from 1 (very poor) to 5 (very good). There is additional space on the back of the survey instrument for students' written comments.
The questions in the survey form of GSU's SET are shown in Table 1, with median scores for each question. Many of the responses to the 20 questions in the survey instrument tend to be highly correlated with one another, making it harder for researchers to evaluate differences from instructor to instructor. Thus, administrators tend to focus on a single question (for example, Question 18, "Overall, how would you rate this instructor?") rather than using all of the information available in the survey responses.
The SETs are an important element of the annual evaluation of individual faculty members at GSU and other institutions. The SETs also feature prominently in the tenure and promotion process. Because administrators weight the SET heavily, rightly or wrongly, it is important for faculty and administrators to have a clear understanding of the determinants of the ratings, especially with regard to Question 18 of the survey, which elicits the student's overall rating of the instructor.
This research provides a more refined assessment of the GSU SET survey results by analyzing the underlying factors that drive the overall rating of the instructor. Providing faculty members with more information on the determinants of instructor ratings should help to eliminate some of the mistrust inherent in the current system. Although this research focuses on the instrument that administrators use at GSU, the techniques that I use to evaluate the GSU survey instrument are applicable to other institutions.
RELATED LITERATURE

Because of the relatively high weight that student ratings of instruction possess
TABLE 1. Student Evaluation of Teaching Questions and Median Scores

Q     Survey question                                                                  Mdn score
Q1    How much effort did you put into learning the material covered in this course?     3.8
Q2    How much did you learn in this course?                                             3.6
Q3    To what degree were you intellectually challenged in this course?                  4.0
Q4    How often did you seek outside help with this course?                              3.3
Q5    How difficult was this course?                                                     3.9
Q6    How was the workload for this course?                                              3.4
Q7    Overall, how would you rate this course?                                           3.6
Q8    The degree to which important points were stressed in this course.                 3.9
Q9    The instructor's preparation for this course.                                      4.2
Q10   The instructor's encouragement of class participation, discussion, or questions.   3.9
Q11   The organization of the course material.                                           4.0
Q12   The clarity of the presentation of the course material.                            3.7
Q13   The degree to which tests and other graded activities reflected course content.    3.9
Q14   The instructor's availability to students.                                         4.0
Q15   The instructor's helpfulness to students.                                          3.9
Q16   The degree to which the class stayed focused on course objectives.                 4.0
Q17   The instructor's interest in the content of this course.                           4.3
Q18   Overall, how would you rate this instructor?                                       4.0
Q19   Your level of interest in this subject matter before taking this course?           2.5
Q20   Your level of interest in this subject matter after taking this course?            2.9
      What grade do you expect to receive in this class (A, B, C, D, or F)?

Note. Q = question. Survey responses are based on a Likert-type scale ranging from 1 (very poor) to 5 (very good).
in the U.S. collegiate system, the academic research in this area is vast, accumulating for more than seven decades. There are a number of studies in print, and more emerge every day. The present literature review is purposely brief, but I direct readers to the summaries of the existing literature that both Germain and Scandura (2005) and Crumbley and Fliedner (2002) have provided.
Griffin (2003) reported that the bulk of the academic research showed that faculty support the use of SET surveys as a tool for improving teaching, but that faculty are also uncomfortable with the use of the SET surveys as part of their annual evaluation process. Also, Yunker and Yunker (2003) reported that the majority of faculty support the validity of SETs, although they acknowledged that there is a great deal of faculty discomfort with the SET practices at many universities. Germain and Scandura (2005) provided additional discussion of (a) the pros and cons of student evaluations and (b) faculty concerns about the validity of SETs. Moore (2006) reported that SETs are in general effective measures of performance. At least some of the faculty discomfort with SETs must stem from a lack of trust in the survey instrument's ability to discern quality teaching from popularity. Also, because of the high degree of correlation among the answers to the individual questions, it becomes difficult for researchers and educators to decipher the underlying meaning of the student responses.
Some universities provide benchmark scores, whereas others do not, and the lack of a benchmark may increase the level of discomfort. Those instructors with higher ratings tend to favor the current student evaluation system, whereas those with lower ratings tend to be dismissive of it (Crumbley & Fliedner, 2002). Some faculty members feel that higher ratings are associated with grade inflation and lower academic standards (Germain & Scandura, 2005; Griffin, 2004). Also, some faculty members have criticized the student ratings because they assume that the more quantitative business courses, such as statistics and finance, generate lower evaluation scores that reflect the topic and not the instructor (Stapleton & Murkison, 2001). If these complaints are valid, then the use of the student surveys may have an unintended negative impact on merit pay and promotion for the faculty members who teach in those areas.
One line of research has focused on the relations between faculty demographics and SETs. Researchers have found links between nonteaching factors, such as physical appearance, and SET scores. Rinolo, Johnson, Sherman, and Misso (2006) provided a historical summary of the research along this axis and reported that studies have shown that physically attractive instructors (regardless of whether they are male or female) receive higher ratings than their less attractive colleagues. Felton, Mitchell, and Stinson (2004) reported that instructor ratings online at ratemyprofessors.com highly correlated with "hotness," although Felton, Mitchell, and Stinson did not strictly define the term (p. 3).
Researchers have often cited lax grading standards as an inflator of student evaluations, although the evidence to date has been inconclusive about whether lax grading standards increase evaluation scores. Two recent examples include McPherson (2006), who reported that grade inflation increases SETs, and Moore (2006), who found that SETs were not easily manipulated through grade
inflation. The perceived link between higher grades and higher student evaluations has raised concerns among many faculty, and there is some evidence that expected grades or actual grades positively correlate with student evaluations. However, in many of these studies the researchers made no effort to determine whether the students had earned the higher grades. Students should expect to earn higher grades from excellent teachers than from mediocre teachers. Higher grades are not the same as higher unearned grades, although much of the discussion about grade inflation seems to suggest that the two are synonymous.
Marsh and Roche (2000) found a stronger link between SETs and instructor performance than between SETs and grades. Prior research has also shown a correlation between faculty evaluations of faculty (e.g., peer reviews) and student evaluations of faculty teaching, although there are some lingering questions about bias in those results (Hobson & Talbot, 2001). Yunker and Yunker (2003) crafted a study to evaluate subsequent performance in intermediate accounting to determine whether the highly rated instructors of a basic accounting course had better performing students in the follow-on courses. Stapleton and Murkison (2001) also tried to measure performance against student evaluations by using a single SET measure of overall instructor quality rather than the full set of survey questions. Isely and Singh (2005) used a fixed effects model to control for instructor- and course-specific differences in a sample of 260 economics and finance classes. Isely and Singh found a positive relation between their dependent variable, the average values for 25 different instructor evaluation questions, and relative expected grades. One of the limitations of their study is that, although they had a detailed survey instrument with multiple questions on various aspects of teaching quality, they did not use the full value of the information in the SETs to measure effectiveness. Multipart SET survey instruments measure a variety of instructor and course traits; thus, simplistic measures such as the overall instructor rating can obfuscate more than illuminate in this type of empirical study.
Germain and Scandura (2005) reviewed a number of studies of bias in student ratings and suggested that more careful construction of the student rating of instruction survey instrument may alleviate some of the unintended biases built into simplistic student evaluations. Consequently, many universities (including GSU) have sought to improve the evaluation process by moving to more complex rating instruments. However, asking more questions does not necessarily make the instrument more useful. If researchers and educators cannot fully interpret the results, the additional questions add more confusion than clarity.
Toland and De Ayala (2005) discussed the various research efforts that have produced multidimensional SET instruments over the years. They cited Marsh's (1987) 35-question survey instrument that addresses nine latent factors dealing with instructional quality. Those nine factors are learning/value, instructor enthusiasm, organization/clarity, group interaction, individual rapport, breadth of coverage, examination/grading, assignments/readings, and workload/difficulty. Toland and De Ayala then proposed a 27-question instrument that they designed around three latent factors: instructor course delivery, instructor/student interaction, and regulating student learning. Toland and De Ayala used factor analysis in the design and analysis of their instrument, but their relatively small sample size constrained their conclusions.
METHOD
Although a multidimensional SET survey instrument provides more information by which to evaluate teaching, it also makes the evaluation more difficult unless the results are broken down so that the scores on the various dimensions can be evaluated independently. Factor analysis is a statistical technique that can take highly correlated data, such as the responses to SET survey questions, and reconfigure them so that they provide an objective measure of the underlying traits that are most valued by students. When faculty members better understand the survey responses, they should be able to develop greater acceptance of the results of the SET. If instructors continue to dismiss the SET as simply a reflection of grade inflation or a popularity contest, then those instructors will never receive the feedback from the SET, including information about the traits that students truly value in a teacher.
In the present research, I used factor analysis to decompose the SET responses from the survey instrument used at GSU's College of Business Administration to provide a better understanding of the traits that students rate highest in their overall evaluation of an instructor. There were two steps in the research design: (a) develop a set of measures of the traits that are latent in the university's SET instrument and (b) use multiple regression analysis to evaluate how those latent traits affect the value of Question 18, the overall instructor rating.
I gathered the average responses by class for each of the 20 SET questions for four courses (see Table 1): principles of corporate finance, operations management, business statistics, and quantitative methods. One of the reasons for using these particular four courses was a restriction on data availability, but another reason was comparability. These courses are prerequisites for the capstone business course, and all are somewhat quantitative in design. Also, unlike some of the other courses in the common business core, there are relatively few nonbusiness majors taking these courses.
The present data represent 33 different instructors over a 3-year period, ranging from adjunct instructors to tenured professors. For the most part, instructors taught in only one of the four functional areas (finance, operations management, business statistics, and quantitative methods), although a handful of instructors taught in more than one area. In addition to the class average scores from the SET surveys, I computed (a) the expected grade point average (GPA) for each section from the survey instrument and (b) the actual GPA for the class from the university's grade-history database. I omitted Question 18 because it was the dependent variable in stage two of the research, leaving 21 original variables to include in the factor analysis.
To be included in the analysis, the average class score for the questions in the SET had to be based on at least 15
student responses. The lower limit on the number of responses was a trade-off between degrees of freedom and stability of the average response values for each of the 20 questions because, for example, the average class response in a class that had only 3 students would not be comparable with the average class response in a class of 30 students. A total of 167 usable observations (classes) were used in the factor analysis. These 167 classes represented the collective evaluations of more than 30 different instructors, based on the surveys from more than 4,000 students. Because these classes are part of the common business core, many, if not most, of the surveys represent the same set of students who were evaluating the various instructors.
Factor analysis is a statistical method that researchers can use to reduce the dimensions of a set of highly correlated variables into a smaller subset of factors that are themselves linear composites of the original variables. Simply put, it is a data reduction technique. The factors that the analysis generates are orthogonal to one another, but they still contain most of the information from the original variable set. Because the practical use of factor analysis is to reduce a large number of correlated variables into a smaller subset of uncorrelated variables, the number of factors that the factor analysis process retains is almost always fewer than the number of original variables. The number of factors that are retained depends on the dimensionality of the original data and the ability of the analyst to interpret the resulting factors. Ideally, following an orthogonal rotation procedure, each of the resulting factors includes data from several of the original variables, and each of the original variables will be included in only one of the resulting factors. In practice, a variable may be associated with more than one factor, which can make the factors more difficult to interpret. Because the factors are orthogonal to one another, they can be used as the independent variables in a regression analysis without violating the assumption of no multicollinearity.
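The data-reduction idea can be illustrated with a small numerical sketch (using principal-component extraction via the correlation matrix, not the paper's SAS procedure; the simulated items and traits are invented for illustration): six correlated survey items driven by two latent traits are reduced to two orthogonal factor scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 "class averages" on 6 correlated survey items driven by
# two hypothetical latent traits (stand-ins for quality and rigor).
n = 200
quality = rng.normal(size=n)
rigor = rng.normal(size=n)
X = np.column_stack([
    quality + 0.3 * rng.normal(size=n),  # item 1: loads on quality
    quality + 0.3 * rng.normal(size=n),  # item 2: loads on quality
    quality + 0.3 * rng.normal(size=n),  # item 3: loads on quality
    rigor + 0.3 * rng.normal(size=n),    # item 4: loads on rigor
    rigor + 0.3 * rng.normal(size=n),    # item 5: loads on rigor
    rigor + 0.3 * rng.normal(size=n),    # item 6: loads on rigor
])

# Principal-component extraction: eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain two factors; the scores are linear composites of the items.
scores = Z @ eigvecs[:, :2]

# The retained factors are orthogonal: their correlation is ~0, yet the
# two factors capture most of the variance of the six original items.
cross_corr = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
print(abs(cross_corr) < 1e-8)  # True: the factor scores are uncorrelated
```

The key property the text relies on appears in the last lines: six correlated variables collapse into two uncorrelated composites that preserve most of the original information.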
I conducted this analysis by using SAS, a comprehensive statistical analysis system that includes specific subroutines for hundreds of statistical applications. The SAS OnlineDoc 9.13 (SAS Institute, 2004), the online user guide for the SAS/STAT software product, provides (a) an overview of the factor analysis concept and (b) detailed and specific instructions on alternative methods for conducting the analysis. The subroutine for conducting a factor analysis is known as PROC FACTOR and is described in SAS OnlineDoc 9.13 (SAS Institute).
RESULTS
Factor Analysis Results
One of the decisions that an analyst must make as part of the factor analysis process is the number of factors to retain. Two general rules are commonly applied: the minimum-eigenvalue-of-one rule and the scree diagram. The eigenvalue rule requires that each retained factor explain at least as much variance as any single one of the original standardized variables; therefore, only factors that generate an eigenvalue of 1.0 or higher are retained. The second method is an evaluation of a scree diagram, which is a line plot of the eigenvalues. Using this approach, the analyst looks for the point at which the eigenvalues level off, meaning that additional factors explain less and less of the total variability. A third criterion that researchers use in practice is that each of the resulting factors must be interpretable by the analyst. In the present analysis, both the minimum-eigenvalue-of-one rule and the scree diagram indicated that either five or six factors would be appropriate. After evaluating the results for both the five-factor model and the six-factor model, I determined the five-factor set to be the most interpretable.
Table 2 shows each of the five factors and its correlation with the original set of variables. For the factor analysis to be interpretable, each of the original variables should highly correlate with only one or two of the resulting factors. By examining which of the original variables load on which factor, researchers can identify the latent traits that each factor represents. Theoretically, the factor analysis can generate as many factors as there are variables in the original data set, but the goal is to generate a reduced data set that eliminates the multicollinearity problems associated with the original data.
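A minimal sketch of this loading-based interpretation step, using a toy rotated loading matrix with illustrative values (not those of Table 2):

```python
import numpy as np

# Toy rotated factor pattern: rows are original variables, columns are
# factors; the values are invented for illustration.
loadings = np.array([
    [0.92,  0.05],   # variable dominated by factor 1
    [0.88, -0.10],   # variable dominated by factor 1
    [0.12,  0.81],   # variable dominated by factor 2
])

# Flag a variable as "loading on" a factor when the absolute loading
# exceeds the cutoff used in the study's tables.
CUTOFF = 0.4054
assignment = [
    [j for j in range(loadings.shape[1]) if abs(loadings[i, j]) > CUTOFF]
    for i in range(loadings.shape[0])
]
print(assignment)  # [[0], [0], [1]]: each variable maps to one factor
```

In the ideal case each inner list has exactly one entry; a variable listed under two factors is the cross-loading situation the text warns about.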
Factor 1 had a high level of positive correlation with the questions concerning instructor preparation, clarity of presentation, relevance of the material to learning, and the instructor's focus on course objectives. I labeled this factor as quality of instruction; it generally measures (a) the degree to which the students felt that the instructor was prepared and (b) their perceptions of the overall quality of the presentation. If the instructor is well prepared, and if the course content and objectives are clear to the students, researchers would expect them to feel that the course is organized, well structured, and valuable.
Factor 2, which I labeled as course rigor, was highly correlated with those survey questions that measure the rigor of the coursework. The effort of the student, intellectual challenge, course difficulty, and workload on the student all loaded positively on this factor.
Factor3measuredthestudent’slevel
of interest in the subject matter before
and after taking the course (Questions
19and20).NotethatFactor1wasalso
highly loaded on Question 20, which
asked about the after-course interest in
the subject. Therefore, Factor 3 probablymeasuredtheoverallinterestofthe
studentandthestudent’sattitudetoward
the subject matter, whereas Factor 1
pickeduptheincreaseinstudentinterestaftertakingtheclass.
Factor 4, which I labeled as grades, loaded highly on both the average expected GPA of the students in the class and their actual average GPA. It is interesting that, although these two GPA measures highly correlated with one another, the expected GPAs that students reported in the survey were virtually always higher than their actual class average GPAs. This situation also could suggest that when researchers or educators omit the weaker students (those failing the class, who have already given up and are not attending) from the survey, the resulting evaluations bias upward. The practice in the College of Business Administration at GSU is that the individual class instructor chooses the actual day within a 4-week period on which to administer
TABLE 2. Rotated Factor Pattern and Factor Loadings

                Factor 1      Factor 2   Factor 3   Factor 4   Factor 5
Original        Quality of    Course     Level of              Instructor
variable        Instruction   Rigor      Interest   Grades     Helpfulness
Q1                 .00          .93        .08       –.12        –.13
Q2                 .78          .36        .39        .10        –.03
Q3                 .25          .86        .06       –.26        –.07
Q4                –.03          .76       –.06       –.12         .06
Q5                –.21          .85       –.08       –.38        –.02
Q6                –.13          .66       –.16        .05        –.06
Q7                 .86          .01        .37        .21         .03
Q8                 .94         –.01        .14        .14         .06
Q9                 .94         –.03        .06        .03        –.01
Q10                .74         –.19        .24        .04         .40
Q11                .96         –.03        .03        .05        –.03
Q12                .95         –.10        .10        .06         .01
Q13                .88         –.02        .08        .32         .08
Q14                .73         –.17        .00        .20         .53
Q15                .78         –.22        .06        .18         .52
Q16                .93          .02        .00        .08         .01
Q17                .76         –.11        .17        .02         .30
Q19                .06         –.08        .83       –.01        –.01
Q20                .56         –.08        .80        .10         .11
Actual GPA         .10         –.22       –.01        .67         .06
Expected GPA       .31         –.33        .10        .69         .02

Note. Factor loadings greater than .4054 appeared in bold in the original table. Q = survey question; GPA = grade point average.
the survey to their classes. Anecdotal evidence suggests that some instructors have selectively chosen the date on which they administer the survey so as to manipulate the results. For example, some educators have said that a good day on which to administer the SET is a Friday after an exam because lower attendance is likely, especially by the weaker students. Both (a) whether this proposed interpretation is actually true and (b) what its effect on instructor ratings would be remain areas for future study.
Factor 5 was highly correlated with Questions 14 and 15, concerning the instructor's availability and willingness to provide outside help to the students. These questions also highly correlated with Factor 1, which measured the overall course quality. Therefore, Factor 5 was slightly more ambiguous. There was also a relatively high loading with Question 10, which measured the degree to which the instructor encourages class participation and questions. Factor 5 seemed to measure some aspect of the instructor's personality, approachability, or openness with the students, which I labeled as instructor helpfulness.
Multiple Regression Analysis
I then used the five factor scores for each course as explanatory variables in a multiple regression model to evaluate the impact of these five latent characteristics on Question 18 of the survey instrument, the overall rating of the instructor. The SAS factor analysis procedure outputs scores that are standardized (i.e., M = 0, SD = 1) so that the five factors are all on the same unit scale. The magnitude of the coefficient estimates from the regression analysis is therefore relevant, and comparisons between the coefficient values can be made.
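This second stage can be sketched numerically with synthetic data (not the study's; the coefficients below are chosen only to echo the reported pattern): regress a simulated overall rating on five standardized, orthogonal factor scores, then compare coefficient magnitudes directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: 167 classes, five standardized (M = 0, SD = 1)
# orthogonal factor scores, and an overall rating built mostly from
# the first ("quality") factor. All numbers here are illustrative.
n = 167
F = rng.normal(size=(n, 5))
beta_true = np.array([0.49, -0.08, 0.06, 0.09, 0.14])
y = 3.97 + F @ beta_true + 0.07 * rng.normal(size=n)

# Ordinary least squares via lstsq; an intercept column is prepended.
X = np.column_stack([np.ones(n), F])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, betas = coef[0], coef[1:]

# Because all regressors share one unit scale, the largest |coefficient|
# identifies the most influential factor.
dominant = int(np.argmax(np.abs(betas)))
print(dominant)  # 0: the quality factor dominates
```

Because the factor scores are orthogonal by construction, each coefficient here is estimated without the multicollinearity that plagues regressions on the raw, highly correlated survey items.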
The regression results, which I report in Table 3, show that each of these five factors has a strong, statistically significant relation with the overall instructor rating. For the model, R² is equal to .95, indicating that the five factors explained nearly all of the variation in the overall instructor rating (Question 18). Because
TABLE 3. Multiple Regression Results

                                        Parameter
Variable    Label                       estimate     SE         t
Intercept                                 3.965     0.0094    423.63
Factor 1    Quality of Instruction        0.489     0.0094     51.53
Factor 2    Course Rigor                 –0.075     0.0097     –7.72
Factor 3    Level of Interest             0.057     0.0096      5.96
Factor 4    Grades                        0.094     0.0107      8.76
Factor 5    Instructor Helpfulness        0.135     0.0100     13.03

Note. The dependent variable was responses to Question 18 (overall instructor rating).
ISSN: 0883-2323 (Print) 1940-3356 (Online) Journal homepage: http://www.tandfonline.com/loi/vjeb20
Deciphering Student Evaluations of Teaching: A
Factor Analysis Approach
Michael M. Barth
To cite this article: Michael M. Barth (2008) Deciphering Student Evaluations of Teaching:
A Factor Analysis Approach, Journal of Education for Business, 84:1, 40-46, DOI: 10.3200/
JOEB.84.1.40-46
To link to this article: http://dx.doi.org/10.3200/JOEB.84.1.40-46
Published online: 07 Aug 2010.
Submit your article to this journal
Article views: 72
View related articles
Citing articles: 12 View citing articles
Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=vjeb20
Download by: [Universitas Maritim Raja Ali Haji]
Date: 11 January 2016, At: 22:43
Downloaded by [Universitas Maritim Raja Ali Haji] at 22:43 11 January 2016
DecipheringStudentEvaluationsof
Teaching:AFactorAnalysisApproach
MICHAELM.BARTH
THECITADEL
CHARLESTON,SOUTHCAROLINA
ABSTRACT.Theauthorexaminedthe
studentevaluationofteachinginstrument
usedintheCollegeofBusinessAdministrationatGeorgiaSouthernUniversity(GSU)
todeterminewhichtraitshavethegreatest
impactonstudents’overallratingofallindividualinstructors.Usingexploratoryfactor
analysis,theauthorfoundthattheoverall
instructorratingisprimarilydrivenbythe
qualityofinstruction.Althoughtheresultsof
thisresearchapplyspecificallytothesurvey
instrumentusedatGSU,thesetechniques
canbeappliedtoevaluatetheinstructorratinginstrumentsatotherinstitutions.
Keyword:educationalevaluation
Copyright©2008HeldrefPublications
40
JournalofEducationforBusiness
T
hepurposeofthisresearchwasto
determinewhichaspectsofthestudentevaluationofteaching(SET)instrumentusedatGeorgiaSouthernUniversity
(GSU) have the greatest impact on the
overall instructor rating. For part of its
instructorevaluationprocess,GSUusesa
studentsurveytoevaluateallinstructors
in every course offered each semester.
The questions in the survey instrument
ask about the workload, clarity of the
materials,theinstructor’sdelivery,prior
interestbythestudent,andothergeneral
questions to elicit more detail from the
students about why they liked (or did
notlike)thecourseandtheinstructor.To
preservestudentanonymity,administrators conduct the surveys during normal
classtimewhiletheinstructorisoutof
theroom,andtheresultsaresharedwith
theinstructoronlyafterthesemesterhas
ended. Students answer the 20 survey
questionsonastandardLikert-typescale
ranging from 1 (very poor) to 5 (very
good).There is additional space on the
back of the survey instrument for students’writtencomments.
The questions in the survey form of
GSU’s SET are shown in Table 1, with
median scores for each question. Many
oftheresponsestothe20questionsinthe
survey instrument tend to be highly correlatedwithoneanother,makingitharder
for researchers to evaluate differences
frominstructortoinstructor.Thus,administratorstendtofocusonasinglequestion
—for example, Question 18, “Overall,
how would you rate this instructor?”
—ratherthanusingalloftheinformation
availableinthesurveyresponses.
The SETs are an important element
of the annual evaluation of individual
facultymembersatGSUandotherinstitutions. The SETs also feature prominently in the tenure and promotion
process.Becauseadministratorsweight
the SET heavily, rightly or wrongly, it
isimportantforfacultyandadministratorstohaveaclearunderstandingofthe
determinants of the ratings, especially
with regard to Question 18 of the survey, which elicits the student’s overall
ratingoftheinstructor.
Thisresearchprovidesamorerefined
assessment of the GSU SET survey
resultsbyanalyzingtheunderlyingfactors that drive the overall rating of the
instructor. Providing faculty members
with more information on the determinants of instructor ratings should help
toeliminatesomeofthemistrustinherentinthecurrentsystem.Althoughthis
researchfocusesontheinstrumentthat
administrators use at GSU, the techniques that I use to evaluate the GSU
survey instrument are applicable to
otherinstitutions.
RELATEDLITERATURE
Becauseoftherelativelyhighweight
thatstudentratingsofinstructionpossess
TABLE1.StudentEvaluationofTeachingQuestionsandMedianScores
Downloaded by [Universitas Maritim Raja Ali Haji] at 22:43 11 January 2016
Surveyquestion
Q1
Q2
Q3
Q4
Q5
Q6
Q7
Q8
Q9
Q10
Q11
Q12
Q13
Q14
Q15
Q16
Q17
Q18
Q19
Q20
Mdn
score
Howmucheffortdidyouputintolearningthematerialcoveredinthiscourse?
Howmuchdidyoulearninthiscourse?
Towhatdegreewereyouintellectuallychallengedinthiscourse?
Howoftendidyouseekoutsidehelpwiththiscourse?
Howdifficultwasthiscourse?
Howwastheworkloadforthiscourse?
Overall,howwouldyouratethiscourse?
Thedegreetowhichimportantpointswerestressedinthiscourse.
Theinstructor’spreparationforthiscourse.
Theinstructor’sencouragementofclassparticipation,discussion,orquestions.
Theorganizationofthecoursematerial.
Theclarityofthepresentationofthecoursematerial.
Thedegreetowhichtestsandothergradedactivitiesreflectedcoursecontent.
Theinstructor’savailabilitytostudents.
Theinstructor’shelpfulnesstostudents.
Thedegreetowhichtheclassstayedfocusedoncourseobjectives.
Theinstructor’sinterestinthecontentofthiscourse.
Overall,howwouldyouratethisinstructor?
Yourlevelofinterestinthissubjectmatterbeforetakingthiscourse?
Yourlevelofinterestinthissubjectmatteraftertakingthiscourse?
Whatgradedoyouexpecttoreceiveinthisclass(A,B,C,D,orF)?
3.8
3.6
4.0
3.3
3.9
3.4
3.6
3.9
4.2
3.9
4.0
3.7
3.9
4.0
3.9
4.0
4.3
4.0
2.5
2.9
Note.Q=question.SurveyresponsesarebasedonaLikert-typescalerangingfrom1(verypoor)to5(verygood).
in the U.S. collegiate system, the academicresearchinthisareaisvast,accumulating for more than seven decades.
Thereareanumberofstudiesinprint,
and more emerge every day. The present literature review is purposely brief
and concise, but I direct readers to the
summaries of the existing literature
thatbothGermainandScandura(2005)
and Crumbley and Fliedner (2002)
haveprovided.
Griffin(2003)reportedthatthebulk
of the academic research showed that
faculty support the use of SET surveysasatoolforimprovingteaching,
butthatfacultyarealsouncomfortable
with the use of the SET surveys as
part of their annual evaluation process.Also,Yunker andYunker (2003)
reported that the majority of faculty
supportthevalidityofSETs,although
theyacknowledgedthatthereisagreat
deal of faculty discomfort with the
SET practices at many universities.
Germain and Scandura (2005) providedadditionaldiscussionof(a)thepros
and cons of student evaluations and
(b)facultyconcernsaboutthevalidity
of SETs. Moore (2006) reported that
SETs are in general effective measures of performance. At least some
of the faculty discomfort with SETs
must stem from a lack of trust in the
survey instrument’s ability to discern
qualityteachingfrompopularity.Also,
becauseofthehighdegreeofcorrelationamongtheanswerstotheindividual questions, it becomes difficult for
researchers and educators to decipher
theunderlyingmeaningofthestudent
responses.
Some universities provide benchmark scores, whereas others do not, and the lack of a benchmark may increase the level of discomfort. Those instructors with higher ratings tend to favor the current student evaluation system, whereas those with lower ratings tend to be dismissive of it (Crumbley & Fliedner, 2002). Some faculty members feel that higher ratings are associated with grade inflation and lower academic standards (Germain & Scandura, 2005; Griffin, 2004). Also, some faculty members have criticized the student ratings because they assume that the more quantitative business courses, such as statistics and finance, generate lower evaluation scores that reflect the topic and not the instructor (Stapleton & Murkison, 2001). If these complaints are valid, then the use of the student surveys may have an unintended negative impact on merit pay and promotion for the faculty members who teach in those areas.
One line of research has focused on the relations between faculty demographics and SETs. Researchers have found links between nonteaching factors, such as physical appearance, and SET scores. Rinolo, Johnson, Sherman, and Misso (2006) provided a historical summary of the research along this axis and reported that studies have shown that physically attractive instructors (regardless of whether they are male or female) receive higher ratings than their less attractive colleagues. Felton, Mitchell, and Stinson (2004) reported that instructor ratings online at ratemyprofessors.com highly correlated with "hotness," although Felton, Mitchell, and Stinson did not strictly define the term (p. 3).
Researchers have often cited lax grading standards as an inflator of student evaluations, although the evidence to date has been inconclusive about whether lax grading standards increase evaluation scores. Two recent examples include McPherson (2006), who reported that grade inflation increases SETs, and Moore (2006), who found that SETs were not easily manipulated through grade inflation. The perceived link between higher grades and higher student evaluations has raised concerns among many faculty, and there is some evidence that expected grades or actual grades positively correlate with student evaluations. However, in many of these studies the researchers made no effort to determine whether the students had earned the higher grades. Students should expect to earn higher grades from excellent teachers than from mediocre teachers. Higher grades are not the same as higher unearned grades, although much of the discussion about grade inflation seems to suggest that the two are synonymous.
Marsh and Roche (2000) found a stronger link between SETs and instructor performance than between SETs and grades. Prior research has also shown a correlation between faculty evaluations of faculty (e.g., peer reviews) and student evaluations of faculty teaching, although there are some lingering questions about bias in those results (Hobson & Talbot, 2001). Yunker and Yunker (2003) crafted a study to evaluate subsequent performance in intermediate accounting to determine whether the highly rated instructors of a basic accounting course had better-performing students in the follow-on courses. Stapleton and Murkison (2001) also tried to measure performance against student evaluations by using a single SET measure of overall instructor quality rather than the full set of survey questions. Isely and Singh (2005) used a fixed-effects model to control for instructor- and course-specific differences in a sample of 260 economics and finance classes. Isely and Singh found a positive relation between their dependent variable, the average values for 25 different instructor evaluation questions, and relative expected grades. One of the limitations of their study is that, although they had a detailed survey instrument with multiple questions on various aspects of teaching quality, they did not use the full value of the information in the SETs to measure effectiveness. Multipart SET survey instruments measure a variety of instructor and course traits; thus, simplistic measures such as the overall instructor rating can obfuscate more than illuminate in this type of empirical study.
Germain and Scandura (2005) reviewed a number of studies of bias in student ratings and suggested that more careful construction of the student rating of instruction survey instrument may alleviate some of the unintended biases built into simplistic student evaluations. Consequently, many universities (including GSU) have sought to improve the evaluation process by moving to more complex rating instruments. However, asking more questions does not necessarily make the instrument more useful. If researchers and educators cannot fully interpret the results, the additional questions add more confusion than clarity.
Toland and De Ayala (2005) discussed the various research efforts that have produced multidimensional SET instruments over the years. They cited Marsh's (1987) 35-question survey instrument that addresses nine latent factors dealing with instructional quality. Those nine factors are learning/value, instructor enthusiasm, organization/clarity, group interaction, individual rapport, breadth of coverage, examination/grading, assignments/readings, and workload/difficulty. Toland and De Ayala then proposed a 27-question instrument that they designed around three latent factors: instructor course delivery, instructor/student interaction, and regulating student learning. Toland and De Ayala used factor analysis in the design and analysis of their instrument, but their relatively small sample size limited their conclusions.
METHOD
Although a multidimensional SET survey instrument provides more information by which to evaluate teaching, it also makes the evaluation more difficult unless the results are broken down so that the scores on the various dimensions can be evaluated independently. Factor analysis is a statistical technique that can take highly correlated data, such as the responses to SET survey questions, and reconfigure them so that they provide an objective measure of the underlying traits that are most valued by students. When faculty members better understand the survey responses, they should be able to develop greater acceptance of the results of the SET. If instructors continue to dismiss the SET as simply a reflection of grade inflation or a popularity contest, then those instructors will never receive the feedback from the SET, including information about the traits that students truly value in a teacher.
In the present research, I used factor analysis to decompose the SET responses from the survey instrument used at GSU's College of Business Administration to provide a better understanding of the traits that students rate highest in their overall evaluation of an instructor. There were two steps in the research design: (a) develop a set of measures of the traits that are latent in the university's SET instrument and (b) use multiple regression analysis to evaluate how those latent traits affect the value of Question 18, the overall instructor rating.
I gathered the average responses by class for each of the 20 SET questions for four courses (see Table 1): principles of corporate finance, operations management, business statistics, and quantitative methods. One of the reasons for using these particular four courses was a restriction on data availability, but another reason was comparability. These courses are prerequisites for the capstone business course, and all are somewhat quantitative in design. Also, unlike some of the other courses in the common business core, there are relatively few nonbusiness majors taking these courses.
The present data represent 33 different instructors over a 3-year period, ranging from adjunct instructors to tenured professors. For the most part, instructors taught in only one of the four functional areas (finance, operations management, business statistics, and quantitative methods), although a handful of instructors taught in more than one area. In addition to the class average scores from the SET surveys, I computed (a) the expected grade point average (GPA) for each section from the survey instrument and (b) the actual GPA for the class from the university's grade-history database. I omitted Question 18 because it was the dependent variable in stage two of the research, so there were 21 original variables to include in the factor analysis.
To be included in the analysis, the average class score for the questions in the SET had to be based on at least 15 student responses. The lower limit on the number of responses was a trade-off between degrees of freedom and stability of the average response values for each of the 20 questions because, for example, the average class response in a class that had only 3 students would not be comparable with the average class response in a class of 30 students. A total of 167 usable observations (classes) were used in the factor analysis. These 167 classes represented the collective evaluations of more than 30 different instructors, based on the surveys from more than 4,000 students. Because these classes are part of the common business core, many, if not most, of the surveys represent the same set of students who were evaluating the various instructors.
Factor analysis is a statistical method that researchers can use to reduce the dimensions of a set of highly correlated variables into a smaller subset of factors that are themselves linear composites of the original variables. Simply put, it is a data reduction technique. The factors that the analysis generates are orthogonal to one another, but they still contain most of the information from the original variable set. Because the practical use of factor analysis is to reduce a large number of correlated variables into a smaller subset of uncorrelated variables, the number of factors that the factor analysis process retains is almost always fewer than the number of original variables. The number of factors that are retained depends on the dimensionality of the original data and the ability of the analyst to interpret the resulting factors. Ideally, following an orthogonal rotation procedure, each of the resulting factors includes data from several of the original variables, and each of the original variables will be included in only one of the resulting factors. In practice, a variable may be associated with more than one factor, which can make the factors more difficult to interpret. The factors are orthogonal to one another and can therefore be used as the independent variables in a regression analysis without violating the multicollinearity assumption.
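The data-reduction idea can be sketched with a small simulation (hypothetical data, not the GSU survey): six correlated items driven by two latent traits are reduced, via an eigendecomposition of their correlation matrix, to two orthogonal factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 167 class-average responses to 6 correlated survey items:
# two latent traits (say, "quality" and "rigor") each drive three items.
n = 167
quality = rng.normal(size=n)
rigor = rng.normal(size=n)
items = np.column_stack([
    quality + 0.3 * rng.normal(size=n),   # items 1-3 load on quality
    quality + 0.3 * rng.normal(size=n),
    quality + 0.3 * rng.normal(size=n),
    rigor + 0.3 * rng.normal(size=n),     # items 4-6 load on rigor
    rigor + 0.3 * rng.normal(size=n),
    rigor + 0.3 * rng.normal(size=n),
])

# Principal-factor extraction: eigendecomposition of the correlation matrix.
R = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Minimum-eigenvalue-of-one rule: keep factors explaining at least as much
# variance as a single standardized original variable.
k = int(np.sum(eigvals >= 1.0))
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # item-factor correlations

print("retained factors:", k)
```

The eigenvectors are mutually orthogonal, which is what allows the resulting factor scores to enter a later regression without multicollinearity.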
I conducted this analysis by using SAS, a comprehensive statistical analysis system that includes specific subroutines for hundreds of statistical applications. The SAS OnlineDoc 9.13 (SAS Institute, 2004), the online user guide for the SAS/STAT software product, provides (a) an overview of the factor analysis concept and (b) detailed and specific instructions on alternative methods for conducting the analysis. The subroutine for conducting a factor analysis is known as PROC FACTOR and is described in SAS OnlineDoc 9.13 (SAS Institute).
RESULTS

Factor Analysis Results
One of the decisions that an analyst must make as part of the factor analysis process is the number of factors to retain. Two general rules are commonly applied: the minimum-eigenvalue-of-one rule and the scree diagram. The eigenvalue rule requires that each factor included explain at least as much information as one of the original variables. Therefore, only factors that generate an eigenvalue of 1.0 or higher are retained. The second method is an evaluation of a scree diagram, which is a line plot of the eigenvalues. Using this approach, the analyst looks for the point at which the eigenvalues become level, meaning that they are explaining less and less of the total variability. A third criterion that researchers use in practice is that each of the resulting factors must be interpretable by the analyst. In the present analysis, both the minimum-eigenvalue-of-one rule and the scree diagram indicated that either five or six factors would be appropriate. After evaluating the results for both the five-factor model and the six-factor model, I determined the five-factor set to be the most interpretable.
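The two retention rules can be illustrated with a made-up set of eigenvalues (illustrative values only, not the eigenvalues from the GSU data; for a 21-variable correlation matrix they sum to 21, the number of variables):

```python
import numpy as np

# Illustrative eigenvalues, sorted in descending order.
eigvals = np.array([9.8, 3.1, 1.9, 1.3, 1.1, 0.9, 0.6, 0.5, 0.4, 0.4, 0.3,
                    0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01])

# Minimum-eigenvalue-of-one rule: retain factors explaining at least as much
# variance as one standardized original variable.
k = int(np.sum(eigvals >= 1.0))
print("factors retained:", k)

# Text scree "plot": the analyst looks for the elbow where the line levels off.
for i, ev in enumerate(eigvals, start=1):
    print(f"{i:2d} {'#' * max(1, round(ev * 4))}")
```

Here both readings roughly agree, mirroring the situation described above in which the two rules pointed to five or six factors.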
Table 2 shows each of the five factors and its correlation with the original set of variables. For the factor analysis to be interpretable, each of the original variables should highly correlate with only one or two of the resulting factors. By examining which of the original variables load on which factor, researchers can identify the latent traits that each factor represents. Theoretically, the factor analysis can generate as many factors as there are variables in the original data set, but the goal is to generate a reduced data set that eliminates the multicollinearity problems associated with the original data.
Factor 1 had a high level of positive correlation with the questions concerning instructor preparation, clarity of presentation, relevance of the material to learning, and the instructor's focus on course objectives. I labeled this factor as quality of instruction, and it generally measures (a) the degree to which the students felt that the instructor was prepared and (b) their perceptions of the overall quality of the presentation. If the instructor is well prepared, and if the course content and objectives are clear to the students, researchers would expect them to feel that the course is organized, well structured, and valuable.
Factor 2, which I labeled as course rigor, was highly correlated with those survey questions that measure the rigor of the coursework. The effort of the student, intellectual challenge, course difficulty, and workload on the student all loaded positively on this factor.
Factor 3 measured the student's level of interest in the subject matter before and after taking the course (Questions 19 and 20). Note that Factor 1 was also highly loaded on Question 20, which asked about the after-course interest in the subject. Therefore, Factor 3 probably measured the overall interest of the student and the student's attitude toward the subject matter, whereas Factor 1 picked up the increase in student interest after taking the class.
Factor 4, which I labeled as grades, loaded highly on both the average expected GPA of the students in the class and their actual average GPA. It is interesting that, although these two GPA measures highly correlated with one another, the expected GPAs that students reported in the survey were virtually always higher than their actual class average GPAs. This situation also could suggest that when researchers or educators omit the weaker students (those failing the class, who have already given up and are not attending) from the survey, the resulting evaluations are biased upward. The practice in the College of Business Administration at GSU is that the individual class instructor chooses the actual day within a 4-week period on which to administer
TABLE 2. Rotated Factor Pattern and Factor Loadings

Original        Factor 1       Factor 2    Factor 3    Factor 4    Factor 5
variable        Quality of     Course      Level of    Grades      Instructor
                Instruction    Rigor       Interest                Helpfulness

Q1                .00            .93         .08        -.12        -.13
Q2                .78            .36         .39         .10        -.03
Q3                .25            .86         .06        -.26        -.07
Q4               -.03            .76        -.06        -.12         .06
Q5               -.21            .85        -.08        -.38        -.02
Q6               -.13            .66        -.16         .05        -.06
Q7                .86            .01         .37         .21         .03
Q8                .94           -.01         .14         .14         .06
Q9                .94           -.03         .06         .03        -.01
Q10               .74           -.19         .24         .04         .40
Q11               .96           -.03         .03         .05        -.03
Q12               .95           -.10         .10         .06         .01
Q13               .88           -.02         .08         .32         .08
Q14               .73           -.17         .00         .20         .53
Q15               .78           -.22         .06         .18         .52
Q16               .93            .02         .00         .08         .01
Q17               .76           -.11         .17         .02         .30
Q19               .06           -.08         .83        -.01        -.01
Q20               .56           -.08         .80         .10         .11
Actual GPA        .10           -.22        -.01         .67         .06
Expected GPA      .31           -.33         .10         .69         .02

Note. Factor loadings greater than .4054 are treated as salient. Q = survey question; GPA = grade point average.
the survey to their classes. Anecdotal evidence suggests that some instructors have selectively chosen the date on which they administer the survey so as to manipulate the results. For example, some educators have said that a good day on which to administer the SET is a Friday after an exam because lower attendance is likely, especially by the weaker students. Both (a) whether this proposed interpretation is actually true and (b) what its effect on instructor ratings would be remain areas for future study.
Factor 5 was highly correlated with Questions 14 and 15, concerning the instructor's availability and willingness to provide outside help to the students. These questions also highly correlated with Factor 1, which measured the overall course quality. Therefore, Factor 5 was slightly more ambiguous. There was also a relatively high loading with Question 10, which measured the degree to which the instructor encourages class participation and questions. Factor 5 seemed to measure some aspect of the instructor's personality, approachability, or openness with the students, which I labeled as instructor helpfulness.
Multiple Regression Analysis
I then used the five factor scores for each course as explanatory variables in a multiple regression model to evaluate the impact of these five latent characteristics on Question 18 of the survey instrument, the overall rating of the instructor. The SAS factor analysis procedure outputs scores that are standardized (i.e., M = 0, SD = 1) so that the five factors are all on the same unit scale. The magnitude of the coefficient estimates from the regression analysis is therefore relevant, and comparisons between the coefficient values can be made.
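This second stage can be sketched as an ordinary least squares regression on standardized factor scores. The scores and coefficients below are simulated for illustration (they only echo the pattern of the reported results and are not the SAS output or the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 167  # number of usable classes, as in the sample described above

# Hypothetical standardized factor scores (mean 0, SD 1), as a factor-analysis
# step would output; the "true" coefficients here are illustrative only.
F = rng.normal(size=(n, 5))
beta_true = np.array([0.49, -0.08, 0.06, 0.09, 0.14])
q18 = 3.97 + F @ beta_true + 0.05 * rng.normal(size=n)

# OLS: overall instructor rating regressed on the five factor scores.
X = np.column_stack([np.ones(n), F])
coef, *_ = np.linalg.lstsq(X, q18, rcond=None)

# Because the predictors share one unit scale, coefficient magnitudes are
# directly comparable; the largest slope belongs to the first factor.
print(np.round(coef, 2))
```

Standardizing the predictors is what justifies comparing coefficient magnitudes directly, as the text does for Table 3.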
The regression results, which I report in Table 3, show that each of these five factors has a strong, statistically significant relation with the overall instructor rating. For the model, R2 is equal to .95, indicating that the five factors explained nearly all of the variation in the overall instructor rating (Question 18). Because
TABLE 3. Multiple Regression Results

Variable     Label                      Parameter estimate      SE         t

Intercept                                     3.965           0.0094    423.63
Factor 1     Quality of Instruction           0.489           0.0094     51.53
Factor 2     Course Rigor                    -0.075           0.0097     -7.72
Factor 3     Level of Interest                0.057           0.0096      5.96
Factor 4     Grades                           0.094           0.0107      8.76
Factor 5     Instructor Helpfulness           0.135           0.0100     13.03

Note. The dependent variable was responses to Question 18 (overall instructor rating).