ISSN: 1750-8649 (print)
ISSN: 1750-8657 (online)

Sociolinguistic Studies

Review

Statistics in language research: Analysis of variance.
Tony Rietveld and Roeland van Hout (2005)
New York: Mouton de Gruyter. viii, pp. 265.
ISBN 978–3-11–018580–5

Reviewed by Jenifer Larson-Hall

With so many books on statistics in existence, I believe that any new book
on statistics should contain a clear raison d’être. In my opinion, Rietveld and
van Hout’s book lacks a clear target audience and I would recommend other
books on statistics instead. For example, this book, at only 265 pages, is not
one of the large, rather weighty, authoritative surveys on statistics like Howell
(2006), Tabachnick and Fidell (2006) or Kirk (1995). These books give detailed
explanations of a large variety of statistical methods using examples from the
behavioral sciences and provide both mathematical and conceptual explanations
of statistical procedures. In more recent editions they also give information
about how to calculate statistics with programs like SPSS, SAS, SYSTAT and
MINITAB. (I confess an especial fondness for Howell, whose explanative style
I find very clear. I stay awake while walking through an example of how to
calculate sums of squares with Howell, which I consider a high compliment!)
Rietveld and van Hout is compact and provides some explanation about
how to use SPSS, which is clearly a widely-used statistical program in language
research; however, it does not provide as much detail as one would really need
to understand the statistical procedure and results well. For example, although
the book gives instructions about which menus to follow in SPSS to conduct
statistical procedures such as t-tests, one-way ANOVA, and MANOVA, it is
skimpy on procedures to assess the assumptions of the tests and never tells how
to call for any graphs (a practice which is recommended by Wilkinson and the
APA task force on statistical inference, 1999, in conjunction with looking at
numerical results). Rietveld and van Hout’s book is certainly not SPSS Survival
Manual (Pallant 2007), which at 352 pages is a functional introduction to the
essentials of SPSS statistical analysis without being overly simplistic.

Affiliation
University of North Texas, USA.
email: jenifer@unt.edu

Sols vol 2.2 2008 305–309
©2008, equinox publishing

doi: 10.1558/sols.v2i2.305

One might think then that the value of this book is that it uses pertinent
examples to illustrate how statistics works in practice in the field of linguistics.
One would be mistaken here as well. The book contains relatively few
concrete examples of linguistic analysis. As an example, in discussing the four
scales of measurement, the nominal scale is illustrated by referring to language
background but the ratio scale is explained by talking about the price of cars
and the weight of objects. I found it highly annoying that in a book that would
have seemed to be written for a linguistic audience the paired-sample t-test was
illustrated by referring only to the variables ‘Group’ and ‘Dependent [variable]’.
Although some tests are exemplified with a specific linguistic research design,
such as repeated measures over time on patients with dysfunctional velums by
a language pathologist, this turns out to be less common than I had hoped, and
the data are always intentionally fabricated. The linguistic examples which are
given range over a variety of subdisciplines of linguistics:

• Phonetics: Was a standard or non-standard allophone used?
• Psycholinguistics: What were the reaction times on different word types?
• Applied Linguistics: Which of three methods of vocabulary learning is
best?
• Clinical Linguistics: What are intelligibility scores of children with cleft
palates?
• Sociolinguistics: Two Dutch-speaking language communities in Belgium
are studied, each one containing two dialect groups.
However, these examples continue to be repeated in a vague form (they never
refer to any specific studies) with little difference from chapter to chapter. For
example, no other type of psycholinguistic research is referred to except for
reaction times on words.
In reviews of the authors’ previous book, Statistical techniques for the study of
language and language behavior (1993), Meara (1995) and Banks (1995) both
note that the authors seemed to have misjudged their audience. The book is
highly technical and includes a large amount of mathematical explanation of
statistical procedures, including a lengthy explanation of matrix algebra in
an appendix. It is hard to see why linguists would need to understand matrix
algebra in order to use SPSS for their own data analysis, and the mathematical
examples are not explained in the kind of fashion, step-by-step, that most
linguists could follow. The book is certainly not one I would use for an
introductory statistics class, as I found myself, no statistical novice, baffled
by some explanations, and turned glassy-eyed in most of the mathematical
excursuses.
Another problem with using this book for introductory statistics is that most
exercises are formal, technical questions (‘What are the p values associated
with z > 1.64 and with t29 > 1.699’, p. 30) rather than conceptual work with
situations that would lead beginners to understand what they need to do to
analyze their own data.
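For readers who want to check the quoted exercise numerically, the following is a minimal sketch in Python using only the standard library. It is not taken from the book: the helper names t_pdf and t_upper_tail are hypothetical, and the t-distribution tail is obtained by a simple Simpson's-rule integration of the density rather than by any routine the authors describe.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral of the density on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# One-tailed probabilities asked for in the quoted exercise
p_z = 1 - NormalDist().cdf(1.64)   # standard normal, z > 1.64
p_t = t_upper_tail(1.699, 29)      # t with 29 degrees of freedom, t > 1.699

print(f"P(z > 1.64)    = {p_z:.4f}")
print(f"P(t29 > 1.699) = {p_t:.4f}")
```

Both one-tailed probabilities come out close to .05, which shows that the exercise drills table look-up rather than any decision about how to analyze a data set.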
On the other hand, there is little here that is enlightening for those who are
more familiar with statistics. One reason is that the statistical topics are not as
wide-ranging as Rietveld and van Hout’s previous book (1993), as the focus
here is mostly on ANOVA techniques (with two chapters on one-sample and
two-sample t-tests). I did think that the illustration of various research designs
for ANOVA, which can be confusing, was handled quite well in Chapter 1. The
design boxes help clarify how many variables and which kinds are necessary
for data set-up.
For those who are already familiar with ANOVA designs, the final two chapters
provide some novel information. Chapter 9 includes a lengthy explanation
of different imputation methods that can be used to deal with missing data,
certainly a valuable topic especially for studies with repeated measures which
lose power if a whole row must be lost to one missing piece of data. Chapter
10 explains permutation tests, bootstrapping, and multilevel analysis (known
also as hierarchical linear modeling or HLM). None of these procedures can
be easily handled by SPSS (the book says SPSS syntax can be used for the
permutation tests but does not give it; bootstrapping is handled by the free
RESAMPLING program and HLM by the HLM5 software), but I agree with
the authors that such techniques will become part of statistical software
packages in the future and linguists should be familiar with them. In fact, I would
go further to say that researchers will need to learn more about an arsenal
of techniques (including permutation tests and bootstrapping) in the tool
box called ‘robust statistics’ which do not rely on traditional assumptions of
normal distribution or equal variances of groups, and which are explained in
a very clear fashion in Wilcox (2001). These types of statistics should not be
confused with traditional non-parametric statistics, which may lack power to
find differences. The authors devote some time and attention in the book to the
concept of power analysis to determine a priori sample sizes, but then go on
to repeat the outdated (and incorrect) assumption that ‘[i]n general, analysis
of variance is said to be robust with regard to violations of its assumptions’ (p.
126). Statistical simulation studies have found that this is true when the actual
situation is that there are no group differences, but when group differences
do exist, even small violations of assumptions can lead researchers to a Type
II error (finding no differences when they really exist) (Hampel 1973; Tukey
1960; Wilcox 2003). It would of course be wonderful if researchers determined
sample sizes using power analysis, but robust methods of analysis could also be
helpful in improving the power and accuracy of statistical analysis.
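To make the idea of permutation tests and bootstrapping concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the procedure given in the book or implemented in the RESAMPLING program: the reaction times are invented for the example, and the function names permutation_test and bootstrap_ci are hypothetical. Neither function assumes normal distributions or equal group variances.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add 1 to both counts so the observed labelling is included
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Invented reaction times in ms for two word types (illustration only)
high_freq = [612, 655, 598, 701, 640, 677, 623, 689]
low_freq = [702, 745, 688, 760, 731, 694, 779, 718]

p_value = permutation_test(high_freq, low_freq)
ci_low, ci_high = bootstrap_ci(high_freq, low_freq)
print(f"permutation p = {p_value:.4f}")
print(f"95% bootstrap CI for the mean difference: ({ci_low:.1f}, {ci_high:.1f})")
```

The permutation test asks how often a random relabelling of the pooled scores produces a difference at least as large as the observed one; the bootstrap resamples each group with replacement to estimate how much the mean difference varies.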
In short, I do not know where this book fits in the panoply of statistical works
that already exist. It does not seem appropriate for beginners but may baffle
even those who feel they have a firm understanding of which test to choose
for which occasion. It is not comprehensive enough to be an authoritative text
but it is not simple enough to be used to quickly get up and going with using
SPSS. It seems to be trying to fill a niche for those who want to understand
statistics by seeing examples in their own field, but I believe it does not succeed
as well as a few quite recent books that have appeared or will appear: Baayen
(2008) successfully explains how to use advanced techniques such as principal
components analysis, factor analysis and linear mixed models that can be used
in studying language processing; Johnson (2008) provides a book organized by
linguistic topic including phonetics, psycholinguistics, sociolinguistics,
historical linguistics and syntax, and which covers both elementary statistics such as
t-tests and one-way ANOVA and more advanced techniques like cluster
analysis, mixed-effects models and an interesting comparison between logistic
regression and Varbrul for Sociolinguists; or even my own book (Larson-Hall
forthcoming) which aims to be an introductory text for Applied Linguists.
All these books are illustrated with real data sets from real experiments and
provide many more opportunities for those working in their fields to see how
real, messy data sets can be analyzed with statistical software.

References
Baayen, R. H. (2008) Analyzing linguistic data: A practical introduction to statistics using R.
Cambridge: Cambridge University Press.
Banks, D. (1995) Review: Statistical techniques for the study of language and language
behavior. IRAL: International Review of Applied Linguistics in Language Teaching 33(1):
76–77.
Hampel, F. R. (1973) Robust estimation: A condensed partial survey. Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 27: 87–104.
Howell, D. C. (2006) Statistical methods for psychology. (6th ed.) Pacific Grove, CA:
Duxbury/Thomson Learning.
Johnson, K. (2008) Quantitative methods in linguistics. Malden, MA: Blackwell.
Kirk, R. E. (1995) Experimental design: Procedures for the behavioral sciences. Boston:
Brooks/Cole Publishing Company.
Larson-Hall, J. (forthcoming) A guide to doing statistical analysis in second language
acquisition. Mahwah, NJ: Lawrence Erlbaum Associates.
Meara, P. (1995) Review: Statistical techniques for the study of language and language
behavior. Language Learning 54(2): 341–343.
Pallant, J. (2007) SPSS Survival manual. (3rd ed.) Philadelphia: Open University Press.
Rietveld, T. and Van Hout, R. (1993) Statistical techniques for the study of language and
language behaviour. Berlin: Mouton de Gruyter.
Tabachnick, B. G. and Fidell, L. S. (2006) Using multivariate statistics. (5th ed.) Boston:
Allyn and Bacon.
Tukey, J. W. (1960) A survey of sampling from contaminated distributions. In I. Olkin, S.
G. Ghurye, W. Hoeffding, W. G. Madow and H. B. Mann (eds) Contributions to
probability and statistics: Essays in honour of Harold Hotelling 448–485. Stanford: Stanford
University Press.
Wilcox, R. (2001) Fundamentals of modern statistical methods: Substantially improving
power and accuracy. New York: Springer.
Wilcox, R. (2003) Applying contemporary statistical techniques. San Diego: Elsevier
Science.
Wilkinson, L. and Task force on statistical inference, APA, Science Directorate,
Washington, DC, US. (1999) Statistical methods in psychological journals: Guidelines
and explanations. American Psychologist 54(8): 594–604.