Second, that voting preference is independent of sex and social class

hand and voting preference and social class on the other.

Suppose now that instead of casting our data into a three-way classification as shown in Box 25.17, we had simply used a 2 × 2 contingency table and that we had sought to test the null hypothesis that there is no relationship between sex and voting preference. The data are shown in Box 25.21.

When we compute chi-square from the above data our obtained value is χ² = 4.48. Degrees of freedom are given by (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

From chi-square tables we see that the critical value of χ² with 1 degree of freedom is 3.84 at p = 0.05. Our obtained value exceeds this. We reject the null hypothesis and conclude that sex is significantly associated with voting preference.

But how can we explain the differing conclusions that we have arrived at in respect of the data in Boxes 25.17 and 25.21? These examples illustrate an important and general point, Whiteley (1983) observes. In the bivariate analysis (Box 25.21) we concluded that there was a significant relationship between sex and voting preference. In the multivariate analysis (Box 25.17) that relationship was found to be non-significant when we controlled for social class.

Multilevel modelling (also known as multilevel regression) is a statistical method that recognizes that it is uncommon to be able to assign students in schools randomly to control and experimental groups, or indeed to conduct an experiment that requires an intervention with one group while maintaining a control group (Keeves and Sellin 1997). In most schools, students are brought together in particular groupings for specified purposes, and each group of students has its own characteristics, which render it different from other groups. Multilevel modelling addresses the fact that, unless it can be shown that different groups of students are, in fact, alike, it is generally inappropriate to aggregate groups of students or data for the purposes of analysis. Multilevel models avoid the pitfalls of aggregation and the ecological fallacy (Plewis 1997: 35), i.e. making inferences about individual students and behaviour from aggregated data.

Data and variables exist at individual and group levels; indeed Keeves and Sellin (1997) break down analysis further into three main levels: between students over all groups, between groups, and between students within groups. One could extend the notion of levels, of course, to include individual, group, class, school, local, regional, national and international levels (Paterson and Goldstein 1991). This has been done using multilevel regression and hierarchical linear modelling. Multilevel models enable researchers to ask questions hitherto unanswered, e.g. about variability between and within schools, teachers and curricula (Plewis 1997: 34–5), in short about the processes of teaching and learning.

Useful overviews of multilevel modelling can be found in Goldstein (1987), Fitz-Gibbon (1997) and Keeves and Sellin (1997). Multilevel analysis avoids statistical treatments associated with experimental methods (e.g. analysis of variance and covariance); rather it uses regression analysis and, in particular, multilevel regression. Regression analysis, argues Plewis (1997: 28), assumes homoscedasticity (where the residuals demonstrate equal scatter), that the residuals are independent of each other, and finally, that the residuals are normally distributed.
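The chi-square decision described earlier in this section (obtained χ² = 4.48, 1 degree of freedom, critical value 3.84 at p = 0.05) can be checked with a short sketch. It uses only the figures quoted in the text; the function names are illustrative, and the probability formula applies to the 1-degree-of-freedom case only.

```python
import math


def degrees_of_freedom(rows: int, columns: int) -> int:
    """Degrees of freedom for a contingency table: (r - 1)(c - 1)."""
    return (rows - 1) * (columns - 1)


def chi_square_p_value_df1(chi_square: float) -> float:
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    With 1 degree of freedom the statistic is the square of a standard
    normal variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


obtained = 4.48   # value reported in the text for the 2 x 2 table
critical = 3.84   # tabled critical value at p = 0.05, 1 degree of freedom

df = degrees_of_freedom(2, 2)
p_obtained = chi_square_p_value_df1(obtained)
p_critical = chi_square_p_value_df1(critical)
reject_null = obtained > critical

print(f"df = {df}")
print(f"p for obtained value = {p_obtained:.4f}")
print(f"p at critical value = {p_critical:.4f}")
print(f"reject null hypothesis: {reject_null}")
```

The probability at the tabled critical value comes out very close to 0.05, which is a useful sanity check on the formula before applying it to the obtained value.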

The whole field of multilevel modelling has proliferated rapidly since the early 1990s and is the basis of much research that is being undertaken on the ‘value added’ component of education and the comparison of schools in public ‘league tables’ of results (Fitz-Gibbon 1991; 1997). However, Fitz-Gibbon (1997: 42–4) provides important evidence to question the value of some forms of multilevel modelling. She demonstrates that residual gain analysis provides answers to questions about the value-added dimension of education which differ insubstantially from those answers that are given by multilevel modelling (the lowest correlation coefficient being 0.93 and 71.4 per cent of the correlations computed correlating between 0.98 and 1). The important point here is that residual gain analysis is a much more straightforward technique than multilevel modelling. Her work strikes at the heart of the need to use complex multilevel modelling to assess the ‘value-added’ component of education. In her work (Fitz-Gibbon 1997: 5) the value-added score – the difference between a statistically-predicted performance and the actual performance – can be computed using residual gain analysis rather than multilevel modelling.
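A minimal sketch of residual gain analysis follows, assuming that the predicted performance comes from a simple least-squares regression of a later score on a prior score. The scores are invented for illustration, the function names are hypothetical, and the sign convention (actual minus predicted) is a choice made here so that a positive residual means a student did better than predicted.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y regressed on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_gains(prior, outcome):
    """Value-added scores: actual outcome minus statistically predicted outcome."""
    intercept, slope = fit_line(prior, outcome)
    return [yi - (intercept + slope * xi) for xi, yi in zip(prior, outcome)]


# Hypothetical prior-attainment and later-attainment scores for six students.
prior_scores = [40, 45, 50, 55, 60, 65]
later_scores = [48, 50, 58, 57, 66, 69]

gains = residual_gains(prior_scores, later_scores)
for prior, gain in zip(prior_scores, gains):
    print(f"prior score {prior}: residual gain {gain:+.2f}")
```

Because the line is fitted with an intercept, the residual gains sum to zero across the sample, so each score is read relative to what was typical for students with the same starting point.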

Nonetheless, multilevel modelling now attracts worldwide interest.

Whereas ordinary regression models do not make allowances, for example, for different schools (Paterson and Goldstein 1991), multilevel regression can include school differences and, indeed, other variables, for example: socio-economic status (Willms 1992), single and co-educational schools (Daly 1996; Daly and Shuttleworth 1997), location (Garner and Raudenbush 1991), size of school (Paterson 1991) and teaching styles (Zuzovsky and Aitkin 1991). Indeed Plewis (1991) indicates how multilevel modelling can be used in longitudinal studies, linking educational progress with curriculum coverage.
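The distinction between variation between groups and variation between students within groups can be illustrated with a simple sum-of-squares decomposition. This is only a descriptive sketch with invented scores and hypothetical names; it is not a fitted multilevel model, and the share it reports is a raw proportion of the total sum of squares rather than an estimated variance component.

```python
from statistics import mean


def variance_partition(groups):
    """Split the total sum of squares into between-group and within-group parts.

    `groups` maps a group label (e.g. a school) to a list of student scores.
    Returns (between_ss, within_ss, share_between).
    """
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = mean(all_scores)
    between_ss = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    within_ss = sum(
        (score - mean(scores)) ** 2
        for scores in groups.values()
        for score in scores
    )
    return between_ss, within_ss, between_ss / (between_ss + within_ss)


# Hypothetical test scores for students in three schools.
schools = {
    "school_a": [52, 55, 58, 61],
    "school_b": [60, 63, 66, 69],
    "school_c": [70, 72, 75, 79],
}

between_ss, within_ss, share_between = variance_partition(schools)
print(f"between-school sum of squares: {between_ss:.1f}")
print(f"within-school sum of squares: {within_ss:.1f}")
print(f"share of variation lying between schools: {share_between:.2f}")
```

When a large share of the variation lies between groups, as in these invented data, pooling all students into one undifferentiated sample hides exactly the structure that multilevel modelling is designed to represent.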

Cluster analysis

Whereas factor analysis and elementary linkage analysis enable the researcher to group together factors and variables, cluster analysis enables the researcher to group together similar and homogeneous subsamples of people. This is best approached through software packages such as SPSS, and we illustrate this here. SPSS creates a dendrogram of results, grouping and regrouping groups until all the variables are embraced.

For example, here is a simple cluster based on 20 cases (people). Imagine that their scores have been collected on an item concerning the variable ‘the attention given to teaching and learning in the school’. One can see that, at the most general level, there are two clusters (cluster one = persons 19, 20, 2, 13, 15, 9, 11, 18, 14, 16, 1, 10, 12, 5, 17; cluster two = persons 7, 8, 4, 3, 6). If one were to wish to have smaller clusters then three groupings could be found: cluster one: persons 19, 20, 2, 13, 15, 9, 11, 18; cluster two: persons 14, 16, 1, 10, 12, 5, 17; cluster three: persons 7, 8, 4, 3, 6 (Box 25.22).

Using this analysis enables the researcher to identify important groupings of people in a post-hoc analysis, i.e. not setting up the groupings and subgroupings at the stage of sample design, but after the data have been gathered. In the example of the two-group cluster here one could examine the characteristics of those participants who were clustered into groups one and two, and, for the three-group cluster, one could examine the characteristics of those participants who were clustered into groups one, two and three for the variable ‘the attention given to teaching and learning in the school’.

Box 25.22
Cluster analysis

[Dendrogram: Rescaled Distance Cluster Combine, with cases listed by label and number]
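The grouping and regrouping that a dendrogram displays can be sketched as agglomerative clustering: every case starts as its own cluster, and the two closest clusters are merged repeatedly until the desired number remains. The sketch below assumes average linkage on a single rating variable; the scores are invented, the function name is hypothetical, and this is not a reproduction of the SPSS procedure or of the data behind Box 25.22.

```python
def average_linkage_clusters(scores, n_clusters):
    """Merge the two closest clusters until only `n_clusters` remain.

    `scores` maps a case number to that person's score on one variable.
    The distance between two clusters is the mean absolute difference
    between every pair of members (average linkage).
    """
    clusters = [[case] for case in scores]

    def distance(a, b):
        total = sum(abs(scores[i] - scores[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical scores for nine people on one rating item.
ratings = {1: 1.0, 2: 1.5, 3: 2.0, 4: 4.0, 5: 4.5, 6: 5.0, 7: 9.0, 8: 9.5, 9: 10.0}

three_groups = average_linkage_clusters(ratings, 3)
two_groups = average_linkage_clusters(ratings, 2)
print("three clusters:", three_groups)
print("two clusters:", two_groups)
```

Asking for two clusters rather than three simply stops the merging one step later, which mirrors reading the dendrogram at a more general level.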

26 Choosing a statistical test

There are very many statistical tests available to the researcher. Which test one employs depends on several factors, for example:

the purpose of the analysis (e.g. to describe or explore data, to test a hypothesis, to seek correlations, to identify the effects of one or more independent variables on a dependent variable, to identify differences between two or more groups, to look for underlying groupings of data, to report effect sizes)

the kinds of data with which one is working (parametric and non-parametric)

the scales of data being used (nominal, ordinal, interval, ratio)

the number of groups in the sample

the assumptions in the tests

whether the samples are independent of each other or related to each other.

Researchers wishing to use statistics will need to ask questions such as:

What statistics do I need to answer my research questions?

Are the data parametric or non-parametric?

How many groups are there (e.g. two, three or more)?

Are the groups related or independent?

What kind of test do I need (e.g. a difference test, a correlation, factor analysis, regression)?

We have addressed several of these points in the preceding chapters; those not addressed in previous chapters are addressed here. In this chapter we draw together the threads of the discussion of statistical analysis and address what, for many researchers, can be a nightmare: deciding which statistical tests to use. In the interests of clarity we have decided to use tables and graphic means of presenting the issues in this chapter (see http://www.routledge.com/textbooks/9780415368780 – Chapter 26, file 26.1.doc).

How many samples?

In addition to the scale of data being used (nominal, ordinal, interval, ratio), the kind of statistic that one calculates depends in part on, first, whether the samples are related to, or independent of, each other, and second, the number of samples in the test. With regard to the first point, as we have seen in previous chapters, different statistics are sometimes used when groups are related to each other and when they are independent of each other. Groups will be independent when they have no relationship to each other, e.g. in conducting a test to see if there is any difference between the voting of males and females on a particular item, say mathematics performance. The tests that one could use here are, for example: the chi-square test (for nominal data), the Mann-Whitney U test and Kruskal-Wallis (for ordinal data), and the t-test and analysis of variance (ANOVA) for interval and ratio data.

However, there are times when the groups might be related. For example, we may wish to measure the performance of the same group at two points in time – before and after a particular intervention – or we may wish to measure the voting of the same group on two different factors, say preference for mathematics and preference for music. Here it is not different groups that are involved, but the same group on two occasions and the same group on two variables respectively. In this case different statistics would have to be used, for example the Wilcoxon test, the Friedman test, the t-test for paired samples, and the sign test. Let us give a frequently used example of an experiment (Box 26.1).

In preceding chapters we have indicated which tests are to be used with independent samples and which are to be used with related samples.

With regard to the number of samples in the test, there are statistical tests which are for single samples (one group only, e.g. a single class in school), for two samples (two groups, e.g. males and females in a school) and for three or more samples

Box 26.1
Identifying statistical tests for an experiment

Control group, pretest → post-test: Wilcoxon test or t-test for paired samples (depending on data type)

Experimental group, pretest → post-test: Wilcoxon test or t-test for paired samples (depending on data type)

Control group and experimental group at the pretest: t-test for independent samples for the pretest

Control group and experimental group at the post-test: t-test for independent samples for the post-test

Box 26.2
Statistical tests to be used with different numbers of groups of samples

Scale of data: Nominal

One sample: Chi-square (χ²) one-sample test

Two samples: Fisher exact test, Chi-square (χ²) two-samples test (independent); McNemar (related)

More than two samples: Chi-square (χ²) k-samples test (independent); Cochran Q (related)

Scale of data: Ordinal

One sample: Kolmogorov-Smirnov one-sample test

Two samples: Mann-Whitney U test, Kolmogorov-Smirnov test, Wald-Wolfowitz (independent); Wilcoxon matched pairs test, Sign test (related); Spearman rho

More than two samples: Kruskal-Wallis test (independent); Friedman test (related); Ordinal regression analysis

Scale of data: Interval and ratio

One sample: t-test

Two samples: t-test (independent); t-test for paired samples (related); Pearson product moment correlation

More than two samples: One-way ANOVA, Two-way ANOVA, Tukey hsd test, Scheffé test (independent); Repeated measures ANOVA (related)

Box 26.3
Types of statistical tests for four scales of data

Measures of association

Nominal: Tetrachoric correlation; Point biserial correlation; Phi coefficient; Cramer’s V

Ordinal: Spearman’s rho; Kendall rank order correlation; Kendall partial rank correlation

Interval and ratio: Pearson product-moment correlation

Measures of difference

Nominal: Chi-square; Cochran Q; Binomial test

Ordinal: Mann-Whitney U test; Wilcoxon matched pairs; Friedman two-way analysis of variance; Wald-Wolfowitz test; Kolmogorov-Smirnov test

Interval and ratio: t-test for two independent samples; t-test for two related samples; One-way ANOVA; Two-way ANOVA; Tukey hsd test; Scheffé test

Measures of linear relationship between independent and dependent variables

Ordinal: Ordinal regression analysis

Interval and ratio: Linear regression; Multiple regression

Identifying underlying factors, data reduction

Interval and ratio: Factor analysis; Elementary linkage analysis

(e.g. parents, teachers, students and administrative staff in a school). Tests which can be applied to a single group include the binomial test, the chi-square one-sample test, and the Kolmogorov-Smirnov one-sample test; tests which can be applied to two groups include the chi-square test, Mann-Whitney U test, the t-test, the Spearman and Pearson tests of correlation; tests which can be applied to three or more samples include the chi-square test, analysis of variance and the Tukey test. We set out some of these tests in Box 26.2. It is essential to use the correct test for the correct number of groups.

The statistical tests to be used also depend on the scales of data being treated (nominal – ratio) and the tasks which the researcher wishes to perform – the purpose of the analysis (e.g. to discover differences between groups, to look for degrees of association, to measure the effect of one or more independent variables on a dependent variable etc.). In preceding chapters we have described the different scales of data and the kinds of tests available for different purposes. In respect of these considerations, Box 26.3 summarizes some of the main tests here.
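The matching of tests to the number of samples can be written down as a small lookup, which makes the decision rule explicit. The sketch below encodes only the examples named in the text above; the names `TESTS_BY_SAMPLE_COUNT` and `candidate_tests` are hypothetical, and a real choice would also have to take account of the scale of data and of whether the samples are related.

```python
from typing import Dict, List

# Example tests for each number of samples, as listed in the text.
TESTS_BY_SAMPLE_COUNT: Dict[str, List[str]] = {
    "one": [
        "binomial test",
        "chi-square one-sample test",
        "Kolmogorov-Smirnov one-sample test",
    ],
    "two": [
        "chi-square test",
        "Mann-Whitney U test",
        "t-test",
        "Spearman correlation",
        "Pearson correlation",
    ],
    "three or more": [
        "chi-square test",
        "analysis of variance",
        "Tukey test",
    ],
}


def candidate_tests(number_of_samples: int) -> List[str]:
    """Return the example tests that suit the given number of samples."""
    if number_of_samples < 1:
        raise ValueError("at least one sample is required")
    if number_of_samples == 1:
        key = "one"
    elif number_of_samples == 2:
        key = "two"
    else:
        key = "three or more"
    return list(TESTS_BY_SAMPLE_COUNT[key])


for n in (1, 2, 4):
    print(n, "sample(s):", ", ".join(candidate_tests(n)))
```

Any number of samples above two falls into the same ‘three or more’ category, which is why four samples and three samples return the same list.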


Box 26.4
Choosing statistical tests for parametric and non-parametric data

[Decision chart. Non-parametric data: descriptive (frequencies, mode, cross-tabulations); correlation (Spearman’s rho); goodness of fit; differences (including the Mann-Whitney U test, the Kruskal-Wallis test and tests for three or more related samples); effects of variables on a dependent variable (regression). Parametric data: descriptive (frequencies, mode, mean, median, standard deviation); correlation (Pearson’s correlation coefficient); differences (including two related samples (paired) and three or more related samples: repeated measures ANOVA); effect size (Cohen’s d, eta squared); effects of variables on a dependent variable (regression analysis); grouping of variables (reduction: factor analysis).]

Box 26.5
Statistics available for different types of data

Each entry gives the legitimate statistic, followed by points to observe, questions and examples.

Data type: Nominal

Mode (the score achieved by the greatest number of people): Is there a clear ‘front runner’ that receives the highest score with low scoring on other categories, or is the modal score only narrowly leading the other categories? Are there two scores which are vying for the highest score – a bimodal score?

Frequencies: Which are the highest/lowest frequencies? Is the distribution even across categories?

Chi-square (a statistic that charts the difference between statistically expected and actual scores): Are differences between scores caused by chance/accident or are they statistically significant, i.e. not simply caused by chance?

Data type: Ordinal

Mode: Which score on a rating scale is the most frequent?

Median (the score gained by the middle person in a ranked group of people or, if there is an even number of cases, the score which is midway between the highest score obtained in the lower half of the cases and the lowest score obtained in the higher half of the cases): What is the score of the middle person in a list of scores?

Frequencies: Do responses tend to cluster around one or two categories of a rating scale? Are the responses skewed towards one end of a rating scale (e.g. ‘strongly agree’)? Do the responses pattern themselves consistently across the sample? Are the frequencies generally high or generally low (i.e. whether respondents tend to feel strongly about an issue)? Is there a clustering of responses around the central categories of a rating scale (the central tendency, respondents not wishing to appear to be too extreme)?

Chi-square: Are the frequencies of one set of nominal variables (e.g. sex) significantly related to a set of ordinal variables?

Spearman rank order correlation (a statistic to measure the degree of association between two ordinal variables): Do the results from one rating scale correlate with the results from another rating scale? Do the rank order positions for one variable correlate with the rank order positions for another variable?

Mann-Whitney U-test (a statistic to measure any significant difference between two independent samples): Is there a significant difference in the results of a rating scale for two independent samples (e.g. males and females)?

Kruskal-Wallis analysis of variance (a statistic to measure any significant differences between three or more independent samples): Is there a significant difference between three or more nominal variables (e.g. membership of political parties) and the results of a rating scale?

Data type: Interval and ratio

Mode

Mean: What is the average score for this group?

Frequencies

Median

Chi-square

Standard deviation (a measure of the dispersal of scores): Are the scores on a parametric test evenly distributed? Do scores cluster closely around the mean? Are scores widely spread around the mean? Are scores dispersed evenly? Are one or two extreme scores (‘outliers’) exerting a disproportionate influence on what are otherwise closely clustered scores?

z-scores (a statistic to convert scores from different scales, i.e. with different means and standard deviations, to a common scale, i.e. with the same mean and standard deviation, enabling different scores to be compared fairly): How do the scores obtained by students on a test which was marked out of 20 compare to the scores by the same students on a test which was marked out of [...]?

Pearson product-moment correlation (a statistic to measure the degree of association between two interval or ratio variables): Is there a correlation between one set of interval data (e.g. test scores for one examination) and another set of interval data (e.g. test scores on another examination)?

t-tests (a statistic to measure the difference between the means of one sample on two separate occasions or between two samples on one occasion): Are the control and experimental groups matched in their mean scores on a parametric test? Is there a significant difference between the pretest and post-test scores of a sample group?

Analysis of variance (a statistic to ascertain whether two or more means differ significantly): Are the differences in the means between test results of three groups statistically significant?
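The z-score entry in Box 26.5 can be illustrated with a short sketch that puts two tests with different maximum marks on a common scale. The marks are invented, the second test’s maximum of 50 is a hypothetical figure, and the population standard deviation is used here as an assumption; a sample standard deviation would give slightly different values.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    average = mean(scores)
    spread = pstdev(scores)  # population standard deviation (an assumption)
    return [(score - average) / spread for score in scores]


# Hypothetical marks for the same five students on two tests.
test_out_of_20 = [8, 10, 12, 14, 16]
test_out_of_50 = [22, 35, 30, 41, 27]

z_first = z_scores(test_out_of_20)
z_second = z_scores(test_out_of_50)

for student, (z1, z2) in enumerate(zip(z_first, z_second), start=1):
    print(f"student {student}: first test z = {z1:+.2f}, second test z = {z2:+.2f}")
```

After conversion both sets of scores have a mean of 0 and a standard deviation of 1, so a student’s relative standing on the two tests can be compared directly even though the raw marks are on different scales.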

The types of test used also vary according to whether one is working with parametric or non-parametric data. Boxes 26.4 and 26.5 draw together and present the kinds of statistical tests available, depending on whether one is using parametric or non-parametric data, together with the purpose of the analysis. Box 26.5 sets out the commonly used statistics for data types and purposes (Siegel 1956; Cohen and Holliday 1996; Hopkins et al. 1996).

Assumptions of tests

Statistical tests are based on certain assumptions. It is important to be aware of these assumptions and to operate fairly within them. Some of the more widely used tests have the following assumptions (Box 26.6). The choice of which statistics to employ is not arbitrary, but dependent on purpose.

Box 26.6
Assumptions of statistical tests

Mean: Data are normally distributed, with no outliers.

Mode: There are few values, and few scores, occurring which have a similar frequency.

Median: There are many ordinal values.

Chi-square: Data are categorical (nominal). Randomly sampled population. Mutually independent categories. Data are discrete (i.e. no decimal places between data points). 80 per cent of all the cells in a cross-tabulation contain 5 or more cases.

Kolmogorov-Smirnov: The underlying distribution is continuous. Data are nominal.

t-test and analysis of variance: Population is normally distributed. Sample is selected randomly from the population. Each case is independent of the other. The groups to be compared are nominal, and the comparison is made using interval and ratio data. The sets of data to be compared are normally distributed (the bell-shaped Gaussian curve of distribution). The sets of scores have approximately equal variances, or the square of the standard deviation is known. The data are interval or ratio.

Wilcoxon test: The data are ordinal. The samples are related.

Mann-Whitney and Kruskal-Wallis: The groups to be compared are nominal, and the comparison is made using ordinal data. The populations from which the samples are drawn have similar distributions. Samples are drawn randomly. Samples are independent of each other.

Spearman rank order correlation: The data are ordinal.

Pearson correlation: The data are interval and ratio.

Regression (simple and multiple): Assumptions underlying regression techniques: The data derive from a random or probability sample. The data are interval or ratio (unless ordinal regression is used). Outliers are removed. There is a linear relationship between the independent and dependent variables. The dependent variable is normally distributed (the bell-shaped Gaussian curve of distribution). The residuals for the dependent variable (the differences between calculated and observed scores) are approximately normally distributed. Collinearity is removed (where one independent variable is an exact or very close correlate of another).

Factor analysis: The data are interval or ratio. The data are normally distributed. Outliers have been removed. The sample size should not be less than 100–150 persons. There should be at least five cases for each variable. The relationships between the variables should be linear. The data must be capable of being factored.
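The chi-square requirement that 80 per cent of cells contain 5 or more cases can be checked before a test is run. The sketch below assumes that the requirement is applied to the expected frequencies in each cell, which is one common reading; the counts are invented and the function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell frequencies under independence: row total x column total / N."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


def share_of_cells_at_least(table, minimum=5):
    """Proportion of cells whose expected frequency is at least `minimum`."""
    cells = [value for row in expected_counts(table) for value in row]
    return sum(value >= minimum for value in cells) / len(cells)


# Hypothetical 2 x 2 cross-tabulation of observed counts.
observed = [
    [12, 3],
    [9, 1],
]

expected = expected_counts(observed)
share = share_of_cells_at_least(observed)
meets_requirement = share >= 0.8

print("expected frequencies:", expected)
print(f"share of cells with expected frequency of 5 or more: {share:.0%}")
print("meets the 80 per cent requirement:", meets_requirement)
```

In this invented table only half of the cells reach an expected frequency of 5, so the requirement is not met and the categories would need to be combined or more cases gathered before chi-square is used.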

Notes

1 THE NATURE OF INQUIRY – SETTING THE FIELD

1 We are not here recommending, nor would we wish of scientific research, Mishler contends, is largely to encourage, exclusive dependence on rationally

tacit and unexplicated; moreover, scientists learn it derived and scientifically provable knowledge for the

through a process of socialization into a ‘particular conduct of education – even if this were possible.

form of life’. The discovery, testing and validation There is a rich fund of traditional and cultural

of findings is embedded in cultural and linguistic wisdom in teaching (as in other spheres of life)

practices and experimental scientists proceed in which we would ignore to our detriment. What we

pragmatic ways, learning from their errors and are suggesting, however, is that total dependence

failures, adapting procedures to their local contexts, on the latter has tended in the past to lead to

making decisions on the basis of their accumulated an impasse: and that for further development and

experiences. See, for example, Mishler, E. G. (1990) greater understanding to be achieved education must

Validation in inquiry-guided research: the role of needs resort to the methods of science and research.

exemplars in narrative studies. Harvard Educational

2 A classic statement opposing this particular view of Review , 60 (4), 415–42. science is that of Kuhn, T. S. (1962) The Structure

5 See, for example, Rogers, C. R. (1969) Freedom of Scientific Revolutions . Chicago, IL: University

to Learn . Columbus, OH: Merrill, and also Rogers, of Chicago Press. Kuhn’s book, acknowledged as

C. R. and Stevens, B. (1967) Person to Person: The an intellectual tour de force, makes the point

Problem of Being Human . London: Souvenir. that science is not the systematic accumulation of

6 Investigating social episodes involves analysing knowledge as presented in textbooks; that it is a

the accounts of what is happening from the far less rational exercise than generally imagined. In

points of view of the actors and the participant effect, it is ‘a series of peaceful interludes punctuated

spectator(s)/investigator(s). This is said to yield by intellectually violent revolutions . . . in each of

three main kinds of interlocking material: images of which one conceptual world view is replaced by

the self and others, definitions of situations, and rules another.’

for the proper development of the action. See Harr´e,

3 For a straightforward overview of the discussions R. (1976) The constructive role of models. In here see Chalmers, A. F. (1982) What Is This Thing

L. Collins (ed.) The Use of Models in the Social Called Science? (second edition). Milton Keynes:

Sciences . London: Tavistock. Open University Press.

7 See also Verma, G. K. and Beard, R. M. (1981) What

4 The formulation of scientific method outlined earlier is Educational Research? Aldershot: Gower, for further has come in for strong and sustained criticism.

information on the nature of educational research

E. G. Mishler, for example, describes it as a and also a historical perspective on the subject. ‘storybook image of science’, out of tune with the actual practices of working scientists who turn

2 THE ETHICS OF EDUCATIONAL AND SOCIAL out to resemble craftpersons rather than logicians.

RESEARCH

By craftpersons, Mishler (1990) is at pains to stress that competence depends upon ‘apprenticeship training, continued practice and experienced-based,

1 For example, Social Research Association (2003); contextual knowledge of the specific methods

American Sociological Association (1999); British applicable to a phenomenon of interest rather than

Educational Research Association (2000); Ameri- an abstract ‘‘logic of discovery’’ and application of

can Psychological Association (2002); British So- formal ‘‘rules’’ ’ (Mishler 1990). The knowledge base

ciological Association (2002); British Psychological

594 NOTES

Society (2005). Comparable developments may be Wade, N. (1983) Betrayers of Truth: Fraud and Deceit found in other fields of endeavour. For an exami-

in the Halls of Science . New York: Century. nation of key ethical issues in medicine, business, and journalism together with reviews of common

5 SENSITIVE EDUCATIONAL RESEARCH ethical themes across these areas, see Serafini, A. (ed.) (1989) Ethics and Social Concern. New York:

1 See also Walford (2001: 38) in his discussion of Paragon House. The book also contains an ac-

gaining access to UK public schools, where an early count of principal ethical theories from Socrates to

question that was put to him was ‘Are you one of R. M. Hare.

us?’

2 US Dept of Health, Education and Welfare, Public

2 Walford (2001: 69) comments on the very negative Health Service and National Institute of Health

attitudes of teachers to research on UK independent (1971) The Institutional Guide to D.H.E.W. Policy

schools, the teachers feeling that researchers had on Protecting Human Subjects . DHEW Publication

been dishonest and had tricked them, looking only (NIH): 2 December, 72–102.

for salacious, sensational and negative data on the

3 As regards judging researchers’ behaviour, perhaps school (e.g. on bullying, drinking, drugs, gambling the only area of educational research where the

and homosexuality).

term ethical absolute can be unequivocally applied and where subsequent judgement is unquestionable

8 HISTORICAL AND DOCUMENTARY is that concerning researchers’ relationship with

RESEARCH

their data. Should they choose to abuse their data

1 By contrast, the historian of the modern period, for whatever reason, the behaviour is categorically

i.e. the nineteenth and twentieth centuries, is more wrong; no place here for moral relativism. For

often faced in the initial stages with the problem once a clear dichotomy is relevant: if there is

of selecting from too much material, both at the such a thing as clearly ethical behaviour, such

stage of analysis and writing. Here the two most abuse is clearly unethical. It can take the form

common criteria for such selection are the degree of of, first, falsifying data to support a preconceived,

significance to be attached to data, and the extent often favoured, hypothesis; second, manipulating

to which a specific detail may be considered typical data, often statistically, for the same reason

of the whole.

(or manipulating techniques used – deliberately

2 However, historians themselves usually reject such including leading questions, for example); third,

a direct application of their work and rarely indulge using data selectively, that is, ignoring or excluding

in it on the grounds that no two events or the bits that don’t fit one’s hypothesis; and fourth,

contextual circumstances, separated geographically going beyond the data, in other words, arriving

and temporally, can possibly be equated. As the at conclusions not warranted by them (or over-

popular sayings go, ‘History never repeats itself’ and interpreting them). But even malpractice as serious

so, ‘The only thing we can learn from History is that as these examples cannot be controlled by fiat:

we can learn nothing from History’. ethical injunctions would hardly be appropriate

3 Thomas, W. I. and Znaniecki, F. (1918) The in this context, let alone enforceable. The only

Polish Peasant in Europe and America. Chicago, answer (in the absence of professional monitoring)

IL: University of Chicago Press. For a fuller is for the researcher to have a moral code that is

discussion of the monumental work of Thomas and ‘rationally derived and intelligently applied’, to use

Znaniecki, see Plummer, K. (1983) Documents of the words of the philosopher, R. S. Peters, and to

Life: An Introduction to the Problems and Literature

be guided by it consistently. Moral competence, like of a Humanistic Method. London: Allen & Unwin, other competencies, can be learned. One way of

especially Chapter 3, The Making of a Method. acquiring it is to bring interrogative reflection to

See also Madge, J. (1963) The Origin of Scientific bear on one’s own code and practice, e.g. did I

Sociology. London: Tavistock. For a critique of provide suitable feedback, in the right amounts, to

Thomas and Znaniecki, see Riley, M. W. (1963) the right audiences, at the right time? In sum, ethical

Sociological Research 1: A Case Approach . New York: behaviour depends on the concurrence of ethical

Harcourt, Brace & World. thinking which in turn is based on fundamentally

4 Sikes, P., Measor, L. and Woods, P. (1985) Teacher thought-out principles. Readers wishing to take

Careers . Lewes: Falmer. See also Smith, L. M. (1987) the subject of data abuse further should read Peter

Kensington Revisited . Lewes: Falmer; Goodson, I. Medawar’s (1991) elegant and amusing essay,

and Walker, R. (1988) Putting life into educa- ‘Scientific fraud’, in D. Pike (ed.) The Threat and the

tional research. In R. R. Sherman and R. B. Webb Glory: Reflections on Science and Scientists . Oxford:

(eds) Qualitative Research in Education: Focus and Oxford University Press, and also Broad, W. and

Methods . Lewes: Falmer; Acker, S. (1989) Teachers,

NOTES 595

Notes

Gender and Careers. Lewes: Falmer; Blease, D. and Cohen, L. (1990) Coping with Computers: An Ethnographic Study in Primary Classrooms. London: Paul Chapman; Evetts, J. (1990) Women in Primary Teaching. London: Unwin Hyman; Goodson, I. (1990) The Making of Curriculum. Lewes: Falmer; Evetts, J. (1991) The experience of secondary headship selection: continuity and change. Educational Studies, 17 (3), 285–94; Sikes, P. and Troyna, B. (1991) True stories: a case study in the use of life histories in teacher education. Educational Review, 43 (1), 3–16; Winkley, D. (1995) Diplomats and Detectives: LEA Advisers and Work. London: Robert Royce.

9 SURVEYS, LONGITUDINAL, CROSS-SECTIONAL AND TREND STUDIES

1 There are several examples of surveys, including the following: Millan, R., Gallagher, M. and Ellis, R. (1993) Surveying adolescent worries: development of the ‘Things I Worry About’ scale. Pastoral Care in Education, 11 (1), 43–57; Boulton, M. J. (1997) Teachers’ views on bullying: definitions, attitudes and abilities to cope. British Journal of Educational Psychology, 67, 223–33; Cline, T. and Ertubney, C. (1997) The impact of gender on primary teachers’ evaluations of children’s difficulties in school. British Journal of Educational Psychology, 67, 447–56; Dosanjh, J. S. and Ghuman, P. A. S. (1997) Asian parents and English education – 20 years on: a study of two generations. Educational Studies, 23 (3), 459–72; Foskett, N. H. and Hesketh, A. J. (1997) Constructing choice in continuous and parallel markets: institutional and school leavers’ responses to the new post-16 marketplace. Oxford Review of Education, 23 (3), 299–319; Gallagher, T., McEwen, A. and Knip, D. (1997) Science education policy: a survey of the participation of sixth-form pupils in science and the subjects over a 10-year period, 1985–95. Research Papers in Education, 12 (2), 121–42; Jules, V. and Kutnick, P. (1997) Student perceptions of a good teacher: the gender perspective. British Journal of Educational Psychology, 67, 497–511; Borg, M. G. (1998) Secondary school teachers’ perceptions of pupils’ undesirable behaviours. British Journal of Educational Psychology, 68, 67–79; Papasolomoutos, C. and Christie, T. (1998) Using national surveys: a review of secondary analyses with special reference to schools. Educational Research, 40 (3), 295–310; Tatar, M. (1998) Teachers as significant others: gender differences in secondary school pupils’ perceptions. British Journal of Educational Psychology, 68, 255–68; Terry, A. A. (1998) Teachers as targets of bullying by their pupils: a study to investigate incidence. British Journal of Educational Psychology, 68, 255–68; Hall, K. and Nuttall, W. (1999) The relative importance of class size to infant teachers in England. British Educational Research Journal, 25 (2), 245–58; Rigby, K. (1999) Peer victimisation at school and the health of secondary school students. British Journal of Educational Psychology, 69, 95–104; Strand, S. (1999) Ethnic group, sex and economic disadvantage: associations with pupils’ educational progress from Baseline to the end of Key Stage 1. British Educational Research Journal, 25

Examples of different kinds of survey studies are as follows: Francis’s (1992) ‘true cohort’ study of patterns of reading development, following a group of 54 young children for two years at six-monthly intervals; Blatchford’s (1992) cohort, cross-sectional study of 133–175 children (two samples) and their attitudes to work at 11 years of age; a large-scale, cross-sectional study by Munn et al. (1990) into pupils’ perceptions of effective disciplinarians, with a sample size of 543; a trend/prediction study of school building requirements by a government department (Department of Education and Science 1977), identifying building and improvement needs based on estimated pupil populations from births during the decade 1976–86; a survey study by Belson (1975) of 1,425 teenage boys’ theft behaviour; a survey by Hannan and Newby (1992) of 787 student teachers (with a 46 per cent response rate) and their views on government proposals to increase the amount of time spent in schools during the training period.

2 Examples of longitudinal and cross-sectional studies include the following: Davies, J. and Brember, I. (1997) Monitoring reading standards in year 6: a 7-year cross-sectional study. British Educational Research Journal, 23 (5), 615–22; Preisler, G. M. and Ahström, M. (1997) Sign language for hard of hearing children – a hindrance or a benefit for their development? European Journal of Psychology of Education, 12 (4), 465–77; Busato, V. V., Prins, F. J., Elshant, J. J. and Hamaker, C. (1998) Learning styles: a cross-sectional and longitudinal study in higher education. British Journal of Educational Psychology, 68, 427–41; Davenport, E. C. Jr, Davison, M. L., Kuang, H., Ding, S., Kin, S-K. and Kwak, N. (1998) High school mathematics course-taking by gender and ethnicity. American Educational Research Journal, 35 (3), 497–514; Davies, J. and Brember, I. (1998) Standards in reading at key stage 1 – a cross-sectional study. Educational Research, 40 (2), 153–60; Marsh, H. W. and Yeung, A. S. (1998) Longitudinal structural equation models of academic


self-concept and achievement: gender differences in the development of math and English constructs. American Educational Research Journal, 35 (4), 705–38; Noack, P. (1998) School achievement and adolescents’ interactions with the fathers, mothers, and friends. European Journal of Psychology of Education, 13 (4), 503–13; Galton, M., Hargreaves, L., Comber, C., Wall, D. and Pell, T. (1999) Changes in patterns in teacher interaction in primary classrooms, 1976–1996. British Educational Research Journal, 25 (1), 23–37.

3 For further information on event-history analysis and hazard rates we refer readers to Allison (1984); Plewis (1985); Hakim (1987); Von Eye (1990); Rose and Sullivan (1993).

11 CASE STUDIES

1 For further examples of case studies see Woods, P. (1993) Managing marginality: teacher development through grounded life history. British Educational Research Journal, 19 (5), 447–88; Bates, I. and Dutson, J. (1995) A Bermuda triangle? A case study of the disappearance of competence-based vocational training policy in the context of practice. British Journal of Education and Work, 8 (2), 41–59; Jacklin, A. and Lacey, C. (1997) Gender integration in the infant classroom: a case study. British Educational Research Journal, 23 (5), 623–40.

12 EX POST FACTO RESEARCH

1 In Chapters 12 and 13 we adopt the symbols and conventions used in Campbell, D. T. and Stanley, J. C. (1963) Experimental and Quasi-Experimental Designs for Research on Teaching. Boston, MA: Houghton Mifflin. These are presented fully in Chapter 13.

2 For further information on logical fallacies, see Cohen, M. R. and Nagel, E. (1961) An Introduction to Logic and Scientific Method. London: Routledge & Kegan Paul. The example of the post hoc, ergo propter hoc fallacy given by the authors concerns sleeplessness, which may follow drinking coffee, but sleeplessness may not occur because coffee was drunk.

13 EXPERIMENTS, QUASI-EXPERIMENTS, SINGLE-CASE RESEARCH AND META-ANALYSIS

1 Questions have been raised about the authenticity of both definitions and explanations of the Hawthorne effect. See Diaper, G. (1990) The Hawthorne effect: a fresh examination. Educational Studies, 16 (3), 261–7.

2 Examples of experimental research can be seen in the following: Dugard, P. and Todman, J. (1995) Analysis of pre-test and post-test control group designs in educational research. Educational Psychology, 15 (2), 181–98; Bryant, P., Devine, M., Ledward, A. and Nunes, T. (1997) Spelling with apostrophes and understanding possession. British Journal of Educational Psychology, 67, 91–110; Hall, E., Hall, C. and Abaci, R. (1997) The effects of human relations training on reported teacher stress, pupil control ideology and locus of control. British Journal of Educational Psychology, 67, 483–96; Marcinkiewicz, H. R. and Clariana, R. B. (1997) The performance effects of headings within multi-choice tests. British Journal of Educational Psychology, 67, 111–17; Tones, K. (1997) Beyond the randomized controlled trial: a case for ‘judicial review’. Health Education Research, 12 (2), i–iv; Alfassi, M. (1998) Reading for meaning: the efficacy of reciprocal teaching in fostering reading comprehension in high school students in remedial reading classes. American Educational Research Journal, 35 (2), 309–22; Bijstra, J. O. and Jackson, S. (1998) Social skills training with early adolescents: effects on social skills, well-being, self-esteem and coping. European Journal of Psychology of Education, 13 (4), 569–83; Cline, T., Proto, A., Raval, P. D. and Paolo, T. (1998) The effects of brief exposure and of classroom teaching on attitudes children express towards facial disfigurement in peers. Educational Research, 40 (1), 55–68; Didierjean, A. and Cauzinille-Marmèche, E. (1998) Reasoning by analogy: is it schema-mediated or case-based? European Journal of Psychology of Education, 13 (3), 385–98; Overett, S. and Donald, D. (1998) Paired reading: effects of a parental involvement programme in a disadvantaged community in South Africa. British Journal of Educational Psychology, 68, 347–56; Sainsbury, M., Whetton, C., Mason, K. and Schagen, I. (1998) Fallback in attainment on transfer at age 11: evidence from the summer literacy schools evaluation. Educational Research, 40 (1), 73–81; Littleton, K., Ashman, H., Light, P., Artis, J., Roberts, T. and Oosterwegel, A. (1999) Gender, task contexts, and children’s performance on a computer-based task. European Journal of Psychology of Education, 14 (1), 129–39.

3 For a detailed discussion of the practical issues in educational experimentation, see Riecken and Boruch (1974); Bennett and Lumsdaine (1975); Evans (1978: Chapter 4).

4 An example of meta-analysis in educational research can be seen in Severiens, S. and ten Dam, G. (1998) A multilevel meta-analysis of gender differences in learning orientations. British Journal of Educational Psychology, 68, 595–618. The use of meta-analysis


is widespread, indeed the Cochrane Collaboration is a pioneer in this field, focusing on meta-analyses of randomized controlled trials (see Maynard and Chalmers 1997).

15 QUESTIONNAIRES

1 This is the approach used in Belbin’s (1981) celebrated work on the types of personalities in a management team.

16 INTERVIEWS

1 Examples of interviews in educational research include the following: Ferris, J. and Gerber, R. (1996) Mature-age students’ feelings of enjoying learning in a further education context. European Journal of Psychology of Education, 11 (1), 79–96; Carroll, S. and Walford, G. (1997) Parents’ responses to the school quasi-market. Research Papers in Education, 12 (1), 3–26; Cullen, K. (1997) Headteacher appraisal: a view from the inside. Research Papers in Education, 12 (2), 177–204; Cicognani, C. (1998) Parents’ educational styles and adolescent autonomy. European Journal of Psychology of Education, 13 (4), 485–502; Van Etten, S., Pressley, M., Freebern, G. and Echevarria, M. (1998) An interview study of college freshmen’s beliefs about their academic motivation. European Journal of Psychology of Education, 13 (1), 105–30; Robinson, P. and Smithers, A. (1999) Should the sexes be separated for secondary education – comparisons of single-sex and co-educational schools? Research Papers in Education, 14 (1), 23–49.

17 ACCOUNTS

1 For an example of concept mapping in educational research see Lawless, L., Smee, P. and O’Shea, T. (1998) Using concept sorting and concept mapping in business and public administration, and education: an overview. Educational Research, 40 (2), 219–35.

2 For further examples of discourse analysis, see Ramsden, C. and Reason, D. (1997) Conversation – discourse analysis in library and information services. Education for Information, 15 (4), 283–95; Butzkamm, W. (1998) Code-switching in a bilingual history lesson: the mother tongue as a conversational lubricant. Bilingual Education and Bilingualism, 1 (2), 81–99; Mercer, N., Wegerif, R. and Dawes, L. (1999) Children’s talk and the development of reasoning in the classroom. British Educational Research Journal, 25 (1), 95–111.

3 Cohen, L. (1993) Racism Awareness Materials in Initial Teacher Training. Report to the Leverhulme Trust, 11–19 New Fetter Lane, London, EC4A 1NR. The video scenarios are part of an inquiry into pupils’ perceptions of the behaviour of white teachers towards minority pupils in school. See Naylor, P. (1995) Adolescents’ perceptions of teacher racism. Unpublished PhD dissertation, Loughborough University of Technology.

18 OBSERVATION

1 For an example of time-sampling, see Childs, G. (1997) A concurrent validity study of teachers’ ratings for nominated ‘problem’ children. British Journal of Educational Psychology, 67, 457–74.

2 For an example of critical incidents, see Tripp, D. (1994) Teachers’ lives, critical incidents and professional practice. International Journal of Qualitative Studies in Education, 7 (1), 65–72.

3 For an example of an observational study, see Sideris, G. (1998) Direct classroom observation. Research in Education, 59, 19–28.

20 PERSONAL CONSTRUCTS

1 See also the following applications of personal construct theory to research on teachers and teacher groups: Shapiro, B. L. (1990) A collaborative approach to help novice science teachers reflect on changes in their construction of the role of the science teacher. Alberta Journal of Educational Research, 36 (3), 203–22; Cole, A. L. (1991) Personal theories of teaching: development in the formative years. Alberta Journal of Educational Research, 37 (2), 119–32; Corporal, A. H. (1991) Repertory grid research into cognitions of prospective primary school teachers. Teaching and Teacher Education, 36, 315–29; Lehrer, R. and Franke, M. L. (1992) Applying personal construct psychology to the study of teachers’ knowledge of fractions. Journal for Research in Mathematical Education, 23 (3), 223–41; Shaw, E. L. (1992) The influence of methods instruction on the beliefs of preservice elementary and secondary science teachers: preliminary comparative analyses. School Science and Mathematics, 92, 14–22.

2 For an example of personal constructs in educational research, see Morris, P. (1983) Teachers’ perceptions of their pupils: a Hong Kong case study. Research in Education, 29, 81–6; Derry, S. J. and Potts, M. K. (1998) How tutors model students: a study of personal constructs in adaptive tutoring. American Educational Research Journal, 35 (1),

21 ROLE-PLAYING

1 For an account of a wide range of role-play applications in psychotherapy, see Holmes, P.


and Karp, M. (1991) Psychodrama: Inspiration and Technique. London: Routledge.

24 QUANTITATIVE DATA ANALYSIS

1 Bynner and Stribley (1979: 242) present a useful table of alphas, which lists values of r_ii from 0.05 to 0.80 and the values of item numbers from 2 to 50. The values of alpha can then be interpolated. See Bynner, J. and Stribley, K. M. (eds.) (1979) Social Research: Principles and Procedures. London: Longman and the Open University Press, Table 19.1.

2 Muijs (2004) indicates that, in SPSS, one can find multicollinearity by looking at ‘collinearity diagnostics’ in the ‘Statistics’ command box, and in the collinearity statistics one should look at the ‘Tolerance’ column on the output. He indicates that values will vary from 0 to 1, and the higher the value the less is the collinearity, whereas a value close to 0 indicates that nearly all the variance in the variable is explained by the other variables in the model.

25 MULTIDIMENSIONAL MEASUREMENT AND FACTOR ANALYSIS

1 Robson (1993) suggests that as few as 100 can be used.

2 Self-serving bias refers to our propensity to accept responsibility for our successes, but to deny responsibility for our failures.

analysis. British Journal of Educational Psychology, 67, 323–38. For examples of research using correlation coefficients, see Lamb, S., Bibby, P., Wood, D. and Leyden, G. (1997) Communication skills, educational achievement and biographic characteristics of children with moderate learning difficulties. European Journal of Psychology of Education, 12 (4), 401–14; Goossens, L., Marcoen, A., van Hees, S. and van de Woestlijne, O. (1998) Attachment style and loneliness in adolescence. European Journal of Psychology of Education, 13 (4), 529–42; Okagaki, L. and Frensch, P. A. (1998) Parenting and school achievement: a multiethnic perspective. American Educational Research Journal,

4 Examples of multilevel modelling in educational research can be seen in the following: Fitz-Gibbon, C. T. (1991) Multilevel modelling in an indicator system. In S. W. Raudenbush and J. D. Willms (eds) Schools, Classrooms and Pupils: International Studies of Schooling from a Multilevel Perspective. San Diego, CA: Academic Press; Bell, J. F. (1996) Question choice in English literature examination. Oxford Review of Education, 23 (4), 447–58; Hill, P. W. and Rowe, K. J. (1996) Multilevel modelling in school effectiveness research. School Effectiveness and School Improvement, 7 (1), 1–34; Schagen, I. and Sainsbury, M. (1996) Multilevel analysis of the key stage 1 national curriculum data in 1995. Oxford Review of Education, 22 (3), 265–72; Croxford, L. (1997)

Participation in science subjects: the effect of the

3 For examples of research conducted using fac- Scottish curriculum framework. Research Papers in tor analysis, see McEneaney, J. E. and Sheridan, E.

Education , 12 (1), 69–89; Thomas, S., Sammons, M. (1996) A survey-based component for pro-

P., Mortimore, P. and Smees, R. (1997) Differen- gramme assessment in undergraduate pre-service

tial secondary school effectiveness: comparing the teacher education. Research in Education, 55,

performance of different pupil groups. British Educa- 49–61; Prosser, M. and Trigwell, K. (1997) Rela-