Replication Principles of theory-oriented research
Although one-shot tests of propositions can be valuable contribu- tions to theory development, results should always be treated with cau-
tion because of two reasons. First, erroneous conclusions might be drawn regarding the instances studied. Second, one instance or one
group of instances is not representative of the domain to which the proposition is assumed to be applicable.
With respect to the first problem, even though the study would have been set up according to scientific standards, the study might have
been flawed and the reported conclusion regarding the hypothesis rejection or confirmation might be erroneous. The usual checks on
the veracity of published empirical work – mainly through peer review
Box 3 Scientific realism
We define the survey as a study in which a a single population in the real life context is selected and b scores obtained from this population are analysed in a quantitative
manner.
We define the case study as a study in which a one case single case study or a small number of cases comparative case study in their real life context are selected and
b scores obtained from these cases are analysed in a qualitative manner. A conclusion based on one or a small number of observations cannot be generated
by statistical means and can be characterized as “qualitative”. This distinction between quantitative or statistical and qualitative methods of analysis does not imply a differ-
ence in epistemological grounding of these methods. Epistemologically, both approaches are the same in all relevant respects: “they attempt to develop logically consistent
theories, they derive observable implications from these theories, they test these impli- cations against empirical observations or measurements, and they use the results of
these tests to make inferences on how to modify the theories tested” George and Bennett 2005: 6. Basic assumptions underlying this epistemology are a that there are
phenomena that exist independent of our theory and that they have attributes that exist independent of our scientific observations, b that we can make these attributes
observable through scientific instruments even in those cases in which the relevant phenomena are not observable in the everyday sense of the word, and c that, subject
to a recognition that scientific methods are fallible and that most scientific knowledge is approximate, we are justified in accepting findings of scientists as true descriptions of
phenomena and, therefore, as facts that matter in practice. We adhere to this common sense conception of science, known as scientific realism, in this book. We see no reason to
ground this position in philosophical arguments or to defend it against alternative ones, such as constructivism and other positions that argue against the possibility of
approximately true knowledge of aspects of a really existing world.
and critical commentary – are not sufficient protection against these problems. Replicating the study in the same situation the same
instance for case study research, or the same population of instances for survey research can address this problem. If the results of such
replication studies are in agreement with the original findings, there is more confidence in the correctness of these findings. The bottom line
is that we cannot be sure of the correctness of any published test result of one-shot studies that have not been replicated. Unfortunately, many
theories in business research have not been put to such a test.
With respect to the second problem, even though the study was adequately conducted and the reported conclusion regarding the
hypothesis was correct, the test result might be different if the hypoth- esis were tested in another experimental situation for the experi-
ment, another instance for the case study, or another population for the survey. If in a one-shot study the hypothesis is confirmed, it
is tempting to assume that the test has shown that the proposition is supported in general for the entire domain covered by the theory
and, from it, to formulate practical advice for managers. In particular, survey outcomes might be thought to be generalizable to the whole
domain claimed by the theory, often because no distinction is made between the population from which a probability sample is
drawn and the larger theoretical domain from which the sample is not drawn see Box 6 “Domain, instance, case, population, sample, and
replication”.
Principles of statistical sampling do not apply to the choice of a popu- lation for a survey either for a first test or for replication. If we take
seriously the claim of a theory that it applies to a domain of instances such as other instances and populations, in other time periods, in
other organizations, in other geographical areas, in other experimen- tal contexts, etc., than the one in which the single test was conducted,
we must test it in many other situations that belong to this domain. In other words, we need a series of replications. Research outcomes that
have not been replicated, even those that are highly statistically sig- nificant, are “only speculative in nature and virtually meaningless and
useless” regarding the wider domain Hubbard et al., 1998: 244. A ser- ious problem in business research literature is that “negative” outcomes
rejections of hypotheses in single studies tend not to be published. The resulting selection bias in published results exacerbates the risk of
drawing conclusions about the correctness of a theory based on a one- shot confirmation. Similarly, if the hypothesis is rejected in the one situ-
ation that it is studied, it is usually concluded that something is wrong with the proposition. However, it might be that the single instance did
not belong to the domain for which the theory is correct. The only way to assess whether this is true is through replication.
We emphasize the need of a series of replications before it can be claimed that a theory is generalizable to the specified domain. Given
the fact that the knowledge base in business studies mainly consists of propositions that have been tested only once and have not been put to
replication tests, an effective and appropriate way to contribute to a specific theory is by replicating published one-shot studies in the same
and in other instances or sets of instances populations. The common emphasis of journals on “originality” as a criterion for good and pub-
lishable research may hamper the much-needed increase of the num- ber of replication studies.
With every new replication another study is added to the previous ones, creating a situation in which a theory has not been tested in a sin-
gle test but rather in serial tests. If one research project consists of a series of replications one could call that project a serial study. If such
replications make use of the same research strategy experiment, case study, or survey, which, however, is not a prerequisite, then we could
call the project a serial experiment, a serial survey, or a serial case study. The number of replications within one study reported in a single research
article depends only on time and money constraints. We do not con- sider this a methodological choice.
Publications on experimental research often present the results of a series of experiments that replicate one another. The situation is very
different in survey research. Most reports on survey research present the results of a single survey with a single population of instances in which a
series of hypotheses is tested of which some are confirmed and some are rejected in that study. Replication of survey results is rather rare. The situ-
ation for case study research is more diverse. Theory-testing case studies of which the explicit aim is to test a proposition are rare, and so is replication
of test results. Many writers on case study research, such as Yin 2003 who explicitly supports the idea of replication, recommend the “multiple case
study” as the preferred research design. However, case studies in business research that are presented as multiple case studies are most often a series
of case studies in which theories are built or applied, and not a series of replications of tests of propositions. We think that these differences in
replication habits between experiments, surveys, and case studies have practical origins. Experiments are considered “smaller” less costly in
terms of time and money than surveys and case studies, and replications in experimental research can be conducted relatively efficiently once the
infrastructure and preparations for an experimental setting have been established. These are practical reasons, not methodological ones.
We coined the term serial study for a study in which a series of replica-
tions is conducted. We prefer the term “serial” instead of “multiple” as in “multiple case study” because it makes explicit that every single replica-
tion can best be seen as a next test in a series of replications. In this per- spective, every replication study begins with an evaluation of the results of
all preceding tests of the proposition and a study is designed such that it maximally contributes to the current theoretical debate about the robust-
ness
of the proposition and the domain to which it applies. A temporal order is assumed. This approach implies that for every next replication a
new test situation is chosen or designed on theoretical grounds: in an experiment this might be another version of the experimental stimulus
or another category of experimental subjects; in a survey this might be another sample of a same population or a sample of a new population; in
a case study this might be another, carefully selected case. Replication thus involves most often also the selection of a new case in a case study or
a new population in a survey. The new case or new population is selected from the universe of cases or populations to which the theory is
supposed to be applicable. In actual practice, however, many multiple experiments and most multiple case studies do not use this replication
logic. These studies are often designed as parallel not serial.
Multiple experiments are usually pre-planned parallel replications of the same experiment with different samples of subjects. In multiple
case studies usually a number of cases are selected beforehand and the test is conducted in parallel i.e. parallel replication.
Box 4 Replication of survey results
Yin 2003: 47–48 clearly explains the difference between replica- tion logic which he applies to results of the case study and of
experiments and sampling logic which he applies to procedures within a survey. Although this is not mentioned by Yin, replication
logic also applies to survey results see for example Hubbard et al., 1998 and Davidsson, 2004. Davidsson 2004: 181–184
demonstrates the necessity of replication of survey results with the example of three studies of small owner–managers’ expected
consequences of growth, using the same measurement instruments for the same variables. Relations that were “significant” on the
5 per cent risk level in one study were absent in the other studies. Davidsson’s comment is that “we would be in serious error” if con-
clusions about the propositions were drawn based on only one of these studies p.183.