The purposes for which evaluations are commissioned

Evaluation in the social sciences is a form of research distinguished from other forms by the fact that it is commissioned by a sponsor. The evaluators work under contract to pursue objectives that relate either to a funded development programme or to some phenomenon of human activity observable in present-day society. Evaluation therefore differs from other research in its inescapably politicised nature, which derives from the need to manage the inter-relationships between the sponsor and other stakeholders and to feed into the process of decision-making.

Although it may appear that the purpose of evaluations is obvious, they are in fact commissioned for many different reasons. They are always concerned with judging the value of what has been achieved, but the basis for judgements, the uses to which these judgements are put, and the range of stakeholders who have a right to know the outcomes of the evaluation, may vary considerably. An important part of the evaluator’s role is to discover the sponsor’s purposes, both explicit and hidden, and take these into account when designing and carrying out the evaluation.

Ideally, an evaluation could be said to be primarily educative. Its purpose is to find out what has gone well and what not so well, in order to learn from this. This educative purpose may focus mainly on the future, by drawing out lessons at the end of the programme and making them available to inform the design and conduct of future programmes. Or it may focus more upon the current programme, drawing out lessons during the course of the work and feeding these back to those concerned so that the work of the programme can be continuously improved while it is on-going. The former are sometimes called ‘summative evaluations’ and the latter ‘formative evaluations’. In practice, most evaluations have both summative and formative elements, which are inter-related, but they usually have a stronger orientation to one than the other. The notion of an educative evaluation fits well with the concept of a ‘learning society’ but sponsors often find it an unexpected approach. House et al. (1996, p. 139), in a study of evaluation practice in the US National Science Foundation, identify the importance of ‘establishing an evaluation culture’. I would argue that educative evaluation is essential to a genuine learning society (DfEE 1997; NCIHE 1997), but is antipathetic to the ‘technocratic model’ that Coffield argues underpins the rhetoric about the learning society in many European and UK policies (Coffield 1998).

Evaluations nearly always have a second purpose: to satisfy the need for accountability for the spending of the sponsor’s (often the taxpayers’) money. Has the money been spent wisely? Have the outcomes been sufficient in number and of sufficiently high quality? Can the programme be said to have been cost effective? The sub-agenda here is who should get the credit if things go well, or bear the blame if they do not. Accountability is always layered in a hierarchy of responsibility: the programme director and team(s) are held accountable for their work on the programme; the officials who hired them are held accountable for choosing them for the job and keeping them on track; politicians, in their turn, are held accountable for the actions of their officials; and ultimately, accountability in a democratic state is to the electorate, and politicians who are seen to preside over the waste of public money are unlikely to be re-elected. In the UK, the Audit Commission carries out evaluations wholly to serve the need for public accountability. The focus of their work is entirely upon value for money, and they claim an authority for their findings that is grounded in facts and figures, without taking account of the value-laden assumptions that have determined which facts and figures should be collected (see for example Audit Commission (1990), which presents a case for closing small rural schools based on calculations that do not take account of the impact of the closure of its school upon a rural community).

These two main purposes of evaluation are, therefore, in tension with one another. In order to learn it is important to acknowledge failures, as well as successes, and explore the reasons for both, openly and honestly, but this will be difficult to do if individuals feel that their future careers will suffer as a result. Programme directors may not get further funding; officials may be side-lined or passed over for promotion; politicians may lose their seats in government. In practice, a lot depends on the prevailing culture set by those in positions of power, and the way they handle publicity and the media. If there is a strong blame culture at the top this will be passed down the line and make it difficult to learn from the evaluation. If, however, those at the top have trust in their officials and encourage open public debate, they can set the tone for a learning culture that will permeate through all the levels. This analysis is too simple, since a strong personality in a powerful position can, to an extent, act as a protective layer against blame and enable a learning culture to grow among subordinates. There is also a role for an evaluator in mediating between sponsors and the programme team(s), providing explanations for problems, often in terms of the contextual factors such as policy decisions that have constrained the possibilities for action. MacDonald developed a model of democratic evaluation in which equal rights are accorded by the evaluator to all the participants (MacDonald 1974). In democratic evaluation, successes and failures are explored in relation to the policies and structures put in place by the sponsors rather than simply in terms of the programme’s outcomes. MacDonald found that this was frequently a surprise to those who had commissioned the evaluation, who assumed an alternative model of ‘autocratic’ evaluation in which those who hired the evaluator were themselves above judgement.

Another frequent reason for evaluation is largely symbolic. It may be a condition of setting up a programme that a proportion of the funds should be set aside for evaluation. In this case, the sponsors may not be greatly concerned with the evaluation’s outcomes, provided there is a report and the evaluation can be said to have been commissioned and carried out. Alternatively, evaluations may be commissioned in order to promote the programme, and sponsors may expect the evaluator to engage in advocacy on behalf of the programme. Stake (1998) provides a case study of work in the USA, which goes some way to explaining the differences in values and motivations which lead to this kind of grave misunderstanding and the ethical implications for evaluators.

It is common for evaluators to be led to believe that an evaluation has been commissioned to feed into decision-making when, in fact, the decision-making time-frame is too short to wait for the evaluation report. If this is the case, interim reports or meetings with the sponsor part-way through the course of the evaluation may be much more influential than the final report. An additional complication is that policies often begin to change and shift after the evaluation has been commissioned and begun work; and evaluators are often pressured to introduce new elements or shift the focus of the study. Although this can normally be resisted by citing the contract, many evaluators prefer to take on board the new focus in order to produce a report that is more likely to have an impact on future policy.

There are, in broad terms, two kinds of question which can be the focus of evaluations:

1 Questions which seek factual information. These questions start with words like ‘how much/many … ?’ and ‘how quickly … ?’ In terms of hardware and software development this kind of question includes: ‘does it function … ?’ and ‘how robust is it … ?’ It also includes questions asking people for their opinions, in order to find out how many people hold one opinion and how many hold another. Such opinion questions involve administering structured questionnaires (e.g. using Likert scales) to a representative sample, which can be compared with a ‘control group’. The resulting quantitative data can be analysed statistically (using probability theory) and the outcomes treated as facts, subject to a margin of error. These questions are designed to provide dependable, factual data to inform judgement. These data are sometimes called ‘hard data’. (A minimal sketch illustrating both kinds of data is given after this list.)

2 Questions which seek reasons and explanations. These questions start with words like ‘why … ?’ and ‘how … ?’ For example, ‘how effective was this piece of software?’ and ‘what strategies for collaboration proved most successful and why?’ These questions explore issues of quality and seek informed judgements and in-depth explanations. They are designed to generate data which can form the basis for explanatory theories. Data are normally collected using methods such as observation and interviewing or open-ended questionnaires. These data can be analysed using methods of qualitative data analysis, which will involve coding the data in some form and then interpreting the outcomes. Theorists of qualitative methods sometimes use terms such as ‘theoretical sensitivity’ (Strauss and Corbin 1990) to describe the most highly developed interpretative skills of the qualitative researcher or evaluator. These data are sometimes called ‘soft data’.
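To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two kinds of data described in the list above. All of the scores, interview excerpts and codes are invented for the purpose of illustration, and the statistics are deliberately simple; a real evaluation would use properly designed instruments, sampling and analysis.

```python
# Illustrative sketch only: invented Likert scores and invented interview
# excerpts, used to contrast 'hard' and 'soft' evaluation data.
import math
from collections import Counter

# 1. 'Hard' data: Likert-scale responses (1 = strongly disagree ... 5 = strongly
#    agree) from a programme group and a comparison ('control') group.
programme_group = [4, 5, 3, 4, 4, 5, 2, 4, 5, 3]
control_group = [3, 2, 3, 4, 2, 3, 3, 2, 4, 3]

def mean_with_margin(scores, z=1.96):
    """Mean and an approximate 95 per cent margin of error (normal approximation)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, z * math.sqrt(variance / n)

for label, scores in (("programme", programme_group), ("control", control_group)):
    mean, margin = mean_with_margin(scores)
    print(f"{label} group: mean {mean:.2f} +/- {margin:.2f}")

# 2. 'Soft' data: interview excerpts coded by the evaluator. The codes are
#    invented examples of the categories an evaluator might develop while
#    interpreting transcripts.
coded_excerpts = [
    ("I only got the software working because a colleague helped me", "peer support"),
    ("I was afraid I would break something expensive", "anxiety"),
    ("Once I saw what it could do for my class I kept experimenting", "perceived relevance"),
    ("The training session went far too fast for me", "pace of training"),
    ("My colleague and I planned the lessons together", "peer support"),
]
print("Frequency of qualitative codes:", dict(Counter(code for _, code in coded_excerpts)))
# Counting codes is only a starting point: it is the evaluator's interpretation
# of what the coded excerpts mean that turns 'soft' data into explanations.
```

The point of the sketch is simply that the first kind of question yields numbers that can be summarised and compared, while the second yields categories and meanings that have to be interpreted; neither, on its own, tells the whole story.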

Learning cultures which place priority on the educative purposes of evaluation give a higher priority to the second kind of question, whereas accountability cultures give a higher priority to the first. The terms ‘hard data’ and ‘soft data’ are primarily used by those who favour quantitative methods as a means of undermining the credibility of qualitative methods – who, after all, wants to base judgements on something ‘soft’? If possible, evaluators should avoid relying completely on either one kind of data or the other – both quantitative and qualitative data are essential so that the evaluation can produce reliable information and robust explanatory theories. Arguably, without the latter an evaluation cannot be educative.

For various reasons politicians and policy-makers often place higher reliance upon quantitative methods. This is partly because there is always pressure on them to be able to claim certainty in justifying how they have spent public money, and they can do this more easily with research based on the methods of the natural sciences because they command a high level of public confidence. Quantitative measurements of empirical studies have enabled the amazing technological progress that has led to the production of CIT resources; and statistical analyses of large bodies of data collected in controlled, experimental conditions have produced knowledge about illness and the human body which has revolutionised medical practice. However, a group of human beings interacting with one another to work collaboratively or learn new skills – say for example, a group of computer users – is subject to irrational responses and unpredictable behaviour in a way that the human body, as a functioning system, is not. The social sciences, of which evaluation is a sub-set, have adapted quantitative statistical methods to apply to group behaviour and human interactions, but the results of this kind of analysis are much less reliable than is often claimed (see House 1980, p. 71 who refers to the conclusions of an extensive study by Cronbach). It is particularly important to remember this when evaluating CIT initiatives because technologists may tend to privilege quantitative methods with which they are familiar in their own research. Paradoxically, qualitative methods may be essential to enable technologists to understand the complex emotional and cultural factors which make it difficult for non-technologists to become confident and competent users of technology (see Chapter 4).

Most evaluators now accept the need to use mixed methods in evaluating CIT initiatives, combining the collection of both quantitative and qualitative data (e.g. Greene and Caracelli 1997). However, the point is that many policy-makers and sponsors continue to have an unthinking preference for quantification and measurement. Evaluators need to continue to argue the case for mixed methods very persuasively.
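As a purely hypothetical illustration of what combining the two kinds of data can look like when findings are reported, the following sketch pairs an invented quantitative indicator with the invented qualitative themes offered to explain it; it is a sketch of the idea of a mixed-methods finding, not a prescription.

```python
# Hypothetical mixed-methods finding: an invented quantitative indicator
# reported alongside the invented qualitative themes that help to explain it.
finding = {
    "indicator": "teachers reporting weekly classroom use of the software",
    "baseline_percent": 22,
    "end_of_programme_percent": 54,
    "explanatory_themes": [
        "peer support where two or more teachers from a school trained together",
        "anxiety about equipment failure in schools with no technical support",
    ],
}

print(f"{finding['indicator']}: "
      f"{finding['baseline_percent']}% -> {finding['end_of_programme_percent']}%")
print("Qualitative themes offered as explanation:")
for theme in finding["explanatory_themes"]:
    print(" -", theme)
```

The design point is that the number on its own invites a judgement of value for money, while the themes supply the explanatory theory that makes the evaluation educative as well as accountable.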