
themselves entirely to direct testing and will always include an indirect element in their tests. Of course, to obtain diagnostic information on underlying abilities, such as control of particular grammatical structures, indirect testing is called for.

3.3 Validity

When we discuss validity, we cannot avoid the question of content validity. A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. It is obvious that a grammar test, for instance, must be made up of items testing knowledge or control of grammar. But this in itself does not ensure content validity. The test would have content validity only if it included a proper sample of the relevant structures. Just what the relevant structures are will depend, of course, upon the purpose of the test. We would not expect an achievement test for intermediate learners to contain just the same set of structures as one for advanced learners.

In order to judge whether or not a test has content validity, we need a specification of the skills, structures, etc. that it is meant to cover. Such a specification should be made at a very early stage in test construction. It is not to be expected that everything in the specification will always appear in the test; there may simply be too many things for all of them to appear in a single test. But it will provide the test constructor with the basis for making a principled selection of elements for inclusion in the test. A comparison of test specification and test content is the basis for judgments as to content validity. Ideally these judgments should be made by people who are familiar with language teaching and testing but who are not directly concerned with the production of the test in question.

Content validity is important for two reasons. First, the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure. A test in which major areas identified in the specification are under-represented, or not represented at all, is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect: areas which are not tested are likely to become areas ignored in teaching and learning. Too often the content of tests is determined by what is easy to test rather than by what is important to test. The best safeguard against this is to write full test specifications and to ensure that the test content is a fair reflection of these.

Criterion-related validity is another approach to test validity: we see how far results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated. There are essentially two kinds of criterion-related validity: concurrent validity and predictive validity. Concurrent validity is established when the test and the criterion are administered at about the same time. To exemplify this kind of validation in achievement testing, let us consider a situation where course objectives call for an oral component as part of the final achievement test. These objectives may list a large number of 'functions' which students are expected to perform orally, and to test all of them might take 45 minutes for each student. This could well be impractical. Perhaps it is felt that only ten minutes can be devoted to each student for the oral component. The question then arises: can such a ten-minute session give a sufficiently accurate estimate of the student's ability with respect to the functions specified in the course objectives? Is it, in other words, a valid measure?
From the point of view of content validity, this will depend on how many of the functions are tested in the component, and how representative they are of the complete set of functions included in the objectives. Every effort should be made when designing the oral component to give it content validity. Once this has been done, however, we can go further: we can attempt to establish the concurrent validity of the component.

To do this, we should choose at random a sample of all the students taking the test. These students would then be subjected to the full 45-minute oral component necessary for coverage of all the functions, using perhaps four scorers to ensure reliable scoring. This would be the criterion test against which the shorter test would be judged. The students' scores on the full test would be compared with the ones they obtained on the ten-minute session, which would have been conducted and scored in the usual way, without knowledge of their performance on the longer version. If the comparison between the two sets of scores reveals a high level of agreement, then the shorter version of the oral component may be considered valid, inasmuch as it gives results similar to those obtained with the longer version. If, on the other hand, the two sets of scores show little agreement, the shorter version cannot be considered valid; it cannot be used as a dependable measure of achievement with respect to the functions specified in the objectives. Of course, if ten minutes really is all that can be spared for each student, then the oral component may be included for the contribution that it makes to the assessment of students' overall achievement and for its backwash effect. But it cannot be regarded as an accurate measure in itself.

A test is said to have face validity if it looks as if it measures what it is supposed to measure. For example, a test which pretended to measure pronunciation ability but which did not require the candidate to speak (and there have been some) might be thought to lack face validity. This would be true even if the test's construct and criterion-related validity could be demonstrated. Face validity is hardly a scientific concept, yet it is very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers. It may simply not be used; and if it is used, the candidates' reaction to it may mean that they do not perform on it in a way that truly reflects their ability. Novel techniques, particularly those which provide indirect measures, have to be introduced slowly, with care, and with convincing explanations.

What use is the reader to make of the notion of validity? First, every effort should be made in constructing tests to ensure content validity. Where possible, the tests should be validated empirically against some criterion. Particularly where it is intended to use indirect testing, reference should be made to the research literature to confirm that measurement of the relevant underlying constructs has been demonstrated using the testing techniques that are to be used; this may often result in disappointment, which is another reason for favoring direct testing. Any published test should supply details of its validation; without such details, the test cannot be considered valid.
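To make the phrase 'a high level of agreement' concrete, the sketch below shows one common way of quantifying it: computing a correlation (validity) coefficient between the two sets of scores. It is only an illustration in Python; the student scores, the sample size, and all names in it are invented for the example, not taken from any actual validation study.

    # Illustrative sketch: estimating concurrent validity as the correlation
    # between scores on the short (ten-minute) oral component and the full
    # 45-minute criterion test. All scores below are hypothetical.
    from math import sqrt

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length score lists."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
        sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
        return cov / (sd_x * sd_y)

    # Hypothetical scores for a random sample of ten students.
    short_test = [12, 15, 9, 18, 14, 11, 16, 10, 13, 17]   # ten-minute component
    full_test  = [48, 60, 35, 72, 55, 42, 66, 40, 52, 70]  # 45-minute criterion

    r = pearson_r(short_test, full_test)
    print(f"validity coefficient r = {r:.2f}")
    # A coefficient close to 1 suggests the short component ranks students much
    # as the criterion does; a low coefficient would argue against treating the
    # ten-minute session as a valid substitute for the full test.

A correlation coefficient of this kind is how validity coefficients are conventionally reported; what counts as an acceptably high value is a matter of judgment for the test developer.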

3.4 Reliability