indicators. However, to be valid, a test should reflect the skills or behaviors being assessed. There are five types of validity that determine whether or not a test is valid, namely face validity, content validity, criterion-related evidence, consequential validity, and construct validity.
a. Face Validity
Gronlund (1998) states that a test is considered to have face validity if the students view the test as fair, pertinent, and useful for improving learning (as cited in Brown, 2004, p. 26). Face validity itself refers to whether the test looks good and obviously appears to measure the skills it is intended to measure. Furthermore, according to Brown (2004), the criteria of a test with face validity are that the test is well constructed, the test has an appropriate time allotment, the items are clear and simple, the directions are clear, the tasks meet content validity, and the difficulty level presents a reasonable challenge (p. 27).
b. Content Validity
According to the APA (1954), content validity refers to the degree to which the content of assessment items reflects the content domain of interest (as cited in Miller, 2003, p. 2). Shepard (1993) adds that content validity is an indicator for making inferences from the results; it is "evidence-in-waiting" (as cited in Miller, 2003, p. 5). This means that whenever a test is valid in its content, the items of the test represent the skills or behaviors to be measured, which is the basis for evaluating achievement tests.
Therefore, the scores of the test can effectively be used as meaningful indicators of students' competence. For instance, a test of reading skills is considered a valid reading test if it measures reading skill and nothing else; it is not a valid test of speaking or vocabulary because it does not test speaking or vocabulary. However, Seif (2004) claims that this does not mean all the educational objectives of a particular course must be included in the test. For the sake of test practicality, test designers should compose a set of questions that can represent the achievement of the set educational goals. Seif (2004) claims that content validity is one of the essential parts of composing a test (as cited in Jandhagi and Shateria, 2008, p. 2). When a test is not valid in its content, there are two possible outcomes. First, students are unable to perform the needed skills that are not included in the test. Second, there may be some inappropriate questions which students are not able to answer. Therefore, the test tasks should conform to the test specifications in the blueprints. This is similar to what Seif (2004) says: evaluating the content validity of a test can be carried out by matching the sample of test questions to the test instructions (as cited in Jandhagi and Shateria, 2008, p. 2). Crocker and Algina (1986) add that this 'matching method' effectively ensures validity (as cited in Miller, 2003, p. 12).
According to Bachman and Palmer (1996), a blueprint is a complete plan providing the characteristics for developing an entire test (p. 90). It contains task specifications for all types of tasks to be included in a particular test. The blueprints are evaluation tools to check whether or not the test items conform to the test specifications stated in the blueprints. Brown (2004) states
that test specifications include the general outlines of the test and the test tasks (p. 50). The test specifications refer to a certain curriculum and consist of only the general outlines of the materials and skills to be tested, since the test designers must consider test practicality.
c. Criterion-Related Evidence