3.4.2. The testing instruments

tests where a contribution was expected to be given by individuals without prompting from peers. One way that such testing can be accomplished is described in detail by Stringer and Faraclas (1987:87–90), where three venues are used: participants are instructed in the first; individuals move from there to the second venue for testing; and they then proceed to the third venue to take part in supervised activities until all have been tested. When this procedure was tried in the adult program in Urat, it was unacceptable: there were not enough venues or testing personnel, and treating people in such an individual way seemed to cause stress. On the first occasion, the testing was carried out in the one available venue, the communal church building, with three testers. Participants were instructed to wait their turn and then leave after being tested. In the course of the testing, one young woman became so stressed she began crying and others showed signs of anxiety, while those who had been tested were reluctant to leave. The importance of eliminating anxiety from literacy learning is discussed by Downing (1982). As the program progressed, there was less tension shown when giving individual responses. At this stage in the program, however, it was obvious that another, more comfortable way had to be found to collect assessment data for comparison.

In the test on the first occasion, a number of procedures were included to make the students feel at ease. The test was explained by the test supervisor so that the participants would know what to expect. All students were given a piece of paper with a sentence printed on it, on which they were asked to write their names. This was a familiar activity which all could accomplish. Students were then asked to read to one of the testers: first a new sentence, and then the one they had been given. All contributions were recorded on audio-tape. Next, the tester asked the student to point out specific items, that is, two words and three syllables from the sentences read. Finally, the tester chose a word from a list of known key words and asked the student to write it from memory and then to create and write a sentence including the word. In giving each student a different sentence, it was expected that they would try to read it beforehand to build confidence before being asked to read it to the tester. This worked well for the first few, but those who could not read asked their peers to help them read their sentences correctly. The sentences were changed for these participants and the test went on, but much closer supervision would be needed if the same procedure were to be used in future tests.

One way to avoid some of these problems was to consider collecting more descriptive and interpretive data for analysis. Such data would need to be collected by each teacher in the course of the teaching program. Considering the short training, however, and the lack of materials and of places to keep contributions from each student, such data would have been difficult to collect and analyse. At the end of the program, the teachers were certainly aware of the progress of each of their students, but to require informal, interpretive assessments to be recorded each week for each student seemed too much to expect from volunteer teachers in the circumstances in which they worked. It would also detract from the systematic nature of the comparison. The original testing format was therefore continued, with a number of changes to help relieve stress.
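For reference, the sequence used on that first occasion can be summarised programmatically. The sketch below is illustrative only: the step wording is paraphrased from the description above, and the list structure is an assumption rather than part of the study's instruments.

# A paraphrased summary of the first test's sequence; the structure is
# illustrative, not part of the study's instruments.
FIRST_TEST_STEPS = [
    "Supervisor explains the test so participants know what to expect",
    "Student writes their name on a sheet printed with a sentence",
    "Student reads a new sentence aloud to a tester (audio-taped)",
    "Student reads the sentence on the sheet they were given (audio-taped)",
    "Student points out two words and three syllables from the sentences read",
    "Student writes a known key word, chosen by the tester, from memory",
    "Student creates and writes a sentence including that key word",
]

for number, step in enumerate(FIRST_TEST_STEPS, start=1):
    print(f"{number}. {step}")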
In the test format there were four components: recognition of words and elements within words, reading, comprehension, and writing (see Appendix E for the final test format). As the classes progressed, the learners became less anxious about being asked to give individual contributions. At the beginning of each test, the participants were given more detail on what to expect, to make them feel more at ease. For each test there were two or three trained testers to interact with individual students. The first two items tested were identification of words and of elements within words, to give each person an opportunity to succeed before the reading section. Although it was less accurate, it was more convenient for the testers to have the students point to the item to be identified rather than to prepare multiple copies on which each item could be marked in some way. To help alleviate tension, any negative feedback for an incorrect response, as well as any indication of the correct response, was discouraged.

To control for comparability and continuity between groups, it would have been preferable for the researcher to do all of the testing, but the physical and cultural situation made that impossible. With the help of community volunteers the process became more manageable: the teachers were able to continue with their classes while individuals went to nearby village houses for testing. Some difficulties occurred when the testers had little or no experience with literacy testing. Although each tester practised in trial situations before the tests, problems still arose with the use of the cassette recorders, with the format and language of testing, and with prompting, which is acceptable in such a cultural setting. The underlying cultural setting of group-oriented activities made it more acceptable for students to return to the classroom after being tested.

To help control for a contamination effect through telling and copying, two or three similar excerpts for reading were prepared. These excerpts were short, covered similar content in different contexts, and contained a similar number of words of equal difficulty. These words included the same elements but with a variety of word formations and syntax. Testers rotated the texts so that students did not know which actual text to expect. In the final test, all students were expected to read from four longer excerpts of texts, three of which had two different but similar sections for rotation.

There was some misunderstanding in communication when testing in both of the programs. In the Tok Pisin situation, the second language was not always understood. The Urat program proved more difficult because it was necessary to translate the test instructions into Urat and check them for accuracy. For the early tests in Urat, even after using different drafts in trials, there were still misunderstandings. One such misunderstanding became clear when the responses to the comprehension questions continued to be other than expected. During testing, after the participant re-told what he or she thought had happened in the passage read, two questions were asked: “What do you think might have happened before that?” and “What do you think might have happened after that?” It was necessary to make the questions generic because of the different texts in the instrument and the different responses. When the Urat translation of the test format was translated back into Tok Pisin, which the researcher used to check the sense, the meaning was acceptable. But it was not until an English speaker translated the questions back into English that the misunderstanding became clear. The two questions had been translated as: “Can you remember (think, sort out) what comes first?” and “Can you remember what comes at the end?” Thus the responses were correct as repeats of the first and last instances in the passage, but not as the expected projection of thought beyond the passage read, which would have given clear evidence of understanding.
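Returning to the rotation of comparable excerpts described above, the following sketch shows one way such texts can be cycled across students in testing order. It is a minimal illustration under assumed names: the excerpt labels, student identifiers, and the use of a simple round-robin are illustrative assumptions, not details of the study.

from itertools import cycle

def assign_texts(students, texts):
    """Cycle comparable excerpts across students in testing order so
    that a student cannot predict which text to expect from watching
    an earlier peer."""
    rotation = cycle(texts)
    return {student: next(rotation) for student in students}

# Hypothetical labels: three comparable excerpts, five students.
assignments = assign_texts(
    ["S01", "S02", "S03", "S04", "S05"],
    ["Excerpt A", "Excerpt B", "Excerpt C"],
)
print(assignments)
# {'S01': 'Excerpt A', 'S02': 'Excerpt B', 'S03': 'Excerpt C',
#  'S04': 'Excerpt A', 'S05': 'Excerpt B'}

A simple round-robin of this kind also spreads each excerpt roughly evenly across the class, which keeps the rotated texts comparable in how often each is used.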
Since there were two different instructional methods involved, the texts used on the early testing occasions were prepared according to the constraints of the material taught in the primer series. The degree of difficulty depended on the amount of material already taught. For the reading section of the first test in the Urat program, a number of sentences were prepared by the linguist, who followed constraints dictated by the overlap of content taught in the two primer series. Examples of such sentences are: Ta sisipe metapa. ‘She will taste the breadfruit’; and Ti wasme wi wat. ‘She quit the game and came’. Similar texts were prepared for the tests in the Tok Pisin program by the researcher.

The test on the second occasion was based on the primers: for Urat the words, syllables, and texts were read from the primer pages, but for Tok Pisin separate texts were prepared. The purpose of using the primers in Urat was to ascertain whether the teachers were teaching the content adequately. For Urat there were three comparable sets of readings based on common vocabulary from the Word-Building Track primer and the Gudschinsky primer. These sets were rotated, one for each student. For Tok Pisin there were three comparable texts prepared. In addition, students were asked to read one new sentence and to write a dictated sentence. Since the portions to be read on occasions 1 and 2 were based on known material and were not extensive, answers to comprehension questions were not relevant.

The third and fourth tests were fuller versions of the first test (see Appendix E). For the third test, longer portions were prepared and rotated between participants. The tester read the first sentences to introduce the topic and to help the learners be at ease and read more confidently. The learners were expected to read the final sentences. The final test included four different excerpts of texts, selected on the basis of familiar and unfamiliar material. The first selection came from a familiar story, a predictable legend which had been read in class. The second selection came from a story about an accident to a man in the village. The third selection was an excerpt from a legend, with unfamiliar language and circumstances. Similarly, the fourth selection had some unfamiliar language with unpredictable content.

The data collected from the learners were scored in a similar way for all of the tests. The appropriateness of each variable for each particular test was governed by the scope of material in the test and the degree of expectation of skills learned. In the following section, a description of the variables and scoring procedures is given.

3.4.3. The dependent variables and scoring procedures

For the scoring procedure for each test, the variables were divided into two components: reading and writing. In the reading component there were two scoring sections: general reading variables and specific reading variables, each with five variable divisions. There were also two scoring sections in the writing component: mechanics of writing and meaning in writing, with three and five divisions of variables respectively. The reading variables are discussed first.
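As a minimal sketch of this structure (the layout is an illustration, not the study's scoring sheet; the general reading variables are those listed in the subsection below, and the remaining sections appear by their division counts alone):

# A minimal sketch of the scoring structure described above. Only the
# general reading variables are named in this section, so the other
# sections are represented by their number of divisions alone.
SCORE_SHEET = {
    "reading": {
        "general": [
            "Recognition of Elements",         # letters, syllables, words
            "Engaging of the Text",
            "Reading Time",
            "Proportion of Correct Syllables",
            "Comprehension",
        ],
        "specific": 5,  # five variable divisions, described later
    },
    "writing": {
        "mechanics": 3,  # three divisions of variables
        "meaning": 5,    # five divisions of variables
    },
}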

3.4.3.1. General reading variables

There were five divisions of general reading variables:

1. Recognition of Elements (letters, syllables, words)
2. Engaging of the Text
3. Reading Time
4. Proportion of Correct Syllables
5. Comprehension

Recognition of Elements includes a variable set of Recognition of Letters, Syllables, and Words. Relevant comments for each variable are now presented, to give an understanding of the scoring procedures.

Variable: Recognition of elements—1. Words
Each student was first asked to identify a word in a story with a named vowel in it, and a point was awarded if the attempt was correct. The student was then asked to read the word identified. A problem arose when the word was incorrectly identified: a point was awarded if the student could read the word correctly even if it did not contain the vowel asked for. A further point was awarded for recognition of a named word. (A sketch of this point-awarding rule appears at the end of this section.)

Variable: Recognition of elements—2. Syllables
This variable was a straightforward identification of named syllables within words. A point was awarded when the student pointed to the correct syllable.

Variable: Recognition of elements—3. Letters
In the letter identification assessment, students were asked to point to two words in a story where the words (1) began with the same letter, and (2) ended with the same letter. In the former case the students were asked to read the syllable in each word which contained the letter. It must be pointed out that this variable was not scored for test one, and was not included in test two, because the students had problems in understanding the requirements of this assessment. In the two methods being taught, the basic component of instruction in the primers was the syllable, and students were not required to sound out letters. Phonemes were taught by contrasting minimal differences between syllables, not as isolated letters, so at this early stage few learners understood what was expected of them. When scoring tests three and four, although some students did indicate words where only the letters were identical, either identical letters or identical syllables were accepted as correct. The problem was compounded by the difficulty of finding the right terminology in both languages used for testing. Some students in the Tok Pisin program did not understand the concepts of “first” and “last,” and in the Urat program there was some misunderstanding due to the translation difficulty. Consequently, to standardise the assessment, scores were not recorded for occasions 1 and 2 on this variable.

Variable: Engaging the text
Students who made some attempt to read, and who showed in the taped attempt that they could read some words of the selection, were given a point for engaging the text. Students who created a text that had no relationship to the written text were considered as not engaging the text. In the final test, students were expected to read excerpts of four different stories. The tester read the heading of each story to give some orientation to the topic, then asked the participant to read a selected portion. Some students continually repeated the heading, while others created a text about the topic that had no relationship to the written form. These were deemed not to have engaged the text.
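As a closing illustration, the point-awarding rules for word recognition and for engaging the text can be sketched as follows. The function names, signatures, and the reading of “some words” as a simple count are assumptions made for illustration; they are not the study's instrument.

def score_word_recognition(contains_named_vowel: bool,
                           word_read_correctly: bool,
                           named_word_recognised: bool) -> int:
    """One point for identifying a word containing the named vowel;
    one point for reading the identified word correctly (awarded even
    when the word did not contain the vowel asked for); one further
    point for recognising a named word."""
    return sum([contains_named_vowel, word_read_correctly,
                named_word_recognised])

def engaged_text(attempt_made: bool, words_read_from_selection: int) -> bool:
    """A student who attempted the selection and demonstrably read some
    of its words is credited with engaging the text; a student who
    produced an unrelated text is not."""
    return attempt_made and words_read_from_selection > 0

# Example: a student identifies the wrong word but reads it correctly,
# then recognises the named word, and engages the text.
print(score_word_recognition(False, True, True))  # 2
print(engaged_text(True, 3))                      # True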