
containing a total of 98 words on average; thus each sentence consisted of around 16 words. She tested 18 Scandinavian varieties, including Norwegian, Swedish and Faeroese, on 351 native Danish speakers in Copenhagen. The fable was translated into each of these 18 speech varieties and recorded by native speakers. Each test consisted of all six sentences, each sentence read in a different one of the 18 dialects to be tested. The order of the six sentences was randomised for each test, with the condition that no two consecutive sentences in the story could follow one another in the test. This was to ensure that the subject could not guess the content of a sentence from the previous sentence which, by chance, they may have understood. The subjects had to listen to each sentence and write it down in (i.e., translate it into) standard Danish. Each sentence was played once as a whole, and once again in chunks of four to eight words, so that the subjects did not have to rely on memory to do the test. Their intelligibility scores were calculated based on the total number of words they correctly translated into Danish. As with Tang and van Heuven’s sentence completion test, this method has the advantage of producing results which are directly comparable with each other. A large number of subjects can be tested, and exactly the same content is tested for every dialect. One drawback, though, is that only people literate in a standard speech variety (in this case, Danish) can be tested. Additionally, we wonder whether randomising sentences which, in the correct order, tell a coherent story could potentially confuse the subjects.
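The randomisation constraint described above can be illustrated with a short sketch. It is not the procedure used in the original study, whose implementation is not described here. It assumes that the constraint means "sentence i + 1 of the story never directly follows sentence i in the test", and it uses simple rejection sampling. The function names and the variety labels are hypothetical.

```python
import random

def constrained_order(n_sentences, rng):
    """Return a random order of story-sentence indices in which sentence
    i + 1 never directly follows sentence i (assumed reading of the
    constraint). Uses rejection sampling: reshuffle until valid."""
    order = list(range(n_sentences))
    while True:
        rng.shuffle(order)
        if all(b != a + 1 for a, b in zip(order, order[1:])):
            return order

def build_test(n_sentences, varieties, rng):
    """Pair each sentence in the randomised order with a distinct variety."""
    order = constrained_order(n_sentences, rng)
    chosen = rng.sample(varieties, n_sentences)  # sampling without replacement
    return list(zip(order, chosen))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
varieties = [f"variety_{k:02d}" for k in range(1, 19)]  # hypothetical labels
test = build_test(6, varieties, rng)
for sentence_index, variety in test:
    print(sentence_index, variety)
```

Each run of `build_test` yields one test form: six sentences, each in a different one of the 18 varieties, with no sentence immediately preceded by its story predecessor.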

8.3.5 Content question and answer tests

A method similar to SIL’s traditional RTT method was designed by Delsing and Åkesson (2005). A news item about a runaway kangaroo being chased through the streets of Copenhagen was translated from standard Danish into Norwegian and Swedish and read aloud in all three languages by professional newsreaders from the respective countries. The mean number of words in the news item was 257. Each group of subjects, comprising around 40 secondary school students, listened to a recording in one of two neighbouring languages and answered five open-ended questions on the text. Their percentage of correct answers formed the resulting intelligibility score. The same test was used by Gooskens (2007) to measure mutual intelligibility of Dutch, Frisian and Afrikaans. The advantage that this method has over the traditional RTT is that the same content is tested for each dialect. A drawback associated with this is that each participant could only be tested on one other dialect; otherwise they would have listened to the same content more than once. As with the traditional RTT, results have the potential to vary wildly depending on the choice and design of the five open questions. Despite this, Gooskens (2006, 2007) found that the results of this particular test correlated highly with phonetic distance calculations, suggesting that they were a reliable indicator of linguistic differences.

8.3.6 Towards a new methodology: The sentence retelling L2 test

As we explained in section 8.3.1, we have found Casad’s (1974) Recorded Text Test inadequate for accurately testing inter-dialect intelligibility. Our two key concerns with it are the lack of cultural appropriateness in asking questions and the lack of direct comparability of results across dialects. We therefore designed a different type of test which we call the “RTT sentence retelling L2 method”.

We had several requirements in mind for the test. (1) We wanted to test comprehension of whole, connected discourse, since language is usually encountered in this format. (2) We wanted to test comprehension of a wide range of content. Only in this way could we claim with credibility that our results reflect general inter-dialect intelligibility rather than merely comprehension of a small subset of the language (e.g., a story). (3) We wanted the results to be directly comparable with each other, dialect to dialect. (4) We wanted the test to be relatively stress-free for our test subjects, who in most cases will have low levels of education.

The traditional RTT designed by Casad did not meet our last three requirements. Word recognition and sentence completion tests such as those described in sections 8.3.2 and 8.3.3 fail on point 1. The content question and answer method described in section 8.3.5 does not meet our first and fourth requirements. The sentence translation method described in section 8.3.4 is probably the best fit, although it still fails on point 2 and possibly point 4.

In order to meet our first three requirements, we chose to use translated sentences as our test material. We could guarantee a wide range of content by designing sentences which contained lexical items from many semantic domains along with a variety of grammatical structures. By using a Latin square design (Box et al., 1978; Tang and van Heuven, 2009) we could test identical content for every dialect, ensuring that our results were directly comparable.
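As an illustration of how a Latin square can balance content across dialects, the following minimal sketch builds a cyclic Latin square. It is not a description of the design actually used in this survey. It assumes that the number of test versions, sentence sets and dialects are all equal, and the dialect labels and function names are hypothetical.

```python
def latin_square(n):
    """Build an n x n cyclic Latin square.
    Row = test version, column = sentence set,
    value = index of the dialect in which that sentence set is played."""
    return [[(version + sentence) % n for sentence in range(n)]
            for version in range(n)]

def is_latin(square):
    """Check that every dialect index occurs exactly once per row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols
                  for c in range(n))
    return rows_ok and cols_ok

dialects = ["dialect_A", "dialect_B", "dialect_C", "dialect_D"]  # hypothetical
square = latin_square(len(dialects))
for version, row in enumerate(square):
    print(version, [dialects[d] for d in row])
```

In this arrangement each test version presents every sentence set once and every dialect once, and across all versions every sentence set is heard in every dialect, so that scores for different dialects rest on identical content.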
Critical to the success of such a test would be the natural translation of the test material from the source text (in this case, Chinese) into the target dialects. We explain our procedures in section 8.4 below.

In order to meet requirement 4, we chose not to use a question and answer method. We had learnt from trying out various methods of RTT in the past (Hansen and Castro, 2010; Castro et al., 2010) that whilst RTT subjects of low educational background feel uncomfortable answering questions, they usually feel extremely comfortable retelling the content of something they have just heard, so much so that participants would often begin retelling without even waiting for our question. Ring (1981), Kluge (2010) and others, who have misgivings about Casad’s method due to the lack of cultural appropriateness in asking questions, developed a “story retelling” method which utilises the naturalness of the retelling task. In Kluge’s method, a recorded story is played back to the test subjects section by section. After each section, the subject must repeat the content of what they just heard in their L1. Their responses are recorded, translated (usually on location) and scored based on how much key content was retold.

In 2008 we trialled a similar “story retelling” method as part of a survey of Hmong dialects in southeastern Yunnan province (Flaming and Castro, forthcoming). We discovered a problem with the approach: the subjects were often extremely good at mimicking the content of what they had heard even though they did not necessarily understand it. Even if the section was a couple of sentences in length, some subjects could piece enough of it together to be able to give a reasonable retelling, mimicking the words which they did not understand and using their own dialect for the rest of the retelling.
Since our on-location translator was usually proficient in more than one dialect, he was often able to translate the whole retelling accurately, even though we guessed that the test subject did not understand everything in the section.² Our results were therefore not as reliable as we would have liked.

In the same Hmong survey, although the majority of participants retold in their L1 (in this case, Hmong), several subjects chose to retell in their L2, the local Chinese dialect (Southwestern Mandarin). They seemed to feel very comfortable doing so. We often felt that their L2 retellings were a more reliable indicator of comprehension: because mimicking was impossible, we could be confident about what they had actually understood. In the case of Hmong, requiring all participants to retell in their L2 would have been impossible since proficiency levels in Southwestern Mandarin are very low in many remote Hmong villages. For Sui, however, proficiency in Southwestern Mandarin is generally much higher. We therefore chose a “sentence retelling L2 method” for our Sui intelligibility survey.

Our final methodology looks very similar to that originally used by Voegelin and Harris (1951). Voegelin and Harris were the pioneers of functional intelligibility testing. Their method involved asking speakers of Dialect A to give an “interpreter translation” of a story in Dialect B, section by section, as they listened to a recording. The main difference between our method and theirs is that we used translated sentences whereas Voegelin and Harris used natural texts (stories). Wolff (1959:34) criticised Voegelin and Harris’ method for being more of a test of “the ability to translate” than a test of intelligibility. Whilst he acknowledged that comprehension is a prerequisite for successful translation, he felt that unsuccessful translation was not an indicator of lack of comprehension. People may be very poor at translation or they may simply “not like translation”. It was largely to address these criticisms that the “question and answer” method (Casad, 1974) was developed.

Almost all early intelligibility studies using Voegelin and Harris’ method (such as Hickerson et al., 1952) attempted to test mutual intelligibility of indigenous languages in the Americas. Their subjects were required to translate recorded texts into English, which was usually their L2 and sometimes their L3. In these cases, Wolff’s criticism is probably justified. Translating an unfamiliar dialect of an indigenous American language into a language which is entirely unrelated and typologically extremely different (English) is no small task, even for those who understand well the dialect on which they are being tested and who have high proficiency in English.

The experience of the Sui survey, however, suggests that a translation (or, as we prefer to call it, an L2 retelling) method can work extremely well provided that certain conditions are met. Firstly, our subjects were required to retell the text in a language with which they were very familiar, their local dialect of Southwestern Mandarin. Secondly, Sui and Southwestern Mandarin are typologically very similar. Both are SVO, uninflected and largely monosyllabic, and both rely on a rich inventory of aspect particles. Southwestern Mandarin spoken in the Sui area is itself heavily influenced by Sui. As we administered the tests, our impression was that most people found the translation task relatively straightforward. Thirdly, the investigator conducting the tests was herself a native speaker of Southwestern Mandarin (Guizhou dialect). The entire testing procedure was conducted in Southwestern Mandarin and, we feel, our subjects were far more comfortable with the “L2 retelling” approach than our subjects in previous surveys who were faced with a question and answer method. In rare cases where the subject was not confident retelling in the local Chinese dialect, they were permitted to retell in their L1, in which case the retelling was translated and scored in the same way as in Kluge’s (2010) method. Fourthly, we were sensitive to the subjects’ proficiency in the local Chinese dialect when we scored the tests. A wide leeway was given in interpreting their retellings so as not to penalise an inability to express something in Chinese (see section 8.4.6 for an explanation of our scoring methodology). Finally, we asked every subject to provide a judgment of their own comprehension of the texts. In cases where their own judgment was vastly different from their measured intelligibility score and where the interviewer felt that a low Chinese proficiency had adversely affected their results, their scores were discounted (see section 8.4.6.4).

Of course, our methodology is far from perfect and has many flaws. These will become apparent as the chapter progresses. Overall, though, we are more confident in the results obtained using this method than in those collected during past surveys using a story question and answer method or a story retelling L1 method.

² Interested bystanders would often flag this up to us, saying something like, “He didn’t really understand that. We don’t use that word here.”

8.4 Methodology