
abundant in content to produce meaningful measures of intelligibility. Secondly, comprehension scores resulting from this type of test cannot be meaningfully compared with each other since every story is different in content. Thirdly, participants are often uncomfortable with a “question and answer” type test, especially when administered by outsiders. Fourthly, only ten questions are asked about each text. The notion that correct responses to such a limited set of stimuli represent “full comprehension” of the text is dubious. Finally, preparation of this type of test is complex and cumbersome, involving translation of the questions into every dialect on which the text will be tested and insertion into every recording in the appropriate place.

8.3.2 Word recognition tests

Various types of test have been used to investigate the intelligibility of isolated words. Schüppert and Gooskens (2010) designed a test in which each subject listened to a recording of fifty individual words in the target dialect, in this case either Danish or Swedish. As they heard each word, they were presented with four pictures and had to match the word with the correct picture. Tang and van Heuven (2009) designed a test in which each subject had to correctly categorise individual words, such as “apple” or “duck”, which they heard in a series of fifteen different dialects. There were ten possible semantic categories to choose from, such as body parts, vegetables, textiles, verbs of action and animals.

Word recognition tests have the advantage of being quick and easy to prepare and administer. A large number of subjects and different dialects can be tested. However, they do not test language in context and could therefore be deemed “unnatural”: people are used to hearing language in connected speech with a specific context. When comparing their word recognition test with a type of sentence intelligibility test (see below), Tang and van Heuven (2009:727) concluded that “the sentence-intelligibility test yields more credible results than the word-intelligibility test … The results show that sentence-intelligibility reflects traditional dialect taxonomy better than word-intelligibility does.”

8.3.3 Sentence completion tests

The “sentence-intelligibility” test referred to by Tang and van Heuven (2009) was in effect a “sentence completion” test based on a version of the SPIN (Speech Perception In Noise) test designed by Kalikow et al. (1977) for testing comprehension of English. In Tang and van Heuven’s version, the subjects were played a series of incomplete sentences. Their task was to complete each sentence with a single word. The test designers tried to ensure that this word (always a noun) was predictable from the content of the sentence. An example sentence is: “Our seats were in the second row.” A set of sixty sentences was translated into all fifteen Chinese dialects to be tested. Tang and van Heuven used a “Latin square design” (Box et al., 1978) to rotate sentences so that every listener listened to each sentence in a different dialect, and so that, across each group of subjects as a whole, all sentences were heard in all dialects.

This type of test has several advantages. Firstly, it tests comprehension of language in context. If the incomplete sentence is not completely understood, there is no way that the subject can provide the correct answer. Secondly, it tests a wide range of content. The sixty sentences covered a large number of domains and contexts, far more than a single story of the type used by Casad and others could (see 8.2.1 above). Finally, it is easy to score, and the results for each dialect are directly comparable with each other.

There are also several disadvantages to such a test. Firstly, the subjects are not presented with natural discourse: listening to incomplete sentences is artificial. Secondly, and perhaps more importantly, the subjects could achieve low scores despite a high level of comprehension, simply because their “logic” or “way of thinking” suggested a different way of completing the sentences from the one the test designers had envisaged.
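The rotation idea behind a Latin square design can be illustrated with a short sketch. The source does not give Tang and van Heuven’s exact assignment scheme, so the cyclic rotation below is an assumption, as are the function name `latin_square_assignment` and the choice of one listener per rotation step. It shows only that a simple rotation lets each listener hear every sentence once while the group as a whole hears every sentence in every dialect.

```python
def latin_square_assignment(n_sentences, n_dialects):
    """Return plan[listener][sentence] = dialect index.

    Hypothetical cyclic rotation: listener k hears sentence s in dialect
    (s + k) mod n_dialects. One listener per rotation step is assumed.
    """
    return [
        [(sentence + listener) % n_dialects for sentence in range(n_sentences)]
        for listener in range(n_dialects)
    ]


# Figures from the text: sixty sentences, fifteen dialects.
plan = latin_square_assignment(n_sentences=60, n_dialects=15)

# Each listener hears every sentence exactly once, in a single dialect.
first_listener = plan[0]

# Across the group, each sentence is heard in every dialect.
dialects_for_sentence_0 = {listener_plan[0] for listener_plan in plan}
```

With these figures each listener hears four sentences per dialect, and a group of fifteen listeners covers all 900 sentence-dialect combinations exactly once.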

8.3.4 Sentence translation tests

Arguably, a better way to test comprehension of sentences is Beijering’s (2007) sentence translation method. Beijering’s test consisted of six sentences narrating the fable of The North Wind and the Sun, containing a total of 98 words on average (thus each sentence consisted of around 16 words). She tested 18 Scandinavian varieties, including Norwegian, Swedish and Faeroese, on 351 native Danish speakers in Copenhagen. The fable was translated into each of these 18 speech varieties and recorded by native speakers. Each test consisted of all six sentences, each sentence read in a different one of the 18 dialects to be tested. The order of the six sentences was randomised for each test, with the condition that no two consecutive sentences in the story could follow one another in the test. This was to ensure that the subject could not guess the content of a sentence from the previous sentence which, by chance, they might have understood. The subjects had to listen to each sentence and write it down in (i.e., translate it into) standard Danish. Each sentence was played once as a whole, and once again in chunks of four to eight words, so that the subjects did not have to rely on memory to do the test. Their intelligibility scores were calculated based on the total number of words they correctly translated into Danish.

As with Tang and van Heuven’s sentence completion test, this method has the advantage of producing results which are directly comparable with each other. A large number of subjects can be tested, and exactly the same content is tested for every dialect. One drawback, though, is that only people literate in a standard speech variety (in this case, Danish) can be tested. Additionally, we wonder whether randomising sentences which, in the correct order, tell a coherent story could confuse the subjects.
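The ordering constraint described for this test can be sketched as follows. This is not Beijering’s own procedure, which the source does not spell out. It is a minimal illustration that reads the constraint as “story sentence k + 1 may not come immediately after story sentence k”, and that draws six distinct varieties at random for the six sentences. The function names `shuffled_order` and `build_test` and the variety labels are hypothetical.

```python
import random


def shuffled_order(n_sentences, rng):
    """Shuffle story positions until no sentence is directly followed
    by its successor in the story (simple rejection sampling)."""
    order = list(range(n_sentences))
    while True:
        rng.shuffle(order)
        if all(nxt != cur + 1 for cur, nxt in zip(order, order[1:])):
            return order


def build_test(n_sentences, varieties, rng):
    """Pair each sentence, in its test order, with a distinct variety."""
    order = shuffled_order(n_sentences, rng)
    chosen = rng.sample(varieties, n_sentences)  # no variety used twice
    return list(zip(order, chosen))


# Figures from the text: six sentences, eighteen varieties.
rng = random.Random(2007)  # fixed seed only to make the sketch repeatable
varieties = [f"variety_{i + 1}" for i in range(18)]  # placeholder labels
test_plan = build_test(6, varieties, rng)
```

Rejection sampling is adequate here because, with only six sentences, a large share of random orderings already satisfies the constraint. A stricter reading that also forbids sentence k following sentence k + 1 would need only a change to the condition inside `shuffled_order`.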

8.3.5 Content question and answer tests

A method similar to SIL’s traditional RTT method was designed by Delsing and Åkesson (2005). A news item about a runaway kangaroo being chased through the streets of Copenhagen was translated from standard Danish into Norwegian and Swedish and read aloud in all three languages by professional newsreaders from the respective countries. The mean number of words in the news item was 257. Each group of subjects, comprising around 40 secondary school students, listened to a recording in one of two neighbouring languages and answered five open-ended questions on the text. Their percentage of correct answers formed the resulting intelligibility score. The same test was used by Gooskens (2007) to measure mutual intelligibility of Dutch, Frisian and Afrikaans.

The advantage that this method has over the traditional RTT is that the same content is tested for each dialect. An associated drawback is that each participant could be tested on only one other dialect; otherwise they would listen to the same content more than once. As with the traditional RTT, results have the potential to vary wildly depending on the choice and design of the five open questions. Despite this, Gooskens (2006, 2007) found that the results of this particular test correlated highly with phonetic distance calculations, suggesting that they were a reliable indicator of linguistic differences.

8.3.6 Towards a new methodology: The sentence retelling L2 test