
the parts of the sentence which the subject retold correctly. We wrote down any parts of the retelling which diverged from the original text. By using a different coloured pen for each subject, we could record on the same response sheet the retellings of all three subjects who listened to the same set of sentences. After testing each group of sentences we asked the subject a set of questions (given in appendix C) to find out their own perception of how much they had understood and their judgement of how similar the dialect was to their own.

8.4.6 Scoring the tests

In this section we explain how we assigned each subject an overall intelligibility score. The scoring process comprised five stages: (1) identifying the core elements of each sentence on which to base the final scores; (2) assigning each subject a score based upon these core elements; (3) calculating an overall score for each test; (4) adjusting the scores based on the hometown test results in order to ensure that the results of the RTTs were comparable with each other; (5) adjusting the final percentages by excluding anomalous subject scores so as to raise the overall confidence limit.

8.4.6.1 Identifying core elements

For every sentence of each of the four dialects tested (Central, Southern, Pandong and Yang’an) we identified core elements on which to base our scoring. To do this we referred to two sets of materials: (1) the literal word-for-word translations of each of the original Sui sentences; and (2) the responses of the “hometown subjects” (those who listened to and retold their own dialects).

We did not base the scoring solely upon the literal translations of the original sentences because we did not know whether any specific content would be “assumed” by local Sui speakers and would therefore tend not to be retold. In other words, we did not know which elements of the sentences were deemed by mother-tongue speakers to be “important” enough to retell. Furthermore, some parts of the sentence may have been auditorily clearer than others in the original recording. We did not wish to penalise subjects for not retelling parts of the recording which the hometown subjects themselves did not retell because they did not hear them clearly.

The process of identifying core elements is best illustrated by working through an example step by step. We will use the second sentence in Group B, Sentence 15. The original sentence designed for the RTT is given below.
Although the Chinese and Sui could each be described as a single “sentence”, the English translation is better rendered using several short sentences. Although neither the original Chinese nor the Sui version of this sentence indicates a particular tense, we render all English translations in this section in the past tense for the sake of consistency.

Chinese: 冬天很冷,每个人都穿着厚厚的衣服,在家里烤火,哪里都不去。
English: It was very cold in the winter. Everyone was wearing very thick clothes and warming themselves by the fire at home. They did not go anywhere.

Table 8.4 shows a transcription of the same sentence as recorded in Pandong dialect. The Pandong version is not identical to the original version given above. The original states “very thick clothes” whereas the Pandong version reads “lots of clothes”. It was common for our original sentences to be translated slightly differently across the four target dialects. This is why we developed a separate scoring sheet for each dialect. Each scoring sheet was based on a different set of “core elements”.

Table 8.4. Sentence 15 (Group B), Pandong dialect (PD). Word-for-word gloss and free translation

PD     haŋ²         ȵi:t⁷  lən⁴  h̃ja:n⁵  tɕap⁸  ʔai³        tan³  ⁿduk⁷    lən⁴  kuŋ²
Gloss  time/season  cold   very  cold    every  CLF-person  wear  clothes  very  many

PD     to³  ȵa:u⁶  ɣa:n²  pʰja:u¹       ɥi¹   tɕum²  nau¹   pu³  me²  pa:i¹.
Gloss  all  at/in  home   warm oneself  fire  place  which  all  NEG  go.

Free translation: It was very cold in the winter. Everyone wore lots of clothes and warmed themselves by the fire at home. No one went anywhere.

Table 8.5 gives the retellings of the four hometown subjects (i.e., the four subjects at datapoint PD) who listened to this sentence. In general, any element that occurred in the original text and was also retold by at least two hometown subjects was counted as a core element.[5]

[5] Usually there were only three hometown retellings for each sentence because we only tested a total of nine subjects at each location (see section 4.3). Due to time limitations we did not conduct a separate “hometown testing” phase in which we specifically tested the four dialects on a larger pool of hometown subjects.

Table 8.5. Pandong village speaker retellings of Sentence 15 (Group B), Pandong dialect

Subject ref  Retelling (as retold in Chinese dialect, with English translation)
PD03         里面很冷,在家里烤火,哪里都不去。
             It was very cold inside. People warmed themselves by the fire at home and did not go anywhere.
PD05         这里很冷,每个人都穿很多,在家里烤火,哪里都不去。
             It was very cold here, everyone wore a lot and warmed themselves by the fire at home, they did not go anywhere.
PD06         冬天很冷,每个人穿着厚厚的衣服,在家里烤火,哪里都不去。
             It was very cold in the winter. Everyone wore very thick clothing and warmed themselves by the fire at home. They did not go anywhere.
PD07         那个地方很冷,很多都穿很多衣服,在家里烤火,哪里都不去。
             It was really cold there, very many people wore lots of clothes and warmed themselves by the fire at home. They did not go anywhere.

In total we identified nine core elements for this sentence: (1) very cold; (2) everyone (lots of/many people = 0.5); (3) wear; (4) very many/very thick; (5) clothes; (6) at home; (7) warmed by the fire; (8) anywhere; (9) didn’t go. We now explain how we identified these core elements.

In the Pandong version of this sentence, “winter” was translated as “cold time” or “cold season”. Only one hometown subject, PD06, retold this, so we did not include it as a core element. However, all four subjects retold “very cold” exactly as in the PD original, hence we identified “very cold” as a core element. “Everyone” occurs in the PD version and was retold by two of the hometown subjects, hence “everyone” was also counted as a core element. PD07 retold this as “very many”, which in our final scoring counted as half a point because “people” is assumed and “very many people” is close in meaning to “everyone” (see section 8.4.6.2 on score assignment). “Wear” was retold by three subjects so it counts as a core element. The subsequent “a lot of” or “very many” occurs in the original Pandong version as a modifier of “clothes” and was retold by PD05 and PD07. In practice, its meaning is identical to “very thick” (PD06’s retelling), since Sui people tend to wear many layers of clothing, which can be described as “thick” due to the number of layers rather than the thickness of individual articles of clothing. Thus we identified “lots of”, “very many” or “very thick” as another core element.[6] “Clothes” was in the original PD sentence and was retold by two hometown subjects so we counted it as a core element. Finally, “at home”, “warmed by the fire”, “anywhere” and “didn’t go” were retold by all hometown subjects, thus they were all identified as core elements.

If the recording contained a modern Chinese loanword it was not counted as a core element even if all of the hometown subjects retold it. This is because speakers of other dialects may recognise and correctly retell it simply because it is a Chinese loanword, which does not necessarily imply comprehension of the target Sui dialect. However, an old Chinese loanword could be counted as a core element. This is because old Chinese loanwords are now an integral part of the Sui language. Their pronunciation sometimes differs radically from one dialect to another due to divergent sound changes. Modern and old Chinese loanwords can be readily distinguished by comparing their pronunciation with the modern local Chinese dialect and with Middle Chinese.

Table 8.6 gives the Southern Sui translation of Sentence 22 (Group B) along with an English word-for-word gloss. It is immediately obvious to a Chinese speaker that ji⁶ wai⁴ (from 以为 ‘to mistakenly/wrongly think’) and ȶʰen⁴ pu¹ (from 全部 ‘all’) are modern Chinese loanwords. These two words were therefore not included as core elements.
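The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as the authors implemented it: the element labels, the hand-coding of each hometown retelling into those labels, and the names `ORIGINAL`, `HOMETOWN` and `select_core_elements` are all hypothetical. Following the text, “very thick” is coded as equivalent to “very many”, and PD07’s “very many people” is not coded as a full retelling of “everyone”.

```python
from collections import Counter

# Elements of the Pandong original of Sentence 15, in order of occurrence.
# The labels are hypothetical shorthand for this sketch.
ORIGINAL = [
    "cold season", "very cold", "everyone", "wear", "very many",
    "clothes", "at home", "warmed by the fire", "anywhere", "didn't go",
]

# Hand-coded hometown retellings from table 8.5 (assumed coding).
ALWAYS = {"very cold", "at home", "warmed by the fire", "anywhere", "didn't go"}
HOMETOWN = {
    "PD03": ALWAYS,
    "PD05": ALWAYS | {"everyone", "wear", "very many"},
    "PD06": ALWAYS | {"cold season", "everyone", "wear", "very many", "clothes"},
    "PD07": ALWAYS | {"wear", "very many", "clothes"},
}


def select_core_elements(original, retellings, modern_loanwords=frozenset(),
                         min_subjects=2):
    """Keep elements of the original that at least `min_subjects` hometown
    subjects retold, skipping anything flagged as a modern Chinese loanword."""
    counts = Counter()
    for retold in retellings.values():
        counts.update(set(retold) & set(original))
    return [
        element for element in original
        if counts[element] >= min_subjects and element not in modern_loanwords
    ]


selected = select_core_elements(ORIGINAL, HOMETOWN)
print(len(selected), selected)
```

With this coding, “cold season” is dropped because only PD06 retold it, and the remaining nine elements match the list given for this sentence. The `modern_loanwords` argument is unused for Sentence 15; it is there to show where an exclusion such as the one applied to Sentence 22 would enter.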
Owing to differences in the way each sentence was translated into the four target dialects, we identified a separate set of core elements for each of our four RTTs. The core element charts for each RTT are provided in appendix E for reference purposes.

Table 8.6. Sentence 22 (Group B), Southern subdialect (SY). Word-for-word gloss and free translation

SY     ʔȵam⁵    kun⁵    kʰaːŋ⁵      laːu⁴  tsjeːŋ⁴  to¹   taŋ¹  ʔai¹
Gloss  evening  before  wind.blows  big    open     door  come  CLF-person

SY     kuŋ²  ji⁶ wai⁴       ʔnaŋ¹  maːŋ¹  ȶʰen⁴ pu¹  tsam²  taŋ¹.
Gloss  many  wrongly think  have   ghost  all        hide   come.

Free translation: The night before last, a strong wind blew open the door. Lots of people wrongly thought that it was a ghost and they all hid.

8.4.6.2 Score assignment

The RTTs were scored according to the core elements identified in section 8.4.6.1. Each core element retold by the subject earned one point. If the basic meaning of a core element was retold but the subject worded it differently from the Chinese translation of the recorded text, it still earned one point. A partial retelling of a core element, or a retelling which was similar in meaning to the original but not identical, earned half a point.

Table 8.7 illustrates how the three RTT subjects in JR (Southern) who listened to Sentence 15 of the PD (Pandong) RTT were scored. JR07 earned half a point for “more” because it is similar to the core element “very many” but not identical in meaning. JR09 earned no points at all. This subject did not understand this sentence and appears to have made something up to avoid embarrassment.

[6] If a subject retold “very many” as part of the phrase “very many people” but did not retell “very many” or “very thick” as a modifier of “clothing”, we would not give them the point for the core element “very many clothes”, but only the half point for the core element “everyone”.

Table 8.7. Sentence 15 (Group B), Pandong dialect (PD): JR (Southern) retellings. Core elements gaining one point are underlined with a solid line; partial core elements gaining half a point are underlined with a dotted line

PD original:    冬天很冷,每个人都穿着很多衣服,在家里烤火,哪里都不去。
                It was very cold in the winter. Everyone wore lots of clothes and warmed themselves by the fire at home. No one went anywhere.
Max score:      9
Core elements:  very cold, everyone, wear, very many/very thick, clothes, at home, warmed by the fire, anywhere, didn’t go

Subject ref  Score  Retelling
JR07         6.5    很冷,在家哪里都不去,出去穿多点衣服。
                    It is very cold, stay at home and don’t go anywhere, and when you go out wear more clothes.
JR08         4      在家里烤火,哪里都不去。
                    Warm themselves by the fire at home and don’t go anywhere.
JR09         0      得一个老婆还想娶另一个。
                    He has already got one wife but he wants to marry another.
Average score: 3.5

We should point out that we deliberately chose this example as an illustration of scoring. In reality, such a wide range of scores for the same sentence was uncommon. This sentence is the only such example among these three JR subjects listening to the PD Group B sentences. Their scores on almost all the other sentences were within one or two points of each other, giving us confidence that the overall comprehension score for JR listening to PD Group B was reliable.

In some instances, a subject retold a correct core element but we did not award a point because they retold it in the wrong place in the sentence, indicating that they had not understood it in the context in which it occurred in the original sentence. For example, Sentence 11 (Group A) says, “She didn’t like salty, sour or spicy food, she would only eat sweet things, so in the end all of her teeth fell out.” One subject retold this sentence as, “She didn’t like salty, sweet or sour food, so in the end all of her teeth fell out.” This subject did not earn a point for “sweet” because he did not understand it in its correct context, “she would only eat sweet things”.
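The point scheme in this section (one point per core element retold, half a point per partial retelling) can be illustrated with a minimal Python sketch. The function name `score_retelling` and the element labels are hypothetical. The split of JR07’s and JR08’s retellings into credited elements is our reading of the retellings in table 8.7, chosen to be consistent with the published totals; only the half point for “more” is stated explicitly in the text.

```python
from statistics import mean

# Core elements for Sentence 15 of the PD RTT (labels as listed in table 8.7).
CORE = [
    "very cold", "everyone", "wear", "very many/very thick", "clothes",
    "at home", "warmed by the fire", "anywhere", "didn't go",
]


def score_retelling(core, full=(), partial=()):
    """Return 1 point per fully retold core element plus 0.5 per partial one."""
    full, partial = set(full), set(partial)
    unknown = (full | partial) - set(core)
    if unknown:
        raise ValueError(f"not core elements: {sorted(unknown)}")
    if full & partial:
        raise ValueError("an element cannot be both full and partial")
    return 1.0 * len(full) + 0.5 * len(partial)


# Assumed coding of the three JR retellings.
scores = {
    "JR07": score_retelling(
        CORE,
        full=["very cold", "at home", "anywhere", "didn't go", "wear", "clothes"],
        partial=["very many/very thick"],  # "more" is close to "very many"
    ),
    "JR08": score_retelling(
        CORE, full=["at home", "warmed by the fire", "anywhere", "didn't go"]
    ),
    "JR09": score_retelling(CORE),  # unrelated retelling: no core elements
}
average = mean(scores.values())
print(scores, average)
```

Under this coding the three scores are 6.5, 4 and 0 out of a maximum of 9, and their average is 3.5, as in table 8.7. A misplaced element such as “sweet” in the Sentence 11 example would simply be left out of both `full` and `partial` by the scorer.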
8.4.6.3 Calculating the overall scores

After assigning scores to each individual RTT subject, we entered all the scores for each RTT into a predesigned score chart. Table 8.8 shows our score chart for Zhonghe (ZH) subjects taking the Central (GC) RTT. As illustrated in the table, the raw scores were converted into percentages and then averaged, first within each group of three subjects and then across the three groups, to give an overall score for all nine participants. Note that each group of sentences had a different maximum score depending on the total number of core elements in that group. Converting the scores for each group into percentages ensured that each group of sentences was equally weighted in the overall score.

Table 8.8. Scores for Zhonghe (ZH) subjects taking the Central Sui (GC) RTT

Subject ref.     Sentences tested: SD Group A   Percentage
Max score:       111                            100
ZH01             96.5                           86.94
ZH02             107                            96.40
ZH03             104.5                          94.14
Average score:   102.67                         92.49

Subject ref.     Sentences tested: SD Group B   Percentage
Max score:       112                            100
ZH04             106                            94.64
ZH05             104.5                          93.30
ZH07             91                             81.25
Average score:   100.5                          89.73

Subject ref.     Sentences tested: SD Group C   Percentage
Max score:       112                            100
ZH06             100                            89.29
ZH08             106                            94.64
ZH09             97                             86.61
Average score:   101                            90.18

Overall score for ZH subjects listening to SD RTT: (92.49 + 89.73 + 90.18) ÷ 3 = 90.8

8.4.6.4 Adjusting the group scores

Theoretically, all subjects taking the test in their own dialect should gain full marks because they should have complete comprehension of their own dialect. This was rarely the case, however, because the recordings were not always crystal clear and some sentences contained words which the text provider “swallowed” during recording. In order to maximise the comparability of the RTTs with each other, we adjusted the scores so that each group of sentences in each RTT was equally weighted.

For example, the Shuiyao hometown subjects who listened to Groups A and C of the SY RTT achieved an average of 92% on both groups of sentences. Those listening to the Group B sentences only achieved an average of 85.3%.
The fact that the latter subjects achieved a higher score than the former subjects on both of the other RTTs that they took (Central and Yang’an) indicates that their lower score of 85.3% on the SY sentences was not due to a lesser ability in “test-taking” or in “translation”. Rather, it was probably due to the fact that the Group B SY recordings were not as easy for native speakers to understand and retell as the Group A and C recordings. Therefore, for the SY RTT, we gave Group B a greater weighting than Groups A and C in order to ensure a fair, comparable overall score for the SY RTT.

Similarly, our hometown test results indicated that the four RTTs were not directly comparable. For example, the overall hometown test score for the SY RTT was 89.8% whereas the hometown test score for the PD RTT was 93.5%. In order to make the results of the RTTs comparable with each other, we adjusted the scores for each test based on the hypothesis that the hometown subjects had 100% comprehension of their own tests. We did this by multiplying the listeners’ scores by the group weighting coefficient illustrated in table 8.10.

Tables 8.9 and 8.10 illustrate the mechanism by which we adjusted the RTT scores. For each test, each group of sentences was weighted using a “group weighting coefficient” calculated according to the hometown subject results. Table 8.9 shows how we calculated the weighting coefficient for each group, in this case taking the SD RTT as an example. Table 8.10 illustrates how these coefficients were used to adjust the SD RTT results in all datapoints, taking Zhonghe (ZH) results as an example. As table 8.10 shows, an adjusted score that would otherwise exceed 100 was recorded as 100.

Table 8.9. Scores for Sandong (SD) subjects taking the Central Sui (GC) RTT (hometown test), showing the calculation of group weighting coefficients used to adjust all test scores

Subject ref.     Sentences tested: SD Group A   Percentage   Group weighting coefficient
Max score:       111                            100
SD01             95.5                           86.04
SD02             105.5                          95.05
SD04             98.5                           88.74
Average score:   99.83                          89.94        100 ÷ 89.94 = 1.11185

Subject ref.     Sentences tested: SD Group B   Percentage   Group weighting coefficient
Max score:       112                            100
SD03             104                            92.86
SD05             108.5                          96.88
SD06             110                            98.21
Average score:   107.5                          95.98        100 ÷ 95.98 = 1.04186

Subject ref.     Sentences tested: SD Group C   Percentage   Group weighting coefficient
Max score:       112                            100
SD07             105                            93.75
SD08             99                             88.39
SD09             103.5                          92.41
Average score:   102.5                          91.52        100 ÷ 91.52 = 1.09268

Overall score for SD subjects listening to SD RTT: (89.94 + 95.98 + 91.52) ÷ 3 = 92.48

Table 8.10. Calculation of adjusted scores (each group equally weighted) for Zhonghe (ZH) subjects taking the Central Sui (GC) RTT

Subject ref.     Sentences tested: SD Group A   Percentage   Group weighting coefficient   Adjusted score
Max score:       111                            100
ZH01             96.5                           86.94        × 1.11185                     96.66
ZH02             107                            96.40        × 1.11185                     100
ZH03             104.5                          94.14        × 1.11185                     100
Average score:   102.67                         92.49                                      98.89

Subject ref.     Sentences tested: SD Group B   Percentage   Group weighting coefficient   Adjusted score
Max score:       112                            100
ZH04             106                            94.64        × 1.04186                     98.60
ZH05             104.5                          93.30        × 1.04186                     97.21
ZH07             91                             81.25        × 1.04186                     84.65
Average score:   100.5                          89.73                                      93.49

Subject ref.     Sentences tested: SD Group C   Percentage   Group weighting coefficient   Adjusted score
Max score:       112                            100
ZH06             100                            89.29        × 1.09268                     97.56
ZH08             106                            94.64        × 1.09268                     100
ZH09             97                             86.61        × 1.09268                     94.63
Average score:   101                            90.18                                      97.40

Overall score for ZH subjects listening to SD RTT: 90.8 (unadjusted percentage); 96.59 (adjusted score)

8.4.6.5 Raising the confidence limit

An extremely wide range of test scores (as indicated by the standard deviation) among subjects at the same location listening to the same RTT would suggest that the overall average score is unreliable in some way (Blair 1990). In all cases where the standard deviation was over 10 we examined the individual test scores to see if there were any subjects who were skewing the overall score.
If such a subject could be identified and if we had external evidence to show that their result may be unreliable (for example, from information garnered from the pre-RTT questionnaire or from the interviewer’s own observations), we excluded their result from the overall score.

For example, the original average score for PD (Pandong dialect) subjects listening to the Central Sui (GC) RTT was 45.4%. The standard deviation was high, at 14.0. On closer inspection we found that one subject had a particularly high score of 71.2% whereas none of the other subjects achieved higher than 54.9%. Our pre-RTT questionnaire indicated that the high-scoring subject was a graduate of the teacher’s college in Duyun (the prefectural capital), where he would have had contact with many Sui speakers from Sandu county, including some from the Central Sui area. Furthermore, he had worked in Duyun for several years after graduating. After excluding his score from the results, the overall average was lower, at 42.2%, and the standard deviation was much lower, at 10.9.

Due to the low number of subjects (nine at each location), we endeavoured not to exclude more than one subject from any one set of results. It would be difficult to show that divergent results of two or more subjects in a population of just nine are statistically significant. All RTT scores, including original scores, post-adjustment scores and information on any subjects whose results were excluded from the final scores, are provided in appendix F.
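The arithmetic of sections 8.4.6.3 to 8.4.6.5 can be reproduced with a short Python sketch using the raw scores from tables 8.8 and 8.9. It is an illustration under stated assumptions rather than the authors’ own tooling: the function and variable names are hypothetical, the cap at 100 is inferred from the adjusted scores in table 8.10, and the use of the sample standard deviation (`statistics.stdev`) over all nine adjusted scores is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

MAX_SCORE = {"A": 111, "B": 112, "C": 112}

# Raw hometown scores (SD subjects, table 8.9).
HOMETOWN_RAW = {"A": [95.5, 105.5, 98.5], "B": [104, 108.5, 110], "C": [105, 99, 103.5]}

# Raw listener scores (ZH subjects, table 8.8).
LISTENER_RAW = {
    "A": {"ZH01": 96.5, "ZH02": 107, "ZH03": 104.5},
    "B": {"ZH04": 106, "ZH05": 104.5, "ZH07": 91},
    "C": {"ZH06": 100, "ZH08": 106, "ZH09": 97},
}


def percentage(raw, max_score):
    """Convert a raw core-element score into a percentage of the maximum."""
    return 100 * raw / max_score


def group_coefficient(hometown_raw, max_score):
    """100 divided by the hometown group's average percentage."""
    return 100 / mean(percentage(r, max_score) for r in hometown_raw)


def adjust(listener_raw, coefficient, max_score):
    """Scale each listener's percentage by the coefficient, capped at 100
    (the cap is inferred from table 8.10, not stated in the text)."""
    return {
        subject: min(100.0, percentage(raw, max_score) * coefficient)
        for subject, raw in listener_raw.items()
    }


coefficients = {g: group_coefficient(HOMETOWN_RAW[g], MAX_SCORE[g]) for g in MAX_SCORE}
adjusted = {g: adjust(LISTENER_RAW[g], coefficients[g], MAX_SCORE[g]) for g in MAX_SCORE}

# Each group is averaged first, so the three groups are equally weighted.
unadjusted_overall = mean(
    mean(percentage(r, MAX_SCORE[g]) for r in LISTENER_RAW[g].values())
    for g in MAX_SCORE
)
group_means = {g: mean(adjusted[g].values()) for g in adjusted}
adjusted_overall = mean(group_means.values())

# Screening step of section 8.4.6.5: flag the set for manual review when the
# spread is wide (sample standard deviation is an assumption of this sketch).
all_adjusted = [score for group in adjusted.values() for score in group.values()]
spread = stdev(all_adjusted)
needs_review = spread > 10

print({g: round(c, 5) for g, c in coefficients.items()})
print(round(unadjusted_overall, 1), round(adjusted_overall, 2), needs_review)
```

Run on these figures, the sketch gives the same coefficients and overall scores as tables 8.8 to 8.10. Note that flagging is only the first step: the text makes exclusion conditional on external evidence about the subject, which a threshold check cannot supply.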

8.5 Results