Sui dialects Lexical similarity counts

Table 6.4. Vocabulary shared by Pandong dialect and Kam, indicated by double lines

English         Sui Sandong     Sui Yang’an   ‖ Sui Pandong   Kam
‘claw’          sim³            ȶim³          ‖ ɕim³          ɕim³
‘rope’          laːk⁷           le¹           ‖ laːm⁶         laːm⁶
‘round’         qu⁰ lu⁵ [a]     qu⁵ lu⁵       ‖ tun²          ton²
‘sweet’         ljen⁶           ljen⁶         ‖ qʰan¹ [b]     kʰwan¹
‘to come’       taŋ¹            taŋ¹          ‖ h̃a¹           ma¹
‘to look at’    qau⁵            qau⁵          ‖ h̃o⁵           nu⁵

[a] Southern Sui also uses qon².
[b] This word is used by other Sui dialects in the sense of ‘tasty’ or ‘delicious’.

6.3.2 Sui dialects

In order to gain a more detailed picture of lexical similarity across the Sui dialects, we carried out a lexical comparison of a total of 594 lexical items. Our methodology for selecting and comparing these words is described in detail in section 6.2 of this chapter. Cluster analysis performed on the resulting lexical similarity percentages broadly supports a four-way dialect division: (1) Sandong Central, Western and Eastern; (2) Sandong Southern; (3) Yang’an; and (4) Pandong. Eastern Sui spoken in Rongjiang county is also shown to form a mini-cluster of its own within the broader Central-Western-Eastern group. Further, we show that SD (Sandong, Central dialect) is by far the most lexically representative lect; on average it shares the most lexical items with all other lects. Thus both our historical phonological comparison and our lexical similarity analysis support the already established choice of SD as the basis for a Sui standard.

6.3.2.1 Lexical similarity percentages

The percentages of cognate words shared by all the Sui dialects are shown in table 6.5. Figures are rounded to the nearest tenth of a percentage point. Central and Western varieties, along with DJ (Eastern Sui spoken in Sandu county), share above 90% of their vocabulary. The Eastern Sui lects DJ, SJ and RL, all situated to the northeast of the Sui region, share over 92% of their vocabulary. The Southern Sui lects JQ, JR, SW and SY all share above 90% of their vocabulary. The Yang’an lects TN and BL show 95.8% similarity to each other, but neither shares more than 87% of its vocabulary with any other location.

Finally, JL and PD have consistently low percentages with all other Sui lects. These two Pandong dialects are most similar to each other, at 81.2% lexical similarity. The most lexically similar lect to Pandong is AT (Western Sui), at 81.2% similarity with JL and 77.6% similarity with PD. This could be due to the historical relationship between Pandong dialect and Western Sui which we proposed in chapter 5.
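As a reminder of what a figure such as 81.2% measures, the following is a minimal sketch of a lexical similarity count, assuming that each lect has been coded with one cognate-class label per meaning slot. The function name, the labels and the treatment of missing items are illustrative assumptions; the actual procedure is the one described in section 6.2.

```python
def lexical_similarity(codes_a, codes_b):
    """Return the percentage of compared meaning slots with matching cognate classes."""
    compared = 0
    shared = 0
    for class_a, class_b in zip(codes_a, codes_b):
        # Assumption: a slot with no elicited word in either lect is left out of the count.
        if class_a is None or class_b is None:
            continue
        compared += 1
        if class_a == class_b:
            shared += 1
    if compared == 0:
        raise ValueError("no meaning slots to compare")
    return round(100.0 * shared / compared, 1)


# Hypothetical cognate-class labels for six meaning slots in two made-up lects.
lect_x = ["A", "A", "B", "A", None, "A"]
lect_y = ["A", "B", "B", "A", "A", "A"]

print(lexical_similarity(lect_x, lect_y))
```

In this toy case five slots are comparable and four of them share a cognate class, so the two made-up lects come out as 80% similar.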
The most lexically similar lect to Yang’an is TZ (also Western Sui), at 86.7% similarity with BL and 86.2% similarity with TN. This may be due to close contact between TZ and Yang’an dialects. Finally, the two lects furthest apart (i.e., lexically most different) are SY (Southern) and PD (Pandong), sharing only 72.6% of vocabulary, which is not surprising given that their speakers live at opposite ends of the Sui region. This is borne out by our intelligibility test results, which show that, on average, people in PD understand only 30% of the SY lect (the maximum comprehension of SY exhibited by any single person in PD was 43%; see chapter 8).

Table 6.5. Lexical similarity percentages (percent historical cognates occupying the same meaning slot). Percentages over 90 are shaded in grey

Groups: Sandong Central (SD, ZH); Sandong Western (TZ, TP, AT); Sandong Eastern (DJ, SJ, RL); Sandong Southern (JQ, SW, JR, SY); Yang’an (BL, TN); Pandong (JL, PD)

      SD    ZH    TZ    TP    AT    DJ    SJ    RL    JQ    SW    JR    SY    BL    TN    JL
ZH  96.4
TZ  94.8  96.5
TP  94.5  93.2  94.2
AT  93.5  95.8  94.6  92.0
DJ  95.7  94.4  92.7  91.9  91.4
SJ  93.5  91.3  91.0  90.4  88.6  93.8
RL  91.9  90.0  89.1  89.5  87.1  92.0  96.2
JQ  91.1  89.3  88.2  88.3  86.7  89.7  89.7  89.4
SW  90.2  87.9  87.6  86.9  85.9  89.1  89.5  88.9  95.7
JR  88.0  86.0  86.0  86.3  83.8  87.2  88.0  87.2  95.1  94.4
SY  85.5  83.9  84.1  83.7  82.5  83.7  83.7  83.8  91.2  90.5  92.1
BL  86.2  85.9  86.7  86.2  84.8  84.3  82.3  80.9  81.5  80.7  80.1  76.8
TN  85.3  85.5  86.2  85.7  84.7  84.1  81.9  80.6  81.2  80.8  80.3  76.8  95.8
JL  78.8  79.7  79.3  80.0  81.2  80.7  80.0  79.1  75.8  74.4  74.0  73.8  75.3  75.8
PD  75.5  77.2  77.6  77.3  77.6  76.3  76.0  75.8  73.7  72.8  73.6  72.6  74.1  75.3  81.2

6.3.2.2 Lexical similarity visualizations and clustering

In order to visualise our results, the lexical similarity percentages in table 6.5 were converted from WordSurv to a difference data format specified by the Gabmap software (Nerbonne et al., 2011) and then uploaded and run through Gabmap’s clustering and mapping algorithms. Gabmap produces dendrograms according to the user’s specifications that show possible dialect clusters based on similarity data.
It also produces colour-coded maps of these clusters using dialect coordinates imported via Google Earth (Google Inc. 2012). Various algorithms are available for viewing clusters based on the lexical similarity count. For our data, all the available algorithms produced the same four broad clusters. These are shown on the dialect map 6.1. Circles sharing the same colour belong to the same cluster. The clusters broadly match what we would expect: (1) Pandong, light blue; (2) Yang’an, dark blue; (3) Sandong Central, Western and Eastern, pink; (4) Sandong Southern, green.

Map 6.1. Dialect cluster map based on a target of four clusters, showing the outer borders of counties covered in the survey: Rongjiang, Libo, Dushan, Sandu and Duyun city

These clusters can be visualised in a dendrogram, as shown in figure 6.1. Along the bottom of the tree is a scale of lexical distance, with 0.0 indicating zero lexical distance (i.e., all lexical items are identical). A distance of 0.1 indicates that 10% of vocabulary is different, 0.2 indicates that 20% is different, and so on. The most closely related data points split furthest to the left in the tree. Many of these separations signify very slight differences, probably less than needed to denote a different dialect. The colours match the colours of the four clusters shown in map 6.1.

Figure 6.1. Lexical similarity dendrogram.

The software provides a way to test cluster validity by producing a multi-dimensional scaling (MDS) plot, shown here in figure 6.2. Each dialect is positioned on a two-dimensional plane according to its lexical similarity to the other dialects. The distance between any two points is determined by the lexical similarity percentage for those two lects. The further apart two points are, the more different their lexicons are. Dialect borders (Pandong, Sandong, Yang’an and Southern) were added by the authors.

Figure 6.2. MDS plot.

Pandong and Yang’an dialects are very clearly distinct from the other Sui lects.
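The distance scale used in the dendrogram can be illustrated with a short sketch. It assumes that lexical distance is simply one minus the similarity proportion, which is consistent with the description of the scale above but is not taken from the Gabmap documentation. The dictionary holds only the pairs quoted in the text of section 6.3.2.1, and the names `SIMILARITY` and `lexical_distance` are hypothetical.

```python
# Similarity percentages for lect pairs quoted in the text (values from table 6.5).
SIMILARITY = {
    ("BL", "TN"): 95.8,
    ("JL", "PD"): 81.2,
    ("AT", "JL"): 81.2,
    ("AT", "PD"): 77.6,
    ("TZ", "BL"): 86.7,
    ("TZ", "TN"): 86.2,
    ("SY", "PD"): 72.6,
}


def lexical_distance(similarity_percent):
    """Map a similarity percentage (0-100) onto a 0.0-1.0 distance scale.

    Assumption: distance = 1 - similarity / 100, so 90% similarity gives 0.1.
    """
    return round(1.0 - similarity_percent / 100.0, 3)


distances = {pair: lexical_distance(s) for pair, s in SIMILARITY.items()}

# Closest and most distant pairs among those listed above.
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)
```

Under this assumption the two Yang’an lects BL and TN sit at a distance of 0.042, while SY and PD, the least similar pair in table 6.5, sit at 0.274.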
Pandong is the most divergent lexically, being positioned much further away from the Sandong lects than Yang’an is. Within Sandong, the Southern lects form a clear group, although SY is positioned some distance away from the other three Southern lects. This indicates that within Southern Sui, SY is the most lexically divergent dialect. The non-Southern lects are all clustered closely together, with the two easternmost lects (SJ and RL, both Eastern Sui spoken in Rongjiang county) positioned off to the right-hand side.

6.3.2.3 Sandong (SD), the most lexically representative dialect

In order to find out which Sui dialect shares the greatest number of lexical items with all the other dialects, and thus which could be described as the most “lexically representative” dialect, we calculated the mean lexical similarity percentage across all dialects for each individual lect. Our results show that SD is the most representative lect. SD is also the most representative lect for the whole “Sandong” dialect cluster, and for the cluster of non-Southern Sandong lects shown in map 6.1, figure 6.1 and figure 6.2 (covering Central, Western and Eastern Sui). These findings back up the well-established choice by Chinese linguists and government planners of the SD Sandong lect as the “standard” variety of Sui (Sandu County Education Bureau 2007).

Overall mean lexical similarity percentages are shown in table 6.6, with the highest mean to the left and the lowest mean to the right. We also give the standard deviation for each mean calculated. The standard deviation is an indicator of the overall range of scores, or “variance”, at each location. Standard deviations are generally quite high (over 5), indicating that there is a high amount of lexical variation over the whole Sui region. The large variance means that most of the differences in mean percentages are statistically insignificant, as shown by multiple pairwise comparisons.

Table 6.6. Overall mean lexical similarity with all other lects (StDev is the standard deviation)

        Sandong Central, Eastern, Western               Sandong Southern        Yang’an     Pandong
          SD    ZH    TZ    DJ    TP    SJ    AT    RL    JQ    SW    JR    SY    BL    TN    JL    PD
Mean:   89.4  89.1  88.6  88.5  88.0  87.7  87.3  86.8  87.1  86.4  85.5  83.0  82.8  82.7  77.9  75.8
StDev:  6.19  6.33  5.58  5.59  5.05  5.72  5.25  5.59  6.38  6.56  6.37  5.90  5.40  5.12  2.75  2.27

Nevertheless, SD has the highest average lexical similarity and ZH the second-highest. Both are Central Sui lects lying geographically between the Western Sui and Southern Sui areas, and can thus both be described as the most “central” dialects, both lexically and geographically. PD (Pandong dialect) clearly shares the least lexicon with all the other lects; that is, it is the most lexically divergent, as also indicated by the MDS plot in figure 6.2.

SD also has the highest average lexical similarity within the Sandong dialect cluster, as shown in table 6.7. This time, the standard deviations are much lower, reflecting the relative lexical homogeneity of lects within the Sandong cluster. SY (Southern) is the least lexically similar, sharing an average of 85.9% of vocabulary with all the other Sandong lects. Of the Southern Sui lects, JQ shows the highest lexical similarity across the board. This is unsurprising because JQ is the only Southern Sui lect spoken in Sandu county and is geographically the closest to Central Sui speakers.

Table 6.7. Overall mean lexical similarity with lects within Sandong dialect cluster (StDev is the standard deviation)

        Sandong Central, Eastern, Western               Sandong Southern
          SD    ZH    DJ    TZ    SJ    TP    RL    AT    JQ    SW    JR    SY
Mean:   92.3  91.3  91.1  90.8  90.5  90.1  89.6  89.3  90.4  89.7  88.6  85.9
StDev:  3.36  4.32  3.45  4.07  3.33  3.49  3.18  4.47  2.78  3.00  3.67  3.54

The dialect clustering described in section 6.3.2.2 grouped Central, Eastern and Western Sui into a single cluster. As table 6.8 shows, SD is also clearly the most lexically representative lect within this cluster.
RL (Eastern) is the most lexically divergent, again unsurprising since it is located on the very eastern fringe of the Sui region. In this case, multiple pairwise t-tests show that the differences in mean percentages are far more statistically significant (yielding an overall probability of 14.92%), due to the smaller amount of variance among the original similarity percentages.

Table 6.8. Overall mean lexical similarity with lects within Sandong Central-Western-Eastern dialect cluster (StDev is the standard deviation)

        Central     Western           Eastern
          SD    ZH    TZ    TP    AT    DJ    SJ    RL
Mean:   94.3  93.9  93.3  92.2  91.9  93.1  92.1  90.8
StDev:  1.51  2.56  2.53  1.87  3.14  1.56  2.54  2.91

Gabmap is able to produce a “reference map” showing which dialects are most closely related to a specified reference point. Given that SD emerged as the most representative lect for the whole of Sui, we chose to plot a reference map based on SD. This is shown in map 6.2. SD is denoted by the star in the middle of the darkest plot. The other Central, Western and Eastern Sui varieties of ZH, TZ, TP and DJ are all dark blue, signifying close lexical similarity to SD. AT and the two Rongjiang locations of SJ and RL are slightly lighter shades of blue, showing slightly lower lexical similarity to SD. The Southern locations of JQ, SW and JR are still lighter shades of blue, while SY and the Yang’an locations BL and TN are very light bluish green, showing that their lexicon is significantly different from that of SD. Finally, JL and PD are respectively pale yellow and completely white, showing the greatest lexical difference from the SD Sui variety. The map confirms that SD is a good reference lect to represent all of the Western, Central and Eastern Sui varieties.

Map 6.2. SD reference plot based on lexical similarity. SD is marked by the star.
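The means and standard deviations reported in tables 6.6 to 6.8 can be recomputed from table 6.5 along the following lines. This is a sketch rather than the authors’ actual procedure: the matrix is transcribed from table 6.5, the helper names are hypothetical, and the use of the sample standard deviation is an assumption (it does reproduce the published SD figures of 89.4 and 6.19). Values recomputed for other lects may differ slightly from the published ones.

```python
from statistics import mean, stdev

LECTS = ["SD", "ZH", "TZ", "TP", "AT", "DJ", "SJ", "RL",
         "JQ", "SW", "JR", "SY", "BL", "TN", "JL", "PD"]

# Lower triangle of table 6.5: row i holds the similarity of LECTS[i + 1]
# to LECTS[0] .. LECTS[i].
LOWER = [
    [96.4],
    [94.8, 96.5],
    [94.5, 93.2, 94.2],
    [93.5, 95.8, 94.6, 92.0],
    [95.7, 94.4, 92.7, 91.9, 91.4],
    [93.5, 91.3, 91.0, 90.4, 88.6, 93.8],
    [91.9, 90.0, 89.1, 89.5, 87.1, 92.0, 96.2],
    [91.1, 89.3, 88.2, 88.3, 86.7, 89.7, 89.7, 89.4],
    [90.2, 87.9, 87.6, 86.9, 85.9, 89.1, 89.5, 88.9, 95.7],
    [88.0, 86.0, 86.0, 86.3, 83.8, 87.2, 88.0, 87.2, 95.1, 94.4],
    [85.5, 83.9, 84.1, 83.7, 82.5, 83.7, 83.7, 83.8, 91.2, 90.5, 92.1],
    [86.2, 85.9, 86.7, 86.2, 84.8, 84.3, 82.3, 80.9, 81.5, 80.7, 80.1, 76.8],
    [85.3, 85.5, 86.2, 85.7, 84.7, 84.1, 81.9, 80.6, 81.2, 80.8, 80.3, 76.8, 95.8],
    [78.8, 79.7, 79.3, 80.0, 81.2, 80.7, 80.0, 79.1, 75.8, 74.4, 74.0, 73.8, 75.3, 75.8],
    [75.5, 77.2, 77.6, 77.3, 77.6, 76.3, 76.0, 75.8, 73.7, 72.8, 73.6, 72.6, 74.1, 75.3, 81.2],
]


def similarity(a, b):
    """Look up the similarity percentage for two different lects."""
    i, j = LECTS.index(a), LECTS.index(b)
    if i == j:
        raise ValueError("a lect is not compared with itself")
    return LOWER[max(i, j) - 1][min(i, j)]


def summary(lect, group=LECTS):
    """Mean and sample standard deviation of a lect's similarity to the rest of a group."""
    scores = [similarity(lect, other) for other in group if other != lect]
    return mean(scores), stdev(scores)


# Whole Sui region (compare table 6.6).
overall = {lect: summary(lect) for lect in LECTS}
most_representative = max(overall, key=lambda lect: overall[lect][0])
least_representative = min(overall, key=lambda lect: overall[lect][0])

# Central-Western-Eastern cluster only (compare table 6.8).
CWE = LECTS[:8]
sd_in_cwe = summary("SD", CWE)

print(most_representative, [round(x, 2) for x in overall["SD"]])
print(least_representative, [round(x, 2) for x in overall["PD"]])
print("SD within CWE:", [round(x, 2) for x in sd_in_cwe])
```

Passing a smaller `group`, such as the twelve Sandong lects or the eight Central-Western-Eastern lects, restricts the comparison in the same way as tables 6.7 and 6.8.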

6.4 Sui dialectal variants