Table 6.4. Vocabulary shared by Pandong dialect and Kam, indicated by double lines
English        Sui Sandong     Sui Yang’an     Sui Pandong     Kam
‘claw’         sim³            ȶim³            ɕim³            ɕim³
‘rope’         laːk⁷           le¹             laːm⁶           laːm⁶
‘round’        qu⁰ lu⁵ (a)     qu⁵ lu⁵         tun²            ton²
‘sweet’        ljen⁶           ljen⁶           qʰan¹ (b)       kʰwan¹
‘to come’      taŋ¹            taŋ¹            h̃a¹             ma¹
‘to look at’   qau⁵            qau⁵            h̃o⁵             nu⁵
(a) Southern Sui also uses qon².
(b) This word is used by other Sui dialects in the sense of ‘tasty’ or ‘delicious’.
6.3.2 Sui dialects
In order to gain a more detailed picture of lexical similarity across the Sui dialects, we carried out a lexical comparison of a total of 594 lexical items. Our methodology for selecting and comparing these words is described in detail in section 6.2 of this chapter. Cluster analysis performed on the resulting lexical similarity percentages broadly supports a four-way dialect division: (1) Sandong (Central, Western and Eastern); (2) Sandong (Southern); (3) Yang’an; and (4) Pandong. Eastern Sui spoken in Rongjiang county is also shown to form a mini-cluster of its own within the broader Central-Western-Eastern group.
Further, we show that SD (Sandong, Central dialect) is by far the most lexically representative lect; on average it shares the most lexical items with all other lects. Thus both our historical phonological comparison and our lexical similarity analysis support the already established choice of SD as the basis for a Sui standard.
6.3.2.1 Lexical similarity percentages
The percentages of cognate words shared by all the Sui dialects are shown in table 6.5. Figures are rounded to the nearest tenth of a percentage point. Central and Western varieties, along with DJ (Eastern Sui spoken in Sandu county), share above 90% of their vocabulary. The Eastern Sui lects DJ, SJ and RL, all situated to the northeast of the Sui region, have over 92% similar vocabulary. The Southern Sui lects JQ, JR, SW and SY all share above 90% of their vocabulary. The Yang’an lects TN and BL show 95.8% similarity to each other, but neither shares more than 87% of its vocabulary with any other location. Finally, JL and PD have consistently low percentages with all other Sui lects. These two Pandong dialects are most similar to each other, at 81.2% lexical similarity.
The most lexically similar lect to Pandong is AT (Western Sui), at 81.2% similarity with JL and 77.6% similarity with PD. This could be due to the historical relationship between Pandong dialect and Western Sui which we proposed in chapter 5. The most lexically similar lect to Yang’an is TZ (also Western Sui), at 86.7% similarity with BL and 86.2% similarity with TN. This may be due to close contact between TZ and Yang’an dialects.
Finally, the two lects that are furthest apart (i.e., lexically most different) are SY (Southern) and PD (Pandong), which share only 72.6% of their vocabulary. This is not surprising given that their speakers live at opposite ends of the Sui region. It is borne out by our intelligibility test results, which show that, on average, people in PD understand only 30% of the SY lect (the maximum comprehension of SY exhibited by any single person in PD was 43%; see chapter 8).
Table 6.5. Lexical similarity percentages (percent historical cognates occupying the same meaning slot). Percentages over 90 are shaded in grey
                          SD    ZH    TZ    TP    AT    DJ    SJ    RL    JQ    SW    JR    SY    BL    TN    JL
Sandong Central   SD
                  ZH    96.4
Sandong Western   TZ    94.8  96.5
                  TP    94.5  93.2  94.2
                  AT    93.5  95.8  94.6  92.0
Sandong Eastern   DJ    95.7  94.4  92.7  91.9  91.4
                  SJ    93.5  91.3  91.0  90.4  88.6  93.8
                  RL    91.9  90.0  89.1  89.5  87.1  92.0  96.2
Sandong Southern  JQ    91.1  89.3  88.2  88.3  86.7  89.7  89.7  89.4
                  SW    90.2  87.9  87.6  86.9  85.9  89.1  89.5  88.9  95.7
                  JR    88.0  86.0  86.0  86.3  83.8  87.2  88.0  87.2  95.1  94.4
                  SY    85.5  83.9  84.1  83.7  82.5  83.7  83.7  83.8  91.2  90.5  92.1
Yang’an           BL    86.2  85.9  86.7  86.2  84.8  84.3  82.3  80.9  81.5  80.7  80.1  76.8
                  TN    85.3  85.5  86.2  85.7  84.7  84.1  81.9  80.6  81.2  80.8  80.3  76.8  95.8
Pandong           JL    78.8  79.7  79.3  80.0  81.2  80.7  80.0  79.1  75.8  74.4  74.0  73.8  75.3  75.8
                  PD    75.5  77.2  77.6  77.3  77.6  76.3  76.0  75.8  73.7  72.8  73.6  72.6  74.1  75.3  81.2
6.3.2.2 Lexical similarity visualizations and clustering
In order to visualise our results, the lexical similarity percentages in table 6.5 were converted from WordSurv to a difference data format specified by the Gabmap software (Nerbonne et al., 2011) and then uploaded and run through Gabmap’s clustering and mapping algorithms. Gabmap produces dendrograms, according to the user’s specifications, that show possible dialect clusters based on similarity data. It also produces colour-coded maps of these clusters using dialect coordinates imported via Google Earth (Google Inc. 2012).
Various algorithms are available for viewing clusters based on the lexical similarity count. For our data, all the available algorithms produced the same four broad clusters. These are shown in map 6.1. Circles sharing the same colour belong to the same cluster. The clusters broadly match what we would expect: (1) Pandong (light blue); (2) Yang’an (dark blue); (3) Sandong Central, Western and Eastern (pink); (4) Sandong Southern (green).
Map 6.1. Dialect cluster map based on a target of four clusters, showing the outer borders of counties covered in the survey: Rongjiang, Libo, Dushan, Sandu and Duyun city
These clusters can be visualised in a dendrogram, as shown in figure 6.1. Along the bottom of the tree is a scale of lexical distance, with 0.0 indicating zero lexical distance (i.e., all lexical items are identical). A distance of 0.1 indicates that 10% of vocabulary is different, 0.2 indicates that 20% is different, and so on. The data points most closely related show splits in the tree the furthest to the left. Many of these separations signify very slight differences, probably less than needed to denote a different dialect. The colours match the colours of the four clusters shown in map 6.1.
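The relation between a similarity percentage and the distance scale described above can be written out explicitly. The following Python sketch assumes a simple linear conversion, distance = (100 - similarity) / 100, which matches the scale as described; the exact difference format that Gabmap expects is not given here, and the name `lexical_distance` is hypothetical. The three percentages are those quoted in section 6.3.2.1.

```python
def lexical_distance(similarity_pct):
    """Convert a lexical similarity percentage (0-100) to a 0-1 distance.

    Assumed linear scaling: 90% similarity corresponds to a distance of 0.1.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must lie between 0 and 100")
    return (100.0 - similarity_pct) / 100.0


# Similarity percentages quoted in the text for three pairs of lects.
quoted = {("BL", "TN"): 95.8, ("JL", "PD"): 81.2, ("SY", "PD"): 72.6}

# Distances rounded to three decimal places for display.
distances = {pair: round(lexical_distance(pct), 3) for pair, pct in quoted.items()}

for (a, b), d in distances.items():
    print(f"{a}-{b}: distance {d}")
```

Under this assumption the two Yang’an lects lie about 0.04 apart, while SY and PD lie about 0.27 apart.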
Figure 6.1. Lexical similarity dendrogram.
The software provides a way to test cluster validity by producing a multi-dimensional scaling (MDS) plot, shown here in figure 6.2. Each dialect is positioned on a two-dimensional plane according to its lexical similarity to the other dialects. The distance between any two points is determined by the lexical similarity percentage for those two lects. The further apart two points are, the more their lexicons differ. Dialect borders (Pandong, Sandong, Yang’an and Southern) were added by the authors.
Figure 6.2. MDS plot.
Pandong and Yang’an dialects are very clearly distinct from the other Sui lects. Pandong is the most divergent lexically, being positioned much further away from the Sandong lects than Yang’an is. Within Sandong, the Southern lects form a clear group, although SY is positioned some distance away from the other three Southern lects. This indicates that within Southern Sui, SY is the most lexically divergent dialect. The non-Southern lects are all clustered closely together, with the two easternmost lects (SJ and RL, both Eastern Sui spoken in Rongjiang county) positioned off to the right-hand side.
6.3.2.3 Sandong (SD), the most lexically representative dialect
In order to find out which Sui dialect shares the most lexical items with all the other dialects, and thus which could be described as the most “lexically representative” dialect, we calculated the mean lexical similarity percentage across all dialects for each individual lect. Our results show that SD is the most representative lect. SD is also the most representative lect for the whole “Sandong” dialect cluster, and for the cluster of non-Southern Sandong lects shown in map 6.1, figure 6.1 and figure 6.2 (covering Central, Western and Eastern Sui). These findings back up the well-established choice by Chinese linguists and government planners of the SD (Sandong) lect as the “standard” variety of Sui (Sandu County Education Bureau 2007).
Overall mean lexical similarity percentages are shown in table 6.6, with the highest mean to the left and the lowest mean to the right. We also give the standard deviation for each mean calculated. The standard deviation is an indicator of the overall range of scores, or “variance”, at each location. Standard deviations are generally quite high (over 5), indicating that there is a high amount of lexical variation over the whole Sui region. The large variance means that most of the differences in mean percentages are not statistically significant, as shown by multiple pairwise comparisons.
Table 6.6. Overall mean lexical similarity with all other lects (StDev is the standard deviation)
          Sandong (Central, Eastern, Western)             Sandong (Southern)      Yang’an     Pandong
            SD    ZH    TZ    DJ    TP    SJ    AT    RL    JQ    SW    JR    SY    BL    TN    JL    PD
Mean:     89.4  89.1  88.6  88.5  88.0  87.7  87.3  86.8  87.1  86.4  85.5  83.0  82.8  82.7  77.9  75.8
StDev:    6.19  6.33  5.58  5.59  5.05  5.72  5.25  5.59  6.38  6.56  6.37  5.90  5.40  5.12  2.75  2.27
Nevertheless, SD has the highest average lexical similarity and ZH the second-highest. Both are Central Sui lects lying geographically between the Western Sui and Southern Sui areas, and thus both can be described as the most “central” dialects, both lexically and geographically. PD (Pandong dialect) clearly shares the least vocabulary with all the other lects; that is, it is the most lexically divergent, as also indicated by the MDS plot in figure 6.2.
SD also has the highest average lexical similarity within the Sandong dialect cluster, as shown in table 6.7. This time, the standard deviations are much lower, reflecting the relative lexical homogeneity of lects within the Sandong cluster. SY (Southern) is the least lexically similar, sharing an average of 85.9% of its vocabulary with all the other Sandong lects. Of the Southern Sui lects, JQ shows the highest lexical similarity across the board. This is unsurprising because JQ is the only Southern Sui lect spoken in Sandu county and is geographically the closest to Central Sui speakers.
Table 6.7. Overall mean lexical similarity with lects within Sandong dialect cluster (StDev is the standard deviation)
          Sandong (Central, Eastern, Western)             Sandong (Southern)
            SD    ZH    DJ    TZ    SJ    TP    RL    AT    JQ    SW    JR    SY
Mean:     92.3  91.3  91.1  90.8  90.5  90.1  89.6  89.3  90.4  89.7  88.6  85.9
StDev:    3.36  4.32  3.45  4.07  3.33  3.49  3.18  4.47  2.78  3.00  3.67  3.54
The dialect clustering described in section 6.3.2.2 grouped Central, Eastern and Western Sui into a single cluster. As table 6.8 shows, SD is also clearly the most lexically representative lect within this cluster. RL (Eastern) is the most lexically divergent, which is again unsurprising since it is located on the very eastern fringe of the Sui region. In this case, multiple pairwise T-tests show that the differences in mean percentages are far more statistically significant (yielding an overall probability of 14.92), due to the smaller amount of variance among the original similarity percentages.
Table 6.8. Overall mean lexical similarity with lects within Sandong Central-Western-Eastern dialect cluster
                      Central     Western           Eastern
                        SD    ZH    TZ    TP    AT    DJ    SJ    RL
Mean:                 94.3  93.9  93.3  92.2  91.9  93.1  92.1  90.8
Standard Deviation:   1.51  2.56  2.53  1.87  3.14  1.56  2.54  2.91
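The means in tables 6.6 to 6.8 can be checked against the percentages in table 6.5. The following Python sketch is an illustration rather than the procedure actually used: it starts from the rounded figures in table 6.5, assumes that the reported standard deviation is the sample standard deviation, and uses hypothetical names (`LECTS`, `LOWER`, `SANDONG`, `CWE`, `similarity`, `summary`). Because the inputs are rounded, recomputed values may differ slightly from the published ones.

```python
from statistics import mean, stdev

# Lect codes in the row/column order of table 6.5.
LECTS = ["SD", "ZH", "TZ", "TP", "AT", "DJ", "SJ", "RL",
         "JQ", "SW", "JR", "SY", "BL", "TN", "JL", "PD"]

# Lower triangle of table 6.5: LOWER[i] holds the similarities of
# LECTS[i + 1] with LECTS[0], ..., LECTS[i] (rounded published figures).
LOWER = [
    [96.4],
    [94.8, 96.5],
    [94.5, 93.2, 94.2],
    [93.5, 95.8, 94.6, 92.0],
    [95.7, 94.4, 92.7, 91.9, 91.4],
    [93.5, 91.3, 91.0, 90.4, 88.6, 93.8],
    [91.9, 90.0, 89.1, 89.5, 87.1, 92.0, 96.2],
    [91.1, 89.3, 88.2, 88.3, 86.7, 89.7, 89.7, 89.4],
    [90.2, 87.9, 87.6, 86.9, 85.9, 89.1, 89.5, 88.9, 95.7],
    [88.0, 86.0, 86.0, 86.3, 83.8, 87.2, 88.0, 87.2, 95.1, 94.4],
    [85.5, 83.9, 84.1, 83.7, 82.5, 83.7, 83.7, 83.8, 91.2, 90.5, 92.1],
    [86.2, 85.9, 86.7, 86.2, 84.8, 84.3, 82.3, 80.9, 81.5, 80.7, 80.1, 76.8],
    [85.3, 85.5, 86.2, 85.7, 84.7, 84.1, 81.9, 80.6, 81.2, 80.8, 80.3, 76.8,
     95.8],
    [78.8, 79.7, 79.3, 80.0, 81.2, 80.7, 80.0, 79.1, 75.8, 74.4, 74.0, 73.8,
     75.3, 75.8],
    [75.5, 77.2, 77.6, 77.3, 77.6, 76.3, 76.0, 75.8, 73.7, 72.8, 73.6, 72.6,
     74.1, 75.3, 81.2],
]

# Hypothetical groupings following the clusters named in the text.
CWE = LECTS[:8]        # Central, Western and Eastern Sandong
SANDONG = LECTS[:12]   # CWE plus Southern Sandong


def similarity(a, b):
    """Look up the similarity percentage for two different lects."""
    i, j = sorted((LECTS.index(a), LECTS.index(b)))
    if i == j:
        raise ValueError("similarity of a lect with itself is not tabulated")
    return LOWER[j - 1][i]


def summary(lect, group=LECTS):
    """Mean and sample standard deviation of one lect against a group."""
    scores = [similarity(lect, other) for other in group if other != lect]
    return mean(scores), stdev(scores)


for name, group in [("all lects", LECTS), ("Sandong", SANDONG), ("CWE", CWE)]:
    m, s = summary("SD", group)
    print(f"SD vs {name}: mean {m:.1f}, StDev {s:.2f}")
```

For SD this gives a mean of about 89.4 against all lects, about 92.3 within the Sandong cluster and about 94.3 within the Central-Western-Eastern cluster, in line with the first column of each table.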
Gabmap is able to produce a “reference map” showing which dialects are most closely related to a specified reference point. Given that SD emerged as the most representative lect for the whole of Sui, we chose to plot a reference map based on SD. This is shown in map 6.2. SD is denoted by the star in the middle of the darkest plot. The other Central, Western and Eastern Sui varieties of ZH, TZ, TP and DJ are all dark blue, signifying close lexical similarity to SD. AT and the two Rongjiang locations of SJ and RL are slightly lighter shades of blue, showing slightly lower lexical similarity to SD. The Southern locations of JQ, SW and JR are still lighter shades of blue, while SY and the Yang’an locations BL and TN are very light bluish green, showing that their lexicon is significantly different from that of SD. Finally, JL and PD are respectively pale yellow and completely white, showing the greatest lexical difference from the SD Sui variety. The map confirms that SD is a good reference lect to represent all of the Western, Central and Eastern Sui varieties.
Map 6.2. SD reference plot based on lexical similarity. SD is marked by the star
6.4 Sui dialectal variants