
6 Lexical Similarity

Melissa Partida, Andy Castro

6.1 Background

The usefulness of lexical similarity counts for subgrouping language varieties has been the subject of a great deal of debate. The general consensus among comparative linguists is that regular diachronic sound changes, as revealed by rigorous application of the comparative method, are a far more reliable way of uncovering the genetic relatedness of languages and dialects (Thurgood 2003, Campbell 2004). Nevertheless, comparing lexical items can give us a useful perspective on perceived levels of linguistic similarity between different dialects and languages. Lexical similarity counts contribute to the overall synchronic picture of inter-dialect relationships, even though they do not necessarily reflect historical relatedness.

Comparison of “basic vocabulary” (e.g., terms for body parts, close kinship and the natural world) has also been used as evidence for proposing more distant genetic relationships between languages (Benedict 1975, Luo 2000, Ostapirat 2005). This hinges on the problematic assumption that basic vocabulary is more resistant to borrowing, so that similarities in basic vocabulary are more likely to be due to shared inheritance than to diffusion.

Additionally, dialectologists have discovered that lexical similarity counts and levels of inter-dialectal intelligibility are related (Blair 1990, Grimes 1995). In particular, Blair’s “intelligibility threshold” of 60% is often quoted (that is, two language varieties must have a minimum lexical similarity of 60% in order for them to have any chance of being mutually intelligible). Thus lexical similarity counts can provide supporting evidence for “communication clusters” indicated by intelligibility testing.

In this chapter we first compare Sui and Kam lexical data and show that all dialects of Sui (including Yang’an, see discussion in chapters 4 and 5) group together and are clearly distinct from Kam. Nevertheless, Yang’an dialect shares more cognates with Kam than other Sui dialects do, and many of these appear to constitute shared lexical innovations. This backs up our hypothesis that Yang’an genetically belongs to the Kam branch of Kam-Sui.

Secondly, we examine the internal lexical similarity of all Sui lects covered by our survey. In terms of shared lexicon, four distinct groupings emerge: (1) Sandong Central, Western and Eastern; (2) Sandong Southern; (3) Yang’an; and (4) Pandong. The most lexically divergent dialect cluster is Pandong. Central Sui as spoken in Sandong township (SD) emerges as the most lexically representative of all Sui dialects, sharing over 90% of its vocabulary with all other Sandong Sui dialects apart from Jiarong (JR) and Shuiyao (SY), both Southern varieties spoken in Libo county.

Finally, we provide a summary of distinctive lexical items representative of each Sui dialect cluster and discuss some instances of semantic change which are evident in various parts of the Sui region. We conclude that although all Sui dialects are unique, there is a linguistic and cultural unity that makes them all distinctly “Sui”.

6.2 Methodology

6.2.1 Selection of data for comparison

Our first comparison involved four Sui dialects and three Kam dialects. The primary objective was to uncover the internal lexical similarity of Sui compared with Kam. In particular, we wanted to see whether Yang’an dialect’s lexicon is more similar to Sui or to Kam, since shared diachronic innovations show that Yang’an belongs to the Kam branch (chapters 3, 4 and 5). For this comparison we used our own field data alongside Shi and Strange’s (2004) Kam dialect data, comparing all of the words which occurred in both sets of data, a total of 393 lexical items.

Our second comparison was between all sixteen Sui dialects surveyed. We utilised our full wordlist in this comparison because our aim was to gain a larger picture of the overall lexical similarity between the Sui dialects. However, we had to omit some words from our calculations, particularly in cases where there appeared to be two or more semantically similar or even identical words under the same gloss and we had not elicited both words in all locations (a description of our wordlist elicitation procedures can be found in chapter 1, section 1.4.2). For example, there are two words in Sui for ‘wind’: kʰaːŋ⁵ and lum¹. Some locations appear to use only one word or the other, whereas other locations seem to distinguish between the two, the former being a general word for ‘wind’ or ‘to blow a wind’ and the latter a word for a ‘twisting wind’ or ‘tornado’. We did not discover the difference until halfway through the survey and thus did not consistently probe for both words in all locations. We therefore omitted both words completely from our lexical comparison.

Occasionally we excluded words from individual pairwise comparisons because we suspected that we may have elicited the wrong word in a certain location. For example, there is one commonly used word in Sui for ‘to lose or misplace’: tʰa¹. However, in Tangzhou (TZ, Western Sui) we only elicited the word tok⁷ for this gloss. tok⁷ is used in all Sui dialects to mean ‘to drop onto the ground’ and therefore, by extension, ‘to lose’, in a different sense from tʰa¹, which means ‘unable to find something because you forgot where you put it’. Tangzhou was a relatively early data point, and at that stage we were unaware of the fine semantic distinction between these two words, so we failed to probe for the word tʰa¹. We therefore excluded this particular word from our lexical comparisons between Tangzhou and other lects.

In total, our lexical comparison among Sui dialects included 594 lexical items. The smallest number of items included in a single pairwise comparison was 565, in the comparison between Jiarong (JR, Southern Sui) and Banliang (BL, Yang’an). In general, there were slightly fewer words in our JR comparisons because JR was our very first data point and we had not completely finalised our wordlist at that point.
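The two kinds of exclusion described in this section (glosses omitted from every comparison, and glosses excluded only from comparisons involving one location) can be illustrated with a short sketch. It is not the procedure actually used for the survey, only one possible way of expressing it. The function name `comparable_items`, the glosses `gloss_a` and `gloss_b`, the `PLACEHOLDER` forms and the assignment of tʰa¹ to SD and JR are all assumptions made for illustration; only the location codes, the gloss ‘wind’ and the Tangzhou form tok⁷ come from the text above.

```python
from itertools import combinations

PLACEHOLDER = "?"  # stands in for a transcribed form; not real survey data

# Hypothetical toy wordlists: location code -> {gloss: elicited form}.
# JR lacks "gloss_b", mimicking an early, not yet finalised wordlist.
wordlists = {
    "SD": {"wind": PLACEHOLDER, "lose": "tʰa¹", "gloss_a": PLACEHOLDER, "gloss_b": PLACEHOLDER},
    "TZ": {"wind": PLACEHOLDER, "lose": "tok⁷", "gloss_a": PLACEHOLDER, "gloss_b": PLACEHOLDER},
    "JR": {"wind": PLACEHOLDER, "lose": "tʰa¹", "gloss_a": PLACEHOLDER},
}

# Glosses dropped from every comparison (not probed consistently).
omitted_glosses = {"wind"}

# (location, gloss) entries suspected of being the wrong word; they are
# dropped only from pairwise comparisons that involve that location.
suspect_entries = {("TZ", "lose")}


def comparable_items(wordlists, omitted_glosses, suspect_entries):
    """Return {(loc_a, loc_b): set of glosses usable for that pair}."""
    usable_by_pair = {}
    for loc_a, loc_b in combinations(sorted(wordlists), 2):
        # Only glosses elicited at both locations can be compared.
        shared = wordlists[loc_a].keys() & wordlists[loc_b].keys()
        usable = {
            gloss
            for gloss in shared
            if gloss not in omitted_glosses
            and (loc_a, gloss) not in suspect_entries
            and (loc_b, gloss) not in suspect_entries
        }
        usable_by_pair[(loc_a, loc_b)] = usable
    return usable_by_pair


pairs = comparable_items(wordlists, omitted_glosses, suspect_entries)
for (loc_a, loc_b), glosses in pairs.items():
    print(loc_a, loc_b, len(glosses))
```

In this toy data the gloss ‘lose’ still counts in the JR and SD comparison but drops out of both comparisons involving TZ, which is why the number of items differs from one pairwise comparison to another.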

6.2.2 Method of determining lexical similarity