
Table 1. Sites from which wordlists were obtained

Language            State   District             Village
Bhumij              Bihar   Singhbhum            Champi
Bhumij              Bihar   Singhbhum            Ladhiramsai
Bhumij              Bihar   Singhbhum            Munduy
Bhumij              Orissa  Balasore             Baigodia
Bhumij              Orissa  Mayurbhanj           Dighinuasahi
Bhumij              Orissa  Mayurbhanj           Dumadie
Bhumij              Orissa  Mayurbhanj           Madhupur
Bhumij              Orissa  Mayurbhanj           Mohuldiha
Bhumij              Orissa  Mayurbhanj           Podadiha
Mundari             Bihar   Ranchi               Chalagi
Mundari             Orissa  Mayurbhanj           Dhungarisai
Mundari             Orissa  Sundargarh           Jharmunda
Mundari Dictionary  Bihar   Ranchi               --
Bhumij Mundari      Orissa  Mayurbhanj           Udala
Ho                  Orissa  Mayurbhanj           Dillisore
Santali             Orissa  Mayurbhanj           Nayarangamotia
Santali Dictionary  Bihar   Santal Parganas (?)  --
Oriya               Orissa  Cuttack              --

2.1.3 Results and analysis

The lexical similarity percentages for the speech varieties investigated are presented in the following two tables. The capital letters at the beginning of each site name refer to the speech variety: B – Bhumij, M – Mundari, BM – Bhumij Mundari, H – Ho, and S – Santali. The first chart, in table 2, is ordered by percentage within each speech variety, with the highest percentages placed nearest the top.

Table 2. Lexical similarity organised by percentages within each speech variety

[Bhumij speech varieties]
B – Podadiha, Mayurbhanj
96 B – Madhupur, Mayurbhanj
94 91 B – Dumadie, Mayurbhanj
92 86 86 B – Champi, Singhbhum
86 83 83 82 B – Mohuldiha, Mayurbhanj
85 80 82 85 79 B – Munduy, Singhbhum
85 83 81 81 78 75 BM – Udala, Mayurbhanj
84 82 81 81 80 76 89 B – Baigodia, Balasore
84 78 79 79 76 77 82 81 B – Ladhiramsai, Singhbhum
77 73 74 75 77 73 83 81 77 B – Dighinuasahi, Mayurbhanj
[Mundari speech varieties]
83 81 80 79 76 74 94 87 80 79 M – Dhungarisai, Mayurbhanj
78 72 76 74 72 73 78 74 83 73 75 M – Dictionary
78 72 76 73 74 77 69 71 74 68 67 79 M – Jharmunda, Sundargarh
77 70 72 72 71 71 74 72 82 74 72 84 74 M – Chalagi, Ranchi
[Ho]
75 71 71 71 69 72 71 70 80 66 69 75 74 77 H – Dillisore, Mayurbhanj
[Santali]
71 69 72 68 72 66 67 71 65 72 66 65 68 65 61 S – Nayarangamotia, Mayurbhanj
73 69 70 66 66 63 67 70 73 64 67 76 66 68 69 79 S – Dictionary
20 20 17 20 20 19 18 21 13 18 21 13 12 10 15 18 17 Oriya, Cuttack

The lexical similarity chart in table 3 is organised by geographic location within each speech variety, roughly from north to south. The exceptions are Jharmunda, which lies further west, and the sites that are neither Bhumij nor Mundari.

Table 3. Lexical similarity organised by geography within each speech variety

[Bhumij speech varieties]
B – Ladhiramsai, Singhbhum
79 B – Champi, Singhbhum
76 82 B – Mohuldiha, Mayurbhanj
78 86 83 B – Madhupur, Mayurbhanj
79 86 83 91 B – Dumadie, Mayurbhanj
84 92 86 96 94 B – Podadiha, Mayurbhanj
77 85 79 80 82 85 B – Munduy, Singhbhum
77 75 77 73 74 77 73 B – Dighinuasahi, Mayurbhanj
82 81 78 83 81 85 75 83 BM – Udala, Mayurbhanj
81 81 80 82 81 84 76 81 89 B – Baigodia, Balasore
[Mundari speech varieties]
82 72 71 70 72 77 71 74 74 72 M – Chalagi, Ranchi
83 74 72 72 76 78 73 73 78 74 84 M – Dictionary
80 79 76 81 80 83 74 79 94 87 72 75 M – Dhungarisai, Mayurbhanj
74 73 74 72 76 78 77 68 69 71 74 79 67 M – Jharmunda, Sundargarh
[Ho]
80 71 69 71 71 75 72 66 71 70 77 75 69 74 H – Dillisore, Mayurbhanj
[Santali]
73 66 66 69 70 73 63 64 67 70 68 76 67 66 69 S – Dictionary
65 68 72 69 72 71 66 72 67 71 65 65 66 68 61 79 S – Nayarangamotia, Mayurbhanj
13 20 20 20 17 20 19 18 18 21 10 13 21 12 15 17 18 Oriya, Cuttack

Bhumij and Mundari vocabulary

Typically, when two speech varieties have less than 60% lexical similarity, it can be concluded that they are quite distinct and, with other supporting evidence, can be classed as separate languages (Blair 1990:24). The Bhumij and Mundari wordlist sites, however, show percentages above this 60% threshold. Pairwise comparisons among the Bhumij wordlist sites, including Bhumij Mundari from Udala, yield an average of 82% similarity (45 comparisons). Comparisons of Bhumij sites with Mundari sites (including the dictionary list, which scored approximately equal to the speech sites in most cases, and excluding the Udala wordlist) yield an average of 76% (36 comparisons). This indicates that some difference exists, but not enough to warrant a separate language classification, especially since the Mundari wordlist sites compared with one another average only 77% similarity (10 comparisons, including the Bhumij Mundari wordlist).
Because of this, it is not possible to say from the data that Bhumij and Mundari are separate languages.

One issue that should be investigated is the relative homogeneity of Mundari, especially in comparison with Bhumij. One might expect the Mundari sites to show higher scores in Mundari-internal comparisons than the Bhumij sites show in Bhumij-internal comparisons. This expectation is based on various descriptions of Bhumij, particularly that of Risley (1891, reprinted in 1981), who says the Bhumij tend to adopt the speech forms of whatever place they are living in. Many Bhumij communities in West Bengal have cut ties with the tribal groups and have assimilated to Bengali language and culture. If this view of the Bhumij as a non-conservative group is accurate, much greater variation would be expected between their wordlist sites than between the more conservative Mundari locations. This question cannot be adequately addressed because of the lack of sufficient Mundari data (one of the wordlists is from a dictionary and is therefore not strictly comparable). Nonetheless, it is interesting to note that the Bhumij wordlist points appear to show more homogeneity than the Mundari wordlist points. This might be evidence for a set of distinctly “Bhumij” vocabulary within Mundari (not separate, since the Bhumij-Mundari comparisons are high). As it stands, however, it cannot be concluded whether the lower in-group scores for Mundari reflect the wider geographic spread of the sites, which would bring down the in-group average, or whether Mundari is indeed a macrolanguage subsuming Bhumij, as early research suggests. Additional Mundari wordlists could help clarify this issue.

There is a large degree of variation (73–96%) when comparing Bhumij wordlists with each other, which does not seem explainable by geographical location. Both the southern Mayurbhanj district sites (Udala and Baigodia) have scores in the mid-70s to mid-80s when compared with the sites clustered around the Bihar-Orissa border.
The only exception to this may be Dighinuasahi, which shows less similarity (mid-70s) with the border sites to the north. The Bhumij wordlist percentages appear to indicate no clear dialect groupings.

One question raised in the Varenkamp report (1989) was whether the people of Udala considered their speech Bhumij or Mundari. As it turns out, this issue of ambiguous language identity became one of the motivating questions for this project. The lexical similarity percentages in the previous charts seem to indicate that the Udala wordlist is slightly closer to the vocabulary of the Bhumij sites than to that of the Mundari sites, if such a distinction can be made.

Comparison with neighbouring languages

While not central to the team’s research goals, two Santali wordlists and a Ho wordlist from Varenkamp’s reports were used in the lexical similarity comparison. If these lists can be taken as representing their respective languages, the lexical similarity percentages confirm the belief that Santali is more distinct (68% similar on average) from Bhumij and Mundari. Ho seems to share slightly more resemblance (72% average) with the Bhumij and Mundari wordlists. Oriya, the Indo-Aryan state language of Orissa, not surprisingly shares very little (10–21%) with the Munda family wordlists.
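
The group averages quoted in this section can be recomputed from the lower triangle of table 3. The following is a minimal Python sketch and not the survey team's own procedure: it assumes each average is a simple unweighted mean over site pairs, and the site keys (such as "B-Champi") and the helper names score, mean_within and mean_between are illustrative labels introduced here.

```python
from itertools import combinations, product

# Site keys in the row order of table 3 (hypothetical short labels).
SITES = [
    "B-Ladhiramsai", "B-Champi", "B-Mohuldiha", "B-Madhupur", "B-Dumadie",
    "B-Podadiha", "B-Munduy", "B-Dighinuasahi", "BM-Udala", "B-Baigodia",
    "M-Chalagi", "M-Dictionary", "M-Dhungarisai", "M-Jharmunda",
    "H-Dillisore", "S-Dictionary", "S-Nayarangamotia", "Oriya-Cuttack",
]

# Lower triangle of table 3: row i holds site i scored against sites 0..i-1.
LOWER = [
    [],
    [79],
    [76, 82],
    [78, 86, 83],
    [79, 86, 83, 91],
    [84, 92, 86, 96, 94],
    [77, 85, 79, 80, 82, 85],
    [77, 75, 77, 73, 74, 77, 73],
    [82, 81, 78, 83, 81, 85, 75, 83],
    [81, 81, 80, 82, 81, 84, 76, 81, 89],
    [82, 72, 71, 70, 72, 77, 71, 74, 74, 72],
    [83, 74, 72, 72, 76, 78, 73, 73, 78, 74, 84],
    [80, 79, 76, 81, 80, 83, 74, 79, 94, 87, 72, 75],
    [74, 73, 74, 72, 76, 78, 77, 68, 69, 71, 74, 79, 67],
    [80, 71, 69, 71, 71, 75, 72, 66, 71, 70, 77, 75, 69, 74],
    [73, 66, 66, 69, 70, 73, 63, 64, 67, 70, 68, 76, 67, 66, 69],
    [65, 68, 72, 69, 72, 71, 66, 72, 67, 71, 65, 65, 66, 68, 61, 79],
    [13, 20, 20, 20, 17, 20, 19, 18, 18, 21, 10, 13, 21, 12, 15, 17, 18],
]

INDEX = {name: i for i, name in enumerate(SITES)}


def score(a, b):
    """Look up the similarity percentage for two distinct sites."""
    i, j = INDEX[a], INDEX[b]
    if i < j:
        i, j = j, i  # the table stores only the lower triangle
    return LOWER[i][j]


def summarise(pairs):
    """Return (unweighted mean score, number of comparisons)."""
    pairs = list(pairs)
    return sum(score(a, b) for a, b in pairs) / len(pairs), len(pairs)


def mean_within(group):
    """Average over every pair of sites inside one group."""
    return summarise(combinations(group, 2))


def mean_between(group_a, group_b):
    """Average over every pair with one site from each group."""
    return summarise(product(group_a, group_b))


bhumij = [s for s in SITES if s.startswith("B-")]    # 9 Bhumij sites
mundari = [s for s in SITES if s.startswith("M-")]   # 4 Mundari lists
udala = "BM-Udala"
munda_core = bhumij + [udala] + mundari              # all 14 Bhumij/Mundari lists

results = {
    "Bhumij internal (incl. Udala)": mean_within(bhumij + [udala]),
    "Bhumij vs Mundari (excl. Udala)": mean_between(bhumij, mundari),
    "Mundari internal (incl. Udala)": mean_within(mundari + [udala]),
    "Ho vs Bhumij/Mundari": mean_between(["H-Dillisore"], munda_core),
    "Santali vs Bhumij/Mundari": mean_between(
        ["S-Dictionary", "S-Nayarangamotia"], munda_core),
}

for label, (mean, n) in results.items():
    print(f"{label}: {mean:.1f} over {n} comparisons")
```

Under these assumptions the first three means come out at 81.6, 75.5 and 76.6 over 45, 36 and 10 comparisons, in line with the 82%, 76% and 77% reported above, and the Ho and Santali means come out at 72.2 and 68.4. Keeping Udala as a separate key makes it easy to include or exclude that wordlist from either group, which is the distinction the text draws between the three averages.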

2.2 Intelligibility testing