various dialects are. In order to preserve a written record of the Semai dialects, copies of the compiled wordlists have been turned over to the Jabatan Hal Ehwal Orang Asli as well as the Economic Planning
Unit in the Prime Minister’s Department. Ultimately, for the usage of Semai to be preserved, some form of standardization will need to take
place so that important decisions, such as the choice of orthography, can be made effectively. One of the key questions is which dialect or dialects would allow adequate communication with all speakers
of Semai. Identification, documentation, and systematic comparison of the Semai dialects are critical first steps for standardizing Semai.
1.3 The contribution of Semai to historical linguistics
The Semai language, true to its Mon-Khmer heritage, has a rich set of vowels: nearly thirty, when all the nasal and length distinctions are counted. Furthermore, as Diffloth (1976a) has noted, Semai has
preserved a number of disyllabic and polysyllabic words, features that have largely been lost in other Mon-Khmer languages in Southeast Asia. Thus the Semai people, as well as other speakers of Aslian
languages in Malaysia, have much to offer humanity as we endeavor to reconstruct the history of the Mon-Khmer languages.
It is hoped that the documentation and the reconstruction of the Semai ancestor language in this report will help to further such efforts.
2 Methodology
2.1 Collection of wordlists
A wordlist of 436 words was constructed, including words from the basic 200-item Swadesh wordlist, words that are typical of Southeast Asian languages, and words that are culturally and linguistically specific to
the speakers of Central Aslian languages. The items in the wordlist were arranged by semantic categories and listed in Malay and English.
This wordlist was then used to elicit words from twenty-seven dialects of Semai. Dialects were selected based on a combination of information gleaned from existing literature on Semai and from
asking the Semai themselves which areas spoke dialects different from their own. The following table shows the locations of the dialects selected for this research. A map of
these villages is given in Appendix A.
Table 1. Wordlist locations
Kampung           District             State
Batu 17           Batang Padang        Perak
Bidor             Batang Padang        Perak
Chinggung         Batang Padang        Perak
Cluny             Batang Padang        Perak
Rasau             Batang Padang        Perak
Sungai Bil        Batang Padang        Perak
Sungkai           Batang Padang        Perak
Tapah             Batang Padang        Perak
Gopeng            Kinta                Perak
Kampar            Kinta                Perak
Bota              Perak Tengah         Perak
Tangkai Cermin    Perak Tengah         Perak
Cenan Cerah       Cameron Highlands    Pahang
Relong            Cameron Highlands    Pahang
Renglas           Cameron Highlands    Pahang
Sungai Ruil       Cameron Highlands    Pahang
Terisu            Cameron Highlands    Pahang
Bertang           Lipis                Pahang
Betau             Lipis                Pahang
Cherong           Lipis                Pahang
Kuala Kenip       Lipis                Pahang
Serau             Lipis                Pahang
Lanai             Lipis                Pahang
Pagar             Lipis                Pahang
Simoi             Lipis                Pahang
Pos Buntu         Raub                 Pahang
The wordlists were generally elicited using direct questioning in Bahasa Malaysia (Malay). Once the complete list was elicited, the data were rearranged according to the similar phonetic segments encountered. The list was then rechecked. By grouping the elicited words together according to similar sounds (for instance, all the words containing front vowels were put together), it was easier to hear the often-subtle differences between similar sounds.
In some cases a recording was also made of the same Semai speaker pronouncing the words that had just been elicited. These recordings were quite helpful in clearing up inconsistencies later discovered in the elicited words, and thus often removed the need to return to the same village for further checking.
The elicited wordlists were then used to determine the degree of linguistic similarity between dialects. The comparison of wordlists was used to determine the number of phonetically similar lexical items, to discover word families, to identify phonological changes in order to establish the linguistic relationship between the speech communities, and finally, to propose a reconstruction of several hundred lexical items for proto-Semai.
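The counting of phonetically similar lexical items between pairs of wordlists can be illustrated with a minimal Python sketch. It is an illustration only and not the procedure used in this report: the similarity criterion (a normalized edit distance with a cutoff of 0.34), the function names, and the dialect labels are all assumptions, and the transcriptions are invented placeholder strings rather than Semai data.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, treating each character as one phonetic symbol."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_similar(a: str, b: str, max_ratio: float = 0.34) -> bool:
    """Assumed criterion: edit distance / longer form length <= max_ratio."""
    longest = max(len(a), len(b))
    return longest > 0 and edit_distance(a, b) / longest <= max_ratio


def percent_similar(list_a: dict, list_b: dict) -> float:
    """Percentage of glosses present in both lists whose forms count as similar."""
    shared = [gloss for gloss in list_a if gloss in list_b]
    if not shared:
        return 0.0
    hits = sum(is_similar(list_a[g], list_b[g]) for g in shared)
    return 100.0 * hits / len(shared)


# Invented placeholder forms (hypothetical), NOT real Semai data.
wordlists = {
    "dialect_A": {"gloss1": "tapa", "gloss2": "kelu", "gloss3": "miso"},
    "dialect_B": {"gloss1": "tapo", "gloss2": "kelu", "gloss3": "rana"},
    "dialect_C": {"gloss1": "dipo", "gloss2": "keluk", "gloss3": "rana"},
}

# Compare every pair of wordlists once.
for name_a, name_b in combinations(sorted(wordlists), 2):
    score = percent_similar(wordlists[name_a], wordlists[name_b])
    print(f"{name_a} vs {name_b}: {score:.1f}% similar")
```

With these placeholder lists the sketch reports 66.7% for A and B, 33.3% for A and C, and 66.7% for B and C. A character-level edit distance is a crude stand-in for a trained listener's judgement of phonetic similarity; a realistic comparison would work on segmented phonetic transcriptions and a linguistically motivated criterion.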
2.2 Language assistant questionnaires