IALP2010 Gunawan dan Erick

2010 International Conference
on Asian Language
Processing

IALP 2010

Table of Contents
Message from General Chairs..............................................................................................................xi
Message from Program Chairs..........................................................................................................xiii
Conference Committees........................................................................................................................xiv
Program Committee...................................................................................................................................xv
Organizers and Sponsors....................................................................................................................xvii
Invited Talks ..................................................................................................................................................xix

Lexicon and Morphology
A Survey on Rendering Traditional Mongolian Script ...................................................................................3
Biligsaikhan Batjargal, Fuminori Kimura, and Akira Maeda

A Combination of Statistical and Rule-Based Approach for Mongolian
Lexical Analysis ....................................................................................................................................................7
Lili Zhao, Jia Men, Congpin Zhang, Qun Liu, Wenbin Jiang, Jinxing Wu,
and Qing Chang

A Letter Tagging Approach to Uyghur Tokenization ...................................................................................11
Batuer Aisha

Development of Analysis Rules for Bangla Root and Primary Suffix
for Universal Networking Language ................................................................................................................15
Md. Nawab Yousuf Ali, Shahid Al Noor, Md. Zakir Hossain, and Jugal Krishna Das

A Suffix-Based Noun and Verb Classifier for an Inflectional Language ...................................................19
Navanath Saharia, Utpal Sharma, and Jugal Kalita

Behavior of Word ‘kaa’ in Urdu Language ....................................................................................................23
Muhammad Kamran Malik, Aasim Ali, and Shahid Siddiq

Methods to Divide Uygur Morphemes and Treatments for Exceptions .....................................................27
Pu Li and Hao Zhao

Rules for Morphological Analysis of Bangla Verbs for Universal
Networking Language ........................................................................................................................................31
Md. Nawab Yousuf Ali, Mohammad Zakir Hossain Sarker,
Ghulam Farooque Ahmed, and Jugal Krishna Das

Discussion on Collation of Tibetan Syllable ..................................................................................................35
Heming Huang and Feipeng Da

A Dictionary Mechanism for Chinese Word Segmentation Based on
the Finite Automata ............................................................................................................................................39
Wu Yang, Liyun Ren, and Rong Tang

Development of Templates for Dictionary Entries of Bangla Roots
and Primary Suffixes for Universal Networking Language .........................................................................43
Md. Zakir Hossain, Shaikh Muhammad Allayear, Md. Nawab Yousuf Ali,
and Jugal Krishna Das

A Study on "Worry" Separable Words & Its Separable Slots ......................................................................47
Chunling Li and Xiaoxiao Wang

Syntax and Parsing
Improving Dependency Parsing Using Punctuation ......................................................................................53
Zhenghua Li, Wanxiang Che, and Ting Liu

A Tree Probability Generation Using VB-EM for Thai PGLR Parser .......................................................57
Kanokorn Trakultaweekoon, Taneth Ruangrajitpakorn, Prachya Boonkwan,
and Thepchai Supnithi

Research on Verb Subcategorization-Based Syntactic Parsing Postprocess
for Chinese Language .........................................................................................................................................61
Jinyong Wang and Xiwu Han

Identification of Maximal-Length Noun Phrases Based on Maximal-Length
Preposition Phrases in Chinese .........................................................................................................................65
Guiping Zhang, Wenjing Lang, Qiaoli Zhou, and Dongfeng Cai

Urdu Noun Phrase Chunking - Hybrid Approach ..........................................................................................69
Shahid Siddiq, Sarmad Hussain, Aasim Ali, Kamran Malik, and Wajid Ali

The Function of Fixed Word Combination in Chinese Chunk Parsing ......................................................73
Liqun Wang and Shoichi Yokoyama

Problems and Review of Statistical Parsing Language Model ....................................................................77
Faguo Zhou, Fan Zhang, and Bingru Yang

A General Comparison on Sentences Analysis and Its Teaching
Significance between Traditional and Structural Grammars ........................................................................81
Jiaying Yu

Semantics
Two Cores in Chinese Negation System: A Corpus-Based View ...............................................................87
Hio Tong Chan and Chunyu Kit

Finding Semantic Similarity in Vietnamese ...................................................................................................91
Dat Tien Nguyen and Son Bao Pham

Automatic Metaphor Recognition Based on Semantic Relation Patterns ..................................................95
Xuri Tang, Weiguang Qu, Xiaohe Chen, and Shiwen Yu

Event Entailment Extraction Based on EM Iteration ..................................................................................101
Zhen Li, Hanjing Li, Mo Yu, Tiejun Zhao, and Sheng Li

On the Semantic Orientation and Computer Identification of the Adverb
“Jiù” ....................................................................................................................................................................105
Lin He and Jiaqin Wu

Semantic Genes and the Formalized Representation of Lexical Meaning ...............................................110
Dan Hu

Acquisition of Hypernymy-Hyponymy Relation between Nouns
for WordNet Building ......................................................................................................................................114
Gunawan and Erick Pranata

Algorithm for Conversion of Bangla Sentence to Universal Networking
Language ............................................................................................................................................................118
Md. Nawab Yousuf Ali, M. Ameer Ali, Abu Mohammad Nurannabi,
and Jugal Krishna Das

Construction of the Paradigmatic Semantic Network Based on Cognition .............................................122
Xiaofang Ouyang

The Research of Sentence Testing Based on HNC Analysis System
of Sentence Category ........................................................................................................................................126
Zhiying Liu

Semantic Patterns of Chinese Post-Modified V+N Phrases .......................................................................130
Likun Qiu and Wenxian Zhang

Information Extraction
A Grammar-Based Unsupervised Method of Mining Volitive Words .....................................................137
Jianfeng Zhang, Yu Hong, Yuehui Yang, Jianmin Yao, and Qiaoming Zhu

Using Feature Selection to Speed Up Online SVM Based Spam Filtering .............................................142
Yuewu Shen, Guanglu Sun, Haoliang Qi, and Xiaoning He

A Semi-supervised Method for Classification of Semantic Relation
between Nominals .............................................................................................................................................146
Yuan Chen, Yue Lu, Man Lan, Jian Su, and Zhengyu Niu

XPath-Wrapper Induction for Data Extraction .............................................................................................150
Nam-Khanh Tran, Kim-Cuong Pham, and Quang-Thuy Ha

A Block Segmentation Based Approach for Web Information Extraction ..............................................154
Chanwei Wang, Chengjie Sun, Lei Lin, and Xiaolong Wang

Linguistic Features for Named Entity Recognition Using CRFs ..............................................................158
R. Vijay Sundar Ram, A. Akilandeswari, and Sobha Lalitha Devi

Research on Domain-Adaptive Transfer Learning Method and Its
Applications .......................................................................................................................................................162
Geli Fei and Dequan Zheng

Information Theory Based Feature Valuing for Logistic Regression
for Spam Filtering .............................................................................................................................................166
Haoliang Qi, Xiaoning He, Yong Han, Muyun Yang, and Sheng Li

Automatic Named Entity Set Expansion Using Semantic Rules
and Wrappers for Unary Relations .................................................................................................................170
Mai-Vu Tran, Tien-Tung Nguyen, Thanh-Son Nguyen, and Hoang-Quynh Le

Anaphora Resolution of Malay Text: Issues and Proposed Solution Model ...........................................174
Noorhuzaimi Karimah Mohd Noor, Shahrul Azman Noah, Mohd Juzaidin Ab. Aziz,
and Mohd Pouzi Hamzah

Combining Multi-features with Conditional Random Fields for Person
Recognition ........................................................................................................................................................178
Suxiang Zhang

Comparison between Typical Discriminative Learning Model
and Generative Model in Chinese Short Messages Service Spam Filtering ............................................182
Xiaoxia Zheng, Chao Liu, Chengzhe Huang, Yu Zou, and Hongwei Yu

Chinese Spam Filter Based on Relaxed Online Support Vector Machine ...............................................185
Yong Han, Xiaoning He, Muyun Yang, Haoliang Qi, and Chao Song

Comment Target Extraction Based on Conditional Random Field &
Domain Ontology ..............................................................................................................................................189
Shengchun Ding and Ting Jiang

Text Understanding and Summarization
Topic-Driven Multi-document Summarization ............................................................................................195
Hongling Wang and Guodong Zhou

Multiple Factors-Based Opinion Retrieval and Coarse-to-Fine Sentiment
Classification .....................................................................................................................................................199
Shu Zhang, Wenjie Jia, Yingju Xia, Yao Meng, and Hao Yu

Chinese Sentence-Level Sentiment Classification Based on Sentiment
Morphemes ........................................................................................................................................................203
Xin Wang and Guohong Fu

Extracting Phrases in Vietnamese Document for Summary Generation ..................................................207
Huong Thanh Le, Rathany Chan Sam, and Phuc Trong Nguyen

User Interest Analysis with Hidden Topic in News Recommendation
System ................................................................................................................................................................211
Mai-Vu Tran, Xuan-Tu Tran, and Huy-Long Uong

Dependency Tree-Based Anaphoricity Determination for Coreference
Resolution ..........................................................................................................................................................215
Fang Kong, Jianmei Zhou, Guodong Zhou, and Qiaoming Zhu

Text Clustering Based on Domain Ontology and Latent Semantic Analysis ..........................................219
Yaxiong Li, Jianqiang Zhang, and Dan Hu

Social Network Mining Based on Wikipedia ...............................................................................................223
Fangfang Yang, Zhiming Xu, Sheng Li, and Zhikai Xu

Retrospective Labels in Chinese Argumentative Discourses .....................................................................227
Donghong Liu

Improve Search by Optimization or Personalization, A Case Study in Sogou
Log ......................................................................................................................................................................231
Jingbin Gao, Muyun Yang, Sheng Li, Tiejun Zhao, and Haoliang Qi

Machine Translation
Conditional Random Fields for Machine Translation System Combination ...........................................237
Tian Xia, Shandian Zhe, and Qun Liu

A Method of Automatic Translation of Words of Multiple Affixes
in Scientific Literature ......................................................................................................................................241
Lei Wang, Baobao Chang, and Janet Harkness

Hierarchical Pitman-Yor Language Model for Machine Translation .......................................................245
Tsuyoshi Okita and Andy Way

Training MT Model Using Structural SVM .................................................................................................249
Tiansang Du and Baobao Chang

English-Hindi Automatic Word Alignment with Scarce Resources .........................................................253
Eknath Venkataramani and Deepa Gupta

Sentence Similarity-Based Source Context Modelling in PBSMT ...........................................................257
Rejwanul Haque, Sudip Kumar Naskar, Andy Way, Marta R. Costa-jussà,
and Rafael E. Banchs

Verb Transfer in a Tamil to Hindi Machine Translation System ..............................................................261
Sobha Lalitha Devi, Pravin Pralayankar, S. Menaka, T. Bakiyavathi,
R. Vijay Sundar Ram, and V. Kavitha

Lexical Gap in English - Vietnamese Machine Translation: What to Do? ..............................................265
Le Manh Hai and Phan Thi Tuoi

Nominal Transfer from Tamil to Hindi .........................................................................................................270
Sobha Lalitha Devi, V. Kavitha, Pravin Pralayankar, S. Menaka, T. Bakiyavathi,
and R. Vijay Sundar Ram

Language Resources
Building Thai FrameNet through a Combination Approach ......................................................................277
Dhanon Leenoi, Sawittree Jumpathong, and Thepchai Supnithi

Evaluating the Quality of Web-Mined Bilingual Sentences Using Multiple
Linguistic Features ............................................................................................................................................281
Xiaohua Liu and Ming Zhou

A Semi-Supervised Approach on Using Syntactic Prior Knowledge
for Construction Thai Treebank ......................................................................................................................285
Taneth Ruangrajitpakorn, Prachya Boonkwan, Thepchai Supnithi,
and Phiradet Bangcharoensap

A Proposed Model for Constructing a Yami WordNet ...............................................................................289
Meng-Chien Yang, D. Victoria Rau, and Ann Hui-Huan Chang

Annotation Guidelines for Hindi-English Word Alignment ......................................................................293
Rahul Kumar Yadav and Deepa Gupta

Building Synsets for Indonesian WordNet with Monolingual Lexical
Resources ...........................................................................................................................................................297
Gunawan and Andy Saputra

A Study of Unique Words in Hawks’ Translation of Hong Lou Meng
in Comparison with Yang’s Translation ........................................................................................................301
Yunfang Liang, Lixin Wang, and Dan Yang

Kazakh Noun Phrase Extraction Based on N-gram and Rules ..................................................................305
Gulila Altenbek and Ruina Sun

Spoken Language Processing
Feature Smoothing and Frame Reduction for Speaker Recognition .........................................................311
Santi Nuratch, Panuthat Boonpramuk, and Chai Wutiwiwatchai

Improved Cantonese Tone Recognition with Approximated F0 Contour:
Implications for Cochlear Implants ................................................................................................................315
Meng Yuan, Haihong Feng, and Tan Lee

Combining Sub-bands SNR on Cochlear Model for Voice Activity
Detection ............................................................................................................................................................319
Qibo Liu, Yi Liu, and Yanjie Li

Precedence of Emotional Features in Emotional Prosody Processing:
Behavioral and ERP Evidence ........................................................................................................................323
Xuhai Chen and Yufang Yang

Acoustic Space of Vowels with Different Tones: Case of Thai Language ..............................................327
Vaishna Narang, Deepshikha Misra, Ritu Yadav, and Sulaganya Punyayodhin

A Study of F1 Correlation with F0 in a Tone Language: Case of Thai ....................................................330
Sulaganya Punyayodhin, Deepshikha Misra, Ritu Yadav, and Vaishna Narang

A Contrastive Study of F3 Correlation with F2 and F1 in Thai and Hindi ..............................................334
Ritu Yadav, Deepshikha Misra, Sulaganya Punyayodhin, and Vaishna Narang

Durational Contrast and Centralization of Vowels in Hindi and Thai ......................................................339
Deepshikha Misra, Ritu Yadav, Sulaganya Punyayodhin, and Vaishna Narang

A Study in Comparing Acoustic Space: Korean and Hindi Vowels .........................................................343
Hyunkyung Lee and Vaishna Narang

Author Index ................................................................................................................................................347

Acquisition of Hypernymy-Hyponymy Relation
between Nouns for WordNet Building
Gunawan*,**) and Erick Pranata**)
*) Dept. of Electrical Engineering
Faculty of Industrial Technology
Institut Teknologi Sepuluh Nopember
Surabaya, East Java, Indonesia
**) Dept. of Computer Science
Sekolah Tinggi Teknik Surabaya
Surabaya, East Java, Indonesia
[email protected], [email protected]

Abstract-Automatic extraction of hypernym-hyponym pairs has been
addressed in many studies, but none of them describes an automatic
method for incorporating the result into a WordNet or for building
a WordNet. This paper proposes a method that automatically acquires
hypernym-hyponym pairs for WordNet building by using a monolingual
dictionary and Lesk Word Sense Disambiguation (Lesk WSD) to deliver
sense-tagged pairs. The method is implemented on an Indonesian
monolingual dictionary and achieves 70% accuracy.
Keywords-WordNet; Hypernymy; Hyponymy; WSD; Dictionary

I. INTRODUCTION

WordNet [1] is a lexical reference system whose construction began
at Princeton University in 1985. This original version is known as
Princeton WordNet and has now reached version 3.0. Building a
WordNet consumes considerable resources and time; Princeton
University invested that effort and produced a reliable lexical
reference system. Many attempts have been made to build WordNets
for languages other than English automatically or
semi-automatically, such as the work of Barbu et al. [2], Lee et al.
[3], Elkateb et al. [4], Putra et al. [5], and many others.
A WordNet can be built in several ways, either by expanding
existing initial WordNet data or by merging existing data.
Acquisition of the hypernymy-hyponymy relation is an instance of
the latter. Many attempts have been made to acquire this relation,
e.g. Hearst [6], Sombatsrisomboon et al. [7], and Costa et al. [8],
but none of them presents a method for automatically incorporating
the acquired result into already sense-distinguished data or a
WordNet. This paper proposes a method that automatically acquires
the hypernymy-hyponymy relation from Kamus Besar Bahasa Indonesia
(KBBI, an Indonesian monolingual dictionary) together with the
definition of each lemma in the relation, so that the senses can be
distinguished.
The input used as the acquisition source is described in Section II.
The strategy for using the input data to acquire the
hypernymy-hyponymy relation along with the definitions is described
in Section III. The results of the approach are described in
Section IV, followed by the incorporation in Section V and further
research in Section VI.
II. KAMUS BESAR BAHASA INDONESIA

The input data used in this paper is KBBI [9], the standard
reference dictionary for Bahasa Indonesia. KBBI was first published
in 1988 and has been revised up to its 4th edition in 2008. It was
chosen as the input data because of the credibility of its
publisher, the Language Center of Indonesia. Each edition has
improved on the previous one, both in the number of lemmas and in
structure.
KBBI contains records of lemmas, parts of speech, definitions,
example sentences, and proverbs. In the acquisition process, only
the lemma, part-of-speech, and definition are used, as shown in
Fig. 1, because only these three elements are relevant to the
WordNet structure and the WSD algorithm. They serve both as the
acquisition source and as the lexical source for Lesk WSD; the
latter is described in detail in Section III.
Papan n
  1  kayu (besi, batu, dsb) yg lebar dan tipis (broad and thin wood
     (iron, rock, etc))
  2  tempat tinggal; rumah (place to stay; house)
Papan atas n
  kelas utama; kelas tinggi; (important class; high class)
Figure 1. Example of Different Senses in KBBI

The lemmas papan (board) and papan atas (high class) in Fig. 1 each
have their own definitions. The lemma papan has two definitions and
therefore two senses. Because every KBBI record contains lemmas and
every lemma contains definitions, the acquisition iterates over
every lemma and every definition. Each record is treated as a
hyponym consisting of a lemma, part-of-speech, and definition, and
the objective of the acquisition is to find, for every such record,
a hypernym that likewise consists of a lemma, part-of-speech, and
definition.
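The record structure and the iteration over every lemma and every definition can be pictured with a small sketch. This is only an illustration under assumptions: the names `Entry` and `iter_senses` are hypothetical and do not come from the paper, and the sample data are the two lemmas of Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Entry:
    """One KBBI lemma with its part-of-speech and its sense definitions."""
    lemma: str
    pos: str
    definitions: List[str] = field(default_factory=list)


def iter_senses(entries: List[Entry]) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (lemma, pos, sense_number, definition) for every sense."""
    for entry in entries:
        for number, definition in enumerate(entry.definitions, start=1):
            yield entry.lemma, entry.pos, number, definition


# The two lemmas shown in Fig. 1 (hypothetical in-memory form).
KBBI_SAMPLE = [
    Entry("papan", "n", ["kayu (besi, batu, dsb) yg lebar dan tipis",
                         "tempat tinggal; rumah"]),
    Entry("papan atas", "n", ["kelas utama; kelas tinggi;"]),
]

senses = list(iter_senses(KBBI_SAMPLE))
```

Each tuple produced by `iter_senses` corresponds to one candidate hyponym, so papan contributes two candidates and papan atas one.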
The hypernymy-hyponymy acquisition for Bahasa Indonesia uses the
3rd edition of KBBI, which is available in an HTML-like format in
which lemmas and sense numbers are marked in bold and the
part-of-speech is marked in italics. These tags can be used to
extract KBBI into records with lemma, part-of-speech, and
definition fields. An example of the raw KBBI data is shown in
Fig. 2. The 4th edition of KBBI cannot be used because it is not
yet available in a machine-readable format.
1ceng·kung a cekung (tt mata, pipi,
dsb);-- mengkung sangat cekung
2ceng·kung n bunyi keras besar (spt bunyi
anjing menyalak);-- cengking berbagai bunyi (spt bunyi
anjing)
3ceng·kung, ber·ceng·kung v
duduk dng lutut terangkat ke atas (spt berjongkok); bertinggung;
mencangkung
Figure 2. Tagged KBBI Raw Data

Tags in the raw KBBI data play the largest part in the extraction
process. Consider the tag that separates lemmas in canonical form
(e.g. sakit (sick)), derived form (e.g. penyakit (illness)), or
compound form (e.g. rumah sakit (hospital)). The format is
inconsistent: several records lack this tag, which lowers the
performance of the extraction and introduces errors in the output.
The percentage of errors cannot be calculated precisely, because
that would require a thorough analysis of every extracted record.
The result of the extraction is a set of structured KBBI records.
III. ACQUISITION STRATEGY

Hypernym-hyponym pairs are acquired from KBBI in three processes.
The first process disambiguates every word in the definition, that
is, it finds the appropriate part-of-speech and definition for each
word. The second process uses this result to refine the definition
so that only the part containing information about the hypernym of
the lemma is kept. The last process acquires the hypernym from the
simplified definition. The overall flow is shown in Fig. 3.

Record of KBBI -> Lesk WSD -> Tagged Definition -> Definition
Simplification -> Simplified Tagged Definition -> Hypernym
Acquisition -> Hypernym

Figure 3. Overall Hypernym-Hyponym Pairs Acquisition Process

A. Lesk Algorithm
The WSD algorithm used in this paper is the simple Lesk algorithm
[10]. It was chosen in view of the lexical resources available for
Bahasa Indonesia, i.e. KBBI. KBBI is an appropriate lexical
resource for this algorithm because it contains the lemma,
part-of-speech, and definition that the algorithm requires.
The input to Lesk WSD is a sentence, here a KBBI definition, and
the output is a tagged definition in which every word of the input
is tagged with a part-of-speech and a definition. Thus, every word
in the definition receives the part-of-speech and definition
appropriate to that context. This is illustrated in Fig. 4, where
the sentence is the first definition of papan (board).
kayu (besi, batu, dsb) yg lebar dan tipis

Lesk WSD

kayu - n - bagian batang cabang dahan pokok keras biasa dipakai
untuk bahan bangunan dsb (part of branch; hard material
usually used as building material, etc.)
besi - n - logam keras kuat serta banyak sekali gunanya bahan
pembuat senjata mesin ferum (strong and hard metal with
many functions; material used to make weapon; ferrum)
batu - n - akik untuk mata cincin dsb (gemstone for rings)
dsb - null - null
yg - null - null
lebar - a - lapang tidak sempit (spacious; not narrow)
dan - p - penghubung satuan kata frasa klausa kalimat setara
termasuk tipe sama serta memiliki fungsi tidak berbeda
(conjunction of phrase, clause, sentence with same type
and function)
tipis - a - kurang padat awan udara dsb (less dense)
Figure 4. Example of WSD's input and output

The lemma bisa in Bahasa Indonesia generally has two meanings, or
senses: the first is be able to, and the second is poison. By the
same procedure as in Fig. 4, one can conclude that bisa in the
phrase bisa ular (snake poison) takes the latter sense. Each sense
is given as a part-of-speech and a definition, which is more than
enough to express the sense of a lemma.
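The tagging step can be sketched with a simplified Lesk variant that picks, for each word, the sense whose gloss shares the most words with the rest of the sentence. This is a sketch under assumptions, not the authors' implementation: the paper does not state which Lesk variant was used, the function names are hypothetical, and the two glosses for bisa are invented for illustration rather than quoted from KBBI.

```python
import re
from typing import Dict, List, Optional, Tuple

Sense = Tuple[str, str]  # (part-of-speech, gloss)


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def simplified_lesk(word: str, context: List[str],
                    dictionary: Dict[str, List[Sense]]) -> Optional[Sense]:
    """Return the sense whose gloss overlaps most with the context words."""
    senses = dictionary.get(word)
    if not senses:
        return None  # unknown word: tagged as null, as for "yg" in Fig. 4
    context_words = {w for w in context if w != word}

    def overlap(sense: Sense) -> int:
        return len(context_words & set(tokenize(sense[1])))

    # max() keeps the first sense when several senses tie.
    return max(senses, key=overlap)


def tag_definition(definition: str,
                   dictionary: Dict[str, List[Sense]]
                   ) -> List[Tuple[str, Optional[Sense]]]:
    """Attach a (part-of-speech, gloss) tag to every word of a definition."""
    words = tokenize(definition)
    return [(w, simplified_lesk(w, words, dictionary)) for w in words]


# Toy dictionary; both glosses are invented for this example.
TOY_DICTIONARY: Dict[str, List[Sense]] = {
    "bisa": [
        ("adv", "mampu dapat melakukan sesuatu"),
        ("n", "zat racun yg terdapat pd binatang spt ular"),
    ],
}

tagged = tag_definition("bisa ular", TOY_DICTIONARY)
```

In this toy setting the noun sense of bisa is chosen because its gloss contains ular, while ular itself has no dictionary entry and is tagged with `None`.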
B. Definition Simplification
The tagged definition is then processed further by analyzing the
definition format used by KBBI. KBBI separates the concepts in a
definition with semicolons (;). Because KBBI definitions may
contain synonyms, the definition is simplified by taking the first
concept, provided that it does not express synonymy.
Synonymy in a definition is identified by counting the words in
the concept: a concept with fewer than three words is identified as
a synonym of the lemma. This decision is based on an analysis of
every record in KBBI, in which concepts of one or two words always
resemble synonyms. Fig. 5 shows examples of concepts in definitions,
where the blue-marked concepts express synonymy.
anjing - n - binatang menyusui yg biasa dipelihara untuk menjaga rumah,
berburu, dsb; Canis familiaris; (mammal which is cared to guard houses
or hunt, etc.)
udang - n - binatang tidak bertulang, hidup dl air, berkulit keras, berkaki
sepuluh, berekor pendek, dan bersepit dua, pd kaki depannya; Crustacea;
(invertebrate that live in the water, hard-skinned, has 10 legs, short tail)
kamerad - n - saudara separtai; teman seperjuangan; (party colleague;
companion in arms)
Figure 5. Example of Synonym Concepts

The lemmas anjing (dog), udang (shrimp), and kamerad (comrade) have
implicit synonyms in their definitions. Only anjing and udang are
taken into consideration, because the first concept of each does
not express synonymy. The lemma kamerad is therefore discarded from
the acquisition process, while the first concept of each chosen
lemma is obtained by splitting the definition on the semicolon.
The results of this process are lemmas with simplified definitions
that do not express synonymy. This makes the next process easier,
because the data to be processed are much smaller.
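The simplification rule can be sketched in a few lines. This is a minimal sketch that assumes the rule as read from the text: split on semicolons, keep the first concept, and discard the lemma when that concept has fewer than three words. The names `simplify_definition` and `MIN_WORDS` are hypothetical.

```python
from typing import Optional

MIN_WORDS = 3  # concepts with fewer words are treated as synonyms


def simplify_definition(definition: str) -> Optional[str]:
    """Return the first concept, or None if it looks like a synonym."""
    concepts = [c.strip() for c in definition.split(";") if c.strip()]
    if not concepts:
        return None
    first = concepts[0]
    if len(first.split()) < MIN_WORDS:
        return None  # first concept expresses synonymy: discard the lemma
    return first


anjing = simplify_definition(
    "binatang menyusui yg biasa dipelihara untuk menjaga rumah, "
    "berburu, dsb; Canis familiaris;")
kamerad = simplify_definition("saudara separtai; teman seperjuangan;")
```

With the Fig. 5 data, anjing keeps its first concept and kamerad is dropped, since saudara separtai has only two words.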

conclude that the hypernym for abu is sisa (remains). This
process is then done through every KBBI record to acquire
every hypernym existing in KBBI definitions for the given
lemma.
abu n sisa yg tinggal setelah suatu barang mengalami pembakaran
lengkap (remain that is left after a thing is fully burned)
apa tertinggal dimakan diambil lebihan saldo (things
which are left, eaten, taken; excessed balance)
yg
null null
tinggal
v
sbg keterangan kata majemuk berarti didiami (as
remark of compund which means inhabited)
setelah
adv sesudah (after)
suatu
num satu hanya satu untuk menyatakan benda kurang
tentu (one; only one; to express uncertain object)
barang
n
semua perkakas rumah perhiasan dsb (every housing
tool; jewelry)
mengalami v
merasai menjalani menanggung suatu peristiwa dsb
(feeling such as bearing an event)
pembakaran n
tempat membakar bata genting kapur dsb (place to
burn brick; roof tile; chalk)
lengkap
a
tidak ada kurangnya genap (not less; complete)
sisa

n

Figure 6. Example of Acquisition Input

The results of this process are hypernym-hyponym pairs
where each lemma, whether it is hypernym or hyponym, is
attached with proper part-of-speech and definition. This
information then can be used in the incorporation process with
WordNet or glossed synset, where in this process part-ofspeech and definition hold the biggest part on determining the
proper sense for hypernym and hyponym.
IV.

RESULT

The method proposed in this paper successfully acquired
24,256 pairs from 54,395 possible pairs in 91029 records in
KBBI, where the other possible pairs consist of compound
forms, synonyms, and invalid pairs. Hypernym-hyponym pairs
in compound forms can be acquired through a further process
in defining the sense using WSD, while the other pairs cannot
be considered as results, as they don’t express hypernymy.

C. The Acquisition
Acquiring hypernym from the result of the previous process
can be done by extracting the first noun phrase. The extraction
utilizes the part-of-speech information from the tagged lemma
formed from the Lesk WSD. The result from this extraction
can be a canonical or derived word, or a compound form of
word. But in this paper, only the first output used as the final
result, as the latter result needs refinement on the definition.

Accuracy was judged by a native speaker of Bahasa Indonesia
in two steps. The first step did not take the senses of the
hypernym and hyponym into account: a pair scores if the native
speaker accepts it as a valid hypernym-hyponym pair regardless
of sense. This step yields 92% accuracy. The second step takes
the senses into account: a pair scores only if it is a valid
hypernym-hyponym pair and each lemma in the pair carries a valid
sense. This step yields 70% accuracy. The result is satisfactory,
but it depends entirely on the input data and the WSD algorithm
used.
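
The two-step scoring can be sketched as follows, assuming each judged pair is recorded with two boolean fields. The `Judgment` record, its field names, and the sample data are hypothetical; the sample does not reproduce the paper's evaluation data.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Judgment:
    pair_valid: bool    # annotator accepts the pair, ignoring senses
    senses_valid: bool  # the sense attached to each lemma is also valid


def two_step_accuracy(judgments: Iterable[Judgment]) -> Tuple[float, float]:
    """Return (step 1 accuracy, step 2 accuracy) over the judged pairs."""
    items = list(judgments)
    if not items:
        raise ValueError("no judgments to score")
    # Step 1: the pair alone must be accepted.
    step1 = sum(j.pair_valid for j in items) / len(items)
    # Step 2: the pair and the senses of both lemmas must be accepted.
    step2 = sum(j.pair_valid and j.senses_valid for j in items) / len(items)
    return step1, step2


# Made-up sample of four judged pairs.
sample = [Judgment(True, True), Judgment(True, False),
          Judgment(False, False), Judgment(True, True)]
print(two_step_accuracy(sample))
```

Because step 2 adds a condition to step 1, its accuracy can never exceed that of step 1, which is consistent with the drop from 92% to 70% reported above.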

The lemma abu (dust), shown as an example in Fig. 6, produces
a tagged definition after being processed with Lesk WSD. This
tagged definition is the input to the acquisition process, in
which the first noun phrase is taken. The noun phrase is
identified by taking the sequence of nouns that precedes a lemma
whose part-of-speech is p or null, and it is then identified as
the hypernym of the lemma abu. From the example in Fig. 6, one can

Several limitations affect the implementation of the proposed
method. The main one is the limited availability of lexical
resources in a fully machine-readable format. KBBI is not
delivered in a consistent format, which causes some errors in the
KBBI extraction. The WSD algorithm used is also the simplest one,
and the effort to implement a more powerful algorithm is
restricted by the limited lexical resources that exist.

V. INCORPORATION

The results of the acquisition are tagged hypernym-hyponym
pairs. The information in these pairs can be incorporated into an
existing WordNet or into a glossed synset. The technique used is
word match similarity: the main idea is to find the best synset
for every hypernym and hyponym, so that the hypernymy-hyponymy
relation can be attached to the synsets.
The incorporation, shown in Fig. 7, is performed for every
hypernym-hyponym pair, first on the hypernym and then on the
hyponym. The word match technique is applied to every synset
until a synset that matches the hypernym or hyponym is found. The
results of this process are pairs of synsets that bear the
hypernymy-hyponymy relation. These results can therefore be
regarded as the prototype of a simple WordNet.

Figure 7. Incorporation Process
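
The word match step can be sketched in a few lines. This sketch assumes that a synset carries a gloss, that candidate synsets are those containing the lemma, and that similarity is the number of words shared between the lemma's definition and the synset gloss. The `Synset` and `TaggedLemma` records, the scoring function, and the sample glosses are all illustrative assumptions; the paper does not spell out its exact similarity measure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Synset:
    synset_id: str
    lemmas: List[str]
    gloss: str


@dataclass
class TaggedLemma:
    lemma: str
    pos: str
    definition: str


def word_match(a: str, b: str) -> int:
    """Count distinct words shared by two texts (no stop-word filtering)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def best_synset(item: TaggedLemma, synsets: List[Synset]) -> Optional[Synset]:
    """Pick the synset containing the lemma whose gloss best matches its definition."""
    candidates = [s for s in synsets if item.lemma in s.lemmas]
    if not candidates:
        return None
    return max(candidates, key=lambda s: word_match(item.definition, s.gloss))


def incorporate(hypernym: TaggedLemma, hyponym: TaggedLemma,
                synsets: List[Synset]) -> Optional[Tuple[str, str]]:
    """Return (hypernym synset id, hyponym synset id), hypernym resolved first."""
    upper = best_synset(hypernym, synsets)
    lower = best_synset(hyponym, synsets)
    if upper is None or lower is None:
        return None
    return (upper.synset_id, lower.synset_id)


# Invented glosses, for illustration only.
synsets = [
    Synset("s1", ["tempat", "lokasi"], "ruang yang tersedia untuk melakukan sesuatu"),
    Synset("s2", ["tempat", "wadah"], "sesuatu yang dipakai untuk menaruh barang"),
    Synset("s3", ["pembakaran"], "tempat membakar bata genting kapur"),
]
hyper = TaggedLemma("tempat", "n", "ruang yang dipakai untuk melakukan sesuatu")
hypo = TaggedLemma("pembakaran", "n", "tempat membakar bata genting kapur")
print(incorporate(hyper, hypo, synsets))
```

Raw word overlap favors long glosses and counts function words, so a practical version would probably normalize the score or filter stop words.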

For Bahasa Indonesia, the incorporation is done by applying
the acquired results to a collection of Bahasa Indonesia synsets
from current research [11]. These synsets are gloss-less, so they
must be processed further until every lemma in a synset is
glossed. The glosses can be obtained using WSD. The result of
this process is a tagged synset in which every lemma has a
definition with noun part-of-speech only. The glossed Bahasa
Indonesia synsets are then ready for incorporation.
VI. FURTHER RESEARCH

The acquired hypernym-hyponym pairs depend entirely on KBBI,
while in WordNet some categories in the root area need special
attention. Since the proposed method is not yet able to determine
proper synsets to complete the upper categories in the root area,
research is needed on constructing the upper-category synsets
automatically or semi-automatically.
Because lexical resources are limited, implementing a better
WSD is also a challenge. The WSD method used in this paper is
Lesk WSD, which needs only a dictionary and therefore matches the
Bahasa Indonesia lexical resources that are available.
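
To show why a dictionary alone is enough, here is a minimal sketch of a simplified Lesk-style selection: the sense whose definition shares the most words with the context is chosen. This is a simplified variant, not necessarily the exact procedure used in this paper, and the toy dictionary and function names are assumptions.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a text and split it into a set of words."""
    return set(text.lower().split())


def lesk_select(word: str, context: str,
                dictionary: Dict[str, List[str]]) -> int:
    """Return the index of the sense of `word` that best overlaps the context.

    `dictionary` maps a word to the list of its sense definitions.
    Ties are resolved in favor of the earliest sense.
    """
    senses = dictionary[word]
    context_words = tokenize(context) - {word}
    overlaps = [len(tokenize(definition) & context_words)
                for definition in senses]
    return overlaps.index(max(overlaps))


# Toy dictionary with invented definitions.
toy_dictionary = {
    "cone": [
        "solid body which narrows to a point",
        "fruit of certain evergreen trees",
        "wafer holding ice cream",
    ],
}
print(lesk_select("cone", "the fruit of evergreen pine trees", toy_dictionary))
```

The only resource the function consults is the mapping from words to definitions, which is the kind of data a dictionary such as KBBI provides.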

Given the limited availability of lexical resources, research
on building Bahasa Indonesia lexical resources is encouraged. The
results can be used to improve the acquisition of
hypernym-hyponym pairs, word sense disambiguation, language
translation, and other natural language processing tasks. This
research should also be applied to other Asian languages that
still lack lexical resources, so that those resources can be used
for tasks like this one.
REFERENCES
[1] Christiane Fellbaum, “WordNet: An Electronic Lexical Database,”
Cambridge, MA: MIT Press, 1998.
[2] Eduard Barbu, Verginica Barbu Mititelu, “Automatic building of
Wordnets,” Proc. RANLP conference Borovets, Bulgaria, 2005.
[3] Changki Lee, Gunbae Lee, Seo Jung Yun, “Automatic WordNet
mapping using word sense disambiguation,” Proc. of the 2000 Joint
SIGDAT conference on empirical methods in natural language
processing and very large corpora: held in conjunction with the 38th
Annual Meeting of the Association for Computational Linguistics, vol.
13, pp. 142-147, Hongkong, 2000.
[4] Sabri Elkateb, William Black, Horacio Rodríguez, Musa Alkhalifa, Piek
Vossen, Adam Pease, Christiane Fellbaum, “Building a WordNet for
Arabic,” Proc. of the Fifth International conference on Language
Resources and Evaluation, 2005.
[5] Desmond Darma Putra, Abdul Arfan, Ruli Manurung, “Building an
Indonesian WordNet,” 2007.
[6] Marti A. Hearst, “Automatic acquisition of hyponyms from large text
corpora,” Proc. of the 14th conference on Computational Linguistics, vol.
2, pp. 539-545, Nantes, France, 1992.
[7] Ratanachai Sombatsrisomboon, Yutaka Matsuo, Mitsuru Ishizuka,
“Acquisition of hypernyms and hyponyms from the WWW,” 2003.
[8] Rui P. Costa, Nuno Seco, “Hyponymy extraction and web search
behavior analysis based on query reformulation,” Proc. of the 11th
Ibero-American conference on AI: Advances in Artificial Intelligence,
pp. 332-341, Lisbon, Portugal, 2008.
[9] Tim Penyusun Kamus Pusat Bahasa Departemen Pendidikan Nasional,
Kamus Bahasa Indonesia, 2008.
[10] Michael Lesk, “Automatic sense disambiguation using machine readable
dictionaries: how to tell a pine cone from an ice cream cone,” Proc. of
the 5th annual international conference on Systems documentation, pp.
24-26, Toronto, Ontario, Canada, 1986.
[11] Andy Saputra, “Building synsets for Indonesian WordNet with
monolingual lexical resources,” Proc. IALP conference, Harbin, China,
2010.
