2015

Self-Learning System With
Natural Question-Guided Search
For Narration In Indonesian Language
Edwin Wibowo Sampurna#1, Esther Irawati Setiawan*2
# Global Business, Chinese Culture University
55, Hwa-Kang Road, Taipei, Taiwan R.O.C 11114
1 edwin.sampurna@gmail.com
* Sekolah Tinggi Teknik Surabaya
Ngagel Jaya Tengah 73-77, Surabaya, Indonesia
2 esther.irawati@gmail.com
Abstract The need for self-learning education has grown considerably
in recent years, and people look for the easiest way to learn on their
own. This motivated us to study Natural Question-Guided Search and to
build a system that supports self-learning education. The system uses
the Natural Question-Guided Search algorithm to form questions,
together with their answers, that can be used to improve the learner's
understanding. Natural Question-Guided Search is an algorithm that
generates a question from a narrative sentence.
Natural Question-Guided Search uses a Dependency Parser to process a
sentence. Dependency parsing is a syntactic analysis of a sentence
which assumes that each word depends on another word (its head). These
relations show the grammatical function between words, which cannot be
shown by a constituency-based parser that only identifies phrases.
Knowing the grammatical function between words eases the creation of a
question.
A test was carried out to determine the accuracy of the Natural
Question-Guided Search algorithm by applying it to school reading
textbooks to create questions.

Keywords natural language processing, natural question-guided search,
dependency parser, constituency parser, bahasa Indonesia

1. INTRODUCTION
Nowadays, the knowledge obtained from the learning process is really
important. In addition to reading, comprehension of a text can be
improved by working through exercise questions about the text, which
increases how much of the text the reader absorbs. However, many
narrative texts do not include exercise questions that test the
reader's understanding of the text.
With technology advancing every day, a system is needed that can
generate such questions directly from existing narrative texts, so
that readers get questions to answer after reading and thereby improve
their understanding and comprehension of the content. In this way,
readers can study a text easily by themselves.
The Natural Question-Guided Search algorithm is required to generate a
question. Natural Question-Guided Search is an algorithm developed by
Alexander Kotov and ChengXiang Zhai that uses Dependency Parsing to
process a sentence and generate a tree as the output. This output, the
Dependency Tree, is then processed by the Natural Question-Guided
Search algorithm to generate a question. These questions can be used
for self-learning education, helping readers study a text by
themselves.

2. FUNDAMENTAL
2.1. Indonesian Language Pattern
As in English, the Indonesian language pattern consists of phrases,
clauses and sentences.
A clause consists of two or more words that make up a construction
containing a predicative element, and has the potential to be a
sentence.


AIT / CEF 2015

A phrase consists of two or more words that form a constituent and
function as a single unit in the syntax of a sentence.
A sentence is a group of words put together to express a complete
thought, following the grammatical rules of syntax. There are four
types of sentence: Simple Sentence, Compound Sentence, Complex
Sentence and Complex-Compound Sentence. This research focuses only on
the Simple Sentence.

2.2. Natural Language Processing
Natural Language Processing (NLP) is a part of computer science,
Artificial Intelligence (AI), and linguistics that deals with the
interaction between humans and computers, so that the computer is able
to understand natural human language.
Four NLP techniques are used in this research: Part-of-Speech Tagging
(POS Tagging), Named Entity Recognition, Constituency Parsing and
Dependency Parsing.

2.2.1. Part-Of-Speech Tagging
Part-of-Speech Tagging (POS Tagging) is the process of assigning a
grammatical category to each word in a natural language sentence. POS
Tagging can also provide syntactic or morphological information about
the words of a sentence.
This research uses the Indonesian Language POS Tagger (iPOSTagger)
made by Alfan Farizki Wicaksono and Ayu Purwarianti. iPOSTagger is
very important in this research, especially for classifying and
tagging the words of a sentence.
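The tagged output can be consumed by later stages as (word, tag)
pairs. The following is a minimal Python sketch, assuming the tagger
emits whitespace-separated word/TAG tokens as in the example of Fig. 3.
The function name parse_tagged is hypothetical and is not part of
iPOSTagger.

```python
from typing import List, Tuple


def parse_tagged(line: str) -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so a word that itself
        # contains '/' is kept intact and only the tag is cut off.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Tagged sentence copied from the example in Fig. 3.
tagged = "Di/IN Indonesia/NNP terdapat/VBT sekitar/CDI 665/CDP bahasa/NN daerah./."
pairs = parse_tagged(tagged)
```

Splitting on the last slash is a design choice: it keeps tokens such
as "daerah./." intact as the word "daerah." with the tag ".".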

2.2.2. Named Entity Recognition
Named Entity Recognition (NER) is one of the components of information
extraction; it detects and classifies the named entities in a text.
NER is generally used to detect the names of people, places and
organizations in a document.
This research uses LingPipe NER for English, which can be used in this
research. Other kinds of names are detected using some additional
categorization rules.
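The additional rules are not listed in this paper. As an illustration
only, a rule-based categorizer could look like the Python sketch
below. The two rules (a digit pattern for numbers, the preposition
"di" for places) and the names categorize, PLACE_PREPOSITIONS and
QUESTION_WORD are assumptions made for this example; only the two
category-to-question-word pairs are taken from Table 2. LingPipe is
not called here.

```python
import re

# Hypothetical rule data: the paper does not publish its actual rules.
PLACE_PREPOSITIONS = {"di"}  # Indonesian preposition meaning "in" / "at"

# Category -> question word, as listed in Table 2.
QUESTION_WORD = {"Places": "Dimana", "Numeric": "Berapa"}


def categorize(answer: str) -> str:
    """Return a coarse category for an answer phrase using simple rules."""
    words = answer.lower().split()
    # Rule 1 (assumed): any token made of digits marks a numeric answer.
    if any(re.fullmatch(r"\d+([.,]\d+)*", w) for w in words):
        return "Numeric"
    # Rule 2 (assumed): a phrase starting with a place preposition is a place.
    if words and words[0] in PLACE_PREPOSITIONS:
        return "Places"
    return "Unknown"


category = categorize("sekitar 665")
question_word = QUESTION_WORD.get(category)
```

An answer that matches no rule falls into "Unknown" and gets no
question word, so missing rules surface explicitly rather than
producing a wrong question type.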


2.2.3. Constituency Parsing
Constituency Parsing, or Phrase Structure Grammar parsing, is used in
NLP to parse a sentence. The parser decomposes a sentence into a
Constituency Tree, or Phrase Structure Grammar Tree (Tree Grammar
Pattern). A Constituency Parser uses grammar rules to generate the
Constituency Tree of a sentence, so that the tree becomes a model of
the sentence's grammar pattern.
A Constituency Parser is not developed in this research. Instead, this
research uses the Stanford Parser to generate a Constituency Tree. The
Stanford Parser is an English parser. Since the Indonesian language is
similar to English in terms of sentence structure, the Stanford Parser
can be used.
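A Constituency Tree is usually exchanged as a bracketed string such as
the one in Fig. 4. The Python sketch below shows one way to read such
a string into nested lists. It assumes well-formed input; the names
parse_tree and leaves are hypothetical helpers and are not part of the
Stanford Parser.

```python
import re
from typing import List, Union

# A tree is either a leaf word (str) or a list [label, child, child, ...].
Tree = Union[str, list]


def parse_tree(text: str) -> Tree:
    """Parse a bracketed tree such as '(NP (NNP Di) (NNP Indonesia))'."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> Tree:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a leaf word
        node = [tokens[pos]]  # the token right after '(' is the label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ')'
        return node

    return read()


def leaves(tree: Tree) -> List[str]:
    """Collect the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


# Tree copied from the example in Fig. 4.
tree = parse_tree(
    "(ROOT (S (NP (NNP Di) (NNP Indonesia)) (VP (VBP terdapat) "
    "(NP (QP (CD sekitar) (CD 665)) (NN bahasa) (NN daerah))) (. .)))"
)
words = leaves(tree)
```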


2.2.4. Dependency Parsing
Dependency Parsing generates a grammar that describes the dependence
between pairs of components, in which one is the head and the other is
the dependent. The head, also called the governor, determines the role
of its partner, the dependent (also called the modifier).
In this research, Dependency Parsing is done by mapping from
Constituent Structure to Dependency Structure, because the input to
this parser is the constituent-based sentence output by the previous
process.

2.3. Natural Question-Guided Search
Natural Question-Guided Search is an algorithm that can transform a
narrative text into a question. Alexander Kotov and ChengXiang Zhai
originally developed the algorithm for information retrieval, where a
search usually arises from the need for an answer to a question.
Questions written with good grammar patterns yield better and faster
results. In general, Natural Question-Guided Search is therefore
useful for searching for information by generating good questions.
In this research, however, the algorithm is not used to generate
questions for searching for information. It is used only to generate
questions together with their answers, so that it can be used for
self-learning.
Two early stages must be completed before running this algorithm:
Pre-processing and Dependency Parsing. The Dependency Tree (the output
of Dependency Parsing) is then processed by the Natural
Question-Guided Search algorithm.
The complete Question Forming Process diagram can be seen in Fig. 1.



Fig. 1 Question Form Phase Processing Diagram

3. QUESTION FORM PROCESSING
Question Form Processing goes through two main stages: Analysis
Solution Dependency Parsing and Natural Question-Guided Search.
The first stage of Question Form Processing is Analysis Solution
Dependency Parsing. In this stage the system maps a Constituent Tree
Structure into a Dependency Tree Structure. This stage is illustrated
by the process diagram in Fig. 2.

Fig. 2 Analysis Solution Dependency Parsing Process Diagram
As can be seen in Fig. 2, the input is an Indonesian sentence, which
is processed by the Pre-processing Module. The result of that module
is a Constituency Tree, which is processed by the Dependency Parsing
Module. The result of this system is the Dependency Tree.
The first process of Analysis Solution Dependency Parsing is
Pre-processing, which consists of POS Tagging and Constituency
Parsing.

Input:
Di Indonesia terdapat sekitar 665 bahasa daerah.
Output:
Di/IN Indonesia/NNP terdapat/VBT sekitar/CDI 665/CDP bahasa/NN daerah./.

Fig. 3 Example Input and Output POS Tagging
Furthermore, the output of POS Tagging is processed using Constituency
Parsing to produce a Constituency Tree.
Input:
Di/IN Indonesia/NNP terdapat/VBT sekitar/CDI 665/CDP bahasa/NN daerah./.
Output:
(ROOT
  (S
    (NP (NNP Di) (NNP Indonesia))
    (VP (VBP terdapat)
      (NP
        (QP (CD sekitar) (CD 665))
        (NN bahasa) (NN daerah)))
    (. .)))

Fig. 4 Example Input and Output Constituency Parsing
The next process is the Dependency Parsing module, which maps the
Constituent Structure into a Dependency Tree in two phases:
1. Dependency Extraction
This phase requires a Head Rule for each phrase structure. This module
uses the Collins Head Rule, which can be implemented using the
following algorithm:
FOR EACH cfg IN cfg_list
  IF NOT pos(cfg.lhs) THEN
    head ← get_head_pos(cfg.lhs)
    FOR EACH node IN cfg.rhs
      IF node ≠ head THEN
        IF NOT pos(node) THEN
          dependent ← get_head_pos(node)
        ELSE
          dependent ← node
        add_relation(head, dependent)
2. Dependency Typing
This phase requires Dependency Labels. This module uses Stanford
dependency labels that have been adapted to the Indonesian language.
The result of this process is a Dependency Tree with labels that show
the relationship between words.
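The Dependency Extraction phase can be illustrated with a short Python
sketch. It is an assumption-laden example, not the implementation used
in this research: the HEAD_RULES table is a heavily simplified,
hypothetical stand-in for the Collins Head Rule, the tree is the
Fig. 4 example without ROOT and punctuation, and words are used
directly as node identifiers, which only works when no word repeats in
the sentence. Each tree node plays the role of one CFG production
(label = left-hand side, children = right-hand side).

```python
# One node per CFG production: [label, child, ...]; a POS node is [tag, word].
TREE = ["S",
        ["NP", ["NNP", "Di"], ["NNP", "Indonesia"]],
        ["VP", ["VBP", "terdapat"],
         ["NP", ["QP", ["CD", "sekitar"], ["CD", "665"]],
          ["NN", "bahasa"], ["NN", "daerah"]]]]

# Hypothetical head table: label -> (search direction, preferred child labels).
HEAD_RULES = {
    "S": ("left", ["VP", "NP"]),
    "VP": ("left", ["VBP", "VP"]),
    "NP": ("right", ["NN", "NNP", "NP"]),
    "QP": ("right", ["CD"]),
}


def is_pos(node):
    """A POS node holds a word (str) instead of child nodes."""
    return isinstance(node[1], str)


def head_child(node):
    """Pick the child that carries the head, following HEAD_RULES."""
    direction, priorities = HEAD_RULES.get(node[0], ("left", []))
    children = node[1:] if direction == "left" else node[:0:-1]
    for label in priorities:
        for child in children:
            if child[0] == label:
                return child
    return children[0]  # fallback: first child in search order


def head_word(node):
    """Follow head children down to a word (get_head_pos in the pseudocode)."""
    return node[1] if is_pos(node) else head_word(head_child(node))


def extract_dependencies(node, relations=None):
    """Collect (head, dependent) word pairs for every production in the tree."""
    relations = [] if relations is None else relations
    if is_pos(node):
        return relations
    chosen = head_child(node)
    head = head_word(chosen)
    for child in node[1:]:
        if child is not chosen:
            relations.append((head, head_word(child)))
        extract_dependencies(child, relations)
    return relations


relations = extract_dependencies(TREE)
```

Under these assumed rules, "terdapat" becomes the root, with
"Indonesia" heading "Di" on the left and "665" heading "sekitar" on
the right. Dependency Typing would then attach a label to each pair,
which this sketch leaves out.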


Fig. 5 Dependency Tree
The second stage of Question Form Processing is Natural
Question-Guided Search, which generates a question and its answer.
In this stage the system performs three processes: Answer Searching,
Define Question Type, and Question Forming. This stage is illustrated
by the process diagram in Fig. 6.

Fig. 6 Natural Question-Guided Search Process Diagram
The input to Natural Question-Guided Search is the Dependency Tree
(Fig. 5) that is output by the previous stage, Analysis Solution
Dependency Parsing.
The Natural Question-Guided Search process can be implemented using
the following algorithm:
[Answer Search]
FOR EACH child IN root_left_child / root_right_child
  answer ← get_deepest_child()
[Define Question Type]
question_mark ← get_question_mark(answer)
[Question Forming]
FOR EACH word IN sentence
  IF word ≠ answer THEN
    question ← question_mark + word +

The first process finds the answers by tracing the left and right
children of the root of the Dependency Tree obtained from the previous
stage. The process is complete once the system has found the deepest
node on both sides. The results of the Answer Searching process can be
seen in Table 1.

TABLE 1
ANSWER SEARCH RESULT
L/R    Deepest Node  Head       Answer
Left   Di            Indonesia  Di Indonesia
Right  sekitar       665        Sekitar 665

The second process defines the Question Type. The system categorizes
each answer obtained from the first process to get the question type.
The categorization is done using Named Entity Recognition together
with some additional rules that recognize the word category of each
answer and yield the question type. The results of Define Question
Type can be seen in Table 2.

TABLE 2
DEFINE QUESTION TYPE RESULT
L/R    Answer        Category  Question Type
Left   Di Indonesia  Places    Dimana
Right  sekitar 665   Numeric   Berapa

The next process completes the question sentence. The question is
formed from the question word and all the words of the sentence that
are not part of the answer obtained. The results of question forming
can be seen in Table 3.
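The Answer Searching and Question Forming processes can be sketched in
Python as follows. This is an illustration under stated assumptions,
not the implementation used in this research: the dependency tree in
CHILDREN is an assumed structure chosen to be consistent with Table 1
(the tree in Fig. 5 may differ), the question words are passed in by
hand from Table 2, and the names deepest, find_answers and
form_question are hypothetical.

```python
from typing import Dict, List, Tuple

WORDS = ["Di", "Indonesia", "terdapat", "sekitar", "665", "bahasa", "daerah"]

# Assumed dependency tree as head index -> dependent indices.
CHILDREN: Dict[int, List[int]] = {2: [1, 6], 1: [0], 6: [4, 5], 4: [3]}
ROOT = 2  # index of "terdapat"


def deepest(node: int, depth: int, parent: int) -> Tuple[int, int, int]:
    """Return (depth, node, head) for the deepest node at or below node."""
    best = (depth, node, parent)
    for child in CHILDREN.get(node, []):
        candidate = deepest(child, depth + 1, node)
        if candidate[0] > best[0]:
            best = candidate
    return best


def find_answers() -> List[List[int]]:
    """For each child of the root, pair the deepest node with its head."""
    answers = []
    for child in CHILDREN[ROOT]:
        _, node, head = deepest(child, 1, ROOT)
        answers.append(sorted([node, head]))  # keep sentence order
    return answers


def form_question(question_word: str, answer: List[int]) -> str:
    """Join the question word with every word that is not in the answer."""
    rest = []
    for i, word in enumerate(WORDS):
        if i in answer:
            continue
        # The sentence-initial word no longer starts the sentence, so it is
        # lowercased. This is wrong for proper nouns; a fuller version would
        # check the POS tag first.
        rest.append(word.lower() if i == 0 else word)
    return question_word + " " + " ".join(rest) + "?"


answers = find_answers()
left_question = form_question("Dimana", answers[0])
right_question = form_question("Berapa", answers[1])
```

Working with word indices instead of word strings keeps the sketch
correct when a word occurs twice in a sentence. A root child that has
no dependents of its own is not handled specially here.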


TABLE 3
QUESTION FORMING RESULT
Question                                     Answer
Dimana terdapat sekitar 665 bahasa daerah?   Di Indonesia
Berapa di Indonesia terdapat bahasa daerah?  sekitar 665

These processes yield two questions, one from the left-side answer and
one from the right-side answer. Each question always has an answer.

4. CONCLUSIONS
This research reached the following conclusions:
1. A Constituency Tree can be mapped to a Dependency Tree using a
Constituency Parser, Dependency Labels and a Head Rule.
2. Question Type errors are due to the lack of rules for categorizing
the answers in the Define Question Type process.
3. Dependency Parsing using the mapping method relies heavily on the
Constituency Tree that the Constituency Parser generates from the POS
Tagger's output.
4. Errors in Dependency Parsing using the mapping method from phrase
structure to dependency structure occur in two phases: the
Constituency Parser and the CFG. Errors in the Constituency Parser
phase arise because a parser for English is given sentences tagged
with Indonesian language structure as input. Errors in the CFG phase
arise because the CFG comes from the Constituency Tree generated by
the Constituency Parser.

ACKNOWLEDGMENT
This research can be used for further research. Here are some
suggestions regarding the development of this research.
1. One obstacle in this research is the limited availability of
Constituency Parsers for the Indonesian language, so that errors
cascade from the Pre-processing phase through to the result of this
research. A possible solution is to further develop a Constituency
Parser for the Indonesian language, starting from an Indonesian
Language Tree Bank, to obtain good Indonesian CFG rules.
2. The limited resources and papers about Natural Question-Guided
Search are also an obstacle to this research. Research on the Natural
Question-Guided Search algorithm still needs further development in
the future.
3. The Head Rule used for mapping from Phrase Structure into
Dependency Structure still needs more testing and validation by an
Indonesian language expert, because the Head Rule used in this
research is the Collins Head Rule for English.
4. Labeling rules need to be more specific to avoid ambiguity. This
applies not only to the POS Tagging labelling process; passive verbs
also need special handling.

REFERENCES
[1] Afif, Irfan, Studi Perbandingan Kinerja Algoritma CYK dan Algoritma Earley pada Pengurai Kalimat Menggunakan Probabilistic Context Free Grammar Bahasa Indonesia Sederhana, Tugas Akhir Sarjana Teknik Informatika Institut Teknologi Bandung. 2011.
[2] Alexander Kotov, ChengXiang Zhai. Towards Natural Question-Guided Search, 2010.
[3] Alfan Farizki Wicaksono and Ayu Purwarianti. HMM Based Part-of-Speech Tagger for Bahasa Indonesia, Tugas Akhir Sarjana Teknik Informatika Institut Teknologi Bandung, 2010.
[4] Bies, Ann et al., Bracketing Guidelines for Treebank II Style Penn Treebank Project. Linguistic Data Consortium, 1995.
[5] Covington, Michael A., Important Additional Notes about Dependency Parsing. The University of Georgia, 2004.
[6] Debusmann, Ralph, An Introduction to Dependency Grammar. Universität des Saarlandes, 2000.
[7] Concordance untuk Alkitab Perjanjian Baru Bahasa Indonesia, Sekolah Tinggi Teknik Surabaya, 2008.
[8] Jessica Felani Wijoyo, Natural Language Processing: Lecture 1. Introduction to Natural Language Processing, Sekolah Tinggi Teknik Surabaya, 2011.
[9] Kasman, S.Pd., M.Hum., Slide Sintaksis Frasa, Klausa dan Kalimat Bahasa Indonesia, 2010.
[10] Keraf, Gorys, Tatabahasa Indonesia untuk SMK dan Umum. Penerbit Nusa Indah, 1991.
[11] Robinson, J., Dependency structure and transformational rules. Language, 2/46. 1970.
[12] Sukamto, Rosa Ariani, Penguraian Bahasa Indonesia dengan Menggunakan Pengurai Collins. Tesis Program Magister Informatika Institut Teknologi Bandung. 200
[13] _______, Collins Head Rule, http://www.cs.columbia.edu/~mcollins/papers/heads, 1995.
[14] _______, Stanford Parser, http://nlp.stanford.edu/software/lexparser.shtml, 2010.
