Indonesian Language Pattern Natural Question-Guided Search

Self-Learning System With Natural Question-Guided Search For Narration In Indonesian Language

Edwin Wibowo Sampurna 1 , Esther Irawati Setiawan 2

Global Business, Chinese Culture University, 55, Hwa-Kang Road, Taipei, Taiwan R.O.C 11114
1 edwin.sampurnagmail.com

Sekolah Tinggi Teknik Surabaya, Ngagel Jaya Tengah 73-77, Surabaya, Indonesia
2 esther.irawatigmail.com

Abstract

The need for self-learning education has been high recently, and people look for the easiest way to learn by themselves. This motivated us to study Natural Question-Guided Search and to build a system that supports self-learning education. The system uses the Natural Question-Guided Search algorithm to form questions, together with their answers, that can be used to improve the learner's understanding. Natural Question-Guided Search is an algorithm that makes a question from a narrative sentence. It uses a Dependency Parser to process the sentence. Dependency parsing is a syntactic analysis of a sentence which assumes that each word depends on another word, its head. This relation shows the grammatical function between words, which cannot be shown by a constituency-based parser that only identifies phrases. Knowing the grammatical function between words eases the creation of a question. A test is done to determine the accuracy of the Natural Question-Guided Search algorithm by applying it to school reading textbooks to create questions.

Keywords: natural language processing, natural question-guided search, dependency parser, constituency parser, bahasa Indonesia

1. INTRODUCTION

Nowadays, the knowledge obtained from the learning process is very important. In addition to reading, comprehension of a text can be improved by working through exercise questions about it, which increases how much of the text the reader absorbs. However, many narrative texts do not include exercise questions that test the reader's understanding. As technology keeps advancing, there is a need for a system that can generate such questions directly from existing narrative texts, so that readers receive questions to answer after reading and thereby improve their understanding of the content. In this way, readers can study a text easily by themselves.

The Natural Question-Guided Search algorithm is required to generate the questions. Natural Question-Guided Search is an algorithm developed by Alexander Kotov and ChengXiang Zhai that uses Dependency Parsing to process a sentence and generate a tree as the output. This output, the Dependency Tree, is then processed further by the Natural Question-Guided Search algorithm to generate a question. These questions can be used for self-learning education, helping readers to study a text by themselves.

2. FUNDAMENTAL

2.1. Indonesian Language Pattern

As in English, the Indonesian language pattern consists of phrases, clauses and sentences. A phrase consists of two or more words that form a constituent and function as a single unit in the syntax of a sentence. A clause consists of two or more words that make up a construction containing a predicative element, and has the potential to be a sentence. A sentence is a group of words put together to express a complete thought, following the grammatical rules of syntax. There are four types of sentence: Simple Sentence, Compound Sentence, Complex Sentence and Complex-Compound Sentence. This research focuses only on the Simple Sentence.

489 AIT CEF 2015

2.2. Natural Language Processing

Natural Language Processing (NLP) is a part of computer science, Artificial Intelligence (AI), and linguistics which deals with the interaction between humans and computers, so that the computer is able to understand natural human language. Four NLP techniques are used in this research: Part-of-Speech Tagging (POS Tagging), Named Entity Recognition, Constituency Parsing and Dependency Parsing.

2.2.1. Part-Of-Speech Tagging

Part-of-Speech Tagging (POS Tagging) is the process of assigning a grammatical word class to each word in a natural language sentence. POS Tagging can also provide syntactic or morphological information about the words of a sentence. This research uses the Indonesian-language POS tagger iPOSTagger, made by Alfan Farizki Wicaksono and Ayu Purwarianti. iPOSTagger is very important in this research, especially for classifying and tagging the words of a sentence.
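To make the tagging step concrete, the following is a minimal sketch of a lexicon-based tagger. It is not iPOSTagger: the lexicon, the tag names and the function pos_tag are assumptions chosen only for illustration.

```python
# Toy lexicon-based POS tagger. The lexicon and the tag names are
# hypothetical illustrations, not iPOSTagger's model or tagset.
TOY_LEXICON = {
    "budi": "NNP",     # proper noun
    "membaca": "VB",   # verb ("to read")
    "buku": "NN",      # noun ("book")
    "di": "IN",        # preposition ("at/in")
    "sekolah": "NN",   # noun ("school")
}


def pos_tag(sentence, lexicon=TOY_LEXICON, default_tag="NN"):
    """Return (word, tag) pairs; unknown words fall back to default_tag."""
    tokens = sentence.rstrip(".?!").split()
    return [(tok, lexicon.get(tok.lower(), default_tag)) for tok in tokens]


tagged = pos_tag("Budi membaca buku di sekolah.")
```

A statistical tagger would choose each tag from context rather than from a fixed lookup; the sketch only shows the input and output shape of the step.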

2.2.2. Named Entity Recognition

Named Entity Recognition (NER) is one of the components of information extraction; it detects and classifies the named entities in a text. NER is generally used to detect people's names, place names and organization names in a document. This research uses LingPipe NER for English, which is considered usable in this research. Other kinds of names are detected with some additional categorization rules.
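As an illustration of what additional categorization rules could look like, here is a small rule-based sketch. The paper does not list its rules, so the cue words and the function classify_entities are hypothetical and do not reflect LingPipe's API.

```python
# Hypothetical cue words; the paper does not list its additional rules.
PERSON_TITLES = {"pak", "bu"}                 # honorifics ("Mr.", "Mrs.")
LOCATION_PREPOSITIONS = {"di", "ke", "dari"}  # "at/in", "to", "from"


def classify_entities(tokens):
    """Label capitalized tokens using the cue word that precedes them."""
    entities = []
    for i, tok in enumerate(tokens):
        # Skip the first token (no preceding cue) and lowercase tokens.
        if i == 0 or not tok[0].isupper():
            continue
        prev = tokens[i - 1].lower()
        if prev in PERSON_TITLES:
            entities.append((tok, "PERSON"))
        elif prev in LOCATION_PREPOSITIONS:
            entities.append((tok, "LOCATION"))
    return entities


found = classify_entities("Pak Budi pergi ke Surabaya".split())
```

Rules of this kind only cover names that appear next to a known cue word, which is why they would complement a trained recognizer rather than replace it.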

2.2.3. Constituency Parsing

Constituency Parsing, or Phrase Structure Grammar parsing, is used in NLP to parse a sentence. The parser decomposes a sentence into a Constituency Tree (Phrase Structure Grammar Tree). A Constituency Parser uses grammar rules to generate the Constituency Tree of a sentence, so that the tree becomes a model of the sentence's grammar pattern. A Constituency Parser is not developed in this research; instead, the Stanford Parser is used to generate the Constituency Tree. The Stanford Parser is an English parser. Since the Indonesian language is similar to English in terms of sentence structure, the Stanford Parser can be used.
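A Constituency Tree can be pictured with a small data-structure sketch. The nested-tuple representation, the labels and the example tree below are assumptions for illustration; they are not output of the Stanford Parser.

```python
# A constituency tree as nested tuples: (label, child, child, ...).
# A leaf is (POS tag, word). Tree and labels are illustrative assumptions.
TREE = ("S",
        ("NP", ("NNP", "Budi")),
        ("VP", ("VB", "membaca"),
               ("NP", ("NN", "buku"))))


def is_leaf(node):
    """A leaf pairs a POS tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def to_brackets(node):
    """Render the tree in bracketed (Penn Treebank style) notation."""
    if is_leaf(node):
        return f"({node[0]} {node[1]})"
    children = " ".join(to_brackets(child) for child in node[1:])
    return f"({node[0]} {children})"


def leaves(node):
    """Return the words of the sentence in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


bracketed = to_brackets(TREE)
```

The tree groups words into phrases (NP, VP) but does not say which word is the subject or object of the verb, which is the information the dependency step adds.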

2.2.4. Dependency Parsing

Dependency Parsing generates a structure that describes the dependence between the components of a sentence, in which one word is the head and the other is the dependent. The head, also called the governor, determines the role of its partner, the dependent (also called the modifier). Dependency Parsing is done here by mapping from Constituent Structure to Dependency Structure, because the input to this parser is the constituency structure produced by the previous process.
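The mapping from constituent structure to dependency structure is commonly done with head rules: each phrase picks one child as its head, and the head words of the remaining children become dependents of that head word. The sketch below follows this general idea; the rule table, the example tree and the function names are assumptions, not the rules used in this research.

```python
# Hypothetical head rules: for each phrase label, the child labels to try,
# in priority order, when choosing the head child.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VB", "VP"],
    "NP": ["NN", "NNP", "NP"],
}

# Same illustrative tree format as a bracketed parse: (label, children...).
TREE = ("S",
        ("NP", ("NNP", "Budi")),
        ("VP", ("VB", "membaca"),
               ("NP", ("NN", "buku"))))


def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def head_child(node):
    """Pick the head child of a phrase according to HEAD_RULES."""
    children = node[1:]
    for label in HEAD_RULES.get(node[0], []):
        for child in children:
            if child[0] == label:
                return child
    return children[0]  # fallback: leftmost child


def to_dependencies(node, deps):
    """Return the head word of node; append (head, dependent) pairs to deps."""
    if is_leaf(node):
        return node[1]
    chosen = head_child(node)
    head_word = to_dependencies(chosen, deps)
    for child in node[1:]:
        if child is not chosen:
            deps.append((head_word, to_dependencies(child, deps)))
    return head_word


dependencies = []
root = to_dependencies(TREE, dependencies)
```

For the example sentence the verb "membaca" becomes the root, with "Budi" and "buku" as its dependents. The sketch identifies words by their string, so a real implementation would use token positions to handle repeated words, and would also label each relation (for example subject or object).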

2.3. Natural Question-Guided Search

Natural Question-Guided Search is an algorithm that can transform a narrative text into a question. It was first developed by Alexander Kotov and ChengXiang Zhai for information retrieval, motivated by the need to find an answer to a question. Questions written with good grammar patterns yield better and faster results. In general, the Natural Question-Guided Search algorithm is useful for searching for information by generating good questions. In this research, however, the algorithm is not used to generate questions for information search; it is used only to generate a question together with its answer, so that it can be used for self-learning study. Two early stages must be completed before running this algorithm: the pre-processing stage and Dependency Parsing. The Dependency Tree output from Dependency Parsing is then processed by the Natural Question-Guided Search algorithm. The complete question forming process is shown in Fig. 1.

Fig. 1 Question Form Phase Processing Diagram
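The idea of turning a dependency structure into a question with its answer can be sketched as follows. This is not the published Natural Question-Guided Search algorithm: the relation labels, the question-word table and the function generate_question are assumptions, and only a single subject pattern is handled.

```python
# Hypothetical rule table: maps (dependency label, entity type of the answer)
# to an Indonesian question word. Not the published algorithm's patterns.
QUESTION_WORDS = {
    ("nsubj", "PERSON"): "Siapa",  # "who"
    ("nsubj", "OTHER"): "Apa",     # "what"
}


def generate_question(tokens, deps, entity_types):
    """Build a (question, answer) pair by replacing the subject.

    tokens: words in sentence order.
    deps: (label, head_index, dependent_index) triples.
    entity_types: dict mapping a token index to an entity type.
    Returns None when no subject relation is found.
    """
    for label, _head, dep in deps:
        if label != "nsubj":
            continue
        entity_type = entity_types.get(dep, "OTHER")
        question_word = QUESTION_WORDS[(label, entity_type)]
        # Keep every word except the subject, which becomes the answer.
        rest = [tok for i, tok in enumerate(tokens) if i != dep]
        return f"{question_word} yang {' '.join(rest)}?", tokens[dep]
    return None


result = generate_question(
    ["Budi", "membaca", "buku"],
    [("nsubj", 1, 0), ("dobj", 1, 2)],
    {0: "PERSON"},
)
```

With the example input, the subject "Budi" is removed from the sentence and returned as the answer, giving the question "Siapa yang membaca buku?". The sketch treats the subject as a single token; a multi-word subject would require removing the whole subtree under the subject word.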

3. QUESTION FORM PROCESSING

Question Form Processing runs through two main stages: Analysis Solution Dependency Parsing and Natural Question-Guided Search.