ISSN: 1693-6930
TELKOMNIKA Vol. 12, No. 2, June 2014: 447 – 454
Research on the visualization of search results [16]-[17] has also been conducted. A prototype visualization system [16] was created to enhance author searching. The system was based on author co-citation analysis and algorithms such as Kohonen’s feature maps and Pathfinder networks.

Google has Google Scholar (http://scholar.google.com), a search engine for retrieving research documents. It obtains bibliography entries from online documents to compute some metrics and provides authors with a publication profile. Microsoft has also created a search engine called Microsoft Academic Search (http://academic.research.microsoft.com) with similar features. In addition, it can display a Co-Author Graph, Co-Author Path, Citation Graph, and Genealogy Graph to visualize the relationships between authors.

The applications developed by Google and Microsoft can be used freely online, but they cannot be modified to extract the bibliography entries of research documents with a specific format. We also have no control over the scope of the documents. This paper proposes a method to extract the bibliographic entries of research documents from a given collection of documents.
This study aims to create a module that can extract references from the bibliography entries of research documents. A method is created to recognize the bibliography entries in the research documents. Once identified, the bibliography entries are stored in a database. The database is used to build an information retrieval system for searching research documents along with their references and to visualize the relationships between the authors.
2. The Methods
This study began by collecting the research documents as PDF files. Each file was converted into a plain-text file and stored in a research document database. The text was then processed to extract and identify the bibliography entries, which were stored in the database. The database was used to build an information retrieval system for research documents. A visualization module was created to display the relationships between the authors of the documents, based on the bibliographic entries in the database. The steps of the proposed method are shown in Figure 1.
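The extraction and storage steps described above can be sketched roughly as follows. This is only a minimal illustration, not the module built in this study: the heading string `DAFTAR PUSTAKA`, the one-entry-per-line rule, the table name `bibliography`, its columns, and the sample text are all assumptions made for the example.

```python
import sqlite3
from typing import List

# Assumed bibliography heading; the actual templates follow IPB's guidelines.
BIB_HEADING = "DAFTAR PUSTAKA"


def extract_entries(plain_text: str, heading: str = BIB_HEADING) -> List[str]:
    """Return the non-empty lines that follow the last occurrence of the heading.

    Assumes one bibliography entry per line, which real documents may not satisfy.
    """
    _, found, tail = plain_text.rpartition(heading)
    if not found:
        return []
    return [line.strip() for line in tail.splitlines() if line.strip()]


def store_entries(conn: sqlite3.Connection, doc_id: str, entries: List[str]) -> None:
    """Store the entries of one document in a hypothetical `bibliography` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bibliography "
        "(doc_id TEXT, position INTEGER, entry TEXT)"
    )
    conn.executemany(
        "INSERT INTO bibliography VALUES (?, ?, ?)",
        [(doc_id, i, entry) for i, entry in enumerate(entries, start=1)],
    )
    conn.commit()


# Made-up plain text standing in for a converted PDF file.
sample_text = (
    "BAB 1\nIsi dokumen.\n"
    "DAFTAR PUSTAKA\n"
    "Author A. 2010. Title one.\n"
    "Author B. 2012. Title two.\n"
)

conn = sqlite3.connect(":memory:")
entries = extract_entries(sample_text)
store_entries(conn, "doc-001", entries)
rows = conn.execute(
    "SELECT position, entry FROM bibliography WHERE doc_id = ? ORDER BY position",
    ("doc-001",),
).fetchall()
```

A real extractor would also need to split each entry into attributes such as authors, year, and title, which this sketch leaves out.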
Figure 1. The Proposed Method

2.1. System Evaluation
System evaluation in this study adopts the metrics used in information retrieval, namely recall and precision [18]. This study carries out two kinds of evaluation. The first evaluation measures the success of extraction and attribute identification of bibliography entries in each document. Assume B_i is the set of bibliography entries in the i-th research document and E_i is the set of bibliography entries that are successfully extracted and identified from the i-th research document by the system; then recall can be calculated by (1) and precision can be calculated by (2).
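For reference, the standard set-based forms of recall and precision, written with the sets B_i and E_i defined above, would read as follows. This is the textbook formulation, with B_i treated as the relevant set and E_i as the retrieved set; it is an assumption about what (1) and (2) express, and the symbols R_i and P_i are introduced here only for illustration.

```latex
\begin{align*}
R_i &= \frac{|B_i \cap E_i|}{|B_i|}, \\
P_i &= \frac{|B_i \cap E_i|}{|E_i|}.
\end{align*}
```

Under this reading, R_i is the fraction of the true entries in document i that the system recovered, and P_i is the fraction of the system's output for document i that is correct.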
[Figure 1 flowchart: Start; collecting research documents and designing the database; extraction and attribute identification of bibliography entries; development of the information retrieval system for research documents; creating the visualization of the author relationships; system evaluation; Finish]
Searching and Visualization of References in Research Documents Firnas Nadirman 449
Equations (1) and (2) are used to calculate the percentage of recall with (3) and the percentage of precision with (4) over all bibliography entries throughout the documents.
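A small sketch of how the first evaluation could be computed is given below. It assumes the standard set-based definitions of recall and precision, and it assumes that the overall percentages are obtained by pooling the counts over all documents; the pooling rule, the function names, and the sample identifiers are illustrative assumptions, not taken from the equations of this paper.

```python
from typing import Dict, Set, Tuple


def recall_precision(relevant: Set[str], retrieved: Set[str]) -> Tuple[float, float]:
    """Set-based recall and precision; a score is 0.0 when its denominator is empty."""
    hits = len(relevant & retrieved)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def overall_percentages(
    B: Dict[str, Set[str]], E: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Pool the counts over all documents and return (recall %, precision %).

    Pooling the counts is an assumption; averaging the per-document
    scores would be another reasonable choice.
    """
    hits = sum(len(B[doc] & E.get(doc, set())) for doc in B)
    total_true = sum(len(B[doc]) for doc in B)
    total_extracted = sum(len(E.get(doc, set())) for doc in B)
    recall_pct = 100 * hits / total_true if total_true else 0.0
    precision_pct = 100 * hits / total_extracted if total_extracted else 0.0
    return recall_pct, precision_pct


# Made-up example: B holds the true entries, E the entries produced by the system.
B = {"doc1": {"a", "b", "c", "d"}, "doc2": {"e", "f"}}
E = {"doc1": {"a", "b", "c", "x"}, "doc2": {"e"}}

doc1_scores = recall_precision(B["doc1"], E["doc1"])
recall_pct, precision_pct = overall_percentages(B, E)
```

In this example the system recovers three of the four true entries of `doc1` and adds one wrong entry, so both scores for `doc1` are 0.75.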
The second evaluation measures the success of relating the documents in the collection to a bibliography entry. Assume C_i is the set of research documents that refer to the i-th bibliography entry and F_i is the set of research documents that the system determines to refer to the i-th bibliography entry; then recall can be calculated by (5) and precision can be calculated by (6).
Equations (5) and (6) are then used to calculate the percentage of recall with (7) and the percentage of precision with (8) over the total number of entries that are connected to a document.
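The second evaluation can be illustrated in the same spirit. The sketch below assumes that the system's output is available as a map from each document to the entries it cites, inverts that map to obtain F_i for every entry, and then scores each entry against the true sets C_i with the standard set-based definitions. The data layout, the function names, and the identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def invert(doc_to_entries: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a document -> cited-entries map into an entry -> citing-documents map."""
    entry_to_docs: Dict[str, Set[str]] = defaultdict(set)
    for doc, entries in doc_to_entries.items():
        for entry in entries:
            entry_to_docs[entry].add(doc)
    return dict(entry_to_docs)


def per_entry_scores(
    C: Dict[str, Set[str]], F: Dict[str, Set[str]]
) -> Dict[str, Tuple[float, float]]:
    """Return (recall, precision) for each entry, comparing C_i with F_i."""
    scores = {}
    for entry, true_docs in C.items():
        found_docs = F.get(entry, set())
        hits = len(true_docs & found_docs)
        recall = hits / len(true_docs) if true_docs else 0.0
        precision = hits / len(found_docs) if found_docs else 0.0
        scores[entry] = (recall, precision)
    return scores


# Made-up links: which entries the system attached to which documents.
system_links = {"thesis_A": {"ref1", "ref2"}, "thesis_B": {"ref1"}}
F = invert(system_links)

# Made-up ground truth: which documents really refer to each entry.
C = {"ref1": {"thesis_A", "thesis_B", "thesis_C"}, "ref2": {"thesis_A"}}

scores = per_entry_scores(C, F)
```

Here the system misses that `thesis_C` also refers to `ref1`, which lowers the recall for that entry while its precision stays at 1.0.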
2.2. Collection
Our collection consists of 242 PDF files of Bachelor theses from the Computer Science Department, Bogor Agricultural University (IPB), Indonesia, and almost all of them are written in Indonesian. Therefore, the templates used in our bibliographic entry extraction are based on IPB’s writing guidelines. The evaluation is performed using this collection.
3. Results
3.1. Data Characteristics