ANALYSIS OF A TOURISM RECOMMENDATION SYSTEM USING A NAIVE BAYES CLASSIFIER WITH TERM FREQUENCY FEATURE SELECTION


ANALYSIS OF A TOURISM RECOMMENDATION SYSTEM USING A
NAIVE BAYES CLASSIFIER WITH TERM FREQUENCY FEATURE
SELECTION

ALFI MUHAMMAD ANWAR
Informatics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret

ABSTRACT
The tourism business grows rapidly every year, and the sentiment of visitors to
tourist attractions is an important factor in the development of the tourism
industry. Through the Application Programming Interface (API) of the social media
platform Twitter, visitor sentiment about a tourist location can be accessed freely.
This study uses a naive Bayes classifier with term frequency because the method is
simple and does not require a very large amount of data. The classification results
are implemented on a map using the haversine formula, which displays locations in
the area around the map position. The data were obtained by crawling in October 2016
for training and in November 2016 for testing. The three best accuracy results in the
test on the training data were 96.8% without feature selection, 96.8% with feature
selection that removes the features with the lowest word-occurrence probability, and
93.6% with the combined feature selection that removes the features with the lowest
word-occurrence probability, the features with the highest word-occurrence
probability in several classes, and the names of tourist locations. This combination
reduced the number of words from 1264 before feature selection to 577 after feature
selection, and when applied to the test on the testing data it gave the highest
result, 84.7%. The drop in accuracy between the memorized data and the testing data
is possibly because some of the testing data are not present in the data that were
tested.

Keywords: Text Mining, Naive Bayes Classifier, Term Frequency, Haversine
Formula, Twitter, sentiment
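
As a rough illustration of the kind of classifier the abstract names, the sketch below trains a multinomial naive Bayes model on raw term frequencies. It is not the thesis's implementation: the add-one (Laplace) smoothing, the function names `train` and `classify`, and the four toy tweets with their labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per class and term frequencies per class.

    docs is a list of (tokens, label) pairs. Returns the class priors,
    the per-class term counts, and the vocabulary.
    """
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    priors = {label: n / total for label, n in class_docs.items()}
    return priors, term_counts, vocab


def classify(tokens, priors, term_counts, vocab):
    """Return the class with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = term_counts[label]
        # Add-one smoothing (an assumption): every vocabulary word gets +1.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(prior)
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, already lower-cased and tokenized.
TRAIN = [
    ("pantai indah bersih".split(), "positive"),
    ("pemandangan indah sekali".split(), "positive"),
    ("jalan rusak kotor".split(), "negative"),
    ("macet kotor sekali".split(), "negative"),
]
MODEL = train(TRAIN)
```

With this toy model, `classify("pantai indah".split(), *MODEL)` selects the positive class, because "pantai" and "indah" occur only in the positive examples.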



ANALYSIS OF A TOURISM RECOMMENDATION SYSTEM USING A
NAIVE BAYES CLASSIFIER WITH TERM FREQUENCY FEATURE
SELECTION
ALFI MUHAMMAD ANWAR
Department of Informatics, Faculty of Mathematics and Natural Sciences,
Sebelas Maret University

ABSTRACT
The tourism business grows rapidly every year, and tourist sentiment is one of the
important factors in the development of the tourism industry. Twitter is one source
for assessing tourist sentiment, since it can be accessed freely through its
Application Programming Interface (API). The naive Bayes classifier and term
frequency are used because the methods are simple and require only a small amount of
data. The results are implemented on a map using the haversine formula, which shows
nearby locations on the map. The data were crawled from Twitter in October 2016 for
the training data and in November 2016 for the testing data; location data were
obtained from Google's recommendations in October for Java, Indonesia. The three
best accuracy results of the memorized test are 96.8% without feature selection,
96.8% for feature selection that removes the highest-probability words in a sentence
in more than one class, and 93.6% for the combination that removes the
lowest-probability words in a sentence, the highest-probability words in a sentence
in more than one class, and location names. This combination decreases the number of
words from 1264 before feature selection to 577 after feature selection, and it gives
the highest result on the test data, 84.7%. The decrease in accuracy between the
memorized and test results is possibly because some of the testing data are not
included in the test data used.

Keywords: Text Mining, Naive Bayes Classifier, Term Frequency, Haversine
Formula, Twitter, sentiment
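
The haversine formula mentioned in the abstract gives the great-circle distance between two latitude/longitude points, which is enough to filter the locations shown around a map position. The following is a minimal sketch under stated assumptions, not the thesis's code: the mean Earth radius of 6371 km, the function names `haversine_km` and `nearby`, and the place names and coordinates in `PLACES` are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(center, places, max_km):
    """Return the names of places within max_km of center (lat, lon)."""
    lat, lon = center
    return [name for name, plat, plon in places
            if haversine_km(lat, lon, plat, plon) <= max_km]


# Hypothetical places on the equator, given as (name, latitude, longitude).
PLACES = [
    ("A", 0.0, 0.01),  # roughly 1.1 km from (0, 0)
    ("B", 0.0, 0.05),  # roughly 5.6 km from (0, 0)
    ("C", 0.0, 1.0),   # roughly 111 km from (0, 0)
]
```

Calling `nearby((0.0, 0.0), PLACES, 2)` keeps only place A, while raising the radius to 10 km also admits place B.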


MOTTO
“Your duty as a child is lifting your family pride”


DEDICATION

I dedicate this thesis to:
My mother and father, who have guided me and supported me in every step I
take.


PREFACE

All praise and gratitude be to Allah SWT for His abundant grace and guidance, by which the author was able to complete this undergraduate thesis entitled “Analysis of a
Tourism Recommendation System Using a Naive Bayes Classifier with Term Frequency
Feature Selection”. In completing this thesis, the author thanks everyone who took
the time to give guidance and support, whether directly or indirectly. In
particular, the author expresses thanks to:
1. Drs. Bambang Harjito, M.App.Sc., Ph.D., Head of the Informatics Study
Program, for his support during the preparation of the Final Project.
2. Mr. Ristu Saptono, S.Si., M.T., supervisor I, for the knowledge, guidance,
kindness, and patience given to the author during the Final Project.
3. Ms. Rini Anggrainingsih, S.T., M.T., supervisor II, for the knowledge,
guidance, kindness, and patience given to the author during the Final Project.
4. Anita Budi Raharjo, who has been so patient in giving support during the
preparation of the Final Project.
5. My closest friends Yonathan Adi Kurnia, Dian Adi Nugroho, Anthony Juan,
Rhesa Havilah, and Adi Prasetya P, for all their help and support.
6. Martha Pritzanda, Sandi Suko, Edo Rizki, Debora Sakti, Elfrida Nathalita,
Tittah Hayyu, Muhammad Arifudin, Lala Mareta, and Andreas Bobola, for their
support during the writing.
7. Friends from TKTC: Fariz, Pardek, Affan, Galang, Imam, Irfan, Azza,
Indra, Topik, Iam, Tri, and Unggul, for their support during the writing.
The author hopes that this thesis will be useful to various parties.
Surakarta, January 2017

The Author



TABLE OF CONTENTS

TITLE PAGE .......... i
APPROVAL PAGE
VALIDATION PAGE .......... ii
ABSTRACT (INDONESIAN) .......... iv
ABSTRACT (ENGLISH) .......... v
MOTTO .......... vi
DEDICATION .......... vii
PREFACE .......... viii
TABLE OF CONTENTS .......... ix
LIST OF TABLES .......... xi
LIST OF FIGURES .......... xii
LIST OF APPENDICES .......... xiii
INTRODUCTION .......... 1
1.1 Background .......... 1
1.2 Problem Formulation .......... 3
1.3 Problem Scope .......... 3
1.4 Research Objectives .......... 3
1.5 Research Benefits .......... 3
1.6 Structure of the Thesis .......... 4
LITERATURE REVIEW .......... 5
2.1 Theoretical Basis .......... 5
2.1.1. Text Mining .......... 5
2.1.2. Term Frequency .......... 6
2.1.3. Naive Bayes Classifier .......... 7
2.1.4. Haversine Formula .......... 8
2.2 Related Research .......... 8
RESEARCH METHODOLOGY .......... 14
3.1 Data Collection .......... 14
3.2 Preprocessing .......... 14
3.3 Feature Selection with Term Frequency .......... 15


3.4 Application of the Naive Bayes Classifier .......... 15
3.5 Application of the Haversine Formula .......... 15
3.6 Testing .......... 16
RESULTS AND DISCUSSION .......... 17
4.1 Data Description .......... 17
4.2 Preprocessing .......... 17
4.3 Feature Selection with Relative Term Frequency .......... 19
4.4 Naive Bayes Classifier .......... 23
4.5 Haversine Formula .......... 24
4.6 Test Results .......... 26
4.7 Discussion .......... 33
CLOSING .......... 36
5.1 Conclusions .......... 36
5.2 Suggestions .......... 36
REFERENCES .......... 38
APPENDICES .......... 39


LIST OF TABLES
Table 3.1. Example Calculation of Accuracy, Precision, and Recall .......... 16
Table 4.1. Data Collection Times .......... 17
Table 4.2. Case Folding Results .......... 18
Table 4.3. Stopword Removal .......... 19
Table 4.4. Example Relative Term Frequency Results .......... 19
Table 4.4. Example Relative Term Frequency Results (Continued) .......... 20
Table 4.5. Example Results of Feature Selection 1 .......... 20
Table 4.5. Example Results of Feature Selection 1 (Continued) .......... 21
Table 4.6. Example Results of Feature Selection 2 .......... 21
Table 4.6. Example Results of Feature Selection 2 (Continued) .......... 22
Table 4.7. Example Results of Feature Selection 3 .......... 22
Table 4.8. Example Results of Feature Selection 4 .......... 23
Table 4.9. Example Naive Bayes Classifier Calculation Results .......... 23
Table 4.10. Confusion Matrix for Test 1, Memorized .......... 27
Table 4.11. Confusion Matrix for Test 2, Memorized .......... 27
Table 4.12. Confusion Matrix for Test 3, Memorized .......... 28
Table 4.13. Confusion Matrix for Test 4, Memorized .......... 28
Table 4.13. Confusion Matrix for Test 4, Memorized (Continued) .......... 29
Table 4.14. Confusion Matrix for Test 5, Memorized .......... 29
Table 4.15. Confusion Matrix for Test 1, Testing Data .......... 30
Table 4.16. Confusion Matrix for Test 2, Testing Data .......... 30
Table 4.17. Confusion Matrix for Test 3, Testing Data .......... 31
Table 4.18. Confusion Matrix for Test 4, Testing Data .......... 32
Table 4.19. Confusion Matrix for Test 5, Testing Data .......... 32
Table 4.20. Comparison of Precision, Accuracy, and Recall Results .......... 33



LIST OF FIGURES
Figure 3.1. Research Methodology .......... 14
Figure 4.1. Application of the Haversine Formula, Maximum Distance 2 km .......... 24
Figure 4.2. Application of the Haversine Formula, Maximum Distance 10 km .......... 25


LIST OF APPENDICES
Appendix Table 1. Training Data 1-25 .......... 39
Appendix Table 1. Training Data 26-50 .......... 40
Appendix Table 1. Training Data 51-79 .......... 41
Appendix Table 1. Training Data 80-106 .......... 42
Appendix Table 1. Training Data 107-131 .......... 43
Appendix Table 1. Training Data 132-157 .......... 44
Appendix Table 1. Training Data 158-186 .......... 45
Appendix Table 1. Training Data 187-213 .......... 46
Appendix Table 1. Training Data 214-238 .......... 47
Appendix Table 1. Training Data 239-265 .......... 48
Appendix Table 1. Training Data 266-282 .......... 49
Appendix Table 2. Testing Data 1-7 .......... 49
Appendix Table 2. Testing Data 8-31 .......... 50
Appendix Table 2. Testing Data 32-55 .......... 51
Appendix Table 2. Testing Data 56-81 .......... 52
Appendix Table 2. Testing Data 82-107 .......... 53
Appendix Table 2. Testing Data 108-133 .......... 54
Appendix Table 2. Testing Data 134-162 .......... 55
Appendix Table 2. Testing Data 163-188 .......... 56
Appendix Table 2. Testing Data 189-213 .......... 57
Appendix Table 2. Testing Data 214-237 .......... 58
Appendix Table 2. Testing Data 238-263 .......... 59
Appendix Table 2. Testing Data 264-289 .......... 60
Appendix Table 2. Testing Data 290-300 .......... 61
Appendix Table 3. List of Tourist Locations Used 1-26 .......... 61
Appendix Table 3. List of Tourist Locations Used 27-70 .......... 62
Appendix Table 3. List of Tourist Locations Used 71-114 .......... 63
Appendix Table 3. List of Tourist Locations Used 115-130 .......... 64
Appendix Table 4. Training Data Classification Results 1-21 .......... 64


Appendix Table 4. Training Data Classification Results 22-64 .......... 65
Appendix Table 4. Training Data Classification Results 65-107 .......... 66
Appendix Table 4. Training Data Classification Results 108-150 .......... 67
Appendix Table 4. Training Data Classification Results 151-193 .......... 68
Appendix Table 4. Training Data Classification Results 194-236 .......... 69
Appendix Table 4. Training Data Classification Results 237-279 .......... 70
Appendix Table 4. Training Data Classification Results 280-282 .......... 71
Appendix Table 5. Testing Data Classification Results 1-32 .......... 71
Appendix Table 5. Testing Data Classification Results 33-71 .......... 72
Appendix Table 5. Testing Data Classification Results 72-110 .......... 73
Appendix Table 5. Testing Data Classification Results 111-149 .......... 74
Appendix Table 5. Testing Data Classification Results 150-188 .......... 75
Appendix Table 5. Testing Data Classification Results 189-227 .......... 76
Appendix Table 5. Testing Data Classification Results 228-266 .......... 77
Appendix Table 5. Testing Data Classification Results 266-300 .......... 78
Appendix Table 6. List of Positive Features .......... 79
Appendix Table 6. List of Negative Features .......... 80
Appendix Table 6. List of Neutral Features .......... 81
Appendix Table 6. List of Neutral Features .......... 82
Appendix Table 6. List of Neutral Features .......... 83
