APPLICATION OF LENGTH OF STUDY AND DEGREE OF
EXCELLENCE PREDICTION FOR INFORMATICS DEPARTMENT
STUDENTS OF UMS USING NAÏVE BAYES METHOD
Papers
Department of Informatics
Faculty of Communications and Informatics
By:
Muh Amin Nurrohmat
Yusuf Sulistyo Nugroho, S.T, M.Eng
DEPARTMENT OF INFORMATICS
FACULTY OF COMMUNICATIONS AND INFORMATICS
UNIVERSITAS MUHAMMADIYAH SURAKARTA
JUNE 2015
Muh Amin Nurrohmat, Yusuf Sulistyo Nugroho
Department of Informatics, Faculty of Communications and Informatics
Universitas Muhammadiyah Surakarta
Email : amin.nurrohmat@gmail.com
ABSTRACT
The Informatics department of Muhammadiyah University of Surakarta holds a large
amount of data on active students and graduates, and the data grows every year.
However, the department cannot yet manage the data well, so the larger the data
becomes, the less information is obtained from it. To solve this problem, the
data must be converted into information. This research discusses how to turn the
data into information using a data mining technique, namely the Naïve Bayes
method. The method is used to analyze the data, especially for pattern
recognition, predicting the length of study, and predicting the degree of
excellence. After processing the data, the application displays a report, a
summary report, and suggestions. Based on the results, the application helps the
Informatics department find solutions and make decisions when determining
policy, for example deciding where the department should promote itself.
Moreover, if the department can recruit good students, it can improve its
quality.
Keywords: data mining, degree of excellence, length of study, Naïve Bayes
method
INTRODUCTION
The development of information technology is very rapid. Now and in the future,
information is a crucial element of life. Information technology generates large
amounts of data in education, the economy, industry, and other fields, which in
turn increases the need for information. However, this need is not matched by
the presentation of sufficient information, and the large volume of data cannot
be used optimally. Presenting the information that is needed therefore requires
the data to be reprocessed.

Data mining techniques are expected to turn data that was previously only hidden
in the data warehouse into valuable information (Huda, 2010).

The implementation of information technology in universities also generates
large amounts of data. Student data and graduation data grow every year and
include personal data, the number of graduates, and academic scores. Hidden
information can be found by exploring this data to improve the competitiveness
of the university and to support strategic decision-making.

Mauriza (2013), applying the Naïve Bayes method to a sample of 342 students,
found that 256 students, or 74.85% of the sample, were predicted not to graduate
on time. This shows that many students take more than 8 semesters to complete
their studies. Therefore, the department needs to utilize the student data and
the graduation data to determine student graduation rates using data mining
techniques. That research still has a weakness: its data mining was carried out
with an existing application, WEKA.

Based on these problems, this research develops an application to help the
Informatics department. The application predicts the length of study and degree
of excellence of Informatics students of the Faculty of Communication and
Informatics, UMS, using data mining techniques. The selected technique is the
Naïve Bayes method. The writer applies the technique to student
data and graduation data. The writer hopes that data mining with the Naïve Bayes
method can reveal information about the length of study and degree of
excellence. Use of the application helps the Informatics department find
solutions and policies to improve student achievement, so that students can
complete their studies on time.

LITERATURE REVIEW

Nugroho and Setyawan (2014), in their research titled "Klasifikasi Masa Studi
Mahasiswa Fakultas Komunikasi Dan Informatika Universitas Muhammadiyah Surakarta
Menggunakan Algoritma C4.5", state that the length of study can be classified
from data on FKI UMS graduates using the C4.5 decision tree algorithm. The
writers took 341 records out of 2358, all belonging to students who had already
graduated. The attributes used include school major, gender, school, the average
number of credits per semester, and the role as a lab assistant. The results
show that the variable with the greatest influence on the length of study is the
average number of credits per semester, and that this is the variable the
faculty should consider in order to achieve an effective length of study.

Setyawan (2014), in his research titled "Klasifikasi Prestasi Akademik Mahasiswa
FKI UMS Menggunakan Metode Decision Tree", aims to classify students' academic
achievement. The classification is obtained using data mining and provides a
strategic plan for the Informatics department to find new students and to
improve students' academic achievement, which in turn raises the accreditation
score of the Informatics department. The writer chose 342 students as a sample.
The classification results show that male students predicted to have a
satisfactory GPA come from the Surakarta residency. The classification results
also predict that the students who are not from science
are the students who have a less satisfactory GPA. Students predicted to have a
satisfactory GPA come from outside the Surakarta residency. Female students from
the Surakarta residency are predicted to have a satisfactory GPA. Students
predicted to have a less satisfactory GPA are those who majored in science and
social studies. Male students from social studies who come from the Surakarta
residency, and female students who come from outside the Surakarta residency,
are predicted to have a less than satisfactory GPA. Almost all students who
majored in science are predicted to have a satisfactory GPA.

Mauriza (2013), in his research entitled "Implementasi Data Mining Untuk
Memprediksi Kelulusan Mahasiswa Fakultas Komunikasi dan Informatika UMS
Menggunakan Metode Naive Bayes", aims to predict the length of study in the
Faculty of Communication and Informatics, Muhammadiyah University of Surakarta,
using the Naïve Bayes method. The research also compared the method with others.
J48 performed better than the others because of its ability to define and
classify each attribute into each class simply. The data consists of 342
students. Using the Naïve Bayes method, the results show that only 86 students,
or 25.15%, are predicted to graduate on time, while 256 students, or 74.85%, are
predicted not to graduate on time. Based on these results, the faculty needs
solutions to improve student achievement so that students can graduate on time
with satisfactory outcomes, which can in turn help the faculty raise its
accreditation score.

According to Huda (2010), data mining is the mining or discovery of new
information by looking for patterns or specific rules. The information is
obtained from large amounts of data and is expected to help address the
condition at hand. By utilizing student data and graduation data, it is expected to
generate information on student graduation rates through data mining techniques.
The graduation rate categories are measured from the length of study and GPA.
The algorithm used is the Apriori algorithm. The information is presented as the
support and confidence of each graduation rate category. The result is an
application for obtaining useful information about student graduation rates
using data mining techniques.

RESEARCH METHOD
1. Data Mining Analysis
This research seeks the greatest probability of each attribute.
a. Collecting the Data
This research requires data on all Informatics students, both those who have
graduated and those who have not. It uses student data and graduation data.
1) Graduation Data
Graduation data, used as training data, is the data of students who have
graduated. The sample is data on students from 2007 to 2010. The attributes of
graduation data are school major, gender, region, school, lab assistant, length
of study, and degree of excellence.
2) Student Data
Student data, used as testing data, is the data of students who are still
active. The writer takes the data randomly as a sample. The attributes of
student data are school major, gender, region, school, and lab assistant.
b. Data Needs
Determining the data needs is necessary to support the development of the data
mining process. Based on the graduation data attributes and the student data
attributes, the data is divided into the following variables:
1) School Major: Science, Social, and others.
2) Gender: Male and Female.
3) Region: This describes the student's home address. The students are divided based
on Indonesia's time zones.
4) School: This describes where the student attended senior high school. It is
divided based on Indonesia's time zones.
5) Lab Assistant: This answers the question "Was the student a lab assistant?"
6) Length of Study: This answers the question "Do the students graduate on
time?". A student graduates on time if the length of study is ≤ 4 years, and
does not graduate on time if it is > 4 years.
7) Degree of excellence: The graduation predicates are Cum Laude if GPA > 3.51,
Very Satisfactory when 2.76

c. Cleaning Data
This research needs to perform data cleansing so that the data is relevant to
the needs, and to avoid noise and inconsistent data in the graduation data
attributes and student data attributes when testing the system.

Table 1 Attribute List
Attribute          Content
School Major       IPA, IPS, and LAIN
Gender             PRIA and WANITA
Region             WIB, WITA, and WIT
School             WIB, WITA, and WIT
Lab Assistant      YA and TIDAK
Length of Study    TEPAT
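As a minimal sketch of the cleaning step, the allowed codes listed in Table 1
can be used to discard records with missing or inconsistent values. The sketch
assumes each record is a dictionary; the field names (`school_major`, `gender`,
`region`, `school`, `lab_assistant`), the normalization by trimming and
upper-casing, and the sample records are illustrative assumptions rather than
the application's actual implementation.

```python
# Allowed codes per attribute, taken from Table 1.
# The dictionary keys are hypothetical field names chosen for this sketch.
ALLOWED_VALUES = {
    "school_major": {"IPA", "IPS", "LAIN"},
    "gender": {"PRIA", "WANITA"},
    "region": {"WIB", "WITA", "WIT"},
    "school": {"WIB", "WITA", "WIT"},
    "lab_assistant": {"YA", "TIDAK"},
}


def normalize(record):
    """Trim whitespace and upper-case every string value (an assumed convention)."""
    return {
        key: value.strip().upper() if isinstance(value, str) else value
        for key, value in record.items()
    }


def is_clean(record):
    """Return True when every attribute holds one of its allowed codes."""
    return all(
        record.get(field) in allowed for field, allowed in ALLOWED_VALUES.items()
    )


def clean(records):
    """Normalize the records and keep only those that pass validation."""
    normalized = (normalize(r) for r in records)
    return [r for r in normalized if is_clean(r)]


# Made-up records for illustration only.
raw_records = [
    {"school_major": "ipa ", "gender": "PRIA", "region": "WIB",
     "school": "WIB", "lab_assistant": "TIDAK"},
    {"school_major": "IPS", "gender": "?", "region": "WIB",
     "school": "WIB", "lab_assistant": "YA"},
    {"school_major": "LAIN", "gender": "WANITA", "region": "WITA",
     "school": "WITA"},
]
cleaned_records = clean(raw_records)
print(len(cleaned_records))
```

Dropping invalid records is only one possible policy; correcting or imputing
values would be an alternative, and the paper does not say which approach the
application takes.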
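The prediction step, which seeks the greatest probability, can be sketched with
a small categorical Naïve Bayes classifier. This is an illustration under stated
assumptions, not the application's implementation: the attributes are coded as
in Table 1, Laplace (add-one) smoothing is used although the paper does not
mention it, the training rows are made up, and the label `TIDAK_TEPAT` for
students who do not graduate on time is a hypothetical name.

```python
from collections import Counter, defaultdict

# Attribute order assumed for this sketch:
# (school major, gender, region, school, lab assistant).
# Number of distinct codes per attribute, following Table 1.
N_VALUES = (3, 2, 3, 3, 2)


def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posteriors(model, row):
    """Return normalized P(class | row) under the naive independence assumption."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed likelihood P(value | class); smoothing is an assumption.
            score *= (value_counts[(i, label)][value] + 1) / (count + N_VALUES[i])
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


def predict(model, row):
    """Pick the class with the greatest posterior probability."""
    probs = posteriors(model, row)
    return max(probs, key=probs.get)


# Made-up training data for illustration only.
train_rows = [
    ("IPA", "PRIA", "WIB", "WIB", "YA"),
    ("IPA", "WANITA", "WIB", "WIB", "YA"),
    ("IPS", "PRIA", "WIB", "WIB", "TIDAK"),
    ("LAIN", "PRIA", "WITA", "WITA", "TIDAK"),
]
train_labels = ["TEPAT", "TEPAT", "TIDAK_TEPAT", "TIDAK_TEPAT"]

model = train(train_rows, train_labels)
new_student = ("IPA", "WANITA", "WIB", "WIB", "YA")
print(predict(model, new_student), posteriors(model, new_student))
```

The same functions could be trained a second time with degree-of-excellence
labels in place of the length-of-study labels, since the classifier itself does
not depend on which target is being predicted.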