Improvement and Simulation of Data Mining Algorithm Based on Association Rules

TELKOMNIKA, Vol.14, No.3A, September 2016, pp. 202~207
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i3A.4388



202

Improvement and Simulation of Data Mining Algorithm
Based on Association Rules
Li Tian*, Dai Xiao-Xia, Chen Guan-Lin, Weng Wen-Yong
Department of Computer and Computer Science, Zhejiang University City College, Hangzhou 310015, P.
R. China
*Corresponding author, e-mail: [email protected]

Abstract
As a highly concerned research topic in the field of data mining, association rules have achieved
good results at present. However, in practical application, the expanding of database scale coupled with
the increasing of the amount of data leads to the decrease of efficiency in algorithm mining, which limits its
own application. The paper modifies the data mining algorithm based on association rules and explores
the simulation of this algorithm, intending to provide some reference for the improvement and simulation of

association rules data mining algorithm.
Keywords: association rules; data mining; algorithm; simulation
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
The characteristic of data mining is particularly remarkable, mainly manifesting in its
huge scale, imperfect complete degree and massive random data, etc. Generally, people don’t
have the awareness of extracting data in advance. Besides, the large scale network of database
data and the dynamic and heterogeneous characteristics of those data will increase the difficulty
of extraction. Under such circumstance, an effective algorithm should be integrated in order to
solve this problem and improve the computing capability. In the past years, with the continuous
development of information technology, the amount of data held by people is constantly
expanding, and how to find valuable information from the large amounts of data is already a
very concerned topic, which promotes the emergence of data mining technology and its rapid
development. Since data mining is a kind of automation and intelligent technology transfer data
into valuable information, the analysis and exploration of association rule data mining algorithm
in this paper is of much significance for the improvement and simulation of data mining
algorithm based on association rules.
2. Improvement of Association Rule Data Mining Algorithm
Two main aspects are discussed in this section about the improvement of association

rules data mining algorithm. Strategy need to be modified to improve the efficiency of
association rules data mining algorithm, and thus enhancing its practical value. Specific content
is as follows:
2.1. Researches on Data Mining Algorithm
The most important role of data mining algorithm is to obtain valuable information from
large amounts of data including structured data, semi-structured and non-structured data
sources, such as audio [1], video, data and data algorithm. The first thing to do is to create a
good model and standardize the algorithm of priority search. The common data mining
algorithms include decision tree method, bionic global optimized genetic algorithm and neural
network, statistical analysis and reports of exclusive counterexample method and so on. In
order to enhance the efficiency of data mining and to improve the application of information
data, a new system of cloud computing should be conducted to excavate more effective
information hidden in the large-scale data [2]. The association rules find the relationship and

Received March 15, 2016; Revised July 21, 2016; Accepted August 2, 2016

203




ISSN: 1693-6930

I  i , i , i



1 2
m
dependence between things. Suppose there is a collection
, and the
corresponding data task is the database D collection shown below, each transaction T is a

collection, T  I , and each transaction has an identifier TD. The signal A association rule of

these two channels is the type of containing types, and A  T , then A  I

B  I and
A  B   . With the support rules set in the transaction, the percentage of transaction is
A  B in D.


2.2. Objects and Props of Data Mining
At present, many related statistician or econometrician, particularly the research staff of
the National Bureau of Statistics, have already focused their attention on model creating and
inquiry based on data model. They point out that data analysis and modeling are an
indispensable task. In their viewpoint, the model is like a summary and a more complete
description about the relationship between the variables and variables [3]. The two variables are
the budget that allows the use of the model results, and can help to know the process of data
forming. Because of this, the linear model of simultaneous equations can be widely used in real
analysis of these two variables. However, the actual process is usually based on a specific and
simplified theoretical model (especially economic theory). Besides being unrealistic, it is also
more difficult to replace them with unclosed mathematical model. Therefore, the use of data
mining algorithm will have better feasibility.
3. LIPI Rule of Data Mining Algorithm
LIPI data mining algorithm has certain rules. The LIPI algorithm uses the data set of
scan frequency to obtain corresponding data, and forms the mining. Its mechanism is as:
If there is a group with different characteristics, and each characteristic forms a group of
projects. An element in the subsets which is not of non-empty set with the project sets can be
shown in Table 1.
Table 1. Sequence database
Sit


Sequence

S1

ACAABC

S2

ACABCE

S3

ACABC

S4

ABCAB

The relationship between sample variance and sample variance ratio is established:


S 2 

S2

2

(1)

prove: according to definition

S 2 

1 n i   2
)
(
n  1 i 1  2

1
(n  1)  2

1
 2 S2



n

 (

i

  )2

i 1



TELKOMNIKA Vol. 14, No. 3A, September 2016 : 202 – 207

(2)


TELKOMNIKA

ISSN: 1693-6930



204

This algorithm is based on the precondition that the raw data have been processed
many times and the information contained in raw data is effectively used.
The research of data mining algorithm is of great significance, and the effect of data
processing is improving. The user data information needs to be extracted from a large amount
of data to promote the development of it. Wang [4] Cloud computing is a relatively new
computing model and its application in data mining need further research to constantly improve
its application efficiency and improve data information application process.
4. Results and Analysis
Simulation can provide technical support for the improvement and perfection of
association rules data mining algorithm. The author explores the association rules data mining
algorithm simulation in detail from data preparation, association rule mining and mining results.

Main contents are as follows:
4.1. Data Preparation
His data mining algorithm simulation is based on the application model of algorithm in
marketing. Shopping malls recorded a large number of customers purchase through the
operation of these years, which constitute a customer purchase information database. At
present, the data has not been reasonably and effectively used, being a fortune waiting for
discovery. To better run this store, a study on the customer’s purchases and purchase habits
must be done. Association rule mining algorithm application is quite widespread, and how to
closely relate the association rules algorithm to practical problems is the main research target of
association rule mining problem. This paper is trying to melt Boolean matrix based association
rule mining algorithm into the mall marketing decision support system so as to obtain useful
information to enhance business interests and enhance the competitiveness from the items data
of the customers. Because transaction information are not the same every day [5], 500 copies of
the transaction in different time are randomly selected, the membership number whom and their
purchase information are input into the database D. Table 2 shows part of the transaction
records in the database, a total of 500 copies.
Table 2. Transaction database D
Customer membership
002013
002035

002037
002065
003002
003010
003022
003052
003105
003152
……

Purchase goods records
Milk bread shampoo instant noodles fruit drinks……
Razor battery gum……
Shampoo towel toothbrush fruit drinks……
Milk bread fruit drinks……
Fruit drinks battery razor……
Milk bread shampoo towel toothbrush……
Fruit drink gum……
Fruit drink bread milk……
Towel toothbrush shampoo instant noodles……

Bread milk shampoo towel toothbrush……
…………………………………………

Given an ordinary two-dimensional table is given, it is required to separate the non—
discrete attributes to obtain the required tables for association rules analysis. In Table 2, the
customer membership number was labeled Ti (i=1,2,3...... , n), and milk, bread, shampoo,
towels, toothbrush, fruits, drinks, razor, battery, chewing gum and instant noodles are recorded
in sequence order Ij (j=1,2,3...... , n). The association rules mining algorithm based on Boolean
matrix is applied here to separate the transaction database, and the discrete results are shown
in Table 3.

Improvement and Simulation of Data Mining Algorithm Based on Association Rules (Li Tian)

205



ISSN: 1693-6930
Table 3. The discrete relational table
TID
T1

Item sets
I1 I2 I3 I6 I7 I11……

T2

I8 I9 I10……

T3
T4
T5
T6
T7
T8
T9
T10
……

I3 I4 I5 I6 I7 ……
I1 I2 I6 I7……
I6 I7 I8 I9……
I1 I2 I3 I4 I5……
I6 I7 I10……
I1 I2 I6 I7……
I3 I4 I5 I11……
I1 I2 I3 I4 I5……
…………………………

4.2. Association Rule Mining
We set: minS=40%; minC=90%. Minimum support: Min supsh=min_sup sup x m=0.4 *
10=4 (1) transfer the transaction database D into a Boolean matrix Am*n: as shown in equation
(3):
I1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 I10 I 11 In
T1 

T2 
T3 

T4 

T5 
T6 

T7 
T8 

T9 
T 10 

Tm 

1

1

1

0

0

1

1

0

0

0

1

0

0

0

0

0

0

0

1

1

1

0
0

0

0

1

1

1

1

1

0

0

0

1

1

0

0

0

1

1

0

0

0

0

0

0

0

0

0

1

1

1

1

0

0

1

1

1

1

1

0

0

0

0

0

0

0
1

0
1

0
0

0
0

0
0

1
1

1
1

0
0

0
0

1
0

0
0

0

0

1

1

1

0

0

0

0

1

0

1

1

1

1

1

0

0

0

0

0

0

... ...

...

...

... ... ... ... ...

... ...

... 

... 
... 
... 

... 
... 

... 
... 

... 
... 
... 

(3)

The frequent itemsets of equation (4) are calculated according to corresponding
program written into the computer.
I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 I10 I11
T1  1

T 2 0
T3 0

T 4 1

T5 0
T 6 1

T7 0
T8 1

T9 0
T 10  1

1
0

1 0
0 0

0
0

1 1
0 0

0
1

0
1

0
1

0

1 1

1

1 1

0

0

0

1
0

0 0
0 0

0
0

1 1
1 1

0
1

0
1

0
0

1

1 1

1

0 0

0

0

0

0
1

0 0
0 0

0
0

1 1
1 1

0
0

0
0

1
0

0

1 1

1

0 0

0

0

1

1

1 1

1

0 0

0

0

0

1

0
0 
0

0
0

0
0

0
0 

(4)

All the frequent itemsets obtained by operation L= {{I1, I2}, {I3, I4}, {I3, I5}, {I4, I5}, {I6,
I7}, {I3, I4, I5}} (without regard to L1). Potential association rules can be listed according to six
frequent itemsets got from type Figure 3. Meanwhile, confidence C can also be calculated.
Finally association rules are listed in the following table 4:

TELKOMNIKA Vol. 14, No. 3A, September 2016 : 202 – 207

TELKOMNIKA



ISSN: 1693-6930

206

Table 4. Association rules
No.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

Potential association rules
{I1}=>{I2}
{I2}=>{I1}
{I3}=>{I4}
{I4}=>{I3}
{I3}=>{I5}
{I5}=>{I3}
{I4}=>{I5}
{I1}=>{I2}
{I6}=>{I7}
{I7}=>{I6}
{I3}=>{I4,I5}
{I4}=>{I3,I5}
{I5}=>{I3,I4}
{I3,I4}=>{I5}
{I3,I4}=>{I4}
{I4,I5}=>{I3}

X∪Y
5
5
4
4
4
4
4
4
6
6
4
4
4
4
4
4

X
5
5
5
4
5
4
4
4
6
6
5
4
4
4
4
4

C
100
100
80
100
100
100
100
100
100
100
80
100
100
100
100
100

minC
90
90
90
90
90
90
90
90
90
90
90
90
90
90
90
90

Whether association rules
Yes
Yes
No
Yes
No
Yes
Yes
Yes
Yes
Yes
No
Yes
Yes
Yes
Yes
Yes

Finally, the association rule is obtained as shown in equation (5)

I1  I 2
I 2  I1
I 4  I 3
I 5  I 3
I 5  I 4
I 6  I 7
I 7  I 6
I 3, I 4  I 5
I 3, I 5  I 4
I 4, I 5  I 3

(5)

4.3. Mining Results
By using the above association rules analysis based on Boolean matrix algorithm, some
simple association rules can be obtained. For example, things such as milk, bread, shampoo,
towels and toothbrushes are strongly correlated and the together-sales of these strongly
correlated items can increase sales and provide data support for goods placing [6]. However, it
is just a manor aspect in mall marketing.
5. Conclusion
As is shown in this paper, the improvement of association rule data mining algorithm is
very significant, so is the strengthening of the simulation of the association rule data mining
algorithm. However, this is a systematic work, and it cannot be accomplished overnight. It needs
to be improved in many aspects, such as data preparation and systematic collection. The author
of this paper believes that as a new discipline, data mining needs further study to effectively
improve the efficiency and ability of it. Data cube environment and correspondent data
warehouse technology should be given in order to make data mining meet the demands of
needs. It is believed that as long as the above problems are solved, the application of
association rules data mining algorithm will be effectively strengthened and it can lay a very
solid foundation for further improvement and simulation of association rules data mining
algorithm [7, 8].
References
[1]
[2]

Lifeng Liu. A mining algorithm of financial data cleaning based on association rules. Microelectronics
and Computer. 2012; 05: 174-177.
Jie Ma. Association rule data mining algorithm research on cloud computing environment. Journal of
Industrial and Commercial University of Chongqing (Natural Science Edition). 2012; 11: 36-39.

Improvement and Simulation of Data Mining Algorithm Based on Association Rules (Li Tian)

207
[3]
[4]
[5]
[6]
[7]
[8]



ISSN: 1693-6930

Jun Li, Wenliang Song, Kuan Zhu. A novel association rule data mining algorithm. Liaoning
Engineering Technology University Journal (Natural Science Edition). 2014; 06: 846-849.
Gang Wang. A Novel Data Mining Algorithm for Mathematics Teaching Evaluation. TELKOMNIKA
Indonesian Journal of Electrical Engineering. 2014; 12(4): 2658-2666.
M Abdar, S.R.N. Kalhori, T. Sutikno, I.M.I Subroto, G. Arji. Comparing Performance of Data Mining
Algorithms in Prediction Heart Diseases. International Journal of Electrical and Computer
Engineering (IJECE). 2015; 5(6): 1569-1576.
Ali SH, El Desouky AI, Saleh AI. A New Profile Learning Model for Recommendation System based
on Machine Learning Technique. Indonesian Journal of Electrical Engineering and Informatics. 2016;
4(1): 81-92.
Juanqin Wang, Shuqin Li. Matrix based association rule mining algorithm research and improvement.
Computer Measurement and Control. 2011; 09: 2275-2277.
Xue-Feng Jiang. Application of Parallel Annealing Particle Clustering Algorithm in Data Mining.
TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(3): 2118-2126.

TELKOMNIKA Vol. 14, No. 3A, September 2016 : 202 – 207