Classification Recognition Algorithm Based on Strong Association Rule Optimization of Neural Network

TELKOMNIKA, Vol.14, No.2A, June 2016, pp. 241~247
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i2A.4364



Zhang Xuewu*1, Joern Huenteler2

1 College of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China
2 Counterpane Internet Security, Inc, 00148 Rome, Italy
*Corresponding author, e-mail: zhangxuewu4116@163.com

Received January 19, 2016; Revised April 27, 2016; Accepted May 11, 2016

Abstract
Feature selection is one of the fundamental problems in intelligent text classification. Current textual feature generating algorithms generally adopt a weighted textual vector space model, in which a BP network evaluation function calculates the weight of each individual feature; the textual features generated this way are generally highly redundant. To address this problem, a textual feature generating algorithm based on clustering weighting is adopted. The new method first applies initial weighting to the feature candidate set, then applies further weighting to the features through semantics and information entropy, and finally removes redundant features through feature clustering. Experiments show that the average classification accuracy of this algorithm is about 5% higher than that of the traditional BP network algorithm.
Keywords: Text classification; Feature generating; Weight calculation; Feature clustering; Information entropy

1. Introduction
With the continuous development of informatization, electronic text has become an important carrier of information on the Internet. How to classify and recognize this enormous volume of unordered information has become an important research direction. Text classification refers to the automatic classification of text written in natural language into predefined subject categories. Current work on text classification generally improves the classification algorithm to improve classification efficiency while ignoring the significance of feature word selection. Various textual feature generating methods are in common use, among which the typical ones are document frequency (DF), information gain (IG), mutual information (MI), the χ² statistic (CHI), expected cross entropy, weight of evidence for text, odds ratio and the like [1, 2]. The basic idea of these methods is to calculate a certain statistical measurement for each feature, set a threshold value, and filter out the features whose measurement is less than the threshold; the remaining features constitute the text feature. Based on this analysis, a new classification recognition algorithm based on strong association rule optimization of neural network is presented in this paper to effectively solve the redundancy problem of text feature generating.
2. Textual Feature Generating Algorithm Based on BP Network
2.1. Initial Weight Processing
Following the basic idea of the BP network information-abstract generating algorithm, progressively increasing weighting is applied to the initial feature set to obtain the weight of each feature word; feature words with high mutual relevance are then removed through a clustering mining algorithm, and the text feature is finally generated through information gain processing.

Definition 1: for text $d$, define the feature vector $V(d) = (t_1, w_{t_1}(d);\ t_2, w_{t_2}(d);\ \ldots;\ t_n, w_{t_n}(d))$ as its vector space model [3-6].


Where $t_1, t_2, \ldots, t_n$ refer to the features of $d$, and $w_{t_1}(d), \ldots, w_{t_n}(d)$ refer to the weights of the corresponding features; the weight is generally defined as a function of the frequency $tf_i(d)$ with which $t_i$ appears in $d$, that is, $w_{t_i}(d) = f(tf_i(d))$.
Initial weight processing refers to the initialization of the weights of all feature words in the text. This paper adopts the feature weight algorithm commonly used in the vector space model, the BP network algorithm.
The BP network algorithm takes the frequency of appearance of a feature word in the document and the number of documents containing this feature word as the basis of the feature word's weight, namely

$$ w_{t_i}(d) = \frac{tf_i(d)\,\log\left(\frac{N}{DF(t_i)} + 0.01\right)}{\sqrt{\sum_{i=1}^{n}\left(tf_i(d)\right)^2 \log^2\left(\frac{N}{DF(t_i)} + 0.01\right)}} \qquad (1) $$

Where $w_{t_i}(d)$ refers to the weight of feature word $t_i$, $tf_i(d)$ to the frequency of appearance of feature word $t_i$ in document $d$, $N$ to the total number of documents, and $DF(t_i)$ to the number of documents containing $t_i$.
The weight generated by the BP network algorithm shows that when a certain feature word appears frequently in one document and rarely in other documents, the feature word has a stronger capacity to distinguish the document, and its weight is correspondingly larger.
However, it is only half right for the BP network algorithm to simply regard feature words with small document frequency as more important and feature words with large document frequency as unimportant.
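To make the computation concrete, the following Python sketch implements the weight of equation (1); the corpus representation (a token list per document plus a precomputed document-frequency table) is an illustrative assumption, not part of the original algorithm description.

```python
import math
from collections import Counter

def initial_weights(doc_tokens, doc_freq, num_docs):
    """A sketch of equation (1): normalized TF-IDF-style initial weights
    for every feature word of one document.
    doc_tokens: list of feature words of document d (assumed input format)
    doc_freq:   DF(t_i), number of documents containing each feature word
    num_docs:   N, the total number of documents"""
    tf = Counter(doc_tokens)  # tf_i(d): term frequency in this document
    # Numerator term: tf_i(d) * log(N / DF(t_i) + 0.01)
    raw = {t: f * math.log(num_docs / doc_freq[t] + 0.01)
           for t, f in tf.items()}
    # Denominator: square root of the sum of squared numerator terms
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}
```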
2.2. Semantic Weighting Processing

There are numerous semantic relations among words in natural language [7]; for instance, the same word carries different semantic significance depending on whether it appears in the document title or in the body text. It is therefore significant to analyze word meaning and apply weighting to the initial weights so that they reflect semantic information.
This paper mainly adopts position analysis to apply weighting to feature words.
The position of a feature word is a key indicator of its significance, and a feature word appearing in the title differs from one appearing in the body. Therefore, the initial weight $w_{t_i}(d)$ is adjusted as follows:

$$ w'_{t_i}(d) = \frac{tf_i(d)'\,\log\left(\frac{N}{DF(t_i)} + 0.01\right)}{\sqrt{\sum_{i=1}^{n}\left(tf_i(d)'\right)^2 \log^2\left(\frac{N}{DF(t_i)} + 0.01\right)}} \qquad (2) $$

$$ tf_i(d)' = \sum_{a} \lambda(a)\, tf_i(d, a) \qquad (3) $$

Where $tf_i(d)'$ refers to the position-weighted sum of the frequencies of the feature word in document $d$; $a$ refers to a position in the document where the feature word appears; $\lambda(a)$ refers to the weight coefficient of the corresponding position, with important positions given larger coefficients and less important ones smaller coefficients; and $tf_i(d, a)$ refers to the frequency of appearance of feature word $t_i$ in position $a$.
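As an illustration of equation (3), the sketch below computes the position-weighted frequency $tf_i(d)'$; the position labels are hypothetical names, the $\lambda(a)$ values are taken from Table 1 in Section 3.3, and the input format (a per-word mapping of position counts) is an assumption.

```python
# lambda(a): position coefficients as listed in Table 1 (Section 3.3)
POSITION_COEF = {"title_abstract": 0.8, "open_close_paragraph": 0.6,
                 "first_sentence": 0.4, "other": 0.2}

def position_weighted_tf(occurrences):
    """Equation (3): tf_i(d)' = sum over positions a of lambda(a) * tf_i(d, a).
    occurrences maps each feature word to {position label: count}."""
    return {t: sum(POSITION_COEF[a] * n for a, n in pos.items())
            for t, pos in occurrences.items()}

# Example: a word seen twice in the title/abstract and once elsewhere:
# position_weighted_tf({"neural": {"title_abstract": 2, "other": 1}})
# -> {"neural": 1.8}
```

The resulting $tf_i(d)'$ values replace $tf_i(d)$ in the normalization of equation (1) to yield the semantically weighted $w'_{t_i}(d)$ of equation (2).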
2.3. Information Entropy Weighting Processing
Entropy is a measure of the uncertainty of the attribute corresponding to an event [8]. The larger an attribute's

contained uncertain information is, and entropy is the combination of mathematical method and
language philology.
Definition 2: assume $S$ is a random variable taking a finite number of values, where the probability of each value is $P_i = P\{S = s_i\},\ i = 1, 2, \ldots, n$. Then the entropy of $S$ is

$$ Entropy(S) = -\sum_{i=1}^{n} P_i \log P_i \qquad (4) $$

Definition 3: when all values of $S$ appear with equal probability, that is, $P_i = 1/n$, the entropy of $S$ reaches its maximum value,

$$ Entropy(S) = -\sum_{i=1}^{n} P_i \log P_i = \log n \qquad (5) $$

denoted as the maximum entropy $E_{max}$.
Definition 4: for feature words $t_i$ and $t_j$, in case of $E(t_i) > E(t_j)$, $t_i$ is more chaotic than $t_j$, with higher uncertainty; for indicating an event, $t_j$ is better than $t_i$.
Definition 5: the uncertainty of feature word $t_i$ is

$$ U_i = \frac{E(t_i)}{E_{max}} = \frac{-\sum_{k=1}^{n} P_k(t_i) \log P_k(t_i)}{\log n} \qquad (6) $$
Based on the above definitions, the feature weight undergoes further weighting through the information entropy as follows:

$$ w''_{t_i}(d) = \frac{tf_i(d)'\,\log\left(\frac{N}{DF(t_i)} + 0.01\right)(1 - U_i)}{\sqrt{\sum_{i=1}^{n}\left(tf_i(d)'\right)^2 \log^2\left(\frac{N}{DF(t_i)} + 0.01\right)(1 - U_i)^2}} \qquad (7) $$
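The following sketch combines equations (6) and (7): the normalized entropy $U_i$ of a feature word, and the entropy-adjusted weight. The probability distribution $P_k(t_i)$ (e.g., the word's distribution over classes or documents) is assumed to be supplied by the caller.

```python
import math

def uncertainty(prob_dist):
    """Equation (6): U_i = E(t_i) / E_max, the entropy of the feature
    word's probability distribution normalized by the maximum entropy
    log n (assumes n > 1)."""
    n = len(prob_dist)
    entropy = -sum(p * math.log(p) for p in prob_dist if p > 0)
    return entropy / math.log(n)

def entropy_weights(tf_prime, doc_freq, num_docs, u):
    """Equation (7): each numerator term of equation (2) is scaled by
    (1 - U_i) before the same square-root normalization is applied.
    tf_prime: tf_i(d)' from equation (3); u: U_i from equation (6)."""
    raw = {t: f * math.log(num_docs / doc_freq[t] + 0.01) * (1 - u[t])
           for t, f in tf_prime.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}
```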

2.4. Feature Clustering Processing
Definition 6: the input of clustering analysis can be represented by an ordered pair $(X, s)$, where $X$ refers to a set of samples and $s$ to the standard for measuring sample similarity [9].
The output of the clustering system is the data partition result, namely $\Lambda = \{C_1, C_2, \ldots, C_k\}$, where $C_1, C_2, \ldots, C_k$ are subsets of $X$ complying with the following conditions:
(1) $C_1 \cup C_2 \cup \cdots \cup C_k = X$;
(2) $C_i \cap C_j = \emptyset,\ i \neq j$.
The members $C_1, C_2, \ldots, C_k$ of $\Lambda$ are called clusters, and each cluster can be described by some features.
Definition 7: in feature clustering, each feature word $t_i$ is one clustering sample. The weights of feature word $t_i$ in the documents of the training set form the dimensions of its feature vector, namely $t_i = (w_{t_i}(d_1), w_{t_i}(d_2), \ldots, w_{t_i}(d_n))$, $i = 1, 2, \ldots, m$, where $m$ is the number of feature words and $n$ is the number of documents.
Definition 8: the similarity measurement between features $t_i$ and $t_j$ is:
$$ sim(t_i, t_j) = \frac{\sum_{k=1}^{n} w_{t_i}(d_k)\, w_{t_j}(d_k)}{\sqrt{\sum_{k=1}^{n}\left(w_{t_i}(d_k)\right)^2}\,\sqrt{\sum_{k=1}^{n}\left(w_{t_j}(d_k)\right)^2}} \qquad (8) $$

The more similar the two feature words are, the larger the value of $sim(t_i, t_j) \in [0, 1]$ becomes.
According to the above definitions, feature clustering processing proceeds as follows (a code sketch is given below):
(1) Select one feature word from the candidate set at random as the center of the first cluster.
(2) Select the next feature word and calculate its similarity to each existing cluster center.
(3) If the similarity between the feature word and every cluster center is less than the specified threshold, establish a new cluster with this feature word as its center; otherwise, add the feature word to the cluster with maximum similarity.
(4) Repeat steps 2 and 3 until all feature words are processed.
(5) Retain the cluster centers and remove the other feature words in each cluster.
The redundancy of the feature words decreases sharply after feature clustering processing.
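A minimal sketch of steps (1)-(5) together with the similarity of equation (8) follows; the dictionary-based representation of feature vectors and the sequential (rather than random) choice of the first cluster center are simplifying assumptions.

```python
import math

def cosine_sim(u, v):
    """Equation (8): cosine similarity between two feature-word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_features(vectors, threshold):
    """Greedy single-pass clustering: only cluster centers are kept,
    so redundant (highly similar) feature words are pruned.
    vectors: feature word -> its weight vector over the training set."""
    centers = {}
    for word, vec in vectors.items():
        if not centers:                      # step (1): first cluster center
            centers[word] = vec
            continue
        best = max(centers, key=lambda c: cosine_sim(vec, centers[c]))
        if cosine_sim(vec, centers[best]) < threshold:
            centers[word] = vec              # step (3): open a new cluster
        # otherwise the word joins an existing cluster and is later removed
    return set(centers)                      # step (5): keep centers only
```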
2.5. Algorithm Design Process
Let the candidate set of feature words of document $d$ be $T = \{t_1, t_2, \ldots, t_n\}$. The textual feature generating algorithm is as follows. For each $t_i \in T$:
Step 1: Use equation (1) to generate the initial weight $w_{t_i}(d)$ of feature word $t_i$;
Step 2: Scan the text, record the position information of feature word $t_i$, and apply weighting to the initial weight according to equation (3); the semantically weighted weight $w'_{t_i}(d)$ is then obtained from equation (2);
Step 3: Apply further weighting to $w'_{t_i}(d)$, computing the uncertainty according to equation (6); the entropy-based weight $w''_{t_i}(d)$ is then obtained from equation (7);
Step 4: Use feature clustering processing to remove redundant feature words, generating a feature word set with minimum redundancy;
Step 5: Sort the feature words in the set by the value of $w''_{t_i}(d)$ and select the several feature words with the largest weights to generate the final text feature $V(d)$.
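Chaining the sketches from the previous subsections gives an end-to-end outline of Steps 1-5. The helper functions are the hypothetical ones defined in the earlier sketches, and class_dist (the per-word probability distribution used in equation (6)) is an assumed input; the initial weights of Step 1 are subsumed here by the position-weighted frequencies that feed equation (2).

```python
def generate_text_features(occurrences, doc_freq, num_docs,
                           class_dist, vectors, threshold, top_k):
    """Steps 1-5 of the generating algorithm, under the assumptions above."""
    tf_prime = position_weighted_tf(occurrences)           # step 2, eq. (3)
    u = {t: uncertainty(class_dist[t]) for t in tf_prime}  # step 3, eq. (6)
    w = entropy_weights(tf_prime, doc_freq, num_docs, u)   # step 3, eq. (7)
    kept = cluster_features(vectors, threshold)            # step 4
    ranked = sorted((t for t in w if t in kept),
                    key=lambda t: w[t], reverse=True)
    return ranked[:top_k]                                  # step 5
```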
3. Analysis of Experiment and Results
3.1. Experiment Method
This paper uses a text classification task to compare and evaluate the textual features extracted with the BP network weight evaluation algorithm and the textual features generated with the proposed clustering-weighted generating algorithm of Section 2. The KNN text classification algorithm is adopted. Its basic idea is: calculate the distance between each training text and the text to be classified, take the k training texts closest to the text to be classified, and assign the text to the type that prevails among those k training texts. The distance between texts is measured by the cosine between text vectors.
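A sketch of the KNN classifier described above, with cosine similarity standing in for distance (larger means closer); the (vector, label) training format is an assumption.

```python
import math
from collections import Counter

def knn_classify(doc_vec, training, k=10):
    """Assign the majority label among the k most similar training texts.
    training: list of (vector, label) pairs over a shared feature index."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    nearest = sorted(training, key=lambda tv: cos(doc_vec, tv[0]),
                     reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```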
3.2. Evaluation Criteria
There are many standards for evaluating classification effectiveness; the commonly used evaluation criteria are the accuracy rate (precision), the recall rate (recall), and the aggregative indicator of both, the F-measure [10].
(1) The accuracy rate is the proportion of texts matching the manual classification results among all classified texts:

$$ p = \frac{a_{correct}}{a_{correct} + b_{error}} \qquad (9) $$

Where $a_{correct}$ indicates the number of texts correctly classified into a given type, and $b_{error}$ indicates the number of texts falsely classified into that type.
(2) The recall rate is the proportion of texts matching the system classification among the texts in the manual classification results:

$$ r = \frac{a_{correct}}{a_{correct} + c_{error}} \qquad (10) $$

Where $c_{error}$ indicates the number of texts falsely excluded from the type.
(3) The F-measure is an evaluation indicator that considers the accuracy rate and the recall rate together [11, 12]; its mathematical formula is as follows:

$$ F = \frac{(\beta^2 + 1) \cdot p \cdot r}{\beta^2 \cdot p + r} \qquad (11) $$

Where $p$ is the accuracy rate, $r$ is the recall rate, and $\beta$ is the parameter adjusting the proportion of accuracy rate and recall rate in the evaluation function. $\beta$ is generally set to 1, giving the F1 function:

$$ F_1 = \frac{2pr}{p + r} \qquad (12) $$
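The three criteria can be computed directly from the counts defined above; a small sketch:

```python
def evaluate(a_correct, b_error, c_error, beta=1.0):
    """Equations (9)-(11): accuracy rate (precision), recall rate,
    and F-measure; beta = 1 gives the F1 value of equation (12)."""
    p = a_correct / (a_correct + b_error)              # eq. (9)
    r = a_correct / (a_correct + c_error)              # eq. (10)
    f = (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)  # eq. (11)
    return p, r, f
```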

3.3. Experiment Results
IEEE conference papers from the school's electronic library system are used in the tests. The papers are divided into 7 types: database management, information retrieval, knowledge discovery, information security, data communication, distributed computation and multimedia computation. There are 1400 papers in total, of which 1000 are training documents and 400 are test documents. The coefficient values of $\lambda(a)$ used in the experiment are shown in Table 1.
Table 1. Coefficient values of $\lambda(a)$
Position $a$                                  $\lambda(a)$
Title and abstract part                        0.8
Opening paragraph and conclusion paragraph     0.6
First sentence of each paragraph               0.4
Other parts                                    0.2

In the experiment we use the BP network algorithm and the proposed algorithm to extract features of the test document set respectively, then use the KNN classification algorithm to classify the documents and evaluate the classification results. Figure 1 and Figure 2 compare the accuracy rate and the recall rate, and Figure 3 compares the comprehensive evaluation F1.


Figure 1. Comparison of accuracy rate

Figure 2. Comparison of recall rate

Figure 3. Comparison of F1 value
Note: 1: database management; 2: information retrieval; 3: knowledge discovery; 4:
information security; 5: data communication; 6: distributed computation; 7: multimedia
computation
Comparative analysis of the experiment results shows that the accuracy rate, recall rate and F1 value of the proposed algorithm are higher than those of the BP network algorithm. Figure 3 shows that the F1 value obtained with the proposed algorithm for feature selection is about 5% higher than the F1 value obtained with the BP network algorithm.


4. Conclusion
For the automatic extraction of Chinese document features, this paper puts forward a textual feature generating algorithm based on clustering weighting. The algorithm integrates word frequency, document frequency, semantic relations and entropy in the weighting of feature words, making up for the deficiencies of the BP network algorithm. It performs clustering on the textual features using the weights produced by the weighting process, which filters out redundant features effectively and improves the quality of the generated textual features. The algorithm considers only position information during semantic weighting, so the semantic weighting is not comprehensive; more comprehensive analysis and processing of the semantic aspects would further improve feature word extraction and the accuracy of text classification.
Acknowledgements
This work was supported by the Foundation of Jiangsu Educational Committee of China under Grant No. 13KJB420001.
References
[1] Wei R, Shen X, Yang Y. Urban Real-Time Traffic Monitoring Method Based on the Simplified Network Model. International Journal of Multimedia & Ubiquitous Engineering. 2016; 11(3): 199-208.
[2] Chakraborty K, Roy I, De P, et al. Controlling the Filling and Capping Operation of a Bottling Plant using PLC and SCADA. Indonesian Journal of Electrical Engineering and Informatics (IJEEI). 2015; 3(1): 39-44.
[3] Lv Z, Halawani A, Fen S. Touch-less Interactive Augmented Reality Game on Vision Based Wearable Device. Personal and Ubiquitous Computing. 2015; 19(3): 551-567.
[4] Hu J, Gao Z, Pan W. Multiangle Social Network Recommendation Algorithms and Similarity Network Evaluation. Journal of Applied Mathematics. 2013; 2013(2013): 3600-3611.
[5] Zeng D, Geng Y. Content distribution mechanism in mobile P2P network. Journal of Networks. 2014; 9(5): 1229-1236.
[6] Gu W, Lv Z, Hao M. Change detection method for remote sensing images based on an improved Markov random field. Multimedia Tools and Applications. 2015: 1-16.
[7] Sutikno T, Facta M, Markadeh GRA. Progress in Artificial Intelligence Techniques: from Brain to Emotion. TELKOMNIKA Telecommunication Computing Electronics and Control. 2011; 9(2): 201-202.
[8] Yan X, Wu Q, Zhang C, Li W, Chen W, Luo W. An improved genetic algorithm and its application. Indonesian Journal of Electrical Engineering and Computer Science. 2012; 10(5): 1081-1086.
[9] Chi A, Denker M, et al. Practical domain-specific debuggers using the Moldable Debugger framework. Computer Languages Systems & Structures. 2015; 44(PA): 89-113.
[10] Tripathy RK, Acharya A, Choudhary SK, Sahoo SK. Artificial Neural Network based Body Posture Classification from EMG signal analysis. Indonesian Journal of Electrical Engineering and Informatics (IJEEI). 2013; 1(2): 59-63.
[11] Su T, Wang W, Lv Z. Rapid Delaunay triangulation for randomly distributed point cloud data using adaptive Hilbert curve. Computers & Graphics. 2015; 54: 65-74.
[12] Gu W, Lv Z, Hao M. Change detection method for remote sensing images based on an improved Markov random field. Multimedia Tools and Applications. 2015: 1-16.
