
TELKOMNIKA, Vol.14, No.2A, June 2016, pp. 357~362
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i2A.4378




Method of Temperament Accuracy Judgment and
Intelligent Processing Based on Voiceprint Recognition
Xiaohua Yang1, Xiaofei Zhu2, Xiaowei Zhu*3

1,3School of Arts, Harbin University of Science and Technology, Harbin, 150080, China
2School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
*Corresponding author, e-mail: smile.zhu.2012@foxmail.com

Received January 22, 2016; Revised April 27, 2016; Accepted May 13, 2016


Abstract
Voiceprint recognition is a kind of temperament recognition: it judges temperament through analysis of the voiceprint. Traditional judgment methods usually recognize the voiceprint directly, ignoring the influence of environmental noise, sound variation and other factors, and therefore suffer from low accuracy; this paper instead studies a method of temperament accuracy judgment and intelligent processing based on voiceprint recognition. The process extracts the main voiceprint features, pretreats the voiceprint to obtain feature parameters, classifies the voiceprint according to these parameters, obtains the temperament feature value through the Mel frequency cepstrum coefficient (MFCC) method, builds a vector quantization model, extracts the sound signal features with a hypersphere extremum method, and classifies them with the LBG algorithm, so that the temperament is judged accurately. The experimental results show that the improved algorithm effectively reduces the impact of environmental noise and sound variation and improves the accuracy of temperament judgment.
Keywords: Voiceprint recognition, temperament accuracy, LabVIEW graphical programming
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
Voiceprint recognition technology has been used in many fields, such as the military and communication [1]. It is the technology that automatically and accurately identifies temperament through the analysis of the sound signal and the extraction of temperament feature parameters [2]. Judging a sound signal is very easy for humans, but not for computers [3]. At present, no algorithm has been found that fully characterizes the temperament model, let alone judges it accurately [4]. Finding a good method of temperament signal classification, so that the computer can process sound information like the human brain while improving the accuracy of judgment and minimizing the error of the judgment results, is therefore a current research hotspot [5].
Reference [6] proposed a method based on variance analysis to judge sound accuracy: it classifies the prosodic features by analysis of variance to realize the judgment. Although this method takes little time, it may have a large error. Reference [7] proposed a method based on MFCC voiceprint recognition. By dividing the task into two parts, confirmation of temperament and identification of temperament, this method realizes accurate temperament judgment; it has higher accuracy, but it is time-consuming. Reference [8] proposed a method that judges the voiceprint through an adaptive system; it studies the voiceprint using pattern matching and the variance analysis of probability statistics, but with low efficiency. The method put forward by references [9] and [10] mainly applies the F-ratio formula to measure the validity of the feature parameters; based on the law of time variation of the fundamental frequency, it analyzes the principal components of the fundamental frequency parameters by statistical techniques. Although this method can judge temperament accuracy, it has a large error [11, 12].

To address the problems mentioned above, we put forward a method of temperament accuracy judgment and intelligent processing based on voiceprint recognition. A voiceprint recognition system is set up to judge temperament accuracy by optimizing the Mel frequency cepstrum coefficient, a parameter used to describe temperament recognition features, and the recognition model, the VQ algorithm. The improved MFCC algorithm, the VQ vector quantization model and the Gaussian mixture model are also used to analyze the experimental results. The results show that the system effectively overcomes the influence of environmental noise and the variation of temperament, and is highly practical.
2. Study on the Method of Temperament Accuracy Judgment and Processing Based on
Voiceprint Recognition
To achieve accurate judgment based on voiceprint recognition, the method proposed in this paper mainly uses the LabVIEW data-flow programming technique to design the voiceprint recognition modules of the voiceprint recognition system. The specific process extracts the main voiceprint features, pretreats the voiceprint to obtain feature parameters, classifies the voiceprint according to these parameters, obtains the temperament feature value through the Mel frequency cepstrum coefficient method, builds the vector quantization model, extracts the sound signal features with the hypersphere extremum method, and classifies them with the LBG algorithm, thereby judging the temperament accurately.
2.1. Designing Voiceprint Recognition Modules in the Voiceprint Recognition System
Figure 1 shows the procedure for designing the voiceprint recognition modules of the voiceprint recognition system with the LabVIEW data-flow programming technique and realizing the recognition of voice. At present, voice signal acquisition and processing systems are developed with VC++, MATLAB and LabVIEW. Among them, LabVIEW, with its modularized programming, is more flexible than traditional programming because it creates application programs from icon code instead of a textual programming language. MATLAB is a cross-platform environment for scientific computing; with strong numerical analysis, matrix computation, signal processing, image and sound processing and system identification functions, it combines a software design platform with instruments for data acquisition, analysis, processing and presentation of results.
LabVIEW programming includes the design of the front panel and the block diagram. The front panel is the equivalent of a traditional instrument's front panel, mainly used for instrument control and signal display; the block diagram implements the voiceprint recognition programs. The system front panel is shown in Figure 2 and the voiceprint recognition block diagram in Figure 3.

Figure 1. Realization of Voiceprint Recognition Modules

Figure 2. UI Event Handler



Figure 3. Voiceprint Recognition Program Block Diagram
The voiceprint recognition module of the system mainly includes two parts: feature extraction and pattern recognition. Once the front panel and the voiceprint recognition block diagram are determined, the voiceprint recognition module is designed around feature extraction and pattern recognition, as shown in Figure 4.

Figure 4. Voiceprint Recognition Module
2.2. Extraction and Classification of Voiceprint Features and Characteristic Parameters
Signals that would make the sound unstable are divided into frames, and each frame is then processed as a stationary signal. The voiceprint features are obtained through pretreatment of the input voiceprint using the Fourier transform. Assuming that $x$ is the original voice and $N$ is the length of the voiceprint, the voiceprint features are given by:

$$Y_k = f\left( \sum_{n=1}^{N} x_n e^{-j 2 \pi k n / N} \right), \quad k = 1, 2, \dots, N \qquad (1)$$

In this formula, $N$ represents the number of signal samples and $f(\cdot)$ represents the activation function.
Voiceprint parameters are obtained by band-pass filtering of the obtained voiceprint features in the filter bank. Assuming that $j$ indexes the voiceprint component that needs filtering and $H_j(k)$ denotes the $j$-th band-pass filter, the voiceprint parameter calculation formula can be expressed as follows:

$$C_j = \sum_{k=1}^{N} H_j(k) \, |Y_k|^2 \qquad (2)$$
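As a concrete illustration of the pretreatment behind Eqs. (1)-(2), the sketch below frames the signal, applies the Fourier transform per frame, and accumulates spectral energy through a bank of triangular band-pass filters. The frame length, hop size, filter count and function names are illustrative assumptions, not values from the paper:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (quasi-stationary pieces)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def triangular_filter_bank(n_filters, n_fft):
    """Triangular band-pass filters spaced evenly over the FFT bins."""
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        bank[j, lo:mid] = np.linspace(0.0, 1.0, mid - lo, endpoint=False)
        bank[j, mid:hi] = np.linspace(1.0, 0.0, hi - mid, endpoint=False)
    return bank

def voiceprint_parameters(x, n_filters=24, frame_len=256):
    """Per-frame filter-bank energies C_j, as in Eq. (2)."""
    frames = frame_signal(x, frame_len) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2  # |Y_k|^2 from Eq. (1)
    return power @ triangular_filter_bank(n_filters, frame_len).T

# Example on a synthetic one-second signal at 12 kHz (the paper's rate).
params = voiceprint_parameters(np.sin(2 * np.pi * 440 * np.arange(12000) / 12000))
print(params.shape)  # (n_frames, 24)
```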
Voiceprints are classified according to the obtained parameters. The formula below achieves the classification of the voiceprint:

$$z = \operatorname{sgn}\left( \sum_{j=1}^{J} w_j C_j - \theta \right) \qquad (3)$$

In this formula, $w_j$ represents the weight and $\theta$ represents the threshold.
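A toy numeric reading of Eq. (3): the weighted sum of the filter-bank parameters is compared against a threshold. The weights and threshold below are made-up values standing in for trained ones:

```python
import numpy as np

def classify_voiceprint(params, weights, theta):
    """Eq. (3): sign of the weighted parameter sum minus the threshold."""
    return 1 if float(np.dot(weights, params)) - theta >= 0 else -1

# Assumed weights/threshold; a real system would learn them from data.
c = np.array([0.4, 1.2, 0.7])
print(classify_voiceprint(c, weights=np.array([0.5, 0.3, 0.2]), theta=0.5))  # 1
```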



2.3. Extraction of Temperament Signal Features
After the classification of features, further processing is carried out to obtain the characteristic value. The temperament signal features are obtained by establishing a sound vector quantization model through the initial codebook selection method. The specific steps are as follows.

Suppose $X = \{x_1, x_2, \dots, x_N\}$ is a set of features, and randomly select a feature from $X$ as the first feature value $y_1$. The sound feature value is obtained after training the selected feature:

$$y_1 = x_r, \quad r \in \{1, 2, \dots, N\} \qquad (4)$$

Next, the sound vector quantization model is established. Once the characteristic value is determined, the sound vector quantization model is set up through the initial codebook selection method:

$$Q(x) = \arg \min_{y_m \in Y} d(x, y_m), \quad Y = \{y_1, y_2, \dots, y_M\} \qquad (5)$$
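The sketch below mirrors Eqs. (4)-(5): initial codewords are drawn at random from the feature set, and the quantization model maps any feature vector to its nearest codeword. The codebook size and the Euclidean distance are assumptions; the paper does not fix them:

```python
import numpy as np

def initial_codebook(features, m, seed=0):
    """Eq. (4): randomly select m feature vectors from X as initial codewords."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=m, replace=False)
    return features[idx].copy()

def quantize(x, codebook):
    """Eq. (5): index of the nearest codeword under Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

# Example: 100 random 24-dimensional features, an 8-codeword model.
X = np.random.default_rng(1).normal(size=(100, 24))
Y = initial_codebook(X, 8)
print(quantize(X[0], Y))
```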

Based on the sound vector quantization model, the sound signal features are extracted through the hypersphere extremum method. Assuming that the number of signals in the feature set is $N$, $\lambda_n$ are the pre-weighting coefficients, and $e_n$ is the network prediction error, the formula is:

$$p = \frac{1}{N} \sum_{n=1}^{N} \lambda_n e_n \qquad (6)$$

In this formula, $p$ represents the prediction rate of the feature.
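Numerically, the reconstructed Eq. (6) is just a weighted average; both the form of the formula and the sample values below are assumptions, since the original expression is not fully recoverable:

```python
import numpy as np

lam = np.array([0.2, 0.5, 0.3])   # pre-weighting coefficients (assumed)
e = np.array([0.01, 0.03, 0.02])  # network prediction errors (assumed)
p = np.mean(lam * e)              # prediction rate p of Eq. (6)
print(p)                          # ~0.00767
```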

2.4. Judgment and Intelligent Processing of Temperament Accuracy
Based on the improved LBG algorithm, the sound signal features are classified. Suppose the sound feature vector sequence is $X = \{X_1, X_2, \dots, X_M\}$; quantizing this sequence with each template codebook realizes the intelligent classification of the sound signal features:

$$D_i = \frac{1}{M} \sum_{n=1}^{M} \min_{1 \le l \le L} d(X_n, Y_l^i) \qquad (7)$$

In the formula above, $Y_l^i$ represents the $l$-th codebook vector in the $i$-th codebook, with $l = 1, 2, \dots, L$ ($L$ being the number of codebook vectors in every codebook) and $n = 1, 2, \dots, M$ ($M$ being the number of sound feature vectors); $d(X_n, Y_l^i)$ represents the distance between $X_n$ and $Y_l^i$.
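A minimal sketch of the Eq. (7) decision rule follows: the average quantization distortion $D_i$ of the feature sequence is computed against each template codebook, and the codebook with the smallest distortion gives the class. LBG codebook training itself (splitting and refinement) is assumed to have been run beforehand and is not shown:

```python
import numpy as np

def avg_distortion(features, codebook):
    """D_i of Eq. (7): mean over frames of the nearest-codeword distance."""
    # Pairwise distances between M feature vectors and L codewords.
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def classify_sequence(features, codebooks):
    """Pick the template codebook with the minimum average distortion."""
    return int(np.argmin([avg_distortion(features, cb) for cb in codebooks]))

# Example with two assumed template codebooks.
rng = np.random.default_rng(2)
books = [rng.normal(size=(8, 24)), rng.normal(loc=3.0, size=(8, 24))]
frames = rng.normal(loc=3.0, size=(50, 24))  # lies near the second template
print(classify_sequence(frames, books))      # 1
```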
On the basis of the classified sound signal features, accurate judgment and intelligent processing of the sound can be achieved:

$$h^2 = 4 \left[ \int \frac{\partial \log f}{\partial h} \, dx \right]^{-1} \qquad (8)$$

3. Results and Analysis
Extensive experiments to judge temperament accuracy were carried out with the improved MFCC algorithm, the VQ vector quantization model and the Gaussian mixture model in a laboratory environment. All sound data used in the experiments comes from a self-built, text-independent database and was collected in different noise environments through a PC sound card, at a 12 kHz sampling rate with 16-bit accuracy. Figure 5 shows the MATLAB statistics interface based on the improved MFCC and the vector quantization model, in which the twelfth sample cannot be judged accurately. Figure 6 shows the recognition rates of 30 temperament extractions from 15 experimental individuals over 30 days: the black solid line shows the recognition rate of the improved method, and the blue dotted line that of the traditional method. Table 1 shows the recognition rate per sample. Table 2 shows the total recognition rate of the 30 recordings of the 15 experimental individuals, with the Nth day's record as the sample.

Figure 5. Voiceprint Recognition Statistical Interface Based on MFCC and VQ

Figure 6. Recognition Rate of the Traditional Method and the Improved Method (per Day)
Table 1. Recognition Rate of the Traditional Method and the Improved Method (per Sample)

Sample No.   Traditional method (%)   Improved method (%)
 1           96.7                     100
 2           86.7                     100
 3           96.7                     96.7
 4           83.3                     93.3
 5           93.3                     100
 6           90                       100
 7           100                      96.7
 8           90                       100
 9           83.3                     96.7
10           93.3                     100
11           96.7                     96.7
12           90                       93.3
13           93.3                     93.3
14           90                       96.7
15           90                       96.7

Table 2. Recognition Rate of Sound Samples from Different Days

Training day   Recognition rate (%)
 1             97.8
 2             97.3
 3             97.6
 4             97.1
 5             98.0
10             96.7
15             97.3
20             97.8
25             97.3
30             96.7


Tables 1 and 2 show that, in actual voiceprint recognition, different parameters and different models affect the recognition results, with many factors constraining each other. Averaged over the 15 samples in Table 1, the improved method reaches about 97.3% recognition against about 91.6% for the traditional method. The experimental results show that this method overcomes the drawbacks of the traditional stochastic and splitting methods, and that the improved method has a higher rate of accuracy.
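As a quick arithmetic check of Table 1, the sketch below averages the per-sample recognition rates of the two methods (values transcribed from the table):

```python
import numpy as np

traditional = [96.7, 86.7, 96.7, 83.3, 93.3, 90, 100, 90,
               83.3, 93.3, 96.7, 90, 93.3, 90, 90]
improved = [100, 100, 96.7, 93.3, 100, 100, 96.7, 100,
            96.7, 100, 96.7, 93.3, 93.3, 96.7, 96.7]
print(round(np.mean(traditional), 1), round(np.mean(improved), 1))  # 91.6 97.3
```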
4. Conclusion
To achieve accurate judgment based on voiceprint recognition, the method proposed in this paper mainly uses the LabVIEW data-flow programming technique to design the voiceprint recognition modules of the voiceprint recognition system. The process extracts the main voiceprint features, pretreats the voiceprint to obtain feature parameters, classifies the voiceprint according to these parameters, obtains the temperament feature value through the Mel frequency cepstrum coefficient method, builds the vector quantization model, extracts the sound signal features with the hypersphere extremum method, and classifies them with the LBG algorithm, thereby judging the temperament accurately. With the 30 recordings of each of the 15 experimental individuals as samples, experiments were carried out with both the traditional method and the improved algorithm. The results show that the improved algorithm is superior to the traditional method in temperament recognition accuracy.
Conflict of interest
The authors confirm that the content of this article involves no conflict of interest.
References
[1] Andi Muhammad Ilyas, M Natsir Rahman. Economic dispatch thermal generator using modified improved particle swarm optimization. TELKOMNIKA Telecommunication Computing Electronics and Control. 2012; 10(3): 459-470.
[2] Zhang Minmin, Ma Jun, Gong Chenxiao, Chen Liangliang, Zheng Qianqian. Design of voiceprint recognition system software based on MATLAB. The Technology Vision. 2013; 22(22): 7.
[3] Li Yuding. Discussion on the Mel algorithm of series MFCC in speech signal feature extraction. Journal of Advanced Continuing Education. 2012; 4(4): 78-80.
[4] Ma Hairui, Han Yundong, Yuan Qunzhe, Hu Hongcan. Analysis of parallel mechanism of multi-core technology and graphical programming. Computer Programming Skills and Maintenance. 2012; (18): 12-13.
[5] Piltan F, Mansoorzadeh M, Zare S, et al. Artificial tune of fuel ratio: Design a novel SISO fuzzy backstepping adaptive variable structure control. International Journal of Electrical and Computer Engineering. 2013; 3(2): 171.
[6] Wu Di, Cao Jie, Wang Jinhua. Speaker recognition based on adaptive Gauss mixture model and static and dynamic auditory feature fusion. Optical Precision Engineering. 2013; 21(6): 1598-1604.
[7] Jiang Linqiong, He Jianbiao. An interference elimination algorithm of voiceprint, Mel frequency cepstrum coefficient. Computer Simulation. 2013; 30(4): 382-385.
[8] Ali ZS. Mobile phone and Pakistani youth: A gender perspective. Journal of Telematics and Informatics. 2013; 1(2): 59-68.
[9] Yang Shengyue, He Zhengming, Zhou Yanyu, Long Hui. An audio-based fault diagnosis method using MFCC and LPCC features for rolling bearings. Micro Computer Information. 2014; 25(31): 123-124.
[10] T Sutikno, AZ Jidin, A Jidin, NRN Idris. Simplified VHDL coding of modified non-restoring square root calculator. International Journal of Reconfigurable and Embedded Systems. 2012; 1(1): 37-42.
[11] Shen Qia, Weng Zhiwen, Zhu Peiqi. Study on dynamic rhythm basis. Journal of Central Conservatory of Music. 2005; 1(1): 8-13.
[12] Yu Jianchao, Zhang Ruilin. Speaker recognition based on MFCC and LPCC. Computer Engineering and Design. 2009; 30(5): 1189-1191.
