An Introduction to Probabilistic Neural Networks

An Introduction to
Probabilistic Neural
Networks
Vincent Cheung
Kevin Cannons
Signal & Data Compression Laboratory
Electrical & Computer Engineering
University of Manitoba
Winnipeg, Manitoba, Canada
Advisor: Dr. W. Kinsner

June 10, 2002

Outline
● Introduction
● Classifier Example
● Theory and Architecture
● Training
● Program Implementations
● Conclusion

Page 1 of 13

Intro

Example

Theory

Training

Programs

What is a PNN?
● A probabilistic neural network (PNN) is
predominantly a classifier
► Map any input pattern to a number of
classifications
► Can be forced into a more general function
approximator

● A PNN is an implementation of a
statistical algorithm called kernel
discriminant analysis in which the
operations are organized into a
multilayered feedforward network with
four layers:
► Input layer
► Pattern layer
► Summation layer
► Output layer
Page 2 of 13

Intro

Example

Theory

Training

Programs

Advantages and
Disadvantages
● Advantages

► Fast training process

■ Orders of magnitude faster than
backpropagation

► An inherently parallel structure
► Guaranteed to converge to an optimal
classifier as the size of the representative
training set increases
■ No local minima issues

► Training samples can be added or removed
without extensive retraining

● Disadvantages

► Not as general as backpropagation
► Large memory requirements
► Slow execution of the network
► Requires a representative training set

■ Even more so than other types of NN’s
Page 3 of 13

Intro

Example

Theory

Training

Programs

Classification
Theory
● If the probability density function (pdf) of
each of the populations is known, then an
unknown, X, belongs to class “i” if:
fi(X) > fj(X), all j ≠ i

fk is the pdf for class k

● Other parameters may be included
► Prior probability (h)

■ Probability of an unknown sample being drawn
from a particular population

► Misclassification cost (c)

■ Cost of incorrectly classifying an unknown

► Classification decision becomes:

hicifi(X) > hjcjfj(X), all j ≠ i
(Bayes optimal decision rule)

Page 4 of 13

Intro

Example

Theory

Training

Programs

PDF Estimation
● Estimate the pdf by using the samples of
the populations (the training set)
● PDF for a single sample (in a population):

1  x − xk 
W

σ  σ 

x = unknown (input)
xk = “kth” sample
W = weighting function
σ = smoothing parameter

● PDF for a single population:
1 n  x − xk 
W

∑
nσ k =1  σ 

(average of the pdf’s
for the “n” samples in
the population)

(Parzen’s pdf estimator)

● The estimated pdf approaches the true
pdf as the training set size increases, as
long as the true pdf is smooth
Page 5 of 13

Intro

Example

Theory

Training

Programs

Weighting Function
● Provides a “sphere-of-influence”

► Large values for small distances between the
unknown and the training samples
► Rapidly decreases to zero as the distance
increases

● Commonly use Gaussian function
► Behaves well and easily computed
► Not related to any assumption about a normal
distribution

● The estimated pdf becomes:
g ( x) =

n

1
e
∑
nσ k =1

−

( x − xk ) 2

σ2

Page 6 of 13

Intro

Example

Theory

Training

Programs

Multivariate Inputs
● Input to the network is a vector

● PDF for a single sample (in a population):
−
1
e
p/2 p
(2π ) σ

X −Xk
2σ

2

X = unknown (input)
Xk = “kth” sample
σ = smoothing parameter
p = length of vector

2

● PDF for a single population:
gi ( X ) =

ni

1
(2π ) p / 2 σ p ni

∑e

−

X − X ik

k =1

● Classification criteria:
gi(X) > gj(X), all j ≠ i
2
X − X ik
ni
−
1
σ2
∴ gi ( X ) = ∑ e
ni k =1
Page 7 of 13

2σ

2

2

(average of the pdf’s
for the “ni” samples in
the “ith”population)

(eliminate common factors
and absorb the “2” into σ)

Intro

Example

Theory

Training

Programs

Training Set
● The training set must be thoroughly
representative of the actual population for
effective classification
► More demanding than most NN’s
► Sparse set sufficient
► Erroneous samples and outliers tolerable

● Adding and removing training samples
simply involves adding or removing
“neurons” in the pattern layer
► Minimal retraining required, if at all

● As the training set increases in size, the
PNN asymptotically converges to the
Bayes optimal classifier
► The estimated pdf approaches the true pdf,
assuming the true pdf is smooth
Page 8 of 13

Intro

Example

Theory

Training

Programs

Training
● The training process of a PNN is
essentially the act of determining the
value of the smoothing parameter, sigma
► Training is fast

Correct Classifications

■ Orders of magnitude faster than
backpropagation
Optimum

Matched Filter

Nearest Neighbour
Sigma (σ)

● Determining Sigma

► Educated guess based on knowledge of the
data
► Estimate a value using a heuristic technique
Page 9 of 13

Intro

Example

Theory

Training

Programs

Estimating Sigma
Using Jackknifing
● Systematic testing of values for sigma
over some range
► Bounding the optimal value to some interval
► Shrinking the interval

● Jackknifing is used to grade the
performance of each “test” sigma
► Exclude a single sample from the training set
► Test if the PNN correctly classifies the
excluded sample
► Iterate the exclusion and testing process for
each sample in the training set
■ The number of correct classifications over the
entire process is a measure of the performance
for that value of sigma

► Not unbiased measure of performance

■ Training and testing sets not independent
■ Gives a ball park estimate of quality of sigma
Page 10 of 13

Intro

Example

Theory

Training

Programs

Implementations
● Current Work
► Basic PNN coded in Java
■ Simple examples
Boy/Girl classifier (same as perceptron)
Classification of points in R2 or R3 into the
quadrants

► Multithreaded PNN
■ For parallel processing (on supercomputers)
■ One thread per class

● Future Work
► Artificially create a time series of a chaotic
system and use a PNN to classify its features
in order to reconstruct the strange attractor
■ Further test the classification abilities of PNN
■ Test the PNN’s tolerance to noisy inputs

Page 11 of 13

Conclusion
● PNN’s should be used if
► A near optimal classifier with a short training
time is desired
► Slow execution speed and large memory
requirements can be tolerated

● No extensive testing on our
implementation of PNN’s have been done
► Once chaotic time series have been obtained,
we will have more challenging data to work
with

Page 12 of 13

References
[Mast93] T. Masters, Practial Neural Network Recipes in C++, Toronto, ON: Academic
Press, Inc., 1993.

[Specht88] D.F. Specht, “Probabilistic Neural Networks for Classification, Mapping, or
Associative Memory”, IEEE International Conference on Neural Networks, vol. I, pp.
525-532, July 1998.

[Specht92] D.F. Specht, “Enhancements to Probabilistic Neural Networks”, International
Joint Conference on Neural Networks, vol. I, pp. 761-768, June 1992.

[Wass93] P. D. Wasserman, Advanced Methods in Neural Computing, New York, NY:
Van Nostrand Reinhold, 1993.

[Zak98] Anthony Zaknich, Artificial Neural Networks: An Introductory Course. [Online].
http://www.maths.uwa.edu.au/~rkealley/ann_all/ann_all.html (as of June 6, 2002).

Page 13 of 13

Simple Classifier
Example
● Idea behind classification using a PNN
● Three classes or populations
► X, O, and

● The “?” is an unknown sample and must
be classified into one of the populations
● Nearest neighbour algorithm would
classify the “?” as a because a
sample is the closest sample to the “?”
► In other words, with nearest neighbour, the
unknown belongs to the same population as
the closest sample

Page 14 of 13

Simple Classifier
Example
● A more effective classifier would also
take the other samples into consideration
in making its decision
● However, not all samples should
contribute to the classification of a
particular unknown the same amount
► Samples close to the unknown should have a
large contribution (increase the probability of
classifying the unknown as that population)
► Samples far from the unknown should have a
small contribution (decrease the probability of
classifying the unknown as that population)
► A “sphere-of-influence”

Page 15 of 13

Simple Classifier
Example
● What the more effective classifier would
then do is, for each population, calculate
the average of all the contributions made
by the samples in that population
● The unknown sample is then classified as
being a member of the population which
has the largest average

Page 16 of 13

Architecture

Input
Layer

Pattern
Layer

Summation
Layer

Output
Layer

Architecture
1
gi ( X ) =
ni

ni

∑e
k =1

−

X − X ik

σ

2

−

2

X11

y11 =

e

X − X 11

2

σ2

1
X12

−

y12 =
X21

e

X − X 12

2

σ2

y21

X

2
X22

y11 + y12
g1(X) =
2

g2(X)

y22
g3(X)

X31
X32
X33
Input
Layer

y31
y32

3

Output =
Class of
Max(g1, g2, g3)

y33

Pattern
Layer
(Training Set)

Summation
Layer

Output
Layer

An Introduction to Probabilistic Neural Networks

Dokumen yang terkait

An analysis of learners' needs and learning needs for ESP course at hotel and tourism program of business training center (BTC) Jember in the 2001/2002 academic year

EVALUASI PENGELOLAAN LIMBAH PADAT MELALUI ANALISIS SWOT (Studi Pengelolaan Limbah Padat Di Kabupaten Jember) An Evaluation on Management of Solid Waste, Based on the Results of SWOT analysis ( A Study on the Management of Solid Waste at Jember Regency)

An analysis of illocutionary act in prince of persia: the sand of time movie

An analysis of moral values through the rewards and punishments on the script of The chronicles of Narnia : The Lion, the witch, and the wardrobe

An Analysis of illocutionary acts in Sherlock Holmes movie

An Aalysis grmmatical and lexical cohesion on the journalistic of VoANEWS.COM

An Identity Crisis In Hanrahan's Lost Girls And Love Hotels

An analysis on the content validity of english summative test items at the even semester of the second grade of Junior High School

The Effectiveness of Computer-Assisted Language Learning in Teaching Past Tense to the Tenth Grade Students of SMAN 5 Tangerang Selatan

Factors Related to Somatosensory Amplification of Patients with Epigas- tric Pain

Dukungan

Links

An Introduction to Probabilistic Neural Networks

Dokumen yang terkait

An analysis of learners' needs and learning needs for ESP course at hotel and tourism program of business training center (BTC) Jember in the 2001/2002 academic year

EVALUASI PENGELOLAAN LIMBAH PADAT MELALUI ANALISIS SWOT (Studi Pengelolaan Limbah Padat Di Kabupaten Jember) An Evaluation on Management of Solid Waste, Based on the Results of SWOT analysis ( A Study on the Management of Solid Waste at Jember Regency)

An analysis of illocutionary act in prince of persia: the sand of time movie

An analysis of moral values through the rewards and punishments on the script of The chronicles of Narnia : The Lion, the witch, and the wardrobe

An Analysis of illocutionary acts in Sherlock Holmes movie

An Aalysis grmmatical and lexical cohesion on the journalistic of VoANEWS.COM

An Identity Crisis In Hanrahan's Lost Girls And Love Hotels

An analysis on the content validity of english summative test items at the even semester of the second grade of Junior High School

The Effectiveness of Computer-Assisted Language Learning in Teaching Past Tense to the Tenth Grade Students of SMAN 5 Tangerang Selatan

Factors Related to Somatosensory Amplification of Patients with Epigas- tric Pain

Dokumen yang Anda mencari sudah siap untuk unduhkan