Selected Notions of Probability Theory
B.1 Selected Notions of Probability Theory
The notion of a σ-algebra generated by a family of sets is introduced in Appendix I. In order to introduce the notions of probability theory used in our considerations, we first present a definition of a special σ-algebra generated by the family of open sets.⁵
Definition B.1 Let X be a topological space. The σ-algebra generated by the family of open sets of the space X is called the Borel σ-algebra. Any element of the Borel σ-algebra is called a Borel set. The family of all Borel sets on X is denoted by $\mathcal{B}(X)$.
After defining a Borel set, we can introduce the following notions: probability distribution, random vector (random variable), and distribution of random vector (distribution of random variable).
In Appendix I a probability space $(\Omega, \mathcal{F}, P)$ is introduced.⁴ Here we are interested in the case $\Omega = \mathbb{R}^n$ ($\mathbb{R}$ denotes the set of real numbers), and consequently in the σ-algebra $\mathcal{F}$ being the family of Borel sets $\mathcal{B}(\mathbb{R}^n)$.
4 Appendix I contains basic notions of probability theory that are used for probabilistic reasoning in intelligent systems.
5 Open set and topological space are defined in Appendix G.1.
© Springer International Publishing Switzerland 2016 M. Flasiński, Introduction to Artificial Intelligence, DOI 10.1007/978-3-319-40022-8
Appendix B: Formal Models for Artificial Intelligence Methods …
Let us introduce a definition of probability distribution.
Definition B.2 A probability measure P such that the triple $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), P)$ is a probability space is called an n-dimensional probability distribution.
Definition B.3 An n-dimensional probability distribution P is called a discrete distribution iff there exists a Borel set $S \subset \mathbb{R}^n$ such that:
$$P(S) = 1 \quad \text{and} \quad s \in S \Rightarrow P(\{s\}) > 0 .$$
If $S = \{s_i : i = 1, \ldots, m\}$, where $m \in \mathbb{N}$ (i.e. m is a natural number) or $m = \infty$, and $P(\{s_i\}) = p_i$, then for each Borel set $A \subset \mathbb{R}^n$ the following formula holds:
$$P(A) = \sum_{i \,:\, s_i \in A} P(\{s_i\}) = \sum_{i \,:\, s_i \in A} p_i ,$$
and in addition:
• $p_i > 0$, for each $i = 1, \ldots, m$,
• $\sum_{i=1}^{m} p_i = 1$.
Definition B.4 An n-dimensional probability distribution P is called a continuous
distribution iff there exists an integrable function $f : \mathbb{R}^n \longrightarrow \mathbb{R}$ such that for each Borel set $A \subset \mathbb{R}^n$ the following formula holds:
$$P(A) = \int_A f(x)\,dx ,$$
where $\int_A f(x)\,dx$ denotes a multiple integral of the function f over A. The function f is called a probability density function, and in addition:
• $f(x) \geq 0$, for each $x \in \mathbb{R}^n$,
• $\int_{\mathbb{R}^n} f(x)\,dx = 1$.
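The two formulas for P(A) in Definitions B.3 and B.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `discrete_prob` and `continuous_prob`, the fair-die and uniform-density examples, and the midpoint-rule approximation of the integral (one-dimensional case only) are not taken from the source.

```python
from fractions import Fraction


def discrete_prob(atoms, event):
    """P(A) for a discrete distribution: the sum of p_i over atoms s_i in A.

    `atoms` maps each point s_i to its probability p_i (hypothetical layout);
    `event` is a predicate that says whether a point belongs to A.
    """
    if any(p <= 0 for p in atoms.values()):
        raise ValueError("every p_i must be positive")
    if sum(atoms.values()) != 1:
        raise ValueError("the p_i must sum to 1")
    return sum((p for s, p in atoms.items() if event(s)), Fraction(0))


def continuous_prob(density, a, b, steps=100_000):
    """Approximate P([a, b]) as the integral of `density` (midpoint rule)."""
    h = (b - a) / steps
    return sum(density(a + (k + 0.5) * h) for k in range(steps)) * h


# Discrete example (assumed): a fair six-sided die, A = even outcomes.
die = {s: Fraction(1, 6) for s in range(1, 7)}
p_even = discrete_prob(die, lambda s: s % 2 == 0)


# Continuous example (assumed): uniform density on [0, 2], A = [0.5, 1.5].
def uniform_density(x):
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


p_interval = continuous_prob(uniform_density, 0.5, 1.5)
```

Exact fractions are used in the discrete case so that the condition that the p_i sum to 1 can be checked without rounding error; the continuous case is necessarily approximate.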
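Definition B.5 Let $(\Omega, \mathcal{F}, P)$ be a probability space. A function $X : \Omega \longrightarrow \mathbb{R}^n$ is called a random vector iff:
$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$$
for each Borel set $B \in \mathcal{B}(\mathbb{R}^n)$.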
A one-dimensional random vector is called a random variable.
Definition B.6 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X : \Omega \longrightarrow \mathbb{R}^n$ be a random vector. A distribution $P_X$ defined by the formula:
$$P_X(B) = P(X^{-1}(B)),$$
for $B \in \mathcal{B}(\mathbb{R}^n)$, is called a distribution of the vector X.
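The formula $P_X(B) = P(X^{-1}(B))$ can be illustrated on a finite sample space, where every subset is measurable and the preimage can be listed explicitly. The sketch below is an assumption-laden example, not part of the source: the sample space of two fair dice, the random variable `X` (the sum of the two outcomes), and the helper names `preimage` and `P_X` are all chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space (assumed): ordered outcomes of two fair dice, each with P = 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]


def preimage(B):
    """X^{-1}(B): all outcomes w whose value X(w) lies in the set B."""
    return [w for w in omega if X(w) in B]


def P_X(B):
    """Distribution of X: P_X(B) = P(X^{-1}(B))."""
    return sum((P[w] for w in preimage(B)), Fraction(0))


p_seven = P_X({7})          # six outcomes sum to 7
p_extremes = P_X({2, 12})   # only (1, 1) and (6, 6)
```

Here `P_X` is a probability distribution on the values of X, while `P` lives on the sample space; the preimage is what links the two.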
A distribution of a random variable is defined in an analogous way (we have $\mathbb{R}$ instead of $\mathbb{R}^n$). Now, we introduce definitions of basic parameters of random variable distributions: expected value, variance, and standard deviation.
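Definition B.7 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X : \Omega \longrightarrow \mathbb{R}$ be a random variable. An expected value of X is computed as:
$$m = E(X) = \sum_{i=1}^{m} x_i p_i ,$$
if X is a random variable of a discrete distribution: $P(X = x_i) = p_i$, $i = 1, \ldots, m$, $m \in \mathbb{N}$ or $m = \infty$ (the upper summation limit m is the number of values of X, not the expected value), or it is computed as:
$$m = E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx ,$$
if X is a random variable of a continuous distribution with a probability density function f.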
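Both forms of the expected value can be checked numerically. The following is a minimal sketch under assumptions that are not in the source: the helper names, the fair-die example, the density $f(x) = 2x$ on $[0, 1]$ (zero elsewhere), and the midpoint rule used in place of the exact integral.

```python
from fractions import Fraction


def expected_discrete(values, probs):
    """E(X) = sum of x_i * p_i for a discrete distribution."""
    return sum((x * p for x, p in zip(values, probs)), Fraction(0))


def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h


def expected_continuous(density, a, b):
    """E(X) = integral of x * f(x); [a, b] must contain the support of f."""
    return integrate(lambda x: x * density(x), a, b)


# Discrete example (assumed): a fair die with values 1..6.
mean_die = expected_discrete(range(1, 7), [Fraction(1, 6)] * 6)


# Continuous example (assumed): f(x) = 2x on [0, 1], whose exact mean is 2/3.
def linear_density(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


mean_linear = expected_continuous(linear_density, 0.0, 1.0)
```

Restricting the integral to a finite interval is valid only because the assumed density vanishes outside it; a density with unbounded support would need a truncation argument or a different quadrature.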
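Definition B.8 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X : \Omega \longrightarrow \mathbb{R}$ be a random variable with a finite expected value $m = E(X)$. A variance of X is computed as:
$$\sigma^2 = D^2(X) = E((X - m)^2) = \sum_{i=1}^{m} (x_i - m)^2 p_i ,$$
if X is a random variable of a discrete distribution: $P(X = x_i) = p_i$, $i = 1, \ldots, m$, $m \in \mathbb{N}$ or $m = \infty$ (as an index bound, m is the number of values of X; inside the parentheses it is the expected value), or it is computed as:
$$\sigma^2 = D^2(X) = E((X - m)^2) = \int_{-\infty}^{+\infty} (x - m)^2 f(x)\,dx ,$$
if X is a random variable of a continuous distribution with a probability density function f. A standard deviation of a random variable X is computed as:
$$\sigma = \sqrt{D^2(X)} .$$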
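The variance and standard deviation formulas can be sketched in the same way. The example below rests on assumptions not found in the source: the helper names, the fair-die example, the density $f(x) = 2x$ on $[0, 1]$, and the midpoint-rule approximation of the integrals.

```python
import math
from fractions import Fraction


def variance_discrete(values, probs):
    """D^2(X) = sum of (x_i - mean)^2 * p_i for a discrete distribution."""
    mean = sum((x * p for x, p in zip(values, probs)), Fraction(0))
    return sum(((x - mean) ** 2 * p for x, p in zip(values, probs)), Fraction(0))


def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h


def variance_continuous(density, a, b):
    """D^2(X) = integral of (x - mean)^2 * f(x); [a, b] contains the support."""
    mean = integrate(lambda x: x * density(x), a, b)
    return integrate(lambda x: (x - mean) ** 2 * density(x), a, b)


# Discrete example (assumed): a fair die, exact variance 35/12.
die_values = list(range(1, 7))
die_probs = [Fraction(1, 6)] * 6
var_die = variance_discrete(die_values, die_probs)
std_die = math.sqrt(var_die)  # standard deviation as a float


# Continuous example (assumed): f(x) = 2x on [0, 1], exact variance 1/18.
def linear_density(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


var_linear = variance_continuous(linear_density, 0.0, 1.0)
```

The standard deviation is taken as the square root of the variance at the very end, so the exact fraction is preserved for as long as possible.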
At the end of this section we introduce the notion of the normal distribution, also called the Gaussian distribution.
Definition B.9 A distribution P is called the normal (Gaussian) distribution iff there exist numbers $m, \sigma \in \mathbb{R}$, $\sigma > 0$, such that the function $f : \mathbb{R} \longrightarrow \mathbb{R}$, given by the formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - m)^2}{2\sigma^2}\right),$$
is a probability density function of the distribution P. The normal distribution with parameters m (an expected value) and σ (a standard deviation) is denoted by $N(m, \sigma)$.
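As a quick numerical check of Definition B.9, the sketch below evaluates the normal density and confirms that it integrates to approximately 1, with mean m and standard deviation σ. The function names, the parameter values m = 1 and σ = 2, the truncation of the integral to [m − 10σ, m + 10σ], and the midpoint rule are assumptions made for illustration.

```python
import math


def normal_pdf(x, m, sigma):
    """Density of N(m, sigma): exp(-(x - m)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(g, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h


# Assumed parameters; the tails beyond 10 sigma are negligible at this precision.
m, sigma = 1.0, 2.0
lo, hi = m - 10.0 * sigma, m + 10.0 * sigma

total = integrate(lambda x: normal_pdf(x, m, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, m, sigma), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, m, sigma), lo, hi)
std = math.sqrt(var)
```

Under these assumptions `total`, `mean`, and `std` should come out close to 1, m, and σ respectively, which matches the roles the definition assigns to the two parameters.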