The Polytope of Phase-Type Distributions

2.7 The Polytope of Phase-Type Distributions

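Let $\mathrm{PH}(A)$ be the set of all PH distributions with representation $(\vec{\alpha}, A)$, where $\vec{\alpha}$ ranges over all sub-stochastic vectors of dimension $n$, i.e.,

$$\mathrm{PH}(A) = \{\, \mathrm{PH}(\vec{\alpha}, A) \mid \vec{\alpha} \in \mathbb{R}^n_{\geq 0},\ \vec{\alpha}\vec{e} \leq 1 \,\}.$$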

The point mass at zero ($\delta$) always resides in any $\mathrm{PH}(A)$, having representation $(\vec{0}, A)$. $\mathrm{PH}(A)$ can be regarded as a subset of a vector space of signed measures⁴ on $\mathbb{R}_{\geq 0}$ having rational LSTs [O’C90]. Using $\vec{e}_i$ to denote the $i$-th unit vector of $\mathbb{R}^n$, $\mathrm{PH}(A)$ is the convex hull of $\delta$ and $\mathrm{PH}(\vec{e}_i, A)$, for $1 \leq i \leq n$. Hence each of these $n + 1$ distributions is a vertex of $\mathrm{PH}(A)$. We refer to this convex hull as the polytope⁵ of the PH-generator $A$.

⁴ A signed measure is a measure that can take negative values.
⁵ See Appendix A.3 for a brief description of some required concepts from the field of convex analysis.
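The convex-hull description can be made concrete with a minimal sketch. The helper name `vertex_weights` and the sample vector are illustrative assumptions, not part of the text. For a sub-stochastic vector, the weight on $\delta$ is the defect $1 - \vec{\alpha}\vec{e}$ and the weight on $\mathrm{PH}(\vec{e}_i, A)$ is the $i$-th entry.

```python
from fractions import Fraction

def vertex_weights(alpha):
    """Convex weights of (delta, PH(e_1, A), ..., PH(e_n, A)) for a given alpha."""
    alpha = [Fraction(a) for a in alpha]
    if any(a < 0 for a in alpha) or sum(alpha) > 1:
        raise ValueError("alpha is not sub-stochastic")
    # The defect 1 - alpha*e is the probability of starting in the absorbing
    # state, so it is the weight placed on the point mass at zero.
    return [1 - sum(alpha)] + alpha

# Illustrative sub-stochastic vector (not taken from the text).
weights = vertex_weights([Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)])
print(weights)
```

The weights are nonnegative and sum to one by construction, which is what membership in the convex hull requires.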


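Example 2.33. Let $(\vec{\gamma}, G)$ be a PH representation, where $\vec{\gamma} = [\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}]$ and

$$G = \begin{bmatrix} -21 & 11 & 7 \\ 0 & -16 & 10 \\ 0 & 0 & -2 \end{bmatrix}.$$

We illustrate the polytope of PH-generator $G$ and several concepts related to it in Figure 2.8. Sub-figure (a) depicts the polytope in a 3-dimensional coordinate system.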

Figure 2.8: The Polytope $\mathrm{PH}(G)$ and Its Unit Components

Each point in the coordinate system represents an initial vector of PH-generator $G$, which need not be a valid initial probability vector. The origin corresponds to the point mass at zero ($\delta$) and the unit vectors on the axes $x$, $y$ and $z$ correspond to PH distributions $\mathrm{PH}(\vec{e}_1, G)$, $\mathrm{PH}(\vec{e}_2, G)$ and $\mathrm{PH}(\vec{e}_3, G)$, respectively. The convex hull of these four vectors forms a polyhedron $\vec{0}\vec{e}_1\vec{e}_2\vec{e}_3$. Every point inside the polyhedron represents a valid initial probability vector (namely a sub-stochastic vector) of PH-generator $G$, and it is a convex combination of vectors $\vec{0}$, $\vec{e}_1$, $\vec{e}_2$ and $\vec{e}_3$. The plane $\vec{e}_1\vec{e}_2\vec{e}_3$ is called the face of the polytope. (We can generalize this for the $n$-dimensional space: the hyperplane $\vec{e}_1\vec{e}_2\cdots\vec{e}_n$ is also called the face of the corresponding polytope.) Every point on this face represents a stochastic vector, hence corresponds to a PH representation that starts immediately in the absorbing state with probability 0. The PH distribution $\mathrm{PH}(\vec{\gamma}, G)$ is on this plane, represented by point $\vec{\gamma}$ in the sub-figure.

Sub-figures (b), (c) and (d) depict the PH representations of $\mathrm{PH}(\vec{e}_1, G)$, $\mathrm{PH}(\vec{e}_2, G)$ and $\mathrm{PH}(\vec{e}_3, G)$, respectively.

2.7.1 Residual-Life Operator

Intuitively, the residual-life operator $R_t$ is an operator whose application to a PH distribution amounts to running the underlying Markov process of the PH distribution for $t$ time units. At time $t$, a new PH distribution is produced, which reflects the current state of the underlying Markov process. The operator $R_t$, then, describes the trajectory that is traversed by the PH distribution in its polytope as $t$ changes.

Definition 2.34 ([O’C90]). Let $Z$ be the vector space of all Borel signed measures⁶. For all $\mu \in Z$, the residual-life operator $R_t$, $t \in \mathbb{R}_{\geq 0}$, is defined by

$$R_t\mu(\{0\}) = \mu([0, t]) \quad\text{and}\quad R_t\mu(E) = \mu(E + t),$$

where $E$ is a Borel subset of $(0, +\infty)$ and $E + t$ is the translation of $E$ by $t$ units to the right.

⁶ The σ-algebra generated by the open sets of $\mathbb{R}$ is called the Borel σ-algebra, denoted by $\mathcal{B}$. A measure defined on the Borel σ-algebra is called a Borel measure. A set $A$ is a Borel subset of $\Omega$ if $A \subseteq \Omega$ and $A \in \mathcal{B}$ [Rud87]. A signed measure on $\mathcal{B}$ is called a Borel signed measure.

The residual-life operator translates the mass associated with the given measure by $t$ units to the left on the real line. The mass that, because of the translation, would no longer lie to the right of the origin is accumulated at the point 0. Given a PH distribution $\mathrm{PH}(\vec{\alpha}, A)$, the application of the residual-life operator $R_t$ to the distribution is

$$R_t\,\mathrm{PH}(\vec{\alpha}, A) = \mathrm{PH}(\vec{\alpha}e^{At}, A).$$

For each sub-stochastic vector $\vec{\alpha}$ and $t \in \mathbb{R}_{\geq 0}$, the vector $\vec{\alpha}' := \vec{\alpha}e^{At}$ is sub-stochastic. Hence all possible initial probability distributions of a PH-generator $A$ and their trajectories towards absorption (namely to the point mass at zero $\delta$) reside inside the polytope of $A$. The dimension of the smallest affine subspace containing the whole trajectory starting at point $\vec{\alpha}$ is equal to the algebraic degree of $\mathrm{PH}(\vec{\alpha}, A)$ [O’C90].

Example 2.35. We refer back to Figure 2.8 in Example 2.33. As mentioned before, point $\vec{\gamma}$ in the figure represents PH distribution $\mathrm{PH}(\vec{\gamma}, G)$. The application of the residual-life operator $R_t$ forms the trajectory that is traversed by the distribution as $t$ advances. In the figure, we depict two points that correspond to $R_{0.01}\mathrm{PH}(\vec{\gamma}, G)$ and $R_{0.1}\mathrm{PH}(\vec{\gamma}, G)$, and whose coordinates are approximately $[0.405, 0.258, 0.302]$ and $[0.061, 0.138, 0.525]$, respectively. These two points also reflect the transient probabilities of the underlying CTMC at times $t = 0.01$ and $t = 0.1$, respectively. Previously, we mentioned that the dimension of the smallest affine subspace containing the whole trajectory starting at point $\vec{\gamma}$ is equal to the algebraic degree of $\mathrm{PH}(\vec{\gamma}, G)$. We can conclude that the algebraic degree of the PH distribution in our example is 3: the whole trajectory of $\mathrm{PH}(\vec{\gamma}, G)$ is contained in a 3-dimensional affine subspace, and this is the smallest such subspace, because the points $\vec{0}$, $\vec{\gamma}$, $R_{0.01}\mathrm{PH}(\vec{\gamma}, G)$ and $R_{0.1}\mathrm{PH}(\vec{\gamma}, G)$ do not reside in the same plane.

If point $\vec{\gamma}$ were on the plane $\vec{0}\vec{e}_1\vec{e}_2$, for instance, and its trajectory for all $t \in \mathbb{R}_{\geq 0}$ were confined to this plane, then we would be able to conclude that the algebraic degree of $\mathrm{PH}(\vec{\gamma}, G)$ was at most 2.
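The trajectory points of Example 2.35 can be recomputed with a small sketch. It assumes that the generator of Example 2.33 is $G = [[-21, 11, 7], [0, -16, 10], [0, 0, -2]]$, a choice made here because it is consistent with the LSTs of Example 2.37. The matrix exponential is a plain truncated Taylor series with scaling and squaring, not a production-quality routine, and all helper names are illustrative.

```python
# Assumed generator for Example 2.33 (see the lead-in); gamma is from the text.
G = [[-21.0, 11.0, 7.0],
     [0.0, -16.0, 10.0],
     [0.0, 0.0, -2.0]]
gamma = [0.5, 0.25, 0.25]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm(M, squarings=10, terms=20):
    """exp(M) by a truncated Taylor series with scaling and squaring."""
    n = len(M)
    scaled = [[x / 2 ** squarings for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[a + b for a, b in zip(r, t)] for r, t in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def residual_life(alpha, A, t):
    """Initial vector of R_t PH(alpha, A), i.e. alpha * exp(A t)."""
    E = expm([[a * t for a in row] for row in A])
    return mat_mul([alpha], E)[0]

def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix with rows r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))

p_001 = residual_life(gamma, G, 0.01)
p_01 = residual_life(gamma, G, 0.1)
# A nonzero determinant means 0, gamma, p_001 and p_01 are not coplanar.
volume = det3(gamma, p_001, p_01)
print(p_001, p_01, volume)
```

Under this assumption the two points should agree with the coordinates quoted in the example to within about $10^{-3}$, and the nonzero determinant reflects the non-coplanarity argument used to bound the algebraic degree from below.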

2.7.2 Simplicity and Majorization

The notion of PH-simplicity was first formalized in [O’C89] and it is closely related to the notion of simplicity in convex analysis.

Definition 2.36. A PH-generator $A$ (of dimension $n$) is PH-simple if and only if for any two $n$-dimensional sub-stochastic vectors $\vec{\alpha}_1$ and $\vec{\alpha}_2$, where $\vec{\alpha}_1 \neq \vec{\alpha}_2$, $\mathrm{PH}(\vec{\alpha}_1, A) \neq \mathrm{PH}(\vec{\alpha}_2, A)$.

In other words, a PH-generator $A$ is PH-simple if distinct sub-stochastic vectors always yield distinct PH distributions in $\mathrm{PH}(A)$. This is equivalent to saying that each distribution in $\mathrm{PH}(A)$ is a unique convex combination of the $n + 1$ distributions $\delta$ and $\mathrm{PH}(\vec{e}_i, A)$, for $1 \leq i \leq n$.

The term “simple” comes from the fact that if PH-generator $A$ is PH-simple, then the $n + 1$ points $\delta$ and $\mathrm{PH}(\vec{e}_i, A)$, for $1 \leq i \leq n$, form an $n$-dimensional simplex (see Appendix A.3). It follows that a PH-generator is not PH-simple if the $n + 1$ points are not linearly independent of each other, or equivalently, if its polytope is a subset of an $(n - 1)$-dimensional affine subspace [Roc70].

Example 2.37. Returning to Example 2.33, we can determine whether PH-generator $G$ is PH-simple by showing that $\delta$ and $\mathrm{PH}(\vec{e}_i, G)$, for $1 \leq i \leq 3$, are linearly independent of each other. One way to do this is in the LST domain. Using the representations depicted in Figure 2.8(b)–(d), the LSTs of $\delta$, $\mathrm{PH}(\vec{e}_1, G)$, $\mathrm{PH}(\vec{e}_2, G)$, and $\mathrm{PH}(\vec{e}_3, G)$ are calculated as follows:

$$1, \qquad \frac{3s^2 + 134s + 672}{(s + 2)(s + 16)(s + 21)}, \qquad \frac{6s + 32}{(s + 2)(s + 16)}, \qquad\text{and}\qquad \frac{2}{s + 2}.$$

Since the LSTs are linearly independent, PH-generator $G$ is PH-simple.

Example 2.38 (Not PH-simple). The two-dimensional PH-generator $G$ of this example is not PH-simple because $\mathrm{PH}([1, 0], G) = \mathrm{PH}([0, 1], G)$: both representations represent an exponential distribution with rate 2.
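The linear-independence argument of Example 2.37 can be checked mechanically. The sketch below evaluates the four LSTs of that example at four sample points with exact rational arithmetic. If the functions were linearly dependent, the resulting $4 \times 4$ matrix would be singular for every choice of sample points, so a nonzero determinant is sufficient evidence of independence. The sample points and helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction

# LSTs from Example 2.37, evaluated exactly at a rational point s.
lsts = [
    lambda s: Fraction(1),                                             # delta
    lambda s: (3 * s**2 + 134 * s + 672) / ((s + 2) * (s + 16) * (s + 21)),
    lambda s: (6 * s + 32) / ((s + 2) * (s + 16)),
    lambda s: 2 / (s + 2),
]

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small M)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, x in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * x * det(minor)
    return total

samples = [Fraction(k) for k in range(4)]            # s = 0, 1, 2, 3
sample_matrix = [[f(s) for f in lsts] for s in samples]
independent = det(sample_matrix) != 0
print(independent)
```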

A simpler and more efficient method to determine the PH-simplicity of a PH-generator is described in the following theorem.

Theorem 2.39 ([O’C89]). A PH-generator $A$ of dimension $n$ is PH-simple if and only if the vectors $A^i\vec{e}$, for $0 \leq i \leq n - 1$, span (or equivalently form a basis of) the vector space $\mathbb{R}^n$.

Therefore, since $\vec{A} = -A\vec{e}$ and $A$ is nonsingular, a PH-generator $A$ of dimension $n$ is PH-simple if and only if the matrix

$$[\vec{A},\ A\vec{A},\ A^2\vec{A},\ \cdots,\ A^{n-1}\vec{A}] \tag{2.24}$$

has rank $n$. Recall that $\vec{A}$ in Equation (2.24) is a column vector representing the transition rates from all transient states to the absorbing state.

Since a PH distribution has many representations, it is of interest to obtain and use a representation that has a simple form or some desirable properties. In the following, we define a relation between two PH-generators based on the sets of PH distributions they can generate.

Definition 2.40. Let $A$ and $B$ be two PH-generators. Then $B$ PH-majorizes $A$ if and only if $\mathrm{PH}(A) \subseteq \mathrm{PH}(B)$.

The following theorem gives necessary and sufficient conditions for PH-majorization when PH-generator $B$ is PH-simple.

Theorem 2.41 ([O’C89]). Let $A$ and $B$ be two PH-generators and let $B$ be PH-simple. Then

1. $B$ PH-majorizes $A$ if and only if there exists a nonnegative matrix $P$ with unit row-sums (i.e., $P\vec{e} = \vec{e}$) such that $AP = PB$, and

2. if $B$ PH-majorizes $A$, then there exists exactly one matrix $P$ with unit row-sums such that $AP = PB$. Furthermore, $P$ is of full rank if and only if $A$ is also PH-simple.
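Theorems 2.39 and 2.41 can be exercised together in a short sketch with exact rational arithmetic. Everything concrete in it is an assumption made for illustration: `G` is the generator assumed for Example 2.33, `Bi` is an assumed bidiagonal generator with rates 2, 16 and 21, `P` was worked out by hand for this sketch, and `H` is a hypothetical generator that is not PH-simple.

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    M = [[F(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def is_ph_simple(A):
    """Theorem 2.39: A is PH-simple iff e, Ae, ..., A^(n-1)e span R^n."""
    n = len(A)
    v = [[F(1)] for _ in range(n)]
    krylov = []
    for _ in range(n):
        krylov.append([row[0] for row in v])
        v = mat_mul(A, v)
    return rank(krylov) == n

def majorization_witness(A, B, P):
    """Theorem 2.41(1): P is nonnegative, has unit row-sums, and A P = P B."""
    nonneg = all(x >= 0 for row in P for x in row)
    unit_rows = all(sum(row) == 1 for row in P)
    return nonneg and unit_rows and mat_mul(A, P) == mat_mul(P, B)

G = [[-21, 11, 7], [0, -16, 10], [0, 0, -2]]       # assumed (Example 2.33)
Bi = [[-2, 2, 0], [0, -16, 16], [0, 0, -21]]       # assumed bidiagonal generator
P = [[F(104, 168), F(40, 168), F(24, 168)],
     [F(95, 168), F(25, 168), F(48, 168)],
     [F(133, 168), F(19, 168), F(16, 168)]]         # hand-computed for this sketch
H = [[-2, 0], [0, -2]]                              # hypothetical, not PH-simple

print(is_ph_simple(Bi), majorization_witness(G, Bi, P), rank(P) == 3,
      is_ph_simple(G), is_ph_simple(H))
```

In this instance `P` has full rank and the rank test also reports `G` as PH-simple, which is the pattern the second item of Theorem 2.41 predicts.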


2.7.3 Geometrical View of APH Representations

Let $(\vec{\alpha}, A)$ be an APH representation of size $n$. Moreover, let $-\lambda_1, -\lambda_2, \cdots, -\lambda_n$ be the eigenvalues of PH-generator $A$. Since PH-generator $A$ is a triangular matrix, its eigenvalues are simply given by its diagonal components. Furthermore, by Theorem 2.28, the poles of $\mathrm{PH}(\vec{\alpha}, A)$ are among these eigenvalues. Without loss of generality, assume that $\lambda_n > \lambda_{n-1} > \cdots > \lambda_1 > 0$.

Diagonal Representations. An important observation [HZ06a] is that an eigenvector $\vec{b}^{[i]}$ of $A$ that corresponds to an eigenvalue $-\lambda_i$, which means that $\vec{b}^{[i]}A = -\lambda_i\vec{b}^{[i]}$, can be associated with an exponential distribution with rate $\lambda_i$ as follows. If $\vec{b}^{[i]}\vec{e} \neq 0$, we normalize $\vec{b}^{[i]}$ to have a unit sum, namely, for all $1 \leq i \leq n$, let

$$\vec{\beta}^{[i]} = \frac{\vec{b}^{[i]}}{\vec{b}^{[i]}\vec{e}}, \quad\text{so that}\quad \vec{\beta}^{[i]}\vec{e} = 1.$$

Now, since $\vec{\beta}^{[i]}$ is an eigenvector of $A$ with eigenvalue $-\lambda_i$, we have $\vec{\beta}^{[i]}e^{At} = e^{-\lambda_i t}\vec{\beta}^{[i]}$ and hence $\vec{\beta}^{[i]}e^{At}\vec{e} = e^{-\lambda_i t}$. Therefore, eigenvector $\vec{\beta}^{[i]}$ represents the exponential distribution with rate $\lambda_i$ in the coordinate system of the polytope $\mathrm{PH}(A)$, even if it is not a sub-stochastic vector.

In general, we have $n$ eigenvectors $\vec{\beta}^{[i]}$, for $1 \leq i \leq n$, each corresponding to the exponential distribution with rate $\lambda_i$. These $n$ points lie in the same hyperplane as the face of polytope $\mathrm{PH}(A)$. The convex hull of $\vec{0}$ and these $n$ points forms a new polytope $\mathrm{PH}(D)$, where $D$ is a PH-generator whose diagonal components are $-\lambda_i$, for $1 \leq i \leq n$, and all other components are equal to zero. A PH-generator of this form is called a diagonal PH-generator. A PH representation with a diagonal PH-generator is a mixture of exponential distributions, and it is called a hyperexponential representation.

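Example 2.42. We continue using Example 2.33. Figure 2.8(a) depicts the polytope of PH-generator $G$. The eigenvectors of $G$ are $\vec{\gamma}^{[1]} = [0, 0, 1]$ for eigenvalue $-2$, $\vec{\gamma}^{[2]} = [0, -\tfrac{7}{5}, 1]$ for eigenvalue $-16$, and $\vec{\gamma}^{[3]} = [\tfrac{19}{15}, -\tfrac{209}{75}, 1]$ for eigenvalue $-21$. Normalized, they become

$$\vec{\gamma}^{[1]} = [0, 0, 1], \qquad \vec{\gamma}^{[2]} = [0, \tfrac{7}{2}, -\tfrac{5}{2}], \qquad \vec{\gamma}^{[3]} = [-\tfrac{95}{39}, \tfrac{209}{39}, -\tfrac{75}{39}].$$

Figure 2.9 depicts these three eigenvectors in the coordinate system of $\mathrm{PH}(G)$.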
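The normalized eigenvectors can be recomputed by forward substitution, since the generator is triangular. The sketch below assumes $G = [[-21, 11, 7], [0, -16, 10], [0, 0, -2]]$ for Example 2.33 and distinct diagonal entries. The function names are illustrative.

```python
from fractions import Fraction as F

G = [[-21, 11, 7], [0, -16, 10], [0, 0, -2]]    # assumed generator of Example 2.33

def normalized_left_eigenvector(A, k):
    """Unit-sum left eigenvector of upper-triangular A for eigenvalue A[k][k].

    Assumes distinct diagonal entries and a nonzero component sum.
    """
    n, mu = len(A), F(A[k][k])
    b = [F(0)] * n
    b[k] = F(1)
    for j in range(k + 1, n):
        # Column j of b A = mu b, solved for b[j] by forward substitution.
        b[j] = sum(b[i] * A[i][j] for i in range(k, j)) / (mu - A[j][j])
    total = sum(b)
    return [x / total for x in b]

def vec_mat(v, A):
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

# Keys are the rates 21, 16 and 2, i.e. the negated diagonal entries of G.
betas = {-G[k][k]: normalized_left_eigenvector(G, k) for k in range(3)}
# Check beta G = -lambda beta exactly for every rate.
all_eigen = all(vec_mat(beta, G) == [-lam * x for x in beta]
                for lam, beta in betas.items())
print(betas, all_eigen)
```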

The convex hull of $\vec{0}$, $\vec{\gamma}^{[1]}$, $\vec{\gamma}^{[2]}$, and $\vec{\gamma}^{[3]}$ forms a polytope $\mathrm{PH}(D)$, where

$$D = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -16 & 0 \\ 0 & 0 & -21 \end{bmatrix}.$$

If some vector $\vec{\gamma}' = a_1\vec{\gamma}^{[1]} + a_2\vec{\gamma}^{[2]} + a_3\vec{\gamma}^{[3]}$, where $a_i \geq 0$ and $0 < \sum_{i=1}^{3} a_i \leq 1$, then $\mathrm{PH}(\vec{\gamma}', G)$ has a diagonal PH representation $([a_1, a_2, a_3], D)$. Equivalently, the intersection of the two polytopes represents the set of the APH distributions that can be represented by using both PH-generators. In our example, the intersection forms the plane $\vec{0}\vec{e}_2\vec{e}_3$. Hence, our special point $\vec{\gamma}$ does not have a representation in polytope $\mathrm{PH}(D)$.
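Whether a point has a hyperexponential representation can be decided by solving for its coefficients in the eigenvector basis. The following is a minimal sketch. The normalized eigenvectors are taken as $[0, 0, 1]$, $[0, 7/2, -5/2]$ and $[-95/39, 209/39, -75/39]$, which presumes the generator assumed for Example 2.33, and the helper names are illustrative.

```python
from fractions import Fraction as F

# Normalized eigenvectors (rows) for rates 2, 16, 21, and the point gamma.
eigvecs = [[F(0), F(0), F(1)],
           [F(0), F(7, 2), F(-5, 2)],
           [F(-95, 39), F(209, 39), F(-75, 39)]]
gamma = [F(1, 2), F(1, 4), F(1, 4)]

def solve(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination (M square, nonsingular)."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c:
                aug[i] = [a - aug[i][c] * b for a, b in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def eigenbasis_weights(point, vectors):
    """Coefficients a with point = sum_i a[i] * vectors[i]."""
    transposed = [list(col) for col in zip(*vectors)]
    return solve(transposed, point)

def in_polytope(weights):
    """Nonnegative weights with total at most one."""
    return all(a >= 0 for a in weights) and sum(weights) <= 1

a = eigenbasis_weights(gamma, eigvecs)
print(a, in_polytope(a))
```

A negative coefficient means the point lies outside $\mathrm{PH}(D)$, which is the situation the example describes for $\vec{\gamma}$. By contrast, the unit vector $[0, 1, 0]$ should yield nonnegative weights.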


Figure 2.9: The Polytopes PH(G) and PH(D)

Bidiagonal Representations. In the following, we describe a method to expand the face of polytope $\mathrm{PH}(D)$ to include linear combinations of eigenvectors $\vec{\beta}^{[i]}$, for $1 \leq i \leq n$, that still represent probability distributions [DL82]. Recall that eigenvector $\vec{\beta}^{[i]}$ represents an exponential distribution with rate $\lambda_i$. Let $V$ be the $n$-dimensional real vector space generated by $\vec{\beta}^{[i]}$, for $1 \leq i \leq n$. We work with $\{\vec{\beta}^{[1]}, \vec{\beta}^{[2]}, \cdots, \vec{\beta}^{[n]}\}$ as the basis of $V$ and we denote the subset of $V$ whose elements are probability distributions by $C_n$.

The first step is recognizing that all convex combinations of two arbitrary vectors $\vec{\beta}^{[i]}$ and $\vec{\beta}^{[j]}$, for $1 \leq i, j \leq n$, namely those vectors that lie on the line segment between the two vectors, are also probability distributions. Moreover, certain parts of the extension of this line are still probability distributions, as described in the following theorem.

Theorem 2.43 ([DL82]). For $i, j \in \{1, 2, \cdots, n\}$ and $i > j$, vector $\kappa\vec{\beta}^{[i]} + (1 - \kappa)\vec{\beta}^{[j]}$ is an element of $C_n$ if and only if

$$-\frac{\lambda_j}{\lambda_i - \lambda_j} \leq \kappa \leq 1.$$

Consider Figure 2.10, which is based on the discussion in Example 2.42, and will be used as a running example for the rest of the section.
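The admissible range of $\kappa$ in Theorem 2.43 can be probed numerically. The sketch below assumes that $\kappa$ is the weight on the exponential with the larger rate $\lambda_i$. A direct calculation on the density then suggests the range $-\lambda_j/(\lambda_i - \lambda_j) \leq \kappa \leq 1$. The grid check is only a heuristic test of nonnegativity on a finite horizon, and the function names are illustrative.

```python
import math

def mixture_density(kappa, lam_i, lam_j, t):
    """Density of kappa * Exp(lam_i) + (1 - kappa) * Exp(lam_j) at time t."""
    return (kappa * lam_i * math.exp(-lam_i * t)
            + (1 - kappa) * lam_j * math.exp(-lam_j * t))

def is_distribution(kappa, lam_i, lam_j, horizon=10.0, steps=2000):
    """Grid check that the density is nonnegative on [0, horizon]."""
    grid = (horizon * k / steps for k in range(steps + 1))
    return all(mixture_density(kappa, lam_i, lam_j, t) >= -1e-12 for t in grid)

lam_i, lam_j = 16.0, 2.0                  # rates 16 and 2 of the running example
lower = -lam_j / (lam_i - lam_j)          # candidate lower bound, -1/7
for kappa in (lower - 0.01, lower, 0.5, 1.0, 1.01):
    print(kappa, is_distribution(kappa, lam_i, lam_j))
```

Just below the lower bound the density is negative at $t = 0$, and just above 1 it becomes negative for large $t$, so both ends of the range are tight in this instance.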

The line segment between vectors $\vec{\gamma}^{[1]}$ and $\vec{\gamma}^{[2]}$, for instance, can be extended and the extension still represents probability distributions. The extreme point of the extension, namely when the weight $\kappa$ of $\vec{\gamma}^{[2]}$ is $-\tfrac{2}{16 - 2} = -\tfrac{1}{7}$, corresponds to the vector $\vec{\gamma}^{[1,2]}$. In a similar fashion, we can obtain vectors $\vec{\gamma}^{[1,3]}$ and $\vec{\gamma}^{[2,3]}$. In general, let the extreme point of the extension in Theorem 2.43, namely the vector $\tfrac{\lambda_i}{\lambda_i - \lambda_j}\vec{\beta}^{[j]} - \tfrac{\lambda_j}{\lambda_i - \lambda_j}\vec{\beta}^{[i]}$, be denoted by $\vec{\beta}^{[j,i]}$. In terms of probability distributions, vector $\vec{\beta}^{[j,i]}$ corresponds to the convolution of two exponential distributions with rates $\lambda_j$ and $\lambda_i$, respectively (we know this by looking at the distribution function). We generalize this notation: Let $\psi \subseteq \{1, 2, \cdots, n\}$ and $\psi \neq \emptyset$. Let vector $\vec{\beta}^{[\psi]}$⁷ denote the vector that represents the convolution of $|\psi|$ exponential distributions with rates $\lambda_i$, for each $i \in \psi$, respectively, in the vector space $V$. Let $\Psi$ denote the collection of all such $\psi$’s. The following theorem helps us locate these vectors.
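As a side check on the notation just introduced, the claim that $\vec{\beta}^{[j,i]}$ is a convolution of two exponentials can be verified exactly in the LST domain. The sketch assumes weights $\lambda_i/(\lambda_i - \lambda_j)$ on $\mathrm{Exp}(\lambda_j)$ and $-\lambda_j/(\lambda_i - \lambda_j)$ on $\mathrm{Exp}(\lambda_i)$. Both sides are rational functions of low degree, so agreement at many sample points implies equality. The function names are illustrative.

```python
from fractions import Fraction as F

def exp_lst(lam, s):
    """LST of the exponential distribution with rate lam."""
    return F(lam) / (s + lam)

def extreme_point_lst(lam_j, lam_i, s):
    """LST of kappa*Exp(lam_j) + (1-kappa)*Exp(lam_i), kappa = lam_i/(lam_i-lam_j)."""
    kappa = F(lam_i, lam_i - lam_j)
    return kappa * exp_lst(lam_j, s) + (1 - kappa) * exp_lst(lam_i, s)

def convolution_lst(rates, s):
    """LST of a sum of independent exponentials: the product of their LSTs."""
    out = F(1)
    for lam in rates:
        out *= exp_lst(lam, s)
    return out

pairs = [(2, 16), (2, 21), (16, 21)]      # (lam_j, lam_i) for the rates 2, 16, 21
agree = all(extreme_point_lst(lj, li, F(s)) == convolution_lst((lj, li), F(s))
            for lj, li in pairs for s in range(0, 50))
print(agree)
```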

Theorem 2.44 ([DL82]). For $i, j \in \{1, 2, \cdots, n\}$ and $i > j$,

$$\vec{\beta}^{[j,\cdots,i-1,i+1,\cdots,n]} \in \mathrm{conv}(\{\vec{\beta}^{[j,\cdots,n]}, \vec{\beta}^{[j+1,\cdots,n]}\}).$$

⁷ We abuse the vector index notation so that it can also indicate a set.


Figure 2.10: The Polytopes PH(G), PH(D) and PH(Bi)

Set $\mathrm{conv}(A)$ is the set of all convex combinations of all members of set $A$ (see Appendix A.3.2). In the case of Theorem 2.44, this set forms the line segment between the two vectors. Based on the theorem, we know for certain that vector $\vec{\gamma}^{[1,3]}$, in our running example, lies on the line segment between vectors $\vec{\gamma}^{[2,3]}$ and $\vec{\gamma}^{[1,2,3]}$.
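The position of such a vector on the segment can be computed explicitly. A short calculation in the LST domain, which is our own derivation and not quoted from [DL82], suggests that the convolution with rate $\lambda_i$ removed places weight $\lambda_j/\lambda_i$ on $\vec{\beta}^{[j+1,\cdots,n]}$ and the remaining weight on $\vec{\beta}^{[j,\cdots,n]}$. The sketch checks this identity exactly for the rates 2, 16 and 21 of the running example. The function names are illustrative.

```python
from fractions import Fraction as F

def hypo_lst(rates, s):
    """LST of the convolution of exponentials with the given rates."""
    out = F(1)
    for lam in rates:
        out *= F(lam) / (s + lam)
    return out

def check_theorem_2_44(rates, j, i, points=range(0, 40)):
    """rates holds lambda_1 < ... < lambda_n; j < i are 1-based indices."""
    full = rates[j - 1:]                               # index set [j, ..., n]
    tail = rates[j:]                                   # index set [j+1, ..., n]
    dropped = [r for k, r in enumerate(rates, 1) if k >= j and k != i]
    w = F(rates[j - 1], rates[i - 1])                  # weight on the tail vector
    ok = all(hypo_lst(dropped, F(s)) ==
             (1 - w) * hypo_lst(full, F(s)) + w * hypo_lst(tail, F(s))
             for s in points)
    return ok, w

ok, w = check_theorem_2_44([2, 16, 21], j=1, i=2)
print(ok, w)
```

For $j = 1$ and $i = 2$ the index sets are $[1,3]$, $[1,2,3]$ and $[2,3]$, so the check concerns exactly the three vectors named in the running example.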

The following theorem summarizes the result of the expansion and describes the structure of the subset $C_n$.

Theorem 2.45 ([DL82]). For each $\psi \in \Psi$, vector $\vec{\beta}^{[\psi]}$ is on the boundary of $C_n$, except when $\psi = \{1\}$, in which case it resides inside $C_n$. Furthermore, vectors

$$\vec{\beta}^{[n]},\ \vec{\beta}^{[n-1,n]},\ \cdots,\ \vec{\beta}^{[2,\cdots,n]},\ \vec{\beta}^{[1,\cdots,n]}$$

are the extreme points of $C_n$.

In conclusion, we have $n$ vectors $\vec{\beta}^{[n]}, \vec{\beta}^{[n-1,n]}, \cdots, \vec{\beta}^{[2,\cdots,n]}, \vec{\beta}^{[1,\cdots,n]}$, each corresponding to an extreme point of $C_n$, and representing a convolution of several exponential distributions. These $n$ vectors lie in the same hyperplane as the face of the polytope $\mathrm{PH}(D)$. Now, the convex hull of $\vec{0}$ and these $n$ vectors forms a new polytope $\mathrm{PH}(Bi)$, where $Bi$ is a bidiagonal PH-generator.

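Example 2.46. The convex hull of $\vec{0}$, $\vec{\gamma}^{[3]}$, $\vec{\gamma}^{[2,3]}$, and $\vec{\gamma}^{[1,2,3]}$ forms a polytope $\mathrm{PH}(Bi)$, where

$$Bi = \begin{bmatrix} -2 & 2 & 0 \\ 0 & -16 & 16 \\ 0 & 0 & -21 \end{bmatrix}.$$

The polytope is depicted in Figure 2.10. If some vector $\vec{\gamma}' = a_1\vec{\gamma}^{[1,2,3]} + a_2\vec{\gamma}^{[2,3]} + a_3\vec{\gamma}^{[3]}$, where $a_i \geq 0$ and $0 < \sum_{i=1}^{3} a_i \leq 1$, then $\mathrm{PH}(\vec{\gamma}', G)$ has a bidiagonal PH representation $([a_1, a_2, a_3], Bi)$. In the example, polytope $\mathrm{PH}(G)$ is a subset of $\mathrm{PH}(Bi)$. Thus, our special point $\vec{\gamma}$ must have a representation in polytope $\mathrm{PH}(Bi)$. Indeed, vector $\vec{\gamma}$ is equal to vector $[\tfrac{109}{168}, \tfrac{31}{168}, \tfrac{1}{6}]$ in $\mathrm{PH}(Bi)$.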
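The equality of the two representations can be confirmed by comparing their LSTs exactly. The sketch assumes $G = [[-21, 11, 7], [0, -16, 10], [0, 0, -2]]$ and $Bi = [[-2, 2, 0], [0, -16, 16], [0, 0, -21]]$. It evaluates $\vec{\alpha}(sI - A)^{-1}(-A\vec{e})$ by back substitution, which is valid for upper-triangular generators and initial vectors that sum to one. Both LSTs share the denominator $(s+2)(s+16)(s+21)$, so agreement at more than three points implies that the distributions coincide.

```python
from fractions import Fraction as F

def lst(alpha, A, s):
    """LST alpha (sI - A)^{-1} (-A e) of PH(alpha, A) for upper-triangular A.

    Assumes alpha sums to one (no mass at zero).
    """
    n = len(A)
    exit_rates = [-sum(row) for row in A]             # the vector -A e
    x = [F(0)] * n
    for r in reversed(range(n)):                      # back substitution
        rhs = exit_rates[r] + sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = F(rhs) / (s - A[r][r])
    return sum(a * xi for a, xi in zip(alpha, x))

G = [[-21, 11, 7], [0, -16, 10], [0, 0, -2]]          # assumed (Example 2.33)
Bi = [[-2, 2, 0], [0, -16, 16], [0, 0, -21]]          # assumed bidiagonal generator
gamma = [F(1, 2), F(1, 4), F(1, 4)]
gamma_bi = [F(109, 168), F(31, 168), F(1, 6)]         # vector quoted in the text

same = all(lst(gamma, G, F(s)) == lst(gamma_bi, Bi, F(s)) for s in range(0, 20))
print(same)
```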


Actually, the structure of $C_n$ is not completely described by Theorem 2.43 and Theorem 2.45. In Figure 2.10, we also depict the region $\Omega_3$, which is part of $C_3$, but lies outside the polytope $\vec{0}\vec{\gamma}^{[1,2,3]}\vec{\gamma}^{[2,3]}\vec{\gamma}^{[3]}$ (we call this polytope $\Theta_3$). The construction of the curve delimiting the region $\Omega_3$ is described in [DL82]. Each vector in this region is formed by a linear combination of a vector in $\Theta_3$ and a vector on the plane $\vec{0}\vec{\gamma}^{[1,2,3]}\vec{\gamma}^{[3]}$, such that the resulting vector still represents a probability distribution, even though it must now have negative entries. Although the vector cannot have an APH representation with PH-generator $Bi$, it can be represented by a PH-generator of larger size [O’C91].

In a similar fashion, we can also describe the regions $\Omega_n$ and $\Theta_n$ in $C_n$. Every PH distribution in $\Theta_n$ is of algebraic degree and order of no more than $n$. On the other hand, every PH distribution in $\Omega_n$ is of algebraic degree $n$, but of order more than $n$.