getdocfac8. 357KB Jun 04 2011 12:05:14 AM

(1)

El e c t ro n ic

Jo ur

n a l o

f P

r o

b a b i_{l i t y}

Vol. 15 (2010), Paper no. 60, pages 1863–1892. Journal URL

http://www.math.washington.edu/~ejpecp/

Multivariate records based on dominance

Hsien-Kuei Hwang and Tsung-Hsi Tsai

Institute of Statistical Science

Academia Sinica

Taipei 115

Taiwan

Abstract

We consider three types of multivariate records in this paper and derive the mean and the vari-ance of their numbers for independent and uniform random samples from two prototype regions: hypercubes[0, 1]d _and_{d-dimensional simplex. Central limit theorems with convergence rates} are established when the variance tends to infinity. Effective numerical procedures are also pro-vided for computing the variance constants to high degree of precision .

Key words:Multivariate records, Pareto optimality, central limit theorems, Berry-Esseen bound, partial orders, dominance.

AMS 2000 Subject Classification:Primary 60C05; 60F05; 60G70.

(2)

1 Introduction

While the one-dimensional records (or record-breakings, left-to-right maxima, outstanding ele-ments, etc.) of a given sample have been the subject of research and development for more than six decades, considerably less is known for multidimensional records. One simple reason being that there is no total ordering for multivariate data, implying no unique way of defining records in higher dimensions. We study in this paper the stochastic properties of three types of records based on the dominance relation under two representative prototype models. In particular, central limit theorems with convergence rates are proved for the number of multivariate records when the variance tends to infinity, the major difficulty being the asymptotics of the variance.

Dominance and maxima. A pointp_∈Rdis said todominateanother pointq_∈Rdifp₋qhas only

positive coordinates, where the dimensionality d ≥ 1. Writeq ≺p or p ≻q. The nondominated points in the set_{p₁, . . . ,p_n_} are called maxima. Maxima represent one of the most natural and widely used partial orders for multidimensional samples when d _≥ 2, and have been thoroughly investigated in the literature under many different guises and names (such as admissibility, Pareto optimality, elites, efficiency, skylines, . . . ); see[1, 5]and the references therein.

Pareto records. A point p_k is defined to be a Pareto record or a nondominated record of the

se-quencep1, . . . ,pnif

p_k⊀p_i for all 1_≤i<k.

Such a record is referred to as aweak recordin[17], but we found this term less informative. In addition to being one of the natural extensions of the classical one-dimensional records, the Pareto records of a sequence of points are also closely connected to maxima, the simplest connection being the following bijection. If we consider the indices of the points as an additional coordinate, then the Pareto records are exactly the maxima in the extended space (the original one and the index-set) by reversing the order of the indices. Conversely, if we sort a set of points according to a fixed coordinate and use the ranks as the indices, then the maxima are nothing but the Pareto records in the induced space (with one dimension less); see [17]. See also the recent paper [5] for the algorithmic aspects of such connections.

More precisely, assume thatp₁, . . . ,p_n are independently and uniformly distributed (abbreviated as iud) in a specified regionS andq1, . . . ,qn are iud in the regionS×[0, 1]. Then the distribution of

the number of Pareto records of the sequencep₁, . . . ,p_n is equal to the distribution of the number of maxima of the set_{q₁, . . . ,q_n_}. This connection will be used later in our analysis.

On the other hand, we also have, for any given regions, the following relation between the expected numberE[Xn]of Pareto records and the expected numberE[Mn]of maxima of the same sample of

points, sayp₁, . . . ,p_n,

E[Xn] =

1≤k≤n

E[M_k]

k ;

(3)

Dominating records. Although the Pareto records are closely connected to maxima, their proba-bilistic properties have been less well studied in the literature. In contrast, the following definition of records has received more attention.

A pointp_kis defined to be adominating recordof the sequencep1, . . . ,pn if

p_i_≺p_kfor all 1≤i<k.

This is referred to as thestrong recordin[17]and themultiple maximain[21].

Let the number of dominating records falling inA_⊂S be denoted by Z_A. Goldie and Resnick[18]

showed that

E[Z_A] =

1₋µ(D_x)−1dµ(x),

whereD_x=_{y:y_≺x_}. They also calculated all the moments ofZ_Aand derived several other results such as the probability of the event_{Z_A=0_}and the covariance Cov Z_A,Z_B.

In the special case when the p_i’s are iid with a common multivariate normal (non-degenerate) distribution, Gnedin[16]proved that

λn:=P{pnis a dominating record} ≍n−α(logn)(α−β)/2.

for some explicitly computable α > 1 and β ∈ {2, 3, . . . ,d_}. See also [20] for finer asymptotic estimates.

Chain records. Yet another type of record of multi-dimensional samples introduced in[17]is the

chain record

p1≺pi1≺pi2≺ · · · ≺pik,

where 1<i1<i2<· · ·<ikand there are nopj ≻pia withia< j<ia+1or ia< j≤n. See Figure 1

for an illustration of the three different types of records.

p₁

p₂ p₃

p₄

p5 _p

p₈

p₁

p₂ p₃

p₄

p5 _p

p₇

p₈

p₁

p₂ p₃

p₄

p5 _p

p₈

Figure 1:In this simple example, the dominating records arep1andp7(left), the chain records arep1,

(4)

Some known results and comparisons. If we drop the restriction of order, then the largest subset of indices such that

p_i₁≺p_i₂≺ · · · ≺p_i_k (1)

is equal to the number of maximal layers (maxima being regarded as the first layer, the maxima of the remaining points being the second, and so on). Assuming that_{p₁, . . . ,p_n_}are iud in the hyper-cube [0, 1]d, Gnedin [17]proved that the number of chain records Yn is asymptotically Gaussian

with mean and variance asymptotic to

E[Yn]∼d−1logn, V[Yn]∼d−2logn;

see Theorem 4 for an improvement. The author also derived exact and asymptotic formulas for the probability of a chain recordP(Yn>Yn−1)and discussed some point-process scaling limits.

The behavior of the record sequence (1) inR2are studied in Goldie and Resnick[19], Deuschel and Zeitouni [10]. The position of the points converges in probability to a (or a set of) deterministic curve(s). Deuschel and Zeitouni [10] also proved a weak law of large number for the longest increasing subsequence, extending a result by Vershik and Kerov [24] to a non-uniform setting; see also the breakthrough paper[3]. A completely different type of multivariate records based on convex hulls was discussed in[23].

Chain records can in some sense be regarded as uni-directional Pareto records, and thus lacks the multi-directional feature of Pareto records. The asymptotic analysis of the moments is in general simpler than that for the Pareto records. On the other hand, it is also this aspect that the chain records reflect better the properties exhibited by the one-dimensional records. Interestingly, the chain records correspond to the “left-arm" (starting from the root by always choosing the subtree corresponding to the first quadrant) of quadtrees; see[6, 11, 13]and the references therein.

DOMINANCE

MAXIMA RECORDS

Pareto records

chain records

dominating records

. . . maximal

layers depth

Figure 2: A diagram illustrating the diverse notions defined on dominance; in particular, the Pareto records can be regarded as a good bridge between maxima and multivariate records.

A summary of results. We consider in the paper the distributional aspect of the above three

types of records in two typical cases when thep_i’s are iud in the hypercube[0, 1]d and in the d -dimensional simplex, respectively. Briefly, hypercubes correspond to situations when the coordinates are independent, while thed-dimensional simplex to that when the coordinates are to some extent negatively correlated. The hypercube case has already been studied in [17]; we will discuss this briefly by a very different approach. In addition to the asymptotic normality for the number of

(5)

Pareto records in the d-dimensional simplex, our main results are summarized in the following table, where we list the asymptotics of the mean (first entry) and the variance (second entry) in each case.

❳ ❳

❳ Records

Models

Hypercube[0, 1]d d-dimensional simplex Dominating records H_n(d),H_n(d)₋H(_n2d) (15), (16)

Chain records 1

dlogn,

d2logn[17]

d H_d logn, H_d(2) d H3_dlogn

Pareto records 1

d! logn

₁

d!+κd+1

lognd [17] m_dn(d−1)/d, v_dn(d−1)/d

Maxima =Pareto records in[0, 1]d−1 _[₁₇_] _m_˜

dn(d−1)/d, ˜vdn(d−1)/d

HereH(_ba)=P_ib₌₁i−a,κd is a constant (see[1]),md := _dd₋₁Γ

₁

, vd is defined in (3), ˜md := Γ(1_d),

v_d is given in (4), and both (15) and (16) are bounded innand ind; see Figure 3.

From this table, we see clearly that the three types of records behave very differently, although they coincide whend=1. Roughly, the number of dominating records is bounded (indeed less than two on average) in both models, while the chain records have a typical logarithmic quantity; and it is the Pareto records that reflect better the variations of the underlying models.

3 4 5 6 7

2 0.1 0.4 0.7 1.0 1.3 1.6

E# of dominating records (hypercube)

E# of dominating records (d-dim simplex)

V# of dominating records (hypercube)

V# of dominating records (d-dim simplex)

Figure 3:The mean and the variance of the number of dominating records in low dimensional random samples. In each model, the expected number approaches1very fast as d increases with the correspond-ing variance tendcorrespond-ing to zero.

Organization of the paper. We derive asymptotic approximations to the mean and the variance

for the number of Pareto records in the next section. Since the expression for the leading coefficient of the asymptotic variance is very messy, we then address in Section 3 the numerical aspect of this constant. The tools we used turn out to be also useful for several other constants of similar nature, which we briefly discuss. We then discuss the chain records and the dominating records.

(6)

2 Asymptotics of the number of Pareto records

Assumed≥2 throughout this paper. Let

S_d :=_{x:x_i_≥0 and 0_{≤ k}x_{k ≤}1_}

denote the d-dimensional simplex, where _kx_k := x₁+_{· · ·}+x_d. Assume that p₁, . . . ,p_n are iud in

S_d. LetX_ndenote the number of Pareto records of_{p₁, . . . ,p_n_}. We derive in this section asymptotic approximations to the mean and the variance and a Berry-Esseen bound forXn. The same method

of proof also applies to the number of maxima, denoted byM_n, which we will briefly discuss. Letq₁, . . . ,q_nbe iud inS_d_×[0, 1]. As discussed in Introduction, the distribution ofX_n is equivalent to the distribution of the number of maxima of_{q₁, . . . ,q_n_}.

For notational convenience, denote bya_n_≃b_nif a_n=b_n+On−1/d.

Theorem 1. The mean and the variance of the number of Pareto maxima X_n in random samples from

the d-dimensional simplex satisfy

E[X_n]_≃n1−1/d X

0≤j≤d−2

−1

(₋1)jΓ

_j₊₁

d−1₋jn

−j/d

+ (₋1)d−1 logn+γ,

(2)

V[Xn] = vd+o(1)

n1−1/d,

where

vd:=

d d₋1Γ

₁

+2d2(d−1) X 1≤ℓ<d

ℓ

−2

ℓ−1

Z 1 0

Z 1

Z ∞

yd−ℓ−1wℓ−1e−u(x+y)d−v(x+w)dev xd₋1dwdydxdudv

+2d2 Z 1

Z 1

Z _∞

wd−1e−uxd−v(x+w)dev xd₋1dwdxdudv

−2d2 Z 1

Z 1

Z ∞

yd−1e−u(x+y)

d −v xd

dydxdudv.

(3)

(7)

We start with the expected value ofX_n. LetG_i=1_{_q

iis a maximum}. E[Xn] =nE[G1]

=d!n Z 1

1−z(1− kxk)dn−1 dxdz

≃d!n Z 1

e−nz(1−kxk)ddxdz

=d n Z 1

Z 1 0

e−nz(1−y)dyd−1dydz (y_{7→ k}x_k)

=d n X

0≤j<d

−1

(₋1)j

Z 1 0

e−nz ydyjdydz

= X 0≤j<d

n(d−1−j)/d

−1

(₋1)j

Z 1 0

Z nz

e−xx(1+j−d)/dz−(j+1)/ddxdz

≃ X

0≤j≤d−2

−1

(₋1)jΓ

_j₊₁

d−1₋j n

(d−1−j)/d

+ (₋1)d−1 logn+γ. This proves (2).

For the variance, we start from the second moment, which is given by

EX_n2=E[Xn] +n(n−1)E

G1G2

Let Abe the region in Rd_×[0, 1]such that q₁ andq₂ are incomparable (neither dominating the other). Writeq₁= (x,u),q₂= (y,v),_kx_k_∗:= (_kx_{k ∧}1)and

x_∨y:= x₁_∨y₁,_{· · ·},x_d_∨y_d. Then by standard majorization techniques (see[1])

n(n−1)EG1G2

=n(n−1)d!2

1−u(1− kxk)d₋v(1−y)d+ (u∧v)(1−x∨y_∗)dn−2 dxdydudv

≃n2d!2

e−n[u(1−kxk)d+v(1−kyk)d]dxdydudv

+n2d!2

e−n[u(1−kxk)d+v(1−kyk)d]

en(u∧v)(1−kx∨yk∗) d

−1dxdydudv

≃EX_n2₋J_n_,0+ X 1≤ℓ<d

ℓ

(8)

where

J_n_,0=2n2d!2

Z 1 0

Z 1

x≺y x,y∈Sd

e−n[u(1−kxk)d+v(1−kyk)d]dxdydudv,

J_n_,ℓ=n2d!2

Z 1 0

xi>yi,1≤i≤ℓ

xi<yi,ℓ<i≤d x,y∈Sd

e−n[u(1−kxk)d+v(1−kyk)d]en(u∧v)(1−kx∨yk∗) d

−1dxdydudv,

J_n_,_d=2n2d!2

Z 1 0

Z 1

y≺x x,y∈Sd

e−n[u(1−kxk)d+v(1−kyk)d]en(u∧v)(1−kx∨yk∗) d

−1dxdydudz,

for 1_≤ℓ≤d₋1.

Consider firstJ_n_,_ℓ, 1_≤ℓ <d. We proceed by four sets of changes of variables to simplify the integral starting from ₍

x_i_7→ξi, yi 7→ξi(1−ηi), for 1≤i≤ℓ;

xi7→ξi(1−ηi), yi7→ξi, forℓ <i≤d,

which leads to

Jn,ℓ= (nd!)2

Z 1 0

[0,1]d

e−n

u1−Pξi+ P_′′

ξiηi d

+v1−Pξi+ P_′

ξiηi di

en(u∧v)(1−

P ξi)

−1 Yξi

dξdηdudv, wherePξi:=

Pd i=1ξi,

ξi:=

i=1ξi,

P_′

x_i:=Pℓ_i₌₁x_i andP′′x_i:=Pd_i₌_ℓ₊₁x_i. Next, by the change of variables

ξi7→

d −ξin

−1/d_, _η

i7→dηin−1/d,

we have

J_n,ℓ=d!2

Z 1 0

Sd(n)

[0,n1/d_/_d_]d

e−

uPξi+ P′′_η

i(1−dξin−1/d) d

+vPξi+ P′_η

i(1−dξin−1/d) di

e(u∧v)(

P ξi)

−1 Y 1₋dξin−1/d

dξdηdudv, whereS_d(n) =_{ξ:ξi≤n1/d/d and

_ξ_>0_}. We then perform the change of variables

(9)

and obtain

J_n_,_ℓ= (d!)2

Z 1 0

Sd(n)

[0,n1/d_/_d_]d

e−

uPξi+ P_′′

ηi d

+vPξi+ P_′

ηi di

e(u∧v)(

P ξi)

−1

dξdηdudv. Finally, we “linearize” the integrals by the change of variables

x _7→Xξi, y 7→

X_′′

ηi, w7→

X_′

ηi,

and get

J_n_,ℓ≃

d·d!

(d−ℓ−1)!(ℓ−1)!n

1−1/d

Z 1 0 Z 1 0 Z ∞ 0 Z ∞ 0 Z ∞ 0

yd−ℓ−1wℓ−1e−u(x+y)

−v(x+w)d

×e(u∧v)xd−1dwdydxdudv, since the change of variables produces the factors

n1−1/d

(d₋1)!,

yd−ℓ−1

(d₋ℓ−1)! and

wℓ−1

(ℓ−1)!. Now by symmetry (ofuandv), we have

1≤ℓ<d

ℓ

J_n_,_ℓ_≃ X

1≤ℓ<d

ℓ

₂_d

·d!

(d₋ℓ−1)!(ℓ−1)!n

1−1/d

Z 1 0 Z 1 v Z ∞ 0 Z ∞ 0 Z ∞ 0

yd−ℓ−1wℓ−1

×e−u(x+y)d−v(x+w)dev xd₋1dwdydxdudv. Proceeding in a similar manner forJn,d, we deduce that

J_n_,_d=2d!2

Z 1 0

Z 1

Sd(n)

[0,n1/d_/_d_]d

e−

u(Pξi) d

+v(Pξi+ P

ηi) di

e(u∧v)(

P ξi)

−1

dξdηdudv. By the change of variablesx 7→Pξi,w7→

ηi, we have

J_n_,_d _≃ 2d!

2 ((d₋1)!)2n

1−1/d

Z 1 0 Z 1 v Z ∞ 0 Z ∞ 0

wd−1e−uxd−v(x+w)de(u∧v)xd₋1dwdxdudv

=2d2n1−1/d Z 1 0 Z 1 v Z ∞ 0 Z ∞ 0

wd−1e−uxd−v(x+w)d

ev xd−1

dwdxdudv. Similarly, forJ_n_,0, we get

Jn,0=2d!2

Z 1 0

Z 1

Sd(n)

[0,n1/d_/_d_]d

e−

u(Pξi+ P

ηi) d

+v(Pξi) di

dξdηdudv. The change of variables x_7→Pξi, y 7→

ηi then yields

Jn,0≃2d2n1−1/d

Z 1 0 Z 1 v Z ∞ 0 Z ∞ 0

yd−1e−u(x+y)

d −v xd

dydxdudv. This completes the proof of the theorem.

(10)

Remark. By the same arguments, we derive the following asymptotic estimates for the number of maxima inSd.

E[M_n]_≃ X 0≤j<d

−1

(₋1)jΓ

_j₊₁

n(d−1−j)/d,

V[Mn] = ˜vd+o(1)

n1−1/d, where

vd = Γ

₁

+ X 1≤k<d

_{d d}_!

(d₋k₋1)!(k₋1)! × Z ∞ 0 Z ∞ 0 Z ∞ 0

yd−k−1wk−1e−(x+y)

−(x+w)d_exd

−1dwdydx

−2d2

Z ∞

yd−1e−xd−(x+y)ddxdy.

(4)

Theorem 2. The number of Pareto records in iud samples from d-dimensional simplex is asymptotically

normal with a rate given by

sup x P 

Xn−E[Xn]

V[X_n] <

x 

₋Φ(x)

n−(d−1)/(4d)(logn)2+n−1/d(logn)1/d, (5)

whereΦ(x)denotes the standard normal distribution. Proof.Define the region

D_n:=

(x,z):x_∈S_d,z_∈[0, 1]andz(1_{− k}x_k)d_≤ 2 logn

LetXndenote the number of maxima in Dn andXen the number of maxima of a Poisson process on

D_n with intensityd!n. Then

P 

Xn−E[Xn]

V[X_n] <

x 

₋Φ(x)

≤ P 

Xn−E[Xn]

V[X_n] <

x 

₋P



Xn−E[Xn]

V[X_n] <

x   + P 

Xn−E[Xn]

V[X_n] <

x 

₋P



Xen−E[Xn]

V[X_n] <

x   + P 

Xen−E[Xen]

V[Xe_n] < y



₋Φ(y)

Φ(y)₋Φ(x), (6)

forx _∈R, where

y=x È

V[Xn]

V[Xen]

+E[X_pn]−E[Xen] V[Xen]

(11)

We prove that the four terms on the right-hand side of (6) all satisfy theO-bound in (5). For the first term, we consider the probability

PXn6=Xn

≤nP q1∈/ Dnandq1is a maximum

=nd!

Sd×[0,1]−Dn

1₋z(1_{− k}x_k)dn−1dxdz

≤nd!

Sd×[0,1]−Dn

1−2 logn

n−1

dxdz

≤O(n−1).

For the second term on the right-hand side of (6), we use a Poisson process approximation sup

PX_n<t₋PXe_n<t_≤OD_n=On−1/d(logn)1/d.

To bound the third term, we use Stein’s method similar to the proof for the case of hypercube given in[1]and deduce that

sup



Xen−E[Xen]

V[Xen]

< y 

₋Φ(y)

(E[Xen])1/2Qn

(V[Xe_n])3/4

=On−(d−1)/(4d) logn2,

whereQn is the error term resulted from the dependence between the cells decomposed and

Qn=O

(logn)2. Finally, the last term in (6) is bounded above as follows.

Φ(y)₋Φ(x)=O    

pV[X_n]₋pV[Xe_n]

+E[X_n]₋E[Xe_n]

V[Xe_n]

   

=On−(d+1)/(2d). This proves (6) and thus (5).

The Berry-Esseen bound in (5) is expected to be non-optimal and one naturally anticipates an opti-mal order of the formO(n−1/2−1/(2d)), but we were unable to prove this.

Remark. By defining

Dn:=

x:x∈Sd and (1− kxk)d≤

2 logn n

instead and by applying the same arguments, we deduce the Berry-Esseen bound for the number of maxima in iud samples fromS_d

sup



Mn−E[Mn]

V[M_n] < x



₋Φ(x)

(12)

3 Numerical evaluations of the leading constants

The leading constants vd (see (3)) and ˜vd (see (4)) appearing in the asymptotic approximations to

the variance ofX_n and to that of M_n are not easily computed via existing softwares. We discuss in this section more effective means of computing their numerical values to high degree of precision. Our approach is to first apply Mellin transforms (see[12]) and derive series representations for the integrals by standard residue calculation and then convert the series in terms of the generalized hypergeometric functions

pFq(α1, . . . ,αp;β1, . . . ,βq;z):=

Γ(β1)· · ·Γ(βq)

Γ(α1)· · ·Γ(αp)

j≥0

Γ(j+α1)· · ·Γ(j+αp)

Γ(j+β1)· · ·Γ(j+βq)

· z

j!.

The resulting linear combinations of hypergeometric functions can then be computed easily to high degree of precision by any existing symbolic softwares even with a mediocre laptop.

The leading constant v_d of the asymptotic variance of thed-dimensional Pareto records. We

consider the following integrals

C_d= X

1≤m<d

₍_d

−1)!

(m−1)!(d−1−m)! ×

Z 1 0

Z 1

Z ∞

yd−1−mwm−1e−u(x+y)d−v(x+w)d

ev xd−1

dwdydxdudv

Z 1 0

Z 1

Z ∞

wd−1e−uxd−v(x+w)dev xd₋1dwdxdudv

−

Z 1 0

Z 1

Z ∞

yd−1e−u(x+y)d−v xddydxdudv

=:(d−1) X 1≤m<d

−2

m₋1

Id,m+Id,d−Id,0.

ThenC_d is related to v_d byv_d= d

d−1Γ( 1

d) +2d

2_C

d. We start from the simplest one, Id,0and use the

integral representation for the exponential function

e−t= 1

2πi Z

(c)

Γ(s)t−sds,

where c > 0, _ℜ(t) > 0 and the integration path R₍_c₎ is the vertical line from c₋i_∞ to c+i_∞. Substituting this representation intoI_d_,0, we obtain

Id,0=

1 2πi

(c)

Γ(s)

Z 1 0

Z 1

Z ∞

(13)

Making the change of variables y_7→x y yields

Id,0=

1 2πi

(c)

Γ(s)

Z 1 0

Z 1

Z ∞

u−sxd(1−s)(1+y)−dsyd−1e−v xddydxdudvds

= 1

2πi Z

(c)

Γ(s)

Z 1 0

Z 1

u−s

Z _∞

yd−1(1+ y)−dsdy

Z _∞

xd(1−s)e−v xddx

dudvds

= dΓ(d−1)

2πi Z

(c)

Γ(s)Γ(ds₋d)Γ(1+ 1

d−s)

Γ(ds)(ds−1) ds,

where 1<c<1+ 1

d. Moving the integration path to the right, one encounters the simple poles at

s=1+1

d +j for j=0, 1, . . . . Summing over all residues of these simple poles and proving that the

remainder integral tends to zero, we get

Id,0= Γ(d−1)

j≥0

(₋1)j_Γ(_j₊₁₊ 1

d)Γ(d j+1)

(j+1)!Γ(d j+d+1) ,

where the terms converge at the rate j−d−1+1d. This can be expressed easily in terms of the

general-ized hypergeometric functions.

An alternative integral representation can be derived forI_d_,0as follows.

I_d_,0= Γ(d−1) Γ(d)

j≥0

Γ(j+1+1

Γ(j+2) (−1)

Z 1 0

(1₋x)d−1xd jdx

= Γ( 1

d−1

Z 1 0

(1₋x)d−11−(1+x

d₎−1_d

xd dx,

which can also be derived directly from the original multiple integral representation and successive changes of variables (firstu, then x, thenv, and finally y). In particular, ford=2,

I_2,0=pπp2₋1+log 2+log(p2₋1). Now we turn toI_d,d.

I_d_,_d =

Z 1 0

Z 1

Z ∞

wd−1e−uxd−v(x+w)dev xd₋1dwdxdudv. By the same arguments used above, we have

Id,d=

Γ(d)

2dπi Z

(c)

Γ(s)Γ(ds−d)Γ(1+ 1

d−s)

Γ(ds) I ′

d,dds,

wherec>1 and

I_d′_,_d:=

Z 1 0

v−s Z 1

(u₋v)s−1−1d −us−1−

(14)

To evaluateI_d′_,_d, assume first that 1

d <ℜ(s)<1, so that

I_d′_,_d =

Z 1 0

v−s Z 1−v

us−1−1ddu−

Z 1 0

us−1−1d

Z u

v−sdvdu

= d

d−1

Γ(1₋s)Γ(s₋ 1_d) Γ(1−1_d) −

1 1−s

Now the right-hand side is well-defined for 1_d <ℜ(s)<2. Substituting this into Id,d, we obtain

I_d_,_d= Γ(d−1)

2πi Z

(c)

Γ(s)Γ(ds−d)Γ(1+1

d −s)

Γ(ds)

Γ(1₋s)Γ(s−1_d) Γ(1₋1

− 1

1₋s !

ds,

where 1 < c < 1+ 1_d. For computational purpose, we use the functional equation for Gamma function

Γ(1₋s)Γ(s) = π

sinπs,

so that

I_d_,_d= Γ(d−1)

2πi Z

(c)

πΓ(ds−d) Γ(ds)sin(π(s₋_d1))

Γ(1₋1_d)sin(πs)+

Γ(s−1) Γ(ds)Γ(s₋ 1_d)

ds.

In this case, we have simples poles ats= j+1/dfor both integrands ands= jfor the first integrand to the right of_ℜ(s) = 1 for j =2, 3, . . . . Thus summing over all the residues and proving that the remainder integral goes to zero, we obtain

Id,d= Γ(d−1)Γ(1_d)

j≥2

Γ(d j₋d)

jΓ(d j) −Γ(d−1)

j≥2

(₋1)j_Γ(_j₋₁₊1

d)Γ(d j−d+1)

Γ(j)Γ(d j+1) .

A similar argument as that used forId,0gives the alternative integral representation

Id,d=

Γ(1

d(d₋1) −1+

Z 1 0

1−t1d

d−1

td1−1(1+t)−

d +−

log(1₋t)₋t t2

dt !

. In particular, ford=2,

I2,2=

π2−p2−2 log 2+log(p2+1). Now we considerI_d_,_mfor 1_≤m<d.

I_d_,_m:=

Z 1 0

Z 1

Z ∞

yd−1−mwm−1e−u(x+y)d−v(x+w)dev xd₋1dwdydxdudv, which by the same arguments leads to

Id,m=

1 2πi

(c)

Γ(s)

Z 1 0

Z 1

u−s

Z _∞

yd−1−m(1+ y)−dsdy

Z ∞

wm−1

Z _∞

xd(1−s)e−v xd((1+w)d−1)₋e−v xd(1+w)ddx

dwdudvds

= dΓ(d−m)

2(d−1)πi

Z _Γ(_s_)Γ(_ds

−d+m)Γ(1+1_d ₋s)

(15)

where 1<c<1+1

d and

W_m(s):=

Z ∞

wm−1(1+w)d₋1s−1−

−(1+w)ds−d−1

= 1

d Z 1

t−s(t−1d−1)m−1

(1−t)s−1−1d −1

= 1

d X

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓ πΓ(s− 1

Γ(1− ℓ+_d1)Γ(s+ ℓ

d)sin(π(s+

ℓ

d))

− 1

1−_dℓ −s !

for 1_d <ℜ(s)<2−(m−1)/d. Note that each term has no pole ats=1−_dℓ. Thus

I_d_,_m= Γ(d−m)

d−1

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓI_d_,_m_,_ℓ,

where

I_d_,_m_,_ℓ:= 1

2πi Z

(c)

Γ(s)Γ(ds₋d+m)Γ(1+1

d−s)

Γ(ds)(ds₋1)

× πΓ(s−

Γ(1₋ℓ+1

d )Γ(s+

ℓ

d)sin(π(s+

ℓ

d))

− 1

1₋ ℓ

d−s

ds.

We then deduce that the integral equals the sum of the residues ats= j+ 1

d ands= j+1−

ℓ

I_d_,_m_,_ℓ=₋Γ( ℓ+1

d )

d X

j≥1

Γ(j+1+1_d)Γ(d j+m+1) (j+1)Γ(d j+d+1)Γ(j+1+ℓ+1

d )

+ 1

d X

j≥1

(₋1)jΓ(j+1+ 1_d)Γ(d j+m+1) (j+1)!Γ(d j+d+1)(j+ℓ+1

d )

+ Γ(ℓ+1

d )

j≥1

Γ(j+1₋ ℓ

d)Γ(d j+m−ℓ)

j!Γ(d j+d−ℓ)(d j+d−ℓ−1) =I_d[1_,_m]_,_ℓ+I_d[2_,_m]_,_ℓ+I_d[3_,_m]_,_ℓ.

It follows that

Cd−Id,d+Id,0 = (d−1) X

1≤m<d

−2

m₋1

Id,m

= X 1≤m<d

m ₍_d

−2)!

(m₋1)!

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓI_d[1_,_m]_,_ℓ+I_d[2_,_m]_,_ℓ+I[_d3_,_m]_,_ℓ

(16)

For further simplification of these sums, we begin withC_d[2]. Note first that

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓI[_d2_,_m]_,_ℓ

= 1

d X

j≥1

(₋1)jΓ(j+1+1

d)Γ(d j+m+1)

(j+1)!Γ(d j+d+1)

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓ 1

j+ℓ+1

= (₋1)m−1(m₋1)!X

j≥1

(₋1)jΓ(j+1+1_d)Γ(d j+1) (j+1)!Γ(d j+d+1) .

Thus

C_d[2]= X 1≤m<d

m ₍_d

−2)!

(m₋1)!

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓI[_d2_,_m]_,_ℓ

= (d₋2)! X

1≤m<d

(₋1)m−1X

j≥1

(₋1)jΓ(j+1+1

d)Γ(d j+1)

(j+1)!Γ(d j+d+1) = (1+ (₋1)d) I_d_,0₋Γ(1+

d(d−1)

Accordingly,C_d[2]=0 for odd values ofd.

For the other two sums containingI[_d1_,_m]_,_ℓ andI[_d3_,_m]_,_ℓ, we use the identity

ℓ<m<d

(N+m)!(₋1)m

m!(d−m)!(m−1−ℓ)! =

(₋1)dN!

(d−1−ℓ)!

_N₊₁₊_ℓ

−

_N₊_d

Then

C_d[1]= X 1≤m<d

m ₍_d

−2)!

(m₋1)!

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓI_d[1_,_m]_,_ℓ

= (d₋2)! X

0≤ℓ≤d−2

ℓ!(−1)

ℓΓ( ℓ+1

d )

d X

j≥1

Γ(j+1+1_d)

(j+1)Γ(d j+d+1)Γ(j+1+ℓ+1

d )

× X

ℓ<m<d

Γ(d j+m+1)(₋1)m

m!(d−m)!(m−1−ℓ)!

= (−1)

d(d−1)

0≤ℓ≤d−2

−1

ℓ

(₋1)ℓΓ(ℓ+1

d )

j≥1

Γ(j+1+1

(j+1)Γ(j+1+ℓ+1

d )

 

d j+ℓ+1

d j+d d

−1

 . Note that

d j+ℓ+1

d j+d d

−1=O(j−1) (0≤ℓ≤d−2),

(17)

Similarly,

C_d[3]= X 1≤m<d

m ₍_d

−2)!

(m−1)!

0≤ℓ<m

−1

ℓ

(₋1)m−1−ℓI_d[3_,_m]_,_ℓ

= (−1)

d−1

0≤ℓ≤d−2

−1

ℓ

(₋1)ℓ−1Γ(ℓ+1

d )

j≥1

Γ(j+1₋ℓ

j!(d j+d−ℓ−1)

 

d j d

d j+d−ℓ−1

−1

 .

Sincev_d= d

d−1Γ( 1

d) +2d

2_C

d, we obtain, by converting the series representations into

hypergeomet-ric functions, the following approximate numehypergeomet-rical values ofvd.

v₂_≈2.86126 35493 11178 82531 14379,

v₃_≈3.22524 36444 05576 89660 59392,

v₄_≈3.97797 27442 19455 29292 64760,

v5≈4.84527 39171 62611 42226 50057,

v6≈5.76349 95321 96568 64812 77416,

v₇_≈6.70865 12250 86590 36364 34742,

v₈_≈7.66955 04435 24665 04704 24808,

v₉_≈8.64032 79742 08287 24931 00067,

v10≈9.61764 75521 13755 73944 20940,

v11≈10.59949 78766 56951 63098 76869,

v₁₂_≈11.58460 78314 60409 77794 37163. In particular,v₂ has a closed-form expression

v₂= 2

3 p

π2π2−9₋12 log 2.

The leading constant ˜v_d of the asymptotic variance of thed-dimensional maxima. Let

Jd,0:=2d2

Z ∞

yd−1e−xd−(x+y)ddxdy, and

J_d_,_k:= d d!

(d−k−1)!(k−1)!

Z ∞

yd−k−1wk−1e−(x+y)d−(x+w)dexd₋1dwdydx. Then (see (4))

v_d= Γ

₁

+ X 1≤k<d

(18)

Consider first J_d_,0. By expanding (1+ xd)−1−1d, interchanging and evaluating the integrals, we

obtain

Jd,0=2Γ

₁

Z 1

(1₋x)d−1 (1+xd)1+1d

=2d!X

j≥0

Γ(j+1+1

d)Γ(d j+1)

Γ(j+1)Γ(d j+d+1)(−1)

j_,

the general terms converging at the rate O(j−d−1d). The convergence rate can be accelerated as

follows.

J_d_,0=2Γ

1+1

Z 1

xd1−1(1−x

d)d−1(1+x)−1−

ddx

=2Γ

1+1

d X

r≥0

2−r−1−1d

Z 1 0

(1₋x)rx1d−1(1−x

d)d−1dx

= Γ

1+1

2−1d

0≤ℓ<d

−1

ℓ

(₋1)ℓΓ

_ℓ₊₁

j≥0

Γ(j+1+1_d) Γ(j+1+ℓ+1

d )

2−j,

the convergence rate being now exponential. In terms of the generalized hypergeometric functions, we have

Jd,0= Γ

₁

2−1d

0≤ℓ<d

−1

ℓ

₍

−1)ℓ ℓ+12F1

1+ 1

d, 1; 1+

ℓ+1

d ;

1 2

The integralsJ_d_,_kcan be simplified as follows.

J_d_,_k₊₁=d2(d₋1)

−2

Z ∞

(exd₋1)

Z ∞

e−yd

Z ∞

(y−x)d−2−k(z−x)ke−zddzdydx

=2d2(d−1)

−2

Z ∞

e−yd Z y

e−zd

Z z

(exd₋1)(y₋x)d−2−k(z₋x)kdxdzdy

=2(d−1)Γ

₁

d₋2

Z 1

(1−x)k

Z 1 0

(1−xz)d−2−kzk+1

× 1

(1+zd₋_xd_zd₎1+1_d − 1

(1+zd₎1+1_d

dzdx

(19)

By the same proof used forJ_d_,0, we have

J_d′′_,_k₊₁=₋2(d−1)Γ

₁

d₋2

Z 1

(1−x)k

Z 1 0

(1₋xz)d−2−kzk+1(1+zd)−1−1ddzdx = (₋1)k+12−1dΓ

₁

d X

k<j<d

−1

₍

−1)j

j+1 2F1

1+ 1

d, 1; 1+ j+1

d ;

1 2

Similarly,

J_d′_,_k₊₁=2(d−1)Γ

₁

d₋2

Z 1

(1−x)k

Z 1 0

(1₋xz)d−2−kzk+1(1+zd−xdzd)−1−1ddzdx =2Γ

₁

(d−1)! X

0≤j≤d−2−k

(₋1)j

j!(d₋2₋k₋j)!

× X

0≤ℓ≤k

(₋1)ℓ ℓ!(k₋ℓ)!·

3F2(1+ 1_d,k

+j+2

d , 1; 1+

ℓ+j+1

d , 1+ k+j+2

d ;−1)

(ℓ+j+1)(k+j+2) .

Thus we obtain the following numerical values for the limiting constant ˜v_d ofV[M_n]/n(d−1)/d

v₂_≈0.68468 89279 50036 17418 09957, ˜

v₃_≈1.48217 31873 40583 68601 11369, ˜

v₄_≈2.35824 37612 02486 93742 28054, ˜

v5≈3.27773 90059 79491 26684 80858,

v6≈4.22231 09450 77067 79998 34338,

v7≈5.18220 76686 16078 48517 29967,

v₈_≈6.15196 29023 77474 45508 28039, ˜

v₉_≈7.12835 13658 43360 52793 29089, ˜

v₁₀≈8.10938 23221 15849 82527 77117, ˜

v11≈9.09377 74697 86680 89694 70616,

v12≈10.0806 86465 19733 08113 16376.

In particular, ˜v₂=pπ(2 log 2₋1); see[2].

Yet another constant in[8]. A similar but simpler integral to (4) appeared in[8], which is of the

form

K_d:=

Z ∞

(20)

(thisK_d is indeed theirK_d₋₁). By Mellin inversion formula fore−t, we obtain

K_d= 1

2πi Z

(c)

Γ(s)

Z _∞

(u+w)d−2(u+x)−dse−(w+x)d+xddxdudwds

= 1

2dπi Z

(c)

Γ(s)Γ(1+1

d −s)

Z ∞

(u+w)d−2(1+u)−ds(1+w)d₋1s−1−

d _d_u_d_w_d_s_.

Expanding the factor(u+w)d−2, we obtainK_d=P₀_≤_m_≤_d₋₂ d−_m2K_d_,_m, where

K_d_,_m:= 1

2dπi Z

(c)

Γ(s)Γ(1+1_d ₋s)

Z _∞

um(1+u)−dsdu

Z _∞

wd−2−m(1+w)d₋1s−1−

= 1

2dπi Z

(c)

Γ(s)Γ(1+1

d −s)B(m+1,ds−m−1)Um(s)ds. (7)

Here

Um(s):=

Z ∞

wd−2−m(1+w)d₋1s−1−

d _d_w

= 1

d Z 1

t−s(1₋t)s−1−1d

t−1d −1

d−2−m

= 1

0≤ℓ≤d−2−m

−2₋m

ℓ

(₋1)d−2−m−ℓB(1−s−_dℓ,s− 1_d).

Thus we obtain

K_d= 1

0≤m≤d−2

−2

0≤ℓ≤d−2−m

−2₋m

ℓ

(₋1)d−2−m−ℓm!Γ(ℓ+_d1)

×X

j≥0

Γ(j+1₋_dℓ)Γ(d j+d₋ℓ−m₋1)

j!Γ(d j+d−ℓ) −

Γ(j+1+1_d)Γ(d j+d₋m) Γ(j+1+ℓ+1

d )Γ(d j+d+1)

(21)

This readily gives, by converting the above series into hypergeometric functions, the numerical values of the first fewKd,

K2≈0.30714 28473 56944 02518 48954,

K₃_≈0.21288 24684 73220 99693 80676,

K₄_≈0.19494 67028 23033 18190 40460,

K₅_≈0.20723 21512 99671 45854 93769,

K6≈0.24331 17024 51836 72554 88428,

K7≈0.30744 56566 07893 22242 37300,

K₈_≈0.41127 01058 90385 83873 59349,

K₉_≈0.57571 68456 67243 64328 08087,

K₁₀_≈0.83615 82236 77116 00233 16115,

K11≈1.25179 63251 14070 86480 31485,

K12≈1.92201 04035 18847 36012 85304.

These are consistent with those given in Chiu and Quine (1997). In particular, K₂ = 1₄pπlog 2. Further simplification of this formula can be obtained as above, but the resulting integral expression is not much simpler than

Γ(1

Z 1 0

u−1d+v−

d −2

d−2

u−1−1dv−1−

u−1+v−1−1−1−

d _d_u_d_v_.

4 Asymptotics of the number of chain records

We consider in this section the number of chain records of random samples from d-dimensional simplex; the tools we use are different from [17] and apply also to chain records for hypercube random samples, which will be briefly discussed. For other types of results, see[17].

4.1 Chain records of random samples from

d

-dimensional simplex

Assume thatp₁, . . . ,p_nare iud in thed-dimensional simplexS_d. LetY_ndenote the number of chain records of this sample. ThenY_nsatisfies the recurrence

Y_n=d 1+Y_I_n (n_≥1), (8)

withY₀:=0, where

P(I_n=k) =πn,k=d

−1

Z 1

tkd(1₋td)n−1−k(1₋t)d−1dt, for 0_≤k<n. An alternative expression for the probability distributionπn,kis

πn,k=

−1

0≤j<d

−1

(₋1)j

Γ(n₋k)Γk+ j+_d1 Γn+ j+_d1

(22)

which is more useful from a computational point of view. Let

(z+1)_{· · ·}(z+d)₋d!=z Y

1≤ℓ<d

(z−λℓ),

where theλℓ’s are all complex (6∈R), except whend is even (in that case,−d−1 is the unique real

zero among_{λ1, . . . ,λd−1}). Interestingly, the same equation also arises in the analysis of random

increasingk-trees (see[9]), in some packing problem of intervals (see[4]), and in the analysis of sorting and searching problems (see[7]).

Theorem 3. The number of chain records Y_nfor random samples from d-dimensional simplex is

asymp-totically normally distributed in the following sense

sup

x∈R P



Yn−µslogn

σs

logn < x



₋Φ(x)

(logn)−1/2, (9)

whereµs:=1/(d Hd)andσs:=

H_d(2)/(d H3_d). The mean and the variance are asymptotic to

E[Yn] =

H_n

d H_d +c1+O(n

−ǫ₎_, ₍₁₀₎

V[Yn] =

H(_d2)

d H_d3Hn+c2+O(n

−ǫ₎_, ₍₁₁₎

respectively, for someǫ >0, where

c₁= 1

d H_d X

1≤ℓ<d

−λℓ

−ψ

ℓ

c2=

1 6+

π2

6d2H_d2 −

2H(_d3)

3H_d3 +

(H_d(2))2

2H_d4 +

d2H_d2 X

1≤ℓ<d

ψ′

−λℓ

−ψ′

_ℓ

+c1H

(2)

H2_d −

2d!

H_d X

j≥1

(d j+1)_{· · ·}(d j+d)(Hd j+d−Hd j)

((d j+1)_{· · ·}(d j+d)₋d!)2 .

Hereψ(x)denotes the derivative oflogΓ(x).

The error terms in (10) and (11) can be further refined, but we content ourselves with the current forms for simplicity.

Expected number of chain records. We begin with the proof of (10). Consider the meanµn :=

E[Y_n]. Thenµ0=0 and, by (8),

µn=1+

0≤k<n

πn,kµk (n≥1). (12)

Let ˜f(z):=e−zP_n_≥₀µnzn/n! denote the Poisson generating function ofµn. Then, by (12),

f(z) + f˜′(z) =1+d Z 1

(23)

Let ˜f(z) =P_n_≥₀µ˜nzn/n!. Taking the coefficients ofzn on both sides gives the recurrence

µn+µ˜n+1=

(d n+1)_{· · ·}(d n+d)µ˜n (n≥1).

Solving this recurrence using ˜µ1=1 yields

µn= (−1)n−1

1≤j<n

1− d!

(d j+1)_{· · ·}(d j+d)

(n≥1).

It follows that forn_≥1

µn=

1≤k≤n

µk=

1≤k≤n

(₋1)k−1 Y 1≤j<k

1₋ d!

(d j+1)_{· · ·}(d j+d)

. (13)

This is an identity with exponential cancelation terms; cf.[17]. In the special case whend=2, we have an identity

µn=

H_n+2 3 .

No such simple expression is available ford_≥3 since there are complex-conjugate zeros; see (14).

Exact solution of the general recurrence. In general, consider the recurrence

a_n=b_n+ X

0≤k<n

πn,kak (n≥1),

witha₀=0. Then the same approach used above leads to the recurrence ˜

an+1=−

1₋ d!

(d n+1)_{· · ·}(d n+d)

an+˜bn+˜bn+1,

which by iteration gives ˜

a_n+1=

0≤k≤n

(₋1)k˜b_n₋_k+˜b_n₋_k+1

0≤j<k

1₋ d!

(d(n− j) +1)_{· · ·}(d(n− j) +d)

by definingb₀=˜b₀=0. Then we obtain the closed-form solution

an=

1≤k≤n

ak.

A similar theory of “d-analogue” to that presented in [13] can be developed (by replacing 2d/jd

there byd!/((d j+1)_{· · ·}(d j+d))).

(1)

where the O-term can be made more explicit if needed. Note that to get this expression, we used the identity

(z+1)_{· · ·}(z+d)₋d!

z =

1≤j≤d

d!Γ(z+d− j+1) (d₋j+1)!Γ(z+1).

The probability generating function. LetP_n(y):=E[yYn_{]. Then}_P

0(y) =1 and forn≥1

P_n(y) = y X

0≤k<n

πn,kPk(y). The same procedure used above leads to

P_n(y) = X

0≤k≤n _n

(₋1)k Y

0≤j<k

1− d!y

(d j+1)_{· · ·}(d j+d)

=1+ (y₋1) X

1≤k≤n _n

(₋1)k−1 Y

1≤j<k

1₋ d!y

(d j+1)_{· · ·}(d j+d)

(n_≥0). Let now|y−1|be close to zero and

(z+1)_{· · ·}(z+d)₋d!y = Y

1≤ℓ≤d

z−λℓ(y)

Note that theλℓ’s are analytic functions of y. Letλd(y)denote the zero with λd(1) =0. Then we have

P_n(y) =1₋ y−1 2πi

Z 1−ǫ+i∞ 1−ǫ−i∞

Γ(n+1)Γ(₋s)

Γ(n+1−s) φ(s,y)ds, whereǫ >0 and

φ(s,y) =

Γs−λd(y)

d Γ(s+1)Γ1−λd(y)

1≤ℓ<d

Γs−λℓ(y) d

Γ1+ℓ d

Γs+ ℓ

Γ1−λℓ(y) d

Note that for y₆=1,φ(0,y) =1₋y. When y _∼1, the dominant zero isλd(y), and we then deduce that

P_n(y) =Q(y)nλd(y)/d₊_O(_|₁₋ _y_|_n−ǫ_),

where

Q(y):= d(y−1) λd(y)Γ(1+

λd(y)

d ) Y

1≤ℓ<d

Γλd(y)−λℓ(y) d

Γ1+_dℓ Γλd(y)+ℓ

Γ1₋λℓ(y) d

. By writing(z+1)_{· · ·}(z+d)₋d!y=0 as

(1+z)_{· · ·}

1+ z d

−1= y−1, and by Lagrange’s inversion formula, we obtain

λd(y) = y−1

Hd −

H2_d₋H_d(2)

2H3_d (y−1)

2₊_O

(2)

From this we then getQ(1) =1+O(_|y₋1_|)and λd(eη) =

η Hd

(2)

d 2H_d3η

−2HdH (3)

d −3(H

(2)

d )

6H5_d η

3₊_O( |η|4),

for small _|η|. This is a typical situation of the quasi-power framework (see [15, 22]), and we deduce (10), (11) and the Berry-Esseen bound (9). The expression forc2 is obtained by an ad-hoc

calculation based on computing the second moment (the expression obtained by the quasi-power framework being less explicit).

Whend=2, a direct calculation leads to the identity

V[Y_n] = 5 27Hn+

2π2

27 + H(2)_n

9 −

26 27−

2 9

X j≥1

   2j−1

j2 n+j

− 2j

(j+1₂)2 n+j+12

  ,

forn_≥1, which is also an asymptotic expansion. This is to be contrasted withE[Y_n] = (H_n+2)/3.

4.2 Chain records of random samples from hypercubes

In this case, we have, denoting still byY_n the number of chain records in iud random samples from [0, 1]d_,

Y_n=d 1+Y_I_n (n_≥1), withY0=0 and

P(I_n=k) = _n

−1 k

Z 1 0

tk(1₋t)n−1−k(−logt) d−1

(d₋1)! dt. LetP_n(y):=E[yYn_{]. Then the Poisson generating function ˜}P_(z,_y₎_:=_e−zP

n≥0Pn(y)zn/n! satisfies ˜

P_(z,_y_{) +} ∂ ∂z

P_(z,_y_{) =} _y Z 1

P_(tz,_y₎(−logt) d−1

(d₋1)! dt, with ˜P(0,y) =1. We then deduce that

Pn(y) =1+ X

1≤k≤n _n

(₋1)k Y

1≤j≤k

1₋ y jd

Consequently, by Rice’s integral representation[14], P_n(y) = 1

2πi

Z 1−ǫ+i∞ 1−ǫ−i∞

Γ(n+1)Γ(₋s) Γ(n+1₋s)Γ(s+1)d

1≤ℓ≤d

Γ(s+1−y1/de2ℓπi/d) Γ(1₋ y1/d_e2ℓπi/d₎ ds. If_|y₋1_|is close to zero, we deduce that

P_n(y) = n y1/d₋₁

Γ(y1/d₎1/d Y

1≤ℓ<d

Γ(y1/d(1₋e2ℓπi/d)) Γ(1₋y1/d_e2ℓπi/d₎

1+O(n−ǫ). A very similar analysis as above then leads to a Berry-Esseen bound forYn as follows.

(3)

Theorem 4. The number of chain records Y_n for iud random samples from the hypercube [0, 1]d satisfies

sup x∈R



Yn−µhlogn σh

logn < x



₋Φ(x) =O

(logn)−1/2,

whereµh=σh:=1/d. The mean and the variance are asymptotic to

E[Y_n] = 1

dlogn+γ+ 1 d

1≤ℓ<d

ψ(1−e2ℓπi/d) +O(n−ǫ),

V[Y_n] = 1

d2logn+

γ d −

π2 6d + 1

d2 X

1≤ℓ<d

ψ1−e2ℓπi/d+1−2e2ℓπi/dψ′1−e2ℓπi/d+O(n−ǫ),

for someǫ >0.

The asymptotic normality (without rate) was already established in[17]. In the special case whend=2, more explicit expressions are available

E[Y_n] = Hn+1

2 , V[Yn] =

Hn+Hn(2)−2

4 ,

forn_≥1.

5 Dominating records in the

d

-dimensional simplex

We consider the mean and the variance of the number of dominating records in this section. Let Zn denote the number of dominating records of n iud points p1, . . . ,pn in the d-dimensional simplexS_d.

Theorem 5. The mean and the variance of the number of dominating records for iud random samples from the d-dimensional simplex are given by

E[Z_n] = X

1≤k≤n

(d!)kΓ(k)d

Γ(d k+1), (15)

V[Z_n] =2 X

2≤k≤n

(d!)kΓ(k)d Γ(d k+1) H

(d)

k−1+

1≤k≤n

(d!)kΓ(k)d Γ(d k+1) −

1≤k≤n

(d!)kΓ(k)d Γ(d k+1)

, (16)

respectively. The corresponding expressions for iud random samples from hypercubes are given by H(_nd) and H(_nd)−H(2_n d), respectively.

(4)

Proof.

E[Z_n] = X

1≤k≤n

P p_k is a dominating record

= X

1≤k≤n (d!)k

Z Sd

1≤i≤d x_i

!k−1

= X

1≤k≤n (d!)k

k Y

1≤j<d

Γ(k)Γ(jk+1) Γ((j+1)k+1).

Thus, we obtain (15). For largenand boundedd, the partial sum converges to the series

E[Z_n]_→X k≥1

(d!)kΓ(k)d Γ(d k+1),

at an exponential rate. For larged, the right-hand side is asymptotic to

E[Z_n] =1+O

(d!)2

(2d)!

=1+O

4−dpd

, by Stirling’s formula.

Similarly, for the second moment, we have

E[Z_n2]₋E[Z_n] =2 X

2≤k≤n X

1≤j<k

Pp_j andp_k are both dominating records

=2 X

2≤k≤n X

1≤j<k (d!)k

Z Sd

y≺x

Y yi

j−1Y

xi k−j−1

dydx

=2 X

2≤k≤n (d!)k

Z Sd

   X

1≤j<k Z

y≺x

Y (y_i/xi)

j−1

dy   

x_ik−2 dx

=2 X

2≤k≤n

(d!)kH(_kd₋₁) Z

Y xi

k−1

=2 X

2≤k≤n

(d!)k_Γ(k)d Γ(d k+1) H

(d)

k−1,

and we obtain (16).

For largen, the right-hand side of (16) converges to

2X k≥2

(d!)kΓ(k)d Γ(d k+1) H

(d)

k−1+

X k≥1

(d!)kΓ(k)d Γ(d k+1) −

X k≥1

(d!)kΓ(k)d Γ(d k+1)

at an exponential rate, which, for large d, is asymptotic to 3pπd4−d. This explains the curves corresponding toZ_n in Figure 3.

(5)

Acknowledgements

We thank the referees and Alexander Gnedin for helpful comments.

References

[1] Z.-D. Bai, L. Devroye, H.-K. Hwang and H.-T. Tsai, Maxima in hypercubes,Random Structures Algorithms27(2005), 290–309. MR2162600

[2] Z.-D. Bai, H.-K. Hwang, W.-Q. Liang and T.-H. Tsai, Limit theorems for the number of maxima in random samples from planar regions, Electron. J. Probab. 6 (2001), no. 3, 41 pp. (elec-tronic). MR1816046

[3] J. Baik, P. Deift and K. Johansson, On the distribution of the length of the longest increasing subsequence of random permutations,J. Amer. Math. Soc.12(1999), 1119–1178. MR1682248 [4] Yu. Baryshnikov and A. Gnedin, Counting intervals in the packing process,Ann. Appl. Probab.

11(2001), 863–877. MR1865026

[5] W.-M. Chen, H.-K. Hwang and T.-H. Tsai, Maxima-finding algorithms for multidimensional samples: A two-phase approach, preprint, 2010.

[6] H.-H. Chern, M. Fuchs and H.-K. Hwang, Phase changes in random point quadtrees, ACM Trans. Algorithms3(2007), no. 2, Art. 12, 51 pp. MR2335295

[7] H.-H. Chern, H.-K. Hwang and T.-H. Tsai, An asymptotic theory for Cauchy-Euler differential equations with applications to the analysis of algorithms,J. Algorithms44 (2002), 177–225. MR1933199

[8] S. N. Chiu and M. P. Quine, Central limit theory for the number of seeds in a growth model in Rd with inhomogeneous Poisson arrivals,Ann. Appl. Probab.7(1997), 802–814. MR1459271 [9] A. Darrasse, H.-K. Hwang, O. Bodini and M. Soria, The connectivity-profile of random

increas-ingk-trees, preprint (2009); available at arXiv:0910.3639v1.

[10] J.-D. Deuschel and O. Zeitouni, Limiting curves for iid records,Ann. Probab.23(1995), 852– 878. MR1334175

[11] L. Devroye, Universal limit laws for depths in random trees,SIAM J. Comput.28(1999), 409– 432. MR1634354

[12] P. Flajolet, X. Gourdon and P. Dumas, Mellin transforms and asymptotics: harmonic sums, Theoret. Comput. Sci.144(1995), 3–58. MR1337752

[13] P. Flajolet, G. Labelle, L. Laforest and B. Salvy, Hypergeometrics and the cost structure of quadtrees,Random Structures Algorithms7(1995), 117–144. MR1369059

[14] P. Flajolet and R. Sedgewick, Mellin transforms and asymptotics: finite differences and Rice’s integrals,Theoret. Comput. Sci.144(1995), 101–124. MR1337755

(6)

[15] P. Flajolet and R. Sedgewick,Analytic Combinatorics, Cambridge University Press, Cambridge, 2009. MR2483235

[16] A. Gnedin, Records from a multivariate normal sample,Statist. Prob. Lett.39 (1998) 11–15. MR01649331

[17] A. Gnedin, The chain records,Electron. J. Probab.12(2007), 767–786. MR2318409

[18] C. M. Goldie and S. Resnick, Records in a partially ordered set,Ann. Probab17(1989), 678– 699. MR985384

[19] C. M. Goldie and S. Resnick, Many multivariate records,Stoch. Process. Appl.59(1995), 185– 216. MR1357651

[20] E. Hashorva and J. Hüsler, On asymptotics of multivariate integrals with applications to records,Stoch. Models18(2002), 41–69. MR1888285

[21] E. Hashorva and J. Hüsler, Multiple maxima in multivariate samples, Stoch. Prob. Letters 75 (2005), 11–17. MR2185605

[22] H.-K. Hwang, On convergence rates in the central limit theorems for combinatorial structures, European J. Combin.19(1998), 329–343. MR1621021

[23] M. Kałuszka, Estimates of some probabilities in multidimensional convex records,Appl. Math. (Warsaw)23(1995), 1–11. MR1330054

[24] A. M. Vershik and S. V. Kerov, Asymptotic behavior of the Plancherel measure of the sym-metric group and the limit form of Young tableaux,Soviet Math. Dokl.233(1977), 527–531. MR0480398

getdocfac8. 357KB Jun 04 2011 12:05:14 AM

Multivariate records based on dominance

Hsien-Kuei Hwang and Tsung-Hsi Tsai

Institute of Statistical Science

Academia Sinica

Taipei 115

Taiwan

1

Introduction

2

Asymptotics of the number of Pareto records

3

Numerical evaluations of the leading constants

4

Asymptotics of the number of chain records

4.1

Chain records of random samples from

d

-dimensional simplex

4.2

Chain records of random samples from hypercubes

5

Dominating records in the

d

-dimensional simplex

Acknowledgements

References

Parts

Dokumen yang terkait

AN ALIS IS YU RID IS PUT USAN BE B AS DAL AM P E RKAR A TIND AK P IDA NA P E NY E RTA AN M E L AK U K A N P R AK T IK K E DO K T E RA N YA NG M E N G A K IB ATK AN M ATINYA P AS IE N ( PUT USA N N O MOR: 9 0/PID.B /2011/ PN.MD O)

ANALISIS FAKTOR YANGMEMPENGARUHI FERTILITAS PASANGAN USIA SUBUR DI DESA SEMBORO KECAMATAN SEMBORO KABUPATEN JEMBER TAHUN 2011

EFEKTIVITAS PENDIDIKAN KESEHATAN TENTANG PERTOLONGAN PERTAMA PADA KECELAKAAN (P3K) TERHADAP SIKAP MASYARAKAT DALAM PENANGANAN KORBAN KECELAKAAN LALU LINTAS (Studi Di Wilayah RT 05 RW 04 Kelurahan Sukun Kota Malang)

FAKTOR Â– FAKTOR YANG MEMPENGARUHI PENYERAPAN TENAGA KERJA INDUSTRI PENGOLAHAN BESAR DAN MENENGAH PADA TINGKAT KABUPATEN / KOTA DI JAWA TIMUR TAHUN 2006 - 2011

A DISCOURSE ANALYSIS ON “SPA: REGAIN BALANCE OF YOUR INNER AND OUTER BEAUTY” IN THE JAKARTA POST ON 4 MARCH 2011

Pengaruh kualitas aktiva produktif dan non performing financing terhadap return on asset perbankan syariah (Studi Pada 3 Bank Umum Syariah Tahun 2011 – 2014)

Pengaruh pemahaman fiqh muamalat mahasiswa terhadap keputusan membeli produk fashion palsu (study pada mahasiswa angkatan 2011 & 2012 prodi muamalat fakultas syariah dan hukum UIN Syarif Hidayatullah Jakarta)

Pendidikan Agama Islam Untuk Kelas 3 SD Kelas 3 Suyanto Suyoto 2011

ANALISIS NOTA KESEPAHAMAN ANTARA BANK INDONESIA, POLRI, DAN KEJAKSAAN REPUBLIK INDONESIA TAHUN 2011 SEBAGAI MEKANISME PERCEPATAN PENANGANAN TINDAK PIDANA PERBANKAN KHUSUSNYA BANK INDONESIA SEBAGAI PIHAK PELAPOR

KOORDINASI OTORITAS JASA KEUANGAN (OJK) DENGAN LEMBAGA PENJAMIN SIMPANAN (LPS) DAN BANK INDONESIA (BI) DALAM UPAYA PENANGANAN BANK BERMASALAH BERDASARKAN UNDANG-UNDANG RI NOMOR 21 TAHUN 2011 TENTANG OTORITAS JASA KEUANGAN

Dokumen yang Anda mencari sudah siap untuk unduhkan