Statistics authors titles recent submissions

Parameter Regimes in Partial Functional
Panel Regression


arXiv:1709.05786v1 [stat.ME] 18 Sep 2017

Dominik Liebl
University of Bonn
Institute for Financial Economics and Statistics
dliebl@uni-bonn.de
and
Fabian Walders
University of Bonn
BGSE & Institute for Financial Economics and Statistics
fwalders@uni-bonn.de

Abstract
We propose a partial functional linear regression model for panel data with time
varying parameters. The parameter vector of the multivariate model component is
allowed to be completely time varying while the function-valued parameter of the
functional model component is assumed to change over K < ∞ unknown parameter

regimes. We derive consistency for the suggested estimators and for our classification procedure used to detect the K unknown parameter regimes. In addition, we
derive the convergence rates of our estimators under a double asymptotic where we
differentiate among different asymptotic scenarios depending on the relative order of
the panel dimensions n and T . The statistical model is motivated by our real data
application, where we consider the so-called “idiosyncratic volatility puzzle” using
high frequency data from the S&P500.

Keywords: functional data analysis, mixed data, partial functional linear regression model,
classification, idiosyncratic volatility puzzle

We want to thank Alois Kneip (University of Bonn) and Michael Vogt (University of Bonn) for fruitful
discussions and valuable comments which helped to improve this research work.


1

1

Introduction


The availability of mixed—functional and multivariate—data types and the need to analyze
such data types appropriately, has trigged the development of new statistical models and
procedures. In this work we consider the so-called partial functional linear model for
scalar responses, which combines the classical functional linear regression model (see, e.g.,
Hall and Horowitz, 2007) with the classical multivariate regression model. This model
was first proposed by Zhang et al. (2007) and Schipper et al. (2008)—two mixed effects
modeling approaches. The first theoretical work is by Shin (2009), who uses a functionalprincipal-components-based estimation procedure and derives convergence rates for the case
of independent cross-sectional data. Recently, the partial functional linear regression model
was extended in several directions. Shin and Lee (2012) consider the case of prediction, Lu
et al. (2014) and Tang and Cheng (2014) focus on quantile regression, Kong et al. (2016)
consider the case of a high-dimensional multivariate model component, Peng et al. (2016)
allow for varying coefficients in the multivariate model component, and Wang et al. (2016)
and Ding et al. (2017) are concerned with a functional single-index model component.
Motivated by our real data application, we deviate from this literature and contribute a
new partial functional linear panel regression model with time varying parameters allowing
for K < ∞ latent parameter regimes that can be estimated from the data. In the theoretical
part of this work we show consistency of our estimators and of our unsupervised classification procedure identifying the K parameter regimes. In addition, we derive the convergence
rates of our estimators under a double asymptotic where we differentiate among different
asymptotic scenarios depending on the relative order of the panel dimensions n and T .
The consideration of time varying parameters is quite novel in the literature on functional data analysis. To the best of our knowledge, the only other work concerned with this

issue is Horváth and Reeder (2012), who focus on testing the hypothesis of a time constant
parameter function in the case of a classical fully-functional regression model. Closely related to the partial functional linear model is the so-called Semi-Functional Partial Linear
(SFPL) proposed by Aneiros-Pérez and Vieu (2006), where the functional component consists of a nonparametric functional regression model instead of a functional linear regression
model. The SFPL model is further investigated by Aneiros-Pérez and Vieu (2008), Lian
(2011), Zhou and Chen (2012), and Aneiros-Pérez and Vieu (2013), among others. Readers
with a general interest in functional data analysis are referred to the textbooks of Ramsay
and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), and Hsing
and Eubank (2015).
The usefulness of our model and the applicability of our estimation procedure is demonstrated by means of a simulation study and a real data application. For the latter we consider the so-called “idiosyncratic volatility puzzle” using high frequency stock-level data
from the S&P500. Our model allows us to consider this puzzle at a much less aggregated
time scale than considered so far in the literature. This leads to new insights into the
temporal heterogeneity in the pricing of the idiosyncratic risk component.
The remainder of this work is structured as follows. In Sections 2 and 3 we introduce
the model and present the estimation procedure. Section 4 contains our main assumptions
and asymptotic results. Section 5 discusses the practical choice of the tuning parameters
2

involved. The finite sample performance of the estimators is explored in Section 6. Section
7 offers an empirical study examining regime dependent pricing of idiosyncratic risk in the
US stock market. Section 8 contains a short conclusion. All proofs can be found in the

online supplement supporting this article.

2

Model

We introduce a partial linear regression model for panel data which allows us to model
the time varying effect of a square integrable random function Xit ∈ L2 ([0, 1]) on a scalar
response yit in the presence of a finite dimensional random variable zit ∈ RP . Indexing the
cross-section units i = 1, . . . , n and time points t = 1, . . . , T , our statistical model reads as
Z 1
αt (s)Xit (s)ds + βt⊤ zit + ǫit ,
(1)
yit = µt +
0

where µt is a time fixed effect1 , αt ∈ L2 ([0, 1]) is a time varying deterministic functional
parameter, βt ∈ RP is a time varying deterministic parameter vector, and ǫit is a scalar error
term with finite but potentially time heteroscedastic variances (see also our assumptions
in Section 4).

The unknown function-valued parameters αt are assumed to differ across unknown
time regimes Gk ⊂ {1, . . . , T }. That is, each regime Gk is associated with a regime specific
parameter function Ak ∈ L2 [0, 1], such that
αt (s) ≡ Ak (s) if t ∈ Gk .

(2)

The regimes G1 , . . . , GK are mutually exclusive subsets of {1, . . . , T } and do not have to
consist of subsequent time points t. We have in mind the situation where Gk is a collection
of time points t that belong to the kth risk regime. The kth risk regime is described by
the function-valued slope parameter Ak —the vector-valued slope parameters βt describe the
marginal effects of additional control variables zit . The joint and the marginal distributions
of Xit , zit and ǫit are allowed to vary over the different regimes Gk .
Model (1) nests several different model specifications. It might be the case that K = 1
and hence G1 = {1, . . . , T }. In this situation the effect of the random function on the
response is time invariant. The classical functional or the classical multivariate linear
regression model are obtained if βt = 0 or αt = 0 for all t = 1, . . . , T .

3


Estimation

Our objective is to estimate the model parameters Ak , βt , and the regimes G1 , . . . , GK
from realizations of the random variables {(yit , Xit , zit ) : 1 ≤ i ≤ n, 1 ≤ t ≤ T }. For
this, we propose a three-step estimation procedure. The first step is a pre-estimation step
where model (1) is fitted to the data separately for each t = 1, . . . , T . This pre-estimation
1

We use the term time fixed effect in the sense that µt is a latent time specific random variable, which
is potentially correlated with the regressors.

3

step reveals information about the regime memberships, which is used in the second step
where we apply our unsupervised classification procedure in order to estimate the regimes
G1 , . . . , GK . The third step is the final estimation step, in which we improve estimation of
the functional parameter Ak by employing the information about the regime membership
gathered in step two. The general procedure is inspired by the work of Vogt and Linton
(2017), but differs from it as we consider a functional data context which demands for a
different estimation procedure. In the following we explain the three estimation steps in

more detail:
Step 1 In this step, we pre-estimate the parameters αt and compute the final estimates
of βt separately for each t = 1, . . . , T . Estimation starts from removing the fixed effect
µt using a classical within-transformation. For this we denote the centered variables
Pn as
c
c
c
c
−1
yit = yit −Pȳt , Xti = Xit − X̄P
t , zit = zit − z̄it , and ǫ
it = ǫit − ǭit , where ȳt = n
i=1 yit ,
P
n
n
n
−1
−1

−1
X̄t = n
i=1 Xit , z̄t = n
i=1 zit , and ǭt = n
i=1 ǫit . Then, the within-transformed
version of model (1) is
Z 1
c
αt (s)Xitc (s)ds + βt⊤ zitc + ǫcit .
yit =
0

By adapting the methodology in Hall and Horowitz (2007), we estimateP
the slope parameter
t
αt using (t-wise) truncated series expansions of αt and Xit , i.e., αt (s) ≈ m
j=1 hαt , φ̂jt iφ̂jt (s) =
Pmt
Pmt
c

c
j=1 hXit , φ̂jt iφ̂jt (s), where φ̂jt denotes the eigenfunction corj=1 ajt φ̂jt (s) and Xit (s) ≈
responding to the jth largest eigenvalue λ̂jt of the empirical covariance operator Γ̂t , with
(Γ̂t x)(u) :=

Z

1

K̂Xt (u, v)dv
0

for all x ∈ L2 ([0, 1])

n

1X c
and K̂Xt (u, v) :=
Xit (u)Xitc (v).
n i=1

The eigenfunctions φ̂jt and eigenvalues λ̂jt are defined as the solutions of the eigen-equations
R1
K̂Xt (u, v)φ̂jt (u)dv = λ̂jt φ̂jt (u), where hφ̂jt , φ̂ℓt i = 1 for all j = ℓ and hφ̂jt , φ̂ℓt i = 0 if j 6= ℓ,
0
where j, ℓ ∈ {1, 2, . . . , n}. This leads to the following pre-estimators of the functional slope
parameter αt and the final estimators of the vector-valued slope parameter βt :
mt
X

n

1X c
hXit , φ̂jt i(yitc − β̂t⊤ zitc ),
n
i=1
j=1
i

i−1 h
h

K̂zy,t − Φ̂t K̂yX,t ,
β̂t = K̂z,t − Φ̂t (K̂zX,t )
α̂t =

âjt φ̂jt ,

with âjt = λ̂−1
jt

4

and

where
n

K̂z,t

1 X c c⊤
:=
zit zit , K̂zX,t (s) := [K̂z1 X,t (s), . . . , K̂zP X,t (s)]⊤ , K̂zy,t := [K̂z1 y,t , . . . , K̂zP y,t ]⊤ ,
n i=1
n

K̂yX,t (s) :=

1X c c
y X (s),
n i=1 it it

n

K̂zp X,t (s) :=


Φ̂t (g) := [Φ̂1,t (g), . . . , Φ̂P,t (g)] ,

1X c
z X c (s),
n i=1 p,it it

Φ̂p,t (g) :=

mt
X
hK̂zp X,t , φ̂jt ihφ̂jt , gi

λ̂jt

j=1

and Φ̂t (K̂zX,t ) := [Φ̂p,t (K̂zq X,t )]1≤p≤P,

n

K̂zp y,t :=

1X c c
z y ,
n i=1 p,it it

for any g ∈ L2 ([0, 1]),

1≤q≤P ;

see Shin (2009) for similar estimators in a cross-section context.
For our theoretical analysis, we let mt = mt,nT → ∞ as n, T → ∞. In practice, the
cut-off parameter mt can be chosen, for instance, by Cross Validation (CV) or by a suitable
information criterion as introduced in Section 5.
Besides computing the final estimates β̂t of βt , this first estimation step is intended to
pave the way for the classification procedure in Step 2 where we aim to detect systematic
differences in the estimates α̂t and α̂s across different time points t 6= s. For this it
is necessary to distinguish systematically large from small differences between estimators,
which can be achieved by comparing these differences to an appropriate threshold. However,
the estimators α̂t are not well suited for deriving a practically useful threshold parameter.
We, therefore, compute the following transformed estimates, for which it is straightforward
to derive a valid threshold parameter using distributional arguments (see Section 5):
(∆)
α̂t

:=

m
1/2
X
λ̂jt

σ̂ǫ,t

j=1

2
where σ̂ǫ,t
:= n−1

Pn

i=1

yitc − hα̂t , Xitc i + β̂t⊤ zitc

2

âjt φ̂jt ,

(3)

and m := min1≤t≤T mt .

Step 2 In this step, we use the scaled estimators α̂t∆ of (3) to classify time points t = 1 . . . , T
into regimes G1 , . . . , GK . Our classification algorithm tries to detect systematic differences
ˆ ts := ||α̂t(∆) − α̂s(∆) ||22 , where ||.||22 denotes the squared L2 norm
in the empirical distances ∆
defined as ||x||22 = hx, xi for all x ∈ L2 ([0, 1]).
ˆ ts . If ∆
ˆ ts
The algorithm detects regimes by iteratively searching for large differences ∆
exceeds the value of a threshold parameter τnT > 0, it classifies periods t and s in different
regimes. The procedure is initialized by setting S (0) := {1, . . . , T } and then iterates over
k:
while |S (k) | > 0 do
select any t ∈ S (k) , Ĝk+1 ← ∅, S (k+1) ← ∅
for s ∈ S (k) do
ˆ ts ≤ τnT then
if ∆
Ĝk+1 ← Ĝk+1 ∪ {s}
else S (k+1) ← S (k+1) ∪ {s}
5

end if
end for
end while
The algorithm stops as soon as all time points t are classified into regimes and the total
number K̂ of estimated regimes Ĝ1 , . . . , ĜK̂ can serve as a natural estimate for the true K.
Our theoretical results show that this procedure consistently estimates the true regimes Gk
and the true number K. However, in order to improve the classification in finite samples, we
propose to augment the classification algorithm using a Kmax parameter where K ≤ Kmax ,
where the practical choice of Kmax is described in Section 5. Then the algorithm stops
after at most Kmax − 1 iterations and assigns all remaining time points t to the final regime
ĜKmax , such that K̂ ≤ Kmax .
Step 3 In this step, we build upon the regime structure determined in Step 2 in order
to estimate Ak , k = 1, . . . , K̂. Let Xitcc denote
the regime specific centered functional
P
regressors defined as Xitcc := Xit − |Ĝk |−1 t∈Ĝk X̄t and define the k-specific empirical
covariance operator Γ̃k by
Z 1
(Γ̃k x)(u) :=
K̃Xk (u, v)dv for all x ∈ L2 ([0, 1])
0

where K̃Xk (u, v) :=

1

n|Ĝk |

n X
X

Xitcc (u)Xitcc (v).

i=1 t∈Ĝk

Our final estimator of Ak is obtained, as were the pre-estimators α̂t , according to
Ãk =

m̃k
X

ãj,k φ̃j,k ,

with ãj,k =

j=1

λ̃−1
jk

1
n|Ĝk |

n X
X

i=1 t∈Ĝk

hφ̃j,k , Xitcc i(yitc − β̂t⊤ zitc ),

where (λ̃jk , φ̃jk )j≥1 denotes the eigenvalue-eigenfunction pairs of the empirical covariance
operator Γ̃k . Again, for our theoretical analysis, we let m̃k = mk,nT → ∞ as n, T → ∞.
In practice the cut-off parameter m̃k can be chosen, for instance, by CV or by a suitable
information criterion as introduced in Section 5.

4

Asymptotic Theory

In the asymptotic analysis of our estimators we need to address two problems: First, there
is a classification error contaminating the estimation of Ak . Second, the estimation of tspecific parameters βt cannot be separated from the estimation of the regime specific Ak .
In the following we list our theoretical assumptions, allowing us to deal with these aspects
in a large sample framework.
A1

1. {(Xit , zit , ǫit ) : 1 ≤ i ≤ n, t ∈ Gk } are strictly stationary and independent
over the index i for any 1 ≤ k ≤ K. Further, the errors ǫit are centered and
independent over the index 1 ≤ t ≤ T as well.
6

2. For every 1 ≤ k ≤ K and 1 ≤ i ≤ n, the sequence {Xit : t ∈ Gk } is L4 -m
approximable in the sense of Definition 2.1 in Hörmann and Kokoszka (2010).
3. For every 1 ≤ k ≤ K and 1 ≤ i ≤ n, the sequence {zit : t ∈ Gk } is m-dependent.

4. Suppose that E [||Xit ||42 ] < ∞, E [zit4 ] < ∞, E [ǫ4it ] < ∞ for any 1 ≤ i ≤ n and
1 ≤ t ≤ T,

5. ǫit is independent of Xjs and zjs for any 1 ≤ i, j ≤ n and 1 ≤ t, s ≤ T .

A2 Suppose there exist constants 0 < Cλ , Cλ′ , Cθ , Ca , CzX , Cβ < ∞, such that the following
holds for every k = 1, . . . , K:
1. Cλ−1 j −µ ≤ λjk ≤ Cλ j −µ and λjk − λj+1,k ≥ Cλ′ j −(µ+1) , j ≥ 1 for the eigenvalues
λ1,k > λ2,k > . . . of the covariance operator Γk of Xit , t ∈ Gk , and a µ > 1,
2. E [hXit , φjk i4 ] ≤ Cθ λ2jk for the eigenfunction φjk of Γk corresponding to the jth
eigenvalue,
3. |hAk , φjk i| ≤ Ca j −ν uniformly over j ≥ 1

4. |hKzp X,k , φjk i| ≤ CzX j −(µ+ν) , for any 1 ≤ p ≤ P , where Kzp X,k := E[Xit zp,it ],

2
5. sup1≤t≤T βp,t
≤ Cβ , for any 1 ≤ p ≤ P , with βp,t the p-th coordinate in βt and

A3 Let n → ∞ and T → ∞ jointly, such that T ∝ nδ for some 0 < δ < 1 and |Gk | ∝ T .
A4 Suppose that ν > max{r1 , r2 }, where r1 := 3 − 21 µ and r2 :=

3+2µ
2(1−δ)

− 21 µ.
1

1

A5 Suppose that mt = mt,nT and m̃k = m̃k,nT with mt ∝ n µ+2ν and m̃k ∝ (n|Gk |) µ+2ν for
any 1 ≤ t ≤ T and 1 ≤ k ≤ K.
A6 Suppose that the random variables
sp,it := (zp,it − E[zp,it ]) −

Z

1
0

(Xit (u) − E[Xit ](u))


X
hKzp X,k , φjk i
j=1

λjk

!

φjk (u) du

are iid over the index i for any given t = 1, . . . , T as well as stationary, ergodic and
at most m-dependent over the index t within every regime Gk , k = 1, . . . , K. Let
E[sp,it |X1t , . . . , Xnt ] = 0 and let [E[sp,it sq,it |X1t , . . . , Xnt ]]1≤p,q≤P be a positive definite
matrix.
A7

1. There exists some C∆ > 0 such that for any 1 ≤ k ≤ K and any t ∈ Gk
(
≥ C∆ if s 6∈ Gk
(∆)
||αt − αs(∆) ||22 =: ∆ts
=0
if s ∈ Gk ,
(∆)

−1
:= σǫ,l

P∞

1/2

2
λjl hαr , φjl iφjl and σǫ,l
:= E[ǫ2ir ] for r ∈ Gl .


ˆ ts ≤ τnT → 1 as
2. The threshold parameter τnT → 0 satisfies P maxt,s∈Gk ∆
n, T → ∞ for all 1 ≤ k ≤ K.

where αr

j=1

7

Assumptions A1-A6 correspond to existing standard assumptions in the literature (see
Hall and Horowitz (2007) and Shin (2009)), adapted to our panel data version of the
partial functional linear regression model. Assumption A7 is a slightly modified version of
Assumption Cτ in Vogt and Linton (2017).
Our theoretical results establish the consistency of our classification procedure and the
convergence rates for our estimators. The convergence rates of the period-wise estimators
β̂t and α̂t of Step 1 of our estimation procedure are provided in Theorem 4.1. Lemma 4.1
establishes the uniform consistency of β̂t over all t = 1, . . . , T , which is a necessary property
for showing the consistency of our classification procedure in Theorem 4.2. Finally, Theorem 4.3 establishes the convergence rates of our estimator Ãk of Step 3 of our estimation
procedure.
Theorem 4.1 Given Assumptions A1–A6 it holds for all 1 ≤ t ≤ T that

||β̂t − βt ||2 = Op n−1
 1−2ν 
2
||α̂t − αt ||2 = Op n µ+2ν ,

where ||.|| denotes the Euclidean norm and ||.||2 the L2 norm.

Theorem 4.1 is related to the Theorems 3.1 and 3.2 in Shin (2009), though our proof
deviates from that in Shin (2009) at important instances and considers the case of panel
data. The above rates for α̂t correspond to the rates in the cross-section context of Hall
and Horowitz (2007). Theorem 4.1 is necessary to show consistency of Ãk ; however, it is
not sufficient to establish consistency of our classification algorithm. For this, we need the
following uniform consistency results:
Lemma 4.1 Under Assumptions A1–A6, it holds that

and

max1≤t≤T ||β̂t − βt ||2 = o(1), max1≤t≤T ||α̂t − αt ||22 = o(1),
(∆)
(∆)
max1≤t≤T ||α̂t − αt ||22 = o(1).

The following theorem establishes consistency of our classification procedure and is
based on our results in Lemma 4.1:
Theorem 4.2 Given Assumptions A1–A7 it holds that


P {Ĝ1 , . . . , ĜK̂ } =
6 {G1 , . . . , GK } = o(1).

The statement of Theorem 4.2 is twofold. First, it says that the number of groups K
is asymptotically correctly determined. Second, it says that the estimator Ĝk consistently
estimates the regime Gk . This notion of classification consistency is sufficient to achieve
the following rates of convergence for our estimators Ãk from Step 3 of our estimation
procedure:

8

Theorem 4.3 Given Assumptions A1–A7 it holds for all 1 ≤ k ≤ K̂ that
(
Op 
(n−1 )
if ν ≥ 1+µ+δ


2
1−2ν
||Ãk − Ak ||2 =
1+µ+δ
µ+2ν
if ν ≤ 2δ
Op (nT )
Theorem 4.3 quantifies the extent to which the estimation error ||β̂t − βt || contaminates
the estimation of Ak . In the first case (ν ≥ (1 + µ + δ)/2δ), n diverges relatively slowly in
comparison to T and, therefore, the contamination due to estimating βt is not negligible.
This results in the relatively slow convergence rate of n−1/2 , where the attribute “slow”
has to be seen in relation to our panel context with n → ∞ and T → ∞. In the second
case (ν ≤ (1 + µ + δ)/2δ), n diverges sufficiently fast such that the contamination due to
estimating βt becomes asymptotically negligible, which results in the faster convergence
rate of (nT )(1−2ν)/(µ+2ν) . The latter rate coincides with the minimax optimal convergence
result in Hall and Horowitz (2007).

Practical Choice of Tuning Parameters τnT , mt, m̃k
and Kmax

5

Inspired by the thresholding procedure in Vogt and Linton (2017), we suggest choosing the
ˆ ts = ||α̂t(∆) − α̂s(∆) ||2 under the
threshold parameter τnT based on an approximate law for ∆
2
hypothesis that t and s belong to the same regime Gk . As argued in the supplement of this
paper, the scaling of the estimators α̂t and α̂s as suggested in (3) leads, for large n, to
n (∆)

∆ts = ||α̂t − α̂s(∆) ||22 ∼ χ2m
2
2

approximately.

Hence we recommend setting the threshold τnT to be 2/n times the ατ -quantile of a χ2m
distribution, where ατ is very close to one, for instance, ατ = 0.99 or ατ = 0.999. By
scaling this quantile with 2/n, the threshold converges to zero as n tends to infinity. The
decay of the threshold τnT is sufficiently slow in order to satisfy our Assumption 7 (see
Section A.7 of the supplemental paper for more details).
For selecting the truncation parameters mt and m̃k the literature offers two general
strategies2 . The first strategy is to choose the truncation parameters in order to find an
optimal prediction. A cross validation procedure is shown, e.g., in Shin (2009). The second
one is to choose the cut-off level according to the covariance structure of the functional
regressor. This is for large sample sizes particularly convenient from a computational point
of view. We thus suggest choosing mt , and m̃k according to the eigenvalue ratio criterion
suggested in Ahn and Horenstein (2013). This choice obtains according to
mt = arg max λ̂t,l /λ̂t,l+1 ,
1≤l