
International Journal of Theoretical and Applied Finance
Vol. 19, No. 1 (2016) 1650003 (25 pages)
© World Scientific Publishing Company

DOI: 10.1142/S0219024916500035

SHRINKAGE ESTIMATION
OF MEAN-VARIANCE PORTFOLIO

YAN LIU
Department of Finance
Ocean University of China
No. 238 Song Ling Road, Qing Dao
Shan Dong 266100, P. R. China
liuyan ouc@126.com
NGAI HANG CHAN∗
Department of Statistics
The Chinese University of Hong Kong
N.T. Shatin, Hong Kong, P. R. China
nhchan@sta.cuhk.edu.hk
CHI TIM NG

Department of Statistics
Chonnam National University
77 Yongbong-ro, Buk-gu
Gwangju 500-757, Republic of Korea
easterlyng@gmail.com
SAMUEL PO SHING WONG
Department of Statistics
The Chinese University of Hong Kong
N.T. Shatin, Hong Kong, P. R. China
samwong@sta.cuhk.edu.hk
Received 10 April 2015
Accepted 14 October 2015
Published 4 February 2016

This paper studies the optimal expected gain/loss of a portfolio at a given risk level when
the initial investment is zero and the number of stocks p grows with the sample size n.
A new estimator of the optimal expected gain/loss of such a portfolio is proposed after
examining the behavior of the sample mean vector and the sample covariance matrix
based on conditional expectations. It is found that the effect of the sample mean vector is
additive and the effect of the sample covariance matrix is multiplicative, both of which
over-predict the optimal expected gain/loss. By virtue of a shrinkage method, a new
estimate is proposed when the sample covariance matrix is not invertible. The superiority
of the proposed estimator is demonstrated by matrix inequalities and simulation studies.
Keywords: Investment analysis; matrix inequalities; mean-variance portfolio; shrinkage
covariance matrix.

∗ Corresponding author.

1. Introduction
As a cornerstone of modern portfolio theory, the mean-variance (MV) optimization procedure (Markowitz 1952) has always been a vibrant research topic since
its establishment. Based on the assumptions that asset returns are normally distributed, all the investors are rational, risk-averse and aim to maximize economic
utility, the MV optimization procedure characterizes the asset allocation problem
as a trade-off between risk and expected gain/loss. It specifically studies two issues:
maximizing the expected gain/loss at a given risk or minimizing the risk at a given
expected gain/loss, both of which lead to the formulation of the efficient frontier,
from which the investors choose the optimal portfolios. Although the theory is mathematically
elegant, practitioners often find it difficult to locate the optimal portfolios on the
efficient frontier. Most of the time, the portfolios selected according to the
theory are even inferior to equally weighted portfolios; see Frankfurter et al.
(1971). The development of the theory of large covariance matrices provides new evidence for the
failure of the MV optimization procedure. When the number of stocks p grows
with the sample size n, which is often the case in the financial market, p/n plays
an important role in controlling the behavior of the MV procedure; see Bai et al.
(2009a). Because the optimal expected gain/loss is an important criterion for comparing different portfolios at the same risk level, in this paper, we focus on the issue
of optimal expected gain/loss at a given risk level under the assumption that the
dimension to sample size ratio p/n goes to a nonzero constant y ∈ (0, ∞) as n → ∞.
1.1. Modern portfolio theory
1.1.1. Portfolio with zero initial investment
Consider a portfolio consisting of a set of long and short investments such that
the sum of investments is zero. This means that the acquisition of long position is
financed by short-selling. Examples are hedges, swaps, overlays, arbitrage portfolios
and long/short portfolios, see Korkie & Turtle (2002). Further assume that there is
no restriction on the short-selling activities.
1.1.2. Mean-variance optimization procedure
Suppose that there are p stocks in total with returns given by $x = (x_1, x_2, \ldots, x_p)^{T}$,
and that x follows a p-dimensional multivariate normal distribution with mean µ and
covariance matrix Σ. Let the gain/loss of the portfolio be $R = \omega^{T}x$. Herein, $\omega = (\omega_1, \omega_2, \ldots, \omega_p)^{T}$ contains the amounts of investments in the p stocks. In this paper,
short sales are allowed, which means that the components of ω can be negative.
Zero initial investment means that $\omega^{T}\mathbf{1} = 0$, where $\mathbf{1} = (1, \ldots, 1)^{T}$.
Since we only focus on the analysis of optimal expected gain/loss at a given risk
in this study, the problem can be described as follows:
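$$
P^{*} = \max_{\omega} E(R) = \max_{\omega} \omega^{T}\mu, \tag{1.1}
$$

subject to

$$
\omega^{T}\Sigma\omega = \sigma_0^{2}, \qquad \omega^{T}\mathbf{1} = 0, \tag{1.2}
$$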

where P = E(R) is the expected gain/loss of the portfolio and σ0 characterizes the
given risk. By the Lagrange multiplier method, the optimal expected gain/loss is
given by

$$
P^{*} = \sigma_0 \sqrt{\mu^{T}\Sigma^{-1}\mu - \frac{(\mathbf{1}^{T}\Sigma^{-1}\mu)^{2}}{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}}}, \tag{1.3}
$$

and the corresponding ω is given by
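$$
\omega^{*} = \frac{\sigma_0}{\sqrt{\mu^{T}\Sigma^{-1}\mu - \dfrac{(\mathbf{1}^{T}\Sigma^{-1}\mu)^{2}}{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}}}} \left( \Sigma^{-1}\mu - \frac{\mathbf{1}^{T}\Sigma^{-1}\mu}{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}}\, \Sigma^{-1}\mathbf{1} \right). \tag{1.4}
$$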

Note that in the above expressions, both µ and Σ refer to the true values, not the
estimated values.
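
The closed-form solution (1.3)-(1.4) can be checked numerically. The following is a minimal sketch, assuming NumPy; the three-asset mean vector and covariance matrix are hypothetical values chosen only for illustration, and the helper `h` is our own name for the matrix function that the paper introduces in (1.6).

```python
import numpy as np

def h(A):
    """Return h(A) = A - A 1 1^T A / (1^T A 1) for a symmetric matrix A."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def optimal_portfolio(mu, sigma, sigma0=1.0):
    """Optimal expected gain/loss (1.3) and weights (1.4) for known mu and Sigma."""
    H = h(np.linalg.inv(sigma))
    gain_sq = mu @ H @ mu                      # equals (P* / sigma0)^2
    p_star = sigma0 * np.sqrt(gain_sq)
    omega = sigma0 / np.sqrt(gain_sq) * (H @ mu)
    return p_star, omega

# Hypothetical three-asset example (illustrative values only).
mu = np.array([0.05, 0.02, -0.01])
sigma = np.array([[1.0, 0.4, 0.4],
                  [0.4, 1.0, 0.4],
                  [0.4, 0.4, 1.0]])
p_star, omega = optimal_portfolio(mu, sigma, sigma0=1.0)

print("P* =", p_star)
print("sum of weights =", omega.sum())        # zero initial investment
print("risk =", omega @ sigma @ omega)        # equals sigma0^2
```

In this sketch the weights sum to zero and satisfy the risk constraint up to rounding error, and the attained expected gain/loss `omega @ mu` coincides with `p_star`.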
1.1.3. Plug-in method
Because the mean vector and the covariance matrix of the returns are unknown,
traditionally, the portfolio analysis proceeds in two steps: (a) the sample mean µ̂
and the sample covariance matrix S of returns are estimated from a time series of
historical returns; (b) then the MV problem is solved as if the sample estimates
were true values. This “certainty equivalence” viewpoint is also called the plug-in
method, and the plug-in optimal expected gain/loss becomes

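$$
\hat{P}^{*} = \sigma_0 \sqrt{\hat{\mu}^{T}S^{-1}\hat{\mu} - \frac{(\mathbf{1}^{T}S^{-1}\hat{\mu})^{2}}{\mathbf{1}^{T}S^{-1}\mathbf{1}}} = \sigma_0 \sqrt{\hat{\mu}^{T}h(S^{-1})\hat{\mu}}, \tag{1.5}
$$
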
where the function h is defined as

$$
h(S^{-1}) = S^{-1} - \frac{S^{-1}\mathbf{1}\mathbf{1}^{T}S^{-1}}{\mathbf{1}^{T}S^{-1}\mathbf{1}}. \tag{1.6}
$$

Without loss of generality, the value σ0 is assumed to be one because it does not
affect the analysis. Suppose that the historical data are generated from a sequence of
independent N(µ, Σ) random vectors. The sample mean and the sample covariance
matrix are defined as

$$
\hat{\mu} = n^{-1}\sum_{t=1}^{n} x_t \quad \text{and} \quad S = n^{-1}\sum_{t=1}^{n} (x_t - \mu)(x_t - \mu)^{T}. \tag{1.7}
$$

If we can get a better estimate of $P^{*2}$, then by taking its square root, we can get
a better estimate of $P^{*}$. Thus in this study, we analyze $\hat{P}^{*2}$.
1.2. Literature review

Stein (1956) proved that for a p-dimensional multivariate normal distribution, with
p ≥ 3, the sample mean vector is not admissible under a quadratic loss function.
He proposed a so-called James–Stein estimator (James & Stein 1961). The essence
of this estimator is that it shrinks the maximum likelihood estimator towards a
common value, which leads to a uniformly lower risk than the sample mean vector. Ledoit & Wolf (2003, 2004a,b) proposed a shrinkage covariance matrix under
the weaker assumption that p/n was only bounded. It inherits the advantage of
the unbiasedness of the sample covariance matrix. Also, the combination of the
highly structured shrinkage target makes it stable and invertible. Bai et al. (2009a,b)
offered a new idea on this issue using random matrix theory. By the result of Bai
et al. (2007), they proved that when p/n → y ∈ (0, 1), the plug-in optimal expected
gain/loss over-predicts due to the over-dispersion of the eigenvalues of the sample
covariance matrix. They also calculated the over-prediction ratio based on the limit
spectral distribution. This method, however, is inapplicable when p/n > 1 because
the sample covariance matrix becomes singular.
The joint effect of the sample mean and the sample covariance matrix together
has rarely been considered. Although some studies have considered the Bayes and
empirical Bayes estimators of mean and covariance matrix together, their interactions usually complicate the issue, see Brown (1976), Frost & Savarino (1986),
Jorion (1986). This paper first examines the joint effect of the two quantities based
on conditional expectations. It is found that the effect of the sample mean vector
is additive and the effect of the sample covariance matrix is multiplicative. A new
estimator for evaluating the optimal expected gain/loss is then proposed. To make
the sample covariance matrix stable and invertible when p/n > 1, shrinkage covariance matrices are combined with the proposed estimator. The superiority of the new
estimator is demonstrated not only by matrix inequalities, but also by simulation
studies.
This paper is organized as follows. Section 2 explains how an asymptotically
unbiased estimator of the optimal expected gain/loss can be constructed using the
sample mean µ̂ and the sample covariance matrix S under the assumption that
p/n → y ∈ (0, 1). A shrinkage method is then used to handle the cases p/n →
y ≥ 1, and a new estimator is introduced in Sec. 3. In Sec. 4, simulation studies are
conducted to compare the new estimator with the previous methods, and Sec. 5
concludes.

2. Sample Mean and Sample Covariance Matrix
To construct an asymptotically unbiased estimator of the optimal expected gain/loss, it
is necessary to study the impact of S and µ̂ on $E(\hat{P}^{*2})$ individually. Conditional
expectations are used throughout this section. Note that $E(\hat{P}^{*2})$ can be expressed as

$$
E(\hat{P}^{*2}) = E[E(\hat{P}^{*2} \mid S)] = E[E(\hat{P}^{*2} \mid \hat{\mu})]. \tag{2.1}
$$


Introduce the following notations:
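$$
P^{*2} = f(\mu, \Sigma) = \mu^{T}h(\Sigma^{-1})\mu, \tag{2.2}
$$

$$
\hat{P}^{*2} = f(\hat{\mu}, S) = \hat{\mu}^{T}h(S^{-1})\hat{\mu}, \tag{2.3}
$$

$$
\hat{P}_1^{*2} = f(\hat{\mu}, \Sigma) = \hat{\mu}^{T}h(\Sigma^{-1})\hat{\mu}, \tag{2.4}
$$

$$
\hat{P}_2^{*2} = f(\mu, S) = \mu^{T}h(S^{-1})\mu. \tag{2.5}
$$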

Herein, $P^{*2}$ is calculated using the true mean vector and the true covariance matrix;
$\hat{P}^{*2}$ is calculated by plugging in the sample mean vector and the sample covariance
matrix; $\hat{P}_1^{*2}$ is calculated by plugging in only the sample mean vector, assuming
that the covariance matrix is known; and $\hat{P}_2^{*2}$ is calculated by plugging in only the
sample covariance matrix, assuming that the mean is known. Because $S^{-1}$ is involved, except for
the study of $\hat{P}_1^{*2}$, this section deals with the case p/n → y ∈ (0, 1) only. The shrinkage
estimation method discussed in Sec. 3 overcomes the difficulties related to
singularity.
Inspired by the research of Bai et al. (2009a,b), we design the simulation studies
to find the pattern of the errors incurred by using µ̂ and S. Data sets of different
sample sizes n and different dimensions p are generated from multivariate normal
distributions. Without loss of generality, for each data set, the corresponding true
covariance matrix is a p-dimensional symmetric matrix with 1 as diagonal values and
0.4 as off-diagonal values; and the true mean vector µ is generated from N (0, 0.01).
For each graph reported in this section (Figs. 1–4), the variable i in the x-axis
ranges from 1 to 300. Each i corresponds to data sets with dimension pi and sample
size ni . Since p/n is a key value in studying the error patterns, in each graph of
Figs. 1–4, the relationship between pi and ni is
$$
\frac{p_i}{n_i} = c, \tag{2.6}
$$

where c is a fixed constant. That is, in each graph, the ratio between the dimension and the sample size is fixed, with the dimension and the sample size growing
together. To be precise, in the four graphs of Fig. 1, the sample sizes $n_i$ are set as 28 + 2i, 29 + i, 29 + i
and 29 + i, respectively, and correspondingly, the dimensions $p_i$ are 0.5(28 + 2i), 2(29 + i), 3(29 + i)
and 5(29 + i), which guarantees that $p_i/n_i$ is a fixed value. In Figs. 2–4, the $n_i$ are all
set as 25 + 5i, and the $p_i$ are 0.2(25 + 5i), 0.4(25 + 5i), 0.6(25 + 5i), and 0.8(25 + 5i).
The theoretical optimal expected gain/loss $P^{*}$ is regarded as the benchmark value.

Fig. 1. Difference between $E(\hat{P}_1^{*2})$ and $P^{*2}$. The mean effect is measured by assuming that the true
covariance matrix is known. In each graph, the dimension p and the sample size n grow together
with i at a fixed ratio p/n. The sample mean overestimates $P^{*2}$ and the difference D fluctuates around
the value of p/n.

2.1. Mean effect
To study the impact of the sample mean vector on the optimal expected gain/loss,
assume that the true covariance matrix Σ is known. The impact of the sample mean
can be measured by the differences P̂ ∗2 − P̂2∗2 or P̂1∗2 − P ∗2 . The results of such
differences are summarized in the following theorem.
Theorem 2.1. Suppose that $x = (x_1, x_2, \ldots, x_p)^{T}$ follows a p-dimensional
multivariate normal distribution with mean µ and covariance matrix Σ, and that µ̂ and S
are the sample mean and the sample covariance matrix, respectively. For a given
M, we have

$$
E[\hat{P}^{*2} - \hat{P}_2^{*2} \mid S = M] = \frac{1}{n}\operatorname{tr}[h(M^{-1})\Sigma] \tag{2.7}
$$

and

$$
E[\hat{P}_1^{*2} - P^{*2}] = \frac{1}{n}\operatorname{tr}[h(\Sigma^{-1})\Sigma] = \frac{p-1}{n}. \tag{2.8}
$$

Equivalently, if Σ is known, $\hat{P}_1^{*2} - (p-1)/n$ is an unbiased estimator of $P^{*2}$.

Fig. 2. Ratio between $E(\hat{P}_2^{*2})$ and $P^{*2}$. The covariance effect is measured by assuming that the
true mean is known. In each graph, the dimension p and the sample size n grow together with
i at a fixed ratio p/n. The overestimating ratio incurred by the sample covariance matrix is stable around
a value related to p/n.

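Proof. For the multivariate normal distribution, $\operatorname{Var}(\hat{\mu}) = E[\hat{\mu}\hat{\mu}^{T}] - \mu\mu^{T} = \Sigma/n$. Moreover,
µ̂ and S are independent. Therefore, we have

$$
\begin{aligned}
E[\hat{\mu}^{T}h(S^{-1})\hat{\mu} \mid S] &= E[\operatorname{tr}(\hat{\mu}^{T}h(S^{-1})\hat{\mu}) \mid S] \\
&= \operatorname{tr}\{E[\hat{\mu}\hat{\mu}^{T}]\,h(S^{-1})\} \\
&= \operatorname{tr}\{[\operatorname{Var}(\hat{\mu}) + \mu\mu^{T}]\,h(S^{-1})\} \\
&= \operatorname{tr}\left\{\left[\frac{\Sigma}{n} + \mu\mu^{T}\right]h(S^{-1})\right\} = \frac{1}{n}\operatorname{tr}[h(S^{-1})\Sigma] + \mu^{T}h(S^{-1})\mu.
\end{aligned} \tag{2.9}
$$

This gives (2.7). The proof of (2.8) is similar: in (2.4), S is replaced by the true
covariance matrix Σ, so that the difference $D = E(\hat{P}_1^{*2}) - P^{*2}$ equals
$\operatorname{tr}[h(\Sigma^{-1})\Sigma]/n = (p-1)/n$. For details, please refer to Anderson (2003).

Fig. 3. Plug-in optimal expected gain/loss $\hat{P}^{*}$ versus benchmark value $P^{*}$. In each graph, the
dimension p and the sample size n grow together with i at a fixed ratio p/n. As p/n grows larger, $\hat{P}^{*}$
deviates further from $P^{*}$.
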
Simulation results are presented below. Define

$$
D = E(\hat{P}_1^{*2}) - P^{*2}. \tag{2.10}
$$

In the four graphs of Fig. 1, the y-axis measures the difference D and the x-axis
denotes the variable i. For each i, we simulate 30 data sets with the same covariance
matrix and the same mean vector using a multivariate normal distribution. For each
data set, we have a value of $\hat{P}_1^{*2}$. Then $E(\hat{P}_1^{*2})$ is approximated by the sample
average of these 30 values; see Fig. 1. From
Fig. 1, it is observed that the sample mean overestimates $P^{*2}$ and the difference
D fluctuates around the value of p/n. It is possible to obtain an asymptotically
unbiased estimator of $P^{*2}$ by correcting $\hat{P}_1^{*2}$ with the term p/n.

Fig. 4. Ratio between $E(\hat{P}^{*2}) - \gamma(p-1)/n$ and $\gamma P^{*2}$. In each graph, the dimension p and the
sample size n grow together with i at a fixed ratio p/n. After considering both effects of the sample
mean and the sample covariance matrix, an unbiased estimator is achieved.
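
The additive bias in (2.8) can be reproduced with a short Monte Carlo experiment. The sketch below is illustrative only and is not the simulation code used for Fig. 1: it assumes NumPy, reads N(0, 0.01) as a normal distribution with variance 0.01, uses more replications than the 30 data sets described above to reduce noise, and draws µ̂ directly from N(µ, Σ/n) rather than generating full data sets. The function name `mean_effect` is ours.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1), as in (1.6); A is assumed symmetric."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def mean_effect(p, n, reps=2000, seed=0):
    """Monte Carlo estimate of D = E(P1_hat^2) - P*^2 when Sigma is known."""
    rng = np.random.default_rng(seed)
    sigma = 0.4 * np.ones((p, p)) + 0.6 * np.eye(p)   # 1 on the diagonal, 0.4 elsewhere
    mu = rng.normal(0.0, 0.1, size=p)                 # assumption: variance 0.01
    H = h(np.linalg.inv(sigma))
    p_star_sq = mu @ H @ mu
    # The mean of n i.i.d. N(mu, Sigma) vectors is N(mu, Sigma / n),
    # so the sample mean is drawn directly.
    chol = np.linalg.cholesky(sigma / n)
    mu_hat = mu + rng.standard_normal((reps, p)) @ chol.T
    p1_sq = np.einsum("ri,ij,rj->r", mu_hat, H, mu_hat)
    return float(p1_sq.mean() - p_star_sq)

D = mean_effect(p=20, n=40)
target = (20 - 1) / 40
print("simulated D:", D, " (p - 1)/n:", target)
```

With these settings the simulated difference should lie close to (p − 1)/n, in line with Theorem 2.1.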
2.2. Covariance effect
In this section, µ is known and only the impact of the sample covariance matrix is
considered. Random matrix theory is employed to investigate the effect of S. For
details, see Marčenko & Pastur (1967), Bai & Silverstein (2010), Bai
(1999) and Bai et al. (2007).
Suppose that $\{z_{jk},\ j = 1, \ldots, n,\ k = 1, \ldots, p\}$ is a double array of i.i.d. real
random variables with mean 0 and variance $\sigma^{2}$. The empirical spectral distribution
of the sample covariance matrix T is defined as

$$
F_T(x) = \frac{1}{p}\sum_{i=1}^{p} 1(\lambda_i \le x), \tag{2.11}
$$

where $\lambda_i$ is the ith smallest eigenvalue of T and

$$
1(\lambda_i \le x) =
\begin{cases}
1 & \text{if } \lambda_i \le x, \\
0 & \text{otherwise.}
\end{cases} \tag{2.12}
$$

By Bai (1999), if p/n → y ∈ (0, ∞), then $F_T(x)$ converges almost surely to
the Marčenko–Pastur (M–P) law $F_y(x)$, where the M–P law is defined
as follows:
Definition 2.1 (Marčenko–Pastur Law, Marčenko & Pastur (1967)). The
density function of the limit spectral distribution $F_y(x)$ is given by

$$
f_y(x) =
\begin{cases}
\dfrac{1}{2\pi x y \sigma^{2}}\sqrt{(b - x)(x - a)}, & \text{if } a \le x \le b, \\
0, & \text{otherwise,}
\end{cases} \tag{2.13}
$$

where $a = \sigma^{2}(1 - \sqrt{y})^{2}$, $b = \sigma^{2}(1 + \sqrt{y})^{2}$ and p/n → y ∈ (0, ∞).
It has a point mass 1 − 1/y at the origin if y > 1. If $\sigma^{2} = 1$, then it is called the standard M–P law.
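
As a quick numerical check of Definition 2.1, the density (2.13) can be integrated on [a, b]. The sketch below assumes NumPy and uses a simple midpoint rule; the function names `mp_density` and `mp_integral` are ours. For y < 1 the density should integrate to one, and the integral of 1/x against the standard M–P law should equal 1/(1 − y), which is the quantity γ that appears in Theorem 2.2.

```python
import numpy as np

def mp_support(y, sigma2=1.0):
    """End points a and b of the M-P support."""
    a = sigma2 * (1.0 - np.sqrt(y)) ** 2
    b = sigma2 * (1.0 + np.sqrt(y)) ** 2
    return a, b

def mp_density(x, y, sigma2=1.0):
    """Marchenko-Pastur density (2.13); zero outside [a, b]."""
    a, b = mp_support(y, sigma2)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x >= a) & (x <= b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2.0 * np.pi * xi * y * sigma2)
    return dens

def mp_integral(g, y, sigma2=1.0, m=200_000):
    """Midpoint-rule approximation of the integral of g(x) f_y(x) over [a, b]."""
    a, b = mp_support(y, sigma2)
    width = (b - a) / m
    x = a + width * (np.arange(m) + 0.5)
    return float(np.sum(g(x) * mp_density(x, y, sigma2)) * width)

y = 0.5
total_mass = mp_integral(lambda x: np.ones_like(x), y)
inv_moment = mp_integral(lambda x: 1.0 / x, y)
print("total mass:", total_mass)
print("integral of 1/x:", inv_moment, " 1/(1 - y):", 1.0 / (1.0 - y))
```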
The bias of P̂2∗ in the estimation of P ∗ is explained in the following theorem.

Theorem 2.2. Consider the notation in Theorem 2.1. Let Σn be the covariance
matrix Σ of the first p(n) stocks. Suppose that λmin (Σn ), the smallest eigenvalue of
Σn is bounded below by a positive constant. Then, the ratio
$$
k = \frac{E(\hat{P}_2^{*2})}{P^{*2}} \to \gamma, \tag{2.14}
$$

where

$$
\gamma = \int_{a}^{b} \frac{1}{x}\, dF_y(x) = \frac{1}{1-y} > 1. \tag{2.15}
$$

Equivalently, if µ is known, $\hat{P}_2^{*2}/\gamma$ is an asymptotically unbiased estimator of $P^{*2}$.
Proof. Applying Lemma A.2 and Lemma 3.1, part (b) of Bai et al. (2009a), we see
that

$$
\frac{\mu^{T}S^{-1}\mu}{\mu^{T}\Sigma^{-1}\mu} \xrightarrow{\text{a.s.}} \gamma
\quad \text{and} \quad
\frac{\mathbf{1}^{T}S^{-1}\mathbf{1}}{\mathbf{1}^{T}\Sigma^{-1}\mathbf{1}} \xrightarrow{\text{a.s.}} \gamma. \tag{2.16}
$$

Note that $|\mu^{T}S^{-1}\mu / \mu^{T}\Sigma^{-1}\mu|$ is bounded above by $1/\hat{\lambda}_1$, where $\hat{\lambda}_1$ is the smallest
eigenvalue of S. Let $S_n$ be the sample covariance matrix S obtained from the first
n samples and the first p(n) stocks. It is obvious that $S_n$ is a principal submatrix of $S_{n+1}$. Therefore, as n increases, $\hat{\lambda}_1$ is monotonically decreasing and converges
to a limit that is bounded below by $a\lambda_{\min}(\Sigma)$. Then, the dominated convergence theorem suggests that the almost sure convergence in (2.16) can be replaced by $L^{1}$
convergence. The desired results follow from the fact that

$$
\mu^{T}h(S^{-1})\mu = \mu^{T}S^{-1}\mu - (\mathbf{1}^{T}S^{-1}\mathbf{1})^{-1}(\mathbf{1}^{T}S^{-1}\mu)^{2} \tag{2.17}
$$

and

$$
\mu^{T}h(\Sigma^{-1})\mu = \mu^{T}\Sigma^{-1}\mu - (\mathbf{1}^{T}\Sigma^{-1}\mathbf{1})^{-1}(\mathbf{1}^{T}\Sigma^{-1}\mu)^{2}. \tag{2.18}
$$

Below, simulation studies are conducted to give an intuitive feeling of the results
in Theorem 2.2. In Fig. 2, the y-axis measures the ratio k and the x-axis denotes the
variable i. Again, for each i, 30 data sets are simulated using the same covariance
matrix and the mean vector. For each data set, we have a value of P̂2∗2 , then E(P̂2∗2 )
is approximated by taking the sample average of these 30 values of P̂2∗2 as the
estimate of the mean. It is seen that for a fixed µ, the overestimating ratio incurred
by S is stable around the value γ as p and n increase with i.
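
The multiplicative effect in Theorem 2.2 can be illustrated in the same spirit. The following sketch assumes NumPy, a hypothetical fixed mean vector, and the equicorrelated covariance matrix described earlier in this section; S is centred at the true mean as in (1.7). The number of replications is larger than the 30 data sets used for Fig. 2 so that the Monte Carlo noise is small, and the function name `covariance_effect` is ours.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1), as in (1.6); A is assumed symmetric."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def covariance_effect(p, n, reps=1000, seed=0):
    """Monte Carlo estimate of k = E(P2_hat^2) / P*^2 when mu is known."""
    rng = np.random.default_rng(seed)
    sigma = 0.4 * np.ones((p, p)) + 0.6 * np.eye(p)
    mu = np.linspace(-0.2, 0.2, p)                   # hypothetical mean vector
    chol = np.linalg.cholesky(sigma)
    p_star_sq = mu @ h(np.linalg.inv(sigma)) @ mu
    total = 0.0
    for _ in range(reps):
        Z = rng.standard_normal((n, p)) @ chol.T     # rows play the role of x_t - mu
        S = Z.T @ Z / n                              # centred at the true mean, as in (1.7)
        total += mu @ h(np.linalg.inv(S)) @ mu
    return float(total / reps / p_star_sq)

k = covariance_effect(p=20, n=100)
gamma = 1.0 / (1.0 - 20 / 100)
print("simulated ratio k:", k, " gamma = 1/(1 - y):", gamma)
```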
2.3. Joint effect
To consider the joint effect, we examine the sample mean and the sample covariance matrix simultaneously. The following theorem explains the bias of P̂ ∗ in the
estimation of P ∗ .
Theorem 2.3. Consider the notation in Theorem 2.1. Let $\Sigma_n$ be the covariance
matrix Σ of the first p(n) stocks. Denote by $\lambda_{\min}(\Sigma_n)$ and $\lambda_{\max}(\Sigma_n)$ the smallest
and greatest eigenvalues of $\Sigma_n$, respectively. Suppose that $\lambda_{\min}(\Sigma_n)$ is bounded below
by a positive constant and $\lambda_{\max}(\Sigma_n)$ is bounded above by another positive constant.
Assume that $\mathbf{1}^{T}\Sigma^{-1}\mathbf{1} = O(p)$. Then, the ratio

$$
r = \frac{E(\hat{P}^{*2}) - \gamma(p-1)/n}{\gamma P^{*2}} \to 1. \tag{2.19}
$$

Equivalently, $[\hat{P}^{*2} - \gamma(p-1)/n]/\gamma$ is an asymptotically unbiased estimator of $P^{*2}$.
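Proof. Let $T = \Sigma^{-1/2}S\Sigma^{-1/2}$. Denote the smallest and greatest eigenvalues of T
by $\tilde{\lambda}_1$ and $\tilde{\lambda}_p$, respectively. Using Theorem 2.1,

$$
\begin{aligned}
E(\hat{P}^{*2}) &= E(\hat{P}_2^{*2}) + E(\hat{P}^{*2} - \hat{P}_2^{*2}) \\
&= E(\hat{P}_2^{*2}) + n^{-1}E\operatorname{tr}[h(S^{-1})\Sigma] \\
&= E(\hat{P}_2^{*2}) + n^{-1}E[\operatorname{tr}(T^{-1})] - n^{-1}E\left[\frac{\mathbf{1}^{T}\Sigma^{-1/2}T^{-2}\Sigma^{-1/2}\mathbf{1}}{\mathbf{1}^{T}\Sigma^{-1/2}T^{-1}\Sigma^{-1/2}\mathbf{1}}\right] \\
&= E_1 + n^{-1}E_2 - n^{-1}E_3,
\end{aligned}
$$

where $E_1$, $E_2$ and $E_3$ denote the three expectations in the preceding line.
By Theorem 2.2, $E_1/P^{*2} \to \gamma$.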
Random matrix theory (Bai 1999) suggests that $p^{-1}\operatorname{tr}(T^{-1})$ converges
almost surely to γ. The quantity $\operatorname{tr}(T^{-1})$ is bounded above by $p/\tilde{\lambda}_1$. In order to
apply the dominated convergence theorem, a lower bound of $\tilde{\lambda}_1$ is needed. To see
this, let $S_n$ be the sample covariance matrix S obtained from the first n samples
and the first p(n) stocks. It is obvious that $S_n$ is a principal sub-matrix of $S_{n+1}$.
Therefore, as n increases, $\hat{\lambda}_1$ is monotonically decreasing and converges to a limit that
is bounded below by $a\lambda_{\min}(\Sigma)$. Consequently, a lower bound of $\tilde{\lambda}_1$ can be given by
$a\lambda_{\min}(\Sigma)/\lambda_{\max}(\Sigma)$.
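Next, we show that $E_3$ is o(p). Note that

$$
\frac{\mathbf{1}^{T}\Sigma^{-1/2}T^{-2}\Sigma^{-1/2}\mathbf{1}}{\mathbf{1}^{T}\Sigma^{-1/2}T^{-1}\Sigma^{-1/2}\mathbf{1}} \le \frac{\tilde{\lambda}_p}{\tilde{\lambda}_1} \to \frac{b}{a}. \tag{2.20}
$$

It can be checked that $\tilde{\lambda}_1 \ge a\lambda_{\min}(\Sigma)/\lambda_{\max}(\Sigma)$ and $\tilde{\lambda}_p \le b\lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma)$. Then, the
dominated convergence theorem can be used.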
The plug-in optimal expected gain/loss $\hat{P}^{*}$ is plotted against the theoretical
value $P^{*}$. The solid line in Fig. 3 shows the behavior of $\hat{P}^{*}$ and the dashed line
denotes $P^{*}$.
It is observed that as p/n grows larger, $\hat{P}^{*}$ deviates further from $P^{*}$. From
Theorem 2.3, consider the quantity

$$
r = \frac{E(\hat{P}^{*2}) - \gamma(p-1)/n}{\gamma P^{*2}}. \tag{2.21}
$$

Since the ratio between $E(\hat{P}^{*2}) - \gamma(p-1)/n$ and $\gamma P^{*2}$ fluctuates around 1, the
value $(E(\hat{P}^{*2}) - \gamma(p-1)/n)/\gamma$ is approximately the same as $P^{*2}$; see Fig. 4.
From these numerical studies, we see that both the sample mean and the sample
covariance matrix over-predict the optimal expected gain/loss. If the covariance
matrix is taken as known, the effect of the sample mean on $P^{*2}$ is additive; if
the mean is taken as known, the effect of the sample covariance matrix is multiplicative;
and if both are estimated simultaneously, the bias can be eliminated in two steps, by
subtracting γ(p − 1)/n and then dividing by γ.
It can also be argued that the error patterns incurred by µ̂ and S together cannot
be ignored.
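
The two-step correction of Theorem 2.3 can be sketched in code as follows. This is an illustration under stated assumptions rather than the code behind Fig. 4: it assumes NumPy, a hypothetical fixed mean vector, and a sample covariance matrix centred at the sample mean, so that µ̂ and S are independent as required in the proof of Theorem 2.1. The function name `joint_ratio` is ours.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1), as in (1.6); A is assumed symmetric."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def joint_ratio(p, n, reps=2000, seed=0):
    """Monte Carlo estimate of r in (2.21) for p < n."""
    rng = np.random.default_rng(seed)
    sigma = 0.4 * np.ones((p, p)) + 0.6 * np.eye(p)
    mu = np.linspace(-0.2, 0.2, p)               # hypothetical mean vector
    chol = np.linalg.cholesky(sigma)
    gamma = 1.0 / (1.0 - p / n)
    p_star_sq = mu @ h(np.linalg.inv(sigma)) @ mu
    total = 0.0
    for _ in range(reps):
        X = mu + rng.standard_normal((n, p)) @ chol.T
        mu_hat = X.mean(axis=0)
        Xc = X - mu_hat                          # assumption: centre at the sample mean
        S = Xc.T @ Xc / n
        total += mu_hat @ h(np.linalg.inv(S)) @ mu_hat
    mean_p_hat_sq = total / reps
    # Step 1: subtract the additive mean effect; step 2: divide by gamma.
    return float((mean_p_hat_sq - gamma * (p - 1) / n) / (gamma * p_star_sq))

r = joint_ratio(p=20, n=100)
print("simulated r:", r)
```

For finite p and n the simulated ratio is close to, but not exactly, one, since the result is asymptotic.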

3. Shrinkage Estimator
In this section, we introduce a new estimator for the optimal expected gain/loss.
The joint effect of the sample mean vector and the sample covariance matrix is
taken into consideration. The construction proceeds in two steps. First,
the shrinkage method of Ledoit & Wolf (2003, 2004a,b) is employed to construct
a shrinkage covariance matrix, which leads to an estimate that is smaller than the
plug-in optimal expected gain/loss. The advantage of using the shrinkage covariance
matrix is proved by means of matrix inequalities. In the second step, the
impact of the sample mean vector is addressed.

3.1. Estimating the covariance matrix
The shrinkage covariance matrix is a linear combination of the sample covariance
matrix and a highly structured covariance matrix, which is estimated from the data.
The structured covariance matrix is also called the shrinkage target. The weight on
the shrinkage target, which is called shrinkage intensity, is chosen based on the
criterion of minimizing a risk function.
3.1.1. Shrinkage target
We use the largest eigenvalue of the sample covariance matrix to specify the shrinkage target, which is defined as

$$
F = \lambda_1 I, \tag{3.1}
$$

where $\lambda_1$ denotes the largest eigenvalue of S. Then, the shrinkage covariance
matrix is

$$
S^{\dagger} = \alpha F + (1 - \alpha)S, \tag{3.2}
$$

where α is the shrinkage intensity. The shrinkage target chosen according to (3.1)
guarantees that the greatest eigenvalue of $S^{\dagger}$ does not depend on the shrinkage
intensity α. This means that the amount of information contained in the first principal component obtained by $S^{\dagger}$ is not affected by the shrinkage.
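
A minimal sketch of (3.1)-(3.2), assuming NumPy: the shrinkage intensity 0.3 and the simulated data are arbitrary illustrative choices, and the sample covariance matrix is centred at the sample mean for convenience. The example uses p > n, so that S is singular while the shrinkage matrix remains invertible and keeps the same largest eigenvalue.

```python
import numpy as np

def shrinkage_covariance(S, alpha):
    """S_dagger = alpha * lambda_1 * I + (1 - alpha) * S, as in (3.1)-(3.2)."""
    lam1 = np.linalg.eigvalsh(S)[-1]             # eigvalsh returns ascending eigenvalues
    return alpha * lam1 * np.eye(S.shape[0]) + (1.0 - alpha) * S

rng = np.random.default_rng(0)
n, p = 20, 40                                    # p > n, so S is singular
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
S_dag = shrinkage_covariance(S, alpha=0.3)

eig_S = np.linalg.eigvalsh(S)
eig_dag = np.linalg.eigvalsh(S_dag)
print("largest eigenvalue of S and S_dagger:", eig_S[-1], eig_dag[-1])
print("smallest eigenvalue of S and S_dagger:", eig_S[0], eig_dag[0])
```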
3.1.2. Computation of the shrinkage intensity
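Define $F = (f_{ij})$, $S = (s_{ij})$ and $\Sigma = (\sigma_{ij})$. The shrinkage intensity is estimated by
minimizing the risk function, which is defined and decomposed as follows:

$$
\begin{aligned}
R(\alpha) &= E(L(\alpha)) \\
&= \sum_{i=1}^{p}\sum_{j=1}^{p} E[\alpha f_{ij} + (1 - \alpha)s_{ij} - \sigma_{ij}]^{2} \\
&= \sum_{i=1}^{p}\sum_{j=1}^{p} [\alpha^{2}E(f_{ij} - s_{ij})^{2} + (1 - 2\alpha)\operatorname{Var}(s_{ij}) + 2\alpha\operatorname{Cov}(f_{ij}, s_{ij})].
\end{aligned} \tag{3.3}
$$

Here, since $E(s_{ij}) = \sigma_{ij}$, we have used

$$
\begin{aligned}
E(f_{ij} - s_{ij})(s_{ij} - \sigma_{ij}) &= E(f_{ij} - \sigma_{ij})(s_{ij} - \sigma_{ij}) - E(s_{ij} - \sigma_{ij})^{2} \\
&= \operatorname{Cov}(f_{ij}, s_{ij}) + E(f_{ij} - \sigma_{ij}) \cdot E(s_{ij} - \sigma_{ij}) - \operatorname{Var}(s_{ij}) \\
&= \operatorname{Cov}(f_{ij}, s_{ij}) - \operatorname{Var}(s_{ij}).
\end{aligned} \tag{3.4}
$$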

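Differentiating R(α) with respect to α and setting the derivative to zero, we obtain the
shrinkage intensity as

$$
\alpha^{*} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{p}\operatorname{Var}(s_{ij}) - \sum_{i=1}^{p}\sum_{j=1}^{p}\operatorname{Cov}(f_{ij}, s_{ij})}{\sum_{i=1}^{p}\sum_{j=1}^{p}E(f_{ij} - s_{ij})^{2}}. \tag{3.5}
$$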

3.2. New estimator
From Sec. 2, it is known that $\hat{P}^{*2}$ overestimates $P^{*2}$ because of the use of the sample
mean µ̂ and the sample covariance matrix S. To construct the new estimator, we
introduce Theorem 3.1 below, using conditional expectations. Similar to Sec. 2,
introduce the following notations:

$$
\hat{P}^{\dagger 2} = f(\hat{\mu}, S^{\dagger}) = \hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu} \tag{3.6}
$$

and

$$
\hat{P}_2^{\dagger 2} = f(\mu, S^{\dagger}) = \mu^{T}h((S^{\dagger})^{-1})\mu. \tag{3.7}
$$

The effect of µ̂ on $\hat{P}^{\dagger 2}$ is described in the following theorem.
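Theorem 3.1. For a given M,

$$
E[\hat{P}^{\dagger 2} - \hat{P}_2^{\dagger 2} \mid S = M] = \frac{1}{n}E\{\operatorname{tr}[h([\alpha F + (1 - \alpha)M]^{-1})\Sigma]\}. \tag{3.8}
$$

Proof. Using the independence of µ̂ and S, we have

$$
\begin{aligned}
E(\hat{P}^{\dagger 2} \mid S = M) &= E(\hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu} \mid S = M) \\
&= E[E(\hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu} \mid S = M)] \\
&= \frac{1}{n}E[\operatorname{tr}\{h((S^{\dagger})^{-1})\Sigma\} \mid S = M] + E[\mu^{T}h((S^{\dagger})^{-1})\mu \mid S = M].
\end{aligned} \tag{3.9}
$$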

The following theorem further suggests that if Σ is known, an asymptotically
conservative estimator of $P^{*}$ can be constructed for both the y ∈ (0, 1) and y ∈ (1, ∞)
cases. In certain situations, underestimation is less dangerous than overestimation. Note that $P^{*}$ is always positive: if $P^{*}$ were negative with corresponding investment amount $\omega^{*}$, negating $\omega^{*}$ would give a positive expected
gain/loss, so such a negative $P^{*}$ could never be optimal. If $P^{*2}$ is overestimated, the computed minimal capital requirement may not be sufficient to protect
a financial institution against the risk.
Theorem 3.2. For any given p-dimensional vector u,

$$
E(u^{T}h(\Sigma^{-1})u) \ge E(u^{T}h((\Sigma^{\dagger})^{-1})u), \tag{3.10}
$$

where $\Sigma^{\dagger} = \alpha F + (1 - \alpha)\Sigma$.
Proof. See Appendices A and B.
Inspired by Theorem 3.1, a new estimator is constructed below. Define K as

$$
K = \frac{1}{n}\operatorname{tr}[h((S^{\dagger})^{-1})\Sigma]. \tag{3.11}
$$

The new estimator of P is given by

$$
\hat{P}_{\mathrm{new}} =
\begin{cases}
\hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu} - K, & \text{if } \hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu} \ge K, \\
\hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu}, & \text{otherwise.}
\end{cases} \tag{3.12}
$$

When $\hat{\mu}^{T}h((S^{\dagger})^{-1})\hat{\mu}$ is smaller than K, we can only use the shrinkage covariance
matrix. Although the effect of the sample mean is not taken into account in this
case, the estimator is at least better than the plug-in estimate. Moreover, $\hat{P}_{\mathrm{new}}$ is
well-defined even when the sample covariance matrix is singular. In practice, Σ is
unknown. Therefore, $\hat{P}_{\mathrm{new}}$ can be biased. To mitigate the bias, α is chosen according
to the method proposed in the next section.
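
The definition (3.11)-(3.12) translates directly into code. The sketch below assumes NumPy and a known Σ, as in the definition of K; the function name `new_estimator`, the shrinkage intensity 0.3 and the simulated data are hypothetical choices for illustration. The example deliberately uses p > n to show that the estimator is computable when S is singular.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1), as in (1.6); A is assumed symmetric."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def new_estimator(mu_hat, S, alpha, n, sigma):
    """Return (estimate, quadratic form, K) following (3.11)-(3.12)."""
    p = S.shape[0]
    lam1 = np.linalg.eigvalsh(S)[-1]                       # largest eigenvalue of S
    S_dag = alpha * lam1 * np.eye(p) + (1.0 - alpha) * S   # shrinkage matrix (3.2)
    H = h(np.linalg.inv(S_dag))
    quad = float(mu_hat @ H @ mu_hat)
    K = float(np.trace(H @ sigma)) / n                     # (3.11), Sigma assumed known
    est = quad - K if quad >= K else quad                  # (3.12)
    return est, quad, K

rng = np.random.default_rng(0)
n, p = 20, 40                                              # p > n: S is singular
sigma = np.eye(p)
mu = np.linspace(-0.2, 0.2, p)                             # hypothetical mean vector
X = mu + rng.standard_normal((n, p))
mu_hat = X.mean(axis=0)
Xc = X - mu_hat
S = Xc.T @ Xc / n
est, quad, K = new_estimator(mu_hat, S, alpha=0.3, n=n, sigma=sigma)
print("quadratic form:", quad, " K:", K, " new estimate:", est)
```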
3.3. Algorithm
3.3.1. Shrinkage intensity α∗
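According to the shrinkage target, $F = \lambda_1 I$. Recall that the theoretical expression
of $\alpha^{*}$ is

$$
\alpha^{*} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{p}\operatorname{Var}(s_{ij}) - \sum_{i=1}^{p}\sum_{j=1}^{p}\operatorname{Cov}(f_{ij}, s_{ij})}{\sum_{i=1}^{p}\sum_{j=1}^{p}E(f_{ij} - s_{ij})^{2}}. \tag{3.13}
$$

To estimate $\alpha^{*}$, the values $\operatorname{Var}(s_{ij})$, $\operatorname{Cov}(f_{ij}, s_{ij})$ and $E(f_{ij} - s_{ij})^{2}$ need to be estimated.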
The bootstrap method (see, for example, Efron & Tibshirani (1993)) is chosen
to give numerical estimates of these three values. In this paper, p is large relative to
n, in which case the sample covariance matrix S is singular. Since the parametric
resampling needs the sample covariance matrix to be invertible, parametric resampling is inappropriate. Consequently, the nonparametric resampling method is used
to generate different data sets based on the observations.
Suppose that the number of resamples is N. Each time, resampling is performed
within each asset with replacement. For k ∈ {1, . . . , N}, the kth data set is generated
as follows:

$$
\begin{pmatrix}
x_{11}^{(k)} & x_{12}^{(k)} & \cdots & x_{1p}^{(k)} \\
x_{21}^{(k)} & x_{22}^{(k)} & \cdots & x_{2p}^{(k)} \\
\vdots & \vdots & & \vdots \\
x_{n1}^{(k)} & x_{n2}^{(k)} & \cdots & x_{np}^{(k)}
\end{pmatrix},
$$

and the kth sample covariance matrix $S^{(k)}$ is

$$
\begin{pmatrix}
s_{11}^{(k)} & s_{12}^{(k)} & \cdots & s_{1p}^{(k)} \\
s_{21}^{(k)} & s_{22}^{(k)} & \cdots & s_{2p}^{(k)} \\
\vdots & \vdots & & \vdots \\
s_{p1}^{(k)} & s_{p2}^{(k)} & \cdots & s_{pp}^{(k)}
\end{pmatrix}.
$$

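The kth shrinkage target is

$$
F^{(k)} = \lambda_1^{(k)} I, \tag{3.14}
$$

where $\lambda_1^{(k)}$ is the largest eigenvalue of $S^{(k)}$. Then, we get the estimates of the three
values as:

$$
\widehat{\operatorname{Var}}(s_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}(s_{ij}^{(k)} - \bar{s}_{ij})^{2}, \tag{3.15}
$$

$$
\widehat{\operatorname{Cov}}(f_{ij}, s_{ij}) = \frac{1}{N-1}\sum_{k=1}^{N}(s_{ij}^{(k)} - \bar{s}_{ij})(f_{ij}^{(k)} - \bar{f}_{ij}), \tag{3.16}
$$

$$
\hat{E}(f_{ij} - s_{ij})^{2} = \frac{1}{N}\sum_{k=1}^{N}(f_{ij}^{(k)} - s_{ij}^{(k)})^{2}, \tag{3.17}
$$

where $\bar{s}_{ij} = \sum_{k=1}^{N} s_{ij}^{(k)}/N$ and $\bar{f}_{ij} = \sum_{k=1}^{N} f_{ij}^{(k)}/N$. The estimate of $\alpha^{*}$ is

$$
\hat{\alpha}^{*} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{p}\widehat{\operatorname{Var}}(s_{ij}) - \sum_{i=1}^{p}\sum_{j=1}^{p}\widehat{\operatorname{Cov}}(f_{ij}, s_{ij})}{\sum_{i=1}^{p}\sum_{j=1}^{p}\hat{E}(f_{ij} - s_{ij})^{2}}. \tag{3.18}
$$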

To estimate K, for simplicity, we replace Σ by the sample covariance matrix S and
define $\hat{K}$ as

$$
\hat{K} = \frac{1}{n}\operatorname{tr}[h((S^{\dagger})^{-1})S]. \tag{3.19}
$$
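
The bootstrap estimates (3.14)-(3.19) can be sketched as follows, assuming NumPy. Two choices here are our own assumptions and not taken from the text: the resampling step is read as drawing n time indices with replacement and applying them to all assets, which preserves the cross-sectional dependence, and the sample covariance matrix is centred at the sample mean. The function names `bootstrap_alpha` and `k_hat` and the simulated data are hypothetical.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1), as in (1.6); A is assumed symmetric."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def sample_cov(X):
    """Sample covariance with divisor n, centred at the sample mean (assumption)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def bootstrap_alpha(X, n_boot=200, seed=0):
    """Nonparametric bootstrap estimate of the shrinkage intensity, (3.15)-(3.18)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S_all = np.empty((n_boot, p, p))
    F_all = np.zeros((n_boot, p, p))
    diag = np.arange(p)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)             # assumption: resample time indices
        S_all[k] = sample_cov(X[idx])
        F_all[k, diag, diag] = np.linalg.eigvalsh(S_all[k])[-1]   # F^(k) = lambda_1^(k) I
    var_s = S_all.var(axis=0, ddof=1)                                        # (3.15)
    cov_fs = ((S_all - S_all.mean(axis=0)) *
              (F_all - F_all.mean(axis=0))).sum(axis=0) / (n_boot - 1)       # (3.16)
    e_fs2 = ((F_all - S_all) ** 2).mean(axis=0)                              # (3.17)
    return float((var_s.sum() - cov_fs.sum()) / e_fs2.sum())                 # (3.18)

def k_hat(S, alpha, n):
    """K_hat = tr[h(S_dagger^{-1}) S] / n, as in (3.19)."""
    p = S.shape[0]
    lam1 = np.linalg.eigvalsh(S)[-1]
    S_dag = alpha * lam1 * np.eye(p) + (1.0 - alpha) * S
    return float(np.trace(h(np.linalg.inv(S_dag)) @ S)) / n

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 10))                    # hypothetical data: n = 30, p = 10
alpha_hat = bootstrap_alpha(X)
K_hat = k_hat(sample_cov(X), alpha_hat, n=X.shape[0])
print("alpha_hat:", alpha_hat, " K_hat:", K_hat)
```

The text does not state whether the estimated intensity is truncated to [0, 1]; the sketch returns the raw value of (3.18).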

3.4. Bias of the new estimator
In this part, the MSE of the new estimator is calculated using simulation studies and
compared with that of the plug-in estimate. For simplicity, the true mean vector is generated
from N(0, 1), and the true covariance matrix is the p-dimensional identity matrix.
Four cases are considered, that is, p = 50, n = 200; p = 100, n = 200; p = 200, n = 200
and p = 400, n = 200. For each case, 100 data sets are generated. The results are
shown in Table 1.
4. Simulation Study
In this section, simulation studies are conducted to illustrate the new estimator for
the optimal expected gain/loss. It is compared with four other quantities: the
benchmark value, the plug-in estimate, the bootstrap corrected estimate (Bai et al.
2009a,b) and the shrinkage estimate (Ledoit & Wolf 2003, 2004a,b). To construct
the data set, we use historical stock returns of the American stock market to specify
the true mean and the true covariance matrix, from which the empirical returns are
simulated using the multivariate normal distribution.

Table 1. MSE of the new estimator. The MSE of the new estimator is compared with that of the plug-in estimate for four
combinations of dimension and sample size.
The new estimator not only works when p > n, but also outperforms the plug-in estimate.

                      Plug-in estimate    New estimator
p = 50, n = 200       1.8061              0.0954
p = 100, n = 200      16.8158             0.1595
p = 200, n = 200      -                   1.8771
p = 400, n = 200      -                   12.0717

4.1. Comparison
(1) Benchmark Value (rreal ). The benchmark value is the theoretical optimal
expected gain/loss computed using the true mean vector µ and the true covariance matrix Σ of the data set.
(2) Plug-in Estimate (rplug ). The plug-in estimate is computed using the sample
mean vector µ̂ and the sample covariance matrix S.
(3) Shrinkage Estimate (rshrink ). The shrinkage estimate is computed by plugging in
the sample mean µ̂ and the shrinkage covariance matrix S ∗ . Following Ledoit &
Wolf (2004a), the shrinkage target is chosen as the constant correlation model,
because it is easy to implement.
(4) Bootstrap Corrected Estimate (rbs ). In Bai et al. (2009b), an efficient estimator
using the theory of random matrices was developed to solve the over prediction
problem. A parametric bootstrap technique was employed in their study. The
procedure is as follows:
(a) A resample $\chi^{*} = (X_1^{*}, \ldots, X_n^{*})$ is drawn from the p-dimensional multivariate normal distribution with mean vector µ̂ and covariance matrix S.
(b) The sample mean vector and the sample covariance matrix of the resampled
data set are denoted by $\hat{\mu}_{bs}$ and $S_{bs}$, respectively. Then, by applying the
optimization procedure again, we obtain the bootstrapped plug-in estimate
of the optimal expected gain/loss, $r^{*}_{plug}$.
(c) The bootstrap corrected gain/loss estimate is given by

$$
r_{bs} = r_{plug} + \frac{1}{\sqrt{\gamma}}(r_{plug} - r^{*}_{plug}). \tag{4.1}
$$
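
Steps (a)-(c) can be sketched as follows. This is an illustrative reading of the procedure, not the implementation of Bai et al. (2009b): it assumes NumPy, σ0 = 1, γ = 1/(1 − p/n) with p < n, a single parametric resample, and a sample covariance matrix centred at the sample mean. The function names and the simulated data are hypothetical.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1), as in (1.6); A is assumed symmetric."""
    one = np.ones(A.shape[0])
    a1 = A @ one
    return A - np.outer(a1, a1) / (one @ a1)

def plug_in_gain(mu_hat, S):
    """Plug-in estimate (1.5) with sigma0 = 1; requires an invertible S."""
    return float(np.sqrt(mu_hat @ h(np.linalg.inv(S)) @ mu_hat))

def bootstrap_corrected(X, seed=0):
    """Return (r_bs, r_plug, r_star) following steps (a)-(c) and (4.1)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = 1.0 / (1.0 - p / n)                      # requires p < n
    mu_hat = X.mean(axis=0)
    Xc = X - mu_hat
    S = Xc.T @ Xc / n
    r_plug = plug_in_gain(mu_hat, S)
    X_star = rng.multivariate_normal(mu_hat, S, size=n)    # step (a)
    mu_bs = X_star.mean(axis=0)                            # step (b)
    Xc_star = X_star - mu_bs
    S_bs = Xc_star.T @ Xc_star / n
    r_star = plug_in_gain(mu_bs, S_bs)
    r_bs = r_plug + (r_plug - r_star) / np.sqrt(gamma)     # step (c), equation (4.1)
    return r_bs, r_plug, r_star

rng = np.random.default_rng(2)
n, p = 100, 10
mu = np.linspace(-0.2, 0.2, p)                       # hypothetical mean vector
X = mu + rng.standard_normal((n, p))
r_bs, r_plug, r_star = bootstrap_corrected(X)
print("r_plug:", r_plug, " r_star:", r_star, " r_bs:", r_bs)
```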

4.2. Constructing the data set
In the simulation study, we first choose three groups of stocks from the American
stock market and download the historical stock prices of these three groups during the period from January 2, 2001 to December 31, 2010. The numbers of stocks
in the three groups are 30, 60 and 80, respectively. Then the sample mean vector
and the sample covariance matrix of these three groups are calculated and regarded
as the true parameters from which the empirical returns are generated using the
multivariate normal distribution.
Since the market changes significantly across time, we fix the number of observations n as 50 days and 100 days. Therefore, we have six combinations of p and n,
namely p = 30, n = 50; p = 30, n = 100; p = 60, n = 50; p = 60, n = 100; p = 80, n = 50
and p = 80, n = 100, respectively.

4.3. Simulation results
For each case, we simulate 100 data sets and for each data set i, we calculate the
i
i
i
i
i
, rplug
, rshrink
and rbs
. We use d1 , d2 , d3 and d4 to measure the
, rnew
values of rreal
Table 2. Average distance of the four estimates. The average distance between
each of the four estimates and the benchmark value is computed. The new
estimator, corresponding to d1 , is the closest to the benchmark value.

                   d1       d2       d3       d4
p = 30, n = 50     0.2294   1.1195   0.6946   0.4903
p = 30, n = 100    0.1564   0.5513   0.4582   0.2711
p = 60, n = 50     0.2696   -        0.9739   -
p = 60, n = 100    0.1652   1.0914   0.6682   0.4040
p = 80, n = 50     0.2817   -        1.1584   -
p = 80, n = 100    0.1402   1.8457   0.7956   0.5714

Fig. 5. Model comparison p = 30, n = 50. The new estimator (rnew ) of the optimal expected
gain/loss is compared with the benchmark value (rreal ), the plug-in estimate (rplug ), the shrinkage
estimate (rshrink ) and the bootstrap corrected estimate (rbs ). Obviously, the new estimator is the
closest to the benchmark value among all the estimators.

Fig. 6. Model comparison p = 30, n = 100. The new estimator (rnew ) of the optimal expected
gain/loss is compared with the benchmark value (rreal ), the plug-in estimate (rplug ), the shrinkage
estimate (rshrink ) and the bootstrap corrected estimate (rbs ). Obviously, the new estimator is the
closest to the benchmark value among all the estimators.

Fig. 7. Model comparison p = 60, n = 50. The new estimator (rnew ) of the optimal expected
gain/loss is compared with the benchmark value (rreal ), the plug-in estimate (rplug ), the shrinkage
estimate (rshrink ) and the bootstrap corrected estimate (rbs ). Obviously, the new estimator is the
closest to the benchmark value among all the estimators.

average distance between the four estimates and the benchmark value. Define
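\[
d_1 = \frac{1}{100}\sum_{i=1}^{100}\bigl|r^{i}_{\mathrm{new}} - r^{i}_{\mathrm{real}}\bigr|, \qquad
d_2 = \frac{1}{100}\sum_{i=1}^{100}\bigl|r^{i}_{\mathrm{plug}} - r^{i}_{\mathrm{real}}\bigr|,
\]
\[
d_3 = \frac{1}{100}\sum_{i=1}^{100}\bigl|r^{i}_{\mathrm{shrink}} - r^{i}_{\mathrm{real}}\bigr|, \qquad
d_4 = \frac{1}{100}\sum_{i=1}^{100}\bigl|r^{i}_{\mathrm{bs}} - r^{i}_{\mathrm{real}}\bigr|.
\]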
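
Each distance is a mean absolute deviation between an estimate and the benchmark across the simulated data sets. A minimal sketch follows; the helper name `average_distance` and the toy numbers are illustrative assumptions, not values from the study.

```python
import numpy as np

def average_distance(estimates, benchmarks):
    """Mean of |estimate_i - benchmark_i| over all simulated data sets."""
    estimates = np.asarray(estimates, dtype=float)
    benchmarks = np.asarray(benchmarks, dtype=float)
    return float(np.mean(np.abs(estimates - benchmarks)))

# Toy example with three data sets (made-up numbers for illustration only).
r_real = [1.0, 2.0, 3.0]
r_new = [1.5, 1.5, 3.0]
d1 = average_distance(r_new, r_real)   # (0.5 + 0.5 + 0.0) / 3
```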

From Table 2, observe that rnew still has the minimum average distance among
all the estimators. When the sample covariance matrix is not singular, rbs is the
second best estimator, and rshrink becomes worse as p/n grows larger. Figures 5–7,
A.1, B.1 and B.2 also demonstrate the above conclusion. Moreover, rnew is much
more stable than the other estimators.
5. Conclusion
In this study, we proposed a new estimator for evaluating the optimal expected
gain/loss of a large dimensional portfolio.
In the MV portfolio optimization procedure, it is well known that the plug-in
optimal expected gain/loss is not a good estimator since using the sample mean
and the sample covariance matrix of the historical data incurs substantial errors.
Instead of constructing new estimators of the mean and the covariance matrix, this
paper incorporates the interaction effect of these two quantities and explores how
the sample mean and the sample covariance matrix behave based on the idea of
conditional expectation.
It is found that the effect of the sample mean is additive and the effect of the
sample covariance matrix is multiplicative. Both of them over-predict the optimal
expected gain/loss. In the financial market, the number of stocks can be very large
while the sample size is usually moderate. Therefore, p/n can be substantial and
in such a case the sample covariance matrix tends to be singular. This paper used
the shrinkage methods to construct a stable covariance matrix which was invertible
for both p < n and p ≥ n. Matrix inequalities were employed to prove that the
shrinkage covariance matrix led to an estimate of the optimal expected gain/loss
which was smaller than the plug-in estimate and closer to the benchmark value.
Simulation studies show that the new estimator has better performance than the
previous methods.
Acknowledgments
The authors would like to thank the Editor, an Associate Editor and an anonymous
referee for insightful comments and constructive suggestions, which led to a substantially improved version of this paper. This research was supported in part by
grants from the FRF for the Central Universities of China No. 201413043 and the
NSFC No. 71201147 (Yan Liu), HKSAR-RGC-GRF: Nos. 400313 and 14300514 and
HKSAR-RGC-CRF No. CityU8/CRG/12G (N.H. Chan), and the National Research
Foundation of Korea (NRF) of the government of Korea (MSIP) No. 2011-0030810
(C.T. Ng).
Appendix A. Preliminaries
Definition A.1. The Loewner partial ordering on the set of positive semidefinite
matrices G is defined as follows: for A, B ∈ G, A ≤ B iff B − A ∈ G.
Theorem A.1. If A and B are positive definite Hermitian matrices, then A ≥
B ⇔ B −1 ≥ A−1 .
Theorem A.2. Let A and B be n × n Hermitian matrices with A > 0 and B > 0. Then
A ≥ B ⇔ λmax (BA−1 ) ≤ 1.
Theorem A.3. If A ≥ B ≥ 0, then M (B) ⊂ M (A), where M (A) denotes the
column space of A.
Theorem A.4. If A and B are positive semidefinite Hermitian matrices, then
A ≥ B ⇔ M (B) ⊂ M (A), and A(A − B)A ≥ 0.
Theorem A.5. Let A and B be Hermitian matrices with A ≥ 0 and B ≥ 0. Then A ≥ B
if and only if M (B) ⊂ M (A) and λmax (BA− ) ≤ 1, where the condition λmax (BA− ) ≤ 1 is independent
of the choice of the generalized inverse A− .

Fig. A.1. Model comparison p = 60, n = 100. The new estimator (rnew ) of the optimal expected
gain/loss is compared with the benchmark value (rreal ), the plug-in estimate (rplug ), the shrinkage
estimate (rshrink ) and the bootstrap corrected estimate (rbs ). Obviously, the new estimator is the
closest to the benchmark value among all the estimators.

Theorem A.6. If B ≠ 0, then B ′ A− B is independent of the choice of A− if and
only if M (B) ⊂ M (A).
Appendix B. Proof of Theorem 3.2
Let Ψ be a positive definite matrix, λ1 be the greatest eigenvalue of Ψ, Ψ† =
αλ1 I + (1 − α)Ψ, A = Ψ−1 , and B = Ψ†−1 . To prove Theorem 3.2, we first
prove the following theorems.
Theorem B.1. A ≥ B.
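Proof. By Theorem A.1, Ψ−1 ≥ Ψ†−1 ⇔ Ψ† ≥ Ψ. Factorize Ψ as QΛQT , where
QQT = I, Λ = diag(λ1 , λ2 , . . . , λp ). Then
\[
\begin{aligned}
\Psi^{\dagger} - \Psi &= \alpha\lambda_1 I + (1-\alpha)\Psi - \Psi \\
&= \alpha(\lambda_1 I - \Psi) \\
&= \alpha Q(\lambda_1 I - \Lambda)Q^{T} \ge 0.
\end{aligned} \tag{B.1}
\]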
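
Theorem B.1 can be checked numerically on a random positive definite matrix. The sketch below is only a sanity check under assumptions (a randomly generated Ψ, α = 0.3, and hypothetical helper names); it does not replace the proof.

```python
import numpy as np

def shrink_toward_largest_eigenvalue(psi, alpha):
    """Psi_dagger = alpha * lambda_1 * I + (1 - alpha) * Psi."""
    lam1 = np.linalg.eigvalsh(psi)[-1]   # eigvalsh returns ascending eigenvalues
    return alpha * lam1 * np.eye(psi.shape[0]) + (1 - alpha) * psi

def is_psd(M, tol=1e-10):
    """True if the symmetric part of M has no eigenvalue below -tol."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 6))
psi = F @ F.T + 0.5 * np.eye(6)          # random positive definite matrix
psi_dag = shrink_toward_largest_eigenvalue(psi, alpha=0.3)

A = np.linalg.inv(psi)
B = np.linalg.inv(psi_dag)
print(is_psd(psi_dag - psi), is_psd(A - B))
```

A small tolerance is used because Ψ† − Ψ has an exact zero eigenvalue along the leading eigenvector of Ψ, which floating-point arithmetic may return as a tiny negative number.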

Theorem B.2. M (h(B)) ⊂ M (h(A)).

Fig. B.1. Model comparison p = 80, n = 50. The new estimator (rnew ) of the optimal expected
gain/loss is compared with the benchmark value (rreal ), the plug-in estimate (rplug ), the shrinkage
estimate (rshrink ) and the bootstrap corrected estimate (rbs ). Obviously, the new estimator is the
closest to the benchmark value among all the estimators.

Fig. B.2. Model comparison p = 80, n = 100. The new estimator (rnew ) of the optimal expected
gain/loss is compared with the benchmark value (rreal ), the plug-in estimate (rplug ), the shrinkage
estimate (rshrink ) and the bootstrap corrected estimate (rbs ). Obviously, the new estimator is the
closest to the benchmark value among all the estimators.

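Proof. If we can find a matrix X which makes h(B) = h(A)X hold, then we can
obtain the result. Let
\[
X = A^{-1}\Bigl(I - \frac{B\mathbf{1}\mathbf{1}^{T}}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)B, \tag{B.2}
\]
then,
\[
\begin{aligned}
h(A)X &= \Bigl(A - \frac{A\mathbf{1}\mathbf{1}^{T}A}{\mathbf{1}^{T}A\mathbf{1}}\Bigr)A^{-1}\Bigl(I - \frac{B\mathbf{1}\mathbf{1}^{T}}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)B \\
&= \Bigl(I - \frac{A\mathbf{1}\mathbf{1}^{T}}{\mathbf{1}^{T}A\mathbf{1}}\Bigr)\Bigl(I - \frac{B\mathbf{1}\mathbf{1}^{T}}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)B \\
&= \Bigl(I - \frac{B\mathbf{1}\mathbf{1}^{T}}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)B = h(B),
\end{aligned} \tag{B.3}
\]
and the proof is completed.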
Theorem B.3. h(A) ≥ h(B).
Proof. According to Theorem A.4 and Theorem B.2, it is equivalent to proving
that when M (h(B)) ⊂ M (h(A)) holds,
\[
h(A) \ge h(B) \iff \lambda_1[h(B)h(A)^{-}] \le 1, \tag{B.4}
\]
where h(A)− can be any general inverse of h(A). From Theorem A.4, the problem
converts to proving that when M (h(B)) ⊂ M (h(A)) holds,
\[
h(A)[h(A) - h(B)]h(A) \ge 0 \iff \lambda_1[h(B)h(A)^{-}] \le 1. \tag{B.5}
\]

Take a full rank decomposition of h(A), so that h(A) = LL∗ . Then
\[
h(A)[h(A) - h(B)]h(A) \ge 0 \iff L^{*}(h(A) - h(B))L \ge 0. \tag{B.6}
\]
Multiplying L∗ (h(A) − h(B))L on the left and on the right by (L∗ L)−1 , we get
\[
\begin{aligned}
L^{*}(h(A) - h(B))L \ge 0 &\iff I - T^{*}h(B)T \ge 0 \\
&\iff \lambda_1(T^{*}h(B)T) \le 1 \\
&\iff \lambda_1(h(B)TT^{*}) \le 1 \\
&\iff \lambda_1[h(B)h(A)^{+}] \le 1,
\end{aligned} \tag{B.7}
\]
where T = L(L∗ L)−1 and T T ∗ = L(L∗ L)−2 L∗ = h(A)+ .
From Theorem A.6, when M (h(B)) ⊂ M (h(A)) holds,
λ1 [h(B)h(A)+ ] = λ1 [h(B)h(A)− ],

(B.8)

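where h(A)− can be any general inverse of h(A). One of the general inverses of
h(A) is
\[
h(A)^{-} = \Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}A}{\mathbf{1}^{T}A\mathbf{1}}\Bigr)A^{-1}. \tag{B.9}
\]
Then
\[
\begin{aligned}
\lambda_1[h(B)h(A)^{-}] &= \lambda_1\Bigl[B\Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}B}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)\Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}A}{\mathbf{1}^{T}A\mathbf{1}}\Bigr)A^{-1}\Bigr] \\
&= \lambda_1\Bigl[B\Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}B}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)A^{-1}\Bigr] \\
&= \lambda_1\Bigl[A^{-1}B\Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}B}{\mathbf{1}^{T}B\mathbf{1}}\Bigr)\Bigr] \\
&\le \lambda_1(A^{-1}B)\,\lambda_{\max}\Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}B}{\mathbf{1}^{T}B\mathbf{1}}\Bigr) \\
&= \lambda_1(BA^{-1})\,\lambda_{\max}\Bigl(I - \frac{\mathbf{1}\mathbf{1}^{T}B}{\mathbf{1}^{T}B\mathbf{1}}\Bigr).
\end{aligned} \tag{B.10}
\]
Since A ≥ B, by Theorem A.2, λ1 (BA−1 ) ≤ 1. The matrix $I - \mathbf{1}\mathbf{1}^{T}B/(\mathbf{1}^{T}B\mathbf{1})$ is idempotent;
therefore, its largest eigenvalue is 1. Thus we have
\[
h(A) \ge h(B). \tag{B.11}
\]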

Left and right multiply h(A) − h(B) by µT and µ, respectively. Positive semidefiniteness of h(A) − h(B) suggested by Theorem B.3 yields Theorem 3.1.
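
The conclusion of Theorem B.3 and the resulting quadratic-form inequality can be illustrated numerically. The sketch below assumes h(A) = A − A11ᵀA/(1ᵀA1), as in (B.3), and uses a randomly generated Ψ, α = 0.4 and a random µ; all of these are illustrative choices, not values from the paper.

```python
import numpy as np

def h(A):
    """h(A) = A - A 1 1^T A / (1^T A 1) for a symmetric matrix A."""
    one = np.ones(A.shape[0])
    A1 = A @ one
    return A - np.outer(A1, A1) / (one @ A1)

rng = np.random.default_rng(42)
p = 8
F = rng.normal(size=(p, p))
psi = F @ F.T + 0.5 * np.eye(p)                  # positive definite Psi
lam1 = np.linalg.eigvalsh(psi)[-1]               # greatest eigenvalue of Psi
alpha = 0.4
psi_dag = alpha * lam1 * np.eye(p) + (1 - alpha) * psi

A, B = np.linalg.inv(psi), np.linalg.inv(psi_dag)
D = h(A) - h(B)

# Smallest eigenvalue of h(A) - h(B); zero up to rounding since D @ 1 = 0.
min_eig = np.linalg.eigvalsh((D + D.T) / 2).min()

# Quadratic forms obtained by multiplying with mu^T and mu.
mu = rng.normal(size=p)
gap = mu @ h(A) @ mu - mu @ h(B) @ mu
print(min_eig >= -1e-10, gap >= -1e-10)
```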

References
T. W. Anderson (2003) An Introduction to Multivariate Statistical Analysis, third edition.
New York: Wiley.
Z. Bai (1999) Methodologies in spectral analysis of large dimensional random matrices, a
review, Statistica Sinica 16 (1), 611–677.
Z. Bai, H. Liu & W. K. Wong (2009a) Enhancement of the applicability of Markowitz’s
portfolio optimization by utilizing random matrix theory, Mathematical Finance
19 (4), 639–667.
Z. Bai, H. Liu & W. K. Wong (2009b) On the Markowitz mean-variance analysis of self-financing portfolios, Risk and Decision Analysis 1, 35–42.
Z. Bai & J. W. Silverstein (2010) Spectral Analysis of Large Dimensional Random Matrices,
second edition. New York: Springer.
Z. Bai, B. Q. Miao & G. M. Pan (2007) On asymptotics of eigenvectors of large sample
covariance matrix, Annals of Probability 35 (4), 1532–1572.
S. J. Brown (1976) Optimal Portfolio Choice Under Uncertainty: A Bayesian Approach.
PhD Thesis, University of Chicago, 1976.
B. F. Efron & R. J. Tibshirani (1993) An Introduction to the Bootstrap. London: Chapman
and Hall.
G. M. Frankfurter, H. E. Phillips & J. P. Seagle (1971) The effects of uncertain means,
variances, and covariances, Journal of Financial and Quantitative Analysis 6 (5),
1251–1262.
P. A. Frost & J. E. Savarino (1986) An empirical Bayes approach to efficient portfolio
selection, Journal of Financial and Quantitative Analysis 21 (3), 293–305.
W. James & C. M. Stein (1961) Estimation with quadratic loss, Proceedings of the Fourth
Berkeley Symposium on Mathematical Statistics and Probability 1, 361–380.
P. Jorion (1986) Bayes-Stein estimation for portfolio analysis, Journal of Financial and
Quantitative Analysis 21 (3), 279–292.
B. Korkie & H. J. Turtle (2002) A mean-variance analysis of self-financing portfolios,
Management Science 48 (3), 427–443.
O. Ledoit & M. Wolf (2003) Improved estimation of the covariance matrix of stock returns
with an application to portfolio selection, Journal of Empirical Finance 10, 603–621.
O. Ledoit & M. Wolf (2004a) Honey, I shrunk the sample covariance matrix, Portfolio
Management 31 (4), 110–119.
O. Ledoit & M. Wolf (2004b) A well-conditioned estimator for large-dimensional covariance
matrices, Journal of Multivariate Analysis 88, 365–411.
V. A. Marčenko & L. A. Pastur (1967) Distribution of eigenvalues for some sets of random
matrices, Mathematics of the USSR-Sbornik 1 (4), 457–483.
H. Markowitz (1952) Portfolio selection, Journal of Finance 7 (1), 77–91.
C. M. Stein (1956) Inadmissibility of the usual estimator for the mean of a multivariate
normal distribution, Proceedings of the Third Berkeley Symposium on Mathematical
Statistics and Probability 1, 197–206.
