
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Sparse and Stable Portfolio Selection With
Parameter Uncertainty
Jiahan Li
To cite this article: Jiahan Li (2015) Sparse and Stable Portfolio Selection With
Parameter Uncertainty, Journal of Business & Economic Statistics, 33:3, 381-392, DOI:
10.1080/07350015.2014.954708
To link to this article: http://dx.doi.org/10.1080/07350015.2014.954708

Accepted author version posted online: 21
Aug 2014.

Download by: [Universitas Maritim Raja Ali Haji]

Date: 11 January 2016, At: 19:39

Sparse and Stable Portfolio Selection With
Parameter Uncertainty
Jiahan LI


Department of Applied and Computational Mathematics and Statistics, 156 Hurley Hall, University of Notre Dame,
Notre Dame, IN 46556 (jiahan.li@nd.edu)
A number of alternative mean-variance portfolio strategies have been recently proposed to improve the
empirical performance of the classic Markowitz mean-variance framework. Designed as remedies for

parameter uncertainty and estimation errors in portfolio selection problems, these alternative portfolio
strategies deliver substantially better out-of-sample performance. In this article, we first show how to
solve a general portfolio selection problem in a linear regression framework. Then we propose to reduce
the estimation risk of expected returns and the variance-covariance matrix of asset returns by imposing
additional constraints on the portfolio weights. With results from linear regression models, we show that
portfolio weights derived from new approaches enjoy two favorable properties: sparsity and stability.
Moreover, we present insights into these new approaches as well as their connections to alternative
strategies in literature. Four empirical studies show that the proposed strategies have better out-of-sample
performance and lower turnover than many other strategies, especially when the estimation risk is large.
KEY WORDS: Mean-variance analysis; Penalized least squares; Portfolio selection; Shrinkage
estimation.

1. INTRODUCTION
Expected returns and the covariance matrix are the two inputs of a
portfolio selection problem. If the true expected returns and
true covariance matrix are known to investors, mean-variance
analysis guarantees the optimal portfolio positions. However,
since these two parameters have to be estimated from historical
data, the out-of-sample performance of mean-variance analysis
is impacted by parameter uncertainty and estimation errors. In

portfolio management literature, it has long been recognized
that the sample mean and the sample covariance matrix are
suboptimal, and usually deliver extremely poor out-of-sample
performance (for a review, see Brandt 2009).
To reduce the undesired impact of sample estimates, shrinkage estimators of expected returns and covariance matrix were
proposed (Jorion 1986; Ledoit and Wolf 2003; 2004). Alternatively, methods have been proposed to directly address the
decision variable of a portfolio selection problem: the portfolio weights. These methods solve the classic problem with
sample mean and sample covariance matrix, but impose additional constraints on the portfolio weights. Jagannathan and
Ma (2003) studied the shortsale-constrained global minimum-variance portfolio. They found that such constraints actually improve the empirical performance of portfolios. DeMiguel et al.
(2009) and Brodie et al. (2009) proposed the global minimum
variance portfolio with the L1-norm constraint or the L2-norm
constraints on the portfolio weights. Behr, Guettler, and Miebs
(2013) imposed flexible upper and lower bounds on each portfolio weight, allowing their strategy to nest different benchmarks.
All the above portfolio strategies constrain the magnitude of
portfolio weights in different ways. These constraints generate
suboptimal solutions if true parameters are known, but may be
beneficial in the presence of parameter uncertainty and estimation risk. Since extreme portfolio weights are usually brought about
by large estimation errors of unknown parameters rather than
true parameter values of the population, correcting the portfolio


weights directly amounts to reducing the impact of parameter uncertainty (Frost and Savarino 1986; Jagannathan and Ma
2003; Garlappi, Uppal, and Wang 2007).
In this article, we first show how a general portfolio selection
problem can be recast as a regression problem, so that estimating portfolio weights is equivalent to estimating regression
coefficients in a linear regression model. Britten-Jones (1999)
proposed the regression formulation of a tangency portfolio, and
Fan, Zhang, and Yu (2012) showed how Markowitz’s risk minimization problem can be formulated as a regression problem.
However, no regression formulation was proposed for a general
mean-variance portfolio selection problem. The first part of this
article will fill this gap, allowing the existing statistical theory of
regressions to provide valuable insights into portfolio selection
problems. For our problem in particular, this framework implies
that imposing additional constraints on the portfolio weights
is equivalent to imposing additional constraints on regression
coefficients.
We then study the weight-constrained mean-variance portfolios. Shrinking portfolio weights directly makes sense for three
reasons. First, Fan, Zhang, and Yu (2012) showed that the estimation risk is bounded by a quadratic function of the L1 norm
of portfolio weights, and thus constraining portfolio norms is
equivalent to constraining estimation risks. Second, estimation
errors may accumulate through arithmetic operations in calculating portfolio weights. By working on the portfolio weights

directly, their desired forms and properties could be achieved.
Moreover, the magnitude of portfolio weights is a proxy for the
transaction cost (e.g., Brodie et al. 2009).


© 2015 American Statistical Association
Journal of Business & Economic Statistics
July 2015, Vol. 33, No. 3
DOI: 10.1080/07350015.2014.954708
Color versions of one or more of the figures in the article can be
found online at www.tandfonline.com/r/jbes.


Journal of Business & Economic Statistics, July 2015

We further propose two favorable properties of portfolio

weights when the number of assets is large: sparsity and stability, and show how constraints could encourage these two
properties. With sparsity, portfolio weights could be zero for a
subset of assets. In fact the idea of sparsity has been implicitly
explored. If a portfolio selection procedure predetermines a subset of candidate assets, the resulting portfolio is sparse, since it
effectively assigns zero weights to the rest of the assets. Sparse
portfolio strategies also include those that test
whether some small portfolio weights are significantly different from zero. If the data suggest that some portfolio weights
are not statistically different from zero, setting them to zero reduces portfolio variance and ameliorates the impact of parameter
uncertainty (Britten-Jones, 1999; Garlappi, Uppal, and Wang
2007).
Other than sparsity, portfolio stability is another desirable
property, since in practice, extremely variable positions are usually observed from the classic mean-variance framework. Such
instability is largely due to the estimation errors in the estimated covariance matrix as well as the step of taking its inverse.
As a result, large positions lead to abysmal out-of-sample performance rather than efficient diversification. Given these two
favorable properties of portfolio rules, we will demonstrate how
the new approaches address these issues directly. We estimate
the sparse and stable portfolio weights in the framework of
penalized least squares. Intuitively if all excess returns are uncorrelated and have a common variance, the new approach is
equivalent to shifting and scaling the portfolio weights derived
from the sample estimates toward zero. In this way, small portfolio weights are set to zero, and extremely large positions are

regulated, resulting in sparse and stable portfolios.
Finally we examine the relation between our new approaches
and various robust portfolio selection approaches in literature,
including the portfolio using the empirical Bayes-Stein estimators (Jorion 1986), the ambiguity averse portfolio (Garlappi,
Uppal, and Wang 2007), and those based on the shrinkage estimators of covariance matrix (Ledoit and Wolf 2003; 2004).
Interestingly, although many other portfolio rules start with different assumptions and considerations, the proposed framework
nests them as special cases with different levels of sparsity or
stability. Empirical analyses suggest that the portfolio strategies
developed in this article have superior out-of-sample performance with relatively low turnover. The conclusion is stronger
if parameter uncertainty is larger.
The rest of this article is organized as follows. Section 2
presents the regression formulation of portfolio selection problems. Section 3 proposes four weight-constrained portfolios, and
shows how sparse and stable portfolio weights are achieved. In
Section 4, the connections between new strategies and other robust portfolio strategies are discussed. Section 5 presents methods for and results from empirical studies. In Section 6, we
provide concluding remarks.
2. REGRESSION APPROACH FOR SHRINKAGE ESTIMATIONS


Consider a standard portfolio choice problem with N risky assets. Suppose the excess returns at time t, $R_t$, follow a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$, where $\mu$ is an N × 1 vector, $\Sigma$ is an N × N matrix, and $R_t$ is the vector of asset returns in excess of the risk-free rate. At time t an investor determines the portfolio weights w to maximize the mean-variance objective function:

$$U(w) = w^T \mu - \frac{\gamma}{2} w^T \Sigma w, \tag{1}$$

where γ > 0 is the coefficient of relative risk aversion. The optimal portfolio weights are given by $w = \frac{1}{\gamma}\Sigma^{-1}\mu$. Proposition 1 shows that the portfolio choice problem can be formulated as a linear regression problem.
Proposition 1. Consider the following multiple linear regression with N independent variables and N observations:

$$y = Xw + e, \tag{2}$$

where y is an N-dimensional dependent variable, X is an N × N matrix of independent variables, w is an N-dimensional vector of regression coefficients, and e is a vector of random errors. Let

$$X = \sqrt{\gamma}\,\Sigma^{1/2} \tag{3}$$

and

$$y = \frac{1}{\sqrt{\gamma}}\,\Sigma^{-1/2}\mu. \tag{4}$$

Then the least squares estimator of w, $\hat{w}_{OLS} = (X^T X)^{-1}(X^T y)$, is the same as the optimal portfolio weights $\hat{w} = \frac{1}{\gamma}\Sigma^{-1}\mu$. In other words, the least squares estimator solves the portfolio
selection problem (1).
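Proposition 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a hypothetical three-asset example whose values are chosen only for illustration. It builds X and y from (3)–(4) with the symmetric square root of Σ and compares the least squares solution with the closed-form mean-variance weights.

```python
import numpy as np

def mean_variance_weights(mu, sigma, gamma):
    """Closed-form weights w = (1/gamma) * inv(Sigma) @ mu."""
    return np.linalg.solve(sigma, mu) / gamma

def regression_inputs(mu, sigma, gamma):
    """Build X = sqrt(gamma) * Sigma^{1/2} and y = Sigma^{-1/2} mu / sqrt(gamma)."""
    # Symmetric square root from the eigendecomposition of a positive definite Sigma.
    eigval, eigvec = np.linalg.eigh(sigma)
    sqrt_sigma = (eigvec * np.sqrt(eigval)) @ eigvec.T
    inv_sqrt_sigma = (eigvec / np.sqrt(eigval)) @ eigvec.T
    X = np.sqrt(gamma) * sqrt_sigma
    y = inv_sqrt_sigma @ mu / np.sqrt(gamma)
    return X, y

# Hypothetical inputs (not taken from the article).
mu = np.array([0.010, 0.006, 0.008])
sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.025, 0.005],
                  [0.004, 0.005, 0.030]])
gamma = 3.0

X, y = regression_inputs(mu, sigma, gamma)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares estimator
w_mv = mean_variance_weights(mu, sigma, gamma)  # mean-variance solution
print(w_ols, w_mv)
```

The two vectors agree because $X^T X = \gamma\Sigma$ and $X^T y = \mu$, so the normal equations reduce to $\gamma\Sigma w = \mu$.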
Since both µ and Σ are unknown to investors, the plug-in strategy is popular in practice (Kan and Zhou 2007; Brandt 2009). This two-step strategy first replaces both unknown parameters in (1) or equivalently (3)–(4) by their estimates based on historical data. Then optimization is carried out to find the optimal portfolio weights conditional on the estimated parameters. Among all estimators, the maximum likelihood estimators $\hat{\mu}$ and $\hat{\Sigma}$ are widely used, where $\hat{\mu} = \ldots (R_t - \hat{\mu})^T$, with T being the total number of observations. Then the optimal portfolio maximizes

$$U(w) = w^T \hat{\mu} - \frac{\gamma}{2} w^T \hat{\Sigma} w. \tag{5}$$
This plug-in strategy is intuitive and convenient, but fails
to take into account parameter uncertainty and estimation risk.
There are three sources of estimation errors: the estimated mean $\hat{\mu}$, the estimated covariance matrix $\hat{\Sigma}$, and the inverse operator on the estimated covariance matrix. In fact, when two return series are highly correlated, the estimation error in $\hat{\Sigma}$ could be amplified dramatically by the inverse operator, resulting in highly volatile $\hat{\Sigma}^{-1}$ and $\hat{w}$. One advantage of the linear regression formulation
is that the estimator ŵ can be tested formally using statistical inference
procedures. For example, by solving for the tangency portfolio using a regression approach, Britten-Jones (1999) provided
estimates of the portfolio weights and the associated standard
errors, and tested the hypothesis that some weights are zero.
Similar approaches for constructing confidence intervals and
testing hypothesis can also be formulated under our linear regression approach for portfolio selections.

Li: Sparse and Stable Portfolio Selection With Parameter Uncertainty

3. A GENERAL FRAMEWORK

In this section, we propose four portfolio strategies that solve
the standard problem subject to additional constraints on the
portfolio weights. Portfolio rules associated with these strategies
can be regarded as shrinkage estimators of portfolio weights.


3.1 Sparse Portfolio
When a subset of assets receives zero weights, the portfolio
rule is said to be sparse. Sparse portfolio weights have been implicitly explored in the finance literature. Britten-Jones (1999)
proposed t-test and F-test for testing whether the estimated
weight for an asset or a group of assets is statistically different from zero. The test provides direct guidance in constructing
efficient portfolios and assessing the estimation risk. Garlappi,
Uppal, and Wang (2007) provided a solution for a portfolio selection problem that explicitly considers parameter uncertainty.
They suggested (in their Proposition 3) that an investor should set the portfolio weights to zero and invest only in the risk-free asset if the estimated squared Sharpe ratio, $\hat{\theta}^2 = \hat{\mu}^T \hat{\Sigma}^{-1} \hat{\mu}$, is not statistically different from zero. This hypothesis upon which investment decisions are based is easy to test, since the sampling distribution of $\hat{\theta}^2$ is a scaled F-distribution, or $\hat{\theta}^2 \sim \frac{N}{T-N} F_{N,T-N}$ (Kan and Zhou 2007).
When the number of assets is large, a sparse portfolio rule is desirable. First of all, zero portfolio weights reduce transaction costs as well as portfolio management costs. Second, since the number of historical asset returns T is relatively small compared with the number of assets N, estimation error is large. By setting a subset of small portfolio weights to zero, the estimated portfolio weights are no longer unbiased, but their variances and the mean squared prediction errors could be reduced. This is known as the bias-variance tradeoff.
Sparse portfolio weights can be obtained by constraining the L1 norm of portfolio weights. Interestingly, this L1 norm is also related to the upper bound of the estimation risk. Fan, Zhang, and Yu (2012) showed that for the general portfolio choice problem (1),

$$|U(w; \hat{\mu}, \hat{\Sigma}) - U(w; \mu, \Sigma)| \le \|\hat{\mu} - \mu\|_\infty \|w\|_1 + \frac{\gamma}{2} \|\hat{\Sigma} - \Sigma\|_\infty \|w\|_1^2, \tag{6}$$

where $U(w; \theta_1, \theta_2) = w^T \theta_1 - \frac{\gamma}{2} w^T \theta_2 w$, $\|\hat{\mu} - \mu\|_\infty$ and $\|\hat{\Sigma} - \Sigma\|_\infty$ are the maximum component-wise estimation errors, and $\|w\|_1 = \sum_{j=1}^{N} |w_j|$ is the L1 norm of vector w. Therefore, as long as $\|w\|_1$ is bounded, the estimation error is controlled by
the largest component-wise error.
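The bound (6) can be illustrated on simulated data. The sketch below assumes NumPy and uses hypothetical "true" parameters drawn at random, so the numbers carry no empirical meaning. It compares the utility gap on the left-hand side of (6) with the right-hand side for an arbitrary weight vector.

```python
import numpy as np

def utility(w, mu, sigma, gamma):
    """Mean-variance objective U(w; mu, Sigma) from (1)."""
    return w @ mu - 0.5 * gamma * (w @ sigma @ w)

def estimation_risk_bound(w, mu_hat, mu, sigma_hat, sigma, gamma):
    """Right-hand side of (6): max-norm errors weighted by the L1 norm of w."""
    l1 = np.abs(w).sum()
    mean_err = np.abs(mu_hat - mu).max()
    cov_err = np.abs(sigma_hat - sigma).max()
    return mean_err * l1 + 0.5 * gamma * cov_err * l1 ** 2

rng = np.random.default_rng(0)
n_assets, n_obs, gamma = 5, 120, 3.0

# Hypothetical population parameters (assumptions for illustration only).
mu = rng.normal(0.005, 0.002, n_assets)
B = rng.normal(size=(n_assets, n_assets))
sigma = 0.001 * (B @ B.T + n_assets * np.eye(n_assets))

returns = rng.multivariate_normal(mu, sigma, size=n_obs)
mu_hat = returns.mean(axis=0)
sigma_hat = np.cov(returns, rowvar=False, bias=True)  # bias=True divides by T

w = rng.normal(size=n_assets)  # any weight vector
gap = abs(utility(w, mu_hat, sigma_hat, gamma) - utility(w, mu, sigma, gamma))
bound = estimation_risk_bound(w, mu_hat, mu, sigma_hat, sigma, gamma)
print(gap, bound)
```

Halving every weight halves the first term of the bound and quarters the second, which is why constraining the L1 norm constrains the estimation risk.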
To this end, we propose to estimate the portfolio weights by

$$\text{Rule I:}\quad \hat{w}_{L1} = \arg\max_{w} \left\{ w^T \hat{\mu} - \frac{\gamma}{2} w^T \hat{\Sigma} w \right\}, \quad \text{subject to } \|w\|_1 < s_1, \tag{7}$$

where $\hat{\Sigma}$ and $\hat{\mu}$ are maximum likelihood estimates of Σ and µ respectively, and $s_1 > 0$ is a constant. If one lets $X = \sqrt{\gamma}\,\hat{\Sigma}^{1/2}$ and $y = \frac{1}{\sqrt{\gamma}}\,\hat{\Sigma}^{-1/2}\hat{\mu}$, the linear regression framework for portfolio selection together with the method of Lagrange multipliers implies the following:

$$\hat{w}_{L1} = \arg\min_{w} \left\{ \|y - Xw\|_2^2 + \lambda_1 \|w\|_1 \right\}, \tag{8}$$

where $\lambda_1 > 0$ is a constant, and $\|\cdot\|_2^2$ is the squared L2 norm of a vector. This objective function extends ordinary least squares (OLS), and is known as penalized least squares.
When λ1 = 0, the sizes of |wj |, j = 1, . . . , N are not
penalized. When λ1 > 0, the portfolio weights ŵL1 =
(ŵL1,1 , . . . , ŵL1,N )T are shrunk toward zero in a way that, if
the OLS estimate ŵj is small enough in absolute value, the penalized least squares estimate ŵL1,j is exactly zero. Otherwise,
ŵL1,j is ŵj minus some constant when ŵj > 0, or ŵj plus some
constant if it is negative (Tibshirani, 1996). In other words, with
the L1 norm constraint we may have sparse portfolio weights.
Such a shrinkage scheme implies that assets receiving small
positions may be excluded from the portfolio.
L1 norm penalized regressions have been popular in statistics (e.g., Tibshirani 1996) and econometrics (e.g., Bai and Ng
2008; De Mol, Giannone, and Reichlin 2008). By recovering the
reduced structure in a lower dimensional space, this technique
is capable of producing a better estimator of regression coefficient. Although the estimator is biased, it can have substantially
lower variance than OLS, leading to a lower mean squared error
(MSE). Such reduction of MSE could balance the in-sample
performance and the out-of-sample performance of a statistical
model.
Note that there is another way of constructing sparse portfolio weights, namely hard-thresholding strategy. Using the linear
regression formulation without constraints outlined in Section
2, each portfolio weight can be tested with the null hypothesis
being that the weight is zero (Britten-Jones 1999). Then those
assets whose weights are not statistically different from zero are
excluded from the portfolio. This strategy, however, tends to be
associated with unstable portfolios. To gain more insights, we
assume asset returns are uncorrelated, and compare the results
of hard-thresholding strategy to those of L1 norm constrained
portfolio. For any given asset j, Figure 1 plots the classic portfolio weight $\hat{w}_j$, which is the jth element of $\hat{w} = \frac{1}{\gamma}\hat{\Sigma}^{-1}\hat{\mu}$, versus
the portfolio rule derived from the hard-thresholding strategy
(Figure 1(a)) or the rule derived from the L1 norm-constrained
strategy (Figure 1(b)). Clearly, both strategies are modifications
of ŵj . However, since estimator ŵj is a random variable that
changes slightly if new observations are available in estimating
µ and Σ, the change may result in jumps of portfolio weights
from the hard-thresholding strategy. On the other hand, portfolio weights from imposing L1 norm constraint will change
continuously, and thus are robust against estimation errors.
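The contrast between the two rules in Figure 1(a) and 1(b) can be sketched for a single asset in the uncorrelated case. In the code below, the function names, the cutoff 0.1 and the two nearby estimates are hypothetical choices made for illustration. In practice the cutoff would come from a test statistic and the shift from the penalty λ1.

```python
def hard_threshold(w_j, cutoff):
    """Keep the classic weight unchanged if it is large enough, else drop the asset."""
    return w_j if abs(w_j) > cutoff else 0.0

def soft_threshold(w_j, shift):
    """L1-type rule: move the classic weight toward zero by a constant, stopping at zero."""
    if w_j > shift:
        return w_j - shift
    if w_j < -shift:
        return w_j + shift
    return 0.0

# Two nearly identical estimates of the same classic weight (hypothetical values),
# as might arise when one new observation is added to the estimation window.
before, after = 0.099, 0.101

hard_jump = hard_threshold(after, 0.1) - hard_threshold(before, 0.1)
soft_jump = soft_threshold(after, 0.1) - soft_threshold(before, 0.1)
print(hard_jump, soft_jump)
```

The hard-thresholding weight jumps from zero to roughly 0.1 when the estimate crosses the cutoff, while the soft-thresholding weight moves by about the same small amount as the estimate itself.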
3.2 Stable Portfolio

The stability of portfolio weights is another desirable property
that has been explicitly explored in literature (Ledoit and Wolf
2003; 2004; Brodie et al. 2009). It has been documented that in
mean-variance efficient portfolios, extreme positions are usually
observed, and portfolio weights may change dramatically when
new return information is used, as well as when a set of assets
becomes unavailable for trading (Garlappi, Uppal, and Wang
2007; Brandt 2009). These stylized facts are mainly due to large

estimation errors contained in $\hat{\Sigma}$ and the inverse operation on $\hat{\Sigma}$ (Ledoit and Wolf 2003). In fact, when returns of asset i and asset j are highly correlated, $\hat{\Sigma}^{-1}$ is highly volatile with extreme entries (i,i), (i,j), (j,i), and (j,j), resulting in extreme positions in asset i
and asset j that swing dramatically over time. Hence imposing
stability constraints is expected to reduce the estimation risk due
to parameter uncertainty and multicollinearity.
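A two-asset sketch makes the amplification concrete. The volatility, the risk aversion γ = 3 and the two mean vectors below are hypothetical, and NumPy is assumed. The same small change in the estimated means is pushed through the classic rule at three correlation levels.

```python
import numpy as np

def two_asset_cov(rho, vol=0.05):
    """Covariance matrix of two assets with a common volatility and correlation rho."""
    return vol ** 2 * np.array([[1.0, rho], [rho, 1.0]])

def classic_weights(mu_hat, sigma_hat, gamma=3.0):
    """Plug-in rule w = (1/gamma) * inv(Sigma_hat) @ mu_hat."""
    return np.linalg.solve(sigma_hat, mu_hat) / gamma

def weight_shift(rho):
    """Largest change in any weight after a small change in the mean estimates."""
    mu_a = np.array([0.0100, 0.0100])
    mu_b = np.array([0.0105, 0.0095])  # slightly different estimate
    sigma_hat = two_asset_cov(rho)
    return np.abs(classic_weights(mu_b, sigma_hat) - classic_weights(mu_a, sigma_hat)).max()

for rho in (0.0, 0.9, 0.99):
    print(rho, np.linalg.cond(two_asset_cov(rho)), weight_shift(rho))
```

In this setting the condition number of the covariance matrix is (1 + ρ)/(1 − ρ) and the weight change is proportional to 1/(1 − ρ), so both grow without bound as the correlation approaches one.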
Motivated by this observation, improved estimators of Σ were proposed. Ledoit and Wolf (2003, 2004) proposed the general form of the shrinkage estimator of Σ:

$$\hat{\Sigma}_s = v\hat{\Sigma} + (1 - v)\hat{\Sigma}_g,$$

where $\hat{\Sigma}_g$ is a shrinkage target with lower variances, and v is the shrinkage intensity. Ledoit and Wolf (2003, 2004) further derived closed-form solutions for v, and demonstrated the superior out-of-sample performance of this strategy. We show in the next proposition that replacing $\hat{\Sigma}$ with $\hat{\Sigma}_s$ in (5) is equivalent to imposing a constraint on a quadratic form of the portfolio weight, and how its solution is connected with the classic solution ŵ.

Proposition 2. The optimal mean-variance portfolio selection problem with a shrinkage estimator of covariance matrix $\hat{\Sigma}_s = v\hat{\Sigma} + (1 - v)\hat{\Sigma}_g$

$$\hat{w}_{\Sigma} = \arg\max_{w} \left\{ w^T \hat{\mu} - \frac{\gamma}{2} w^T \hat{\Sigma}_s w \right\} \tag{9}$$

is equivalent to the problem

$$\hat{w}_{\Sigma} = \arg\max_{w} \left\{ w^T \hat{\mu}_0 - \frac{\gamma}{2} w^T \hat{\Sigma} w \right\} \quad \text{subject to } w^T \hat{\Sigma}_g w < s, \tag{10}$$

where $\hat{\mu}_0 = \hat{\mu}/v$, and s is a positive constant inversely proportional to $\lambda = \frac{\gamma(1-v)}{2v} > 0$. The solution is given by

$$\hat{w}_{\Sigma} = \frac{I - A}{v}\,\hat{w}, \tag{11}$$

where matrix $A = \hat{\Sigma}^{-1} L (\gamma C^{-1} + L^T \hat{\Sigma}^{-1} L)^{-1} L^T$, L is a lower-triangular matrix, and C is a diagonal matrix satisfying the Cholesky decomposition of a positive definite matrix $2\lambda\hat{\Sigma}_g = L C L^T$.

Therefore, shrinking the covariance matrix amounts to taking a linear transformation of the portfolio weight, where the transformation depends on both the shrinkage intensity v and the shrinkage target $\hat{\Sigma}_g$. When the target covariance matrix is an identity matrix (Ledoit and Wolf 2004), it is straightforward to verify that as the covariance matrix is shrunk toward the target, A → I, and the weight is shrunk toward zero. Moreover, when the expected returns are uncorrelated with a lower variance $0 < \delta^2 < 1$, this strategy is equivalent to scaling the sample portfolio weights ŵ by a constant $\frac{1}{v + (1+v)\delta^2}$ (see Figure 1(c)).

Figure 1. Alternative portfolio rules as a function of the portfolio weight derived from the sample mean and the sample covariance matrix. (a) Hard-thresholding portfolio rule. (b) The L1 norm constrained portfolio rule. (c) The portfolio rule that scales the classic portfolio weights by a constant smaller than one (Ledoit and Wolf 2004; Garlappi, Uppal, and Wang 2007; Kan and Zhou 2007). (d) The portfolio rule with both L1 norm constraint and L2 norm constraint. All excess returns are assumed to be uncorrelated.

Figure 2. Cumulative wealth of “Rule IV” (red) relative to the benchmark (blue) with initial wealth of $1. (a) Investing in G10 country bonds, (b) investing in 500 individual stocks.
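The closed-form solution (11) of Proposition 2 can be checked against a direct solve with the shrunk covariance matrix. The sketch below assumes NumPy, a hypothetical three-asset example, and a scaled identity matrix as the shrinkage target. The factors L and C are obtained by rescaling an ordinary Cholesky factor, which is one way (an assumption here) to read the decomposition $2\lambda\hat{\Sigma}_g = LCL^T$.

```python
import numpy as np

def ldl_from_cholesky(m):
    """Return unit lower-triangular L and diagonal C with m = L @ C @ L.T."""
    g = np.linalg.cholesky(m)   # m = g @ g.T with g lower triangular
    d = np.diag(g)
    L = g / d                   # divides column j of g by d[j]
    C = np.diag(d ** 2)
    return L, C

def shrinkage_weights_direct(mu_hat, sigma_hat, sigma_g, v, gamma):
    """Solve (9) directly with Sigma_s = v*Sigma_hat + (1 - v)*Sigma_g."""
    sigma_s = v * sigma_hat + (1 - v) * sigma_g
    return np.linalg.solve(sigma_s, mu_hat) / gamma

def shrinkage_weights_via_A(mu_hat, sigma_hat, sigma_g, v, gamma):
    """Evaluate (11): ((I - A) / v) applied to the classic weights."""
    lam = gamma * (1 - v) / (2 * v)
    L, C = ldl_from_cholesky(2 * lam * sigma_g)
    sigma_inv = np.linalg.inv(sigma_hat)
    inner = gamma * np.linalg.inv(C) + L.T @ sigma_inv @ L
    A = sigma_inv @ L @ np.linalg.solve(inner, L.T)
    w_classic = sigma_inv @ mu_hat / gamma
    return (np.eye(len(mu_hat)) - A) @ w_classic / v

# Hypothetical estimates (not taken from the article).
mu_hat = np.array([0.010, 0.006, 0.008])
sigma_hat = np.array([[0.040, 0.006, 0.004],
                      [0.006, 0.025, 0.005],
                      [0.004, 0.005, 0.030]])
sigma_g = np.mean(np.diag(sigma_hat)) * np.eye(3)  # scaled identity target
v, gamma = 0.6, 3.0

w_direct = shrinkage_weights_direct(mu_hat, sigma_hat, sigma_g, v, gamma)
w_formula = shrinkage_weights_via_A(mu_hat, sigma_hat, sigma_g, v, gamma)
print(w_direct, w_formula)
```

The agreement follows from the Woodbury identity applied to $\hat{\Sigma} + \frac{1}{\gamma} L C L^T$.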


3.3 Sparse and Stable Portfolio
The above discussion motivates how to achieve both sparsity and stability in a portfolio choice problem. First we could extend the L1 norm constrained portfolio (7) by incorporating a shrinkage estimator of Σ and propose

$$\hat{w}_{L1}^{MKT} = \arg\max_{w} \left\{ w^T \hat{\mu} - \frac{\gamma}{2} w^T \left[ v\hat{\Sigma} + (1 - v)\hat{\Sigma}_{MKT} \right] w \right\}, \quad \text{subject to } \|w\|_1 < s_1, \tag{12}$$

or

$$\hat{w}_{L1}^{CC} = \arg\max_{w} \left\{ w^T \hat{\mu} - \frac{\gamma}{2} w^T \left[ v\hat{\Sigma} + (1 - v)\hat{\Sigma}_{CC} \right] w \right\}, \quad \text{subject to } \|w\|_1 < s_1, \tag{13}$$

where $\hat{\Sigma}_{MKT}$ is the covariance matrix implied by the single-factor model (Ledoit and Wolf 2003), $\hat{\Sigma}_{CC}$ is a constant-correlation covariance matrix (Ledoit and Wolf 2004), v is the shrinkage intensity, and $s_1 \ge 0$ is a constant. Proposition 2 implies that


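$$\text{Rule II:}\quad \hat{w}_{L1}^{MKT} = \arg\max_{w} \left\{ w^T \hat{\mu}_0 - \frac{\gamma}{2} w^T \hat{\Sigma} w \right\}, \quad \text{subject to } \|w\|_1 < s_1 \text{ and } w^T \hat{\Sigma}_{MKT} w < s_2, \tag{14}$$

and

$$\text{Rule III:}\quad \hat{w}_{L1}^{CC} = \arg\max_{w} \left\{ w^T \hat{\mu}_0 - \frac{\gamma}{2} w^T \hat{\Sigma} w \right\}, \quad \text{subject to } \|w\|_1 < s_1 \text{ and } w^T \hat{\Sigma}_{CC} w < s_2. \tag{15}$$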

Other than methods in the portfolio management literature, statistical methods are also available to stabilize the portfolio weights. It is well known that if the sample covariance matrix $\hat{\Sigma}$ is nearly singular or ill-conditioned, inverting it is numerically unstable. Adding a constant along its diagonal fixes the problem, producing a stable estimate of its inverse. This is a standard technique for handling multicollinearity in regressions, and is known as ridge regression. Since portfolio weights are linear functions of the estimated inverse covariance matrix, this procedure stabilizes the portfolio weights. Specifically, we have

$$\hat{w}_{L1L2} = \arg\max_{w} \left\{ w^T \hat{\mu} - \frac{\gamma}{2} w^T (\hat{\Sigma} + \delta I) w \right\}, \quad \text{subject to } \|w\|_1 < s_1, \tag{16}$$

where δ > 0 is a constant and I is an identity matrix. It is straightforward to show that the problem is equivalent to

$$\text{Rule IV:}\quad \hat{w}_{L1L2} = \arg\max_{w} \left\{ w^T \hat{\mu} - \frac{\gamma}{2} w^T \hat{\Sigma} w \right\}, \quad \text{subject to } \|w\|_1 < s_1 \text{ and } \|w\|_2^2 < s_2, \tag{17}$$

for some constant $s_2 \ge 0$, with $\|w\|_2^2 = w^T w$ being the squared
L2 norm of a vector.
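The stabilizing effect of the diagonal term δI in (16) can be seen in a small example. The sketch below assumes NumPy, ignores the L1 constraint, and uses a hypothetical two-asset covariance matrix with correlation 0.99 and a hypothetical value of δ.

```python
import numpy as np

def ridge_stabilized_weights(mu_hat, sigma_hat, gamma, delta):
    """Weights (1/gamma) * inv(Sigma_hat + delta*I) @ mu_hat, i.e. (16) without the L1 constraint."""
    n = len(mu_hat)
    return np.linalg.solve(sigma_hat + delta * np.eye(n), mu_hat) / gamma

# Hypothetical, nearly collinear pair of assets.
vol, rho, gamma = 0.05, 0.99, 3.0
sigma_hat = vol ** 2 * np.array([[1.0, rho], [rho, 1.0]])
mu_hat = np.array([0.0105, 0.0095])
delta = 0.001  # hypothetical ridge constant

cond_before = np.linalg.cond(sigma_hat)
cond_after = np.linalg.cond(sigma_hat + delta * np.eye(2))
w_plain = ridge_stabilized_weights(mu_hat, sigma_hat, gamma, 0.0)
w_ridge = ridge_stabilized_weights(mu_hat, sigma_hat, gamma, delta)
print(cond_before, cond_after)
print(w_plain, w_ridge)
```

Adding δ to every eigenvalue lowers the ratio of the largest to the smallest eigenvalue, and the large offsetting long and short positions of the unregularized rule shrink to moderate ones.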
When s1 = s2 = ∞, the portfolio weights derived from (14),
(15), and (17) are unbounded, and the problem reduces to a
standard one (5). When s2 = ∞, the problem is the L1 norm
constrained one. When both s1 and s2 are positive, the portfolio weights are shrunk toward zero in two different ways,
promoting both sparsity and stability. In statistics literature, a
combination of L1 norm constraint and L2 norm constraint is
known as “elastic-net” (Zou and Hastie 2005). In a regression
problem if the ith column and the jth column of X are highly correlated and both independent variables are important, regression
with only L1 norm constraint tends to assign a large estimate
to one of β̂i and β̂j randomly, and set the other to zero (Efron
et al. 2004; Zou and Hastie 2005). But with an additional L2
norm constraint, regression with both constraints tends to produce similar estimates of β̂i and β̂j while maintaining sparsity.
Hence Zou and Hastie (2005) described the combination of L1
constraint and L2 constraint as “a stretchable fishing net that
retains all the big fish.”
Note that all objective functions of the proposed portfolio strategies (7), (14), (15), and (17) can be written as a penalized least squares

$$\hat{w} = \arg\min_{w} \left\{ \|y - Xw\|_2^2 + \lambda_1 \|w\|_1 + \lambda_2 \|w\|_2^2 \right\}, \tag{18}$$

where ŵ represents $\hat{w}_{L1}$, $\hat{w}_{L1}^{MKT}$, $\hat{w}_{L1}^{CC}$, or $\hat{w}_{L1L2}$, and the four approaches differ in terms of how X and y are defined, and whether $\lambda_2 = 0$. We determine tuning parameters $\lambda_1$ and $\lambda_2$ by cross-validation (see also DeMiguel et al. 2009) and estimate the
penalized least squares with the coordinate descent algorithm.
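A coordinate descent solver for (18) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the article's own code: NumPy is assumed, the function names and the penalty values are hypothetical, cross-validation is omitted, and the example uses uncorrelated assets with a common variance so that the result can be checked by hand.

```python
import numpy as np

def regression_inputs(mu_hat, sigma_hat, gamma):
    """X = sqrt(gamma) * Sigma_hat^{1/2}, y = Sigma_hat^{-1/2} mu_hat / sqrt(gamma)."""
    eigval, eigvec = np.linalg.eigh(sigma_hat)
    X = np.sqrt(gamma) * (eigvec * np.sqrt(eigval)) @ eigvec.T
    y = (eigvec / np.sqrt(eigval)) @ eigvec.T @ mu_hat / np.sqrt(gamma)
    return X, y

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def penalized_weights(X, y, lam1, lam2, n_sweeps=1000, tol=1e-12):
    """Minimize ||y - Xw||_2^2 + lam1*||w||_1 + lam2*||w||_2^2 by coordinate descent."""
    n = X.shape[1]
    w = np.zeros(n)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        w_old = w.copy()
        for j in range(n):
            # Partial residual that leaves out asset j's current contribution.
            r_j = y - X @ w + X[:, j] * w[j]
            # Exact minimizer of the objective in coordinate j.
            w[j] = soft_threshold(X[:, j] @ r_j, lam1 / 2) / (col_sq[j] + lam2)
        if np.abs(w - w_old).max() < tol:
            break
    return w

# Hypothetical estimates: uncorrelated assets with a common variance.
mu_hat = np.array([0.010, 0.001, -0.008])
sigma_hat = 0.04 * np.eye(3)
gamma = 3.0

X, y = regression_inputs(mu_hat, sigma_hat, gamma)
w_ols = penalized_weights(X, y, lam1=0.0, lam2=0.0)      # classic weights
w_sparse = penalized_weights(X, y, lam1=0.004, lam2=0.0)  # L1 penalty only
w_enet = penalized_weights(X, y, lam1=0.004, lam2=0.06)   # both penalties
print(w_ols, w_sparse, w_enet)
```

In this example the asset with the smallest classic weight is dropped under the L1 penalty, and adding the L2 penalty scales the remaining positions further toward zero.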
In Figure 1(d), we visualize the effect of simultaneously imposing two constraints when the asset returns are uncorrelated.
It can be seen that its consequence is a mixture of consequences
of imposing L1 norm constraint and L2 norm constraint separately. That is, ŵ is scaled and then shifted toward zero. The
scaling is a consequence of the stabilized problem, while those
assets whose weights are small after stabilizing the problem are
excluded from the portfolio.
When the portfolios are monthly rebalanced and the number of assets is large, say 500, at least 42 years’ monthly returns are required, so that the sample covariance matrix $\hat{\Sigma}$ is invertible and X and y in (7), (14), (15), and (17) are well-defined. Three approaches can address this practical issue. First, longer time series of returns could be used to estimate Σ. But other than the data availability issue for most of the securities, such a strategy imposes an unrealistic assumption that the true covariance matrix is time-invariant. The second approach is to estimate this low-frequency covariance matrix by high-frequency returns. This strategy has gained popularity in recent years due to the availability of high-frequency financial data (see, e.g., Andersen et al. 2001). However, high-frequency returns from nontraditional asset classes, such as mutual funds, hedge funds, real assets and private equity, are generally unavailable. The third approach is to estimate the N-dimensional covariance matrix directly using T < N observations by novel statistical techniques. Fan, Liao, and Mincheva (2013) proposed the principal orthogonal complement thresholding (POET) method for calculating high-dimensional covariance matrices. This method assumes common factors underlying returns, and estimates their covariance matrix by applying thresholding to the part of the sample covariance matrix that cannot be explained by the factor structure. As a result, the POET estimator is optimization-free and asymptotically invertible. In what follows, we choose to use the POET estimator $\hat{\Sigma}_{POET}$ when N > T.
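The data requirement can be illustrated directly. The sketch below assumes NumPy and uses simulated returns with hypothetical dimensions. It computes the number of years of monthly data implied by the asset count and shows that a sample covariance matrix built from fewer observations than assets is rank deficient, hence not invertible. It does not implement POET.

```python
import math
import numpy as np

def min_years_of_monthly_data(n_assets):
    """Smallest whole number of years whose monthly observations reach n_assets."""
    return math.ceil(n_assets / 12)

print(min_years_of_monthly_data(500))

# Simulated example with more assets than observations (hypothetical sizes).
rng = np.random.default_rng(1)
n_assets, n_obs = 20, 12
returns = rng.normal(size=(n_obs, n_assets))
sigma_hat = np.cov(returns, rowvar=False, bias=True)  # N x N sample covariance
rank = np.linalg.matrix_rank(sigma_hat)
print(rank, n_assets)
```

Demeaning removes one degree of freedom, so the rank of the sample covariance matrix is at most T − 1, which is below N whenever T ≤ N.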


4. CONNECTIONS TO EXISTING PORTFOLIO STRATEGIES

4.1 Connection to Plug-in Approaches

The standard solution to the portfolio selection problem $\hat{w} = \frac{1}{\gamma}\hat{\Sigma}^{-1}\hat{\mu}$ is a plug-in approach where µ and Σ are estimated by the maximum likelihood estimators (MLEs). Since MLEs are invariant under reparameterization, ŵ is also a maximum likelihood estimator. Kan and Zhou (2007) also considered two other plug-in approaches, which estimate Σ by $\bar{\Sigma} = \frac{T}{T-1}\hat{\Sigma}$ and $\tilde{\Sigma} = \frac{T}{T-N-2}\hat{\Sigma}$, leading to the unbiased estimators of Σ and w, respectively.

Other than three simple plug-in rules, approaches incorporating parameter uncertainty could take similar forms as plug-in approaches. For example, by assuming standard diffuse priors on µ and Σ (Klein and Bawa 1976; Stambaugh 1997), the Bayesian approach is essentially a plug-in rule that estimates Σ by $\hat{\Sigma}_{Bayes} = \frac{T+1}{T-N-2}\hat{\Sigma}$. Moreover, Kan and Zhou (2007) derived the optimal two-fund rule that maximizes the expected out-of-sample performance by estimating Σ using $\hat{\Sigma}_{two\text{-}fund} = \hat{\Sigma}/c_2$, where $c_2 \in (0, 1)$ is a constant.

Since all these methods scale $\hat{\Sigma}$ by a constant c greater than one, the resulting solution $\hat{w}_{plug\text{-}in}$ is a scaled version of $\hat{w} = \frac{1}{\gamma}\hat{\Sigma}^{-1}\hat{\mu}$, or

$$\hat{w}_{plug\text{-}in} = \frac{1}{c}\,\hat{w}. \tag{19}$$

This relates $\hat{w}_{L1L2}$ in rule IV to these plug-in approaches. However, our approach is more general, in the sense that it determines the scaling factor adaptively from the data and further shifts the scaled portfolio weights to encourage sparsity.

4.2 Connection to the Shrinkage Estimator of µ

In formulating our portfolio rules, we have shown the effect of replacing $\hat{\Sigma}$ with a shrinkage estimator. Jorion (1986, 1991) also proposed a shrinkage estimator of µ. It shrinks the vector of sample mean µ̂ toward a common target $\hat{\mu}_g$, resulting in a Bayes–Stein estimator

$$\hat{\mu}_{BS} = (1 - v)\hat{\mu} + v\hat{\mu}_g, \quad \text{with } \hat{\mu}_g = \frac{1_N^T \hat{\Sigma}^{-1} \hat{\mu}}{1_N^T \hat{\Sigma}^{-1} 1_N}, \tag{20}$$

where v ∈ (0, 1) and the shrinkage target $\hat{\mu}_g$ is the average excess return over the risk-free rate of the sample global minimum-variance portfolio.¹

Shrinking the sample mean µ̂ toward a target appropriately could reduce the expected quadratic loss of the estimator (Stein 1956; Berger 1985). The following proposition provides the solution to a portfolio selection problem where µ is estimated by $\hat{\mu}_{BS}$, and relates this method to our approach.

Proposition 3. The optimal portfolio choice problem with a shrinkage estimator of mean $\hat{\mu}_{BS} = v\hat{\mu} + (1 - v)\hat{\mu}_{target}$

$$\hat{w}_{\mu} = \arg\max_{w} \left\{ w^T \hat{\mu}_{BS} - \frac{\gamma}{2} w^T \hat{\Sigma} w \right\}$$

has a solution

$$\hat{w}_{\mu} = \frac{1}{1+\lambda}\,\hat{w} + \frac{\lambda}{1+\lambda}\,\hat{w}_{target},$$

where $\lambda = \frac{1-v}{v} > 0$, and $\hat{w}_{target} = \frac{1}{\gamma}\hat{\Sigma}^{-1}\hat{\mu}_{target}$ is the portfolio weight once µ is estimated by the shrinkage target $\hat{\mu}_{target}$. In particular, when the shrinkage target $\hat{\mu}_{target}$ is the average excess return on the sample global minimum-variance portfolio $1_N \hat{\mu}_g$ (Jorion 1986, 1991), $\sum_{j=1}^{N} \hat{w}_{\mu,j} = \sum_{j=1}^{N} \hat{w}_j$. Under the mild condition $\|\hat{\Sigma}^{-1}(1_N \hat{\mu}_g)\|_1 < \|\hat{\Sigma}^{-1}\hat{\mu}\|_1$, the portfolio weight is shrunk toward zero, or $\sum_{j=1}^{N} |\hat{w}_{\mu,j}| < \sum_{j=1}^{N} |\hat{w}_j|$.

Therefore, incorporating the shrinkage estimator of µ amounts to finding a linear combination of the sample portfolio weight ŵ and the target portfolio weight $\hat{w}_{target}$. If the shrinkage target $\hat{\mu}_g$ equals zero, the portfolio weight $\hat{w}_{\mu}$ is exactly a scaled version of ŵ, resembling the effect of imposing the L2 norm constraint. Otherwise, the portfolio weight is scaled and shifted, just as the portfolio weights behave once both norm constraints are
imposed.
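Proposition 3 can be verified numerically. The sketch below assumes NumPy, a hypothetical three-asset example and a hypothetical shrinkage intensity, and uses the weighting $v\hat{\mu} + (1 - v)\hat{\mu}_{target}$ stated in the proposition with the target set to $1_N\hat{\mu}_g$.

```python
import numpy as np

def bayes_stein_weights(mu_hat, sigma_hat, v, gamma):
    """Return the direct solution, the linear-combination form, and the classic weights."""
    ones = np.ones(len(mu_hat))
    sigma_inv = np.linalg.inv(sigma_hat)
    # Average excess return of the sample global minimum-variance portfolio.
    mu_g = (ones @ sigma_inv @ mu_hat) / (ones @ sigma_inv @ ones)
    mu_target = mu_g * ones
    mu_bs = v * mu_hat + (1 - v) * mu_target  # weighting as in Proposition 3
    w_direct = sigma_inv @ mu_bs / gamma
    lam = (1 - v) / v
    w_classic = sigma_inv @ mu_hat / gamma
    w_target = sigma_inv @ mu_target / gamma
    w_combo = w_classic / (1 + lam) + lam * w_target / (1 + lam)
    return w_direct, w_combo, w_classic

# Hypothetical estimates (not taken from the article).
mu_hat = np.array([0.010, 0.006, 0.008])
sigma_hat = np.array([[0.040, 0.006, 0.004],
                      [0.006, 0.025, 0.005],
                      [0.004, 0.005, 0.030]])

w_direct, w_combo, w_classic = bayes_stein_weights(mu_hat, sigma_hat, v=0.7, gamma=3.0)
print(w_direct, w_combo)
print(w_direct.sum(), w_classic.sum())
```

With this target the total investment in risky assets is unchanged, so the shrinkage only redistributes weight across assets.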
4.3 Connection to Multiprior Approach
Garlappi, Uppal, and Wang (2007) considered the portfolio
selection problem that explicitly takes into account parameter
uncertainty (or ambiguity) and investors’ aversion to such ambiguity. They imposed an additional constraint on (5) to reflect the
ambiguity aversion (AA), so that the probability of true parameters being in respective confidence intervals is large. In other
words,


$$P\left( \frac{T(T-N)}{(T-1)N} (\hat{\mu} - \mu)^T \Sigma^{-1} (\hat{\mu} - \mu) \le \epsilon \right) = 1 - p, \tag{21}$$

where Σ is assumed to be known, ε is a scalar proportional to the level of ambiguity and ambiguity aversion, and p is a significance level. They showed that when constraint (21) is imposed, the solution is

$$\hat{w}_{AA} = \begin{cases} \left[ 1 - \left( \dfrac{\epsilon}{\hat{\mu}^T \Sigma^{-1} \hat{\mu}} \right)^{1/2} \right] \hat{w} & \text{if } \dfrac{1}{2\gamma}\, \hat{\mu}^T \Sigma^{-1} \hat{\mu} > \dfrac{\epsilon}{2\gamma}, \\[2ex] 0 & \text{if } \dfrac{1}{2\gamma}\, \hat{\mu}^T \Sigma^{-1} \hat{\mu} \le \dfrac{\epsilon}{2\gamma}. \end{cases} \tag{22}$$

1 Jorion’s approach was motivated from a Bayesian perspective. By assigning an informative conjugate prior for asset returns, the empirical Bayes approach suggests that the mean of the predictive density function of asset returns takes the form of µ̂BS = (1 − v)µ̂ + v µ̂g, where µ̂g “happens to be the average return for the minimum variance portfolio” (see p. 285 of Jorion 1986).
This portfolio rule has an interesting interpretation. Note that
$\frac{1}{2\gamma}(\hat{\mu}^T \Sigma^{-1} \hat{\mu})$ is the resulting expected utility once the classic
portfolio rule $\hat{w} = \frac{1}{\gamma}\Sigma^{-1}\hat{\mu}$ is implemented. Therefore, the ambiguity aversion portfolio sets portfolio weights to zero if the
expected utility from investing in risky assets is small. Otherwise, it scales ŵ by a constant between zero and one. This seems to
be a binary decision of whether to encourage sparsity or stability.
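The binary character of rule (22) is easy to see in code. The following is a minimal sketch, assuming the classic weights ŵ and the scalar µ̂ᵀΣ⁻¹µ̂ have already been computed; the function name `ambiguity_averse_weights` and the numbers in the usage lines are hypothetical.

```python
import math


def ambiguity_averse_weights(w, quad, eps):
    """Rule (22): scale the classic weights w, or hold no risky assets.

    w    : classic weights (1/gamma) * Sigma^{-1} mu_hat
    quad : the scalar mu_hat^T Sigma^{-1} mu_hat (assumed positive)
    eps  : ambiguity parameter epsilon from (21)
    """
    if quad <= eps:
        # Expected utility from risky assets is too small: hold none.
        return [0.0 for _ in w]
    scale = 1.0 - math.sqrt(eps / quad)  # lies in (0, 1] when eps >= 0
    return [scale * wi for wi in w]


# Hypothetical inputs: the same weights under low and high ambiguity.
w_classic = [0.5, 0.3]
low_ambiguity = ambiguity_averse_weights(w_classic, quad=0.04, eps=0.01)
high_ambiguity = ambiguity_averse_weights(w_classic, quad=0.04, eps=0.05)
```

In the first call the weights are halved, since 1 − (0.01/0.04)^{1/2} = 0.5; in the second every weight is set to zero.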
5. EMPIRICAL STUDIES

5.1 Data and Models
In this section, we evaluate the out-of-sample performance
of alternative portfolio strategies using four empirical datasets,
including two datasets of factor-mimicking portfolios, one with
foreign exchange rates from ten developed countries and one
with individual stocks from the U.S. equity market. These
datasets contain monthly returns from 10 assets to as many as
500 assets. We consider a mean-variance investor with a relative
risk aversion of γ = 3. Monthly rebalanced portfolios are constructed, since in practice they are associated with reasonably
low transaction costs and management fees. Also, institutional
investors usually have to take several days to slice and place
their large blocks of orders in the hope of minimizing the price
impact of their trades (e.g., Bertsimas and Lo 1998).
Four portfolio strategies proposed in this study are compared
against the following portfolio rules in the literature: (1) the
classic portfolio rule based on sample mean and sample covariance matrix, (2) the global minimum variance portfolio, (3) the
ambiguity averse portfolio (Garlappi, Uppal, and Wang 2007),
(4) a portfolio based on the empirical Bayes-Stein estimator
(Jorion 1986), (5) the portfolio based on a shrinkage estimator of the covariance matrix, where the shrinkage target is a
covariance matrix implied by the single-factor model (Ledoit
and Wolf 2003), (6) the portfolio based on a shrinkage estimator of the covariance matrix, with the shrinkage target being
a constant-correlation model (Ledoit and Wolf 2004), and (7)
the 1/N portfolio rule (DeMiguel, Garlappi, and Uppal 2009).
We employ a 10-year rolling estimation window for parameter estimation, and make portfolio decisions starting from the end of
the first estimation window. Portfolio weights from all strategies
and realized returns are collected monthly throughout the whole
out-of-sample period for evaluation.
5.2 Performance Measures
For each portfolio strategy applied to each dataset, we report
the annualized out-of-sample Sharpe ratio (SR) and the variance
reduction defined as the ratio of portfolio variance from an
alternative strategy to that from (5).
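As a rough illustration of these two measures, the sketch below computes an annualized Sharpe ratio and a variance ratio from monthly excess returns. The text does not spell out the annualization convention, so scaling the monthly ratio by the square root of 12 and using sample (n − 1) moments are assumptions of this example, as are the function names.

```python
import math
import statistics


def annualized_sharpe(monthly_excess_returns):
    """Monthly mean over monthly standard deviation, scaled by sqrt(12)."""
    m = statistics.mean(monthly_excess_returns)
    s = statistics.stdev(monthly_excess_returns)  # sample standard deviation
    return math.sqrt(12) * m / s


def variance_ratio(returns_alt, returns_base):
    """Variance of an alternative strategy relative to a baseline strategy."""
    return statistics.variance(returns_alt) / statistics.variance(returns_base)
```

A variance ratio below one means the alternative strategy has lower out-of-sample variance than the baseline.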
In the presence of the L1 norm constraint, portfolio weights may have many zeros, leading to a very concentrated portfolio. Such a portfolio is less favorable for risk management and diversification. For this reason, we report the average number of active constituents (assets with nonzero portfolio weights) over time for each portfolio. Finally, we report the portfolio turnover

$$\text{Turnover} = \frac{1}{\tilde{T} - T - 1} \sum_{t=T}^{\tilde{T}-1} \sum_{j=1}^{N} \left| \hat{w}_{j,t+1} - \hat{w}_{j,t+1}^{-} \right|, \tag{23}$$

where N is the number of assets, $\hat{w}_{j,t+1}$ is the portfolio weight for asset j at time t + 1, and $\hat{w}_{j,t+1}^{-}$ is the portfolio weight right before rebalancing at time t + 1. Therefore, this turnover represents the average monthly trading volume. This measure has been widely used to evaluate portfolios in the literature. See, for example, DeMiguel, Garlappi, and Uppal (2009), DeMiguel et al. (2009), and Kourtis, Dotsis, and Markellos (2012). (Note that the 1/N portfolio could be associated with nonzero turnover. This is because although $\hat{w}_{j,t} = 1/N$ for all t, the portfolio weight $\hat{w}_{j,t+1}^{-}$ before rebalancing at time t + 1 could deviate from 1/N as long as assets have different price changes. Therefore, rebalancing is expected.)
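The turnover measure (23) and the remark about the 1/N portfolio can be illustrated with a small sketch. It uses the pre-rebalancing weight ŵ_{j,t}(1 + r_{j,t+1})/(1 + r_{p,t+1}) that the article states alongside equation (26); taking r_p as the weighted sum of asset returns (fully invested weights, no risk-free term) is a simplifying assumption of this example, and the returns are made up.

```python
def drifted_weights(w, r):
    """Weights right before rebalancing: w_j (1 + r_j) / (1 + r_p)."""
    r_p = sum(wj * rj for wj, rj in zip(w, r))  # assumed portfolio return
    return [wj * (1 + rj) / (1 + r_p) for wj, rj in zip(w, r)]


def turnover(targets, returns):
    """Average over periods of sum_j |w_{j,t+1} - w^-_{j,t+1}|.

    targets : weight vectors chosen at t = 0, 1, ..., K
    returns : returns[t] holds asset returns realized between t and t + 1
    """
    total = 0.0
    for t in range(len(targets) - 1):
        before = drifted_weights(targets[t], returns[t])
        total += sum(abs(a - b) for a, b in zip(targets[t + 1], before))
    return total / (len(targets) - 1)


# 1/N portfolio with two assets over two periods (made-up returns).
n = 2
targets = [[1 / n] * n for _ in range(3)]
returns = [[0.10, -0.10], [0.02, 0.02]]
one_over_n_turnover = turnover(targets, returns)
```

Here the first period moves the weights to roughly (0.55, 0.45) before rebalancing, while the second period leaves them at 1/N because both assets earn the same return, so the average turnover is positive even though the target weights never change.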
To assess the statistical significance of economic gains, we
use the Ledoit and Wolf (2008) two-sided test of whether the
Sharpe ratio of a portfolio rule is different from that of a benchmark. For factor-mimicking portfolios, all portfolio rules are
compared against the 1/N benchmark (see, e.g., Li, Tsiakas, and
Wang 2014). The carry trade strategy and the S&P 500 index are
benchmarks for the currency market example and the equity
market example, respectively.
In practice, transaction costs affect the profitability
of a trading strategy. We assess this impact by computing the
realized returns net of transaction costs (see, e.g., Della Corte,
Sarno, and Tsiakas 2009). At time t + 1, the net realized return
for a portfolio ŵt is

$$r^{\mathrm{net}}_{p,t+1} = r_{p,t+1} - \tau_{t+1} = \hat{w}_t^T R_{t+1} + r_{f,t+1} - \tau_{t+1}, \tag{24}$$

where rp,t+1 is the portfolio return without transaction cost, τt+1
is the portfolio’s total transaction cost, and rf,t+1 is the risk-free
interest rate. Then the portfolio’s excess return net of transaction
cost is

$$R^{\mathrm{net}}_{p,t+1} = \hat{w}_t^T R_{t+1} - \tau_{t+1}. \tag{25}$$
The total transaction cost τt+1 is the sum of individual assets’ transaction costs incurred by rebalancing at time t + 1:

$$\tau_{t+1} = \sum_{j=1}^{N} \tau_{j,t+1} \left( \hat{w}_{j,t+1} - \hat{w}_{j,t+1}^{-} \right), \tag{26}$$

where τj,t+1 is the transaction cost for asset j, and $\hat{w}_{j,t+1}^{-} = \hat{w}_{j,t}(1 + r_{j,t+1})/(1 + r_{p,t+1})$ is the portfolio weight for asset j right before rebalancing at time t + 1. The transaction cost τj,t+1 is a function of the bid-ask spread. In particular, if at time t + 1 we sell the asset that was bought at time t, $\tau_{j,t+1} = \log\left(\frac{1+c_t}{1-c_{t+1}}\right)$, with $c_{t+1} = \frac{P^a_{j,t+1} - P^b_{j,t+1}}{2P_{j,t+1}}$, $P^a_{j,t+1}$ being the ask price, $P^b_{j,t+1}$ being the bid price, and $P_{j,t+1}$ being the mid-price. In the literature, $2c_{t+1} = \frac{P^a_{j,t+1} - P^b_{j,t+1}}{P_{j,t+1}}$ is known as the proportional bid-ask spread. If otherwise we buy the asset at time t + 1, the associated transaction cost is $\tau_{j,t+1} = \log\left(\frac{1-c_t}{1+c_{t+1}}\right)$.2 This analysis implies that $\tau_{j,t+1} = \log\left(\frac{1+c_t}{1-c_{t+1}}\right)$ if $\hat{w}_{j,t+1} > \hat{w}_{j,t+1}^{-}$, and $\tau_{j,t+1} = \log\left(\frac{1-c_t}{1+c_{t+1}}\right)$ otherwise. Therefore, given the one-way proportional bid-ask spread ct, realized returns net of transaction costs can be calculated.
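The cost calculation in (25) and (26) can be sketched as follows. The code follows the case distinction exactly as stated in the text; using a single one-way spread for all assets and the particular function names are assumptions of this example.

```python
import math


def asset_cost_rate(w_new, w_before, c_t, c_next):
    """tau_{j,t+1} as stated in the text, for one-way spreads c_t, c_{t+1}."""
    if w_new > w_before:
        return math.log((1 + c_t) / (1 - c_next))
    return math.log((1 - c_t) / (1 + c_next))


def total_cost(w_new, w_before, c_t, c_next):
    """Equation (26): sum_j tau_{j,t+1} * (w_{j,t+1} - w^-_{j,t+1})."""
    return sum(asset_cost_rate(a, b, c_t, c_next) * (a - b)
               for a, b in zip(w_new, w_before))


def net_excess_return(w_prev, excess_returns, tau):
    """Equation (25): w_t^T R_{t+1} - tau_{t+1}."""
    return sum(w * r for w, r in zip(w_prev, excess_returns)) - tau


# Rebalancing drifted weights (0.55, 0.45) back to (0.5, 0.5)
# with a one-way spread of 0.1% at both dates (made-up numbers).
cost = total_cost([0.5, 0.5], [0.55, 0.45], c_t=0.001, c_next=0.001)
```

Because the cost rate and the weight change carry the same sign in both cases, each asset contributes a positive amount to the total cost.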
Jones (2002) found that the average proportional bid-ask
spread on large U.S. stocks has declined substantially over the
past century from about 0.8% in the 1900s to about 0.2% in
2000. Neely, Weller, and Ulrich (2009) also documented that,
on average, the proportional transaction cost in the foreign exchange market has declined from 0.1% in the 1970s to 0.02%
in recent years. Following Neely, Weller, and Ulrich (2009),
among others, we estimate a simple time trend of the proportional bid-ask spread over the out-of-sample evaluation period.
Based on these estimates, we calculate the realized returns net of
transaction costs. Specifically, the proportional bid-ask spreads
are set to 0.6% and 0.1% for trading stocks and currencies, respectively, at the beginning of the sample, and are set to 0.2%
and 0.02% by the end of the sample. We denote the Sharpe ratio
net of transaction cost as SRτ .
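One way to turn the stated start and end values into a monthly spread series is a straight-line interpolation. The text only says a "simple time trend" is estimated, so the linear form below is an assumption, as is the choice of 336 months (January 1986 to December 2013, the evaluation period quoted for the Fama-French datasets). Halving the spread reflects the earlier statement that 2c_t is the proportional bid-ask spread.

```python
def spread_path(start, end, n_periods):
    """Linearly interpolate a proportional bid-ask spread over n_periods."""
    if n_periods == 1:
        return [start]
    step = (end - start) / (n_periods - 1)
    return [start + k * step for k in range(n_periods)]


# Stocks: 0.6% at the start of the sample, 0.2% at the end.
stock_spread = spread_path(0.006, 0.002, 336)

# One-way cost c_t is half of the proportional spread 2 * c_t.
stock_one_way = [s / 2 for s in stock_spread]
```

The resulting series can be passed month by month as c_t and c_{t+1} when evaluating the transaction cost of each rebalancing.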
5.3 Results
Panel A and Panel B of Table 1 show portfolio performance
for the first two empirical datasets: (1) Fama-French 25 size and
momentum portfolios, and (2) Fama-French 100 size and book-to-market portfolios, where the out-of-sample period is from
January 1986 to December 2013. When the number of assets
is 25, existing portfolio strategies have similar performance to
ours. But once the number of risky assets increases to 100, four
new strategies are much better than most of the alternatives. In
terms of the portfolio variances, new portfolio rules have lower
variances than alternative strategies except the global minimum
variance portfolio and the 1/N portfolio. This is because portfolio variance is not the direct target of these norm-constrained
strategies. As N increases, new strategies achieve greater variance reductions than many other strategies.
In terms of turnover, the 1/N benchmark has the lowest
turnover by construction. Among all actively managed portfolios, the global minimum variance portfolio has the lowest
turnover when the number of assets is small (Panel A). But
when N = 100 (Panel B), four norm-constrained portfolios have
much smaller turnovers. Moreover, turnovers of traditional portfolios increase dramatically as more assets are considered, but
this is not the case for norm-constrained portfolios. (Unreported
results indicate that out of 4950 pairwise correlations among
returns from Fama-French 100 portfolios, 53 are more than 0.9.
Footnote 8 in DeMiguel, Garlappi, and Uppal (2009) illustrated
that high return correlations lead to extremely large and unstable
portfolio weights.) With large turnovers, reasonable transaction
costs wipe out the economic gains of many traditional portfolios, as evidenced by the Sharpe ratio reductions after transaction
costs. For these two datasets, the portfolio that shrinks the sample covariance matrix toward a factor-model implied structure
(“LW - MKT”) seems to be a good one. This is because the factor
model could well explain returns from these factor-mimicking
portfolios in the U.S. stock market. By having an additional
L1 norm constraint, however, “Rule II” significantly improves
the risk-return tradeoff and reduces portfolio turnover.

2 The derivation is as follows. If at time t + 1, we sell the asset (at $P^b_{t+1}$) that was bought (with $P^a_t$) at time t, the realized return is

$$\log\frac{P^b_{t+1}}{P^a_t} = \log\frac{P_{t+1} - \delta_{t+1}/2}{P_t + \delta_t/2} = \log\frac{P_{t+1}\left(1 - \frac{\delta_{t+1}}{2P_{t+1}}\right)}{P_t\left(1 + \frac{\delta_t}{2P_t}\right)} = \log\frac{P_{t+1}(1 - c_{t+1})}{P_t(1 + c_t)} = r_{t+1} - \log\frac{1 + c_t}{1 - c_{t+1}},$$

where δt is the bid-ask spread at time t and $c_t = \frac{\delta_t}{2P_t}$. Similarly, $\tau_{j,t+1} = \log\left(\frac{1-c_t}{1+c_{t+1}}\right)$ when we buy the asset at time t + 1.
In the third example, we follow Della Corte, Sarno, and
Tsiakas (2009, 2011), among others, and consider a U.S. investor who builds a portfolio by allocating the wealth between ten bonds: one domestic (U.S.), and nine foreign bonds
(Australia, Canada, Switzerland, Germany, UK, Japan, Norway,
New Zealand and Sweden). We collect interest rates from these
nine foreign countries,3 and also collect returns of nine foreign
exchange rates relative to the U.S. dollar (USD): the Australian
dollar (AUD), Canadian dollar (CAD), Swiss franc (CHF),
Deutsche mark/euro (EUR), British pound (GBP), Japanese yen
(JPY), Norwegian krone (NOK), New Zealand dollar (NZD),
and Swedish krona (SEK).4 The data sample ranges from
January 1977 to December 2013 for a total of 444 monthly
observations.
At the end of month t + 1, the foreign bonds yield a riskless
return in local currency but a risky return in U.S. dollars. The
exchange rate is defined as the U.S. dollar price of a unit of foreign currency so that an increase in the exchange rate implies a
depreciation of the U.S. dollar. Then the excess return of investing in a foreign bond is equal to $R_{t+1} = i^{*}_{t+1} + \Delta s_{t+1} - i_{t+1}$,
where $i^{*}_{t+1}$ and $i_{t+1}$ are the risk-free foreign interest rate and
the U.S. interest rate, respectively, $s_{t+1}$ is the log U.S. dollar
spot exchange rate for a particular currency at time t + 1, and
$\Delta s_{t+1} = s_{t+1} - s_t$ is the exchange rate return at time t + 1. Due
to the FX component, the excess return Rt+1 of a foreign bond is
risky at time t (see, e.g., Della Corte, Sarno, and Tsiakas 2009).
We estimate parameters using a 10-year rolling estimation window, and formulate portfolios starting from January 1987 based
on these estimates.
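The excess return of a foreign bond defined above is a one-line computation once the series are aligned. The sketch below assumes spot rates are quoted as U.S. dollars per unit of foreign currency and that interest rates are expressed per month; the function name and inputs are hypothetical.

```python
import math


def fx_excess_returns(spot, i_foreign, i_us):
    """R_{t+1} = i*_{t+1} + (s_{t+1} - s_t) - i_{t+1}, with s the log spot rate.

    spot      : USD price of one unit of foreign currency at t = 0, ..., K
    i_foreign : foreign interest rates aligned with periods 1, ..., K
    i_us      : U.S. interest rates aligned with periods 1, ..., K
    """
    s = [math.log(x) for x in spot]
    return [i_foreign[k] + (s[k + 1] - s[k]) - i_us[k]
            for k in range(len(spot) - 1)]


# Made-up example: flat exchange rate, then a foreign-currency appreciation.
example = fx_excess_returns(
    spot=[1.00, 1.00, 1.02],
    i_foreign=[0.005, 0.005],
    i_us=[0.002, 0.002],
)
```

In the first period the excess return is just the interest differential; in the second, the appreciation of the foreign currency adds the log change in the spot rate.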
The carry trade is the most popular trading strategy in the
currency market and serves as our benchmark. This strategy invests in high-interest currencies by borrowing in low-interest
currencies (e.g., Menkhoff et al. 2012). Since high-interest rate
currencies tend to appreciate while low-interest rate currencies
tend to depreciate, carry trades deliver substantial profits over
time. For example, based on a monthly dataset from 48 countries, Menkhoff et al. (2012) showed that carry trade portfolios
deliver excess returns of more than 5% per annum after accounting for transaction costs. The empirical success of the carry
trade is based on the violation of uncovered interest rate parity
(UIP), which is also known as the “forward premium puzzle” in the
international finance literature.
Among all strategies in this example, four norm-constrained
portfolios have the highest Sharpe ratios (see Panel A of Table 2).
Three of them perform better than the carry trade after controlling for transaction costs. Their variances are next to those
from the global minimum variance portfolio and the carry trade,

3 End-of-month Eurodeposit rates from Datastream are used.

4 The data were obtained through the Download Data Program of the Board of
Governors of the Federal Reserve System.


Table 1. Out-of-sample performance of alternative portfolio rules on factor-mimicking portfolios

Portfolio rule                      SR        SRτ        Variance   Nonzero    Turnover

Panel A: 25 Fama-French Factors
Rule I                              1.487a    0.984c     0.476      7.140      5.029
Rule II                             1.590a    1.160b     0.439      7.857      3.715
Rule III                            1.301b    1.099b     0.323      7.188      2.773
Rule IV                             1.540a    1.134b     0.440      18.387     4.191
Portfolio rules in the literature
Sample means and covariance         1.419a    0.961      1.000      25.000     14.241
Global minimum variance             0.705     0.248      0.068      25.000     0.814
Ambiguity averse                    1.335b    1.002c     0.427      21.429     4.953
Bayes-Stein shrinkage estimator     1.425a    0.976      0.890      25.000     12.459
LW - MKT                            1.530a    1.143c     0.698      25.000     8.272
LW - CC                             1.236b    1.041c     0.689      25.000     3.974
1/N                                 0.509     0.440      0.093      25.000     0.044

Panel B: 100 Fama-French Factors
Rule I                              1.076b    0.663c     0.021      15.015     3.016
Rule II                             1.058b    0.694b     0.027      24.539     3.960
Rule III                            0.876c    0.550      0.024      20.438     4.159
Rule IV                             0.978c    0.569c     0.022      40.188     3.460
Portfolio rules in the literature
Sample means and covariance         0.200     −1.468a    1.000      100.000    761.040
Global minimum variance             0.314     −1.519a    0.010      100.000    7.382
Ambiguity averse                    0.135     −1.153a    0.106      34.524     64.119
Bayes-Stein shrinkage estimator     0.198     −1.468a    0.754      100.000    573.786
LW - MKT                            0.944c    0.242      0.093      100.000    28.695
LW - CC                             0.652     0.296      0.101      100.000    16.314
1/N                                 0.552     0.479      0.006      100.000    0.045

NOTE: The superscripts a, b, and c denote statistical significance at the 1%, 5%, and 10% level, respectively.

Table 2. Out-of-sample performance of alternative portfolio rules on foreign exchange rates and U.S. stocks

Portfolio rule                      SR        SRτ        Variance   Nonzero    Turnover

Panel A: International Bonds
Rule I                              0.719b    0.602b     0.272      6.719      3.589
Rule II                             0.535c    0.436      0.371      6.336      2.712
Rule III                            0.576c    0.475      0.393      6.579      2.314
Rule IV                             0.755b    0.669b     0.192      6.965      1.793
Portfolio rules in the literature
Sample means and covariance         0.179c    0.125c     1.000      9.000      7.475
Global minimum variance             0.242     0.200      0.016      9.000      0.093
Ambiguity averse                    −0.030b   −0.063b    0.652      8.347      3.400
Bayes-Stein shrinkage estimator     0.154     0.102c     0.949      9.000      6.893
LW - MKT                            0.235     0.180      0.883      9.000      6.450
LW - CC                             0.273     0.205      0.514      9.000      4.608
Carry trade                         0.487     0.452      0.098      2.000      0.562

Panel B: 500 U.S. Stocks
Rule I                              0.605b    0.553c     0.146      23.712     1.154
Rule II                             0.621b    0.565c     0.127      22.917     1.406
Rule III                            0.573b    0.525c     0.107      22.442     1.226
Rule IV                             0.630b    0.588b     0.085      107.981    0.811
Portfolio rules in the literature
Sample means and covariance         0.164     −0.006c    1.000      500.000    22.596
LW - MKT                            0.172     −0.025c    0.548      500.000    20.314
LW - CC                             −0.154b   −0.355b    0.444      500.000    21.029
S&P500                              0.285     0.285      0.011      500.000

NOTE: The superscripts a, b, and c denote statistical significance at the 1%, 5%, and 10% level, respectively.


where the former strategy is designed to minimize portfolio
variance, and the latter purely exploits the forward premium anomaly and is free of estimation risk. On average, fewer
than seven assets are included in each of the four norm-constrained
portfolios. “Rule II” and “Rule III,” however, show limited improvements over carry trade, which indicates factor model and
constant-correlation m