
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Standard Errors as Weights in Multilateral Price
Indexes
Robert J Hill & Marcel P Timmer
To cite this article: Robert J Hill & Marcel P Timmer (2006) Standard Errors as Weights in
Multilateral Price Indexes, Journal of Business & Economic Statistics, 24:3, 366-377, DOI:
10.1198/073500105000000270
To link to this article: http://dx.doi.org/10.1198/073500105000000270


Published online: 01 Jan 2012.

Download by: [Universitas Maritim Raja Ali Haji]

Date: 12 January 2016, At: 23:32

Standard Errors as Weights in
Multilateral Price Indexes
Robert J. HILL
School of Economics, University of New South Wales, Sydney 2052, Australia (r.hill@unsw.edu.au)

Marcel P. TIMMER
Faculty of Economics, University of Groningen, 9700 AV Groningen, The Netherlands (m.p.timmer@rug.nl)


Various multilateral methods for computing price indexes use bilateral comparisons as their basic building
blocks. Some give greater weight to those bilateral comparisons deemed more reliable. However, none
of the existing reliability measures adjusts for gaps in the data. We show how the standard errors on
bilateral price indexes derived using weighted least squares provide reliability measures that solve this
problem. We then apply our methodology to a dataset on agricultural production in 103 countries. Our
results demonstrate the appeal of weighted methods and the importance of using weights based on a
comprehensive measure of reliability.
KEY WORDS: International Comparisons Program; Multilateral price index; Reliability measure;
Spanning tree; Stochastic approach; Weighted-Eltetö–Köves–Szulc.

1. INTRODUCTION
Comparing price levels and living standards across countries is an issue of interest to national governments, firms,
households, and international organizations such as the International Monetary Fund, World Bank, and European Union. Such
comparisons can influence budget contributions to international
organizations and the allocation of aid flows. Relative price levels, or purchasing power parities, are best known for their use
in obtaining international comparisons of GDP per capita, such
as in the Penn World Table (see Summers and Heston 1991).
They enable comparisons of productivity levels across countries, which allow policy makers and firms to benchmark domestic performance. These comparisons are also relevant to the
fields of development and international economics and to the
literature on economic convergence.
Purchasing power parity (PPP) indexes are built from prices
and quantities for a set of products in different countries and
can be either bilateral or multilateral. Bilateral indexes, such
as the Fisher index, have the disadvantage of being intransitive; that is, they do not generate internally consistent results
for comparisons involving three or more countries. Transitivity is achieved only when using multilateral methods. A number of multilateral methods have been proposed in the index
number literature. A distinction can be drawn between methods that use bilateral comparisons as their basic building blocks
and those that do not (see Hill 1997). Methods in the former
category include the Eltetö–Köves–Szulc (EKS) and minimum
spanning tree (MST) methods; methods in the latter category
include the Geary–Khamis method. The PPP currency conversion rates used by the Penn World Table are computed using the
Geary–Khamis method.
One attraction of building up a multilateral comparison from
bilateral comparisons is that this approach opens up the possibility of discriminating between bilateral comparisons on the
basis of their reliability, giving greater weight to those deemed
more reliable. For example, a comparison between France and
Germany may be considered more reliable than one between
France and India, because there is a greater overlap in the products bought in France and Germany and because the prices and
consumption levels of the products consumed by both countries
are more similar, thus making the comparison less sensitive to
the choice of the index number formula. This observation provides the underlying rationale for both the weighted-EKS (see
Rao 1999) and MST (see Hill 1999) methods; however, both
methods require that the reliability of bilateral comparisons be
quantified.
In a survey of the literature on reliability measures, Rao and
Timmer (2003) concluded that the main problem of existing
measures, such as Hill’s (1999) Paasche–Laspeyres spread and
Diewert’s (2002) class of relative price dissimilarity measures,
is that they fail to make adjustments for gaps in the data. Rao
and Timmer drew a distinction between statistical and index
theoretic measures of reliability. The former take a sampling
perspective; bilateral comparisons based on a small number of
matched product headings or a low coverage of total expenditure or production (averaged across the two countries) are
deemed less reliable. In addition to the standard statistical arguments regarding small samples and a low coverage not being representative, little overlap in the product headings priced
by the two countries implies that they are very different and,
by implication, inherently difficult to compare. Index theoretic
measures, in contrast, focus on the sensitivity of a bilateral comparison to the choice of price index formula. Most of the reliability measures proposed in the literature, including Hill’s
(1999) Paasche–Laspeyres spread and Diewert’s (2002) class of
relative price dissimilarity measures, are of this type. Although
these measures perform well when there are few gaps in the
data, they can generate highly misleading results when there
are many gaps. This is because they fail to penalize bilateral
comparisons made over a small number of matched headings.
It is precisely for such datasets that weighted methods are potentially the most useful, but only if they are able to combine
the statistical and index theoretic criteria.


© 2006 American Statistical Association
Journal of Business & Economic Statistics
July 2006, Vol. 24, No. 3
DOI 10.1198/073500105000000270


The objective of this article is to develop such a measure. We
depart from the existing literature, which is typically couched
in an axiomatic setting, by approaching the problem from a stochastic perspective. We show how standard errors on the logarithms of Törnqvist price indexes, derived from a stochastic
model, provide a natural index theoretic measure of reliability
that automatically penalizes bilateral comparisons with small
overlaps of headings. We argue, therefore, that using standard
errors as reliability measures will improve the quality of the results generated by weighted-EKS and MST.
We conclude the article with an empirical comparison of our
reliability measure and its resulting weighted-EKS and MST
price indexes with those obtained using other measures. The
dataset from the United Nations (UN) Food and Agriculture
Organization (FAO) comprises agricultural producer prices
and quantities for 181 agricultural crops covering 103 countries. The interesting feature of the dataset from our perspective
is that it covers a large and diverse set of countries and contains
many gaps. The presence of gaps in such a dataset is inevitable;
for example, it is not surprising that tropical foods and spices
are not grown in Norway. If we want to find the most reliable bilateral comparisons in this dataset, it is crucial that we take into
account the amount of overlap of crops in each bilateral comparison. We show that the failure to make such adjustments can
lead to a highly undesirable allocation of weights, thus compromising weighted binary-based multilateral methods, such as
weighted-EKS and MST, in precisely those situations where
they are most needed.

2. BILATERAL AND MULTILATERAL PRICE INDEXES
2.1 Bilateral Price Indexes
The set of countries is indexed by $k = 1, \ldots, K$, and the set of commodity headings is indexed by $n = 1, \ldots, N_{jk}$. Here we allow for the possibility that the set of headings over which each bilateral comparison is made may not be identical. This is the reason for the $jk$ subscript on $N$. The price and quantity data of heading $n$ in country $k$ are denoted by $p_{kn}$ and $q_{kn}$.
Let $P_{jk}$ and $Q_{jk}$ denote bilateral price and quantity index comparisons between countries $j$ and $k$. Four important bilateral index number formulas are defined as follows (with the history of these formulas discussed in Diewert 2001):

Laspeyres:
$$P^L_{jk} = \frac{\sum_{n=1}^{N_{jk}} p_{kn} q_{jn}}{\sum_{n=1}^{N_{jk}} p_{jn} q_{jn}}, \qquad Q^L_{jk} = \frac{\sum_{n=1}^{N_{jk}} p_{jn} q_{kn}}{\sum_{n=1}^{N_{jk}} p_{jn} q_{jn}}, \tag{1}$$

Paasche:
$$P^P_{jk} = \frac{\sum_{n=1}^{N_{jk}} p_{kn} q_{kn}}{\sum_{n=1}^{N_{jk}} p_{jn} q_{kn}}, \qquad Q^P_{jk} = \frac{\sum_{n=1}^{N_{jk}} p_{kn} q_{kn}}{\sum_{n=1}^{N_{jk}} p_{kn} q_{jn}}, \tag{2}$$

Fisher:
$$P^F_{jk} = \sqrt{P^P_{jk} \times P^L_{jk}}, \qquad Q^F_{jk} = \sqrt{Q^P_{jk} \times Q^L_{jk}}, \tag{3}$$

and

Törnqvist:
$$P^T_{jk} = \prod_{n=1}^{N_{jk}} \left(\frac{p_{kn}}{p_{jn}}\right)^{(s_{jn}+s_{kn})/2}, \qquad Q^T_{jk} = \prod_{n=1}^{N_{jk}} \left(\frac{q_{kn}}{q_{jn}}\right)^{(s_{jn}+s_{kn})/2}, \tag{4}$$

where
$$s_{jn} = \frac{p_{jn} q_{jn}}{\sum_{m=1}^{N_{jk}} p_{jm} q_{jm}}.$$

One weakness of these formulas is that they are not transitive; for example, in general, $P^F_{jk} \times P^F_{kl} \neq P^F_{jl}$.
2.2 Multilateral Price Indexes
A multilateral price index, by construction, is transitive. Multilateral price indexes for countries $j$ and $k$ are denoted herein by $P_j$ and $P_k$. A bilateral comparison of prices made using a multilateral formula can be expressed as
$$P_{jk} = \frac{P_k}{P_j}.$$
Transitivity is achieved by sacrificing the independence of irrelevant alternatives (see van Veelen 2002). That is, the ratio $P_k / P_j$ in general will depend not only on the price and quantity vectors of countries $j$ and $k$, but also on the price and quantity vectors of some or all of the other countries in the comparison. A large number of multilateral formulas have been proposed in the index number literature (see, e.g., Balk 1996; Hill 1997; Diewert 1999; Cuthbert 2000 for surveys of this literature).
3. BINARY-BASED MULTILATERAL METHODS

Binary-based multilateral methods are constructed by combining bilateral comparisons between pairs of countries. Here
we consider four binary-based multilateral methods. The first
method weights the bilateral comparisons in an arbitrary way,
whereas the second gives equal weight to all bilateral comparisons. The third and fourth methods give greater weight to those
bilateral comparisons deemed more reliable. It is these latter
methods that are the focus of attention here, because their performance depends critically on the way reliability is measured.
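The bilateral formulas of Section 2.1 are the building blocks of every method in this section. As a minimal sketch, assuming hypothetical prices and quantities for three countries and three headings with no gaps (the data and all function names below are illustrative and are not taken from the article), the Laspeyres, Paasche, Fisher, and Törnqvist price indexes could be computed as follows. The final lines illustrate the intransitivity of the Fisher index by comparing a direct comparison with a chained one.

```python
import math

def value(p, q):
    """Total value sum_n p_n * q_n over the matched headings."""
    return sum(pn * qn for pn, qn in zip(p, q))

def laspeyres(pj, qj, pk, qk):
    """Laspeyres price index P^L_jk of equation (1): weights from country j."""
    return value(pk, qj) / value(pj, qj)

def paasche(pj, qj, pk, qk):
    """Paasche price index P^P_jk of equation (2): weights from country k."""
    return value(pk, qk) / value(pj, qk)

def fisher(pj, qj, pk, qk):
    """Fisher price index of equation (3): geometric mean of the two above."""
    return math.sqrt(paasche(pj, qj, pk, qk) * laspeyres(pj, qj, pk, qk))

def tornqvist(pj, qj, pk, qk):
    """Tornqvist price index of equation (4), weighted by average value shares."""
    vj, vk = value(pj, qj), value(pk, qk)
    log_index = 0.0
    for pjn, qjn, pkn, qkn in zip(pj, qj, pk, qk):
        avg_share = 0.5 * (pjn * qjn / vj + pkn * qkn / vk)
        log_index += avg_share * math.log(pkn / pjn)
    return math.exp(log_index)

# Hypothetical data: three countries, three headings, no gaps.
prices = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 7.0], "C": [1.5, 5.0, 4.0]}
quantities = {"A": [10.0, 5.0, 2.0], "B": [8.0, 6.0, 1.0], "C": [12.0, 2.0, 3.0]}

def compare(formula, j, k):
    """Bilateral price index between countries j (base) and k."""
    return formula(prices[j], quantities[j], prices[k], quantities[k])

fisher_ab = compare(fisher, "A", "B")
fisher_bc = compare(fisher, "B", "C")
fisher_ac = compare(fisher, "A", "C")

# Intransitivity: chaining A->B->C generally differs from the direct A->C index.
print(f"direct A->C: {fisher_ac:.6f}")
print(f"chained via B: {fisher_ab * fisher_bc:.6f}")
```

For these made-up numbers the direct and chained Fisher comparisons differ slightly, which is the inconsistency that the multilateral methods below are designed to remove.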
3.1 Star Methods
Graph theory provides a useful framework for analyzing the
underlying structure of binary-based multilateral price indexes.
A graph consists of a collection of vertices linked by edges. In
the context of spatial comparisons, each vertex represents one
of the countries in the comparison, and each edge represents a
bilateral comparison between a pair of countries. Two particularly important graphs, depicted in Figure 1 for the case of five
vertices, are the star and complete graphs.
Perhaps the simplest multilateral method is the star method,
which uses the star graph. The star method places one country, denoted here by b, at the center of the star. The multilateral
price index for country k is then defined as Pk = Pbk , where
Pbk is a bilateral price index, such as the Fisher or Törnqvist index. This means that a comparison between countries j and k is
made by linking together bilateral comparisons between countries j and b and countries b and k.
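The star construction just described can be sketched as follows. This is an illustration under stated assumptions: the three-country dataset is hypothetical, the Fisher index is used for the bilateral comparisons, and the function names are not from the article. Running it for each possible center country shows that the implied comparison between two fixed countries can change with the choice of center.

```python
import math

def fisher(pj, qj, pk, qk):
    """Fisher price index between base country j and country k."""
    def value(p, q):
        return sum(a * b for a, b in zip(p, q))
    laspeyres = value(pk, qj) / value(pj, qj)
    paasche = value(pk, qk) / value(pj, qk)
    return math.sqrt(laspeyres * paasche)

# Hypothetical data: three countries, three headings, no gaps.
prices = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 7.0], "C": [1.5, 5.0, 4.0]}
quantities = {"A": [10.0, 5.0, 2.0], "B": [8.0, 6.0, 1.0], "C": [12.0, 2.0, 3.0]}

def star_indexes(center):
    """Star-method price indexes P_k = P_bk with country b = center."""
    return {
        k: fisher(prices[center], quantities[center], prices[k], quantities[k])
        for k in prices
    }

# Implied comparison between B and C, P_bC / P_bB, for each center b.
ratio_by_center = {}
for b in prices:
    P = star_indexes(b)
    ratio_by_center[b] = P["C"] / P["B"]
    print(f"center {b}: P_C / P_B = {ratio_by_center[b]:.6f}")
```

With only three countries, placing B or C at the center reproduces the direct B-C comparison, whereas placing A at the center links B and C through A and gives a different answer.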

Figure 1. Examples of Graphs: (a) Star Graph; (b) Complete Graph.

But the fact that the bilateral formulas are not transitive implies that the multilateral price indexes depend on which country is placed at the center of the star. For example, suppose that country $b$ is replaced with country $d$ at the center of the star. In general, then,
$$\frac{P_{bk}}{P_{bj}} \neq \frac{P_{dk}}{P_{dj}}.$$
This is the main weakness of the star method. In most applications, it is not clear which country should be placed at the center of the star. Such methods as those of Geary (1958), Khamis (1972), and Iklé (1972) solve this problem by putting an artificial average country at the center of the star. As a result, neither of these approaches is binary-based.
3.2 The EKS Method
The EKS method, named after Eltetö and Köves (1964) and Szulc (1964) but first proposed by Gini (1931), also uses the star graph. It manages to treat all countries symmetrically by generating $K$ series of multilateral price indexes, each with a different country at the center of the star. These $K$ series of results are then averaged. The EKS method usually uses the Fisher index to make the bilateral comparisons. The Törnqvist version of EKS is often referred to as the CCD method (Caves, Christensen, and Diewert 1982). One attractive feature of the CCD method is that it can also be represented as a star method with an artificial country at the center of the star. The EKS formula transitivizes the bilateral price indexes as
$$P_k = \prod_{j=1}^{K} (P_{jk})^{1/K}. \tag{5}$$
EKS is the preferred multilateral method of Eurostat and the OECD.
The EKS method can also be described using a complete graph (see Fig. 1), because it uses bilateral comparisons between all possible pairs of countries. A total of $K(K-1)/2$ distinct bilateral comparisons are defined on a set of $K$ vertices. Inevitably, some of these bilateral comparisons are likely to be more reliable than others. This observation provides the rationale for the weighted-EKS method discussed next.
3.3 The Weighted-EKS Method
The weighted-EKS method, proposed by Rao (1999) and discussed in greater detail by Rao (2001) and Rao and Timmer (2003), allows each bilateral comparison to be given different weight in the multilateral comparison. The EKS price indexes $P_j$ and $P_k$ can be obtained as the solution to the minimization problem
$$\min_{\ln P_j,\, \ln P_k} \sum_{j=1}^{K} \sum_{k=1}^{K} \left(\ln P_k - \ln P_j - \ln P^F_{jk}\right)^2,$$
where the normalization $P_1 = 1$ is imposed. The solutions, $\ln \hat{P}_j$ and $\ln \hat{P}_k$, are the ordinary least squares estimators of $\ln P_j$ and $\ln P_k$ in the model
$$\ln P^F_{jk} = \ln P_k - \ln P_j + \epsilon_{jk}, \tag{6}$$
with $E(\epsilon_{jk}) = 0$ and $\mathrm{var}(\epsilon_{jk}) = \sigma^2$.
Rao’s weighted-EKS method assumes instead that the errors are heteroscedastic, that is,
$$\mathrm{var}(\epsilon_{jk}) = \sigma^2 / w_{jk} \quad \text{for } j \neq k \qquad \text{and} \qquad \mathrm{var}(\epsilon_{jj}) = 0. \tag{7}$$

The weights, $w_{jk}$, measure the reliability of the comparison between countries $j$ and $k$. We discuss specification of the $w_{jk}$ later in the article. For now, we treat them as given. The complete matrix of weights is denoted here by $W$. The matrix $W$ must be symmetric. In addition, if a particular bilateral comparison is assigned a weight of 0, then in (7) this comparison should be interpreted as having an infinite variance. Hence it plays no part in the determination of the weighted-EKS price indexes,
$$W = \begin{pmatrix} 0 & w_{12} & \cdots & w_{1K} \\ w_{21} & 0 & \cdots & w_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ w_{K1} & w_{K2} & \cdots & 0 \end{pmatrix}.$$
The weighted-EKS price indexes, $P_k$, are obtained as
$$\begin{pmatrix} \ln P_2 \\ \ln P_3 \\ \vdots \\ \ln P_K \end{pmatrix} = \begin{pmatrix} \sum_{j \neq 2}^{K} w_{2j} & -w_{23} & \cdots & -w_{2K} \\ -w_{32} & \sum_{j \neq 3}^{K} w_{3j} & \cdots & -w_{3K} \\ \vdots & \vdots & \ddots & \vdots \\ -w_{K2} & -w_{K3} & \cdots & \sum_{j \neq K}^{K} w_{Kj} \end{pmatrix}^{-1} \times \begin{pmatrix} -\sum_{j \neq 2}^{K} w_{2j} \ln P^F_{2j} \\ -\sum_{j \neq 3}^{K} w_{3j} \ln P^F_{3j} \\ \vdots \\ -\sum_{j \neq K}^{K} w_{Kj} \ln P^F_{Kj} \end{pmatrix}.$$
The price index for country 1, $P_1$, is normalized to 1. In the case where $w_{jk} = w$ for all $j$ and $k$, the weighted-EKS method reduces to the standard EKS formula in (5).
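The weighted-EKS solution above amounts to solving a small linear system. The sketch below is a minimal illustration, assuming a hypothetical matrix of log bilateral indexes for four countries (made-up numbers, antisymmetric so that the country reversal test holds) and a plain Gaussian-elimination helper; none of the function names come from the article, and countries are numbered from 0 in the code, so the article's country 1 is index 0. It checks two properties stated in the text: equal weights reproduce the EKS formula in (5), up to the normalization $P_1 = 1$, and comparisons with zero weight play no part in the result.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def weighted_eks(log_bilateral, W):
    """Log price indexes with the first country normalized to ln P = 0.

    log_bilateral[j][k] holds ln P_jk; W is symmetric with a zero diagonal.
    """
    K = len(W)
    A = [[sum(W[m]) if m == n else -W[m][n] for n in range(1, K)]
         for m in range(1, K)]
    rhs = [-sum(W[m][j] * log_bilateral[m][j] for j in range(K))
           for m in range(1, K)]
    return [0.0] + solve_linear_system(A, rhs)

def eks(log_bilateral):
    """Unweighted EKS, equation (5), in logs and normalized on the first country."""
    K = len(log_bilateral)
    raw = [sum(log_bilateral[j][k] for j in range(K)) / K for k in range(K)]
    return [r - raw[0] for r in raw]

# Hypothetical log bilateral indexes (intransitive: 0.60 - 0.07 != 0.54).
upper = {(0, 1): 0.60, (0, 2): 0.54, (0, 3): -0.20,
         (1, 2): -0.07, (1, 3): -0.75, (2, 3): -0.78}
K = 4
L = [[0.0] * K for _ in range(K)]
for (j, k), log_index in upper.items():
    L[j][k], L[k][j] = log_index, -log_index

equal_W = [[0.0 if j == k else 1.0 for k in range(K)] for j in range(K)]
# Weights that keep only comparisons involving country 0 (a star graph).
star_W = [[1.0 if (j == 0) != (k == 0) else 0.0 for k in range(K)]
          for j in range(K)]

print(weighted_eks(L, equal_W))  # agrees with eks(L)
print(weighted_eks(L, star_W))   # agrees with the direct comparisons L[0]
```

A hand-written solver is used only to keep the sketch free of dependencies; a numerical library would normally be the more sensible choice.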


3.4 The Minimum Spanning Tree Method
A multilateral comparison between K countries can be made
by simply chaining together K − 1 bilateral comparisons (i.e.,
edges), as long as the underlying graph is a spanning tree (see
Hill 1999). A spanning tree is a connected graph that does not
contain any cycles. In other words, any two vertices in the graph
are connected by one and only one path of edges. The reason why there must be no cycles in the graph is to ensure that
the multilateral price indexes are transitive and hence internally
consistent. A total of $K^{K-2}$ different spanning trees are defined
on a set of K vertices. Three examples of spanning trees defined
on the set of nine vertices are shown in Figure 2. The star graph
in Figure 1 is another example of a spanning tree.
The resulting set of multilateral price indexes depends on
both the choice of formula for making the bilateral comparisons and the choice of spanning tree. The bilateral comparisons
should be made using a superlative formula, such as Fisher or
Törnqvist. (See Diewert 1976 for a definition and discussion of
the properties of superlative indexes.) Because superlative formulas satisfy the country reversal test (i.e., Pjk = 1/Pkj ), there
is no need for directional arrows on the edges in the spanning
tree to identify the base country in each bilateral comparison.
The choice of spanning tree is more problematic. A criterion
is needed for deciding which edges (i.e., bilateral comparisons)
to include and which to exclude. As with the weighted-EKS
method, this requires that a weight be placed on each bilateral
comparison. Again, the specification of weights is deferred until later. In the context of the MST method as described by Hill
(1999), however, it should be noted that the greater the reliability of a bilateral comparison, the smaller its reliability measure.
The MST method can be easily reformulated so that reliability is an increasing function of the reliability measure, as is the
case in the weighted-EKS method. Given this setup, it is the
maximum-spanning tree, not the minimum-spanning tree that
is required.
The maximum spanning tree can be derived using Kruskal’s
algorithm. Kruskal’s algorithm proceeds by selecting sequentially the bilateral comparisons (edges) with the largest weights
subject to the constraint that adding that edge to the graph does
not create a cycle. The algorithm terminates once it is no longer
possible to select any more edges without creating a cycle. It
turns out that the resulting spanning tree has the maximum sum
of weights. This is a well-known theorem in the graph theory literature (see, e.g., Wilson 1985). In some sense, the MST
method can be considered a special case of the weighted-EKS
method for which 2(K − 1) of the elements of the W matrix are
set equal to 1 and all other elements equal 0.
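The procedure described above can be sketched in a few lines. This is an illustration only, assuming the reformulated setup in which a larger weight means a more reliable comparison, a hypothetical symmetric weight matrix for four countries, and hypothetical log bilateral indexes; the function names are not from the article. Kruskal's algorithm selects the edges, and the multilateral indexes are then obtained by chaining bilateral comparisons along the tree.

```python
def maximum_spanning_tree(W):
    """Kruskal's algorithm: take edges in decreasing weight order, skipping cycles."""
    K = len(W)
    parent = list(range(K))  # union-find forest over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = sorted(
        ((W[j][k], j, k) for j in range(K) for k in range(j + 1, K) if W[j][k] > 0),
        reverse=True,
    )
    tree = []
    for _, j, k in edges:
        root_j, root_k = find(j), find(k)
        if root_j != root_k:  # adding this edge does not create a cycle
            parent[root_j] = root_k
            tree.append((j, k))
    return tree

def chain_log_indexes(tree, log_bilateral, base=0):
    """Chain ln P_jk along the tree, starting from ln P_base = 0.

    Assumes the tree connects all countries.
    """
    K = len(log_bilateral)
    neighbours = {v: [] for v in range(K)}
    for j, k in tree:
        neighbours[j].append(k)
        neighbours[k].append(j)
    log_p = {base: 0.0}
    stack = [base]
    while stack:
        j = stack.pop()
        for k in neighbours[j]:
            if k not in log_p:
                log_p[k] = log_p[j] + log_bilateral[j][k]
                stack.append(k)
    return [log_p[v] for v in range(K)]

K = 4
# Hypothetical reliability weights and log bilateral indexes.
weights = {(0, 1): 0.9, (0, 2): 0.2, (0, 3): 0.4,
           (1, 2): 0.8, (1, 3): 0.1, (2, 3): 0.7}
logs = {(0, 1): 0.60, (0, 2): 0.54, (0, 3): -0.20,
        (1, 2): -0.07, (1, 3): -0.75, (2, 3): -0.78}
W = [[0.0] * K for _ in range(K)]
L = [[0.0] * K for _ in range(K)]
for (j, k), w in weights.items():
    W[j][k] = W[k][j] = w
    L[j][k], L[k][j] = logs[(j, k)], -logs[(j, k)]

tree = maximum_spanning_tree(W)
log_p = chain_log_indexes(tree, L)
print(tree)
print(log_p)
```

Because only the ordering of the weights matters in the edge selection, any monotonic increasing transformation of the weights leaves the selected tree unchanged.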
4. A SURVEY OF EXISTING MEASURES OF
RELIABILITY FOR BILATERAL COMPARISONS
Both the weighted-EKS and MST methods require the construction of a matrix of weights. Ideally, we should use whatever bilateral comparisons are most reliable. One important
difference between the two methods is that MST price indexes
depend only on the ordinal ranking of the reliability measures.
Hence MST price indexes, unlike weighted-EKS price indexes,
are unaffected by monotonic transformations of the reliability measure. For example, the logarithmic transformation of
the PLS measure defined in (8) matters for the weighted-EKS
method but is of no consequence for the MST method.
The literature on reliability measures for bilateral comparisons has been developed for the most part in an axiomatic
setting. When discussing the sensitivity of a bilateral comparison to the choice of index number formula, it is useful to
first consider the limiting cases where all formulas give the
same answer. The data are consistent with the conditions for
Hicks’ aggregation theorem (Hicks 1946) if pkn = λpjn for
n = 1, . . . , Njk , where λ denotes a positive scalar. In this case all
price index formulas reduce to λ. The data are consistent with
the conditions for Leontief’s aggregation theorem (Leontief
1936) if qkn = µqjn for n = 1, . . . , Njk , where µ again is a positive scalar. In this case all quantity index formulas reduce to µ.
It follows, therefore, that all price index formulas should reduce to $\left(\sum_{n=1}^{N_{jk}} p_{kn} q_{kn}\right) / \left(\mu \sum_{n=1}^{N_{jk}} p_{jn} q_{jn}\right)$, because price indexes
can be obtained implicitly from quantity indexes.
One measure of sensitivity that has received attention in the
index number literature is the Paasche–Laspeyres spread (PLS),
usually defined as some function of the ratio of a Paasche
price index to a Laspeyres price index. (The ratio of Paasche
to Laspeyres is the same for price and quantity indexes.) For
example, Hill (1999) defined it as


$$PLS_{jk} = \ln\left[\frac{\max(P^P_{jk}, P^L_{jk})}{\min(P^P_{jk}, P^L_{jk})}\right]. \tag{8}$$
The PLS has the attractive property of equaling 0 if the data
satisfy the conditions for either Hicks or Leontief aggregation.
When either condition is satisfied, there is no index number problem, because all formulas should give the same answer. This suggests that we can have a high degree of confidence in the results of a bilateral comparison with a small PLS, because the underlying data are broadly consistent with either Hicks or Leontief aggregation. However, the link between the PLS and Hicks or Leontief aggregation is not exact, because the PLS can equal 0 even when the conditions for Hicks and Leontief aggregation are both violated.

Figure 2. Examples of Spanning Trees.

For this reason, Diewert (2002) advocated separate measures
of sensitivity for price and quantity indexes, which he referred
to as “relative dissimilarity measures.” He considered the axiomatic properties of a number of alternative measures. His relative dissimilarity measures for prices (quantities) all share the
characteristic that they equal 0 if and only if the data satisfy the
conditions for Hicks (Leontief ) aggregation. One of his preferred measures is
$$S^P_{jk} \equiv \sum_{n=1}^{N_{jk}} \left(\frac{s_{jn} + s_{kn}}{2}\right) \left[\ln\left(\frac{p_{kn}}{P^T_{jk}\, p_{jn}}\right)\right]^2 \tag{9}$$
and
$$S^Q_{jk} \equiv \sum_{n=1}^{N_{jk}} \left(\frac{s_{jn} + s_{kn}}{2}\right) \left[\ln\left(\frac{q_{kn}}{Q^T_{jk}\, q_{jn}}\right)\right]^2, \tag{10}$$
where $S^P_{jk}$ and $S^Q_{jk}$ denote the price and quantity dissimilarity measures and $P^T_{jk}$ and $Q^T_{jk}$ denote Törnqvist price and quantity indexes as defined in (4).
If desired, $S^P_{jk}$ and $S^Q_{jk}$ can be combined as
$$S_{jk} \equiv \min(S^P_{jk}, S^Q_{jk}). \tag{11}$$
This measure, a variant of which was used by Hill (2004), equals 0 if and only if the data are consistent with either Hicks or Leontief aggregation. For most datasets, $S^P_{jk} < S^Q_{jk}$, and hence $S_{jk}$ will simplify to $S^P_{jk}$. This empirical regularity was noted by Allen and Diewert (1981). It is worth noting that when $N_{jk} = 1$ (i.e., the comparison is made over only one heading), $PLS_{jk} = S^P_{jk} = S^Q_{jk} = S_{jk} = 0$. Thus, for the measure to be meaningful, we must restrict attention to cases where $N_{jk} \geq 2$.
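The measures in (8) to (11) and their limiting cases can be checked numerically. The following is a minimal sketch under stated assumptions: the data are hypothetical, constructed so that one pair of countries satisfies the Hicks condition and another the Leontief condition, and the function names are illustrative rather than taken from the article.

```python
import math

def _value(p, q):
    return sum(a * b for a, b in zip(p, q))

def pls(pj, qj, pk, qk):
    """Paasche-Laspeyres spread, equation (8)."""
    laspeyres = _value(pk, qj) / _value(pj, qj)
    paasche = _value(pk, qk) / _value(pj, qk)
    return math.log(max(paasche, laspeyres) / min(paasche, laspeyres))

def _average_shares(pj, qj, pk, qk):
    vj, vk = _value(pj, qj), _value(pk, qk)
    return [0.5 * (a * b / vj + c * d / vk) for a, b, c, d in zip(pj, qj, pk, qk)]

def _dissimilarity(relatives, weights):
    """Weighted squared log deviations of the relatives from their Tornqvist mean."""
    logs = [math.log(r) for r in relatives]
    log_tornqvist = sum(w * x for w, x in zip(weights, logs))
    return sum(w * (x - log_tornqvist) ** 2 for w, x in zip(weights, logs))

def s_price(pj, qj, pk, qk):
    """Relative price dissimilarity S^P_jk, equation (9)."""
    relatives = [b / a for a, b in zip(pj, pk)]
    return _dissimilarity(relatives, _average_shares(pj, qj, pk, qk))

def s_quantity(pj, qj, pk, qk):
    """Relative quantity dissimilarity S^Q_jk, equation (10)."""
    relatives = [b / a for a, b in zip(qj, qk)]
    return _dissimilarity(relatives, _average_shares(pj, qj, pk, qk))

def s_combined(pj, qj, pk, qk):
    """Combined measure S_jk, equation (11)."""
    return min(s_price(pj, qj, pk, qk), s_quantity(pj, qj, pk, qk))

# Hypothetical base country.
pj, qj = [1.0, 2.0, 3.0], [10.0, 5.0, 2.0]
# Hicks case: prices proportional (lambda = 2.5), quantities arbitrary.
hicks = (pj, qj, [2.5 * p for p in pj], [8.0, 6.0, 1.0])
# Leontief case: quantities proportional (mu = 0.8), prices arbitrary.
leontief = (pj, qj, [2.0, 3.0, 7.0], [0.8 * q for q in qj])
# Neither condition holds.
general = (pj, qj, [2.0, 3.0, 7.0], [8.0, 6.0, 1.0])

for name, data in [("Hicks", hicks), ("Leontief", leontief), ("general", general)]:
    print(name, pls(*data), s_price(*data), s_quantity(*data), s_combined(*data))
```

Note that nothing in these functions depends on how many headings are matched beyond the value shares themselves, which is the weakness taken up in the remainder of this section.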
All of the reliability measures discussed herein, however,
share one fundamental weakness: They all assume that there
are no gaps in the data. As soon as there are gaps, this means
that some bilateral comparisons will be made over larger baskets of products than others. All other things being equal, we
should prefer bilateral comparisons made over larger baskets.
Furthermore, the sheer fact that two countries have very little
overlap in the products produced (or consumed) indicates that
these countries are very different and, by implication, hard to
compare. Therefore, ideally a measure of reliability should penalize bilateral comparisons where the overlap of products is
small. At first glance, it seems that any such adjustment must
be arbitrary; however, by approaching the problem from a stochastic perspective, in the next section we derive a reliability
measure that naturally makes such an adjustment. In addition,
although we approach the problem from a very different perspective than Diewert (2002), who used an axiomatic approach,
it emerges that our reliability measure is a generalization of one
of his measures.

5. A STOCHASTIC APPROACH TO THE
MEASUREMENT OF RELIABILITY
In this section we show that when the problem of measuring the reliability of bilateral comparisons is approached
from a stochastic perspective, we obtain standard errors on
the logarithms of Törnqvist price indexes that can serve as
measures of the reliability of bilateral comparisons. Furthermore, we link these standard errors with the existing literature by showing that they are, in fact, generalizations of one of
Diewert’s relative price dissimilarity measures. The main difference is that the standard errors contain an additional term that
penalizes bilateral comparisons where the overlap of products
is small. This finding provides an interesting new link between
the axiomatic and stochastic approaches to index numbers.
Our approach here is somewhat analogous to that used in
the weighted country product dummy (WCPD) method (see
Rao 2001; Diewert 2004a). Our approach specifies a stochastic model in price relatives in a bilateral setting, as opposed to
WCPD, which specifies a stochastic model in price levels in a
multilateral setting.
Our stochastic model builds on the work of Clements and
Izan (1981), Cuthbert and Cuthbert (1989), and Selvanathan
and Rao (1994). These authors showed how the stochastic approach can be used to derive standard errors on the logarithms
of Paasche, Laspeyres, and Törnqvist indexes, as functions of
the number of observed price headings (see also Diewert 1995).
Although they do not draw attention to this issue (which is
not surprising because these articles predate Diewert 2002),
it turns out that the standard errors on the logarithms of the
Törnqvist price indexes derived by Clements and Izan (1981)
and Selvanathan and Rao (1994) differ from Diewert’s dissimilarity measure $S^P_{jk}$ only in that they make an adjustment by dividing $S^P_{jk}$ by $N_{jk} - 1$. Hence these standard errors do make a
simple adjustment for gaps in the data. However, the adjustment
is not entirely satisfactory because it does not take into account
the value share of each product. Our model differs slightly from
theirs and in the process generates a more satisfactory method
of adjustment that explicitly factors in value shares. The approach of Cuthbert and Cuthbert (1989) is slightly different
again, in that it focuses on comparisons below the basic heading
level. (Cuthbert and Cuthbert’s contribution is discussed later in
the article.)
It is useful to begin with a discussion of Clements and Izan’s
stochastic model of the Törnqvist price index. They assumed
that the price relatives can be modeled as


$$\ln\left(\frac{p_{kn}}{p_{jn}}\right) = \alpha_{jk} + \varepsilon_{jk,n}. \tag{12}$$
The term $\alpha_{jk}$ in (12) represents the systematic part of the difference in the purchasing power of the currencies of countries $j$ and $k$, whereas $\varepsilon_{jk,n}$ denotes the random element. Clements and Izan assumed that the errors are independently distributed as
$$E(\varepsilon_{jk,n}) = 0, \qquad \mathrm{var}(\varepsilon_{jk,n}) = \frac{\sigma_{jk}^2}{N_{jk}\, w_{jk,n}}, \tag{13}$$
where $w_{jk,n} > 0$ denotes a nonrandom weight attached to heading $n$, such that $\sum_{n=1}^{N_{jk}} w_{jk,n} = 1$. It is assumed that $N_{jk} \geq 2$. Also,
here we have added $N_{jk}$ to the denominator of the variance term
so that the harmonic mean of the variances across all headings
is independent of the number of headings. This adjustment is
needed to make the results consistent with a different version of
the model discussed later.
The important point about Clements and Izan’s specification
is that wjk,n differs across headings. The model presumes that
the price relatives of headings with larger weights (value shares)
are measured with greater accuracy and hence more closely approximate the underlying price index.
Continuing for the moment with our slightly modified version of the Clements and Izan model, it follows from (12) and (13) that the generalized least squares estimator of $\alpha_{jk}$ is
$$\hat{\alpha}_{jk} = \sum_{n=1}^{N_{jk}} w_{jk,n} \ln\left(\frac{p_{kn}}{p_{jn}}\right). \tag{14}$$
When $w_{jk,n} = (s_{jn} + s_{kn})/2$, $\hat{\alpha}_{jk}$ reduces to the logarithm of the Törnqvist price index. It follows from (13) that
$$\mathrm{var}(\hat{\alpha}_{jk}) = \sum_{n=1}^{N_{jk}} w_{jk,n}^2 \left(\frac{\sigma_{jk}^2}{N_{jk}\, w_{jk,n}}\right) = \frac{\sigma_{jk}^2}{N_{jk}}. \tag{15}$$
An unbiased estimator of $\sigma_{jk}^2$ is
$$\hat{\sigma}_{jk}^2 = \frac{N_{jk}}{N_{jk} - 1} \sum_{n=1}^{N_{jk}} w_{jk,n} \left[\ln\left(\frac{p_{kn}}{p_{jn}}\right) - \hat{\alpha}_{jk}\right]^2. \tag{16}$$
Returning to (13), suppose that now we modify the underlying assumptions as
$$E(\varepsilon_{jk,n}) = 0, \qquad \mathrm{var}(\varepsilon_{jk,n}) = (\sigma_{jk})^2, \tag{17}$$
so that the variances of the errors are independent of the weights. We believe this assumption is more realistic, because the weights are later set equal to the average value shares, that is, $w_{jk,n} = (s_{jn} + s_{kn})/2$. There is no reason to presume that headings with larger value shares are measured more accurately. In fact, if anything, the reverse is likely to be true, as is illustrated by the examples of the health and education headings, which have large shares and are very hard to measure. Hence the assumption regarding the variance of the errors in (17) is definitely an improvement on (13).
Our second departure from the approach of Clements and Izan is that instead of generalized least squares (GLS) in a heteroscedastic model, we use weighted least squares in a homoscedastic model to estimate $\alpha_{jk}$, with the weight on observation $n$ equal to $w_{jk,n}$,
$$\min_{\alpha_{jk}} \sum_{n=1}^{N_{jk}} w_{jk,n} \left[\ln\left(\frac{p_{kn}}{p_{jn}}\right) - \alpha_{jk}\right]^2; \tag{18}$$
that is, we give greater weight to those headings that have larger value shares, because they are more important in an economic sense. GLS, in contrast, gives greater weight to headings that are measured with greater accuracy. If it is assumed (unrealistically in our opinion) that headings with larger value shares are measured more accurately, as in the Clements and Izan model, then it might seem that the two approaches are mathematically equivalent. But this is not the case. Although both approaches lead to the same estimator of $\alpha_{jk}$, the estimator’s variance differs in the two models.
Solving (18), we obtain the following estimator of $\alpha_{jk}$:
$$\tilde{\alpha}_{jk} = \sum_{n=1}^{N_{jk}} w_{jk,n} \ln\left(\frac{p_{kn}}{p_{jn}}\right), \tag{19}$$
$$\mathrm{var}(\tilde{\alpha}_{jk}) = \sigma_{jk}^2 \sum_{n=1}^{N_{jk}} w_{jk,n}^2. \tag{20}$$
Comparing (14) and (19) reveals that $\hat{\alpha}_{jk}$ and $\tilde{\alpha}_{jk}$ are the same. However, the variances of $\hat{\alpha}_{jk}$ and $\tilde{\alpha}_{jk}$ differ unless $w_{jk,n} = 1/N_{jk}$ for $n = 1, \ldots, N_{jk}$.
An unbiased estimator of $\sigma_{jk}^2$ in our model is
$$\tilde{\sigma}_{jk}^2 = \left(\frac{1}{1 - \sum_{n=1}^{N_{jk}} w_{jk,n}^2}\right) \sum_{n=1}^{N_{jk}} w_{jk,n} \left[\ln\left(\frac{p_{kn}}{p_{jn}}\right) - \tilde{\alpha}_{jk}\right]^2. \tag{21}$$
This result is derived as
$$\begin{aligned} E\left[\sum_{n=1}^{N_{jk}} w_{jk,n} (x_{jk,n} - \tilde{\alpha}_{jk})^2\right] &= \sum_{n=1}^{N_{jk}} \left[w_{jk,n} E(x_{jk,n}^2)\right] - E(\tilde{\alpha}_{jk}^2) \\ &= \sigma_{jk}^2 + \alpha_{jk}^2 - \sigma_{jk}^2 \sum_{n=1}^{N_{jk}} w_{jk,n}^2 - \alpha_{jk}^2 \\ &= \sigma_{jk}^2 \left(1 - \sum_{n=1}^{N_{jk}} w_{jk,n}^2\right), \end{aligned}$$
where $x_{jk,n} = \ln(p_{kn}/p_{jn})$.
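The unbiasedness result behind (21) can be checked by simulation. The sketch below is an illustration under stated assumptions: log price relatives are drawn from the homoscedastic model (12) and (17) with normally distributed errors (normality, the parameter values, the weights, and the function names are all assumptions made for the example, not part of the article). It averages the estimator in (21) over many draws and, for contrast, also averages the estimator in (16), which is unbiased only under the heteroscedastic assumption (13).

```python
import random

def sigma2_tilde(x, w):
    """Estimator (21): weighted squared deviations divided by 1 - sum of w^2."""
    alpha = sum(wn * xn for wn, xn in zip(w, x))
    weighted_ss = sum(wn * (xn - alpha) ** 2 for wn, xn in zip(w, x))
    return weighted_ss / (1.0 - sum(wn ** 2 for wn in w))

def sigma2_hat(x, w):
    """Estimator (16): weighted squared deviations scaled by N / (N - 1)."""
    n = len(x)
    alpha = sum(wn * xn for wn, xn in zip(w, x))
    weighted_ss = sum(wn * (xn - alpha) ** 2 for wn, xn in zip(w, x))
    return n / (n - 1) * weighted_ss

random.seed(0)                    # fixed seed so the run is reproducible
weights = [0.4, 0.3, 0.2, 0.1]    # hypothetical weights summing to 1
alpha_true, sigma_true = 0.3, 0.5
draws = 20000

total_tilde = total_hat = 0.0
for _ in range(draws):
    # Homoscedastic errors, as in (17): the same variance for every heading.
    x = [random.gauss(alpha_true, sigma_true) for _ in weights]
    total_tilde += sigma2_tilde(x, weights)
    total_hat += sigma2_hat(x, weights)

mean_tilde = total_tilde / draws
mean_hat = total_hat / draws
print(f"true sigma^2 = {sigma_true ** 2:.4f}")
print(f"mean of (21) = {mean_tilde:.4f}")
print(f"mean of (16) = {mean_hat:.4f}")
```

With these weights the sum of squared weights is 0.30, so under the homoscedastic model the estimator in (16) has expectation $(4/3)(0.70)\sigma_{jk}^2$, somewhat below the true variance, whereas the average of (21) should lie close to $\sigma_{jk}^2$.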
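Setting the weights equal to the average value shares, $w_{jk,n} = (s_{jn} + s_{kn})/2$, in (16) and (21), we obtain that
$$\hat{\sigma}_{jk}^2 = \left(\frac{N_{jk}}{N_{jk} - 1}\right) S^P_{jk} \tag{22}$$
and
$$\tilde{\sigma}_{jk}^2 = \left(\frac{1}{1 - \sum_{n=1}^{N_{jk}} [(s_{jn} + s_{kn})/2]^2}\right) S^P_{jk}. \tag{23}$$
That is, the estimated variance is a function of Diewert’s relative price dissimilarity measure, $S^P_{jk}$. Replacing $\sigma_{jk}$ with $\hat{\sigma}_{jk}$ in (15), we obtain that
$$\widehat{\mathrm{var}}(\hat{\alpha}_{jk}) = \left(\frac{1}{N_{jk} - 1}\right) S^P_{jk}, \tag{24}$$
where $\widehat{\mathrm{var}}(\hat{\alpha}_{jk})$ is an unbiased estimator of $\mathrm{var}(\hat{\alpha}_{jk})$. Similarly, replacing $\sigma_{jk}$ with $\tilde{\sigma}_{jk}$ in (20), we obtain that
$$\widehat{\mathrm{var}}(\tilde{\alpha}_{jk}) = \frac{\sum_{n=1}^{N_{jk}} [(s_{jn} + s_{kn})/2]^2}{1 - \sum_{n=1}^{N_{jk}} [(s_{jn} + s_{kn})/2]^2}\, S^P_{jk}, \tag{25}$$
where $\widehat{\mathrm{var}}(\tilde{\alpha}_{jk})$ is an unbiased estimator of $\mathrm{var}(\tilde{\alpha}_{jk})$. Equation (25) provides a new measure of the variance of the logarithm of a Törnqvist price index.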
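Equations (24) and (25) are straightforward to evaluate from price and quantity data over the matched headings. The following is a minimal sketch with hypothetical data and an illustrative function name (neither is from the article). The first dataset is constructed so that every heading has the same value share in both countries, in which case the two variance estimates coincide; the second has unequal shares.

```python
import math

def tornqvist_variances(pj, qj, pk, qk):
    """Return (ln P^T_jk, estimate (24), estimate (25)) over matched headings."""
    n = len(pj)
    if n < 2:
        raise ValueError("at least two matched headings are required")
    vj = sum(a * b for a, b in zip(pj, qj))
    vk = sum(a * b for a, b in zip(pk, qk))
    # Average value shares, w_n = (s_jn + s_kn) / 2.
    w = [0.5 * (a * b / vj + c * d / vk) for a, b, c, d in zip(pj, qj, pk, qk)]
    x = [math.log(c / a) for a, c in zip(pj, pk)]            # log price relatives
    log_tornqvist = sum(wn * xn for wn, xn in zip(w, x))
    s_p = sum(wn * (xn - log_tornqvist) ** 2 for wn, xn in zip(w, x))  # eq. (9)
    sum_w2 = sum(wn ** 2 for wn in w)
    var_hat = s_p / (n - 1)                      # equation (24)
    var_tilde = sum_w2 / (1.0 - sum_w2) * s_p    # equation (25)
    return log_tornqvist, var_hat, var_tilde

# Hypothetical data with equal value shares (1/3 per heading in both countries).
equal_shares = ([1.0, 2.0, 4.0], [4.0, 2.0, 1.0], [2.0, 3.0, 12.0], [6.0, 4.0, 1.0])
# Hypothetical data with unequal value shares.
unequal_shares = ([1.0, 2.0, 3.0], [10.0, 5.0, 2.0], [2.0, 3.0, 7.0], [8.0, 6.0, 1.0])

for label, data in [("equal shares", equal_shares), ("unequal shares", unequal_shares)]:
    log_pt, var_hat, var_tilde = tornqvist_variances(*data)
    print(label, math.exp(log_pt), math.sqrt(var_hat), math.sqrt(var_tilde))
```

The square roots printed in the last column pair are the corresponding standard errors on the logarithm of the Törnqvist index.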
It is important to consider the impact of $N_{jk}$ on $\operatorname{var}(\hat{\alpha}_{jk})$ and $\operatorname{var}(\tilde{\alpha}_{jk})$. It is not clear in general how an increase in $N_{jk}$ (for $N_{jk} \geq 2$) will affect $S_{jk}^{P}$. However, an increase in $N_{jk}$ must reduce $1/(N_{jk}-1)$ and should reduce $\{\sum_{n=1}^{N_{jk}} [(s_{jn}+s_{kn})/2]^{2}\} / \{1 - \sum_{n=1}^{N_{jk}} [(s_{jn}+s_{kn})/2]^{2}\}$. By construction, $s_{jn}$ and $s_{kn}$ are calculated only over the products (headings) supplied by both countries j and k (i.e., $\sum_{n=1}^{N_{jk}} s_{jn} = \sum_{n=1}^{N_{jk}} s_{kn} = 1$). Hence, as $N_{jk}$ rises, each of the $s_{jn}$ and $s_{kn}$ terms, on average, should fall, which in turn should reduce the numerator and increase the denominator of (25), thus causing $\operatorname{var}(\tilde{\alpha}_{jk})$ to fall. The empirical analysis later in the article bears out these claims. Although $\operatorname{var}(\hat{\alpha}_{jk})$ clearly penalizes (relative to $S_{jk}^{P}$) bilateral comparisons with smaller overlaps, the adjustment takes no account of the relative importance in terms of value shares of each product. It is for this reason that we believe $\operatorname{var}(\tilde{\alpha}_{jk})$ to be a superior measure of reliability.
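The two variance estimates in (24) and (25) can be compared numerically. The following is a minimal sketch, not the authors' code: the function name and the sample prices and shares are hypothetical, and the dissimilarity measure is assumed to take the weighted squared-deviation form that is consistent with (22), (23), and (29).

```python
import math


def tornqvist_reliability(p_j, p_k, s_j, s_k):
    """Return (alpha, s_p, var_hat, var_tilde) for one bilateral comparison.

    p_j, p_k: prices in countries j and k for the N_jk overlapping headings.
    s_j, s_k: value shares over those same headings (each sums to 1).
    """
    n = len(p_j)
    if n < 2:
        raise ValueError("at least two overlapping headings are required")
    w = [(a + b) / 2 for a, b in zip(s_j, s_k)]           # average value shares
    x = [math.log(pk / pj) for pj, pk in zip(p_j, p_k)]   # log price relatives
    alpha = sum(wi * xi for wi, xi in zip(w, x))          # log Tornqvist index
    # Assumed form of S^P_jk: weighted squared deviations from alpha.
    s_p = sum(wi * (xi - alpha) ** 2 for wi, xi in zip(w, x))
    sum_w2 = sum(wi ** 2 for wi in w)
    var_hat = s_p / (n - 1)                               # equation (24)
    if sum_w2 >= 1.0:
        var_tilde = math.inf                              # one heading holds all value
    else:
        var_tilde = sum_w2 / (1.0 - sum_w2) * s_p         # equation (25)
    return alpha, s_p, var_hat, var_tilde


# Hypothetical data: four headings with equal shares in both countries.
equal = tornqvist_reliability([1, 1, 1, 1], [1, 2, 4, 8], [0.25] * 4, [0.25] * 4)
# Same prices, but one heading dominates the value shares.
skewed = tornqvist_reliability([1, 1, 1, 1], [1, 2, 4, 8],
                               [0.7, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1])
```

With equal shares the two estimates coincide, because the share ratio in (25) collapses to $1/(N_{jk}-1)$. With concentrated shares, (25) reports a larger variance than (24), which is the sense in which it takes value shares into account.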
Quantity versions of the stochastic models in (13) and (17)
could also be developed. However, in this context it is more
natural to treat the prices as being stochastic and the quantities
as responding to prices.

6. CHOOSING THE WEIGHTS
6.1 Above Basic Heading Level
It is useful to draw a distinction between applications of price
index methods above and below basic heading level. Above basic heading level, value shares are available, whereas below
basic heading level, they are not. This distinction is relevant
mainly to consumer datasets, such as those used by the International Comparisons Program (ICP), the OECD, and Eurostat,
which rely on household expenditure surveys to obtain value
shares.
Focusing first on comparisons above basic heading level
(the usual case), if the Fisher index is replaced by Törnqvist
in (6), and assuming heteroscedastic errors as in (7), we obtain
the following model:
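$$\ln P_{jk}^{T} = \ln P_{k} - \ln P_{j} + \epsilon_{jk}, \quad \text{where } E(\epsilon_{jk}) = 0 \text{ and } \operatorname{var}(\epsilon_{jk}) = \frac{\sigma^{2}}{w_{jk}}. \tag{26}$$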

It is important not to confuse $\sigma^{2}$ with $\sigma_{jk}^{2}$. The former is the variance of the logarithmic deviations of Törnqvist price indexes
from CCD price indexes in a multilateral comparison, whereas
the latter is the variance of the logarithms of the price relatives
in a bilateral comparison between countries j and k.
It follows from (26) that
$$\operatorname{var}(\ln P_{jk}^{T}) = \operatorname{var}(\epsilon_{jk}) = \frac{\sigma^{2}}{w_{jk}}.$$

Finally, using (25), we obtain the following weights for the
weighted-EKS and MST methods:
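$$w_{jk} = \frac{\sigma^{2}}{\operatorname{var}(\ln P_{jk}^{T})} = \frac{\{1 - \sum_{n=1}^{N_{jk}} [(s_{jn}+s_{kn})/2]^{2}\}\, \sigma^{2}}{\{\sum_{n=1}^{N_{jk}} [(s_{jn}+s_{kn})/2]^{2}\}\, S_{jk}^{P}}. \tag{27}$$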

With the weights defined in this way, the MST method uses the
maximum spanning tree. The value of $\sigma^{2}$ in (27) has no effect
on the resulting multilateral price indexes, and hence can be set
equal to 1.
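As an illustration of how weights of the form (27) feed into the MST method, the sketch below builds a maximum spanning tree with Kruskal's algorithm. The choice of Kruskal's algorithm, the function names, and the sample weights are assumptions made for this example; the article does not say how the tree is computed.

```python
def reliability_weight(sum_w2, s_p, sigma2=1.0):
    """Weight from equation (27); sum_w2 is the sum of squared average shares."""
    return (1.0 - sum_w2) * sigma2 / (sum_w2 * s_p)


def maximum_spanning_tree(n_countries, weights):
    """Kruskal's algorithm applied to reliability weights.

    weights maps a pair (j, k) of country indices to w_jk. Pairs that are
    missing or have a non-positive weight are never used.
    Returns the list of links (j, k) in the order they were selected.
    """
    parent = list(range(n_countries))

    def find(i):
        # Follow parent pointers to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    # Visit the most reliable bilateral comparisons first.
    for (j, k), w in sorted(weights.items(), key=lambda item: -item[1]):
        if w <= 0:
            continue
        root_j, root_k = find(j), find(k)
        if root_j != root_k:          # the link joins two separate components
            parent[root_j] = root_k
            tree.append((j, k))
    if len(tree) != n_countries - 1:
        raise ValueError("the available comparisons do not connect all countries")
    return tree


# Hypothetical weights for four countries.
example_weights = {(0, 1): 5.0, (1, 2): 1.0, (0, 2): 3.0, (2, 3): 2.0}
example_tree = maximum_spanning_tree(4, example_weights)
```

Multiplying every weight by the same positive constant leaves the ordering of the links, and therefore the selected tree, unchanged. This is why the value of $\sigma^{2}$ does not matter for the MST method.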

6.2 Below Basic Heading Level
Below basic heading level (i.e., at a level of detail lower than
that provided in household expenditure surveys), value shares
are not available. This distinction between above and below basic heading level often arises in datasets covering consumer expenditures. The ICP, OECD, and Eurostat all have to construct
price indexes at both levels.
A distinction can be drawn between cases where some products are identified as representative for a particular country—
an approach pioneered by Eurostat and further developed in the
latest round of the ICP (see International Comparison Program
2005)—and cases where no such distinction is made. Considering first the latter scenario, it follows that wjk,n = 1/Njk for all n.
Therefore, the Törnqvist index in (4) reduces to the Jevons index (see Diewert 2004b), defined as
$$\text{Jevons:} \quad P_{jk}^{J} = \prod_{n=1}^{N_{jk}} \left( \frac{p_{kn}}{p_{jn}} \right)^{1/N_{jk}}. \tag{28}$$

Similarly, Diewert’s price dissimilarity measure defined in (9)
reduces to
$$S_{jk}^{P} = \frac{1}{N_{jk}} \sum_{n=1}^{N_{jk}} \left[ \ln\!\left( \frac{p_{kn}}{P_{jk}^{J}\, p_{jn}} \right) \right]^{2}. \tag{29}$$
Equation (27) now reduces to
$$w_{jk} = \frac{\sigma^{2} (N_{jk} - 1)}{S_{jk}^{P}},$$

where $S_{jk}^{P}$ is defined as in (29).
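The reduction of (27) in the equal-weight case can be checked in one line. As a sketch, assuming only that the equal weights $w_{jk,n} = 1/N_{jk}$ take the place of the average value shares $(s_{jn}+s_{kn})/2$ in (27):

```latex
\sum_{n=1}^{N_{jk}} w_{jk,n}^{2} = N_{jk} \cdot \frac{1}{N_{jk}^{2}} = \frac{1}{N_{jk}},
\qquad
w_{jk} = \frac{(1 - 1/N_{jk})\, \sigma^{2}}{(1/N_{jk})\, S_{jk}^{P}}
       = \frac{\sigma^{2} (N_{jk} - 1)}{S_{jk}^{P}}.
```

Below basic heading level, the adjustment for gaps therefore depends only on the size of the overlap, $N_{jk}$.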
Suppose now that each country identifies some products as
representative. Now let $n = 1, \ldots, N_{jk}$ index the products that are representative in either country j or k and are priced in both j and k. Also let $N_{jk}^{R}$ denote the number of products that are representative in country j and priced in k, whereas $N_{kj}^{R}$ denotes the number of products representative in country k that are priced in j.
The weights on each product can now be defined as $w_{jk,n} = (s_{j,n}^{*} + s_{k,n}^{*})/2$, where $s_{j,n}^{*} = 1/N_{jk}^{R}$ if product n is representative in country j and $s_{j,n}^{*} = 0$ otherwise. Similarly, $s_{k,n}^{*} = 1/N_{kj}^{R}$ if product n is representative in country k, and $s_{k,n}^{*} = 0$ otherwise. This means that any data available on a product that is not representative for either country is ignored, even if both countries price this product. If such a scenario is deemed unsatisfactory, then an alternative approach is to simply give greater weight to representative products. For example, let $N_{jk}^{U}$ denote the number of unrepresentative products in country j that are priced by both countries. We could set $s_{j,n}^{*} = \theta/(\theta N_{jk}^{R} + N_{jk}^{U})$ if product n is representative and $s_{j,n}^{*} = 1/(\theta N_{jk}^{R} + N_{jk}^{U})$ if n is unrepresentative, where $\theta \geq 1$ is a parameter that determines the relative weight given to representative and unrepresentative products. The greater the value of $\theta$, the smaller the weight given to unrepresentative items. At one extreme, when $\theta = 1$, representative and unrepresentative products priced by both countries are treated symmetrically. At the other extreme, in the limiting case as $\theta$ tends to infinity, we obtain the previous model.
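The quasi-value shares described above can be sketched in a few lines. This is an illustrative sketch only: the function name and the input convention (one flag per product priced in both countries, marking whether it is representative in the country in question) are assumptions made for the example.

```python
import math


def quasi_shares(representative, theta=math.inf):
    """Quasi-value shares s*_{j,n} for one country in a bilateral comparison.

    representative: one boolean per product priced in both countries, True if
    the product is representative in this country.
    theta: relative weight on representative products (theta >= 1);
    math.inf reproduces the model that ignores unrepresentative products.
    """
    if theta < 1:
        raise ValueError("theta must be at least 1")
    n_rep = sum(1 for r in representative if r)
    n_unrep = len(representative) - n_rep
    if math.isinf(theta):
        if n_rep == 0:
            raise ValueError("no representative products: shares are undefined")
        return [1.0 / n_rep if r else 0.0 for r in representative]
    denom = theta * n_rep + n_unrep
    if denom == 0:
        raise ValueError("no products priced in both countries")
    return [theta / denom if r else 1.0 / denom for r in representative]


# Hypothetical heading: two representative products and one unrepresentative.
flags = [True, True, False]
shares_theta2 = quasi_shares(flags, theta=2)    # shares 0.4, 0.4, 0.2
shares_equal = quasi_shares(flags, theta=1)     # every product gets 1/3
shares_limit = quasi_shares(flags)              # shares 0.5, 0.5, 0.0
```

The product weights then follow as the average of the two countries' quasi-shares, $w_{jk,n} = (s_{j,n}^{*} + s_{k,n}^{*})/2$. The error raised when a country has no representative products corresponds to the case in which the weights are not defined.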
Using this approach, quasi-Törnqvist indexes can be computed with $s_{jn}$ and $s_{kn}$ replaced by $s_{j,n}^{*}$ and $s_{k,n}^{*}$ in (4). Using these quasi-value shares, it is likewise possible to compute $S_{jk}^{P}$ and $\operatorname{var}(\hat{\alpha}_{jk})$, and hence the methodology becomes analogous to the above basic heading case.
In a typical multilateral comparison, this process is repeated for each heading (expenditure class), of which there may be around 200. Therefore, although the standard procedure should work in most cases, there almost certainly will be a few problematic headings where either $N_{jk}^{R}$ and $N_{jk}^{U}$, $N_{kj}^{R}$ and $N_{kj}^{U}$, or both will equal 0 in one or more of the bilateral comparisons. Clearly, when this happens it is not possible to compute $P_{jk}^{T}$, $S_{jk}^{P}$, and $\operatorname{var}(\hat{\alpha}_{jk})$, because the weights $w_{jk,n}$ are not defined. For the purposes of the weighted-EKS and MST methods, in such cases $\operatorname{var}(\hat{\alpha}_{jk})$ should be set to infinity. This ensures that this bilateral comparison is completely ignored by both methods. If, for a particular heading, this situation arises for so many bilateral comparisons that it is not even possible to construct a spanning tree from the remaining comparisons, then quasi-Törnqvist must be replaced by either quasi-geometric Laspeyres or Paasche where required in the $S_{jk}^{P}$ formulas.
The stochastic properties of the quasi-Törnqvist index have
been considered previously by Cuthbert and Cuthbert (1989),
although, confusingly, they referred to this index as a Fisher
index. (This terminology can be traced back to Eurostat.)
Cuthbert and Cuthbert derived an expression for the variance
of the quasi-Törnqvist index that corresponds to (20), with the
quasi-weights defined as earlier with θ set to infinity (see also
Rao 2001). But Cuthbert and Cuthbert did not derive an estimator for $\sigma_{jk}^{2}$ for their stochastic model, and hence as it stands, the
variances cannot actually be computed (up to a scalar of proportionality) unless it is assumed that σjk is the same across all
bilateral comparisons.

7. EMPIRICAL APPLICATION TO AN
AGRICULTURAL DATASET
The dataset consists of agricultural producer prices and quantities for 181 agricultural products (mainly crops) for the year
1995. Because quantities (and hence value shares) are available
for each product, as is usually the case in producer datasets,
the distinction between above and below basic heading level
is not relevant. The weighting formula in (27) can be used directly. The dataset covers 103 countries. It was constructed by
Rao, Ypma, and van Ark (2003) from a UN FAO agricultural
and producer prices database. We have slightly modified Rao
et al.’s original dataset, which contained 111 countries, by removing 8 countries. We removed Singapore and Chad due to a
very limited number of agricultural products, and removed six
Central Asian countries for which the price data were not original but appeared to be imputed from the Ukraine. (See Rao et al.
2003 for a full description of the dataset.)
The interesting feature of the dataset from our perspective is
that it contains many gaps. This is inevitable in a dataset that
covers most of the crops grown in the world. For example, it
is not surprising that tropical foods and spices are not grown in
Norway. If we want to find the most reliable bilateral comparisons in this dataset, it is crucial that we take into account the
amount of overlap of crops in each bilateral comparison.


The bilateral comparisons selected by the MST method are particularly illuminating. A total of 102 bilateral comparisons are selected (by construction, one less than the number of countries in the comparison). The top 25 bilateral links are listed in Table 1 in descending order for Diewert's measure, $1/S_{jk}^{P}$, and our measure, $1/\operatorname{var}(\tilde{\alpha}_{jk})$. The full list of 102 bilateral links for each weighting scheme has been provided by Hill and Timmer (2004). It is immediately apparent that many of the links for $1/S_{jk}^{P}$ are surprising. Some particularly eye-catching links that were among the first 25 selected (i.e., the best bilateral links in the MST) are Norway–Niger (2), Norway–Malaysia (2), Norway–Indonesia (3), Malaysia–Canada (4), Mali–Georgia (4), and Ghana–Canada (2). The size of the product overlap, $N_{jk}$, is provided in brackets after each link. The fact that these bilateral comparisons all have a low product overlap indicates that they may not be reliable, and hence probably should not be used to construct a spanning tree or given high weights in the weighted-EKS method. Only two of these six links are selected by our measure, $1/\operatorname{var}(\tilde{\alpha}_{jk})$, and in both cases they are further down the list. The MST links obtained using our measure are intuitively more plausible. In addition, the average value of $N_{jk}$ is 27.6, as opposed to 14.2 for Diewert's $1/S_{jk}^{P}$ measure. The difference is even more dramatic for the best 25 links. The average $N_{jk}$ for this subset of links is 35.0 for our measure, compared with 11.1 for $1/S_{jk}^{P}$.
The problem with using weights based on $1/S_{jk}^{P}$ is that when $N_{jk}$ is small, the measure will contain a lot of noise. This means that in such cases, there is a chance that $1/S_{jk}^{P}$ could be large even though, almost by definition, countries j and k must be quite different. Given that the MST method selects the bilateral comparisons with the smallest dissimilarity measures, it will therefore pick up quite a few of these observations. However, in such cases it does not follow that these pairs of countries face similar relative prices for their agricultural products. In fact, when $N_{jk}$ is small, the situation is quite the reverse, because it implies that the mix of products they produce is very different, and hence it would be highly misleading to conclude that they face similar relative prices.
It is also important to compare the price indexes generated by the reliability measures $1/S_{jk}^{P}$ and $1/\operatorname{var}(\tilde{\alpha}_{jk})$ for both the weighted-EKS and MST methods. These results along with EKS–Törnqvist (i.e., CCD) price indexes for 30 countries are presented in Table 2, with the United States as the base. Again the full list for all 103 countries has been given by Hill and Timmer (2004). Particularly striking are the results for Cameroon; the price index differs by a factor of 9 depending on which method is used!
Table 1. Top 25 Bilateral Links in MST

Weighting scheme 1/S^P_jk

MST bilateral link      1/S^P_jk   Njk
Norway–Niger              168.96     2
Norway–Malaysia            75.16     2
Sudan–Canada               57.97     5
Norway–Albania             52.06     9
Norway–New Zealand         48.62    12
Ireland–Denmark            46.38    25
Canada–U.S.A.              43.62    24
U.K.–Ireland               41.58    30
Finland–Canada             39.26    13
Peru–Norway                37.15    10
Norway–Indonesia           36.43     3
Norway–Costa Rica          34.10     4
Spain–Norway               30.60    13
Malaysia–Canada            29.43     4
Switzerland–Malaysia       29.42    13
Ireland–Germany            29.13    29
Norway–Bangladesh          28.70     4
Mali–Georgia               27.88     4
Portugal–Norway            27.85    12
Slovenia–Norway            27.81    13
Ghana–Canada               26.89     2
Tunisia–Norway             26.43     9
Mali–Jordan                26.13     5
Sweden–Canada              25.71    16
Ukraine–Sweden             25.61    15

Weighting scheme 1/var(α̃jk)

MST bilateral link      1/var(α̃jk)  Njk
Slovakia–Mexico           208.94    46
Ireland–Denmark           197.96    25
U.K.–Ireland              188.76    30
Zambia–Tunisia            171.30    18
Spain–U.S.A.              159.91    75
Norway–Niger              154.54     2
Senegal–Romania           153.37    19
South Africa–Mexico       153.33    57
Canada–U.S.A.             150.70    24
Peru–China                149.58    48
Spain–Greece              149.53    67
Zambia–South Africa       148.92    27
South Africa–China        143.76    53
Spain–Portugal            140.12    55
Morocco–Jordan            139.84    37
Spain–Italy               138.61    66
Zambia–Saudi Arabia       133.84    10
Zambia–Jordan             132.88    16
Zambia–Guatemala          132.87    15
Sweden–Czech Rep          132.85    28
Sweden–Malaysia           131.52    11
Finland–Canada            130.39    13
Italy–Hungary             129.16    54
Slovakia–Greece           127.77    47
Slovakia–Brazil           127.68    32

In the context of this article, we are particularly interested in assessing the impact of the choice between using either $1/S_{jk}^{P}$ or $1/\operatorname{var}(\tilde{\alpha}_{jk})$ as weights on the weighted-EKS and MST price indexes. The following sensitivity measure is useful in this context:

$$L_{fg}^{b}(z) = \frac{100}{K} \sum_{k=1}^{K} \left\{ \frac{\max[P_{k}^{f}(z)/P_{b}^{f}(z),\; P_{k}^{g}(z)/P_{b}^{g}(z)]}{\min[P_{k}^{f}(z)/P_{b}^{f}(z),\; P_{k}^{g}(z)/P_{b}^{g}(z)]} - 1 \right\},$$

where z denotes a weighted binary-based multilateral method such as MST or weighted-EKS, f and g denote alternative weighting schemes such as $1/S_{jk}^{P}$ and $1/\operatorname{var}(\tilde{\alpha}_{jk})$, b denotes the base country, and $P_{k}^{f}(z)/P_{b}^{f}(z)$ denotes the price index for country k obtained using method z and weighting scheme f, with country b as the base. $L_{fg}^{b}(z)$ can be interpreted as measuring the average percentage impact on method z (with country b as the base) of changing the weighting scheme from f to g, or vice versa, because $L_{fg}^{b}(z) = L_{gf}^{b}(z)$. For example, the measured price indexes obtained using the MST method with the United States as the base differ on average by 20.7% depending on whether $1/S_{jk}^{P}$ or $1/\operatorname{var}(\tilde{\alpha}_{jk})$ is used as a weight. The problem with this measure is that the results are not invariant to the choice of base country, as can be seen in Table 3. Indeed, $L_{fg}^{b}(z)$ for z = MST and the weighting schemes $f = 1/S_{jk}^{P}$ and $g = 1/\operatorname{var}(\tilde{\alpha}_{jk})$ ranges between 18.9% (when Japan is the base) and 96.1% (when Guinea is the base). An overall measure of the sensitivity of the price indexes of method z to the choice of weighting scheme is obtained by averaging the results for $L_{fg}^{b}(z)$ across all possible base countries,

$$L_{fg}(z) = \frac{1}{K} \sum_{b=1}^{K} L_{fg}^{b}(z).$$

The results, $L_{fg}^{b}(z)$, for 30 base countries are shown in Table 3 along with the overall average, $L_{fg}(z)$, obtained from using all 103 countries in turn as the base. (Again, see Hill and Timmer 2004 for the complete results.) From Table 3, it is clear that the MST price indexes are more sensitive to the choice of weights than are the weighted-EKS price indexes. On average, the MST price indexes change by 29.1%, depending on whether $1/S_{jk}^{P}$ or $1/\operatorname{var}(\tilde{\alpha}_{jk})$ is used as a weight, whereas the weighted-EKS price indexes change by only 2.1%. But even a 2.1% average change is quite significant. Hence for datasets containing many gaps, the choice between $1/S_{jk}^{P}$ and $1/\operatorname{var}(\tilde{\alpha}_{jk})$ as a measure of reliability can have a major impact on the resulting price indexes.
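The sensitivity measure and its average over base countries are straightforward to compute. The sketch below is illustrative only: the function names and the three-country price indexes are hypothetical, and each input list is assumed to hold the price indexes of all K countries under one weighting scheme.

```python
def sensitivity(p_f, p_g, base):
    """Average percentage gap between two sets of price indexes for one base.

    p_f, p_g: price indexes for the same K countries under weighting
    schemes f and g; base is the position of the base country b.
    """
    k_total = len(p_f)
    total = 0.0
    for k in range(k_total):
        rel_f = p_f[k] / p_f[base]    # index for country k under scheme f
        rel_g = p_g[k] / p_g[base]    # index for country k under scheme g
        total += max(rel_f, rel_g) / min(rel_f, rel_g) - 1.0
    return 100.0 * total / k_total


def overall_sensitivity(p_f, p_g):
    """Average of the base-specific measure over every possible base country."""
    k_total = len(p_f)
    return sum(sensitivity(p_f, p_g, b) for b in range(k_total)) / k_total


# Hypothetical indexes for three countries under two weighting schemes.
index_f = [1.0, 2.0, 4.0]
index_g = [1.0, 2.2, 4.0]
by_base = [sensitivity(index_f, index_g, b) for b in range(3)]
average = overall_sensitivity(index_f, index_g)
```

In this small example only the second country's index differs between the two schemes, yet the measure doubles when that country is used as the base. This illustrates why the results are not invariant to the choice of base country and why the average over all bases is reported.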
8. CONCLUSION
The latest round of the ICP is attempting to make detailed
comparisons of price levels across almost all countries in the
world (see Diewert 2004a; International Comparisons Program
2005). Even though the world comparison is being broken up
into regional blocks, obtaining complete matrices of prices at
basic heading level for all the countries in each regional block
is a major undertaking that may result in either excessive aggregation of data or loss of characteristicity (i.e., countries may
be forced to supply price data on products that are not representative of their consumption patterns). This problem arises
frequently in international comparisons. Thus it may be counterproductive to try and eliminate all gaps. We have shown here
that gaps (i.e., missing observations) in the data are not an insurmountable problem, particularly when weighted binary-based
multilateral methods are used. However, it is important that explicit account be taken of these gaps when deciding how much
weight is given to each bilateral comparison in the overall multilateral comparison. Failure to make such an adjustment may
compromise weighted binary-based methods precisely when
they are most needed (i.e., in a comparison over a heterogeneous set of countries). We have developed a method with
strong theoretical foundations that automatically makes such an
adjustment. Our weights, which are derived from the standard
errors on Törnqvist price indexes, naturally penalize bilateral
comparisons containing many gaps.
Our standard errors can also be modified for use in consumer
datasets below the basic heading level (where no value shares
are available). If countries identify products as representative or


Table 2. PPP Exchange Rates per U.S. Dollar for Agricultural Output

Downloaded b