Transitivity of price indices
Leon Willenborg May 2018
Content
1. Introduction
2. Transitivity
2.1 Definition of transitive price indices
2.2 Representation of transitive price indices
2.3 Examples of intransitive price indices
2.4 Transitive closure
2.5 Transitive reduction
2.6 Aggregation and transitivity
2.7 Combinations of transitive price indices
3. Transitivization: MST and GEKS methods
3.1 MST method
3.2 GEKS method
4. Transitivization: cycle method
4.1 Introductory remarks
4.2 From GEKS to cycle method
4.3 The cycle method explained
4.4 Form of the adjusted price index
4.5 Invariance properties
4.6 Cycle matrix and weight matrix
4.7 Alternative approach to the cycle method
4.8 Applications of the cycle method
5. TPD model
6. Some price indices applied to DPoGs
6.1 Jevons price index
6.2 Indices with weighted geometric averages
6.3 Dutot price index
6.4 Lowe price index
6.5 Composites / subgroups
7. κ-index for composites
7.1 Elementary price indices for composites
7.2 Aggregated price indices
8. Price ratios and the cycle method
8.1 Idea of the method
8.2 Weights: time dimension
8.3 Weights: goods dimension
9. Updating: incremental price index computation
9.1 Preliminary remarks
9.2 Incremental cycle method
9.3 Incremental κ-index
10. Discussion
References
Appendix A. Notation and terminology
Appendix B. Algebra of price indices
CBS | Discussion Paper 2018 | May 2018
Summary
The concept of transitivity of price indices is explored, in particular in the case of a dynamic population of goods. A transitive price index is (by definition) free of chain drift. Many classical (bilateral) price indices, such as those named after Laspeyres, Paasche, Fisher and Törnqvist, are not transitive, however. Other bilateral indices are transitive only if the population of goods is static, which is a rather academic situation. Intransitive price indices can be made transitive, which is called transitivization. Some of these transitivization methods are discussed in more detail in the paper: GEKS, MST and the cycle method. The latter method is of more recent origin and can be shown to be a generalization of the GEKS method. The cycle method is discussed in some detail and various applications of the method are presented. Contrary to GEKS, the cycle method has parameters to control the transitivization of an intransitive price index. The cycle method uses constrained weighted linear regression to compute the transitivized version of the input index. It also allows one to quantify the extent to which an index is intransitive ('degree of intransitivity'). The cycle method is best applied to numerical examples. Symbolic expressions can be computed, but tend to be fairly complicated. In matrix form, however, these symbolic computations are straightforward and insightful. The same is true for GEKS, and probably for any other multilateral price index method. The cycle method can also be applied in an incremental approach, where every month the set of known indices is augmented with the newest results, which preserve the transitivity of the price index involved.
Keywords
Price index, consumer price index, price changes, transitivity, transitivizing, GEKS, cycle method, cycle matrix, constrained linear regression, RGLS, restricted generalized least squares, multilateral index method, dynamic population of goods, updating.
1. Introduction
In the view of the present author good price indices should be transitive. Many classical indices (like those named after Laspeyres, Paasche, Fisher and Törnqvist, to mention but a few) do not have this property, however. But not all classical indices lack this property. Indices named after Jevons and Dutot, for instance, are transitive, but only for static populations of goods (PoG). But there are many more; some of these are discussed in the present report.1
Intransitive price index methods can be replaced by transitive ones, by applying a statistical procedure to the intransitive price index. The well-known GEKS method is one such method. This method is easy to apply. However, it has a number of drawbacks as pointed out in Willenborg (2017d). The cycle method, proposed a few years ago by the author (cf. Willenborg, 2010), is another method to produce transitive indices from intransitive ones, and of more recent origin. The method was originally inspired by land surveying and only later applied to price indices. In Willenborg (2017d) it is shown, among other things, how the GEKS method can be generalized to the cycle method. For most persons involved in price indices, this may be the easier route to understanding the cycle method.
So, although intransitive price indices are imperfect, they are not necessarily useless. What saves them is that they can be used to produce transitive price indices, using a transitivizing method such as GEKS or the cycle method. But this transformation comes at a price: a simple bilateral price index is transformed into a more complicated multilateral price index. It is the present author's opinion that this price is worth paying, as intransitive price indices may be prone to severe chain drift.
The present paper has several goals. First of all, the concept of transitivity for price indices is explored.
Examples of intransitive price indices are shown, taken from a recent study focused on comparing several price index methods for scanner data. In De Haan et al. (2016) various price index methods are presented and discussed. Chessa et al. (2017) contains the results of several of these methods applied to four sets of data, allowing direct comparison of the results.
The second goal of the present paper is to focus on the cycle method as a general method to transitivize intransitive price indices. The aim is to present the method via the GEKS method, thereby heavily leaning on Willenborg (2017d). However, the aim is not to duplicate that paper, only to use its main results and findings. So the reader who wants to have a more complete picture is urged to read that paper as well. The most difficult part of applying the method is the calculation of the cycle matrix (which is explained in the text; see section 4.2). The user is also required to specify a suitable weight matrix. It requires some experimentation to find a suitable choice for a weight matrix. The present paper provides some general ideas that can be used in the search for a suitable weight matrix. Once both matrices are known, the cycle method can be applied to transitivize an intransitive price index.
The third goal of the present paper is to consider transitivity in dynamic populations of goods (DPoGs).2 Such populations differ from the static population of goods (SPoG) typically assumed in traditional price index theory. SPoGs do not really exist in reality. In fact, DPoGs tend to become more and more dynamic in due course, meaning that items tend to be available in markets for ever shorter periods. It turns out that many classical price indices that are transitive in an SPoG lose this property in a DPoG. In fact they even become conceptually problematic at the GTIN level, which is the level at which items are specified and observed in shops, etc. A practical way out of this situation is to consider subgroups, or composites, of items. These typically include several GTINs, but are smaller than COICOPs and are more homogeneous in composition. In fact the composites are parts (= elements of a partition) of COICOP groups. COICOP is a classification to classify and analyse individual consumption expenditures by households, etc.; it is developed by the United Nations Statistics Division.
As to the notation in the present paper, we would like to remark that a solid black square (■) is used to denote the end of an example.
1 The author is grateful to Sander Scholtus for reviewing this paper and for his suggestions to improve it.
2 Willenborg (2017c) considers such populations and proposes some simple measures to capture their dynamic character. Some examples are provided to illustrate the differences in dynamics.
The organization of the remainder of the paper is as follows. In Section 2 the concept of transitivity is explored, including transitive closure and transitive reduction. The most important result of this section is about the representation of transitive price indices as a ratio of price averages. This result is used repeatedly in the paper to produce examples of transitive price indices, in particular for DPoGs. It is also shown in Section 2 that a transitive index can be represented very compactly, and that transitive closure can be used to compute price indices for any pair of months from this compact form. Examples based on real data are shown to indicate how results for intransitive price indices may diverge. This indicates that intransitive price indices should be avoided. Many well-known classical price indices are intransitive, like Laspeyres, Paasche, Fisher and Törnqvist indices (among others). An intransitive price index is not necessarily useless, but should not be used without adjustment. In fact, such an index can be used as input for a procedure that is aimed at modifying the original index so as to become transitive. This procedure - transitivization - is discussed in the next two sections. In fact, this may be an attractive way to produce transitive price indices.
In Section 3 some older methods are discussed for this transitivization: the GEKS and the MST method. It is argued that neither method is quite satisfactory, for various reasons. MST simply solves a problem by avoiding it. GEKS has the problem that price indices have the same weight in the adjustment method, irrespective of the distance between base and reporting month.
In Section 4 the cycle method is discussed, which can be viewed as a generalization of the GEKS method. It allows different indices to be weighted differently, thus avoiding a shortcoming of the GEKS method. Special attention is devoted to understanding two matrices that play a key role in the cycle method: the cycle matrix and the weight matrix. It is also indicated that the cycle method can be used not only to adjust intransitive price indices, but also to quantify the degree of intransitivity of the original price index.
In Section 5 a variant of the Time Product Dummy (TPD) model is considered. The solution presented here differs from the usual one, in that it gets rid of the nuisance parameters in the model and the indices for each month are estimated directly.
Section 6 considers several classical price indices that are transitive for SPoGs. However, as it turns out, transitivity may break down for these indices when applied to DPoGs. This is an important defect, as DPoGs are the rule rather than the exception in practice. Standard price index theory is based on SPoGs, and therefore is not a very helpful source. For DPoGs new ideas and different kinds of indices are needed. One problem one faces is that building price indices at the level of GTINs may be problematic, as GTINs have finite lifetimes. However, one can consider subgroups of GTINs within a group (such as a COICOP) of items. This idea is explored to some extent, in particular in view of the aim to produce transitive price indices as well.
Section 7 introduces a new index, the 𝜅-index, that is defined for composites and is quite easy to apply in practice. This index can be used for DPoGs. It is intended to be applied to scanner data, as it assumes the availability of turnover and quantity information.
Section 8 presents an index method that generalizes one used earlier by the present author for internet data, based on price observations only. The generalization is intended to be applied to scanner data, which allow items to be weighted according to their turnover. It is an example of a method in which first rough (bilateral) indices are computed, which are then made transitive by a mechanical procedure, say the cycle method.
In Section 9 updating methods are considered, to incrementally compute transitive price indices. These methods differ in whether or not previously computed indices are allowed to change, as well as in terms of the length of the time window that is used to make adjustments. This section completes the core of the paper.
Section 10 contains a discussion of the results obtained and identifies some topics that have been touched upon in the present paper and that, in the opinion of the author, merit further investigation.
A list of references concludes the main part of the text. It should be mentioned that the papers of the present author mentioned here have been uploaded to ResearchGate where they can be accessed (viewed and downloaded) for free by anyone interested. Two appendices complement the paper. Appendix A contains a vocabulary for the present paper. The reader is referred to this appendix if some notion mentioned in the text is unclear. Appendix B provides an algebraic view on price indices, in particular transitive ones.
2. Transitivity
2.1 Definition of transitive price indices
Let a PIDG be given, with price indices associated with its arcs. The price index for this PIDG is transitive if
$$\pi_{ik} = \pi_{ij}\,\pi_{jk} \qquad (2.1)$$
for vertices $i$, $j$ and $k$ such that $(i,j)$, $(j,k)$ and $(i,k)$ are arcs in this PIDG, with associated price indices $\pi_{ij}$, $\pi_{jk}$, $\pi_{ik}$, respectively. So (2.1) states that comparing states $i$ and $k$ directly yields the same price index as first comparing states $i$ and $j$, then states $j$ and $k$, and multiplying the respective price indices. Assuming that time reversibility holds, i.e. $\pi_{ji} = 1/\pi_{ij}$, (2.1) can also be written as
$$\pi_{ij}\,\pi_{jk}\,\pi_{ki} = 1 \qquad (2.2)$$
which states that the product of the price indices on a cycle of three arcs is always 1. If this result holds for any triple of vertices for which the PIDG has arcs, then it is easy to see that it holds for any cycle, as this can be divided into a number of such triangular cycles.
The definition of a transitive price index involves only three states and their comparisons (arcs). But if there are more states involved we can decompose the cycles into smaller ones which are triangles. Consider the situation in Figure 2.1.1. The PIDG on the left involves 4 states and 4 comparisons. The one on the right consists of two triangles. If the price index is transitive we have for these triangles: $\pi_v\,\pi_w\,\pi_z = 1$ and $\pi_x\,\pi_y\,\pi_{-z} = 1$, where $\pi_{-z} = 1/\pi_z$, implying $\pi_v\,\pi_w\,\pi_x\,\pi_y = 1$. So for a transitive index the product of the price indices associated with the arcs of a 4-cycle equals 1. It is clear that the result is independent of the decomposition of the 4-cycle into two 3-cycles. Proceeding inductively, one can prove similar results for $m$-cycles, with $m > 4$.
Figure 2.1.1 A 4-cycle decomposed as two 3-cycles.
The results for cycles can be translated into results for price indices between two points in a PIDG. Transitivity implies that the price index between any two points in the PIDG can be defined in a way that is independent of the actual choice of the path.3 To see this, take two paths from a point $i$ in the PIDG to a point $j$. Reverse one of these paths and compose the two into a single path, which is in fact a cycle starting and ending at node $i$. The product of the price indices along this cycle is equal to 1. From this it follows that the products of the price indices along the two paths are equal. As a final remark of this section we would like to point out that one can study the algebraic properties of price index matrices (PIMs), in particular if they are transitive. See Appendix B for details.
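The triangle condition (2.2), its extension to a 4-cycle, and the path independence described above can be checked numerically. The following is a minimal Python sketch and not part of the original paper: the function name `cycle_product` and all numeric values are hypothetical, and the index is built from node values so that it is transitive by construction.

```python
from math import isclose, prod

def cycle_product(pim, cycle):
    """Multiply the price indices along a closed walk, e.g. [1, 2, 3] means 1->2->3->1."""
    arcs = zip(cycle, cycle[1:] + cycle[:1])
    return prod(pim[(i, j)] for i, j in arcs)

# Hypothetical node values; pim[(i, j)] = value[j] / value[i] is transitive by construction.
value = {1: 1.00, 2: 1.02, 3: 0.97, 4: 1.05}
pim = {(i, j): value[j] / value[i] for i in value for j in value}

triangle = cycle_product(pim, [1, 2, 3])      # 3-cycle, cf. (2.2)
square = cycle_product(pim, [1, 2, 3, 4])     # 4-cycle

# Two different paths from state 1 to state 4 yield the same index.
via_2 = pim[(1, 2)] * pim[(2, 4)]
via_3 = pim[(1, 3)] * pim[(3, 4)]

print(isclose(triangle, 1.0), isclose(square, 1.0), isclose(via_2, via_3))
```

For an intransitive index the same function could serve as a rough diagnostic: the further a cycle product is from 1, the larger the discrepancy on that cycle.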
2.2 Representation of transitive price indices
For any price index we assume that
$$\pi_{ii} = 1 \qquad (2.3)$$
for any vertex $i$, and
$$\pi_{ij}\,\pi_{ji} = 1 \qquad (2.4)$$
for any pair of vertices $i$, $j$ for which $(i,j)$ or $(j,i)$ is an arc in the corresponding PIDG.
The price indices are associated with the arcs of a PIDG. However, due to the path independence of a transitive index we can associate values with the points in the corresponding PIDG that can be used to derive the index. Suppose we have a PIDG for which the underlying PIG is connected. Take the point 1 in this PIDG, which we will use as a reference point.4 Let $j$ be an arbitrary point in the PIDG. There is a path from 1 to $j$ in the PIG: $(i_1, \dots, i_k)$ with $i_1 = 1$ and $i_k = j$ and $(i_a, i_{a+1})$ in the PIG for $a = 1, \dots, k-1$. The aim is to associate values $\pi_j$ with the points $j$ in the PIDG. We then have for any path $((i_1, i_2), \dots, (i_{k-1}, i_k))$ in the PIDG connecting point $i_1 = 1$ and point $i_k = j$:
$$\pi_j \triangleq \pi_{i_1}\,\pi_{i_1 i_2} \cdots \pi_{i_{k-1} i_k} \qquad (2.5)$$
where the $\pi_{ij}$'s are price indices associated with the various arcs of the PIDG. Now, because of the transitivity of the price index, any other path connecting 1 with $j$ would yield the same value. This follows from the fact that if we go round a cycle, that is a path with the same starting point and end point, the product of the values associated with the arcs in the path is 1, i.e. in case of (2.5) with $i_k = i_1$ we have that $\pi_{i_1 i_2} \cdots \pi_{i_{k-1} i_k} = 1$. So the values $\pi_j$ are well-defined, given the value at a reference point 1, for which we assume $\pi_1 = 1$. Any point in the PIDG can serve as a reference point. Although not explicitly indicated, the values of the $\pi_j$'s depend on the choice of the reference point. It is easy to express these values with respect to a new reference point if they have been given relative to another reference point.5
Now, consider two points $i$ and $j$, with associated values $\pi_i$ and $\pi_j$, with respect to reference point 1. Suppose that $(i,j)$ is an arc in the PIDG. Then, as a result of the transitivity of the price index, we have
$$\pi_i\,\pi_{ij} = \pi_j \qquad (2.6)$$
from which it follows that6
$$\pi_{ij} = \frac{\pi_j}{\pi_i}. \qquad (2.7)$$
So this result says that for a transitive price index values can be associated with the points of a PIDG so that the value $\pi_{ij}$ associated with an arc $(i,j)$ of this PIDG can be written as the ratio of the values associated with the end points $i$ and $j$ of this arc, as expressed in (2.7), where the value associated with the reference month is in the denominator and that of the reporting month in the numerator. Note that the value $\pi_1$ of the reference point 1 is eliminated from (2.7), as it appears in numerator and denominator. Evidently, if a price index has the form (2.7) then it is transitive.
Representation (2.7) for a transitive price index is a key result that we shall use repeatedly in the sequel. It shows that any transitive price index can be represented as the ratio of two quantities associated with the nodes of the PIDG. The $\pi_i$'s can be viewed as average prices (over the period associated with the node)7, associated with the vertices $i$ of the PIDG. So the form of the price index in (2.7) actually characterizes transitive price indices.
Representation (2.7) can be viewed as an extension of the price index as a price ratio for a single good to the case of a subgroup of goods within a group of goods (COICOP). The $\pi_i$'s are average prices of subgroups of goods, not necessarily of a single item. We will explore this idea in Section 6.5. It will come with a change in perception of what a good actually is, or rather, how it can be defined conveniently. It will make things a lot easier, in particular when using an incremental approach (see Section 9).
3 Barring possible numerical errors due to rounding. But these can be controlled.
4 The choice is convenient but otherwise unimportant.
5 Compare the situation with a set of height differences and heights with respect to some reference point which is termed 'ground level'.
6 In case $(i,j)$ is not an arc in the PIDG, we can use (2.7) to define $\pi_{ij}$.
7 In our applications in the present paper. But the node may also represent other things, such as countries or other areas, or area-period combinations (countries in a particular period).
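The construction of the node values in (2.5) and the ratio form (2.7) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: the function name `node_values`, the four-node tree and its index values are hypothetical, and the input is assumed to be a spanning tree so that paths are unique.

```python
from collections import deque

def node_values(tree_arcs, reference):
    """Propagate pi_j = pi_i * pi_ij along a spanning tree, with pi_reference = 1 (cf. (2.5))."""
    neighbours = {}
    for (i, j), index in tree_arcs.items():
        neighbours.setdefault(i, []).append((j, index))         # arc i -> j
        neighbours.setdefault(j, []).append((i, 1.0 / index))   # reversed arc, cf. (2.4)
    values = {reference: 1.0}
    queue = deque([reference])
    while queue:                       # breadth-first walk over the tree
        i = queue.popleft()
        for j, index in neighbours[i]:
            if j not in values:
                values[j] = values[i] * index
                queue.append(j)
    return values

# Hypothetical indices on the arcs of a spanning tree with four nodes.
tree = {(1, 2): 1.02, (2, 3): 0.95, (2, 4): 1.10}

pi = node_values(tree, reference=1)
pi_34 = pi[4] / pi[3]                  # (2.7) for the pair (3, 4), which is not an arc of the tree

# A different reference point rescales all node values but leaves the ratio unchanged.
pi_ref3 = node_values(tree, reference=3)
same_ratio = abs(pi_ref3[4] / pi_ref3[3] - pi_34) < 1e-12
```

The last two lines illustrate the remark that the reference value cancels in (2.7): only ratios of node values matter.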
2.3 Examples of intransitive price indices
We illustrate the divergence of the results of a nontransitive price index on the basis of several classical and intransitive price indices (Laspeyres, Paasche, Fisher and Törnqvist), comparing the direct price index with its month-on-month (MoM) counterpart (in Figures 2.3.1, 2.3.2, 2.3.3 and 2.3.4). The results are based on scanner data from a Dutch retailer. These data cover a period of 50 months. They have been used in a study carried out at Statistics Netherlands in 2016 that compared various index methods by applying them all to four sets of goods (office supplies, bed clothing, pastries and men's T-shirts) at two levels of aggregation (EAN level and an aggregate level). See De Haan et al. (2016), which describes the index methods, and Chessa et al. (2017), which presents some key findings concerning application of these methods to the data just described.
In Figures 2.3.1, 2.3.2, 2.3.3 and 2.3.4 the scale on the y-axis is not the same in all cases. But for each good at the same level (GTIN or group level) the scales are the same (with one exception) so that direct comparison is possible. Also, it should be noted that the 'degrees of intransitivity' are different for the various indices and levels of aggregation.
When we take a qualitative look at the graphs in Figures 2.3.1, 2.3.2, 2.3.3 and 2.3.4 a few things can be noticed:
– The Fisher and Törnqvist indices tend to give similar results for the fixed base and chained indices for office supplies and bed clothing, both at the GTIN and group level, and for pastries at the group level. However, for T-shirts the fixed base and chained indices differ considerably.
– The Laspeyres and Paasche indices diverge for office supplies and bed clothing.
– For T-shirts the Paasche, Fisher and Törnqvist indices show a significant discrepancy between the results based on a fixed base version and the chained version.
– In some cases the chained version yields higher results than the fixed base variant (Laspeyres for office supplies, bed clothing both at the GTIN and group level; also for T-shirts at the GTIN level). In other cases the reverse effect is visible (Paasche for office supplies and bed clothing; all four indices for pastries at the GTIN level; for T-shirts Paasche, Fisher and Törnqvist for both the GTIN and group level). In some cases the results are comparable: Fisher and Törnqvist for office supplies and bed clothing; Fisher and Törnqvist for pastries at the group level.
– The discrepancies between the fixed base price indices and the chained price indices tend, in general, to be smaller at the group level. Apparently the grouping of goods has a dampening effect on values of the price indices.
Figure 2.3.1 Price indices for office supplies at the GTIN and the group level. Panels: Office supplies (GTIN); Office supplies (GRO).
Figure 2.3.2 Price indices for bed clothing at the GTIN and the group level. Panels: Bed clothing (GTIN); Bed clothing (GRO).
Figure 2.3.3 Price indices for pastries at the GTIN and the group level. Panels: Pastries (GTIN); Pastries (GRO).
Figure 2.3.4 Price indices for men's T-shirts at the GTIN and the group level. Panels: T-shirts (GTIN); T-shirts (GRO).
Looking at the results in Figures 2.3.1 to 2.3.4 we see quite different behaviours of the price indices, even qualitatively: one type of index is ascending, while the other is descending (cf. the Paasche price index for bed clothing at the GTIN level). For 'pastries' the results between the fixed base and the chained indices show the least discrepancies when comparing the same indices at different levels of aggregation. For three of the four sets of goods the Fisher and Törnqvist indices give similar results for the same indices at different levels of aggregation. Only for T-shirts do these indices show big discrepancies. So what to conclude from these results? Certainly that nontransitive price indices can give widely different results and therefore are not very reliable measures for price developments.
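The mechanism behind such discrepancies can be reproduced with a toy example. The sketch below is not based on the scanner data discussed above: the prices and quantities are hypothetical, chosen so that one good is temporarily on sale, and the function `laspeyres` implements the standard Laspeyres formula (cost of the base-month basket at current prices divided by its cost at base prices).

```python
def laspeyres(prices, quantities, base, current):
    """Laspeyres index: base-month basket valued at current prices over base prices."""
    numerator = sum(p * q for p, q in zip(prices[current], quantities[base]))
    denominator = sum(p * q for p, q in zip(prices[base], quantities[base]))
    return numerator / denominator

# Hypothetical data for two goods: good A is on sale in month 2
# and returns to its old price (and old quantities) in month 3.
prices = {1: (1.0, 1.0), 2: (0.5, 1.0), 3: (1.0, 1.0)}
quantities = {1: (10, 10), 2: (30, 5), 3: (10, 10)}

direct = laspeyres(prices, quantities, 1, 3)
chained = laspeyres(prices, quantities, 1, 2) * laspeyres(prices, quantities, 2, 3)
print(direct, chained)  # 1.0 1.3125
```

Although prices and quantities in month 3 are identical to those in month 1, the chained index ends well above 1, whereas the direct index equals 1. This is the kind of drift that a transitive index rules out.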
2.4 Transitive closure
Transitive closure is the process of calculating the price indices that are missing from an incomplete PIDG of a transitive price index. Applied to a PIM for such a PIDG, it leads to a transitively closed PIM, using matrix computations. Let $P$ be the PIM of a transitive price index, which is not necessarily complete. Let $P_{ST}$ be the PIM of a spanning tree of the PIDG corresponding to $P$. The transitive closure $P_{ST}^{*}$ of $P_{ST}$ can be calculated by matrix computations applied to Boolean matrices, that is, matrices with 0's and 1's only.
Example The spanning tree is as in Figure 2.4.1. This corresponds to MoM chaining, where indices have been calculated for pairs of adjacent months, and to a PIDG in the form of a linear tree. The corresponding PIM is as in Table 2.4.2. The price indices $p_{ij}$ can be considered ‘measured’ or 'observed' (although some computations are involved). By assumption $p_{ii} = 1$, for $i = 1, \dots, 7$. These are not indicated in Figure 2.4.1. The price indices associated with the reverse arrows are assumed to be the reciprocals of the ones shown, that is, $p_{ji} = 1/p_{ij}$.
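Figure 2.4.1 Linear spanning tree.

Table 2.4.2 PIM corresponding to the PIDG in Figure 2.4.1.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | - | - | - | - | - |
| 2 | - | 1 | $p_{23}$ | - | - | - | - |
| 3 | - | - | 1 | $p_{34}$ | - | - | - |
| 4 | - | - | - | 1 | $p_{45}$ | - | - |
| 5 | - | - | - | - | 1 | $p_{56}$ | - |
| 6 | - | - | - | - | - | 1 | $p_{67}$ |
| 7 | - | - | - | - | - | - | 1 |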
Using the ‘measured’ indices and assuming transitivity, one can calculate the remaining indices in terms of the ‘measured’ ones. For instance $p_{25} = p_{23}\,p_{34}\,p_{45}$. Using transitive closure we can compute all remaining indices in the upper triangle of Table 2.4.2. The ones in the lower triangle can be calculated from these by taking reciprocal values. The results are presented in Table 2.4.3. It shows symbolic expressions for the derived price indices. Note that in this case the derived price indices are only products of the initial indices. ■
In the next two examples we have some periods that act as a fixed base for a short time. In the first example this period is two months long, in the second example three months. In practice one often encounters a similar situation when a Laspeyres index, for instance, is used at the beginning of a year. When a new year starts, a new such period begins. This ‘shifting of the base’ is for a good reason: the PoG may be (very) dynamic. When the two months are too far apart the states of the PoG may be too different. There may even be no items present in both months.
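Table 2.4.3 The completion of the upper triangle of Table 2.4.2.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{12}p_{23}$ | $p_{12}p_{23}p_{34}$ | $p_{12}p_{23}p_{34}p_{45}$ | $p_{12}p_{23}p_{34}p_{45}p_{56}$ | $p_{12}p_{23}p_{34}p_{45}p_{56}p_{67}$ |
| 2 | - | 1 | $p_{23}$ | $p_{23}p_{34}$ | $p_{23}p_{34}p_{45}$ | $p_{23}p_{34}p_{45}p_{56}$ | $p_{23}p_{34}p_{45}p_{56}p_{67}$ |
| 3 | - | - | 1 | $p_{34}$ | $p_{34}p_{45}$ | $p_{34}p_{45}p_{56}$ | $p_{34}p_{45}p_{56}p_{67}$ |
| 4 | - | - | - | 1 | $p_{45}$ | $p_{45}p_{56}$ | $p_{45}p_{56}p_{67}$ |
| 5 | - | - | - | - | 1 | $p_{56}$ | $p_{56}p_{67}$ |
| 6 | - | - | - | - | - | 1 | $p_{67}$ |
| 7 | - | - | - | - | - | - | 1 |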
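For the month-on-month chain of this example the closure amounts to taking cumulative products. The following Python sketch is an illustration only: the numeric values of the MoM indices are hypothetical, and the list-of-lists layout of the completed PIM is a choice made here, not one prescribed by the paper.

```python
from itertools import accumulate
from operator import mul

# Hypothetical month-on-month indices p12, p23, p34, p45, p56, p67.
mom = [1.01, 0.99, 1.02, 1.00, 0.98, 1.03]

# cumulative[k] is the index of month k+1 relative to month 1 (so cumulative[0] = 1).
cumulative = list(accumulate(mom, mul, initial=1.0))

# Completed PIM: entry (i, j) is the ratio of the cumulative values of months j and i.
n = len(cumulative)
closure = [[cumulative[j] / cumulative[i] for j in range(n)] for i in range(n)]

p25 = closure[1][4]   # months are 1-based in the text, Python lists are 0-based
```

Here `p25` equals the product of the second, third and fourth MoM indices up to rounding, and the lower triangle of `closure` contains the reciprocals of the upper triangle.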
Example In this example we consider a different situation. Instead of MoM comparisons, we use some months to make comparisons for 1 and 2 months ahead. The PIDG that corresponds to this situation is presented in Figure 2.4.4.
Figure 2.4.4 Spanning tree: max 2 steps ahead.
The PIM that corresponds to Figure 2.4.4 is presented in Table 2.4.5. The indices 𝑝𝑖𝑗 presented there are the ‘measured’ ones. Again we assume that 𝑝𝑖𝑖 = 1, for 𝑖 = 1, … ,7. We also assume that 𝑝 𝑗𝑖 = 1⁄𝑝𝑖𝑗 , for 𝑖, 𝑗 = 1, … ,7.
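Table 2.4.5 The PIM corresponding to the digraph in Figure 2.4.4.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{13}$ | - | - | - | - |
| 2 | - | 1 | - | - | - | - | - |
| 3 | - | - | 1 | $p_{34}$ | $p_{35}$ | - | - |
| 4 | - | - | - | 1 | - | - | - |
| 5 | - | - | - | - | 1 | $p_{56}$ | $p_{57}$ |
| 6 | - | - | - | - | - | 1 | - |
| 7 | - | - | - | - | - | - | 1 |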
We can use transitive closure again to compute the indices in the upper triangle of Table 2.4.6. The result is Table 2.4.7. For instance, we find the expression for 𝑝25 .
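Table 2.4.6 The PIM of Table 2.4.5 augmented with reciprocal price indices.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{13}$ | - | - | - | - |
| 2 | $1/p_{12}$ | 1 | - | - | - | - | - |
| 3 | $1/p_{13}$ | - | 1 | $p_{34}$ | $p_{35}$ | - | - |
| 4 | - | - | $1/p_{34}$ | 1 | - | - | - |
| 5 | - | - | $1/p_{35}$ | - | 1 | $p_{56}$ | $p_{57}$ |
| 6 | - | - | - | - | $1/p_{56}$ | 1 | - |
| 7 | - | - | - | - | $1/p_{57}$ | - | 1 |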
The (unique) path from 2 to 5 in the spanning tree in Figure 2.4.4 is as follows: 2 → 1 → 3 → 5, which yields the arcs (2,1), (1,3), (3,5), corresponding to the price indices $p_{21} = 1/p_{12}$, $p_{13}$, $p_{35}$. As a result we have $p_{25} = p_{13}\,p_{35}/p_{12}$. Other indices are obtained in a similar way. Completion of the entire Table 2.4.7 is obtained by taking reciprocal values using $p_{ji} = 1/p_{ij}$.

Table 2.4.7 The completion of the upper triangle of Table 2.4.6.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{13}$ | $p_{13}p_{34}$ | $p_{13}p_{35}$ | $p_{13}p_{35}p_{56}$ | $p_{13}p_{35}p_{57}$ |
| 2 | - | 1 | $p_{13}/p_{12}$ | $p_{13}p_{34}/p_{12}$ | $p_{13}p_{35}/p_{12}$ | $p_{13}p_{35}p_{56}/p_{12}$ | $p_{13}p_{35}p_{57}/p_{12}$ |
| 3 | - | - | 1 | $p_{34}$ | $p_{35}$ | $p_{35}p_{56}$ | $p_{35}p_{57}$ |
| 4 | - | - | - | 1 | $p_{35}/p_{34}$ | $p_{35}p_{56}/p_{34}$ | $p_{35}p_{57}/p_{34}$ |
| 5 | - | - | - | - | 1 | $p_{56}$ | $p_{57}$ |
| 6 | - | - | - | - | - | 1 | $p_{57}/p_{56}$ |
| 7 | - | - | - | - | - | - | 1 |
Note that the computation of the indices can be done a bit more efficiently by using results of earlier computations. But for the typical size of problems encountered in practical situations this advantage is only minimal. ∎
Example In this example the idea of the previous example is carried one step further. Instead of using a month as a base month for two pairs of months, it is used as a base for three pairs of months. The corresponding PIDG is shown in Figure 2.4.8, and the corresponding PIM in Table 2.4.9.
Figure 2.4.8 Spanning tree: max 3 steps ahead.
The PIM that corresponds to Figure 2.4.8 can be found in Table 2.4.9.
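Table 2.4.9 PIM corresponding to the digraph in Figure 2.4.8.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{13}$ | $p_{14}$ | - | - | - |
| 2 | - | 1 | - | - | - | - | - |
| 3 | - | - | 1 | - | - | - | - |
| 4 | - | - | - | 1 | $p_{45}$ | $p_{46}$ | $p_{47}$ |
| 5 | - | - | - | - | 1 | - | - |
| 6 | - | - | - | - | - | 1 | - |
| 7 | - | - | - | - | - | - | 1 |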
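Table 2.4.10 PIM of Table 2.4.9 augmented with reciprocal price indices.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{13}$ | $p_{14}$ | - | - | - |
| 2 | $1/p_{12}$ | 1 | - | - | - | - | - |
| 3 | $1/p_{13}$ | - | 1 | - | - | - | - |
| 4 | $1/p_{14}$ | - | - | 1 | $p_{45}$ | $p_{46}$ | $p_{47}$ |
| 5 | - | - | - | $1/p_{45}$ | 1 | - | - |
| 6 | - | - | - | $1/p_{46}$ | - | 1 | - |
| 7 | - | - | - | $1/p_{47}$ | - | - | 1 |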
The completion of the upper triangle of Table 2.4.10 is carried out in a similar way as in the previous two examples, namely by applying transitive closure. But, as in the previous example, one needs results from the lower triangle. Table 2.4.11 shows the transitive closure of the digraph represented by Table 2.4.10.
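Table 2.4.11 The completion of the upper triangle of Table 2.4.10.

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | $p_{12}$ | $p_{13}$ | $p_{14}$ | $p_{14}p_{45}$ | $p_{14}p_{46}$ | $p_{14}p_{47}$ |
| 2 | - | 1 | $p_{13}/p_{12}$ | $p_{14}/p_{12}$ | $p_{14}p_{45}/p_{12}$ | $p_{14}p_{46}/p_{12}$ | $p_{14}p_{47}/p_{12}$ |
| 3 | - | - | 1 | $p_{14}/p_{13}$ | $p_{14}p_{45}/p_{13}$ | $p_{14}p_{46}/p_{13}$ | $p_{14}p_{47}/p_{13}$ |
| 4 | - | - | - | 1 | $p_{45}$ | $p_{46}$ | $p_{47}$ |
| 5 | - | - | - | - | 1 | $p_{46}/p_{45}$ | $p_{47}/p_{45}$ |
| 6 | - | - | - | - | - | 1 | $p_{47}/p_{46}$ |
| 7 | - | - | - | - | - | - | 1 |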
So this example shows again that the information stored in a PIDG that is a spanning tree is sufficient to produce the indices for all pairs of months in the period. It is also the minimum information that is needed. ∎
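The completion step used in these examples can be sketched for an arbitrary spanning tree. The Python code below is a minimal illustration and not the procedure of the paper, which works with Boolean matrices: it uses a Warshall-style triple loop instead, the function name `transitive_closure` is hypothetical, and so are the numeric values attached to the arcs named in the text for Figure 2.4.4.

```python
def transitive_closure(measured, n):
    """Complete a PIM from the indices on a spanning tree (months numbered 1..n)."""
    pim = [[None] * (n + 1) for _ in range(n + 1)]    # row and column 0 are unused
    for i in range(1, n + 1):
        pim[i][i] = 1.0
    for (i, j), index in measured.items():
        pim[i][j] = index
        pim[j][i] = 1.0 / index                       # reciprocal for the reverse arc
    # Warshall-style pass: fill p_ij = p_ik * p_kj whenever both factors are known.
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if pim[i][j] is None and pim[i][k] is not None and pim[k][j] is not None:
                    pim[i][j] = pim[i][k] * pim[k][j]
    return pim

# Arcs (1,2), (1,3), (3,4), (3,5), (5,6), (5,7) with hypothetical index values.
measured = {(1, 2): 1.01, (1, 3): 1.03, (3, 4): 0.99,
            (3, 5): 1.02, (5, 6): 1.00, (5, 7): 0.97}
pim = transitive_closure(measured, 7)

p25 = pim[2][5]   # path 2 -> 1 -> 3 -> 5, i.e. p13 * p35 / p12
```

Six measured indices suffice to fill all 49 entries of the 7 by 7 matrix, in line with the remark that a spanning tree carries the minimum information needed.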
2.5 Transitive reduction
Transitive reduction is, in a sense, the opposite of transitive closure: instead of expanding a PIDG by deriving price indices for new pairs of states (as in transitive closure), a PIDG is reduced to a smaller variant by removing those price indices that can be recovered by applying transitive closure to a sub-PIDG. This reduction process stops at a certain point: from the PIDG that then remains, transitive closure is still able to generate the complete PIDG. Transitive reduction for a PIDG corresponding to a transitive price index can be obtained via spanning trees. In practice, two kinds of spanning trees are used a lot: one corresponding to a price index with a fixed base, and the other based on MoM chaining. However, often the price indices used are not transitive (say a Laspeyres price index), so that another choice of spanning tree for the same price index would yield different results. But this is not the case when the index used is transitive. The advantage of a transitive price index is that it can be represented very compactly. A PIDG for a transitive price index can be reduced to the bare minimum. If needed, the price index values are obtained by expanding the spanning tree used to a full PIDG. If there are $n$ states, the full PIDG has of the order $n^2$ arcs, whereas a spanning tree has of the order $n$ arcs. In Section 2.4 several examples are given to illustrate the difference in size.
2.6 Aggregation and transitivity
Typically transitivity of price indices is discussed in a temporal, spatio-temporal or spatial setting with a fixed level of aggregation of the goods. However, aggregating price indices over several levels also involves a concept of transitivity. The question is whether the result of aggregating indices depends on the intermediate aggregates or not. If not, the index is said to be consistent in aggregation.

Example To illustrate the idea, suppose there are three levels of aggregation for a class of goods (cf. Figure 2.6.1). Level 1 is the most detailed level, level 2 an intermediate level and level 3 the top level (least detailed). Consistency in aggregation now requires that aggregating the price indices at level 1 directly to level 3 yields the same result as aggregating the indices first to level 2 and from this level to level 3.
2.6.1 Three levels of aggregation.
It should be remarked that Figure 2.6.1 (and similar ones for other examples of 'levels of aggregation') indicates how partitions of a set (in this case of goods) are ordered. An arc from 𝑎 to 𝑏 indicates that the partition associated with 𝑎 is a refinement of that associated with 𝑏. ■

Of course, in practice one can have more complicated situations, with more levels and with local aggregations (some strata aggregated, others not), but this simple example should give the idea. For an index that is consistent in aggregation, no matter how one aggregates, using various intermediate levels, the final index for the entire group of goods is the same. In other words, irrespective of the path chosen in the aggregation diagram (indicating a partial order of partitions of a group of goods), the same index will result for the top level.⁸
8 The conclusion from this section is that the transitivity concept can also be applied to aggregation. However, this should not tempt the reader to the conclusion that the methods discussed in this paper
2.7 Combinations of transitive price indices
The geometric mean of two transitive price indices is again a transitive price index. Let $\pi_1^{ij}$ and $\pi_2^{ij}$ be two transitive price indices, where $i, j$ are months. Let $\pi^{ij} = \sqrt{\pi_1^{ij}\pi_2^{ij}}$. Then for months $i, j, k$:

$$\pi^{ij} = \sqrt{\pi_1^{ij}\pi_2^{ij}} = \sqrt{\pi_1^{ik}\pi_1^{kj}\pi_2^{ik}\pi_2^{kj}} = \sqrt{\pi_1^{ik}\pi_2^{ik}\pi_1^{kj}\pi_2^{kj}} = \sqrt{\pi_1^{ik}\pi_2^{ik}}\,\sqrt{\pi_1^{kj}\pi_2^{kj}} = \pi^{ik}\pi^{kj}, \tag{2.8}$$

assuming that $\pi_1, \pi_2$ are known for the pairs $\{i, j\}, \{i, k\}, \{j, k\}$. Also:

$$\pi^{ji} = \sqrt{\pi_1^{ji}\pi_2^{ji}} = \sqrt{\frac{1}{\pi_1^{ij}}\,\frac{1}{\pi_2^{ij}}} = \frac{1}{\sqrt{\pi_1^{ij}\pi_2^{ij}}} = \frac{1}{\pi^{ij}}. \tag{2.9}$$

Also for each month $i$ in the reference period

$$\pi^{ii} = 1. \tag{2.10}$$

So from (2.8), (2.9) and (2.10) we conclude that $\pi^{ij}$ is a transitive price index. For more on this see Appendix B. It should be noted that for other averaging operations, such as arithmetic or harmonic means, the resulting index is in general not transitive. It does satisfy (2.10), but in general it does not satisfy (2.8) (that is, in general, $\pi^{ij} \neq \pi^{ik}\pi^{kj}$). Nor does it satisfy (2.9) in general, because the arithmetic (or harmonic) mean of two reciprocals equals the reciprocal of the arithmetic (or harmonic) mean only if the two values coincide.
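A quick numerical check of these statements can be made as follows. It is a minimal sketch with hypothetical numbers: the two input indices are generated from made-up monthly price levels, which makes each of them transitive by construction. For these inputs the geometric mean satisfies the product rule (2.8), whereas the arithmetic mean violates it, and the arithmetic means for the pairs (𝑖, 𝑗) and (𝑗, 𝑖) do not multiply to 1.

```python
import math

# Two hypothetical transitive indices, each generated by monthly price levels
# (index of month j relative to month i = level[j] / level[i]).
levels_1 = [1.00, 1.10, 1.32]
levels_2 = [1.00, 1.25, 1.20]

def index(levels, i, j):
    return levels[j] / levels[i]

def geometric(i, j):
    return math.sqrt(index(levels_1, i, j) * index(levels_2, i, j))

def arithmetic(i, j):
    return 0.5 * (index(levels_1, i, j) + index(levels_2, i, j))

i, j, k = 0, 2, 1
# Deviation from the product rule: direct index minus chained index via k.
geo_gap = geometric(i, j) - geometric(i, k) * geometric(k, j)
ari_gap = arithmetic(i, j) - arithmetic(i, k) * arithmetic(k, j)
# Product of the index and its time-reversed counterpart (1 if reciprocal).
ari_reversal = arithmetic(i, j) * arithmetic(j, i)

print(f"geometric mean, product-rule gap:  {geo_gap:.2e}")
print(f"arithmetic mean, product-rule gap: {ari_gap:.2e}")
print(f"arithmetic mean, reversal product: {ari_reversal:.6f}")
```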
3. Transitivization: MST and GEKS methods
In Section 2 transitive price indices are discussed. But not all price indices fall into this category; in fact many price indices are intransitive, including well-known ones, even for SPoGs. They are not necessarily useless for that reason: they can be made transitive. In the present section as well as in the next one we delve more deeply into the problem of making an intransitive index transitive, or transitivization.
that can be applied in case this aggregation of indices is not transitive. This may or may not be the case. I simply did not investigate this matter.
The present section discusses two methods that have been known for some time to produce transitive price indices from intransitive ones, namely the GEKS method and the MST method. In Section 4 another such method of more recent origin is discussed, namely the cycle method. The GEKS⁹ method and the MST¹⁰ method are designed for this purpose (there are other such methods, but we will not discuss them here). Both methods are described in Balk (2008), which also has pointers to the literature, including the original papers.
3.1 MST method
The MST method was originally proposed in the context of international comparisons by Hill (see Balk, 2008, Section 7.6). To illustrate the idea, suppose that the average price level of every country is compared with that of each of the remaining countries. So, if there are $n$ such countries, there are $\binom{n}{2} = n(n-1)/2$ such comparisons to make. This may be quite a big number. So it is an attractive idea to try to reduce this number by choosing only a (small) subset of possible comparisons. If the countries differ in the structure and level of development of their respective economies, it is natural to compare directly only countries that have similar economies. Other countries can then be compared only indirectly, through a string of direct comparisons. In order to do this for each pair of countries, one needs to have the right direct comparisons. A problem that might arise is with a pair of countries for which there are several paths (consisting of direct comparisons) that connect them. A way out of this is to choose only one of these connecting paths, and consider that the preferred way to compare these countries.

This construction is possible if the PIDG corresponding to the direct comparisons contains a spanning tree, that is, if it consists of a spanning tree plus, possibly, additional arcs. If it is a spanning tree, for each pair of countries there is exactly one path connecting them. This is an attractive feature, as it avoids the inconsistencies that can arise when different paths can be chosen to 'connect' two countries. One can even try to find an MST, in the sense that for direct comparison pairs of countries are chosen that are as similar as possible, assuming some kind of measure to quantify this.¹¹ In this way the choice of a suitable spanning tree in fact leads to an MST problem. Hence the name of the method.
9 Named after Gini, Eltetö, Köves and Szulc.
10 Minimum Spanning Tree.
11 Hill considers the Paasche-Laspeyres spread (see Balk, 2008, Section 7.6) to measure comparability of countries. The smaller this weight the more two countries are considered comparable. But weights could be chosen in alternative ways as well. Given a set of weights for the edges (or rather: arcs) one can look for (one of) those trees that have a minimum total weight, that is, a minimum spanning tree (for the given set of weights).
In order to apply this method, one needs to find an MST in a given PIDG, which is supposed to contain enough arcs to be connected (in particular, every vertex is incident to at least one arc). The user has to provide a weight matrix, containing the weights on the arcs of the PIDG.

The MST method can readily be applied in a temporal setting. Now the time periods considered (say months) play the same role as countries in the original setting. In the new setting, we are considering the price development of a number of similar items (say, jeans) in a certain time window. It is most likely that adjacent months are most similar. So the MST would then correspond to a MoM method. As is well-known, for certain price indices this method may lead to (serious) chain drift. This is not quite satisfactory: the available information is used suboptimally, and it would be better to use other comparisons as well. This is what the cycle method does.
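The search for an MST can be sketched with Kruskal's algorithm, a standard technique that is swapped in here for illustration; the text does not prescribe a particular algorithm. The function name `minimum_spanning_tree` is hypothetical, and so is the weight choice |𝑖 − 𝑗| for a pair of months, which merely encodes the assumption that adjacent months are most similar.

```python
def minimum_spanning_tree(n, weights):
    """Kruskal's algorithm on vertices 1..n.

    weights maps a pair (i, j) to a dissimilarity weight; a smaller weight
    means that the two periods (or countries) are more comparable.
    """
    parent = list(range(n + 1))

    def find(v):
        # Root of the component containing v, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Consider the candidate comparisons in order of increasing weight.
    for (i, j), _ in sorted(weights.items(), key=lambda item: item[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the pair joins two components
            parent[root_i] = root_j
            tree.append((i, j))
    if len(tree) != n - 1:
        raise ValueError("the comparison graph is not connected")
    return sorted(tree)

# Temporal setting with six months; hypothetical weights |i - j|.
n = 6
weights = {(i, j): j - i for i in range(1, n + 1) for j in range(i + 1, n + 1)}
mst = minimum_spanning_tree(n, weights)
print(mst)
```

With these weights the tree consists of the month-on-month pairs only, which is the MoM chain mentioned in the text.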
3.2 GEKS method
The GEKS method has been known for some time (cf. Balk, 2008, Section 7.3.1) and can be used to produce transitive price indices from intransitive ones. The method is quite easy to apply, even in a spreadsheet. Willenborg (2017d) contains a discussion of the original GEKS method, as well as various variants. In particular this paper shows how the method can be generalized to the cycle method. For price index specialists the approach to the cycle method starting from the GEKS method should be easy to follow. It is probably an easier route to the cycle method than the original paper Willenborg (2010), which in turn is an application of the ideas in Willenborg (1993)¹² to price index numbers. The GEKS method is best explained by considering a small example. From this it is easy to generalize to more general situations. This example is taken from Willenborg (2017d).

Example Consider the 6 × 6 PIM implicit in Table 3.2.1. It is not transitive, as e.g. 𝑃24 = 1.35 ≠ 𝑃23 𝑃34 = 1.4 × 1.05 = 1.47. The numbers in the blue fields are the geometric averages of the corresponding columns. The numbers in the green fields are the geometric averages of the corresponding rows. Note that a column (geometric) average and its corresponding row (geometric) average are reciprocal values. This holds in general. It will be used here to calculate the adjusted price index.
3.2.1 The input PIM matrix.
12 When this paper was written the author was not familiar with price index theory. The methods discussed in Willenborg (1993) were inspired by land surveying. But the methods in that document are of wider applicability, beyond land surveying and index number theory.
Table 3.2.2 shows the resulting table, with the adjusted PIM matrix included. It should be noted that 'taking a geometric average' in Excel is a standard function.
3.2.2 The resulting GEKS PIM matrix.
The formulas used to calculate the elements of the adjusted PIM from the geometric column averages in Table 3.2.1 are shown in Table 3.2.3. The row averages are in the green column, and the column averages are in the blue row. ■

The example illustrates the GEKS method. Extension to periods of other lengths is straightforward. Important for the input data is that the following holds:
– No index is missing, that is, for all 𝑖, 𝑗 ∈ {1, … , 𝑛} there is a 𝑃𝑖𝑗.
– 𝑃𝑖𝑖 = 1 for all 𝑖 ∈ {1, … , 𝑛}.
– 𝑃𝑖𝑗 𝑃𝑗𝑖 = 1, for all 𝑖, 𝑗 ∈ {1, … , 𝑛}.

Here we have assumed that the period of interest is {1, … , 𝑛}. The method assumes that the input price index is intransitive. The resulting GEKS index is transitive. Of course, it depends on the choice of the period {1, … , 𝑛}. The weights that have been used to average the input price index values have all been taken equal to 1.
3.2.3 Formulas to compute the elements of the adjusted PIM from the geometric averages.
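The spreadsheet recipe of the example (geometric column averages, then ratios of these averages) can be sketched in Python as follows. The function name `geks` is hypothetical. The input uses only the three index values quoted in the text (𝑃23 = 1.4, 𝑃34 = 1.05, 𝑃24 = 1.35), treated here as a complete 3 × 3 PIM for months 2, 3 and 4; because the full 6 × 6 table is not reproduced, the outcome is not meant to match Table 3.2.2.

```python
import math

def geks(pim):
    """GEKS adjustment of a complete n x n price index matrix.

    pim[i][j] is the index of month j relative to month i. The input is
    assumed to satisfy pim[i][i] == 1 and pim[i][j] * pim[j][i] == 1.
    """
    n = len(pim)
    # Geometric average of every column.
    col_avg = [math.exp(sum(math.log(pim[k][j]) for k in range(n)) / n)
               for j in range(n)]
    # A row average is the reciprocal of the matching column average, so the
    # adjusted index (row average i times column average j) is a ratio.
    return [[col_avg[j] / col_avg[i] for j in range(n)] for i in range(n)]

# Positions 0, 1, 2 stand for months 2, 3, 4 of the example in the text.
upper = {(0, 1): 1.40, (1, 2): 1.05, (0, 2): 1.35}
n = 3
pim = [[1.0] * n for _ in range(n)]
for (i, j), p in upper.items():
    pim[i][j] = p
    pim[j][i] = 1.0 / p

adjusted = geks(pim)
for row in adjusted:
    print([round(value, 4) for value in row])
```

Applying `geks` to its own output leaves the matrix unchanged up to rounding, since a transitive PIM is already a matrix of ratios of column averages.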
4. Transitivization: cycle method
4.1 Introductory remarks
As we saw, the MST method selects a single spanning tree from the input data, and uses this to calculate the other price indices by transitive closure. In the temporal case this method is likely to lead to a MoM price index, which may be quite different from a similar index with a fixed base, indicating that there is a lot of drift. By selecting the spanning tree so early on, valuable information about the index is not used. The cycle method is different. It uses all the known information. It finds a transitive price index that is close to the original index, where the 'closeness' can be controlled by a weight matrix.

Compared to the GEKS method, the cycle method has several advantages. First of all, it allows one to use different weights for the various arcs in the PIDG, whereas GEKS assumes equal weights. It does not seem desirable, however, to give the same weight to an index that compares two adjacent months and to one that compares two distant months. These months may even be so distant that there are no items in common. In that case the GEKS method does not work,¹³ as it assumes that the price indices are available for all pairs of months in the time window one is considering. The cycle method, however, does not require this. It also works when some of the indices are missing, either because they could not be computed due to lack of overlapping items, or because one has decided that such a comparison is of little value and should not be made. Finally, the cycle method allows one to quantify the discrepancy between an intransitive index and its adjustment. So it makes sense to talk about the 'degree of intransitivity' of a price index.

The idea of the cycle method is to take as input a PIDG that is not transitive, and possibly incomplete. The cycle method computes from this PIDG another one that is as close as possible to the original PIDG and transitive. One can complete this corrected PIDG by using transitive closure. Or, more succinctly, one can retain only a spanning tree of the corrected PIDG. Transitive closure can then be applied to this compact PIDG and produces the same results. This is very much as in the case of the MST method. But whereas the MST method discards information, the cycle method uses all information and produces a new price index that is transitive.
13 The original GEKS method, that is. However, a variant of the original method may work. But we only consider the original GEKS method here.
4.2 From GEKS to cycle method
In Willenborg (2017d, Section 5) it is shown how the GEKS method can be generalized to an optimization procedure which is in fact the cycle method. For most price index specialists this approach to the cycle method should be easier to comprehend than the original one inspired by ideas from land surveying used in the original papers on (what was later coined) the cycle method (see Willenborg, 1993, 2010).
4.3 The cycle method explained
The discussion in the present section is based on Willenborg (2010), which in turn is based on Willenborg (1993). Suppose that a PIDG $G = (V, E)$ is given, with $n$ points in $V$ and $m$ arcs in $E$. In particular we assume that the PIDG has been pruned of linear subgraphs. These subgraphs are of no interest here, because they do not contain cycles. Let $x$ denote the vector of observations, which in our case consists of the natural logarithms of the computed elementary price index values associated with the arcs of $G$. We shall call these values elementary log price indices. Let $W$ be a non-singular, diagonal, nonnegative $m \times m$ weight matrix, with one diagonal entry for each of the arcs of $G$. $W$ controls which values can be perturbed more, and which should be perturbed less: the higher the weight, the more the corresponding original price index may be adjusted. Let $C$ be a cycle matrix associated with $G$, of order $(m - n + 1) \times m$. $C$ is a $(-1, 0, 1)$-matrix. If $C = (c_{ij})$, where $i$ indicates the elementary cycle and $j$ the arc $j = (v, w)$, then $c_{ij} = 0$ if arc $j$ is not part of cycle $i$, $c_{ij} = 1$ if $j = (v, w)$ is part of cycle $i$, and $c_{ij} = -1$ if the reverse of arc $j$, i.e. $(w, v)$, is part of cycle $i$.

Let $\hat{y}$ denote an adjustment of $x$ that satisfies the cycle condition $C\hat{y} = 0$.¹⁴ We assume that $\hat{y}$ is obtained by minimising the expression $(x - y)'W^{-1}(x - y)$ under the condition $Cy = 0$. This can be achieved using the Lagrangian multiplier method. It yields an 'estimator' that is known as an RGLS-estimator.¹⁵ In our case we have:

$$\hat{y} = x - WC'(CWC')^{-1}Cx = \left(I_m - WC'(CWC')^{-1}C\right)x = Px \tag{4.1}$$

where $I_m$ is the $m \times m$ identity matrix and $P = I_m - WC'(CWC')^{-1}C$, which is an $m \times m$ matrix. The matrix $CWC'$ is non-singular because $C$ is of full row rank $m - n + 1$ and $W$ is non-singular. The matrix $P$ is a projection matrix, for which $P^2 = P$ holds (idempotency). Its rank equals $\mathrm{Tr}\,P = m - (m - n + 1) = n - 1$, where $\mathrm{Tr}\,P = \sum_i P_{ii}$ and $\mathrm{Tr}$ denotes the trace operator. $P$ satisfies the equality $CP = 0$, which implies $C\hat{y} = 0$, as required.
14 It should be noted that the roles of 𝑥 and 𝑦 have been swapped, compared to Willenborg (2010).
15 RGLS = Restricted Generalized Least Squares. In our case we are not dealing with an estimator at all, because there are no observations with errors, yielding random variables. Our setting is deterministic, not stochastic.
If we write $\Sigma = C'(CWC')^{-1}C$, we have $P = I_m - W\Sigma$, with $I_m$, $W$, $\Sigma$ symmetric matrices, so that $P' = I_m - \Sigma W$. We have:

$$(\hat{y} - x)'W^{-1}(\hat{y} - x) = (Px - x)'W^{-1}(Px - x) = x'\Sigma W\Sigma x = x'C'(CWC')^{-1}Cx \tag{4.2}$$

which, in statistical terms, can be interpreted as a variance.¹⁶ If $x$ is a vector corresponding to a transitive price index, we have $Cx = 0$. This in turn implies $Px = x$, irrespective of the choice of the weight matrix $W$. So the cycle method returns a transitive input index unchanged.
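Formulas (4.1) and (4.2) can be sketched in plain Python. All names (`solve`, `cycle_method`) and most numbers are hypothetical: the PIDG is the complete graph on four months, the cycle matrix is written out by hand from the spanning tree 1→2→3→4, three index values are borrowed from the GEKS example (1.4, 1.05, 1.35) and the other three are made up, and the weights grow with the distance between the months as an assumption. Instead of forming the inverse explicitly, the sketch solves the small linear system with matrix 𝐶𝑊𝐶′.

```python
import math

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def cycle_method(x, cycles, w):
    """Return the adjusted log indices of (4.1) and the distance of (4.2).

    x: log price indices per arc; cycles: rows of the cycle matrix C;
    w: diagonal of the weight matrix W.
    """
    r, m = len(cycles), len(x)
    cwc = [[sum(cycles[a][j] * w[j] * cycles[b][j] for j in range(m))
            for b in range(r)] for a in range(r)]
    cx = [sum(cycles[a][j] * x[j] for j in range(m)) for a in range(r)]
    lam = solve(cwc, cx)                      # (C W C')^{-1} C x
    y_hat = [x[j] - w[j] * sum(cycles[a][j] * lam[a] for a in range(r))
             for j in range(m)]
    distance = sum(c * l for c, l in zip(cx, lam))   # x'C'(CWC')^{-1}Cx
    return y_hat, distance

arcs = [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)]
values = [1.10, 1.40, 1.05, 1.50, 1.35, 1.60]   # partly hypothetical
x = [math.log(p) for p in values]
cycles = [[1, 1, 0, -1, 0, 0],     # 1 -> 2 -> 3 -> 1
          [0, 1, 1, 0, -1, 0],     # 2 -> 3 -> 4 -> 2
          [1, 1, 1, 0, 0, -1]]     # 1 -> 2 -> 3 -> 4 -> 1
w = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]  # hypothetical: weight = |i - j|

y_hat, distance = cycle_method(x, cycles, w)
adjusted = {arc: math.exp(y) for arc, y in zip(arcs, y_hat)}
```

Feeding `y_hat` back into `cycle_method` returns it unchanged with distance 0, in line with the remark that a transitive index is left as it is.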
4.4 Form of the adjusted price index
The results in the previous section are in terms of log price indices. Equation (4.1) shows that each adjusted index $\hat{y}_i$ is a linear function of the original log price indices $x_j$. So if we write $\hat{y}_i = \sum_{j=1}^{m} \alpha_{ij} x_j$, where $x_j = \ln P^j$ and $\hat{y}_i = \ln \hat{P}^i$, then

$$\hat{P}^i = \prod_{j=1}^{m} (P^j)^{\alpha_{ij}}. \tag{4.3}$$

So (4.3) presents the general form of the adjusted indices $\hat{P}^i$, in terms of the original indices $P^j$. It is understood that if $P^j$ is the price index associated with arc $j$, $(P^j)^{-1}$ is the price index associated with the reversed arc of $j$. The adjusted price indices have the property that the product of the adjusted indices along the arcs of any cycle in the PIDG equals 1. From this it follows that the price index between a base point in the PIDG and a reference point is independent of the path in the PIDG chosen to connect these points. In other words, the adjusted price index is transitive, as promised.
4.5 Invariance properties
For any non-singular (𝑚 − 𝑛 + 1) × (𝑚 − 𝑛 + 1) matrix 𝒜 the projection matrix 𝑃 in (4.1) is invariant under substituting 𝒜𝐶 for 𝐶. This implies (by making specific choices for this matrix 𝒜) that the vector of adjusted values 𝑦̂ is independent of the numbering of the arcs in the PIDG, the numbering of the elementary cycles in the PIDG (a basis of the cycle space), arc reversal, as well as the choice of a basis for the cycle space of the PIDG. This latter assertion in fact implies that another choice of spanning tree yields another cycle basis, but nevertheless the same adjustment emerges. So any choice of a cycle basis can be used to compute the adjustment. See Willenborg (1993) for more information on these assertions. There, also a topological invariance property is mentioned, namely that one only has to concentrate on that part of the PIDG that contains cycles. Any linear parts ('filaments') can be pruned first, without affecting the result.

16 But that is not the matter here, as there is no stochastics involved (or not necessarily). This is just the deviation between the original data and the adjusted data.
4.6 Cycle matrix and weight matrix
In the cycle method two matrices are used, the cycle matrix and the weight matrix. Both need to be explained a bit more. The computation of the cycle matrix is technically the most demanding part of the method. The specification of a suitable weight matrix requires statistical deliberations. Below we consider these matrices in more depth.

Cycle matrix The computation of the cycle matrix is technically the most demanding part of the application of the cycle method. But with specialized software this problem can be solved for the user. Such software even allows users who are not interested in the intricacies of how to compute a cycle matrix to use the cycle method. Dedicated software for the cycle method should remove a serious obstacle in using the method.

Weight matrix The choice of a suitable weight matrix is a problem that requires statistical considerations. This problem is not unique to the cycle method; it arises in any problem based on weighted linear regression. Each choice yields a solution, and it is of course of interest to find out how sensitive the solution is to the choice of the weight matrix. The weight matrix for the cycle method is, in principle, defined on pairs of arcs in the corresponding PIDG. But that is somewhat complicated to handle. So the simplifying assumption is that only the weights on the main diagonal of the weight matrix are specified. These weights correspond to single arcs. This also means a serious reduction in the number of weights to be specified. A further way to simplify the structure of the weight matrix is by assuming that the weight $w_{(i,j)}$ for arc $(i, j)$ is in fact only determined by the difference $|i - j|$. So instead of a possible maximum of $\binom{n}{2}$ weights we are only dealing with $n - 1$ weights.

So far we assumed that the user had to specify a suitable weight matrix $W$. Another option that we consider here is to compute such a weight matrix by solving the following optimization problem, given $x$ and $C$ (which are situation specific):

$$\min_W \; x'C'(CWC')^{-1}Cx \quad \text{such that} \quad \mathrm{Tr}(W) = 1,\; W \geq 0. \tag{4.4}$$

The object function in (4.4) is the variance-like expression in (4.2). As $C$ and $x$ are given, $W$, as computed in (4.4), depends on these two quantities. How to actually solve problems of type (4.4), I have not investigated yet.
4.7 Alternative approach to the cycle method
Instead of the derivation of the cycle method in Section 4.3 we can use the representation of transitive price indices given in Section 2.2. Let a PIM $P = (p^{ij})$ be given for a period $T = \{1, 2, \ldots, n\}$. Suppose that $p^{ii} = 1$ for all $i \in T$ and $p^{ij}p^{ji} = 1$ for all $i, j \in T$. Suppose furthermore that $P$ is not transitive. The aim is now to approximate $P$ by a transitive PIM. This is generated by $n$ monthly prices $p_1, \ldots, p_n$, unknown, that we have to estimate, such that the matrix

$$\left(\frac{1}{p_1}, \ldots, \frac{1}{p_n}\right)'(p_1, \ldots, p_n) \tag{4.5}$$

approximates $P$ as well as possible, given a metric to measure the distance between both matrices. In fact, we want to compare $p^{ij}$ to $p_j / p_i$ via the ratio $p^{ij} / (p_j / p_i) = p^{ij} p_i / p_j$. Instead of considering the price index values we consider their (natural) logarithms. We then obtain terms like $\log p^{ij} + \log p_i - \log p_j$. The next task is to minimize an object function of the form

$$\sum_{i,j=1}^{n} \omega_{ij}\left(\log p^{ij} + \log p_i - \log p_j\right)^2, \tag{4.6}$$

for suitable weights $\omega_{ij}$ for each arc $(i, j)$. The $p^{ij}$'s are given and the $p_i$'s have to be found, with $p_i > 0$. The $p_i$'s are average monthly prices for the group. As in the case of the cycle method (cf. Section 4.6) we can choose the $\omega_{ij}$'s in such a way that they depend on $|i - j|$, in the sense that the larger this number the smaller the $\omega_{ij}$, implying that the method is more tolerant of a discrepancy between $\log p^{ij}$ and $\log p_j - \log p_i$.

It should be noted that the approach described in the present section should yield the same result as the original approach to the cycle method described in Section 4.3. The difference is that the approach presented here uses the result presented in Section 2.2. The form of the transitive index is known and this fact is exploited here. It results in associating (unknown) prices to the points in the PIDG, rather than working with index values associated with the arcs of the PIDG.
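A minimal sketch of how (4.6) could be minimized numerically is given below. It is not the author's implementation: the function name `fit_log_prices` is hypothetical, the normal equations are solved by simple Gauss-Seidel sweeps (an assumption made to keep the sketch free of linear algebra libraries), and the log price of month 1 is fixed at 0 because (4.6) only determines the prices up to a common factor. The index values are partly hypothetical; 1.4, 1.05 and 1.35 are the values quoted in Section 3.2.

```python
import math

def fit_log_prices(n, log_index, omega, sweeps=500):
    """Minimise sum of omega[i, j] * (log_index[i, j] + u[i] - u[j])**2.

    Arcs are pairs (i, j) of months 1..n and u[k] is the log of the price
    of month k. u[1] is fixed at 0; every other month must be incident to
    at least one arc. Each sweep sets every u[k] to the value that
    minimises the sum with all other u's held fixed (Gauss-Seidel).
    """
    u = {k: 0.0 for k in range(1, n + 1)}
    for _ in range(sweeps):
        for k in range(2, n + 1):
            num = den = 0.0
            for (i, j), x in log_index.items():
                w = omega[(i, j)]
                if j == k:        # arc into k: u[k] should be near u[i] + x
                    num += w * (u[i] + x)
                    den += w
                elif i == k:      # arc out of k: u[k] should be near u[j] - x
                    num += w * (u[j] - x)
                    den += w
            u[k] = num / den
    return u

index_values = {(1, 2): 1.10, (2, 3): 1.40, (3, 4): 1.05,
                (1, 3): 1.50, (2, 4): 1.35, (1, 4): 1.60}
log_index = {arc: math.log(p) for arc, p in index_values.items()}
omega = {arc: 1.0 for arc in log_index}     # equal weights

u = fit_log_prices(4, log_index, omega)
fitted = {(i, j): math.exp(u[j] - u[i]) for (i, j) in log_index}
```

With equal weights on a complete PIDG, as here, the fitted indices coincide with the GEKS indices, which gives a simple way to check such a sketch.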
4.8 Applications of the cycle method
Transitivization The direct application of the cycle method is to 'adjust' intransitive price indices. This application is the obvious one.
It should be realized that the input of the cycle method, i.e. the vector $x$, consisting of logarithms of price index values in a PIDG, is very general. How these price indices have been calculated is of no importance to the cycle method. It should be remarked that one could, in principle, use the cycle method in symbolic calculations, using formulas instead of numbers. But the results specified at the component level of vectors can be quite complicated. Such a calculation is probably also of limited value, and only provides some insight for small examples. This is true not only for the cycle method but for the GEKS method as well (cf. Willenborg, 2017d), and in fact for any multilateral price index method. So the cycle method is most profitably applied to numbers and not to symbolic expressions.

Quantifying the degree of intransitivity A second application, closely related to the main application discussed above, is worth mentioning separately. It can be viewed as a quality assessment of an intransitive price index. A price index is not simply transitive or intransitive. There are degrees of intransitivity, and the cycle method can be used to make this idea precise. Although the cycle method always produces a transitive PIDG that is as close as possible to the original price index, this minimum distance may be considered too big. The original and adjusted price indices may differ in certain (essential) properties. The original price index may simply be 'too intransitive'. So transitivity is not necessarily a black or white thing (transitive or not) but it may be a gradual thing. If we look at the expression for the adjusted price index (formula (4.1)) we see that $x$, the original index vector, and $\hat{y}$, the adjusted index vector, differ by a linear term that depends on the cycle matrix $C$, the weight matrix $W$ and $x$. In fact, the variance expression (4.2) can be taken as a measure for the distance between $x$ and $\hat{y}$, and hence as a measure for intransitivity. The bigger this number, the bigger the correction needed for the original index to become transitive, and hence the more intransitive the original index is. So the term

$$x'C'(CWC')^{-1}Cx \tag{4.7}$$

can be viewed as an expression for the (in)transitivity of $x$. Note that this expression depends on both objective characteristics (the input vector $x$ and the cycle matrix $C$) and a subjective one (the weight matrix $W$). The size of the adjustment obtained by the cycle method may perhaps be used to dismiss the original index because of its severe violation of the transitivity property. Instead of using the cycle method to find a transitive replacement, one should then reconsider the original price index and try to replace it by a better one.
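As a small worked illustration (a sketch under stated assumptions, not a computation taken from the paper), consider only the three index values quoted in the GEKS example of Section 3.2, $P^{23} = 1.4$, $P^{34} = 1.05$ and $P^{24} = 1.35$, and assume equal weights $W = I_3$. With the arcs ordered as $(2,3), (3,4), (2,4)$ there is a single elementary cycle, so $C = (1, 1, -1)$, and (4.7) reduces to

$$Cx = \ln 1.4 + \ln 1.05 - \ln 1.35 = \ln\frac{1.47}{1.35} \approx 0.0852, \qquad CWC' = 3, \qquad x'C'(CWC')^{-1}Cx = \frac{(Cx)^2}{3} \approx 0.0024.$$

By (4.1) each of the three log indices is shifted by $Cx/3 \approx 0.0284$, downwards on the arcs $(2,3)$ and $(3,4)$ and upwards on the arc $(2,4)$. This gives adjusted indices of approximately 1.361, 1.021 and 1.389, and the first two multiply to the third up to rounding. A transitive triple would have $Cx = 0$, so that the measure would be 0.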
5. TPD model
In the present section we consider the TPD model from a different angle than usual. The usual method estimates the parameters we are interested in (which can be interpreted as price indices), but other parameters that we are not interested in have to be estimated as well. These latter parameters can be considered nuisance parameters. However, by considering price ratios they can be avoided.

The TPD model assumes a decomposition of the prices $p_{ij}$ per item subgroup $i$ and month $j$, as follows:

$$p_{ij} = \kappa\,\alpha_i\,\pi_j\,\varepsilon_{ij}, \tag{5.1}$$

where $p_{ij} \geq 0$, $\kappa > 0$ is a constant, the $\alpha_i > 0$ are viewed as time averaged prices for good $i$, the $\pi_j > 0$ are viewed as dimensionless price indices for month $j$, assuming that $\pi_1 = 1$, and the $\varepsilon_{ij} > 0$ are error terms. Our interest is in the $\pi_j$'s. By taking logarithms we can replace (5.1) by a linear model:

$$\log p_{ij} = \eta + \vartheta_i + \lambda_j + \delta_{ij}, \tag{5.2}$$

where $\eta = \log \kappa$, $\vartheta_i = \log \alpha_i$, $\lambda_j = \log \pi_j$, and $\delta_{ij} = \log \varepsilon_{ij}$. We can estimate the unknown parameters in this model by minimizing the following weighted quadratic sum

$$\sum_{i,j} w_{ij}\,(\delta_{ij})^2 = \sum_{i,j} w_{ij}\left(\log p_{ij} - \eta - \vartheta_i - \lambda_j\right)^2, \tag{5.3}$$

for some weight matrix $W = (w_{ij})$, and assuming that $\lambda_1 = 0$. In this case we estimate all the parameters in model (5.2). This method is usually applied to estimate the model parameters. However, we can also look at the ratios of the prices for a product $i$ available in periods $j_1$ and $j_2$. We then have:

$$\log p_{ij_2} - \log p_{ij_1} = \lambda_{j_2} - \lambda_{j_1} + \delta_{ij_1j_2}, \tag{5.4}$$
where $\delta_{ij_1j_2}$ denotes the error term. To estimate the $\pi$'s, via the $\lambda$'s, we can minimize the following quadratic sum

∑𝑗1 […]

[…] $\bar{q}_{ij} > 0$ iff $\bar{v}_{ij} > 0$, or

$$\bar{q}_{ij} = 0 \ \text{ iff } \ \bar{v}_{ij} = 0. \tag{7.18}$$
If there is no turnover for a subgroup $i$ in month $j$, 0 items have been sold for this subgroup in month $j$ and vice versa. In other words, it is assumed that there are no items for free; every item has a strictly positive price. From these matrices $V_{mt}$ and $Q_{mt}$ the price matrix

$$P_{mt} = \begin{pmatrix} \bar{p}_{11} & \cdots & \bar{p}_{1t} \\ \vdots & \ddots & \vdots \\ \bar{p}_{m1} & \cdots & \bar{p}_{mt} \end{pmatrix} \tag{7.19}$$

can be computed as $\bar{p}_{ij} = \bar{v}_{ij} / \bar{q}_{ij}$, which is only defined if $\bar{q}_{ij} > 0$ (iff $\bar{v}_{ij} > 0$); otherwise the value is undefined. As in the case $\bar{v}_{ij} = 0$, we simply count such a month as in the case $\bar{v}_{ij} > 0$.¹⁹

The problem is now to aggregate these subgroup level prices for each of the months in period $T$ (with $|T| = t$) to the group (COICOP) level. The GK method could be used to solve this problem. This method defines equations for two sets of quantities: price indices $\pi_j$ (with respect to reference month 1, $\pi_1 = 1$) in a period $T$ and time averaged prices $\alpha_i$ of subgroups $i \in A$. The equations are as follows
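$$\alpha_i = \frac{\sum_{j \in T} v_{ij} / \pi_j}{\sum_{j \in T} q_{ij}} \quad (i \in A), \qquad \pi_j = \frac{\sum_{i \in A} v_{ij}}{\sum_{i \in A} \alpha_i q_{ij}} \quad (j \in T), \qquad \pi_1 = 1. \tag{7.20}$$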
19 The point is: we do not consider such a month as a structural 0.
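A system of the form (7.20) is typically solved by fixed-point iteration, starting from all price indices equal to 1. The following is a minimal sketch of such an iteration; the function name `geary_khamis`, the stopping rule and the test data are hypothetical. The test data are constructed so that every price equals a subgroup constant times a month factor, in which case the month factors themselves solve (7.20), which makes the outcome easy to check.

```python
def geary_khamis(v, q, tol=1e-12, max_iter=10_000):
    """Iterate the two sets of equations in (7.20).

    v[i][j] and q[i][j] are the turnover and the quantity sold of subgroup i
    in month j. Returns (alpha, pi) with pi[0] == 1 for the first month.
    """
    m, t = len(v), len(v[0])
    pi = [1.0] * t                      # start: all price indices equal to 1
    for _ in range(max_iter):
        # Time averaged price per subgroup, with deflated turnovers.
        alpha = [sum(v[i][j] / pi[j] for j in range(t)) / sum(q[i])
                 for i in range(m)]
        # Index per month: turnover divided by quantities valued at alpha.
        new_pi = [sum(v[i][j] for i in range(m)) /
                  sum(alpha[i] * q[i][j] for i in range(m)) for j in range(t)]
        new_pi = [p / new_pi[0] for p in new_pi]   # enforce pi_1 = 1
        change = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    alpha = [sum(v[i][j] / pi[j] for j in range(t)) / sum(q[i])
             for i in range(m)]
    return alpha, pi

# Hypothetical data: price of subgroup i in month j is a[i] * f[j].
a = [2.0, 5.0]
f = [1.0, 1.1, 1.21]
q = [[10.0, 12.0, 9.0], [4.0, 3.0, 6.0]]
v = [[a[i] * f[j] * q[i][j] for j in range(3)] for i in range(2)]

alpha, pi = geary_khamis(v, q)
print([round(p, 6) for p in pi])
```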
The first set of equations defines time averaged prices for each subgroup $i$ by taking the total turnover for this subgroup in the entire period $T$ and dividing this by the total quantity sold in that period. Because the monetary value is not constant over time, price indices $\pi_j$ are used to make each monthly turnover comparable to that of the first month in $T$, assuming $\pi_1 = 1$. The second set of equations computes for each month $j$ in $T$ the index by taking the total turnover in that month and dividing it by a similar total based on the time averaged prices for each subgroup that were defined in the first set of equations. Thus we have a recursive set of equations that, in practice, has to be solved iteratively. An iteration can be started by assuming that all price indices are equal to 1.

Instead of the GK method we propose an alternative method that is computationally simpler, as it avoids these iterations. We achieve this by considering the relative budget shares for each subgroup of items for each month in $T$. We then obtain

$$\bar{V}_{mt} = \begin{pmatrix} \bar{v}_{11}/\bar{v}_{\cdot 1} & \cdots & \bar{v}_{1t}/\bar{v}_{\cdot t} \\ \vdots & \ddots & \vdots \\ \bar{v}_{m1}/\bar{v}_{\cdot 1} & \cdots & \bar{v}_{mt}/\bar{v}_{\cdot t} \end{pmatrix}. \tag{7.21}$$
By considering relative budget shares per month we avoid intertemporal comparisons as are required in the GK method. For each subgroup we can average these relative budget shares over the entire period $T$, and obtain the vector

$$\begin{pmatrix} \beta_1^t \\ \vdots \\ \beta_m^t \end{pmatrix} = \begin{pmatrix} \frac{1}{t}\sum_{j=1}^{t} \bar{v}_{1j}/\bar{v}_{\cdot j} \\ \vdots \\ \frac{1}{t}\sum_{j=1}^{t} \bar{v}_{mj}/\bar{v}_{\cdot j} \end{pmatrix}, \tag{7.22}$$

where $|T| = t$. It is reasonable to use unweighted averages as each month in the time period $T$ is equally important.²⁰ Now, using $\beta^t = \sum_{i=1}^{m} \beta_i^t$, we can rescale the $\beta$'s so that they add to unity for each month $t \in T$:

$$\begin{pmatrix} \gamma_1^t \\ \vdots \\ \gamma_m^t \end{pmatrix} = \begin{pmatrix} \beta_1^t/\beta^t \\ \vdots \\ \beta_m^t/\beta^t \end{pmatrix}. \tag{7.23}$$

We use these $\gamma$'s to weigh the subgroup prices for each month $t \in T$:

$$\bar{\bar{p}}^{\,t} = \sum_{i=1}^{m} \gamma_i^t\,\bar{p}_i^t = \frac{\sum_{i=1}^{m} \beta_i^t\,\bar{p}_i^t}{\sum_{i=1}^{m} \beta_i^t}. \tag{7.24}$$

20 Of course, if one wishes to diminish the influence of more distant months then one could use appropriate weights.
Note that the prices $\bar{p}_i^t$ are only based on data in month $t$, whereas the weights $\gamma_i^t$ are based on data from the entire period $T$. A price index is computed from these average prices (7.24) by taking price ratios, that is, for month $j_1$ as reference month and month $j_2$ as reporting month we have the price index

$$\pi^{j_1 j_2} = \frac{\bar{\bar{p}}^{\,j_2}}{\bar{\bar{p}}^{\,j_1}}. \tag{7.25}$$

This price index is obviously transitive in a DPoG, as it is a ratio of two monthly price levels. Of course, instead of taking arithmetic averages in (7.22), we can use different averages, including robust ones such as the median. If a subgroup ceases to exist ('fades out') its contribution to the group price (7.24) gets smaller as time progresses. A variation of the approach above would be to use a finite rolling window, to avoid that the past is dragged along all the time. It is interesting to study the development of the $\beta$'s and $\gamma$'s per subgroup.
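The steps (7.21) to (7.25) can be sketched as follows. The function names and the data are hypothetical, and one reading of the rescaling step is assumed: a subgroup without sales in a month has no defined price there, so it is left out of that month's average and the weights of the remaining subgroups are rescaled to add to 1. Other readings of (7.23) are possible.

```python
def group_prices(v, q):
    """Group price per month from turnovers v[i][j] and quantities q[i][j].

    i runs over subgroups and j over months; q[i][j] == 0 iff v[i][j] == 0,
    and every month is assumed to have a positive total turnover.
    """
    m, t = len(v), len(v[0])
    month_total = [sum(v[i][j] for i in range(m)) for j in range(t)]
    # beta_i: budget share of subgroup i, averaged over all months (7.22).
    beta = [sum(v[i][j] / month_total[j] for j in range(t)) / t
            for i in range(m)]
    prices = []
    for j in range(t):
        # Assumption: only subgroups sold in month j enter the average.
        present = [i for i in range(m) if q[i][j] > 0]
        beta_sum = sum(beta[i] for i in present)
        prices.append(sum(beta[i] / beta_sum * v[i][j] / q[i][j]
                          for i in present))
    return beta, prices

def price_index(prices, j1, j2):
    """Index (7.25) with reference month j1 and reporting month j2."""
    return prices[j2] / prices[j1]

# Hypothetical scanner data: three subgroups, three months.
v = [[100.0, 110.0, 120.0], [50.0, 0.0, 60.0], [30.0, 45.0, 0.0]]
q = [[10.0, 10.0, 10.0], [5.0, 0.0, 5.0], [10.0, 15.0, 0.0]]

beta, prices = group_prices(v, q)
direct = price_index(prices, 0, 2)
chained = price_index(prices, 0, 1) * price_index(prices, 1, 2)
```

Because every index is a ratio of two monthly group prices, `direct` and `chained` agree up to rounding, whatever the data.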
8. Price ratios and the cycle method
8.1 Idea of the method
The method that we want to apply is very much at the heart of price index theory in the sense that it is based on price ratios: price ratios of composites, rather than price ratios of items at a more basic (i.e. GTIN) level. The approach was applied earlier by the author in the context of internet data (cf. Willenborg, 2017a, b). The approach below is an extension of the earlier method to scanner data, so that values (turnovers) and quantities are available and not only prices as in the case of internet data. This allows us to differentiate goods as to their 'economic value' in the price index computations through associated weights. So the approach, applied to some time period, computes price ratios at the subgroup level, for as many (ordered) month pairs as possible. These pairs then form a PIDG that is likely to be intransitive. The cycle method is then used to transitivize this index. This sounds straightforward. In the case of internet data it is, because no weights are used. But if weights are used, things get a bit more complicated. We are in a similar situation as in the GK case: in order to compute the weights one needs to have price indices, and in order to compute price indices one needs to have weights. A chicken-and-egg problem. As in the GK case, it can be solved by applying an iteration. (First assume values for the weights, then calculate price indices using these weights, then compute the weights again with these indices, and the new price indices with these new weights, etc. until convergence.) In the end the method yields a transitive price index.
8.2 Weights: time dimension
We want to apply the cycle method to a PIDG with rough indices for certain month pairs. There are certain subtleties here that we want to discuss. We assume that the context is scanner data, so that weights based on turnover are available. We consider the computation of the average prices per stratum and the weight to be attached to a price ratio associated with a month pair (𝑗1 , 𝑗2 ). We start with a group of goods that has been partitioned into homogeneous, stable subgroups. Suppose we want to compute the ingredients for month pair (𝑗1 , 𝑗2 ). Consider subgroup 𝑖. We have the following price ratio:

$$\frac{\bar{p}_{ij_2}}{\bar{p}_{ij_1}}, \qquad (8.1)$$

where $\bar{p}_{ij_1}$, $\bar{p}_{ij_2}$ are average prices for subgroup 𝑖 in months 𝑗1 and 𝑗2 , respectively. They are computed as follows:

$$\bar{p}_{ij_1} = \frac{v_{ij_1}}{q_{ij_1}} = \frac{\sum_k p_{ij_1,k}\, q_{ij_1,k}}{\sum_k q_{ij_1,k}}, \qquad (8.2)$$

and

$$\bar{p}_{ij_2} = \frac{v_{ij_2}}{q_{ij_2}} = \frac{\sum_k p_{ij_2,k}\, q_{ij_2,k}}{\sum_k q_{ij_2,k}}. \qquad (8.3)$$
So the average price for subgroup 𝑖 in month 𝑗1 is the ratio of the total value to the total quantity of the items belonging to subgroup 𝑖 sold in month 𝑗1 . Similarly for month 𝑗2 . The items in subgroup 𝑖 in month 𝑗1 may be different from those in month 𝑗2 . Note that the numerator and denominator in (8.2) and (8.3) are additive measures of value and size, respectively. So we want to use the price ratio (8.1) for the month pair (𝑗1 , 𝑗2 ). But we also want to use a weight for this ratio. This weight should reflect the economic value of the subgroup 𝑖. The usual approach (for instance in the Törnqvist price index) is to take the arithmetic average of the budget shares in months 𝑗1 and 𝑗2 . But we will make a different choice, based on the following reasoning. Consider the months 𝑗1 and 𝑗2 . The value of the sales in those months for subgroup 𝑖 is $v_{ij_1}$ and $v_{ij_2}$, respectively. They represent different values, so we should not add them. We should make them comparable by using price indices $\pi_{j_1}$ and $\pi_{j_2}$, respectively. In terms of the value in month 1, the total value of the sales of products in months 𝑗1 and 𝑗2 is:

$$\frac{v_{ij_1}}{\pi_{j_1}} + \frac{v_{ij_2}}{\pi_{j_2}}. \qquad (8.4)$$
So the relative share of sales of items in subgroup 𝑖 in months 𝑗1 and 𝑗2 is:

$$\varrho_{ij_1j_2} = \frac{v_{ij_1}/\pi_{j_1} + v_{ij_2}/\pi_{j_2}}{\sum_i \left( v_{ij_1}/\pi_{j_1} + v_{ij_2}/\pi_{j_2} \right)}. \qquad (8.5)$$

This is the weight that we propose to use for the relative share of subgroup 𝑖, for the month pair (𝑗1 , 𝑗2 ). For the weighted contribution of (8.1) we take

$$\left( \frac{\bar{p}_{ij_2}}{\bar{p}_{ij_1}} \right)^{\varrho_{ij_1j_2}}. \qquad (8.6)$$

We then take as the average price ratio for the month pair (𝑗1 , 𝑗2 ):

$$\prod_i \left( \frac{\bar{p}_{ij_2}}{\bar{p}_{ij_1}} \right)^{\varrho_{ij_1j_2}}, \qquad (8.7)$$
where the product is taken over those subgroups 𝑖 for which $\bar{p}_{ij_1} > 0$ and $\bar{p}_{ij_2} > 0$. These are the rough indices associated with the month pair (𝑗1 , 𝑗2 ). Now the cycle method can be used to adjust these rough indices to a transitive PIDG, which in turn one can represent by the MoM price indices. Of course, the price indices are not known when we start calculating them for an entire period. But we can apply an iterative method as in the case of the GK-index (see (7.20)), starting with initial values for the price indices equal to 1. In case the indices are calculated in a piecemeal fashion, for each new month we have all the previously computed price indices. Only those involving the most recent month as a reference month are new. In that case, we can use the price index of the previous month as a substitute, and probably as a good approximation. It can also be an initial value for an iteration, as just described in the previous case.
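As a minimal sketch of (8.1) to (8.7), the rough index for one month pair can be computed as below. The function name `rough_index` and its arguments are hypothetical. The deflators `pi1` and `pi2` default to 1, which corresponds to the starting values of the iteration described above. Restricting the sum in (8.5) to the subgroups that also enter the product (8.7), so that the weights add to one, is an assumption of this sketch.

```python
import math


def rough_index(v1, v2, q1, q2, pi1=1.0, pi2=1.0):
    """Rough index (8.7) for a month pair (j1, j2).

    v1[i], q1[i]: turnover and quantity of subgroup i in month j1;
    v2[i], q2[i]: the same for month j2 (hypothetical input layout).
    pi1, pi2: price indices of months j1 and j2 relative to month 1.
    """
    # Keep only subgroups with a positive average price in both months
    keep = [i for i in range(len(v1))
            if v1[i] > 0 and v2[i] > 0 and q1[i] > 0 and q2[i] > 0]
    deflated = {i: v1[i] / pi1 + v2[i] / pi2 for i in keep}       # (8.4)
    total = sum(deflated.values())
    log_index = 0.0
    for i in keep:
        rho = deflated[i] / total                                  # (8.5)
        ratio = (v2[i] / q2[i]) / (v1[i] / q1[i])                  # (8.1)-(8.3)
        log_index += rho * math.log(ratio)                         # (8.6)
    return math.exp(log_index)                                     # (8.7)


# Two subgroups whose average prices both rise by 10 percent
r = rough_index([10.0, 20.0], [11.0, 11.0], [10.0, 10.0], [10.0, 5.0])
```

When all subgroup price ratios are equal, as in the usage example, the weighted geometric mean returns that common ratio whatever the weights are.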
8.3 Weights: goods dimension
Similar to the approach in the previous section, we can compute price ratios of average prices for different subgroups in the same month. In this way we obtain the equivalent of a price index but in the goods dimension. We consider the ratios of the average prices of subgroups 𝑖1 and 𝑖2 in month 𝑗:

$$\frac{\bar{p}_{i_2 j}}{\bar{p}_{i_1 j}}, \qquad (8.8)$$

where $\bar{p}_{i_1 j}$, $\bar{p}_{i_2 j}$ are average prices for subgroups 𝑖1 and 𝑖2 in month 𝑗, respectively. They are computed as follows:

$$\bar{p}_{i_1 j} = \frac{v_{i_1 j}}{q_{i_1 j}} = \frac{\sum_k p_{i_1 j,k}\, q_{i_1 j,k}}{\sum_k q_{i_1 j,k}}, \qquad (8.9)$$

and

$$\bar{p}_{i_2 j} = \frac{v_{i_2 j}}{q_{i_2 j}} = \frac{\sum_k p_{i_2 j,k}\, q_{i_2 j,k}}{\sum_k q_{i_2 j,k}}. \qquad (8.10)$$
For month 𝑗 the turnover of subgroups 𝑖1 and 𝑖2 is $v_{i_1 j}$ and $v_{i_2 j}$, respectively, which we can add and express in terms of the value of month 1:

$$\frac{v_{i_1 j} + v_{i_2 j}}{\pi_j}. \qquad (8.11)$$

The relative share of the sales of the items in subgroups 𝑖1 and 𝑖2 over the entire period, keeping inflation in this period in mind, is:

$$\alpha_{i_1 i_2 j} = \frac{(v_{i_1 j} + v_{i_2 j})/\pi_j}{\sum_j (v_{i_1 j} + v_{i_2 j})/\pi_j}. \qquad (8.12)$$
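Using (8.12) to weigh the price ratio (8.8) yields the contribution

$$\left( \frac{\bar{p}_{i_2 j}}{\bar{p}_{i_1 j}} \right)^{\alpha_{i_1 i_2 j}}, \qquad (8.13)$$

so that we can take as the weighted average of the ratios of the subgroup prices:

$$\prod_j \left( \frac{\bar{p}_{i_2 j}}{\bar{p}_{i_1 j}} \right)^{\alpha_{i_1 i_2 j}}, \qquad (8.14)$$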
where the product is taken over those months in the period considered for which the price ratio (8.8) exists, that is, for those months 𝑗 for which 𝑝̅𝑖1 𝑗 > 0 and 𝑝̅𝑖2 𝑗 > 0. As in the previous section an iterative scheme is needed to compute these weights. Using the cycle method in the goods dimension we obtain transitive 'goods indices' comparing the prices among the various subgroups. If the period considered is long these indices should be stable.
9. Updating: incremental price index computation

9.1 Preliminary remarks
When an index method is explained, it is usually assumed that the data for several months are available and that price indices for the entire period have to be computed in one go. But in practice this never happens; there, an incremental approach is used. Every month, when new data have been collected, the price index numbers for this new month are calculated. It is typically not allowed to correct previously published price index numbers. This incremental process is also referred to as updating (a series of price index numbers). Of course, this publication practice does not prevent one from updating previous price index numbers ‘behind the curtain’. It may be a sensible practice to check that the published figures do not depart too much from their unrestricted counterparts. In this section we consider the incremental variants of the 𝜅-index and of the cycle method. The variant for the former is extremely easy to apply. It only differs from the original method (as explained in Section 7) in that the newly calculated weights can be used for the price index numbers of the newest month.
9.2 Incremental cycle method
In practice price indices are often published as soon as they have been calculated. And it is common practice not to revise previously published price indices.21 So the question then is how to compute (new) price indices under the restriction that previously calculated price indices should be unchanged. This implies that any adjustments can only be applied to the newest price index estimates. But in the background one is able to make adjustments without the restriction concerning previously published price indices. This gives an opportunity to better control the outcome. The problem that is created in this way is generally known as an update problem. Several solutions have been tried in the past to deal with this problem. We do not intend to review any of these, as none of them is related to the cycle method. The situation we wish to consider here is that of sequentially calculating price indices, under the restriction that all indices published prior to a certain period (month) are fixed. When the information of a new month becomes available, several comparisons (price index values) become available as well. We now present a simple example to make the discussion more concrete.
21 Unless a grave mistake comes to light that obviously needs rectification. But such a calamity is an exception. We want to concentrate on the business-as-usual situation.
Example The indices for 6 months of data have already been published, and are considered fixed. We assume that the published indices are transitive and therefore can be represented by a linear digraph, as shown in Figure 9.2.1. This can be viewed as a spanning tree of a complete PIDG on 6 points (months).
Figure 9.2.1 Linear spanning tree for a period of 6 consecutive months.
Now for month 7 new price data have been collected, yielding new indices, 𝑝1,7 , 𝑝2,7 , 𝑝3,7 , 𝑝4,7 , 𝑝5,7 , 𝑝6,7 (and their time reversals, which we assume to be the reciprocal values). We can decide which of these new indices we want to use in the update method. Below we shall consider the situation that all these indices are used. But it is probably a good decision to discard indices concerning pairs of months that are widely separated. So let’s assume that all these price indices are taken into account in the update for month 7. For the new period, consisting of months 1, … ,7, we can add arc (6,7) and price index 𝑝6,7 to get a new linear PIDG, acting as the spanning tree for the new period. See Figure 9.2.2.
Figure 9.2.2 Update with month 7. Information concerning new price indices in red.
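This new arc generates 5 new elementary cycles, each yielding an equation for the adjusted versions of the new indices:

$$\begin{aligned} p_{56}\,\tilde{p}_{67}\,\tilde{p}_{75} &= 1, \\ p_{45}\,p_{56}\,\tilde{p}_{67}\,\tilde{p}_{74} &= 1, \\ p_{34}\,p_{45}\,p_{56}\,\tilde{p}_{67}\,\tilde{p}_{73} &= 1, \\ p_{23}\,p_{34}\,p_{45}\,p_{56}\,\tilde{p}_{67}\,\tilde{p}_{72} &= 1, \\ p_{12}\,p_{23}\,p_{34}\,p_{45}\,p_{56}\,\tilde{p}_{67}\,\tilde{p}_{71} &= 1. \end{aligned} \qquad (9.1)$$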
In (9.1) the price indices that are fixed are the ones written without a tilde and the ones that have to be computed are written with a tilde. Taking (natural) logarithms of the equations in (9.1), writing $x_{ij} = \log p_{ij}$ and $y_{ij} = \log \tilde{p}_{ij}$, and using $\tilde{p}_{ij}\tilde{p}_{ji} = 1$, or $\log \tilde{p}_{ji} = -\log \tilde{p}_{ij}$, we obtain the following linear system of equations:

$$\begin{aligned} y_{67} - y_{57} &= -x_{56} \triangleq -c_5, \\ y_{67} - y_{47} &= -x_{45} - x_{56} \triangleq -c_4, \\ y_{67} - y_{37} &= -x_{34} - x_{45} - x_{56} \triangleq -c_3, \\ y_{67} - y_{27} &= -x_{23} - x_{34} - x_{45} - x_{56} \triangleq -c_2, \\ y_{67} - y_{17} &= -x_{12} - x_{23} - x_{34} - x_{45} - x_{56} \triangleq -c_1. \end{aligned} \qquad (9.2)$$
On the left-hand side of the equations in (9.2) are the unknown 𝑦's, whereas on the right-hand side there are the constant 𝑥's, as these values correspond to fixed price indices related to the first 6 months of the period. There are 5 equations with 6 unknowns. In matrix form (9.2) can be written as

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} y_{17} \\ y_{27} \\ y_{37} \\ y_{47} \\ y_{57} \\ y_{67} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{pmatrix}. \qquad (9.3)$$
Equality (9.3) also shows that in fact there is a single parameter $y_{67}$ that, when specified, determines the remaining ones, that is $y_{17}, \ldots, y_{57}$. The minimization problem we now have to solve is:

$$\min\ (y_\rho - x_\rho)'\,\tilde{W}^{-1}(y_\rho - x_\rho) \quad \text{such that } R y_\rho = c, \qquad (9.4)$$

where $x_\rho = (x_{17}, \ldots, x_{67})'$, $y_\rho = (y_{17}, \ldots, y_{67})'$ and where the linear constraint (9.3) is written in the succinct form $R y_\rho = c$, with

$$R = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{pmatrix}, \qquad (9.5)$$

and $c = (c_1, \ldots, c_5)'$. Note that 𝑅 in (9.5) has full row rank, which is 5. We can apply a re-parameterization and replace $y_\rho$ by

$$\tilde{y}_\rho \triangleq y_\rho - R'(RR')^{-1}c, \qquad (9.6)$$

so that $R\tilde{y}_\rho = 0$, and replace $x_\rho$ by

$$\tilde{x}_\rho \triangleq x_\rho - R'(RR')^{-1}c. \qquad (9.7)$$

Note that (9.6) and (9.7) are well-defined as 𝑅 has full row rank and hence $RR'$ is a square matrix of full rank 5, so that its inverse exists.
Our minimization problem (9.4) can be defined in terms of the new variables $\tilde{x}_\rho$ and $\tilde{y}_\rho$ as follows:

$$\min\ (\tilde{y}_\rho - \tilde{x}_\rho)'\,\tilde{W}^{-1}(\tilde{y}_\rho - \tilde{x}_\rho) \quad \text{such that } R\tilde{y}_\rho = 0, \qquad (9.8)$$

where $\tilde{W}$ is the sub-matrix of the weight matrix 𝑊 restricted to the red arcs in Figure 9.2.2. 𝑅 is not a cycle matrix, but a more general constraint matrix, derived from constraints on some of the cycles. Problem (9.8) has the same form as the minimization problem for the cycle method (cf. Section 4.3). Using (4.1) we can write the solution to the problem (9.8) in the form:

$$\hat{\tilde{y}}_\rho = \tilde{x}_\rho - \tilde{W}R'(R\tilde{W}R')^{-1}R\,\tilde{x}_\rho, \qquad (9.9)$$

which we can rewrite, using (9.6) and (9.7), as

$$\begin{aligned} \hat{y}_\rho &= x_\rho - \tilde{W}R'(R\tilde{W}R')^{-1}R\,\bigl(x_\rho - R'(RR')^{-1}c\bigr) \\ &= x_\rho - \tilde{W}R'(R\tilde{W}R')^{-1}(Rx_\rho - c). \end{aligned} \qquad (9.10)$$

Again we notice the same form as the solution for the cycle method: the original value, $x_\rho$, and a correction term, $-\tilde{W}R'(R\tilde{W}R')^{-1}(Rx_\rho - c)$.
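The update can be illustrated with a small Python sketch. It assumes, for simplicity, that $\tilde{W}$ is diagonal. In that case the constraints (9.3) can be substituted into the objective of (9.4), leaving the single free parameter $y_{67}$ noted above, and the minimum is attained at a weighted mean of the values $x_{k7} - c_k$ (with $c_6 = 0$) using weights $1/w_k$. Since the minimizer of (9.4) is unique for positive weights, this agrees with (9.10) in the diagonal case. The names `incremental_update`, `chain`, `new_indices` and `weights` are hypothetical.

```python
import math


def incremental_update(chain, new_indices, weights):
    """Adjust the rough indices of a new month, assuming a diagonal W-tilde.

    chain: published MoM indices [p12, p23, ..., p_{n-1,n}], kept fixed.
    new_indices: rough indices [p_{1,n+1}, ..., p_{n,n+1}] for the new month.
    weights: positive diagonal entries of W-tilde, one per new index.
    """
    n = len(new_indices)
    x = [math.log(p) for p in new_indices]
    # c[k]: log of the published chained index from month k+1 to month n,
    # i.e. c_1, ..., c_5 of (9.2) followed by 0 for the arc (n, n+1)
    c = [0.0] * n
    for k in range(n - 2, -1, -1):
        c[k] = c[k + 1] + math.log(chain[k])
    # Each rough index implies a value x[k] - c[k] for log p_{n,n+1};
    # the constrained minimum is their mean weighted by 1 / weights[k]
    inv_w = [1.0 / w for w in weights]
    y_last = sum((x[k] - c[k]) * inv_w[k] for k in range(n)) / sum(inv_w)
    # Remaining adjusted indices follow from the constraints (9.3)
    return [math.exp(y_last + c[k]) for k in range(n)]


chain = [1.01, 1.02, 0.99, 1.03, 1.00]           # p12, ..., p56 (published)
rough = [1.08, 1.06, 1.05, 1.06, 1.02, 1.03]     # p17, ..., p67 (new, rough)
adjusted = incremental_update(chain, rough, [5, 4, 3, 2, 1, 1])
```

In this sketch a larger entry of `weights` lets the corresponding index move more, so the illustrative weights above trust comparisons between nearby months more than comparisons between distant ones.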
9.3 Incremental 𝜿-index
The 𝜅-index was explained in Section 7. There it was assumed that the data of an entire period were available. In practice it is possible to use a moving window, so that information from the remote past (beyond the window) is not taken into account any more. In practice we are typically dealing with a situation where the data do not become available at once for an extended period, but on a monthly basis. The goal is to publish price indices as soon as they become available. The policy often adopted is that results that have been published earlier cannot be changed (unless gross mistakes have been made). The 𝜅-index can be easily adapted to this situation. It only means that the 𝛾-weights (as defined in (7.23)) have to be computed for the entire period. In the new situation we have for the average price for the newest month 𝑡 + 1:

$$\bar{\bar{p}}^{\,t+1} = \sum_{i=1}^{m} \gamma_i^{t+1}\, \bar{p}_i^{\,t+1}. \qquad (9.11)$$

The subgroup prices $\bar{p}_i^u$ for 𝑢 ∈ 𝑇 = {1, … , 𝑛} remain unchanged in the future, by the assumption that there were no rectifications of earlier calculated group prices. The weights $\gamma_i^{t+1}$ for 𝑖 ∈ 𝐴 = {1, … , 𝑚} have to be computed for the new, extended period 𝑇 ∪ {𝑛 + 1} = {1, … , 𝑛 + 1}. The weights for the previous periods remain unchanged. Because the 𝜅-index is transitive, we only need to compute one new index value, say for the pair (𝑡, 𝑡 + 1):

$$\pi^{t,t+1} = \frac{\bar{\bar{p}}^{\,t+1}}{\bar{\bar{p}}^{\,t}}. \qquad (9.12)$$
The indices for the remaining month pairs can be obtained by applying transitive closure. It should be remarked that the approach above can be refined by using only a maximum of 𝑙 months prior to the current month, for a suitable value of 𝑙. This yields a moving window approach. The advantage of such an approach is that older information is not used to compute a price index value. In this way one can get rid of historical data that are not relevant anymore. The same idea is used in time series models. The simplicity of the updating procedure for the 𝜅-index is one of its appealing features.
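A minimal sketch of this update, including the optional moving window, is given below. The names `update_kappa`, `shares`, `new_prices`, `prev_group_price` and `window` (playing the role of 𝑙) are hypothetical. The monthly budget shares are assumed to be available up to and including the newest month, and the previously published group price is taken as given.

```python
def update_kappa(shares, new_prices, prev_group_price, window=None):
    """One incremental step of the kappa-index, following (9.11)-(9.12).

    shares[j][i]: budget share of subgroup i in month j, newest month last.
    new_prices[i]: average price of subgroup i in the newest month.
    prev_group_price: published group price of the previous month (fixed).
    window: if given, only the last `window` months enter the weights.
    """
    used = shares if window is None else shares[-window:]
    m = len(new_prices)
    # (7.22)-(7.23) on the (possibly windowed) extended period
    beta = [sum(month[i] for month in used) / len(used) for i in range(m)]
    beta_sum = sum(beta)
    gamma = [b / beta_sum for b in beta]
    group_price = sum(g * p for g, p in zip(gamma, new_prices))   # (9.11)
    mom_index = group_price / prev_group_price                    # (9.12)
    return group_price, mom_index


shares = [[0.25, 0.75], [0.35, 0.65], [0.30, 0.70]]
group_price, mom_index = update_kappa(shares, [2.0, 4.0], prev_group_price=3.2)
```

Indices for other month pairs then follow by chaining the published month-on-month values, which is the transitive closure mentioned above.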
10. Discussion

The present paper argues that transitivity is not an optional property of a price index but a mandatory one. Otherwise, by definition, the result of a comparison of a pair of points may depend on the path chosen to connect them. Another word for such a defect is 'drift'. It can be viewed as a defect because there is no particular reason to favour one path over another. Classical price index theory is in fact about static populations of goods, or mostly about two periods (months, say) where prices of goods are being compared. The assumption is that all items are present in both months. In practice populations of goods are dynamic: items enter and leave a market. Items are sometimes replaced by similar ones, but in a new guise and at a different, usually higher, price. Price index theory has to cope with these situations. Simply computing prices at the GTIN level will, as a rule, not work, the exceptions being the items that exist for a long time. Also for dynamic populations it is desirable to use transitive price indices. However, most classical price indices (when extrapolated to dynamic goods populations) are not transitive. There are many transitive price indices. In fact, the general form of a transitive price index can be specified, as is shown in the present paper (in Section 2.2). It should be stressed that this is a key result. It was already derived in Willenborg (1993) in a more general context. Using this representation it is easy to define transitive price indices as ratios of price averages. The 𝜅-index is an example of an index emerging from the application of the representation result for transitive price indices. The 𝜅-index is defined for composites. The method is actually close to the basic idea of a price index for a single item: comparing its prices at different points in time. But instead of a single item, composites of items are considered to cope with the dynamics of the goods population. The 𝜅-index is transitive, easy to apply, and easy to update. But before recommending its use it should be thoroughly tested on real data and its results compared to those of other appropriate price indices. A price index that is not transitive is not necessarily useless.22 It is possible to adjust it such that the resulting price index is transitive. There are several methods to do so, such as GEKS, MST and the cycle method. As is argued in this paper, the cycle method is considered superior to GEKS and MST. The cycle method allows one to control the amount of adjustment applied to each component of the price index being adjusted. But extensive empirical work is needed in order to be able to choose a suitable weight matrix in a particular application. To do this it would be beneficial to have general software tools, say in R or Python, that implement the cycle method. Some bilateral price indices are transitive for static populations of goods, but break down for dynamic populations. To cope with this problem, there are several options. One can operate at the basic level of the items (GTINs) and compute rough price indices for pairs of months within an item group. These can be aggregated to the group (COICOP) level. The aggregate price indices that emerge are likely to be intransitive, but can be made transitive with the cycle method.
A simpler method is to compute average prices for each subgroup (or composite) for each month, and use these to compute price indices, very much as in the case of a single item, by considering price ratios. This naturally leads to the 𝜅-index.
22 An 'intransitive price index' is an oxymoron to the present author. He views it as a semi-finished product that needs to be transitivized before it can be considered as a proper tool to measure price developments.
References

Balk, B. (2008). Price and quantity index numbers. Cambridge University Press.
Chessa, A., J. Verburg & L. Willenborg (2017). A comparison of price index methods for scanner data. Paper presented at the 15th Meeting of the Ottawa Group on Price Indices, Eltville am Rhein, Germany, 10-12 May 2017.
De Haan, J., L. Willenborg & A. Chessa (2016). An overview of price index methods for scanner data. Paper presented at the UNECE-ILO Meeting of the Group of Experts on Consumer Price Indices, Geneva, Switzerland, 2-4 May 2016.
Willenborg, L. (1993). An adjustment method based on graph homology. Report, CBS, Voorburg.
Willenborg, L. (2010). Chain indices and path independence. Report, CBS, The Hague.
Willenborg, L. (2017a). Elementary price indices for internet data. Discussion Paper, CBS, The Hague.
Willenborg, L. (2017b). Transitivizing elementary price indices for internet data. Discussion Paper, CBS, The Hague.
Willenborg, L. (2017c). Quantifying the dynamics of populations of articles. Discussion Paper, CBS, The Hague.
Willenborg, L. (2017d). From GEKS to the cycle method. Discussion Paper, CBS, The Hague.
Appendix A. Notation and terminology

In the present appendix we have collected the concepts that play a key role in the present paper. The terminology is sometimes drawn from the area of price index theory, sometimes from other areas, such as graph theory and (linear) algebra.

Concept / acronym: Explanation

Arc: An ordered pair of nodes (𝑎, 𝑏). In a picture an arc (𝑎, 𝑏) is denoted by an arrow pointing from 𝑎 to 𝑏. The node 𝑎 is called the start and 𝑏 is called the finish of the arc (𝑎, 𝑏).
Bilateral price index: See: Rough price index.
Chained price index: In the present paper the chained indices considered are based on month-on-month (MoM) price indices. For instance $P_{5,8} \triangleq P_{5,6} P_{6,7} P_{7,8}$.
COICOP: Classification Of Individual COnsumption according to Purpose.
Complete PIDG: A PIDG such that its PIG is a complete graph, meaning that for each pair of nodes there is an edge.
Composite: A subgroup of items within a (COICOP-)group that is stable over a longer period of time and homogeneous.
Connected graph: A graph for which each pair of points can be connected by a path.
Counter-arc: If (𝑎, 𝑏) is an arc then (𝑏, 𝑎) is its counter-arc, that is, the arc with the opposite orientation of (𝑎, 𝑏).
Cycle: A closed path, a path where start and finish node coincide.
Cycle matrix: A (-1,0,1)-matrix where the rows correspond to the cycles in a graph. The columns correspond to the arcs of a graph or digraph. A 0 indicates that the corresponding arc is not on a cycle, a 1 that it is, and a -1 that the counter-arc is.
Degree of intransitivity: Intransitivity of an index can be quantified. For the cycle method this is the expression $x'C'(CWC')^{-1}Cx$. It measures the distance between the original index and its adjustment.
Digraph: Directed graph. It consists of points and arcs. Arcs are ordered pairs of nodes: (𝑎, 𝑏). Such an arc is depicted as an arrow with its tail in 𝑎 and its head in 𝑏. The arrow points from tail to head.
Direct price index: Price index using a fixed base month (at least for a period of a year). Direct indices can be used to define indices with a different base month, assuming transitivity. For instance: $P_{5,8} \triangleq P_{1,8}/P_{1,5}$, assuming that 1 is the base month.
DPoG: See: Dynamic population of goods.
Drift: A discrepancy between price index values as a result of the intransitivity of the underlying price index.
Dynamic population of goods (DPoG): Collection of GTINs in a (COICOP-)group that changes over time as items disappear or appear on the market. Sometimes new items are in fact old ones in a new disguise, but (typically) with a higher price tag: a relaunch.
GEKS: A transitivization method named after Gini, Eltetö, Köves and Szulc.
GTIN: Global Trade Item Number, which can be used by companies to uniquely identify all of their trade items. Trade items are products or services that are priced, ordered or invoiced at any point in the supply chain.
Goods: Materials that satisfy human wants and provide utility. In its most basic form it is a GTIN in the present paper.
Graph: A graph is a special kind of digraph, where each arc (𝑎, 𝑏) has its counter-arc (𝑏, 𝑎). Instead of arcs and counter-arcs one typically uses edges {𝑎, 𝑏} to denote pairs of nodes that are related in a specific way. Edges are undirected arcs.
Group: COICOPs are the collections of items that are of interest for reporting price indices. But they may be too broadly defined to calculate price indices directly for these sets of items.
Item: See: Goods.
MST: Minimum spanning tree. Defined for a graph where the edges carry weights: an MST is a spanning tree for this graph for which the sum of its weights is minimized.
MoM method: A method where price indices are computed for consecutive months (month-on-month = MoM). For the remaining pairs the price indices can be derived by transitive closure. It is assumed that the price index has the property of time reversal.
Node: See: Point.
Path: A path in a digraph 𝐺 = (𝑉, 𝐸) from 𝑎 ∈ 𝑉 to 𝑏 ∈ 𝑉 is a function 𝑝: {1, … , 𝑘} → 𝑉, for some 𝑘 ≥ 2, such that 𝑝(1) = 𝑎, 𝑝(𝑘) = 𝑏 and (𝑝(𝑖), 𝑝(𝑖 + 1)) ∈ 𝐸 for each 𝑖 = 1, … , 𝑘 − 1. 𝑘 − 1 is the path length. The path 𝑝 connects 𝑎 to 𝑏. By definition, a node is also a path, of length 0. If 𝑎 and 𝑏 are equal, the path is a cycle.
PIDG: See: Price Index Digraph.
PIG: See: Price Index Graph.
PIM: Price index matrix for a period 𝑊 = {1, … , 𝑛}, consisting of components $P_{ij}$ that are price indices with reference month 𝑖 and reporting month 𝑗.
PoG: Population of goods.
Point: One of the ingredients of a graph, apart from the edges. The points typically represent objects and the edges relations between these objects.
Price Index Digraph (PIDG): In the context of the present paper a digraph in which the points correspond to time periods (in this case months) and the arrows to price comparisons or price indices.
Price Index Graph (PIG): The underlying graph of a Price Index Digraph (PIDG). This graph has the same points as the PIDG and an edge {𝑣, 𝑤} in the PIG corresponds to an arc (𝑣, 𝑤) or (𝑤, 𝑣) (or both) in the corresponding PIDG.
RGLS: Restricted Generalized Least Squares estimator.
Rough price index: A bilateral price index for a pair of periods. To compute it one only needs information from these periods. Rough indices serve as input for multilateral index methods to transitivize a collection of such indices, as part of a PIDG.
Spanning tree (of a graph): A subgraph of a given graph with the same set of nodes as the graph and with a subset of the edges of the graph, which is a tree. The graph is obtained by adding extra edges (provided it is not a tree already).
SPoG: See: Static population of goods.
State: Typically the average information concerning prices of goods in years and months is an example of a state, in a temporal setting. In a spatial setting it could concern regions (countries, provinces, municipalities, etc.). In a spatio-temporal setting a state could be a region at a particular time interval. So 'state' is a neutral indicator of the meaning of the units that are reported on.
Static population of goods (SPoG): A population that does not change in type of item. In reality static populations do not exist in the long run, only in a short period of time. It is a convenient concept on which classical index theory was built.
Subgroup: See: Composite.
Time reversal: The property of a price index that if base month and reporting month are reversed, the index is the reciprocal value of the original price index. Symbolically: $P_{ba} = 1/P_{ab}$, where $P_{ab}$ denotes a price index with base month 𝑎 and reporting month 𝑏.
TPD model: Time product dummy model. It is a multiplicative model for prices of (subgroups of) items 𝑖 and months 𝑗. If $p_{ij}$ is such a price for item 𝑖 and month 𝑗, then the TPD model assumes that it can be written as follows: $p_{ij} = \lambda \alpha_i \pi_j \varepsilon_{ij}$, where 𝜆 is a constant, $\alpha_i$ is a parameter that depends on item 𝑖 only and $\pi_j$ on month 𝑗 only, and $\varepsilon_{ij}$ is an error term depending on 𝑖 and 𝑗. The $p_{ij}$'s are observed and the parameters are estimated by minimizing a function of the $\varepsilon_{ij}$'s.
Trace (operator), 𝑇𝑟: A linear operator on square matrices that yields the sum of their diagonal elements. So if $A = (a_{ij})$ is an 𝑛 × 𝑛 matrix and 𝑇𝑟 is the trace operator operating on 𝐴 then $Tr\,A = \sum_{i=1}^{n} a_{ii}$.
Transitive closure (of a digraph): If 𝐺 = (𝑉, 𝐴) is a digraph its transitive closure is 𝐺∗ = (𝑉, 𝐴∗) with 𝐴 ⊆ 𝐴∗ such that if (𝑎, 𝑏), (𝑏, 𝑐) ∈ 𝐴∗ then also (𝑎, 𝑐) ∈ 𝐴∗. Alternatively, an arc (𝑎, 𝑏) ∈ 𝐴∗ if and only if there is a path in 𝐺 from 𝑎 to 𝑏.
Transitive price index: A price index 𝑃 is transitive if $P_{a,c} = P_{a,b} P_{b,c}$ for all points 𝑎, 𝑏, 𝑐 in a PIDG.
Transitive reduction: If 𝐺 = (𝑉, 𝐴) is a digraph its transitive reduction 𝐺∘ = (𝑉, 𝐴∘) is a minimal sub-digraph of 𝐺 such that the transitive closures of 𝐺 and 𝐺∘ coincide, symbolically, 𝐺∗ = (𝐺∘)∗. 'Minimal' means that no arc of 𝐺∘ can be removed without affecting the transitive closure property of 𝐺∘.
Transitivization: The process of making a price index transitive.
Vertex: See: Point.
Weight matrix: Specifically: in the cycle method. This is a matrix that controls how much the price indices corresponding to the various arcs in the associated PIDG can be changed when the cycle method is applied to adjust a nontransitive price index. Typically the weight for the arc (𝑖, 𝑗) is chosen to depend on |𝑖 − 𝑗|, the distance between the months 𝑖 and 𝑗 (assuming that the months are consecutively numbered, and not by year and month).
Appendix B. Algebra of price indices

Part B.1

If we arrange the price indices for a particular application at a time interval 𝑇 = {1, … , 𝑛} in matrix form, we obtain a set 𝒫𝑛 of 𝑛 × 𝑛 nonnegative matrices. If we assume the price indices to have certain general properties such as identity, reciprocity (time reversal) and transitivity, this is reflected in the structure of the matrices in 𝒫𝑛, and in particular in the algebraic structure of 𝒫𝑛. Some of these properties are mentioned here. Let 𝑃, 𝑄, 𝑅 ∈ 𝒫𝑛, with 𝑃 = (𝑝𝑖𝑗), 𝑄 = (𝑞𝑖𝑗), 𝑅 = (𝑟𝑖𝑗). Then 𝑝𝑖𝑗 ≥ 0, 𝑝𝑖𝑗 > 0 if and only if 𝑝𝑗𝑖 > 0, and in that case 𝑝𝑖𝑗 𝑝𝑗𝑖 = 1. Let 𝒫̅𝑛 be the subset of 𝒫𝑛 consisting of matrices 𝑃 that are strictly positive, i.e. for which 𝑃 > 0. Let

$$\mathbb{I}_n = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix},$$

then 𝕀𝑛 ∈ 𝒫̅𝑛.
We can define a multiplication on 𝒫̅𝑛 as follows: 𝑃 ⊛ 𝑄 = (𝑝𝑖𝑗 𝑞𝑖𝑗 ), that is, elementwise multiplication. ⊛ is a Hadamard product. Obviously, 𝑃 ⊛ 𝑄 ∈ 𝒫̅𝑛 . The following properties hold:

1. 𝑃 ⊛ 𝑄 = 𝑄 ⊛ 𝑃 (commutativity),
2. 𝑃 ⊛ 𝕀𝑛 = 𝕀𝑛 ⊛ 𝑃 = 𝑃 (𝕀𝑛 is a unit element),
3. (𝑃 ⊛ 𝑄) ⊛ 𝑅 = 𝑃 ⊛ (𝑄 ⊛ 𝑅) (associativity),
4. Furthermore, for each 𝑃 ∈ 𝒫̅𝑛 there is a (unique) 𝑄 ∈ 𝒫̅𝑛 such that 𝑃 ⊛ 𝑄 = 𝑄 ⊛ 𝑃 = 𝕀𝑛 (existence of a (right and left) inverse).
The properties 1, 2 and 3 hold because they are inherited from ℝ\ℝ− , the set of positive real numbers, which is the common domain of the entries of the matrices in 𝒫̅𝑛 . Because of property 3 we can leave out the brackets and write 𝑃 ⊛ 𝑄 ⊛ 𝑅 without ambiguity. As to property 4: take 𝑞𝑖𝑗 = 𝑝𝑗𝑖 ; then 𝑄 = (𝑞𝑖𝑗 ) is also strictly positive and 𝑝𝑖𝑗 𝑞𝑖𝑗 = 𝑝𝑖𝑗 𝑝𝑗𝑖 = 1, so that 𝑃 ⊛ 𝑄 = 𝕀𝑛 and hence also 𝑄 ⊛ 𝑃 = 𝕀𝑛 . So we conclude that 𝑄 is the inverse of 𝑃. It is also the unique (left and right) inverse of 𝑃. To indicate this we write $Q = P^{inv}$. Hence, we conclude that 𝒫̅𝑛 is an abelian group, with '⊛' as multiplication and '$inv$' as an inversion operator.
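The group properties can be checked numerically on small examples. The sketch below uses plain nested lists. The helper names `hadamard` and `inverse` are hypothetical, and the two matrices are illustrative reciprocal matrices chosen so that all products are exact in floating point.

```python
def hadamard(P, Q):
    """Elementwise (Hadamard) product of two equally sized matrices."""
    return [[p * q for p, q in zip(row_p, row_q)] for row_p, row_q in zip(P, Q)]


def inverse(P):
    """Inverse with respect to the Hadamard product on reciprocal matrices.

    Because p_ij * p_ji = 1, the elementwise inverse is the transpose.
    """
    return [list(column) for column in zip(*P)]


P = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
Q = [[1.0, 4.0, 8.0], [0.25, 1.0, 2.0], [0.125, 0.5, 1.0]]
ones = [[1.0] * 3 for _ in range(3)]

commutes = hadamard(P, Q) == hadamard(Q, P)           # property 1
unit_ok = hadamard(P, ones) == P                      # property 2
inverse_ok = hadamard(P, inverse(P)) == ones          # property 4
```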
It is also possible to define roots on 𝒫𝑛 : if 𝑃 = (𝑝𝑖𝑗 ) ∈ 𝒫𝑛 then

$$\sqrt[m]{P} \triangleq \begin{pmatrix} \sqrt[m]{p_{11}} & \cdots & \sqrt[m]{p_{1n}} \\ \vdots & \ddots & \vdots \\ \sqrt[m]{p_{n1}} & \cdots & \sqrt[m]{p_{nn}} \end{pmatrix}. \qquad (B.3.1)$$
It also holds that √𝑃 ⊛ … ⊛ √𝑃 = 𝑃, where the product consists of 𝑚 terms. If the price index is transitive we have 𝑝𝑖𝑗 𝑝𝑗𝑘 = 𝑝𝑖𝑘 , for all 𝑖, 𝑗, 𝑘 ∈ {1, … , 𝑛}. Suppose 𝑃 = (𝑝𝑖𝑗 ) is a transitive PIM and let 𝑛 > 2. Then for each entry 𝑝𝑖𝑗 we can find a 𝑘 ∈ {1, … , 𝑛}/{𝑖, 𝑗} such that 𝑝𝑖𝑗 = 𝑝𝑖𝑘 𝑝𝑘𝑗 = 𝑝𝑖𝑗 . We then have matrices 𝑃1 , 𝑃2 ∈ 𝒫𝑛 such that 𝑃 = 𝑃1 ⊛ 𝑃2 where 𝑃1 , 𝑃2 are derived from the decomposition of the entries by applying transitivity. So each transitive P has divisors. These need not to be unique. Example 1 (𝑝21 𝑝31
𝑝12 1 𝑝32
𝑝13 𝑝23 ) 1
1 (𝑝23 𝑝31 𝑝32 𝑝21
=
𝑝13 𝑝32 1 𝑝31 𝑝12
𝑝12 𝑝23 𝑝21 𝑝13 ) 1 (B.3.2)
1 (𝑝23 𝑝32
=
𝑝32 1 𝑝31
𝑝23 1 𝑝13 ) ⊛ (𝑝31 1 𝑝21
𝑝13 1 𝑝12
𝑝12 𝑝21 ) 1
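The matrices in the final expression of (B.3.2) belong to $\mathcal{P}_n$. Note that this decomposition is not unique: the second expression can be replaced by

\[
\begin{pmatrix}
1 & p_{13} & p_{23} \\
p_{31} & 1 & p_{13} \\
p_{32} & p_{31} & 1
\end{pmatrix}
\circledast
\begin{pmatrix}
1 & p_{32} & p_{12} \\
p_{23} & 1 & p_{21} \\
p_{21} & p_{12} & 1
\end{pmatrix},
\tag{B.3.3}
\]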
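or by

\[
\begin{pmatrix}
1 & p_{32} & p_{23} \\
p_{23} & 1 & p_{13} \\
p_{32} & p_{31} & 1
\end{pmatrix}
\circledast
\begin{pmatrix}
1 & p_{13} & p_{12} \\
p_{31} & 1 & p_{21} \\
p_{21} & p_{12} & 1
\end{pmatrix},
\tag{B.3.4}
\]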
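to mention but a few possibilities. ■

Example
Period: months 1,…,5. Arrange the price indices for this period in matrix form:

\[
\begin{pmatrix}
p_{11} & p_{12} & p_{13} & p_{14} & p_{15} \\
p_{21} & p_{22} & p_{23} & p_{24} & p_{25} \\
p_{31} & p_{32} & p_{33} & p_{34} & p_{35} \\
p_{41} & p_{42} & p_{43} & p_{44} & p_{45} \\
p_{51} & p_{52} & p_{53} & p_{54} & p_{55}
\end{pmatrix}
=
\begin{pmatrix}
1 & p_{12} & p_{13} & p_{14} & p_{15} \\
\frac{1}{p_{12}} & 1 & p_{23} & p_{24} & p_{25} \\
\frac{1}{p_{13}} & \frac{1}{p_{23}} & 1 & p_{34} & p_{35} \\
\frac{1}{p_{14}} & \frac{1}{p_{24}} & \frac{1}{p_{34}} & 1 & p_{45} \\
\frac{1}{p_{15}} & \frac{1}{p_{25}} & \frac{1}{p_{35}} & \frac{1}{p_{45}} & 1
\end{pmatrix}.
\tag{B.3.5}
\]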
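Now use the representation result of Section 2.1 for transitive price indices as ratios of average monthly prices:

\[
\begin{aligned}
&\begin{pmatrix}
1 & \frac{p_2}{p_1} & \frac{p_3}{p_1} & \frac{p_4}{p_1} & \frac{p_5}{p_1} \\
\frac{p_1}{p_2} & 1 & \frac{p_3}{p_2} & \frac{p_4}{p_2} & \frac{p_5}{p_2} \\
\frac{p_1}{p_3} & \frac{p_2}{p_3} & 1 & \frac{p_4}{p_3} & \frac{p_5}{p_3} \\
\frac{p_1}{p_4} & \frac{p_2}{p_4} & \frac{p_3}{p_4} & 1 & \frac{p_5}{p_4} \\
\frac{p_1}{p_5} & \frac{p_2}{p_5} & \frac{p_3}{p_5} & \frac{p_4}{p_5} & 1
\end{pmatrix} \\
&\quad =
\begin{pmatrix}
1 & p_2 & p_3 & p_4 & p_5 \\
\frac{1}{p_2} & 1 & p_3 & p_4 & p_5 \\
\frac{1}{p_3} & \frac{1}{p_3} & 1 & p_4 & p_5 \\
\frac{1}{p_4} & \frac{1}{p_4} & \frac{1}{p_4} & 1 & p_5 \\
\frac{1}{p_5} & \frac{1}{p_5} & \frac{1}{p_5} & \frac{1}{p_5} & 1
\end{pmatrix}
\circledast
\begin{pmatrix}
1 & \frac{1}{p_1} & \frac{1}{p_1} & \frac{1}{p_1} & \frac{1}{p_1} \\
p_1 & 1 & \frac{1}{p_2} & \frac{1}{p_2} & \frac{1}{p_2} \\
p_1 & p_2 & 1 & \frac{1}{p_3} & \frac{1}{p_3} \\
p_1 & p_2 & p_3 & 1 & \frac{1}{p_4} \\
p_1 & p_2 & p_3 & p_4 & 1
\end{pmatrix}.
\end{aligned}
\tag{B.3.6}
\]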
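The factorization (B.3.6) can be checked numerically. The Python sketch below is an illustration under stated assumptions: the five average monthly prices are hypothetical, and the functions `first_factor` and `second_factor` are names introduced here for the two matrices on the right-hand side, read off entry by entry from (B.3.6).

```python
import math

# Hypothetical average monthly prices p_1, ..., p_5 (0-based in the code).
prices = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(prices)

# Left-hand side of (B.3.6): entry (i, j) equals p_j / p_i.
P = [[prices[j] / prices[i] for j in range(n)] for i in range(n)]


def first_factor(i, j):
    """First matrix on the right-hand side of (B.3.6)."""
    if i == j:
        return 1.0
    # Above the diagonal: p_j; below the diagonal: 1 / p_i.
    return prices[j] if j > i else 1.0 / prices[i]


def second_factor(i, j):
    """Second matrix on the right-hand side of (B.3.6)."""
    if i == j:
        return 1.0
    # Above the diagonal: 1 / p_i; below the diagonal: p_j.
    return 1.0 / prices[i] if j > i else prices[j]


A = [[first_factor(i, j) for j in range(n)] for i in range(n)]
B = [[second_factor(i, j) for j in range(n)] for i in range(n)]

# Hadamard product of the two factors.
product = [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

assert all(math.isclose(x, y)
           for row_x, row_y in zip(product, P) for x, y in zip(row_x, row_y))
```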
This decomposition is not unique. The first matrix on the right-hand side of (B.3.6) can be multiplied by a factor $f > 0$ and the second by a factor $1/f$, leaving their product unchanged. More succinctly, we can write the matrix on the left-hand side of (B.3.6) as

\[
\left( \frac{1}{p_1}, \ldots, \frac{1}{p_5} \right)' \, (p_1, \ldots, p_5),
\tag{B.3.7}
\]
where the multiplication of the two vectors is ordinary matrix multiplication. ■

Part B.2
Let $\mathcal{I}_T$ be a set of transitive indices on a base set $V$ of months. Let $\mathbf{1}$ be the index that is equal to 1 for every pair of months $i, j \in V$: $\mathbf{1}_{ij} = 1$. For $\pi_1, \pi_2 \in \mathcal{I}_T$ define (as in Section 2.7):

\[
\pi_1 \divideontimes \pi_2 \triangleq \sqrt{\pi_1 \pi_2}.
\tag{B.3.8}
\]
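As is shown in Section 2.7, the product is well-defined on $\mathcal{I}_T$. This is a product on $\mathcal{I}_T$ with the following properties:
a. $\mathbf{1} \in \mathcal{I}_T$.
b. For all $\pi_1, \pi_2 \in \mathcal{I}_T$: $\pi_1 \divideontimes \pi_2 = \pi_2 \divideontimes \pi_1$ (commutativity).
c. For all $\pi_1 \in \mathcal{I}_T$: $\pi_1 \divideontimes \pi_1 = \pi_1$ (idempotency).
d. For each $\pi_1 \in \mathcal{I}_T$ there is a $\pi_2 \in \mathcal{I}_T$ such that $\pi_1 \divideontimes \pi_2 = \mathbf{1}$ (existence of an inverse).
e. For $\pi_1 \in \mathcal{I}_T$ and $\pi_2, \pi_3 \in \mathcal{I}_T$ such that $\pi_1 \divideontimes \pi_2 = \pi_1 \divideontimes \pi_3 = \mathbf{1}$ we have $\pi_2 = \pi_3$ (uniqueness of the inverse). So we can write $\pi_1^{-1}$ for the unique inverse of $\pi_1 \in \mathcal{I}_T$.
f. $(\pi_1 \divideontimes \pi_2) \divideontimes (\pi_3 \divideontimes \pi_4) = (\pi_1 \divideontimes \pi_3) \divideontimes (\pi_2 \divideontimes \pi_4) = (\pi_1 \divideontimes \pi_4) \divideontimes (\pi_2 \divideontimes \pi_3)$ (exchangeability).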
$\mathbf{1} \in \mathcal{I}_T$ is not a unit element, as $\pi_1 \divideontimes \mathbf{1} \neq \pi_1$ in general. But, in view of property d, it shares a property with a unit element in groups. Also, $\divideontimes$ is not associative: $\pi_1 \divideontimes (\pi_2 \divideontimes \pi_3) \neq (\pi_1 \divideontimes \pi_2) \divideontimes \pi_3$ in general. So $(\mathcal{I}_T, \divideontimes, {}^{-1})$ is an uncommon algebraic structure: it is not a category, a semigroup, a group or a groupoid, because it is not associative. But it is a magma and a quasigroup, with some extra properties (idempotency, commutativity, exchangeability). It is not a loop, since a loop requires a unit element.
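The properties of the product (B.3.8) can be illustrated numerically. The following Python sketch is not from the paper: it represents each transitive index as a matrix $\pi_{ij} = p_j / p_i$ built from hypothetical price vectors, and all function names are assumptions made for this illustration.

```python
import math


def index_from_prices(prices):
    """Transitive index pi_ij = p_j / p_i built from hypothetical prices."""
    n = len(prices)
    return [[prices[j] / prices[i] for j in range(n)] for i in range(n)]


def gmean(a, b):
    """The product (B.3.8): entrywise geometric mean of two indices."""
    return [[math.sqrt(x * y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a):
    """Entrywise reciprocal, so that gmean(a, inverse(a)) is the index 1."""
    return [[1.0 / x for x in row] for row in a]


def close(a, b):
    """True if two indices agree entrywise up to rounding error."""
    return all(math.isclose(x, y, rel_tol=1e-9)
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_transitive(a):
    """Check a_ij * a_jk = a_ik for all i, j, k, up to rounding error."""
    idx = range(len(a))
    return all(math.isclose(a[i][j] * a[j][k], a[i][k], rel_tol=1e-9)
               for i in idx for j in idx for k in idx)


pi1 = index_from_prices([1.0, 2.0, 4.0])
pi2 = index_from_prices([1.0, 3.0, 3.0])
pi3 = index_from_prices([1.0, 5.0, 2.0])
pi4 = index_from_prices([2.0, 3.0, 12.0])
one = index_from_prices([1.0, 1.0, 1.0])

assert is_transitive(gmean(pi1, pi2))                      # product stays transitive
assert close(gmean(pi1, pi2), gmean(pi2, pi1))             # b. commutativity
assert close(gmean(pi1, pi1), pi1)                         # c. idempotency
assert close(gmean(pi1, inverse(pi1)), one)                # d. inverse
assert close(gmean(gmean(pi1, pi2), gmean(pi3, pi4)),
             gmean(gmean(pi1, pi3), gmean(pi2, pi4)))      # f. exchangeability
assert not close(gmean(pi1, one), pi1)                     # 1 is not a unit element
assert not close(gmean(pi1, gmean(pi2, pi3)),
                 gmean(gmean(pi1, pi2), pi3))              # not associative
```

The last two assertions show, for this particular choice of indices, that the index 1 does not act as a unit element and that the product is not associative.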
Explanation of symbols

Empty cell           Figure not applicable
.                    Figure is unknown, insufficiently reliable or confidential
*                    Provisional figure
**                   Revised provisional figure
2017–2018            2017 to 2018 inclusive
2017/2018            Average for 2017 to 2018 inclusive
2017/’18             Crop year, financial year, school year, etc., beginning in 2017 and ending in 2018
2015/’16–2017/’18    Crop year, financial year, etc., 2015/’16 to 2017/’18 inclusive

Due to rounding, some totals may not correspond to the sum of the separate figures.
Colophon

Publisher
Statistics Netherlands
Henri Faasdreef 312, 2492 JP The Hague
www.cbs.nl

Prepress
Statistics Netherlands, CCN Creation and visualisation

Design
Edenspiekermann

Information
Telephone +31 88 570 70 70, fax +31 70 337 59 94
Via contact form: www.cbsl.nl/information

© Statistics Netherlands, The Hague/Heerlen/Bonaire 2018. Reproduction is permitted, provided Statistics Netherlands is quoted as the source.