A computer software price index using scanner data

Marc Prud’homme, Statistics Canada
Dimitri Sanga, Transport Canada
Kam Yu, Department of Economics, Lakehead University
Abstract. Over the last 20 years, the expenditure share of prepackaged software in national output has grown. The large number of characteristics in computer software makes hedonic regression techniques impractical for controlling for quality changes. In this study, matched model price indices are constructed using monthly scanner data on prices and unit values for various prepackaged computer software titles and categories sold in Canada from January 1996 to June 2000. Quality differences are controlled for by applying the maximum overlap method. Results show that prices declined during the period studied at an average annual rate of 5.9%. JEL classification: C43, L86
We are grateful to Statistics Canada for research support; Wally Lebreton for programming assistance; and Ana Aizcorbe, Ernst Berndt, Erwin Diewert, Ellen Dulberger, Brent Moulton, Alice Nakamura, Mike Shannon, Jack Triplett, participants of the Brookings Workshop on Economic Measurement and the ZEW conference on Price Indices and the Measurement of Quality Changes, and two anonymous referees for valuable suggestions and comments. The views expressed in this paper are those of the authors and do not represent those of Statistics Canada or Transport Canada.

Canadian Journal of Economics / Revue canadienne d’Economique, Vol. 38, No. 3, August / août 2005. Printed in Canada / Imprimé au Canada
0008-4085 / 05 / 999–1017 / © Canadian Economics Association
This paper is the result of an initial investigation by Statistics Canada into the construction of a price index for computer software. It examines the behaviour of ‘prepackaged’ software prices.1 Research in this area lags behind research on other high-tech products, such as computers and semiconductors, which has progressed rapidly. In fact, Jorgenson (2001) explicitly recognizes that important information gaps exist for software prices and that better information in this area is needed to improve the measurement of overall economic performance. There are two important reasons for a ‘made-in-Canada’ software price index. First, the current consumer price index (CPI) published by Statistics Canada does not include software prices in its basket. Second, quality improvements in software may lead to substantial productivity gains, and the current Canadian system of national accounts (SNA) borrows a U.S.-based deflator for software; more accurate measurement will therefore increase our understanding of the importance of software in productivity issues. There is a long tradition of successful application of the hedonic approach to the production of constant-quality price indices for products subject to rapid technological change. In contrast, only a few studies have used hedonics to produce quality-adjusted price indices for software. Several reasons probably explain this gap. First, a reliable hedonic regression requires large amounts of good-quality data on a product’s features and characteristics. Such information for software has simply not been as readily available to researchers as it has been for computers.2 Second, the characteristics associated with software are, in many instances, difficult to identify (Hollanders and Meijers 2001). Oliner and Sichel (1994) acknowledge that hedonic models, because they need quantifiable characteristics for each product, may not be suited to complex and hard-to-describe products like software.
1 For the remainder of the paper we will simply refer to prepackaged software as software, unless otherwise noted.

2 At one time, we thought that a reliable hedonic estimate might be obtained by using the number of lines of code in a given software program as an explanatory variable. The reasoning was that an increase in the number of lines of code of a given software title from one period to the next may be viewed as an improvement in quality over the previous version. For example, in 1990, Windows 3.1 had two and a half million lines of code. Today, Windows XP has 40 million lines of code. This view is probably too simplistic. According to McGraw, a leading computer guru of sorts (see Festa, 2001), ‘software is way more complicated than it used to be . . . and the best way to determine how many problems are going to be in a piece of software is to count how many lines of code it has. The simple metric goes like this: more lines, more bugs.’

Brynjolfsson and Kemerer (1996) mention that estimating precise hedonic functions for software is likely to be difficult because strategic pricing and other factors beyond the hedonic functions may be significant. For example, a web study comparing the virtues of WordPerfect 7 and MS Word 7 evaluated over 150 features, and this was for just two varieties of the same product type. Clearly, a hedonic price index would be difficult to produce if the exercise were extended to a larger number of products. By its very nature, the heterogeneity of the
product itself precludes the possibility of producing hedonic price indices. Prices move differently depending on the category, which means that a large representative sample is required to reflect what is happening to software prices in general, and monitoring these changes would be very resource intensive. Furthermore, certain important quality aspects, such as ease of learning and ease of operation, are subjective and thus difficult to quantify (Varian 1993). Finally, there is a lack of solid economic research on the software industry and, more specifically, on the factors behind software pricing practices. Indeed, Varian (1993) states that although the market for computer software is large and rapidly growing, there has been little theoretical investigation of the unique economic features of the software market. Although Varian said this almost 10 years ago, research in this field has not exactly mushroomed since then. ACNielsen has recently agreed to provide Statistics Canada with scanner data for prepackaged computer software, collected during the course of its market research activities. We believe that this is the first time such information has been made available to study software prices. Interest in scanner data for price index research is growing, and with good reason.
Hawkes and Piotrowski (2000) point out three benefits to price statisticians of using scanner data or other electronic point-of-sale data: (1) more data and, consequently, less variance; (2) better data and, consequently, less bias; and (3) better methods.3 Moreover, the authors suggest that such data offer a way around many of the obstacles that prevent the use of hedonics, notably for items where product turnover is rapid, the number of relevant characteristics is large, and the current sample size is small.4 Given some of the challenges associated with producing a hedonic price index for software, this paper proposes the use of scanner data and the maximum overlap method to deal with the quality problem. This approach, a variant of the traditional matched model approach, seems well suited to producing price indices when scanner data are available. The results of this paper show that prices for business and government software declined during the period studied at an average annual rate of about 4.4%. The current practice used by the Canadian SNA results in an average annual drop of slightly less than 4% in the price index used as a deflator for software investment expenditures by governments and businesses, so the two results are similar. Consumer software prices fell faster, by 7.9% a year on average.

3 These advantages of using scanner data are based on a study of food items. Clearly, the same advantages would apply to other products.

4 Among the growing body of studies using scanner data we note Lowe (1998) and Ioannides and Silver (2003).

The paper proceeds as follows. Section 1 provides context on software expenditures in the Canadian SNA and CPI. Section 2 introduces the
maximum overlap approach used to compute the price indices. Section 3 describes the two data sets used in the analysis. Section 4 presents the results of the multi-period matching exercise and a number of indices, followed by comparisons with other results in section 5. Section 6 concludes.

1. Software in the system of national accounts and the consumer price index

The Canadian SNA recognizes and tracks investment expenditures and prices for three broad software categories (Jackson 2000): custom software, own-account software, and prepackaged software.5 Prepackaged software is sold or licensed in standardized form and is delivered in packages or as electronic files downloaded from the Internet. Custom software is tailored to the specific application of the user and is delivered along with the analysis, design, and programming services required for customization. Own-account software consists of software created for a specific application (Parker and Grimm 2000). Furthermore, software investment expenditures in the SNA are divided into government purchases and business purchases, but given that both sectors probably purchase similar software products, a separate price index for each sector is not needed.

A software price index would also improve the CPI. Currently, no software prices, imputed or otherwise, are collected for the Canadian CPI. The absence of a reliable quality adjustment methodology is part of the reason why no attempt has been made to collect them. In what follows, we experiment with a ‘matched model’ and compare the results with some published software price indices. We find that our results are in line with other matched model indices for software. The methodology is described in the next section, followed by the empirical results.

5 Other countries, such as the U.S., use the same classification system.

2. Methodology

In a typical matched model, the price of a product in the base (previous) period is compared with the price of a product with identical attributes or characteristics in the comparison (current) period. Differences in these prices can be interpreted as pure price change, free of any influence from quality change. In normal practice, vanished or replaced products are deleted from the sample, and a new product enters the sample for matching from the next period on. In effect, all products that are available in two successive periods are matched and used to measure the change in the price index. This so-called maximum overlap method is used in the International Price Program (IPP) at the U.S. Bureau of Labor Statistics (BLS) for computing export and import price indices. A Eurostat (1999) task force describes this approach as quantity-weighted ‘monthly chaining and resampling’ and recommends it for
computing software price indices. This also describes our approach quite faithfully. Here, the prices of all varieties of a product that are on the market each month are observed (resampling). The price change is then computed from month to month as an average over the items that are in the sample in both months. Price changes between months that are further apart are calculated by multiplying the month-to-month links (chaining). Alterman, Diewert, and Feenstra (1999) compare the IPP seasonal price index with a more robust year-over-year index and find that the former performs very well. They use the year-over-year comparison because a large number of commodities in the sample exhibit strong seasonal patterns and disappear from the market for several months of each year. Turvey (2001) suggests a method similar to the Eurostat method. According to Turvey, the approach is a possible solution for products that are subject to rapid technical progress leading to continually improving performance, where new models frequently appear on the market and where models fall in price as they become older. Software in many respects falls into this category of products. In the context of quality adjustment, the matched model assumes that the average price ratio of the matched products is the same as that of the unmatched products. In other words, we assume that when new products come onto the market, the equilibrium prices of the existing old versions are bid up or down relative to the new ones. Furthermore, we assume that sellers do not incorporate pure price changes when they introduce new products; otherwise, these pure price changes cannot be captured by the matched samples. Triplett (1997, 2004) argues that sellers often do include pure price changes when they introduce new products. Also, when the market is slow to adjust to a new equilibrium, a matched model price index will be biased. The direction of the bias depends on whether the new products are priced upward or downward.
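The month-to-month matching and chaining just described can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors’ code or the BLS implementation: it assumes each month is a dictionary mapping a product identifier to a (unit price, quantity) pair, computes bilateral Laspeyres, Paasche, and Fisher links over all products present in both months (using the formulas given as equations (1) and (2) below), and multiplies the Fisher links into a chained index. The function names, the data layout, and the optional `screen` argument (a price-relative filter) are assumptions made for illustration. It is also a single-stage index, whereas the paper computes category indices first and then aggregates them.

```python
from math import sqrt
from typing import Optional, Sequence

# Hypothetical layout: product id -> (unit price, quantity sold) in one month.
Month = dict[str, tuple[float, float]]


def bilateral_indices(
    prev: Month, curr: Month, screen: Optional[tuple[float, float]] = None
) -> tuple[float, float, float]:
    """Return (Laspeyres, Paasche, Fisher) over products sold in both months.

    Maximum overlap: every product present in both months is matched.
    If screen=(lo, hi) is given, a matched product is dropped when its price
    relative p2/p1 falls outside [lo, hi]. Prices are assumed to be positive.
    """
    matched = sorted(prev.keys() & curr.keys())
    if screen is not None:
        lo, hi = screen
        matched = [k for k in matched if lo <= curr[k][0] / prev[k][0] <= hi]
    if not matched:
        raise ValueError("no matched products between the two months")

    # p1, q1 = base-month price and quantity; p2, q2 = comparison month.
    sum_p2q1 = sum(curr[k][0] * prev[k][1] for k in matched)
    sum_p1q1 = sum(prev[k][0] * prev[k][1] for k in matched)
    sum_p2q2 = sum(curr[k][0] * curr[k][1] for k in matched)
    sum_p1q2 = sum(prev[k][0] * curr[k][1] for k in matched)

    laspeyres = sum_p2q1 / sum_p1q1  # first-month quantities as weights
    paasche = sum_p2q2 / sum_p1q2    # second-month quantities as weights
    return laspeyres, paasche, sqrt(laspeyres * paasche)  # Fisher = geometric mean


def chained_fisher(
    months: Sequence[Month], screen: Optional[tuple[float, float]] = None
) -> list[float]:
    """Chain month-to-month Fisher links; the first month is the base (= 1)."""
    chain = [1.0]
    for prev, curr in zip(months, months[1:]):
        chain.append(chain[-1] * bilateral_indices(prev, curr, screen)[2])
    return chain


def average_rates(chain: Sequence[float]) -> tuple[float, float]:
    """Average monthly and annual indices: last ** (1/n) and last ** (12/n), n = links."""
    n = len(chain) - 1
    return chain[-1] ** (1 / n), chain[-1] ** (12 / n)


if __name__ == "__main__":
    months = [
        {"A": (10.0, 5.0), "B": (20.0, 2.0)},
        {"A": (9.0, 6.0), "B": (20.0, 2.0), "C": (50.0, 1.0)},  # C enters: unmatched in link 1
        {"A": (9.0, 6.0), "C": (45.0, 2.0)},                     # B exits: link 2 uses A and C
    ]
    print(chained_fisher(months))
    print([round(x, 3) for x in average_rates([1.0] * 53 + [0.766])])  # [0.995, 0.941]
```

The last line reproduces the kind of averaging reported later for Table 3: with 53 month-to-month links and a final chained value of 0.766, the average monthly index is about 0.995 and the average annual index about 0.941. Passing `screen=(0.5, 2.0)` mimics the outlier rule applied to the ACNielsen data in section 4.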
For high-tech products undergoing rapid quality change, such as computers, manufacturers tend to decrease prices when changing models. For example, a newly developed chip with higher speed may be cheaper to produce than the existing slower chips. In this case the matched model index does not capture the price decrease and so is biased upward. In fact, many studies of computer prices confirm that matched model indices underestimate the price decline relative to hedonic indices. With this qualification in mind, we try to minimize the quality change bias with high-frequency sampling, so that the index resulting from a matched model is a close approximation to a hedonic index. In the limiting case where products are identical in the two periods, with no quality change, a matched model index will, in theory, give the same result as a hedonic index.6

6 See Triplett (2000) and Diewert (2001). Triplett (2004) also discusses the cases when hedonic and matched indices differ.

An initial count of the ACNielsen data set shows average month-to-month matches of
80%. With such a high percentage of matching, it is hoped that the matched model will give a satisfactory quality-adjusted price index for prepackaged software in Canada. After matching the products in each category for two successive months, the Laspeyres, $P_L$, and Paasche, $P_P$, price indices are calculated as follows:

$$P_L = \frac{\sum_i p_i^2 q_i^1}{\sum_i p_i^1 q_i^1} \qquad (1)$$

$$P_P = \frac{\sum_i p_i^2 q_i^2}{\sum_i p_i^1 q_i^2}, \qquad (2)$$

where $p_i^t$ and $q_i^t$ are the price and quantity of product $i$ sold in period $t$, $t = 1, 2$. The implicit Laspeyres quantity index is defined as

$$\tilde{Q}_L = \frac{\sum_i p_i^2 q_i^2}{P_L \sum_i p_i^1 q_i^1} \qquad (3)$$

and similarly for the implicit Paasche quantity index $\tilde{Q}_P$.7 Definition (3) reflects a desirable property: the product of the price and quantity indices equals the expenditure ratio of the two periods.

It is well known in index number theory that there exists a cost-of-living index (COLI) bounded between the Laspeyres and Paasche indices.8 We approximate this COLI with the Fisher ideal index, $P_F = \sqrt{P_L P_P}$, a superlative index that is simply the geometric mean of $P_L$ and $P_P$.9 Together with the maximum overlap procedure, the Fisher price index provides a close approximation to the quality-adjusted COLI. It should be pointed out that the Laspeyres and Paasche indices are consistent in aggregation; that is, a two-stage index computed from the price and quantity subindices of each category with the same index formula yields the same result as a single-stage index using all individual prices and quantities. The Fisher index does not satisfy this property, but the discrepancy has been found to be small in practice (Diewert 1978).

7 The implicit Laspeyres (Paasche) quantity index can be shown to be exactly the Paasche (Laspeyres) quantity index.

8 Note that the bounds do not require the assumption of homotheticity. The same bounds apply to a producer price index.

9 A superlative index is derived from an expenditure (cost) function that has a flexible functional form; that is, it can approximate an arbitrary function to the second order. For this reason a superlative index mitigates substitution bias. See Diewert (1976).

3. Data description

The data obtained from ACNielsen are part of its Computer Product Index for the personal computer market. Data are collected electronically (scanner) and manually from 28 major chain stores in addition to a smaller sample of
independent outlets across Canada. The sample also includes some online retailers.10 The ACNielsen data are monthly and cover the period from January 1996 through June 2000. Software products are categorized according to their function (e.g., games, word processing). Information about each product typically includes the software title, its sales volume, the total revenue generated, and its average price. The average price field is derived by dividing total revenue by sales volume. In 1996, for example, ACNielsen surveyed 1,730 software titles, which were in turn grouped into 34 categories, an average of about 51 titles per category. Each year new categories may be introduced to reflect new applications and to further refine and adjust the existing classifications. Additional information accompanying each title includes the language of the software and the operating system for which the application is designed. An additional field identifies the title as a special edition version. For example, ‘MICROSOFT WORD UPGRADE W95’ stands for the upgraded version of Microsoft Word for Windows 95. From the description of the title it is not always possible to determine the exact version of the software. As will be shown below, this can have significant consequences for the construction of a price index. The data do make it possible to circumvent two weaknesses noted by Gandal (1994) in his study, namely, the absence of market shares and the absence of transaction prices. Without market shares (quantity data) as weights, the computed price index can be significantly biased. The absence of transaction prices can also have an important effect: Brynjolfsson and Kemerer (1996) mention that the average discount can be as high as 30% and that the correlation between market price and list price was 0.88. International Data Corporation of Canada (IDC) provided the second data set.
Monthly data on transaction prices are available from January 2000 to December 2002 in Canada. IDC tracks approximately 2,000 items with the following variables: the vendor name, the software category, the sector of the purchaser (household, business, or government), the manufacturer number, the software description, the list price, and the street price (the resale price). The list price is quoted directly by the vendors (McAfee, Corel, Microsoft, Adobe, etc.). To obtain the street price, IDC uses an average resale cost per system from its contacts in the distribution channel. This average price is then marked up by an appropriate percentage to obtain the Canadian street price.11 The IDC data have the advantage of a unique identifier for each product and version, which makes matching very easy. Unfortunately, there are no accompanying quantity data from which to construct weights.

10 Information on how the samples are selected has not been disclosed by ACNielsen.

11 For details of the IDC sampling methods see Statistics Canada (2003).
TABLE 1 Data characteristics

Data set    Product description   Quantity   Transaction price   Site licences & shareware
ACNielsen   Software title        Yes        Yes                 No
IDC         Product code          No         Yes                 No
Therefore we use the average value weights for each category obtained from the ACNielsen data set to aggregate the IDC price ratios at the category level. The resulting price indices are therefore fixed basket indices rather than Fisher indices. Neither the ACNielsen nor the IDC data set, however, includes software pre-installed on new computers, software downloaded from the Internet, or site licences for government and business use. For example, the operating system sales recorded in retail stores are mainly upgrades rather than new purchases, and sales of office suites may largely represent small business purchases. Data on prices and quantities of shareware downloaded online are also difficult to collect. Table 1 summarizes the characteristics of the two data sets.

4. Empirical results

The Laspeyres, Paasche, and Fisher price indices for each category are initially computed by matching the product names in the ACNielsen data set. The resulting monthly indices for the majority of categories are found to be quite erratic, with the following common observations:

- The price index for a particular month is too high or too low. For example, the month-to-month Laspeyres and Paasche indices for the category ‘Document Management’ are 6.54 and 9.83, respectively, from January to February 1996.
- The spread between the Laspeyres and Paasche indices is unreasonably large. For example, the month-to-month Laspeyres and Paasche indices for the category ‘Mail’ are 1.47 and 0.51, respectively, from March to April 1997. The spread indicates dramatic changes in price or quantity (which affects the weights) for some products within that category.

These aberrations are observed in about half of the monthly indices in a number of categories, namely, Anti-virus, Database, Mail, Network Administration, and Software Utility. Normal price changes can be caused by sales promotions, changes in outlet shares, or the introduction of new product versions under the same names as the previous versions. Upon closer inspection, however, the observed erratic behaviour seems to be driven by data errors. The most common forms of error are:

- Negative prices.
- A sudden rise or fall in price in one month, followed by a return to normal levels afterward. The price fluctuations can be on the order of tenfold between two consecutive months.
- Fractional quantities with normal revenues, resulting in exceptionally high prices.

One possible explanation for the negative prices is that when some large orders were returned, the retailers registered a negative price instead of a negative quantity. This ‘bouncing’ effect in prices may cause the Laspeyres and Paasche indices to drift. To exclude the outliers from the data, the following screening test is imposed. For two successive months, the price ratio $p^2/p^1$ of a matched product is computed, and the product is excluded from the sample if $p^2/p^1$ is greater than 2 or less than 0.5. Table 2 shows the number of matched observations in all categories before and after the screening rule is applied to the 1997 data. Overall, about 10% of the observations are discarded, which does not seem to be a high price to pay in exchange for a more representative price index.

TABLE 2 Outliers count in 1997

Month      Observations before   Observations after   Excluded   % Excluded
1997–01          1565                  1406              159         10.2
1997–02          1549                  1462               87          5.6
1997–03          1420                  1268              152         10.7
1997–04          1470                  1317              153         10.4
1997–05          1449                  1291              158         10.9
1997–06          1447                  1309              138          9.5
1997–07          1405                  1277              128          9.1
1997–08          1449                  1325              124          8.6
1997–09          1386                  1238              148         10.7
1997–10          1414                  1239              175         12.4
1997–11          1403                  1234              169         12.0
1997–12          1423                  1260              163         11.5
Average          1448                  1302              146         10.1

Each category’s price index is recalculated using the screened data and then aggregated to form a single index for the prepackaged software sector. Table 3 shows the overall monthly bilateral and chained Fisher indices as well as the monthly percentage change. Since we have 54 monthly observations (53 month-to-month links), the average monthly index is calculated as the 53rd root of the last (June 2000) chained index. Similarly, the average annual index is the last chained index raised to the power 12/53. As expected, the Laspeyres index (not shown) provides an upper bound to the true COLI, with an average annual change of +18%. This is because the utility function underlying the index is of the Leontief form, $U(x) = \min_i \{x_i/a_i : a_i > 0\}$, based on the first-period purchase. That is, the buyer will purchase the same first-period basket of
TABLE 3 A Fisher price index for software

Month      Bilateral   Chained   % change
1996–01                 1.000
1996–02      1.057      1.057       5.7
1996–03      0.964      1.019      -3.6
1996–04      0.938      0.955      -6.2
1996–05      1.006      0.961       0.6
1996–06      1.031      0.991       3.1
1996–07      0.982      0.973      -1.8
1996–08      0.947      0.921      -5.3
1996–09      0.999      0.920      -0.1
1996–10      1.046      0.963       4.6
1996–11      0.964      0.928      -3.6
1996–12      1.020      0.947       2.0
1997–01      1.020      0.966       2.0
1997–02      0.949      0.917      -5.1
1997–03      0.991      0.908      -0.9
1997–04      1.023      0.929       2.3
1997–05      1.051      0.976       5.1
1997–06      0.955      0.933      -4.5
1997–07      1.017      0.949       1.7
1997–08      1.007      0.956       0.7
1997–09      0.980      0.936      -2.0
1997–10      1.035      0.969       3.5
1997–11      0.977      0.947      -2.3
1997–12      0.998      0.945      -0.2
1998–01      0.974      0.921      -2.6
1998–02      1.009      0.929       0.9
1998–03      0.990      0.920      -1.0
1998–04      1.034      0.952       3.4
1998–05      0.991      0.943      -0.9
1998–06      0.985      0.929      -1.5
1998–07      1.037      0.963       3.7
1998–08      0.954      0.918      -4.6
1998–09      1.006      0.924       0.6
1998–10      0.965      0.891      -3.5
1998–11      1.030      0.918       3.0
1998–12      1.004      0.922       0.4
1999–01      1.000      0.922       0.0
1999–02      0.977      0.901      -2.3
1999–03      0.958      0.863      -4.2
1999–04      1.019      0.880       1.9
1999–05      1.025      0.902       2.5
1999–06      0.947      0.854      -5.3
1999–07      1.017      0.868       1.7
1999–08      0.998      0.866      -0.2
1999–09      0.991      0.858      -0.9
1999–10      0.995      0.854      -0.5
1999–11      0.965      0.824      -3.5
1999–12      1.003      0.827       0.3
2000–01      1.017      0.841       1.7
2000–02      0.979      0.823      -2.1
2000–03      0.975      0.802      -2.5
2000–04      0.996      0.799      -0.4
2000–05      0.993      0.794      -0.7
2000–06      0.965      0.766      -3.5
Average monthly index               0.995      -0.5
Average annual index                0.941      -5.9
goods even if relative prices change in the second period. This is reflected in equation (1), where the first-period quantities $q_i^1$ are used as the weights for the prices in both periods. The buyer therefore does not react to any price change, resulting in an upward bias in the price index. On the other hand, the Paasche index assumes a Leontief utility function based on the second-period purchase. A similar argument shows that the Paasche index is biased downward, which is reflected in its average annual change of -24.9%. The Fisher index, as mentioned above, approximates the true COLI by accommodating the substitution effect, and from table 3 we see that its average annual change, -5.9%, lies between those of the other two indices. Figure 1 plots the overall Fisher price index over our sample period. The graph shows no noticeable seasonal pattern in the overall price index.

FIGURE 1 Fisher chained price index for software in Canada (vertical axis: index, 01/1996 = 1; horizontal axis: period, 01/96 to 01/00)

Tables 4 and 5 present the results for the matched model Fisher price indices, with market shares and price changes for Canada, in the business and consumer categories, respectively. From table 4 we see that the average annual change for the overall business price index is -4.4%. On an annual basis, the index registers its largest drop, 9.3%, between 1999 and 2000, and its smallest drop, 0.6%, between 1997 and 1998. The price changes are not uniform across software applications. Although most exhibited price declines, some have seen their prices increase. For instance, prices for electronic forms have fallen over 18% per year on average over the study period, the largest recorded drop among our business applications. In contrast, the spread of computer networking over the last decade is probably behind the average annual 18% increase in prices for those applications. Noteworthy among the price changes for business applications is the price behaviour of what are considered the major categories. For instance, the price of database applications, which rose 11% between 1996 and 1997, fell by 1.6% a year on average as a result of consecutive declines over the remainder of the study period. The price of spreadsheet applications also dropped by almost 10% a year, on average, over the study period, while prices for word processors registered only a slight average annual decline of 1.4%.

In table 5 we see that prices for consumer software titles also trend downward in most cases; prices for the overall consumer software market fell by 7.9% a year, almost twice as fast as prices for business applications. The most important category within the consumer classification is, not surprisingly, games, whose prices have fallen markedly since 1996, at an average annual rate of 11%. Notable price decreases are also recorded for
TABLE 4 Fisher price indices for business categories

Category                 Average        Average   Revenue     Monthly    Annual
                         observations   matches   share (%)   AGR* (%)   AGR (%)
Accounting                   44.3         32.1       3.1        0.18       1.3
Anti-virus                   53.4         44.4       2.9        1.20      13.8
CAD & CAM                    33.9         27.0       1.1        1.36      12.4
Communication                76.8         55.6       2.5        0.36       1.4
Database                    104.9         77.3       3.1        0.03       1.6
Desktop Publishing           46.2         36.2       1.7        0.68       6.7
Doc Management                7.9          6.4       0.0        1.01      12.4
Electronic Forms              6.5          4.4       0.0        1.59      18.7
Emulators                    60.7         46.5       1.6        0.03       0.7
Graphics                    107.9         81.6       4.5        0.11       0.4
Hardware Utility            122.6         93.8       4.1        1.04      12.1
Integrated                   29.0         21.8       1.2        0.57       6.0
Internet                     84.3         60.9       2.0        0.31       5.0
Mail                         83.2         61.7       3.9        1.28      14.1
Network Admin                35.8         23.2       0.7        1.58      15.8
Network Peer                 21.9         21.2       4.2        0.77       7.2
Networking                   28.1         26.7       6.3        1.97      18.5
Operating Systems            52.7         46.2      12.2        0.16       1.8
Presentation Graphics        63.4         55.5       0.7        1.31      12.2
Programming                 121.8         68.9       2.5        0.42       2.9
Project Management           32.1         27.7       2.5        0.36       2.0
Software Utility             83.2         57.1       1.2        0.04       1.8
Spreadsheet                  49.8         46.8       1.2        0.99       9.9
Suite                        96.8         69.3      30.2        0.56       6.0
Tax                          14.2         12.2       1.1        0.63       6.9
Word Processing              53.4         37.6       1.5        0.14       1.4
Total Business                                     100.0        0.52       4.4

* Note: AGR = average growth rate
other consumer software titles. For instance, anti-virus, education, and tax software prices all declined, on an average annual basis, by 13.5%, 15.5%, and 9.3%, respectively. However, not all classes of consumer titles fell in price over the study period. Communication, mail, and personal information management (PIM) software recorded average annual price increases of 22.9%, 9.1%, and 18.1%, respectively.13

13 Readers may notice that some classes of products appear in both the business and consumer categories. This is expected because of the existence of multi-purpose software titles, such as anti-virus software and operating systems. The ACNielsen data do not distinguish between software bought for home use and software bought for business use. For multi-purpose titles, we naively allocated half of the expenditures to each group. When possible, we have also attempted to separate the products and allocate them to their appropriate class. For instance, we do not believe that many consumers are buying or have bought Windows NT or Windows 2000; therefore, these products are found only in the business category.

Although both the business and consumer software markets exhibit overall downward price trends during the study period, the rates of change are not
Computer software price index
1011
TABLE 5 Fisher price indices for consumer categories Category
Average observations
Average matches
Anti-virus Communication Education Edutainment Emulators Encyclopedia Games Hardware Utility Internet Mail Operating Systems PIM Software Utility Tax
43.5 22.6 13.6 128.0 23.5 27.7 61.9 87.9 40.7 49.7 16.7 29.6 55.8 13.2
34.7 13.0 11.0 101.3 12.8 24.0 49.1 65.6 29.0 36.2 16.5 29.4 35.9 11.4
Total Consumer
614.2
469.8
Revenue share (%)
Monthly AGR* (%)
Annual AGR (%)
4.7 0.8 7.1 7.6 1.1 3.7 36.2 6.2 2.3 4.6 18.0 2.5 2.0 1.9
1.19 2.57 1.64 0.23 0.43 0.24 1.08 0.65 0.63 0.91 0.17 1.87 0.07 0.83
13.5 22.9 15.5 3.2 6.2 3.5 10.9 7.3 5.2 9.1 1.4 18.1 3.0 9.3
100.0
0.68
7.9
* Note: AGR ¼ average growth rate
identical. Consumer software prices declined at a much faster pace than those for business applications. Furthermore, software titles and applications within each broad category may exhibit very different price movements. Overall, our price index for business software applications falls on an average annual basis by 4.4%. Notice that this result, because of the broader coverage of our index and the different time periods, is not strictly comparable with previous studies. Nevertheless, it is interesting to note that using database, spreadsheet, and word processing software applications as proxies (as suggested by some) for the overall prepackaged software market can be misleading. For instance, as illustrated in table 4, these applications, despite registering in one case the largest price decline (spreadsheets, 9.1%), respectively accounted for 3.1%, 1.2%, and 1.5% of the revenue share of the business software market. By contrast, sales of office suites account for 30% of the market, yet no studies have examined the price behaviour of this product. Moreover, other categories within business also have relatively important shares of the market and exhibit differing price trends when compared with the overall category. For instance, programming, networking, and graphics applications with revenue shares of 2.5%, 6.3%, and 4.5%, respectively, have seen their prices either decline by very little or even increase over the study period, therefore slowing the price decline of the overall business field. Currently the U.S. Bureau of Economic Analysis (BEA) uses in their National Income and Product Accounts a matched model price index supplied by the BLS. (see Moulton, Parker, and Seskin 1999). An adjustment for bias of 3.15% per year is made to the index, which is one-half of the difference
between the Oliner-Sichel matched-model index and a hedonic index produced by the BEA from 1985 to 1993 for spreadsheets and word processors.14 But since tables 4 and 5 show that price movements differ from category to category, imposing a bias adjustment based on two types of software with small value shares seems risky. We also calculate a fixed-weight index from January 2000 to December 2002 using the IDC data for Canada. The result shows that the index increases at an average monthly rate of 0.1% (1.4% annually).

14 The data on characteristics used by the BEA in the regression, taken from National Software Testing Laboratories' Rating Reports, are no longer available from 1994 onward.

5. Comparing the matched model with hedonic indices

A review of the literature reveals little research in the area of software prices. This contrasts sharply with computer hardware prices, for which there has been a relative abundance of research by academics and practitioners alike over the last 20 to 30 years. Furthermore, the studies that do deal with software price behaviour have mostly limited themselves to a few applications, notably spreadsheets, databases, and word processors. Gandal (1994, 1995), in an empirical study that tests for network externalities in the U.S. computer spreadsheet market, estimates price indices for the U.S. from 1986 to 1991. Given the large number of features that were either improved or added to spreadsheets over the period, the author uses a hedonic regression approach with a log-linear specification to control for quality variations across time. Some of the characteristics used as explanatory variables are sorting, graph plotting, spreadsheet size, linking, and compatibility with Lotus 1-2-3, the most popular title at the time of the study. Gandal's 1994 study shows that the quality-adjusted price of spreadsheets declines at an average annual rate of 15%.

The Gandal results are compatible with those obtained by Brynjolfsson and Kemerer (1996), who, while studying the effects of network externalities with a hedonic approach for the 1987–92 period, arrive at an annual price decline of 16% for spreadsheets. Using all the characteristics in both Brynjolfsson and Kemerer (1996) and Gandal (1994), McCahill (1997) studies spreadsheet prices from 1986 to 1993 and finds that, on average, their prices fall by 9.6% annually. In the same study, he finds that prices of word-processing software decline by 18.5% a year from 1985 to 1994. The rapid price declines reported in these studies, all of which use unweighted regression models, seem excessive. It is possible that the less popular models are cheaper or decline in price at a faster rate, so that the overall price indices are biased downward. Because of the limitations, mentioned previously, in producing hedonic price indices for software, Oliner and Sichel (1994) use a matched-model approach for three popular categories of software applications: word processors, spreadsheets, and databases. They find that the matched-model indices decline by an average of 2.7% a year for all three categories over the 1987–93 period.15 Harhoff and Moch's (1997) study of database prices in Germany from 1986 to 1994 finds that the hedonic price index declines by 7.41% a year, while matched-model indices at the version, product, and brand levels of aggregation fall by 9.25%, 4.36%, and 3.86% a year, respectively.

Moulton, Parker, and Seskin (1999) describe the sources of prices used to arrive at the deflators for their current-dollar estimates of business and government purchases of software. Price data for software are obtained from the following sources: BEA hedonic price indices for 1985–94 for business applications; matched-model indices for selected types of software, including spreadsheets, databases, and word processing; matched-model price indices for 1985–93 developed by Oliner and Sichel; and, beginning in 1997, a BLS producer price index (PPI) for applications software that is also based on prices of matched models. In 1998 the U.S. Bureau of Labor Statistics (BLS) started producing a subindex for computer software and accessories in the CPI based on a matched-model approach, including overlap and production cost approaches. Overall, the comparable average annual price changes are 5.8% and 8.1% for the U.S. and Canada, respectively, from December 1997 to June 2000, a period for which data are available in both countries. The difference can certainly be explained in part by the use of a different index formula and in part by differences between the two markets. Monthly price fluctuations in Canada seem to be slightly more volatile than in the U.S. This is perhaps because the U.S. subindex is broader in scope, including what the BLS terms 'other computer accessories,' which may exhibit price movements different from those of software.
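All of the matched-model indices discussed above, including the monthly chained Fisher index used in this paper, share one mechanical step: a price comparison between two periods uses only items observed in both. The sketch below illustrates that step under stated assumptions. The dict-based inputs and the function names (`fisher_link`, `chained_fisher`, `average_rates`) are hypothetical, the data are invented, and the "average rate" (geometric mean monthly change compounded to twelve months) is an assumed convention that may not match how the AGR columns in tables 4 and 5 were computed.

```python
from math import sqrt


def fisher_link(p0, q0, p1, q1):
    """Matched-model Fisher price link between two adjacent months.

    Each argument is a dict mapping a (hypothetical) item identifier to a
    price or a quantity for one month.
    """
    # Matched-model step: keep only items with a price and a quantity in
    # both months, so entering and exiting titles do not affect this link.
    matched = set(p0) & set(q0) & set(p1) & set(q1)
    if not matched:
        raise ValueError("no items are matched across the two months")
    laspeyres = (sum(p1[i] * q0[i] for i in matched)
                 / sum(p0[i] * q0[i] for i in matched))
    paasche = (sum(p1[i] * q1[i] for i in matched)
               / sum(p0[i] * q1[i] for i in matched))
    # Fisher = geometric mean of the Laspeyres and Paasche links.
    return sqrt(laspeyres * paasche)


def chained_fisher(prices, quantities):
    """Chain the monthly Fisher links into an index with the first month = 1."""
    index = [1.0]
    for t in range(1, len(prices)):
        link = fisher_link(prices[t - 1], quantities[t - 1],
                           prices[t], quantities[t])
        index.append(index[-1] * link)
    return index


def average_rates(index):
    """Assumed convention: geometric mean monthly change, compounded to a year."""
    months = len(index) - 1
    monthly = (index[-1] / index[0]) ** (1.0 / months) - 1.0
    annual = (1.0 + monthly) ** 12 - 1.0
    return monthly, annual


# Hypothetical three-month example: title C enters in month 1, B exits in month 2.
prices = [{"A": 100.0, "B": 50.0},
          {"A": 90.0, "B": 45.0, "C": 30.0},
          {"A": 81.0, "C": 27.0}]
quantities = [{"A": 10, "B": 20},
              {"A": 12, "B": 25, "C": 5},
              {"A": 12, "C": 6}]
index = chained_fisher(prices, quantities)
monthly, annual = average_rates(index)
```

Chaining adjacent months keeps each matched set as large as the data allow, but any title that appears in only one of the two months still drops out of that link. This is the limitation of matched models that the hedonic studies try to address.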
Table 6 summarizes these studies according to the approach used to control for quality change, the types of software covered, the country, the period of analysis, and the reported average annual price change. Almost all the studies cover the period from the mid-1980s to the early 1990s. As is usually the case, hedonic price indices, with the exception of Gandal (1995), generally decline more rapidly than their matched-model counterparts.16 The 4.4% average annual change in Harhoff and Moch's (1997) matched-model index is based on matching the product names of the packages. When the packages are matched by version in addition to name, they find that the index falls 9.25% annually, a faster decline than their hedonic result of 7.41%. The wide diversity of the results in the hedonic studies is what led Oliner and Sichel (1994) to conjecture that the application of hedonic techniques to software may be problematic.

15 In their study, spreadsheet prices fall at an average annual rate of 4.5% for the 1985–93 period.

16 As previously mentioned, Gandal (1995) examines the possibility of network externalities, and as a result his software price indices play a secondary role in the study. Therefore, the paper does not present the price indices per se. The numbers in table 6 are derived from the estimated time dummies in tables 3 and 4 of the paper.

TABLE 6 Comparison of price index studies in computer software
Study                             Method    Type              Country   Period      AAC (%)
Gandal (1994)                     Hedonic   Spreadsheet       U.S.      1986–91      15.0
Gandal (1995)                     Hedonic   Spreadsheet       U.S.      1989–91       4.4
Gandal (1995)                     Hedonic   Database          U.S.      1989–91       1.5
Brynjolfsson and Kemerer (1996)   Hedonic   Spreadsheet       U.S.      1987–92      16.0
McCahill (1997)                   Hedonic   Spreadsheet       U.S.      1986–93       9.6
McCahill (1997)                   Hedonic   Word processing   U.S.      1985–94      18.5
Harhoff and Moch (1997)           Hedonic   Database          Germany   1986–94       7.4
Harhoff and Moch (1997)           Matched   Database          Germany   1986–94       4.4
Oliner and Sichel (1994)          Matched   Word processing   U.S.      1985–93       2.6
Oliner and Sichel (1994)          Matched   Spreadsheet       U.S.      1985–93       4.5
Oliner and Sichel (1994)          Matched   Database          U.S.      1985–93       4.7
BLS                               Matched   General           U.S.      1998–2000     6.6
Note: AAC = average annual price change

A hedonic approach is not feasible in our present study for at least three reasons. First, details of the characteristics of each product are not available in our data set. Second, each hedonic study normally requires some product knowledge, so a certain amount of research effort is needed to construct a hedonic index for any type of software. Given the number of categories available (34 in 1996), it is impractical to apply the hedonic technique to each individual category in order to arrive at an aggregate price index for the software sector. Third, products in some categories, such as games and education, are very diverse in nature: they are grouped under the same category not because they perform a common, well-defined task, as spreadsheets and word processors do. It is therefore difficult to find a common set of characteristics in these categories that would make a hedonic exercise feasible. Eurostat (1999) reports that none of the participating countries in Europe uses the hedonic method to compute the price index for software.

Even in the computer software literature, the notion of quality is nebulous. Software quality evaluations are mostly based on subjective judgment by experts in the field (Rosqvist, Koskela, and Harju 2003). For example, software designers use different attributes, called metrics, to predict faults in program modules; commonly used metrics include the number of operators and the information content of operators. One measure of quality is the number of problems or faults discovered during a system test that resulted in a change to the code (Khoshgoftaar and Allen 2001). Others define quality in terms of reliability, maintainability, understandability, testability, fault tolerance, and so on (Bieman 2003).
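Footnote 16 notes that the Gandal (1995) entries in table 6 are derived from estimated time dummies. In a log-linear time-dummy hedonic model, ln p_it = α + Σ_k β_k z_ik + Σ_t δ_t D_t + ε_it, the quality-adjusted price level in period t relative to the base period is exp(δ_t). The sketch below converts such coefficients into an index and a geometric average annual rate of change. The coefficient values and function names are hypothetical, it omits the small-sample corrections some authors apply, and it is not a reconstruction of any study in table 6.

```python
import math


def index_from_time_dummies(deltas):
    """Price index implied by time-dummy coefficients of a log-linear hedonic model.

    deltas[t] is the estimated coefficient on the period-t dummy, with the
    base period's coefficient normalised to 0, so index[t] = exp(deltas[t]).
    """
    return [math.exp(d) for d in deltas]


def average_annual_change(index, years):
    """Geometric average annual rate of change from the first to the last value."""
    return (index[-1] / index[0]) ** (1.0 / years) - 1.0


# Hypothetical coefficients for a base year followed by two annual periods.
deltas = [0.0, -0.05, -0.09]
index = index_from_time_dummies(deltas)
rate = average_annual_change(index, years=len(deltas) - 1)
```

With these invented coefficients the implied decline is about 4.4% a year. Only the first and last coefficients matter for this endpoint-based average, which is one reason summary figures like those in table 6 can hide very different paths in between.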
FIGURE 2 Comparison of IDC, ACNielsen (ACN), and BLS price indices, January to June 2000 (01/2000 = 1)
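Figure 2 places the three series on a common base (01/2000 = 1) so that indices originally published on different bases can be compared directly. A minimal sketch of that rebasing step follows, using hypothetical index levels rather than the actual IDC, ACNielsen, or BLS values; the `max_gap` helper is an illustrative divergence summary of our own choosing, not a measure used in the paper.

```python
def rebase(series, base_period):
    """Rescale a series so that its value in base_period equals 1."""
    base = series[base_period]
    if base == 0:
        raise ValueError("base-period value must be non-zero")
    return {period: value / base for period, value in series.items()}


def max_gap(a, b):
    """Largest absolute difference between two rebased series over shared periods."""
    common = set(a) & set(b)
    return max(abs(a[p] - b[p]) for p in common)


# Hypothetical index levels on two different original bases.
series_x = {"01/00": 120.0, "02/00": 118.8, "03/00": 117.6}
series_y = {"01/00": 95.0, "02/00": 95.95, "03/00": 96.9}
x = rebase(series_x, "01/00")
y = rebase(series_y, "01/00")
gap = max_gap(x, y)
```

Rebasing changes only the scale of each series, not its month-to-month growth rates, so any divergence visible after rebasing reflects genuine differences in weights and coverage.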
There is a six-month overlap period for the ACNielsen, IDC, and BLS data. In figure 2 we plot the three price series from January to June 2000. The divergence between the ACNielsen and IDC series illustrates the importance of appropriate value weights and adequate coverage of product prices. Overall, the IDC data contain far fewer observations than the ACNielsen data.

The results from using matched modelling for software are probably reasonable, at least with regard to the later years. Most spreadsheets and word processors are quite compatible with each other these days, and the new features added to the more recent versions of the more popular software applications seem to be marginal. For instance, Keizer (2001), in a review of Office XP, claims that the latest edition of the world's leading productivity suite does not take us light-years ahead of Office 2000, its previous incarnation. Costa de Oliveira (1997), taking a more formal approach, shares Keizer's view, arguing that software vendors gain no added value from continuing to invest in ease-of-use enhancements in order to attract new users. Users are probably paying more today for their software, but the marginal benefits of the additional features may accrue only to a few (the so-called power users). This is certainly the case when one weighs the benefits of the new features against the costs associated with not having the software.

6. Conclusions

We have obtained price and quantity data from ACNielsen for software products in Canada. Our initial investigation shows that, given the data content and the complexity of the problem, it is impractical at this stage to carry out a full hedonic study in this sector. Some of the defects in the data have justified a sensible but unavoidably arbitrary elimination of outliers, while a major remaining defect is the absence of precise product specifications, so that, in an unknown proportion of cases, price ratios for different products have entered the index as though they referred to the same product. With these caveats in mind, we believe that we have produced a 'best practice' price index using more detailed data than were previously available. Matched-model results using a monthly chained Fisher index indicate that, from January 1996 to June 2000, prices of prepackaged software in Canada declined at an average annual rate of 5.9% overall and 4.4% for business and government software applications; during the same period, consumer software prices fell at an average annual rate of 7.9%. The results are comparable, at least with regard to the direction of the movements, to other matched-model indices, particularly the one produced by the BLS for the CPI in the U.S. Published hedonic studies on spreadsheets and databases show high variability in the price indices. One study using German data shows that the direction of the bias of the matched model relative to the hedonic approach depends on whether or not different versions of the same products are matched. Our data set includes limited information in this regard; therefore, it is difficult to judge the direction and degree of the bias, if any, in our matched-model index.

References

Alterman, William F., W. Erwin Diewert, and Robert C. Feenstra (1999) International Trade Price Indexes and Seasonal Commodities (Washington, DC: Bureau of Labor Statistics)
Bieman, James (2003) Editorial, 'The illusive nature of quality,' Software Quality Journal 11, 7–8
Brynjolfsson, Erik, and Chris F. Kemerer (1996) 'Network externalities in microcomputer software: an econometric analysis of the spreadsheet market,' Management Science 42, 1627–47
Costa de Oliveira, Eduardo (1997) 'Growing software: an economic analysis,' http://citeseer.ist.psu.edu/362244.html
Diewert, W.E. (1976) 'Exact and superlative index numbers,' Journal of Econometrics 4, 115–45
–– (1978) 'Superlative index numbers and consistency in aggregation,' Econometrica 46, 883–900
–– (2001) 'Hedonic regressions: a consumer theory approach,' Discussion Paper No. 0112, Department of Economics, University of British Columbia
Eurostat (1999) Report of the Task Force on Volume Measures for Computers and Software, B1/CN 408e (Luxembourg: Eurostat)
Festa, Paul (2001) 'The root of the problem: bad software,' http://news.com.com/20081082-276316.html?legacy=cnet
Gandal, Neil (1994) 'Hedonic price indexes for spreadsheets and an empirical test for network externalities,' RAND Journal of Economics 25, 160–70
–– (1995) 'Competing compatibility standards and network externalities in the PC software market,' Review of Economics and Statistics 77, 599–608
Harhoff, Dietmar, and Dietmar Moch (1997) 'Price indexes for PC database software and the value of code compatibility,' Research Policy 26, 509–20
Hawkes, William J., and Frank W. Piotrowski (2000) 'Using scanner data to improve the quality of measurement and the measurement of quality in the Consumer Price Index,' unpublished manuscript, ACNielsen Company, 16 September 2000
Hollanders, Hugo, and Huub Meijers (2001) 'Quality-adjusted prices and software investments: the use of hedonic price indexes,' http://www.merit.unimaas.nl/publications/docs/QualityAdjustedPricesAndSoftwareInvestments.pdf
Ioannides, Christos, and Mick Silver (2003) 'Chained, exact and superlative hedonic price changes: estimation from microdata,' Applied Economics 35, 1005–14
Jackson, Chris (2000) 'The capitalization of software in the National Accounts,' Statistics Canada Report
Jorgenson, Dale W. (2001) 'Information technology and the U.S. Economy,' American Economic Review 91, 1–32
Keizer, Gregg (2001) 'Microsoft Office XP: Review,' http://reviews.cnet.com/Microsoft_Office_XP/4514-3524_7-5152705.html
Khoshgoftaar, Taghi M., and Edward B. Allen (2001) 'Empirical assessment of a software metric: the information content of operators,' Software Quality Journal 9, 99–112
Lowe, Robin (1998) 'Televisions: quality changes and scanner data,' in Proceedings from the Fourth Meeting of the International Working Group on Price Indices (Washington, DC: U.S. Department of Labor, Bureau of Labor Statistics)
McCahill, Robert John (1997) 'A Hedonic Study of Prepackaged Software,' M.A. thesis in economics, Virginia Polytechnic Institute and State University
Moulton, Brent R., Robert P. Parker, and Eugene P. Seskin (1999) 'A preview of the 1999 comprehensive revision of the National Income and Product Accounts,' U.S. Department of Commerce, Survey of Current Business 79, 7–20
Oliner, Stephen D., and Daniel E. Sichel (1994) 'Computers and output growth revisited: how big is the puzzle?' Brookings Papers on Economic Activity 2, 273–317
Parker, Robert, and Bruce Grimm (2000) 'Recognition of business and government expenditures for software investment: methodology and quantitative impacts,' paper presented at the 5 May 2000 meeting of the BEA Advisory Committee
Rosqvist, Tony, Mika Koskela, and Hannu Harju (2003) 'Software quality evaluation based on expert judgement,' Software Quality Journal 11, 39–55
Statistics Canada (2003) Information and Communication: Concepts and Methods, Catalogue No. 62-014-XIE (Ottawa: Minister of Industry)
Triplett, Jack (1997) 'Measuring consumption: the post-1973 slowdown and the research issues,' Federal Reserve Bank of St Louis Review May/June, 9–42
–– (2000) Handbook on Quality Adjustment of Price Indexes for Information and Communication Technology Products (Paris: OECD)
–– (2004) 'When do hedonic and matched model indexes give different results? And why?' paper presented at the SSHRC Conference on Index Number Theory and the Measurement of Prices and Productivity, Vancouver, B.C., 30 June – 3 July
Turvey, Ralph (2001) 'Consumer Price Index methodology,' http://www.turvey.demon.co.uk/index.htm
Varian, Hal (1993) 'Economic incentives in software design,' Working Paper, University of Michigan, www.sims.berkeley.edu/~hal/Papers/Software.pdf