Free-to-Paid Transition of Newspapers: How to Manage the Negative Impact of Online Paywalls on Web Traffic

June 2018

Ho Kim, Assistant Professor of Marketing, University of Missouri-St. Louis, College of Business Administration, 1 University Blvd, St. Louis, MO 63121, USA. Email: [email protected]

Reo Song, Associate Professor of Marketing, California State University, Long Beach, College of Business Administration, 1250 Bellflower Blvd, Long Beach, CA 90840, USA. Email: [email protected]

Youngsoo Kim, Assistant Professor of Information Systems, Singapore Management University, School of Information Systems, 80 Stamford Road, Singapore, 178902, Singapore. Email: [email protected]

ABSTRACT

Many newspapers have recently introduced a paid digital subscription plan, or paywall. The introduction of a paywall, an endeavor to move from a free to a fee-based content business, raises important strategic questions. This paper investigates the impact of a newspaper paywall on daily pageviews and how newspapers can manage this impact through content strategies and temporary removal of paywalls. Examining 42 cases where U.S. newspapers have introduced paywalls over the last 8 years, we find that paywall introduction has a negative long-term impact on daily pageviews for most newspapers. More important, we find that newspapers can mitigate the negative impact by allocating more space to politics, business/economics, sports, and general social issues, and less space to technology/science and lifestyle/entertainment content. Newspapers can also reduce the negative impact by provisioning more unique content on their websites. Furthermore, we find that temporary removal of a paywall significantly increases web traffic during the removal periods. Our findings offer significant implications for newspapers considering a free-to-paid transition.

Keywords: free-to-paid transition; online paywall; content policy; content uniqueness; countercyclical offering

Introduction

In order to generate a new revenue stream, many online publishers have recently added a paid digital subscription plan, also called a paywall, to their original free services. For example, as of May 2015, nearly 78% of the top 100 newspapers in the U.S. had introduced a paywall (Williams 2016). In the U.K., newspapers such as The Times of London and the Financial Times have erected paywalls as well. Many professional magazines have also moved from a free to a paid online content model (Pauwels and Weiss 2008).

This paper investigates how newspapers can manage the impact of a paywall on web traffic. At the core of this question lies newspapers’ fear of losing digital advertising revenue, which is directly related to web traffic. For example, The Times of London lost nearly 43% of unique visitors (from 2.81 million to 1.61 million) in the three months following the introduction of its paywall (The Huffington Post 2010). Reduced web traffic may also have an indirect but potentially more devastating long-term impact on the website, as web traffic is an important factor that market research firms (e.g., similarweb.com and alexa.com) use to rank websites.

Thus, we investigate how an online newspaper’s switch to a paid digital subscription policy affects its daily pageviews, which are an important measure of website traffic and online advertising revenue. We examine whether paywall introduction has a persistent, long-term effect on daily pageviews and how newspapers can influence the long-term impact of their paywalls through content and pricing policy. We consider the content composition (i.e., the proportion of various news sections), content uniqueness, political slant of a newspaper, the level of digital edition price, the number of monthly free articles,¹ and paywall introduction timing. In doing so, we control for other relevant variables such as circulation volume and reader demographics.

Another important issue regarding a paywall is a temporary reversion to a free business model, or more aptly, whether to temporarily open the website to the public when an important event happens. Indeed, some newspapers have temporarily removed paywalls in the past for various reasons. For example, the New York Times removed its paywall for high-impact news events such as the presidential election and for coverage of natural disasters such as Hurricane Sandy, which hit the East Coast. We examine the effect of such temporary removals of a paywall on daily pageviews.

To explore these issues, we use data on recent cases of U.S. newspapers’ paywall introduction. Our data set consists of 42 paywalled newspapers, their content characteristics (content composition and uniqueness), paywall timings and price levels, reader demographics, and their daily pageviews from January 1, 2010 to December 31, 2017. As all the newspapers in our data set introduced a paywall before the end of 2014, the data span is sufficiently long to examine the long-term impact of a paywall on pageviews. We apply rigorous econometric models to isolate the true effects of a paywall from various confounding factors, including daily news importance, weekday seasonality, and long-run exogenous trends in daily pageviews observed commonly across multiple newspapers.

We find that paywall introduction has a negative impact on daily pageviews for most newspapers (35 of 42 newspapers) at the 1% significance level; that is, the actual pageviews of a paywalled newspaper are significantly smaller than the counterfactual pageviews that would have been observed if the newspaper had not introduced a paywall. The negative impact varies greatly across newspapers, ranging from -10% (The Fresno Bee) to -54% (Akron Beacon Journal). For another five newspapers, our data do not provide enough evidence that pageviews decreased as a result of paywall introduction. For two newspapers, pageviews have
increased as a result of paywall introduction, although the increase was not substantial (about 5%). In sum, while most newspapers experience a loss in pageviews, the impact of a paywall on daily pageviews varies across newspapers.

This finding introduces a second question: how can newspapers manage the impact of a paywall on pageviews? To answer this question, we examine the effects of newspapers’ decision variables on paywall impact. We find that the paywall impact is influenced by content policy as well as price level, number of free articles, circulation volume, and paywall timing. Our findings imply that paywalled newspapers will perform better (in terms of daily pageviews) by allocating more space to politics, business/economics, sports, and general social issues and less space to technology/science and lifestyle/entertainment content. We also find that content uniqueness is an important factor; by provisioning unique content, newspapers can mitigate the reduction in pageviews due to paywall introduction. Political slant matters too: newspapers with a more conservative slant experience a smaller drop in pageviews. Not surprisingly, newspapers with a higher digital edition price undergo a larger drop in daily pageviews than those with lower prices, and newspapers with a smaller circulation volume see a larger drop than those with larger circulation volumes. Contrary to our expectation, newspapers that provide a larger number of monthly free articles also experience a larger drop in pageviews. This negative effect suggests that, once a paywall is introduced, free articles may attract user groups that differ from regular subscribers. Perhaps users who mainly consume free articles view fewer pages, so even if free articles increase web traffic, pageviews suffer. If advertising revenue is mostly determined by pageviews, newspapers are advised not to offer a large number of free articles. In addition, newspapers that adopted a paywall early (vs. late) experienced a larger drop in daily pageviews.

Lastly, we find that temporary removal of a paywall during impactful news
events increases the daily pageviews during the paywall removal period. This positive impact of a short-term paywall removal was significant even after controlling for various confounding factors such as the daily news importance. That is, temporary removal of a paywall increases daily pageviews above and beyond the level that would have been attained without the paywall removal.

Literature Review and Conceptual Framework

Free-to-Paid Transition

Broadly speaking, our study is related to online publishers’ free-to-paid transition, or the decision to adopt a paid online content model. Pauwels and Weiss (2008) examine the revenue impact of the free-to-paid transition of a professional magazine. They distinguish various sources of revenue losses and gains resulting from the adoption of a paid subscription plan, finding that the net revenue loss was minor. Chiou and Tucker (2013) use the paywall experiments of three local newspapers owned by Gannett. They find that, overall, the newspapers experienced a 51% drop in visits after the introduction of a paywall. This decline is large enough to raise concern for online content providers that are considering a free-to-paid transition. Pattabhiramaiah et al. (2017) examine the effects of the New York Times’ paywall introduction. They find that the paywall decreased the number of unique visitors by 13.1% but did not significantly affect reader engagement (e.g., visits, pages consumed, and duration per visitor).

Recent reports from the news media industry demonstrate the uncertainty about the consequences of adopting a paywall. Some newspapers boasted a significant increase in their digital subscription revenue (e.g., Ember 2016), while others reported a significant drop in web
traffic (e.g., Grant 2009, Halliday 2010, Sebastian 2013). In summary, academic research and industry reports have delivered diverging evidence on the impact of a free-to-paid transition, suggesting the importance of explaining the mixed findings. Furthermore, the academic studies so far have focused on a specific publisher (Pattabhiramaiah et al. 2017, Pauwels and Weiss 2008) or a small number of newspapers from the same owner (Chiou and Tucker 2013), leaving the generalization of the findings to future research. Our paper extends the previous research by analyzing a diverse set of newspapers and investigating factors that influence the effects of a free-to-paid transition.

Moderating Factors of Paywall Impact

Because consumers in the online content market are price sensitive (e.g., Ascarza et al. 2012, Shampanier et al. 2007), free-to-paid transitions are likely to decrease web traffic (e.g., Chiou and Tucker 2013, Pattabhiramaiah et al. 2017). However, the literature also suggests that a paywall’s impact on web traffic (such as visitors and pageviews) may differ across newspapers due to various factors. We focus on content design as an important product policy of a newspaper, and briefly discuss the role of brand reputation, timing of paywall adoption, and price policies.

Content Policy. Different news topics may have different importance to different readers. For example, only 17% of U.S. adults are active science news consumers, while 49% are uninterested consumers who read science news infrequently (Pew Research Center 2017). Thus, media firms in a competitive market have an economic incentive to differentiate their content to better serve their readers (Littman and Bridges 1986). Kanuri et al. (2014) use differential content preference of readers to redesign a local newspaper’s content. They report that four months after the content redesign, the newspaper was able to increase the print price by
75% and maintain a similar level of print circulation. Because different news sections appeal differently to various segments of readers and because different segments of readers may react differently to a price change, newspaper content design will have a bearing on the impact of a paywall on daily pageviews.

In addition to different emphases on different topics, content uniqueness will influence the paywall impact. Nagle et al. (2016) show that consumers are less price sensitive when a brand has unique positioning, which suggests that newspapers with unique content may perform better with a paywall than newspapers with less unique content. The successful paywall introduction by the Wall Street Journal may be related to its provision of unique content. As a business newspaper, the Wall Street Journal focuses on business- and economics-related news and has a more unique positioning compared to other newspapers. The paper successfully introduced a paid digital subscription plan as early as 1997, when most newspapers (including the New York Times and USA Today) provided online content free of charge. The success of the Wall Street Journal, along with the two failed attempts of the New York Times (in 2005 and 2007), points to the importance of delivering unique content in coping with consumers’ reluctance to pay for online news content.

The importance of unique content can also be inferred from the finding that readers have a high interest in local news. Mitchell et al. (2015) report that nearly 90% of readers follow local news very or somewhat closely, and the National Newspaper Association (2014) finds that community newspapers are highly valuable to readers. Lacy and Sohn (1990) find a strong positive relationship between the local coverage of a newspaper (such as local government, sports, and businesses) and its circulation in the local area. The above studies suggest that content uniqueness may moderate the impact of a paywall on daily pageviews such that newspapers featuring a larger volume of unique content experience a
smaller drop in daily pageviews.

The political ideology of a newspaper may also be related to a paywall’s impact. Consumers tend to appreciate brands that are congruent with their own beliefs (Escalas and Bettman 2003) and prefer like-minded content. In the news media industry, the fit between the political leanings of a news outlet and the political views of a reader significantly influences the reader’s choice of news outlet (Pew Research Center 2014). In general, liberals are more active on the Internet and social media than conservatives (Rainie and Smith 2012), are more familiar with digital news outlets, and turn to more diverse sources for news about politics (Pew Research Center 2015). These behavioral patterns imply that when liberals are faced with a paywall at their main news outlet, they may be more willing than conservatives to find alternative (possibly free) news sources. As a result, the rollout of a newspaper paywall may have a greater impact on liberals than on conservatives, suggesting that the introduction of a paywall by a liberal newspaper may have a larger negative impact on its pageviews than would be the case for a conservative newspaper.

Brand Reputation. Brand equity literature argues that a product with higher brand equity can command a higher price than competitors with lower brand equity (Aaker 1991, 1996, Agarwal and Rao 1996, Sethuraman 2000, Sethuraman and Cole 1997), while maintaining a larger market share (Chaudhuri and Holbrook 2001). Thus, newspapers with better brand equity or reputation may experience a smaller drop in pageviews as a result of paywall introduction. Ailawadi et al. (2003) propose that the brand equity of a product not only has a direct influence on the unit sales of the product, but also moderates the effect of marketing mix on unit sales. To summarize, print circulation volume, as a measure of a newspaper’s brand reputation, may moderate the impact of paywall introduction on pageviews such that
newspapers with a larger print circulation volume experience a smaller drop in pageviews.

Paywall Timing, Price, and Free Articles. The impact of a newspaper’s paywall on its pageviews may also be affected by the timing of its introduction relative to that of other firms in the industry. Consumers are less price sensitive when there is a smaller number of substitutes (e.g., Gumus et al. 2016). Reference price research suggests that consumers use the prevailing industry price to form their reference price, which affects their product choice (Mazumdar et al. 2005). Therefore, newspapers that adopt a paywall late (vs. early) will likely experience a smaller drop in pageviews, because late movers have a smaller number of free news sources to compete with, and readers’ reference price for online news may have been adjusted (from zero to some positive value) due to the earlier paywalls.

The digital edition price of a newspaper may worsen the impact of a paywall, in line with the law of demand; that is, a newspaper with a higher digital edition price will experience a larger drop in daily pageviews than a newspaper with a lower price.

Finally, the number of monthly free articles can affect the paywall impact, but the direction is not as clear as that of paywall timing and digital edition price. On the one hand, free articles may mitigate the negative impact of a paywall on pageviews because a higher number of free articles allows nonsubscribers to read more articles. On the other hand, free articles may exacerbate the negative impact of a paywall. Audiences attracted by free articles are “casual” or “incidental” (versus regular) readers whose news consumption needs are generally satisfied by the limited number of free articles of the newspaper that they visit. This means that as a newspaper increases the number of free articles, a larger number of occasional readers who would otherwise become subscribers will continue to be nonsubscribers and occasional visitors of the newspaper website. In this case, a newspaper may mitigate the negative impact of a
paywall on daily pageviews by decreasing the number of free articles; that is, the number of free articles will have a negative impact on daily pageviews.

Temporary Removal of a Paywall

Newspapers temporarily remove paywalls for important news events. For example, the New York Times removed its paywall for various events including the 2016 presidential election, and the Wall Street Journal removed its paywall for inclement weather conditions such as hurricanes. In spite of the importance of the topic, research is scant on the effect of temporary removal of a paywall (i.e., temporarily reverting to a free online content plan). Broadly speaking, temporary removal of a paywall is related to countercyclical offering (Lambrecht and Misra 2017) and to the seemingly puzzling strategy of increasing prices amid declining demand in the newspaper industry (Pattabhiramaiah et al. 2018). Lambrecht and Misra (2017) suggest that when consumers’ valuation of content is heterogeneous, firms should increase the share of free content during periods of high demand. On the other hand, Pattabhiramaiah et al. (2018) show that newspapers “had to” increase subscription prices mainly because of their declining ability to earn surplus from advertisers. Our context of temporary removal of a paywall is different from that of the two previous studies because the removal of a paywall is usually very short, lasting no more than several days.

Confounding Factors

There are confounding factors that may affect daily pageviews. While they are not the focus of our analysis, they need to be addressed to avoid bias in estimating the impact of a paywall. Furthermore, controlling for those confounding factors may provide additional insights into marketing strategies such as temporary paywall removal. We consider three types of confounding factors. The first is daily news importance. Lambrecht and Misra (2017) find that web traffic to
ESPN.com increases during sports seasons. For the newspaper industry, this finding implies that an important news event by itself can attract even the most casual news readers. That is, regardless of the paywall removal, newspapers are likely to observe abnormally large web traffic and pageviews when important news breaks. Second, there may exist a long-run exogenous trend in daily pageviews, which is not shaped by firm strategies. For example, Pauwels and Weiss (2008) show that a long-run trend exists in “free” digital subscription of a professional magazine, not influenced by marketing activities. It is crucial to account for the exogenous trends in daily pageviews of newspapers to accurately estimate the true paywall impact. Third, it is well known that day-of-week seasonality affects website traffic in many industries. Newspaper websites may also be subject to day-of-week seasonality.

Figure 1 represents the conceptual framework of the above reasoning. First, the introduction of a paywall will affect daily pageviews in the long run as some readers will turn away from the newspaper. However, the effect size will vary, moderated by newspaper content policy, price level, free articles, print circulation volume, and paywall timing. Second, temporary removal of a paywall may affect the daily pageviews in the short run. Third, some confounding factors that are not controllable by newspapers may affect daily pageviews.

==Figure 1 about here==

Data

Daily Pageviews

The initial data collection included the top 60 U.S. newspapers (by print circulation volume, as of 2010) that had adopted a paywall as of May 2015. The initial data set included two national
newspapers (The Wall Street Journal and The New York Times) and 58 local newspapers (e.g., the Los Angeles Times, the Chicago Tribune). We excluded some newspapers from our analysis for at least one of the following reasons: (1) some newspapers changed their website address during our analysis period and/or built separate websites for paid content, creating a data discontinuity problem; (2) some newspapers use a subdomain as their home page, but our data provider reports web traffic only at the domain level; (3) some newspapers adopted an online paywall before our analysis period (e.g., The Wall Street Journal); (4) one newspaper (The Washington Post) adopted a new business model after being acquired by Amazon founder Jeff Bezos (Marx and Clark 2014)²; (5) some newspapers reverted to a non-paywall model; (6) some newspapers are not in English. This selection process resulted in 42 paywalled newspapers for our analysis.³ For each of these 42 newspapers, we collect the daily pageviews (per 1 million Internet users) of their websites from Alexa.⁴ All the newspapers in our sample adopt metered, leaky paywall systems and rely on both advertising and subscription revenues from online and print editions.

Table 1(a) summarizes the descriptive statistics of the daily pageviews. It suggests that the distribution of daily pageviews is highly skewed to the right, and that a day-of-the-week seasonality exists. Figure 2 illustrates the daily pageviews of three newspapers (The New York Times, Los Angeles Times, and Chicago Tribune) along with the times at which they introduced paywalls. The daily pageviews of the three newspapers show some notable patterns. First, the daily pageviews seem to follow a common long-run trend: they start with an upward trend that lasts until the second quarter of 2011, decline until the second quarter of 2013, and jump suddenly around the beginning of the third quarter of 2013. Unless properly controlled for, this nonlinear long-run trend can confound the long-term impact of a paywall. Second, the daily pageviews show occasional, short-lived spikes. The
spikes correspond to big news events such as the Japanese earthquake and tsunami (March 11, 2011), the death of Steve Jobs (October 5, 2011), and the reelection of President Barack Obama (November 7, 2012). Third, consistent with the day-of-week seasonality in Table 1, the pageviews show a regular, relatively short-term pattern. Researchers need to control for these confounding factors to identify the true paywall impact on daily pageviews.

==Table 1 about here==

==Figure 2 about here==

Newspaper Characteristics and Reader Demographics

For each of the 42 newspapers, we collect the following variables: the newspaper’s characteristics (the moderating variables in Figure 1) and the demographic profile of the newspaper’s reader base. We collect the usual demographic variables (the distribution of the readers’ age, educational attainment, and income) from the Alliance for Audited Media (AAM) and Nielsen-Scarborough. The newspaper characteristics include the proportion of different news sections of a newspaper, content uniqueness, political slant, print circulation volume, digital edition price, and timing of paywall introduction.

We calibrate the composition of different news sections and the content uniqueness of a newspaper by applying Latent Dirichlet Allocation (LDA) topic modeling (Blei et al. 2003)⁵ to online news articles of the focal newspapers. Web Appendices W1 and W2 describe the process in detail, and Web Appendix W3 summarizes the calibrated content uniqueness index and content composition of the 42 newspapers. The political slants of the newspapers are collected from Gentzkow and Shapiro (2010). The digital edition prices of newspapers come from the newspapers’ AAM reports and their subscription webpages. The print circulation volume comes from the AAM report. We use the
print circulation for the year 2010, the most recent circulation before the focal newspapers in our sample introduced paywalls. The timing of paywall introduction is collected from individual newspapers’ announcements and news articles that cover the related paywall stories. For the paywall introduction timing of a newspaper, we use the number of newspapers that had already introduced a paywall when the focal newspaper erected its own paywall. Table 1(b) shows the descriptive statistics of newspaper characteristics and demographic variables.

Table 2 shows the correlation coefficients of newspaper characteristics, our key moderating variables. Print circulation volume is negatively correlated with content uniqueness (correlation coefficient = -0.35 for the 42 newspapers), implying that small newspapers (newspapers with a small circulation volume) tend to allocate more space to unique content than do large newspapers. Newspapers with a large print circulation volume tend to introduce paywalls earlier than do small newspapers (correlation coefficient = -0.21) and tend to be more expensive (correlation coefficient = 0.21). When it comes to topic emphasis, large newspapers tend to allocate more space to political articles (correlation coefficient = 0.17), but less space to general social issues (correlation coefficient = -0.12). We also find correlations between content uniqueness and topic emphasis. Newspapers that allocate more space to political news and sports news tend to be less content-unique, which is reasonable considering those topics’ popularity across all populations.

==Table 2 about here==
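The paywall timing measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the data set: the newspaper names and dates are hypothetical, the function name `num_prior_paywalls` is introduced here, and counting only strictly earlier paywalls (so that same-day adopters do not count each other) is an assumption the text does not settle.

```python
from datetime import date


def num_prior_paywalls(paywall_dates):
    """Count, for each paper, the other papers with a strictly earlier paywall date.

    paywall_dates maps a newspaper name to its paywall introduction date.
    Tie handling (same-day adopters are not counted) is an assumption.
    """
    return {
        paper: sum(
            1
            for other, other_date in paywall_dates.items()
            if other != paper and other_date < own_date
        )
        for paper, own_date in paywall_dates.items()
    }


# Hypothetical introduction dates, not taken from the study's data.
example_dates = {
    "Paper A": date(2011, 3, 28),
    "Paper B": date(2012, 3, 5),
    "Paper C": date(2012, 11, 1),
    "Paper D": date(2012, 3, 5),
}

counts = num_prior_paywalls(example_dates)
for paper, count in sorted(counts.items()):
    print(paper, count)
```

Under this definition a larger count means a later adopter, so the earliest paper in the set receives a count of zero.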

Model

Impact of Paywall on Pageviews

Let $PV_{it}$ be the log-transformed daily pageviews of newspaper $i$’s website on day $t$ and $\mathit{Paywall}_{it}$ be a dummy variable that takes value 0 during the newspaper’s non-paywall period and 1 during its paywall period. The data (Figure 2), as well as the conceptual framework (Figure 1), suggest that $PV_{it}$ is affected not only by the paywall decisions of the focal newspaper, but also by non-marketing factors such as daily news importance, seasonality, and long-run trends. Because these confounding factors are at work across all newspapers and potentially time-varying, we call them common factors and denote them by $\mathbf{F}_t$. We incorporate the common factors in our model to isolate the true paywall impact:

$$PV_{it} = \alpha_{i0} + \alpha_{i1}\,\mathit{Paywall}_{it} + \boldsymbol{\varphi}_i \mathbf{F}_t + \varepsilon_{it},$$

in which $\alpha_{i1}$ measures the causal effect of a paywall on daily pageviews, $\mathbf{F}_t$ is the vector of common factors, and $\boldsymbol{\varphi}_i$ represents the effects of the common factors on pageviews of newspaper $i$. The identification assumption is that $\boldsymbol{\varphi}_i \mathbf{F}_t$ controls for all other factors (except paywall introduction) that affect daily pageviews. Under this assumption, the expected counterfactual pageviews (i.e., pageviews if the newspaper had not introduced a paywall) are $E(PV_{it}) = \alpha_{i0} + \boldsymbol{\varphi}_i \mathbf{F}_t$, while the expected daily pageviews under a paywall are $E(PV_{it}) = \alpha_{i0} + \alpha_{i1} + \boldsymbol{\varphi}_i \mathbf{F}_t$. Thus, $\alpha_{i1}$ is interpreted as the causal effect of paywall introduction. Note that the widely used time-fixed effect is a special case of the common factor $\mathbf{F}_t$ where $\mathbf{F}_t$ consists of time dummy variables.

Due to the time-varying $\mathbf{F}_t$, mere subtraction of the (average) pre-paywall pageviews from the (average) post-paywall pageviews does not properly estimate the true paywall impact. We estimate the time-varying common factors using additional data not included in the sample. Estimation of common factors also allows us to tackle interesting questions such as the effect of temporary removal of a paywall on daily pageviews.
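The bias of a simple pre/post comparison can be illustrated with simulated data. The sketch below is not the estimation procedure of this paper: it assumes a single, directly observed common factor (a linear downward trend), independent errors, and hypothetical parameter values, and every function name is introduced here for illustration. It also shows one standard way to read a coefficient on log pageviews as a percent change, $100(e^{\alpha_{i1}} - 1)$; the text does not state which conversion underlies the reported percentages.

```python
import numpy as np


def simulate_log_pageviews(n_days=600, paywall_day=300, alpha0=5.0,
                           alpha1=-0.30, phi=1.0, noise_sd=0.05, seed=0):
    """Simulate PV_t = alpha0 + alpha1 * Paywall_t + phi * F_t + eps_t.

    All parameter values are hypothetical. The common factor F_t is an
    assumed linear downward trend that is treated as observed.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)
    common_factor = -0.001 * t
    paywall = (t >= paywall_day).astype(float)
    eps = rng.normal(0.0, noise_sd, size=n_days)
    pv = alpha0 + alpha1 * paywall + phi * common_factor + eps
    return pv, paywall, common_factor


def naive_pre_post_difference(pv, paywall):
    """Average post-paywall log pageviews minus average pre-paywall log pageviews."""
    return pv[paywall == 1].mean() - pv[paywall == 0].mean()


def ols_paywall_effect(pv, paywall, common_factor):
    """OLS estimate of alpha1 when the common factor is included as a regressor."""
    design = np.column_stack([np.ones_like(pv), paywall, common_factor])
    coef, *_ = np.linalg.lstsq(design, pv, rcond=None)
    return coef[1]


def log_effect_to_percent(alpha1):
    """Convert a log-point effect into a percent change in the level of pageviews."""
    return 100.0 * (np.exp(alpha1) - 1.0)


if __name__ == "__main__":
    pv, paywall, factor = simulate_log_pageviews()
    print("naive difference:", naive_pre_post_difference(pv, paywall))
    print("OLS with common factor:", ols_paywall_effect(pv, paywall, factor))
    print("percent change at alpha1 = -0.30:", log_effect_to_percent(-0.30))
```

In this setup the naive difference absorbs the decline in the common factor between the two periods, so it overstates the drop relative to the simulated value of $\alpha_{i1}$, while the regression that includes the factor recovers a value close to it.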

Figure 2 suggests three types of common exogenous factors. First, there is a gradual, long-run trend commonly observed across newspapers. Second, occasional spikes, which correspond to important news events (e.g., the 2012 U.S. presidential election), suggest that daily news is a factor affecting daily pageviews. Third, the regular pattern in web traffic suggests a day-of-week seasonality (e.g., fewer people may visit newspaper websites on Sundays than on Mondays). Let $\mathit{News}_t$ be the daily news importance (or News Index) of day $t$, $\mathit{Trend}_t$ be the long-run exogenous trend that exists commonly across multiple newspapers, and $\mathbf{DayofWeek}_t$ be the vector of dummy variables representing days of the week. Equation (1-1) models the effect of a paywall after controlling for the time-varying common factors.

$$PV_{it} = \alpha_{i0} + \alpha_{i1}\,\mathit{Paywall}_{it} + \alpha_{i2}\,\mathit{Trend}_t + \alpha_{i3}\,\mathit{News}_t + \boldsymbol{\alpha}'_{i4}\,\mathbf{DayofWeek}_t + \varepsilon_{it}, \tag{1-1}$$

where $\alpha_{i2}\,\mathit{Trend}_t + \alpha_{i3}\,\mathit{News}_t + \boldsymbol{\alpha}'_{i4}\,\mathbf{DayofWeek}_t$ represents the effect of common factors ($\boldsymbol{\varphi}_i \mathbf{F}_t$). We allow the error term $\varepsilon_{it}$ to be serially correlated: $\varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + v_{it}$, where $v_{it}$ is independently distributed with mean zero and a common variance $\sigma^2$ (i.e., $v_{it} \sim N(0, \sigma^2)$). Thus, this model posits that the daily news has only contemporaneous effects on the daily pageviews, but behavioral factors of news readers such as their inertia in visiting a website can cause daily pageviews to fluctuate smoothly over time (Hanssens, Parsons, and Schultz 2001, pp. 146-147).

Moderating Effect of Newspaper Characteristics

We model the long-term paywall impact as a function of the moderating variables in Figure 1. We use the following notations for content-related variables: $\mathit{Pol}_i$ is the proportion of politics related news in newspaper $i$; $\mathit{BusEcon}_i$ is the proportion of business/economics/market related news; $\mathit{TechSci}_i$ is the proportion of technology/science/health/environment related news; $\mathit{Life}_i$ is the proportion of lifestyle/entertainment/culture related news; $\mathit{Sports}_i$ is the proportion of sports news; $\mathit{Society}_i$ is the proportion of the news related to general society issues;
$\mathit{Uniqueness}_i$ is the content uniqueness; and $\mathit{ConservativeSlant}_i$ is the political leaning index where a larger value means a more conservative political slant. We use the following notations for the other newspaper characteristics: $\mathit{Price}_i$ is the weekly digital edition price of newspaper $i$; $\mathit{FreeArticles}_i$ is the number of free articles allowed to nonsubscribers per month; $\mathit{Circulation}_i$ is the print circulation volume (in millions; as of 2010); and $\mathit{NumPaywalls}_i$ is the number of newspapers that had already installed a paywall when newspaper $i$ erected one. $\mathit{NumPaywalls}_i$ measures the relative paywall timing of newspaper $i$ in comparison with other newspapers. Let $u_i$ denote the effects of omitted variables that are potentially correlated with the explanatory variables, and $\eta_i$ denote the error term that is uncorrelated with the explanatory variables (i.e., structural error). Equation (1-2) examines the moderating effects of newspaper characteristics (content policy, price policy, reputation, and paywall timing).

$$\begin{aligned}
\alpha_{i1} = {} & \beta_0 + \beta_1\,\mathit{Pol}_i + \beta_2\,\mathit{BusEcon}_i + \beta_3\,\mathit{TechSci}_i + \beta_4\,\mathit{Life}_i + \beta_5\,\mathit{Sports}_i \\
& + \beta_6\,\mathit{Society}_i + \beta_7\,\mathit{Uniqueness}_i + \beta_8\,\mathit{ConservativeSlant}_i \\
& + \beta_9\,\mathit{Price}_i + \beta_{10}\,\mathit{FreeArticles}_i + \beta_{11}\,\mathit{Circulation}_i + \beta_{12}\,\mathit{NumPaywalls}_i \\
& + u_i + \eta_i.
\end{aligned} \tag{1-2}$$

Identification of Causality. In Equation (1-2), $u_i$ represents the newspaper-specific unobserved effect that might be correlated with the paywall impact, $\alpha_{i1}$. If $u_i$ is correlated with an explanatory variable in the equation, the corresponding coefficient suffers from omitted variable bias. A general approach to addressing omitted variable bias is to identify and include variables that are potentially correlated with the explanatory variables and, at the same time, affect the dependent variable (Wooldridge 2010). For this purpose, we turn to reader demographics. First, it is not difficult to imagine that the reader demographics of a newspaper
influences the paper’s paywall impact—i.e., the dependent variable in Equation (1-2), because readers with certain characteristics may be more sensitive to the introduction of a paywall (Chiou and Tucker 2013). Second, it is possible that reader demographics of a newspaper are correlated with the newspaper’s strategic decisions such as content policy, price policy, and paywall timing decisions—the independent variables in Equation (1-2). Research finds that a newspaper’s political slant is aligned with its readers’ political leaning (Gentzkow and Shapiro 2010), which is related to demographics (Doherty and Weisel 2015). Similarly, newspapers may decide the proportion of different news sections (politics, business, sports, lifestyle and the like) to meet the content needs of their readers (Kanuri et al. 2014), which are also related to demographics. Newspapers’ pricing decisions and paywall timing decisions can also be correlated with reader demographics. For example, if a newspaper thinks that its readers are reluctant to the idea of charging them for online articles, then the newspaper may delay the paywall timing until other newspapers adopt one (so that paid digital subscription has become a new norm). To deal with the endogeneity of price-related variables, we use an instrumental variable approach. For an instrument of weekly digital edition price (including the number of monthly free articles) of a focal newspaper, we use owner dummy variable of the focal newspaper and the average digital edition price of newspapers that belong to the same owner. Because paywall decisions are made at the owner level, the digital edition price of a newspaper is correlated with that of other newspapers from the same owner. However, news readers do not choose newspapers based on newspaper owners. 
This reasoning suggests that newspaper owner dummy variables explain the variation in the digital edition price of the newspaper but do not affect the daily pageviews of individual newspapers. Furthermore, because newspapers from the same owner cover different, non-overlapping geographic areas, the digital edition price of one newspaper will not affect the daily pageviews of another newspaper from the same owner, suggesting that the average digital edition price of other newspapers from the same owner can serve as an instrument for the focal newspaper’s digital edition price.

Newspapers may decide their paywall timing to minimize the long-term paywall impact on audience churn. Recall that we included the long-run exogenous trend in daily pageviews in Equation (1-1), because newspapers’ expectations of future daily pageviews (without a paywall) may affect their paywall timing decisions. We also use newspaper owner variables as instruments for paywall timing in Equation (1-2), for the same reason that we use them as instruments for a newspaper’s digital edition price.

To summarize, we consider demographic variables (specifically age, income, and educational attainment) to model $u_i$; we treat digital edition price (and the number of monthly free articles) and paywall timing as endogenous variables; and we mitigate potential endogeneity bias using newspaper owner dummy variables and the digital edition price of other newspapers from the same owner. Because income and educational attainment are highly correlated, we exclude education and use only age and income as covariates. Let $\mathit{Age18\_34}_i$ be the proportion of newspaper $i$’s readers aged 18 to 34; $\mathit{Age35\_44}_i$ be the proportion of readers aged 35 to 44; $\mathit{Age55Plus}_i$ be the proportion of readers aged 55 and over; and $\mathit{Inc50\_74}_i$ be the proportion of readers whose annual household income is between 50,000 and 74,999 dollars. $\mathit{Inc75\_99}_i$ and $\mathit{Inc150Plus}_i$ are similarly defined. Then, we model $u_i$ as follows:

$$
\begin{aligned}
u_i = {} & \theta_0 + \theta_1 \mathit{Age18\_34}_i + \theta_2 \mathit{Age35\_44}_i + \theta_3 \mathit{Age55Plus}_i \\
& + \theta_4 \mathit{Inc50\_74}_i + \theta_5 \mathit{Inc75\_99}_i + \theta_6 \mathit{Inc150Plus}_i + \nu_i ,
\end{aligned}
\tag{1-3}
$$

where $\nu_i$ is an error term. After including the demographic variables as in Equation (1-3), we plug Equation (1-2) into Equation (1-1) for estimation. Note that after plugging Equation (1-2) into Equation (1-1), the error term of the estimation equation becomes heteroskedastic. Thus, we apply a panel generalized least squares (PGLS) method.
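The source of the heteroskedasticity can be sketched by writing out the substitution. This is only an illustration under stated assumptions: Equation (1-1) is taken to have the form implied by the common-factor terms above, with $\mathit{PV}_{it}$ the daily pageview measure and $\mathit{Paywall}_{it}$ a 0/1 paywall indicator; $x_{ik}$ (the twelve moderators in Equation (1-2)), $d_{im}$ (the six demographic shares in Equation (1-3)), $e_{it}$, $\sigma_\nu^2$, and $\sigma_\eta^2$ are shorthand introduced here, and $\nu_i$, $\eta_i$, and $\varepsilon_{it}$ are assumed mutually uncorrelated.

$$
\begin{aligned}
\mathit{PV}_{it} = {} & \alpha_{i0} + \Big[(\beta_0 + \theta_0) + \sum_{k=1}^{12} \beta_k x_{ik} + \sum_{m=1}^{6} \theta_m d_{im}\Big]\mathit{Paywall}_{it} \\
& + \alpha_{i2}\mathit{Trend}_t + \alpha_{i3}\mathit{News}_t + \boldsymbol{\alpha}'_{i4}\mathbf{DayofWeek}_t + e_{it}, \\
e_{it} = {} & (\nu_i + \eta_i)\,\mathit{Paywall}_{it} + \varepsilon_{it}, \\
\operatorname{Var}(e_{it} \mid \mathit{Paywall}_{it}) = {} & \mathit{Paywall}_{it}\,(\sigma_\nu^2 + \sigma_\eta^2) + \operatorname{Var}(\varepsilon_{it}).
\end{aligned}
$$

Under these assumptions the composite error has a larger variance in post-paywall periods than in pre-paywall periods, and it shares the component $\nu_i + \eta_i$ within a newspaper, which is consistent with the choice of a generalized rather than ordinary least squares estimator. The sketch also shows that only the sum $\beta_0 + \theta_0$ would be identified, not the two intercepts separately.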

Empirical Results

The models are estimated in two steps. In the first step, we calibrate the long-run exogenous trend ($\mathit{Trend}_t$) and the daily News Index ($\mathit{News}_t$) with news media excluded from our sample; in the second step, we substitute the calibrated $\mathit{Trend}_t$ and $\mathit{News}_t$ and Equation (1-2) into Equation (1-1) to estimate the effects of the moderating variables in Equation (1-2) on paywall impact. To save space, the main text concentrates on the second step. The calibration of the long-run trend and the daily News Index, and their identification logic, can be found in Appendix A. To briefly mention key points: to identify the long-run trend, we use the daily pageviews of the Wall Street Journal; to calibrate the daily News Index, we apply a dynamic factor model (e.g., Bruce, Peters, and Naik 2012) to daily unique visitors of news media websites that are excluded from our sample. Appendix A explains the rationale for using the Wall Street Journal to identify the long-run exogenous trend and shows that our identification rationale is empirically supported; it also explains the rationale for using a dynamic factor model to extract daily news importance. In the main text, we show the calibration results for the long-run exogenous trend and daily news importance in Figure 3.

==Figure 3 about here==

Impact of Paywall on Pageviews


Because the calibration of the News Index and the long-run exogenous trend complicates the model estimation, we examine whether the added complexity improves the model fit. We devise three variants of Equation (1-1) and compare their performance with that of the original Equation (1-1). The three variants (M1, M2, and M3) are as follows: M1, the simplest model, does not include $\mathit{News}_t$ and $\mathit{Trend}_t$; M2 includes $\mathit{News}_t$ but not $\mathit{Trend}_t$; and M3 includes $\mathit{Trend}_t$ but not $\mathit{News}_t$. The full model (Equation (1-1)) is denoted by M4. In addition, we compare a model (M0) that features time-fixed effects in place of the common factors; M0 is widely used to analyze panel data. We include 2,921 time-dummy variables to control for time-fixed effects in M0 (versus two common factors in the suggested model, i.e., $\mathit{News}_t$ and $\mathit{Trend}_t$). Table 3 shows the log-likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) of each model. M0 performs notably worse than the common factor models (M1 to M4), even with the 2,921 additional time-dummy variables. The table also shows the importance of including the daily News Index and the long-run exogenous trend in Equation (1-1).

==Table 3 about here==

Figure 4 demonstrates how our model works. Figure 4(a) shows the actual pageviews (per 1 million unique daily visitors) of the Akron Beacon Journal, which introduced a paywall on July 1, 2014. It clearly shows that the newspaper’s daily pageviews have decreased since the paywall introduction. Figure 4(b) shows the predicted and counterfactual pageviews from M4. The predicted pageviews were generated by assuming that the newspaper introduced its paywall on July 1, 2014, the actual introduction date. The counterfactual pageviews were generated by assuming that the newspaper did not introduce a paywall at any point in the analysis period. Nonetheless, the counterfactual pageviews show a declining trend over time, because the paper’s pageviews are under the influence of the “common” long-run trend identified in Figure 3(a). In other words, the newspaper would have experienced declining pageviews even if it had not introduced a paywall on July 1, 2014. This is an important point for identifying the true paywall impact; merely subtracting the (average) pre-paywall pageviews from the (average) post-paywall pageviews will overestimate the true impact of the paywall on pageviews for this newspaper. However, Figure 4(b) shows that the predicted pageviews have declined even more than the counterfactual pageviews after July 1, 2014, suggesting that the newspaper’s paywall had a negative impact on its daily pageviews.

==Figure 4 about here==

We hereafter focus on M4, the best performing model. We estimate the model using the ordinary least squares (OLS) method. Table 4 shows the estimation results for the key parameters in Equation (1-1). Column (a) shows that the long-term impact of a paywall varies greatly across newspapers. Thirty-six newspapers experienced a statistically significant drop in daily pageviews. The largest decline occurred for the Akron Beacon Journal. Our model predicts that the newspaper lost about 54% ($= 1 - e^{-0.783}$) of its counterfactual daily pageviews, i.e., the daily pageviews that it would have obtained if it had not introduced a paywall. For five newspapers, the data do not show statistical evidence that their daily pageviews changed as a result of their paywall introduction. For one newspaper (Chicago Tribune), daily pageviews actually increased by 6 percent. To summarize, a paywall rollout tends to decrease the daily pageviews for most newspapers, but the size of the decrease varies greatly across them.

Column (b) shows how closely the daily pageviews of the individual newspapers follow the long-run exogenous trend identified from the WSJ pageviews. A significantly positive number means that the long-run fluctuation in the corresponding newspaper’s daily pageviews resembles the identified long-run trend.
We find that all newspapers have a significantly positive trend coefficient, supporting the idea that a “common” long-run trend exists in daily pageviews. This implies that the estimated long-term paywall impact will be biased if the long-run trend is not controlled for.

Column (c) shows the impact of daily news importance on daily pageviews. A significantly positive number means that the corresponding newspaper’s fluctuation in daily pageviews is positively correlated with the day’s news importance, while a nonsignificant number means that the corresponding newspaper’s daily pageviews are not affected by news importance. Twenty-seven newspapers have a significantly positive coefficient at the 1% level.

==Table 4 about here==

Effects of Moderating Variables on Paywall Impact

Table 5 shows the estimation results. We find that the content policy of a newspaper plays a crucial role in explaining the changes in daily pageviews as a result of paywall introduction. The proportion of politics, business/economics, sports, and general society related articles in a newspaper has a positive effect on the paywall impact. This finding means that newspapers that allocate more space to politics, business/economics, sports, and general social topics tend to experience a smaller decline in daily pageviews. On the other hand, the proportion of articles related to technology/science and lifestyle/entertainment issues in a newspaper has a negative effect on the long-term impact of the newspaper’s paywall. Newspapers that allocate more space to these topics tend to experience a larger decline in daily pageviews. Content uniqueness matters too. We find that the drop in daily pageviews is smaller for newspapers that allocate more space to unique content. This finding is consistent with the widely accepted notion that unique positioning is important. Provisioning unique content seems especially important for newspapers in the digital era, as news content is quickly becoming a commodity that anyone can easily access.
When it comes to political slant, conservative newspapers tend to have a smaller drop in daily pageviews. Weekly digital subscription price is negatively correlated with the impact of a paywall on daily pageviews. Print circulation volume also matters. Newspapers with a larger circulation volume tend to experience a smaller drop in daily pageviews. The number of free articles has a negative influence on paywall impact. As we suspected, a higher number of free articles seems to sufficiently satisfy the needs of casual readers, reducing their incentive to subscribe to the newspaper. This result has an important implication for newspapers. They might be able to increase web traffic by providing a higher number of free articles, but pageviews may decline. If advertising revenue largely depends on pageviews instead of web traffic, newspapers are advised not to offer a large number of free articles. Finally, paywall timing also matters: newspapers that introduced a paywall relatively late have a smaller drop in daily pageviews than newspapers that introduced one relatively early.

==Table 5 about here==

Effect of Temporary Removal of Paywall

Studying ESPN.com, Lambrecht and Misra (2017) suggest that a “countercyclical offering,” or the practice of increasing the share of free content during periods of high demand, may increase web traffic during those periods. Should newspapers increase the share of free content during an important event to rake in more traffic and advertising revenue? Indeed, some newspapers have temporarily removed their paywalls during important events. For example, The New York Times offered complimentary access to its online articles for five days during Hurricane Sandy and its aftermath. It also offered 24 hours of free digital access beginning at 6:00 P.M. EST on November 6, 2012 (the date of the 2012 U.S. presidential election). Whether lifting a paywall for a short period will increase the focal website’s traffic is an important question for online newspapers. If temporary removal of a paywall has a positive causal effect on the daily pageviews of the focal newspaper, then online publishers can remove paywalls to increase their display advertising revenue when influential news occurs.

Table 6(a) shows the seven cases of temporary paywall removal by the New York Times and/or the Wall Street Journal from 2011 to 2017. Both newspapers removed their paywalls for the inclement weather incidents. However, only the New York Times lifted its paywall for the following two events: the 2015 Paris terror attack and the 2016 U.S. presidential election. If the New York Times’ daily pageviews are larger than those of the Wall Street Journal only during those two events, while the pageviews of the two newspapers do not differ during the other five events (i.e., when both of them removed their paywalls), this suggests that temporary removal of a paywall increases the focal newspaper’s daily pageviews on that day.

We first examine whether daily pageviews are significantly larger on the days when the seven events happened. We define the following seven dummy variables that represent the seven incidents in Table 6: $\mathit{Irene}_t$, $\mathit{Sandy}_t$, $\mathit{Election2012}_t$, $\mathit{Paris}_t$, $\mathit{Election2016}_t$, $\mathit{Harvey}_t$, and $\mathit{Irma}_t$. For example, the variable $\mathit{Irma}_t$ is one on September 8, 2017. We replace $\mathit{News}_t$ in Equation (1-1) with the seven dummy variables to see whether the daily pageviews are abnormally large during the events. Table 6(b) shows the results. For the New York Times, the daily pageviews are abnormally large except during the following two events: Hurricane Harvey and Hurricane Irma. In the case of the Wall Street Journal, the daily pageviews are abnormally large except during the following three events: Hurricane Irene, Hurricane Harvey, and Hurricane Irma. Thus, important events tend to increase the daily pageviews of the two newspapers.

==Table 6 about here==
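The event dummy variables described above can be built mechanically from a list of dates. The following is a minimal sketch, not the authors’ code: the function name `event_dummies`, the five-day sample window, and the single-day window for Irma are assumptions made for illustration (the text states only that the Irma dummy is one on September 8, 2017; the other event dates are in Table 6 and are not reproduced here).

```python
from datetime import date, timedelta


def event_dummies(days, events):
    """Return {event_name: [0/1, ...]} aligned with `days`.

    `events` maps an event name to an inclusive (start, end) date window.
    """
    dummies = {}
    for name, (start, end) in events.items():
        dummies[name] = [1 if start <= d <= end else 0 for d in days]
    return dummies


# Hypothetical five-day sample around the Irma date mentioned in the text.
days = [date(2017, 9, 6) + timedelta(days=k) for k in range(5)]
# Assumption: a one-day window; the actual window may be longer.
events = {"Irma": (date(2017, 9, 8), date(2017, 9, 8))}

dummies = event_dummies(days, events)
print(dummies["Irma"])
```

Each resulting 0/1 series would then take the place of the News Index as a regressor, one column per event.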


To investigate the issue further, we develop Equations (2-1) and (2-2), which model the daily pageviews of the New York Times and the Wall Street Journal, consistent with Equation (1-1). For the Wall Street Journal, we do not include the paywall dummy variable because it introduced its paywall in 1997.

$$
\mathit{PV}_{NYT,t} = \alpha_{NYT,0} + \alpha_{NYT,1}\mathit{Paywall}_{NYT,t} + \alpha_{NYT,2}\mathit{Trend}_t + \alpha_{NYT,3}\mathit{News}_t + \boldsymbol{\alpha}'_{NYT,4}\mathbf{DayofWeek}_t + \varepsilon_{NYT,t}
\tag{2-1}
$$

$$
\mathit{PV}_{WSJ,t} = \alpha_{WSJ,0} + \alpha_{WSJ,2}\mathit{Trend}_t + \alpha_{WSJ,3}\mathit{News}_t + \boldsymbol{\alpha}'_{WSJ,4}\mathbf{DayofWeek}_t + \varepsilon_{WSJ,t},
\tag{2-2}
$$

where $\varepsilon_{NYT,t} = \rho_{NYT,t}\,\varepsilon_{NYT,t-1} + v_{NYT,t}$, $\varepsilon_{WSJ,t} = \rho_{WSJ,t}\,\varepsilon_{WSJ,t-1} + v_{WSJ,t}$, and $v_{NYT,t}$ and $v_{WSJ,t}$ are serially uncorrelated. Our focal interest lies in the difference, $v_{NYT,t} - v_{WSJ,t}$, the daily “incremental” pageviews of the New York Times compared to the Wall Street Journal after controlling for systematic variation. We examine whether the difference is significantly positive only for the two events for which only the New York Times removed its paywall by regressing the difference $v_{NYT,t} - v_{WSJ,t}$ on the seven dummy variables:

$$
\begin{aligned}
v_{NYT,t} - v_{WSJ,t} = {} & \theta_0 + \theta_1 \mathit{Irene}_t + \theta_2 \mathit{Sandy}_t + \theta_3 \mathit{Election2012}_t + \theta_4 \mathit{Paris}_t \\
& + \theta_5 \mathit{Election2016}_t + \theta_6 \mathit{Harvey}_t + \theta_7 \mathit{Irma}_t + \varpi_t ,
\end{aligned}
\tag{3}
$$

where $\varpi_t$ is the error term. We first estimate Equations (2-1) and (2-2) and compute the residuals, $\hat{v}_{NYT,t}$ and $\hat{v}_{WSJ,t}$. Then, we estimate Equation (3) by substituting the residuals. Table 7 shows the estimation results. The daily pageviews of the New York Times are abnormally larger than those of the Wall Street Journal only for the two events for which only the New York Times removed its paywall. Combined with the above results, these findings suggest that while the pageviews of both newspapers increased during some of the temporary paywall removal periods, the difference in pageviews is significant only when just one of the papers removed its paywall.

==Table 7 about here==
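The second step of this procedure can be illustrated with a small sketch. It is not the authors’ implementation: it takes the first-step residuals as given, uses made-up numbers, and relies on the fact that OLS with an intercept and mutually exclusive 0/1 dummies reproduces group means, so that $\theta_0$ is the mean residual difference on non-event days and each $\theta_k$ is the mean on event-$k$ days minus $\theta_0$. The function name `event_effects` and all data values are hypothetical, and the shortcut assumes that event windows do not overlap. It yields point estimates only, with no standard errors.

```python
from statistics import mean


def event_effects(v_nyt, v_wsj, dummies):
    """Point estimates for Equation (3) via group means.

    v_nyt, v_wsj: first-step residual series of equal length.
    dummies: {event_name: [0/1, ...]}, assumed mutually exclusive.
    Returns (theta0, {event_name: theta_k}).
    """
    diff = [a - b for a, b in zip(v_nyt, v_wsj)]
    n = len(diff)
    # A day is a baseline day when no event dummy is switched on.
    is_event = [any(d[t] for d in dummies.values()) for t in range(n)]
    theta0 = mean(x for x, e in zip(diff, is_event) if not e)
    effects = {
        name: mean(x for x, flag in zip(diff, d) if flag) - theta0
        for name, d in dummies.items()
    }
    return theta0, effects


# Made-up residuals: the two series move together except on the "Paris" day.
v_nyt = [0.5, -0.5, 2.0, 0.0, 1.0, -1.0]
v_wsj = [0.5, -0.5, 0.5, 0.0, 1.0, -1.0]
dummies = {
    "Paris": [0, 0, 1, 0, 0, 0],  # only one paper lifts its paywall
    "Sandy": [0, 0, 0, 0, 1, 0],  # both papers lift their paywalls
}

theta0, effects = event_effects(v_nyt, v_wsj, dummies)
print(theta0, effects)
```

In this toy input the estimated effect is positive for the event where only one series jumps and zero for the event where both jump, which mirrors the identification argument in the text.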


Conclusion

When The New York Times introduced its paywall in March 2011, expert opinion diverged. While some journalists advocated paywalls as a means of facilitating quality journalism (Kafka 2011), many others opposed them. The diverging views in the newspaper industry attest to the importance of understanding the effects of the free-to-paid transition on online publishers. Our study of U.S. newspapers’ recent paywall rollouts reveals that while introducing a paywall generally reduces the pageviews of the focal newspaper, the extent of the effect is moderated by various factors, including the newspaper’s content policy (the composition of different sections, content uniqueness, and political slant) and its pricing policy (the digital edition price, the number of monthly free articles, and paywall timing). The upshot of our findings is that newspapers can manage the potentially negative impact of a paywall on pageviews with proper content and pricing strategies. We also find a positive impact of a temporary paywall lift on daily pageviews. The fact that much online content is free generates a pessimistic outlook on the future of the newspaper industry; however, we find some ground for optimism that newspapers can not only manage the long-term impact of a paywall with careful strategy, but also influence short-term pageviews with a temporary paywall lift.

In answering our research questions, we proposed two novel indexes that can be applied in future newspaper industry research. First, we proposed a method to quantify the importance of daily news events. The daily News Index not only improves the model fit of daily pageviews, but also reveals that the web traffic of different newspapers responds differently to important news events. The Index also provides a tool for examining the effectiveness of a countercyclical offering, an important strategic issue for paywalled content providers. It can also be used to predict how much the daily pageviews of each newspaper website will gain from an important news event and how persistent such effects will be. Thus, it helps advertisers make evidence-based decisions to adjust their digital media advertising spending in real time when important news events occur. Second, we proposed a method to compute the content uniqueness index of a newspaper. Using an unsupervised content analysis method (Blei, Ng, and Jordan 2003), our approach minimizes the intervention of human coders, thereby improving its scalability. This method can easily be extended to compute the content uniqueness index of more newspapers.

There are several interesting research opportunities in this area. First, because online paywalls can discourage subscribers from canceling print subscriptions (Grueskin et al. 2011), it is important to understand how paywalls influence print subscriptions. In Appendix B, we examine the specific case of The New York Times’ paywall, finding that online and offline newspapers are substitutive in the case of The New York Times. Extending the analysis to a more diverse group of newspapers may offer deeper insights. Second, how the online paywall of the focal newspaper affects the web traffic/pageviews of other newspapers will offer valuable insights into the competitive structure of the industry. To examine this cross effect of a paywall, detailed individual reader-level data will be needed. Third, our study does not provide a complete picture of the impact of a paywall on newspaper revenues due to data limitations. Examining the total effect of a paywall on online/offline advertising revenues and digital/print subscription revenues will be of ultimate interest to newspapers that are considering paywalls. Related to this issue, advertising effectiveness (and therefore revenues) may change after the paywall because paid digital subscribers might be different from the consumers of free articles. Newspapers will also be able to obtain detailed data about the preferences of their paid subscribers, which will be valuable information to advertisers.


References

Aaker DA (1991) Managing Brand Equity (The Free Press, New York).
Aaker DA (1996) Measuring brand equity across products and markets. California Management Review 38(Spring):102-120.
Agarwal MK, Rao V (1996) An empirical comparison of consumer-based measures of brand equity. Marketing Letters 7(3):237-247.
Ailawadi KL, Lehmann DR, Neslin SA (2003) Revenue premium as an outcome measure of brand equity. Journal of Marketing 67(October):1-17.
Ascarza E, Lambrecht A, Vilcassim NJ (2012) When talk is “free”: An analysis of subscriber behavior under two- and three-part tariffs. Journal of Marketing Research 49(6):882-899.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3:993-1022.
Bruce NI, Peters K, Naik PA (2012) Discovering how advertising grows sales and builds brands. Journal of Marketing Research 49(6):793-806.
Chaudhuri A, Holbrook MB (2001) The chain of effects from brand trust and brand affect to brand performance: The role of brand loyalty. Journal of Marketing 65(2):81-93.
Chiou L, Tucker C (2013) Paywalls and the demand for online news. Information Economics and Policy 25:61-69.
Doherty C, Weisel R (2015) A deep dive into party affiliation: Sharp differences by race, gender, generation, and education. Pew Research Center, April 7, 2015.
Ember S (2016) New York Times Co. reports an advertising drop, though digital results grew. The New York Times, November 2, 2016, https://www.nytimes.com/2016/11/03/business/media/new-york-times-co-reports-anadvertising-drop-though-digital-results-grew.html, accessed April 2017.
Escalas JE, Bettman JR (2003) You are what they eat: The influence of reference groups on consumers’ connections to brands. Journal of Consumer Psychology 13(3):339-348.
Gentzkow M, Shapiro JM (2010) What drives media slant? Evidence from U.S. daily newspapers. Econometrica 78(1):35-71.
Gottfried J, Funk C (2017) Most Americans get their science news from general outlets, but many doubt their accuracy, www.pewresearch.org/fact-tank/2017/09/21/most-americansget-their-science-news-from-general-outlets-but-many-doubt-their-accuracy.
Grant D (2009) Newsdays.com sees pay wall-induced drop in traffic. Adweek, December 11, 2009, http://www.adweek.com/digital/newsday-com-sees-pay-wall-induced-drop-intraffic/, accessed April 2017.
Grueskin B, Seave A, Graves L (2011) The Story So Far: What We Know about the Business of Digital Journalism, Columbia Journalism School.
Gumus M, Kaminsky P, Mathur S (2016) The impact of product substitution and retail capacity on the timing and depth of price promotions: Theory and evidence. International Journal of Production Research 54(7):2108-2135.
Halliday J (2010) Times loses almost 90% of online readership. The Guardian, July 20, 2010, https://www.theguardian.com/media/2010/jul/20/times-paywall-readership, accessed April 2017.
Hanssens DM, Parsons LJ, Schultz RL (2001) Market Response Models: Econometric and Time Series Analysis, 2nd ed. Boston: Kluwer Academic Publisher.
Kanuri VK, Thorson E, Mantrala MK (2014) Using reader preferences to optimize news content: A method and a case study. The International Journal on Media Management 16:55-75.
Kafka P (2011) Q&A: New York Times digital Czar Martin Nisenholtz on the paywall, pricing, Google and Apple. AllThingsD, March 18, 2011, http://allthingsd.com/20110318/qa-new-york-times-digital-czar-martin-nisenholtz-on-thepaywall-pricing-google-and-apple/, accessed April 2017.
Lacy S, Sohn AB (1990) Correlations of newspaper content with circulation in the suburbs: A case study. Journalism & Mass Communication Quarterly 67(4):785-793.
Lambrecht A, Misra K (2017) Fee or free: When should firms charge for online content? Management Science 63(4):1150-1165.
Litman BR, Bridges J (1996) An economic analysis of daily newspaper performance. Newspaper Research Journal 7:9-26.
Marx G, Clark A (2014) Can The Washington Post’s national push help support local news? Columbia Journalism Review, April 4, 2014, https://archives.cjr.org/united_states_project/washington_post_local_papers_partnership.php?page=all, accessed May 2017.
Mazumdar T, Raj SP, Sinha I (2005) Reference price research: Review and propositions. Journal of Marketing 69(4):84-102.
Mitchell A, Holcomb J, Page D (2015) Local news in a digital age. Pew Research Center.
Nagle TT, Hogan JE, Zale J (2016) The Strategy and Tactics of Pricing: A Guide to Growing More Profitably, 5th ed. (Routledge, New York).
National Newspaper Association (2014) Two-thirds of residents in small towns and cities read community newspapers, National Newspaper Association, February 3, 2014, http://www.nnaweb.org/resources?articleTitle=two-thirds-of-residents-in-small-townsand-cities-read-community-newspapers--1391441142--739--resources, accessed April 2017.
Pattabhiramaiah A, Sriram S, Manchanda S (2017) Paywalls: Monetizing online content. Unpublished manuscript.
Pattabhiramaiah A, Sriram S, Sridhar S (2018) Rising prices under declining preferences: The case of the U.S. print newspaper industry. Marketing Science 37(1):97-122.
Pauwels K, Weiss A (2008) Moving from free to fee: How online firms market to change their business model successfully. Journal of Marketing 72(May):14-31.
Pew Research Center (2014) Political Polarization & Media Habits: From Fox News to Facebook, How Liberals and Conservative Keep Up with Politics. http://www.journalism.org/2014/10/21/section-1-media-sources-distinct-favorites-emergeon-the-left-and-right, accessed April 2017.
Pew Research Center (2015) A deep dive into party affiliation: Sharp differences by race, gender, generation, and education. http://www.people-press.org/2015/04/07/a-deep-diveinto-party-affiliation/, accessed April 2017.
Rainie L, Smith A (2012) Social Networking Sites and Politics. Pew Research Center’s Internet & American Life Project. http://www.pewinternet.org/2012/03/12/main-findings-10/, accessed April 2017.
Sebastian M (2013) Dallas Morning News kills paywall, now will pare ads for those who pay. Adage, October 1, 2013, http://adage.com/article/media/dallas-morning-news-killspaywall-pares-ads-a-price/244485/, accessed April 2017.
Sethuraman R (2000) What makes consumers pay more for national brands than for store brands: Image or quality? Report No. 00-110. Marketing Science Institute Paper Series. Cambridge, MA.
Sethuraman R, Cole C (1997) Why do consumers pay more for national brands than for store brands? Working paper no. 97-126. Marketing Science Institute. Cambridge, MA.
Shampanier K, Mazar N, Ariely D (2007) Zero as a special price: The true value of free products. Marketing Science 26(6):745-757.
The Huffington Post (2010) The Times of London websites loses 1.2 million readers following paywall. http://www.huffingtonpost.com/2010/08/16/the-times-of-londonwebsi_n_683411.html, accessed April 2017.
Williams AT (2016) Paying for digital news: The rapid adoption and current landscape of digital subscriptions at U.S. newspapers. American Press Institute, https://www.americanpressinstitute.org/publications/reports/digital-subscriptions/singlepage/, accessed January 2017.
Wooldridge JM (2010) Econometric Analysis of Cross Section and Panel Data. The MIT Press.


Footnotes

1. A metered paywall system allows nonsubscribers to access a set number of free articles per month. Newspapers with a soft paywall usually adopt a “leaky” system, which allows search engines to index all articles behind the paywall and users to share articles on social media. In contrast, a hard paywall requires a paid subscription to access any content. This type of paywall is rare among newspapers, but more common among B2B publications. In the U.S., only The Wall Street Journal uses a hard paywall.

2. After the acquisition, The Washington Post provided free digital access to subscribers of local newspapers around the country. This new business model greatly increased the web traffic of the newspaper (Kim 2016).

3. The 18 excluded newspapers’ average daily print circulation in 2010 is 324,052 and the standard deviation is 446,300. The highly right-skewed distribution is mainly due to the exceptional circulation of the Wall Street Journal, which introduced a “hard” paywall in 1997. If we consider only the newspapers with “metered” paywalls, the average is 221,870 and the standard deviation is 109,303. These statistics are largely comparable to those of the newspapers included in the analysis (see Table 1(b)). The excluded metered paywall newspapers account for 17% of the total market share of the top 100 newspapers and 27% of the total market share of the 60 paywalled newspapers.

4. Alexa estimates website traffic based on data from a global traffic panel, which is a sample of millions of Internet users using one of more than 25,000 different browser extensions (www.alexa.com/about). According to Alexa, its traffic information is statistically meaningful if a website’s global ranking is higher than 100,000. All newspapers used in our analysis have a global ranking higher than 100,000.

5. We include not only the 42 focal newspapers but also 54 other newspapers from the top 100. This is because the LDA topic modeling results improve with more observations.


Table 1 Descriptive Statistics

(a) Daily Pageview

                              Mean   Median   St. Dev.   Min      Max
Daily pageview                1129      245       4231     3   106352
Day-of-week seasonality
  Sunday                      1107      224       4364     0    59200
  Monday                      1207      260       4625     0    72059
  Tuesday                     1169      265       4250     6    57240
  Wednesday                   1152      256       4236     8   106352
  Thursday                    1146      255       4193     5    61601
  Friday                      1113      242       4109     0    57855
  Saturday                    1015      216       3800     0    61122

Note. N = 122,724 (2,922 days × 42 newspapers). Daily pageview is defined as the number of daily pageviews per 1 million Internet users.

(b) Newspaper Characteristics and Reader Demographics

Variable                                             Mean    Median   St. Dev.      Min       Max
Proportion of articles in
  politics                                          0.097     0.093      0.031    0.030     0.170
  business/economics                                0.118     0.118      0.028    0.057     0.201
  technology/science                                0.056     0.052      0.023    0.032     0.180
  lifestyle/entertainment                           0.157     0.146      0.050    0.072     0.290
  sports                                            0.190     0.179      0.059    0.068     0.355
  social issues                                     0.267     0.271      0.059    0.125     0.457
  other categories                                  0.115     0.095      0.075    0.050     0.465
Content uniqueness index                            2.194     2.134      0.903    0.282     4.812
Conservative slant (degree of conservativeness)     0.446     0.443      0.033    0.350     0.500
Weekly digital edition price                        1.964     1.990      0.934    0.000     4.850
Daily print circulation (2010)                    199,665   155,995    152,426   75,615   876,638
Number of existing paywalls*                       27.841    27.500     15.860    2.000    56.000
Age: 18 – 24                                        0.112     0.105      0.033    0.063     0.189
Age: 25 – 34                                        0.163     0.166      0.024    0.120     0.213
Age: 35 – 44                                        0.157     0.157      0.019    0.103     0.191
Age: 45 – 54                                        0.178     0.178      0.018    0.139     0.220
Age: 55 and over                                    0.391     0.392      0.043    0.307     0.465
Income: Below $50,000                               0.422     0.400      0.064    0.306     0.559
Income: $50,000 – $74,999                           0.176     0.177      0.022    0.125     0.228
Income: $75,000 – $99,999                           0.149     0.146      0.021    0.116     0.210
Income: $100,000 – $149,999                         0.145     0.146      0.026    0.093     0.198
Income: $150,000 and more                           0.110     0.098      0.045    0.060     0.253
Education: High school or less                      0.340     0.346      0.061    0.211     0.442
Education: Some college                             0.324     0.328      0.034    0.218     0.410
Education: College or more                          0.336     0.330      0.063    0.212     0.571

Note. N = 42 newspapers. * Number of paywalled newspapers (among the top 100 newspapers) when a focal newspaper adopted a paywall.

39

Table 2 Correlation of Newspaper Characteristics

(a) Proportion of political articles
(b) Proportion of business/economics articles
(c) Proportion of technology/science articles
(d) Proportion of lifestyle/entertainment articles
(e) Proportion of sports articles
(f) Proportion of society articles
(g) Proportion of other articles
(h) Content uniqueness index
(i) Conservative slant
(j) Weekly digital edition price
(k) Daily print circulation
(l) Number of existing paywalls

        (a)    (b)    (c)    (d)    (e)    (f)    (g)    (h)    (i)    (j)    (k)    (l)
(a)    1.00
(b)    0.00   1.00
(c)   -0.23   0.35   1.00
(d)    0.03  -0.47  -0.11   1.00
(e)   -0.13  -0.15  -0.17  -0.20   1.00
(f)    0.21   0.03  -0.42  -0.25  -0.03   1.00
(g)   -0.44  -0.03   0.19  -0.27  -0.33  -0.54   1.00
(h)   -0.39   0.24   0.07  -0.26  -0.42   0.08   0.48   1.00
(i)   -0.22   0.13   0.19  -0.11   0.14  -0.13   0.08   0.15   1.00
(j)   -0.03   0.40   0.49  -0.14   0.02  -0.27   0.02   0.00   0.30   1.00
(k)    0.17   0.06  -0.02  -0.02   0.09  -0.12  -0.04  -0.35  -0.25   0.21   1.00
(l)   -0.13  -0.39  -0.05   0.03   0.00   0.04   0.16   0.02   0.16  -0.23  -0.21   1.00

Note. N = 42 Newspapers.

Table 3 Model Fit Comparison

Model                                   Log-likelihood       AIC       BIC
M0 (time fixed effects)                     -35,154.59    0.5679    0.7878
M1 (without $News_t$ and $Trend_t$)          -4,825.17    0.0842    0.1108
M2 (with $News_t$ only)                      -4,454.04    0.0787    0.1088
M3 (with $Trend_t$ only)                      1,234.41   -0.0139    0.0161
M4 (with $News_t$ and $Trend_t$)              1,656.88   -0.0201    0.0132

Table 4 Paywall Impact on Daily Pageviews (a) Paywall Impact (𝛼𝑖1 ) Coef. S.E. p-val. Akron Beacon Journal Asbury Park Press Chicago Tribune Daily Herald Democrat and Chronicle The Fresno Bee Knoxville News Sentinel Lexington Herald-Leader Los Angeles Times Miami Herald Milwaukee Journal Sentinel Orlando Sentinel San Jose Mercury News Star Tribune Sun Sentinel The Arizona Republic The Atlanta Journal Constitution The Baltimore Sun The Blade The Buffalo News The Charlotte Observer The Cincinnati Enquirer The Columbus Dispatch The Courier-Journal The Dallas Morning News The Denver Post The Des Moines Register The Hartford Courant The Indianapolis Star The Kansas City Star The Morning Call The New York Times The News & Observer The News Journal The News Tribune The Palm Beach Post The Philadelphia Inquirer The Post and Courier The Sacramento Bee The State Tulsa World Wisconsin State Journal

-0.783

0.049

0.000

-0.330 0.059 -0.251 -0.211 -0.107 -0.380 -0.148 -0.197 -0.506 -0.228 -0.386 -0.436 -0.426 -0.540 -0.436 -0.207 -0.416 -0.139 -0.397 0.091 -0.501 -0.234 -0.377 -0.448 0.034 -0.167 -0.663 -0.202 -0.251 -0.238 -0.407 -0.131 -0.415 0.036 -0.354 0.001 -0.336 0.004 -0.783 -0.330 0.059

0.056 0.018 0.036 0.034 0.038 0.036 0.033 0.022 0.029 0.025 0.034 0.026 0.024 0.021 0.025 0.041 0.030 0.028 0.033 0.051 0.041 0.024 0.027 0.065 0.038 0.029 0.029 0.037 0.015 0.028 0.043 0.031 0.033 0.019 0.026 0.040 0.034 0.041 0.049 0.056 0.018

0.000 0.001 0.000 0.000 0.005 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.077 0.000 0.000 0.000 0.000 0.374 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.060 0.000 0.973 0.000 0.913 0.000 0.000 0.001

Model fit statistics

R-squared                      0.969
Adjusted R-squared             0.969
Log likelihood                 1656.8
AIC                            -0.020
BIC                            0.013
Durbin-Watson statistic        2.104 ($\hat{\rho}$ = 0.689)

Coef.

(b) Trend (𝛼𝑖2 ) S.E.

(c) News Index (𝛼𝑖3 ) Coef. S.E. p-val.

p-val.

0.870

0.066

0.000

0.023

0.006

0.000

1.073 0.933 1.094 1.099 0.788 1.020 0.884 0.965 0.610 1.448 1.246 0.578 1.562 1.000 1.112 0.546 0.762 0.793 1.328 0.905 1.109 1.022 0.860 0.776 1.291 1.279 0.793 1.040 0.463 0.919 1.322 1.051 1.189 1.081 0.875 0.733 0.839 0.616 0.870 1.073 0.933

0.087 0.023 0.041 0.051 0.055 0.059 0.050 0.024 0.049 0.030 0.044 0.034 0.038 0.025 0.039 0.060 0.041 0.050 0.044 0.047 0.061 0.033 0.035 0.053 0.051 0.045 0.048 0.044 0.012 0.042 0.061 0.047 0.040 0.029 0.043 0.050 0.044 0.052 0.066 0.087 0.023

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

0.013 0.038 0.024 0.004 0.007 0.001 0.011 0.083 0.010 0.013 0.023 0.018 0.014 0.014 0.017 -0.005 -0.001 0.007 0.013 0.009 0.013 0.011 0.016 0.016 0.025 0.009 0.010 0.010 0.071 0.015 -0.002 0.001 0.019 0.012 0.015 0.017 0.009 0.006 0.023 0.013 0.038

0.009 0.006 0.006 0.006 0.005 0.007 0.006 0.005 0.005 0.004 0.006 0.004 0.004 0.003 0.004 0.005 0.005 0.004 0.005 0.005 0.005 0.004 0.005 0.007 0.005 0.005 0.005 0.005 0.003 0.004 0.007 0.005 0.007 0.003 0.005 0.005 0.006 0.005 0.006 0.009 0.006

0.141 0.000 0.000 0.487 0.203 0.912 0.046 0.000 0.038 0.003 0.000 0.000 0.002 0.000 0.000 0.371 0.802 0.120 0.006 0.078 0.013 0.008 0.001 0.025 0.000 0.051 0.033 0.061 0.000 0.000 0.767 0.907 0.004 0.000 0.001 0.001 0.132 0.230 0.000 0.141 0.000


Table 5 Effects of Content and Paywall Policies on Paywall Impact

                                        (a) Panel GLS                   (b) Panel GLS with instruments
Variable                                Coef.   Std. Err.   p-value     Coef.   Std. Err.   p-value
Proportion of articles in
  politics                              0.362       0.110     0.001     0.387       0.110     0.000
  business/economics                    2.717       0.121     0.000     2.713       0.122     0.000
  technology/science                   -1.220       0.159     0.000    -1.151       0.159     0.000
  lifestyle/entertainment              -0.808       0.075     0.000    -0.833       0.075     0.000
  sports                                0.773       0.064     0.000     0.758       0.064     0.000
  social issues                         0.489       0.056     0.000     0.501       0.056     0.000
Content uniqueness index                0.031       0.005     0.000     0.033       0.005     0.000
Conservative slant                      0.545       0.119     0.000     0.637       0.120     0.000
Weekly digital edition price           -0.081       0.004     0.000    -0.088       0.004     0.000
Number of monthly free articles        -0.010       0.001     0.000    -0.010       0.001     0.000
Daily print circulation                 0.049       0.008     0.000     0.056       0.008     0.000
Previous paywalls                       0.003       0.000     0.000     0.003       0.000     0.000
Age between
  18 and 34                             3.985       0.164     0.000     4.020       0.164     0.000
  35 and 44                             5.073       0.242     0.000     5.080       0.242     0.000
  55 and over                           3.479       0.176     0.000     3.537       0.176     0.000
Income between
  50,000 and 74,999                    -2.037       0.188     0.000    -1.954       0.189     0.000
  75,000 and 99,999                    -1.653       0.193     0.000    -1.827       0.194     0.000
  150,000 and over                      0.039       0.104     0.707     0.021       0.105     0.842

Table 6 Temporary Removal of Paywall Incidents

(a) Did the newspaper remove its paywall?

Event                                      The New York Times   The Wall Street Journal
Hurricane Irene (8/21 – 8/28/2011)         Yes                  Yes
Hurricane Sandy (10/29 – 11/2/2012)        Yes                  Yes
2012 Presidential Election                 Yes                  Yes
Paris Terror Attack (11/14/2015)           Yes                  No
2016 Presidential Election (11/9/2016)     Yes                  No
Hurricane Harvey (8/26/2017)               Yes                  Yes
Hurricane Irma (9/8/2017)                  Yes                  Yes

(b) Evidence of Abnormal Pageviews

                        Dependent variable*: $PV_{NYT,t}$     Dependent variable*: $PV_{WSJ,t}$
Variable                Coef.   Std. Err.   p-value           Coef.   Std. Err.   p-value
Intercept              10.312       0.024     0.000           8.800       0.007     0.000
Paywall                -0.250       0.026     0.000
Long-run trend          0.467       0.023     0.000           1.003       0.010     0.000
Irene                   0.243       0.042     0.000           0.118       0.105     0.260
Sandy                   0.142       0.046     0.002           0.184       0.141     0.190
2012 Election           0.367       0.121     0.003           0.265       0.126     0.036
Paris terror            0.592       0.063     0.000           0.333       0.069     0.000
2016 Election           0.788       0.025     0.000           0.434       0.054     0.000
Harvey                 -0.004       0.115     0.974          -0.114       0.351     0.746
Irma                   -0.044       0.196     0.824           0.104       0.382     0.786
Sunday                  0.119       0.005     0.000          -0.095       0.006     0.000
Monday                  0.178       0.007     0.000           0.389       0.007     0.000
Tuesday                 0.110       0.007     0.000           0.414       0.009     0.000
Wednesday               0.096       0.008     0.000           0.409       0.009     0.000
Thursday                0.094       0.007     0.000           0.408       0.009     0.000
Friday                  0.078       0.006     0.000           0.366       0.007     0.000
$\rho$ (serial correlation
  of error term)        0.791       0.010     0.000           0.463       0.014     0.000

R-squared               0.887                                 0.946
Adj. R-squared          0.886                                 0.946
Durbin-Watson stat.     2.297                                 2.094

* $PV_{NYT,t}$ is the New York Times’ pageviews on day $t$; $PV_{WSJ,t}$ is the Wall Street Journal’s pageviews on day $t$.

Table 7 Effects of Temporary Paywall Removal on Daily Pageviews

Dependent variable: $\hat{\varepsilon}_{NYT,t} - \hat{\varepsilon}_{WSJ,t}$ (incremental pageviews of NYT compared to WSJ)

Variable             Coefficient   Standard Error   p-value
Intercept                 -0.001            0.003     0.741
Irene                      0.032            0.048     0.508
Sandy                      0.004            0.061     0.949
2012 Election              0.131            0.096     0.173
Paris terror               0.263            0.136     0.052
2016 Election              0.504            0.096     0.000
Harvey                     0.149            0.136     0.272
Irma                      -0.071            0.192     0.712

R-squared                  0.012
Adj. R-squared             0.010
Durbin-Watson stat.        2.173

Figure 1 Conceptual Framework

Moderating variables (on long-term effect of paywall)
• Content policy: Composition of news sections, content uniqueness, political slant
• Price policy: Price level, free articles
• Other variables: Print circulation volume, timing of paywall introduction

Paywall decisions
• Erection of paywall (long-term effect)
• Temporary removal (temporary effect)

Outcome: Daily pageviews

Non-marketing (confounding) factors
• Daily news importance
• Seasonality
• Trend in web traffic observed commonly across multiple newspapers

Figure 2 Daily Pageviews of Selected Newspapers (with Paywall Introduction Dates)

[Three time-series panels covering 2010 to 2017: The New York Times (paywall introduced 3/28/2011), Los Angeles Times (3/5/2012), and Chicago Tribune (11/1/2012).]

Note. The vertical lines represent the time of paywall introduction of each newspaper. The Y-axis represents the daily pageviews per million internet users.

Figure 3 Long-Run Trend and News Index

(a) Long-Run Trend
[Time-series plot, 2010 to 2017, with bands for the 2.5 and 97.5 percentiles of the calibrated long-run trend.]

(b) News Index
[Time-series plot, 2010 to 2017, with peaks labeled (a) through (k).]

(a) November 3, 2010: U.S. midterm election results
(b) March 11, 2011: Japan tsunami
(c) May 2, 2011: Death of Osama bin Laden
(d) October 6, 2011: Death of Steve Jobs
(e) February 12, 2012: Death of Whitney Houston
(f) November 7, 2012: Presidential election
(g) December 15, 2012: Shooting at Sandy Hook Elementary School in Connecticut
(h) April 19, 2013: Boston Marathon bombing
(i) November 14, 2015: Paris terror attack
(j) November 9, 2016: Presidential election
(k) October 2, 2017: Las Vegas shooting

Figure 4 Actual and Counterfactual Pageviews (for the Akron Beacon Journal, which introduced a paywall on July 1, 2014)

(a) Actual Pageviews
(b) Predicted and Counterfactual Pageviews (Predicted: red solid; Counterfactual: blue dashed)

[Both panels plot daily pageviews (0 to 600) from 2010 to 2017.]

Predicted pageviews are generated by assuming that the newspaper introduced its paywall on July 1, 2014. Counterfactual pageviews are generated by assuming that the newspaper did not introduce a paywall. Both are generated by M4 (Equation (1-1)).
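As a rough illustration of how a predicted series and a counterfactual series can differ only through the paywall term, the sketch below assumes a log-linear specification in which the paywall enters as a 0/1 step dummy. This functional form, the function names, and the toy pageview numbers are assumptions made for illustration; Equation (1-1) is not reproduced here. The value -0.783 is one of the paywall coefficients reported in Table 4(a).

```python
import math

def pct_change_from_log_coef(alpha):
    """Percent change in pageviews implied by a step-dummy coefficient
    when the dependent variable is log pageviews (assumed form)."""
    return (math.exp(alpha) - 1.0) * 100.0

def counterfactual_pageviews(predicted, paywall_on, alpha):
    """Switch the paywall dummy off: on days where the dummy is 1,
    divide the predicted level by exp(alpha); other days are unchanged."""
    return [p / math.exp(alpha) if on else p
            for p, on in zip(predicted, paywall_on)]

alpha = -0.783                            # a coefficient reported in Table 4(a)
predicted = [300.0, 310.0, 140.0, 135.0]  # hypothetical predicted pageviews
paywall_on = [0, 0, 1, 1]                 # hypothetical step dummy

print(round(pct_change_from_log_coef(alpha), 1))
print([round(x, 1) for x in counterfactual_pageviews(predicted, paywall_on, alpha)])
```

Under these assumptions, the gap between the two curves after the paywall date is a constant multiplicative factor, which is why a single coefficient can summarize the long-term impact.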


Appendix

Appendix A: Modeling and Calibrating Common Factors

Long-Run Exogenous Trends

We use a local-level model (Durbin and Koopman 2012) to estimate the exogenous trend in a newspaper’s daily pageviews. A local-level model is a parsimonious but flexible parametric approach to extracting a nonlinear long-run trend from time-series data.¹ Marketing literature has used extended versions of a local-level model (e.g., a local linear trend model) to identify flexible trends (e.g., Du and Kamakura 2012). Equations (A.1) and (A.2) represent a long-run exogenous trend ($Trend_t$) in a local-level model:

(A.1)  $Trend_t = Trend_{t-1} + \Upsilon_{t-1} + \zeta_{T,t}$
(A.2)  $\Upsilon_t = \Upsilon_{t-1} + \zeta_{\Upsilon,t}$,

where $\zeta_{T,t}$ and $\zeta_{\Upsilon,t}$ are error terms that follow a normal distribution with mean zero. Note that Equations (A.1) and (A.2) reduce to a deterministic linear trend model if $\Upsilon_t = 1$ and $\zeta_{T,t} = \zeta_{\Upsilon,t} = 0$ for all $t$. As such, we let the data determine the long-run exogenous trend rather than a priori imposing a specific trend shape.

To identify long-run trends created by external factors, we search for newspapers whose pageviews would not be influenced by the focal newspapers’ introductions of paywalls. Specifically, we identify newspapers that introduced paywalls before our analysis period and have maintained them for a long time. Our identification assumption is that when a newspaper introduces a paywall, its readers switch to a free news source rather than another paid news source. For example, when the Los Angeles Times introduces a paywall, its readers are likely to switch to USA Today (a non-paywalled newspaper) in search of free online articles, but not to The Wall Street Journal (a paywalled newspaper).

Four newspapers adopted paywalls before our analysis period: The Wall Street Journal, the Albuquerque Journal, the Arkansas Democrat-Gazette, and Newsday. Of these, we use the daily reach of The Wall Street Journal, which introduced its paywall in 1997. Because it is a national newspaper, The Wall Street Journal’s daily pageviews are likely to represent common trends found across multiple U.S. newspapers. Figure A.1 shows the daily pageviews of The Wall Street Journal. Indeed, its daily pageviews appear to have a gradual long-run trend similar to that found in Figure 2. Also, The Wall Street Journal’s unique hard paywall system makes it difficult for a random internet user to access its online articles for free; the introductions of paywalls by our focal newspapers are therefore unlikely to alter The Wall Street Journal’s pageviews substantially enough to affect the long-run trend identified from the paper. Web Appendix W3 shows the detailed calibration process. Figure 3(a) in the main text shows the calibrated long-run trend from The Wall Street Journal’s log-transformed pageviews. The long-run trend is more complex than linear, supporting the use of the flexible trend model in Equations (A.1) and (A.2).

==Figure A.1 about here==

To assess the legitimacy of using The Wall Street Journal to identify a long-run exogenous trend, we test whether the introduction of paywalls by other newspapers in our sample affected The Wall Street Journal’s daily pageviews. Table A.1 reports the results: none of the paywall introductions had a statistically significant effect on the daily pageviews of The Wall Street Journal, supporting our identification assumption. Because The Wall Street Journal’s daily pageviews are not affected by other newspapers’ paywalls, they can be used to extract the “exogenous” long-run trend.
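To make the recursion in Equations (A.1) and (A.2) concrete, the following minimal sketch simulates a trend path from it. It only simulates; it does not estimate anything, and the actual calibration is the one described in Web Appendix W3. The function name, the starting values, and the shock standard deviations are illustrative assumptions. The recursion as written, with a stochastic slope, is what the state-space literature usually calls a local linear trend.

```python
import random

def simulate_trend(n, trend0, slope0, sd_trend, sd_slope, seed=0):
    """Simulate n periods of the trend recursion in (A.1)-(A.2)."""
    rng = random.Random(seed)
    trend, slope = trend0, slope0
    path = [trend]
    for _ in range(n - 1):
        # (A.1): the trend moves by last period's slope plus a trend shock
        new_trend = trend + slope + rng.gauss(0.0, sd_trend)
        # (A.2): the slope itself follows a random walk
        slope = slope + rng.gauss(0.0, sd_slope)
        trend = new_trend
        path.append(trend)
    return path

# With both shock variances set to zero and a slope of 1, the path is a straight line.
print(simulate_trend(5, 0.0, 1.0, 0.0, 0.0))

# With nonzero variances (hypothetical values), the slope drifts and the trend bends.
print([round(x, 2) for x in simulate_trend(5, 0.0, 1.0, 0.1, 0.05, seed=1)])
```

The first call reproduces the deterministic special case noted in the text; the second shows how small slope shocks accumulate into a nonlinear trend.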


==Table A.1 about here==

Daily News Index ($News_t$)

We use the number of daily “unique visitors” of various news media sites (e.g., CNN, Fox News) to infer the daily news significance (i.e., the collective significance of the news on a given day). The idea behind using unique visitors is that while heavy readers of a newspaper may regularly access newspaper websites, casual readers may access newspaper websites only when important news occurs. Thus, by using the daily unique visitors of various news media websites, we can infer how many users the news attracts on a given day. The extent to which news appeals to readers can be interpreted as daily news importance. Let $y_{kt}$ be the number of daily unique visitors (or daily reach) of news media site $k$ on day $t$. We apply Equation (A.3) to extract a daily news significance score:

(A.3)  $\begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{Kt} \end{bmatrix} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_K \end{bmatrix} News_t + [Trend_t, Seasonality_t] + \begin{bmatrix} v_{1t} \\ v_{2t} \\ \vdots \\ v_{Kt} \end{bmatrix}$,

       $News_t = w_t$,

where $K$ is the number of news media websites used to quantify the News Index, and $[Trend_t, Seasonality_t]$ is a linear function of the long-run trend and day-of-week seasonality, to control for their effects on news media web traffic. We assume that the error terms $v_{kt}$ and $w_t$ follow a normal distribution with mean zero: $v_{kt} \sim N(0, V_k)$ and $w_t \sim N(0, W)$. We apply Equation (A.3) to daily reach of news media websites. Note that one could use the simple average (or the sum) of the daily reach of the $K$ news websites to measure daily news importance (i.e., $News_t = \sum_{k=1}^{K} y_{kt}/K$ or $News_t = \sum_{k=1}^{K} y_{kt}$).
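To show how a factor-style score differs from the simple average just mentioned, the sketch below computes the posterior mean of $News_t$ for a single day. It rests on strong simplifying assumptions that are not the paper's estimation procedure: the loadings $\lambda_k$ and the variances $V_k$ and $W$ are treated as known, and the trend and seasonality terms are assumed to have been removed from $y_{kt}$ already. The function name and all numeric values are hypothetical.

```python
def news_score(y, lam, V, W):
    """Posterior mean of News_t given one day's (detrended) reach values.

    With News_t ~ N(0, W) and y_k = lam_k * News_t + v_k, v_k ~ N(0, V_k),
    the posterior mean is a precision-weighted combination of the y_k.
    """
    numerator = sum(l * yk / v for yk, l, v in zip(y, lam, V))
    denominator = 1.0 / W + sum(l * l / v for l, v in zip(lam, V))
    return numerator / denominator

y = [2.0, 4.0, 6.0]  # hypothetical detrended reach of K = 3 news sites

# Equal loadings, equal noise, and a very diffuse prior: close to the simple average.
print(news_score(y, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 1e12))

# A site with a larger loading and less noise receives more weight.
print(news_score(y, [1.0, 1.0, 2.0], [1.0, 1.0, 0.5], 1e12))
```

The design point is that sites differ in how strongly, and how noisily, their traffic responds to news, so unequal weights can be more informative than an unweighted mean.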

Note that this simple average is a special case of Equation (A.3), with $\lambda_k = 1$ for all $k$ and $v_{kt} = 0$ for all $k$ and $t$. Thus, by allowing $\lambda_k$ to vary across news websites and having a nonzero $v_{kt}$, Equation (A.3) incorporates the notions that different news websites may have different levels of success in attracting online readers during important news events and that news importance may not be the only factor that determines the web traffic of a news site.

We calibrate the daily News Index using the daily reach of two international news agencies and five large television news media websites. The two news agencies are Reuters and the Associated Press. The five television news websites are CNN, ABC News, CBS News, Fox News, and NBC News. We cast Equation (A.3) in a Bayesian Dynamic Linear Model (DLM) framework (West and Harrison 1997) and apply the Kalman filtering/smoothing algorithm (Carter and Kohn 1994, Fruhwirth-Schnatter 1994) to calibrate the model. Web Appendix W4 explains the estimation procedure in detail. Figure 3(b) in the main text shows that the proposed model not only picks up important news events but also measures news importance on a daily basis.

Appendix B: Online vs. Offline: Substitutes or Complements?

It is often argued that online paywalls are designed to induce consumers to subscribe to print newspapers and discourage existing subscribers from canceling print subscriptions (Grueskin et al. 2011). Thus, it is important to understand how paywalls influence both print and digital subscriptions. Specifically, how does a newspaper’s adoption of a paid digital subscription model affect its print circulation? Because we observe the paywall introduction of online newspapers, we rely on the standard definition of substitutability (goods A and B are substitutes if an increase in the price of good B increases the sales of good A) to determine whether online and offline newspapers are substitutes. That is, if the paywall introduction of a newspaper increases its print circulation, or decelerates the declining trend of its print circulation, then online and offline news outlets are substitutes for the newspaper.

Figure B.1 shows the average semi-annual print circulation volume of the weekday editions (Monday–Friday) of the three national newspapers (The Wall Street Journal, USA Today, and The New York Times)² from March 2008 to September 2014. One might simply compare the average print circulation volume of The New York Times before and after its paywall adoption, since only The New York Times changed its paywall policy during our sample period. However, the analysis may be confounded by the downward trend that was prevalent before the paywall rollout. We should control for the trend to properly measure the paywall impact on print circulation, and thus the relationship between online and offline news outlets. To this end, we exploit the fact that the other two national newspapers did not change their paywall decisions during the analysis period. That is, The Wall Street Journal and USA Today constitute the control group, and the print circulation volume of The New York Times is evaluated against that of the control group newspapers.

Table B.1 shows the average print circulation volumes of the three newspapers before and after The New York Times’ paywall introduction. In this two-group before–after design, the treatment effect (i.e., the effect of The New York Times’ paywall on its print circulation volume) is computed by subtracting the change in the circulation of a control newspaper from that of the focal newspaper. The treatment effect is −288,295 − (−522,832) = 234,537 relative to The Wall Street Journal and −288,295 − (−669,562) = 381,267 relative to USA Today. The positive treatment effects suggest that the print and online newspapers are substitutes for The New York Times.

==Figure B.1 about here==
==Table B.1 about here==

More formally, we can use difference-in-difference estimation to examine the effect of The New York Times’ paywall introduction on its print subscription. Let $Circulation_{NYT,s}$ be the print circulation volume of The New York Times in the semi-annual period $s$, let $Paywall_{NYT,s}$ be a step-dummy variable that is 0 before the newspaper’s paywall introduction and 1 thereafter, and let $TimeIndex_s$ be a linear time index ($TimeIndex_{Mar-08} = 1$, $TimeIndex_{Sep-08} = 2$, …, $TimeIndex_{Sep-14} = 14$). Then, the effect of The New York Times’ paywall introduction on its print circulation volume can be modeled as in Equation (B.1).

(B.1)  $Circulation_{NYT,s} = \gamma_{NYT,0} + \gamma_{NYT,1} Paywall_{NYT,s} + \gamma_{NYT,2} TimeIndex_s + \varepsilon_{NYT,s}$,

where $TimeIndex_s$ is added to control for the downward trend in the print circulation. A positive $\gamma_{NYT,1}$ suggests that The New York Times’ online and print newspapers are substitutes. Further, let $Circulation_{control,s}$ be the sum of the print circulation volume of the control newspapers, i.e., The Wall Street Journal and USA Today, in period $s$. Because the two newspapers did not change their paywall policies during the analysis period, $Circulation_{control,s}$ is modeled as in Equation (B.2).

(B.2)  $Circulation_{control,s} = \gamma_{control,0} + \gamma_{control,2} TimeIndex_s + \varepsilon_{control,s}$.

Applying the difference-in-difference approach to Equations (B.1) and (B.2), that is, subtracting Equation (B.2) from Equation (B.1) and taking first differences so that the intercepts drop out and $\Delta TimeIndex_s = 1$, leads to Equation (B.3).

(B.3)  $\Delta[Circ_{NYT,s} - Circ_{control,s}] = (\gamma_{NYT,2} - \gamma_{control,2}) + \gamma_{NYT,1} \Delta[Paywall_{NYT,s}] + \Delta[\varepsilon_{NYT,s} - \varepsilon_{control,s}]$.

Equation (B.3) is estimated using the ordinary least squares method. Because the error term in Equation (B.3) is subject to autocorrelation, we estimate the Newey–West heteroskedasticity- and autocorrelation-consistent standard errors (Wooldridge 2010). Table B.2 shows the estimation results, confirming that The New York Times’ online and print newspapers are substitutes.

==Table B.2 about here==
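The two-group before–after arithmetic behind the treatment effects quoted in this appendix can be checked with a few lines of code. The sketch below uses the "Difference before and after" column of Table B.1 as reported, because recomputing the differences from the rounded averages can be off by one unit. The dictionary and function names are illustrative, not part of the paper.

```python
# Reported change in average weekday print circulation (Table B.1, column b - a).
reported_change = {
    "The Wall Street Journal": -522_832,
    "USA Today": -669_562,
    "The New York Times": -288_295,
}

def treatment_effect(focal, control):
    """Change for the focal newspaper minus change for a control newspaper."""
    return reported_change[focal] - reported_change[control]

for control in ("The Wall Street Journal", "USA Today"):
    effect = treatment_effect("The New York Times", control)
    print(f"NYT vs. {control}: {effect:,}")
```

A positive value means that the print circulation of The New York Times fell by less than that of the control newspaper over the same window, which is the pattern the text reads as substitution between the online and print editions.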

Appendix References

Carter CK, Kohn R (1994) On Gibbs sampling for state space models. Biometrika 81(3):541-553.
Du RY, Kamakura WA (2012) Quantitative trendspotting. Journal of Marketing Research 49(4):514-536.
Durbin J, Koopman SJ (2012) Time Series Analysis by State Space Methods, 2nd Edition (Oxford University Press, New York).
Fruhwirth-Schnatter S (1994) Data augmentation and dynamic linear models. Journal of Time Series Analysis 15(2):183-202.
Grueskin B, Seave A, Graves L (2011) The Story So Far: What We Know about the Business of Digital Journalism, Columbia Journalism School.
West M, Harrison J (1997) Bayesian Forecasting and Dynamic Models (Springer, New York).
Wooldridge JM (2010) Econometric Analysis of Cross Section and Panel Data. The MIT Press.

Appendix Footnotes

1. One might want to use nonparametric approaches such as a Hodrick–Prescott filter and a frequency filter to extract the long-run trend. We confirmed in a separate analysis that the long-run trends extracted by nonparametric approaches were not substantially different from the one extracted by the local-level model, a parametric approach.

2. The U.S. newspaper industry classifies three newspapers as national newspapers: The Wall Street Journal, USA Today, and The New York Times.

Table A.1 Effects of Paywall of Individual Newspapers on Daily Pageviews of The Wall Street Journal

DV: log of daily pageviews of The Wall Street Journal

Newspaper                            Effect of Paywall    S.E.    p-Value
Akron Beacon Journal                             0.003   0.006      0.564
Asbury Park Press                                0.003   0.005      0.578
Chicago Sun-Times                               -0.002   0.005      0.686
Chicago Tribune                                 -0.002   0.005      0.673
Daily Herald                                     0.001   0.005      0.897
Democrat and Chronicle                           0.004   0.005      0.431
Hartford Courant                                 0.001   0.006      0.866
Indianapolis Star                                0.003   0.005      0.578
Knoxville News Sentinel                          0.002   0.005      0.740
Lexington Herald-Leader                         -0.003   0.005      0.517
Los Angeles Times                                0.001   0.005      0.861
Miami Herald                                    -0.002   0.005      0.725
Milwaukee Journal Sentinel                       0.000   0.005      0.952
Orlando Sentinel                                 0.004   0.005      0.398
Pioneer Press                                   -0.002   0.005      0.674
San Jose Mercury News                           -0.002   0.005      0.757
Star Tribune                                    -0.002   0.005      0.711
Sun Sentinel                                     0.005   0.005      0.327
The Arizona Republic                             0.003   0.005      0.558
The Atlanta Journal Constitution                 0.001   0.005      0.813
The Baltimore Sun                               -0.002   0.005      0.728
The Blade                                       -0.003   0.005      0.569
The Buffalo News                                -0.002   0.005      0.642
The Charlotte Observer                          -0.002   0.005      0.724
The Cincinnati Enquirer                          0.002   0.005      0.716
The Columbus Dispatch                            0.002   0.005      0.717
The Courier-Journal                              0.003   0.005      0.564
The Dallas Morning News                          0.000   0.005      0.938
The Denver Post                                 -0.002   0.005      0.757
The Des Moines Register                          0.003   0.005      0.537
The Fresno Bee                                  -0.003   0.005      0.531
The Kansas City Star                            -0.003   0.005      0.526
The Morning Call                                -0.002   0.005      0.728
The New York Times                               0.002   0.006      0.779
The News & Observer                             -0.002   0.005      0.724
The News Journal                                 0.001   0.005      0.909
The News Tribune                                 0.002   0.005      0.619
The Orange County Register                       0.001   0.006      0.913
The Palm Beach Post                              0.001   0.005      0.813
The Philadelphia Inquirer                        0.002   0.005      0.656
The Post and Courier                             0.004   0.005      0.431
The Sacramento Bee                               0.003   0.005      0.520
The State                                       -0.003   0.005      0.517
Tulsa World                                      0.000   0.005      0.940
Wisconsin State Journal                          0.001   0.005      0.913

Note. N = 2,922 for each newspaper.

Table B.1 Average Print Circulation Before and After The New York Times’ Paywall

Newspaper                   Before paywall ($a$)   After paywall ($b$)   Difference before and after ($b - a$)
The Wall Street Journal                1,912,775             1,389,943                                -522,832
USA Today                              2,011,098             1,341,535                                -669,562
The New York Times                       945,107               656,811                                -288,295

Table B.2 Difference-in-Difference Estimation Results

Variable                         Estimate     Std. Err    t-stat
Intercept                       113,244.5     24,104.7    4.6980
$\Delta[Paywall_{NYT,t}]$        98,057.5     24,104.7    4.0680
R2                                  0.057

Note. N = 14 semiannual periods.

Figure A.1 The Wall Street Journal Daily Pageviews

[Time-series plot of daily pageviews (0 to 30,000), 2010 to 2017.]

Figure B.1 Average Monday – Friday Print Circulation of National Newspapers

[Line chart of semi-annual print circulation (0 to 2,500,000) from Mar-08 to Sep-14 for The Wall Street Journal, USA Today, and The New York Times.]

Web Appendix

To infer the degree of content uniqueness and the composition of news content of each newspaper, we collect online news articles published in 96 major U.S. newspapers in 2015. The total number of online news articles in our sample is 4,332,315 and the daily average number of articles is 125. Because of the large number of news articles collected, we randomly select 30% of the articles for empirical analysis1 and apply LDA topic modeling to the selected articles. The LDA analysis estimates the probability distribution of 300 topics in each newspaper 2 and identifies 20 keywords most commonly observed in each topic. Appendix W1 explains the procedure to calibrate content uniqueness index of a newspaper. Appendix W2 explains the procedure to calibrate content composition (the proportion of articles that belong to specific news sections) of a newspaper. Appendix W1: Content Uniqueness Index W1.1. Overview of the Procedure A typical newspaper contains multiple topics. For example, in more than 10 sections, articles of The New York Times cover a wide range of topics including U.S. politics, world economy, local weather, and the New York Yankees baseball team. The underlying assumption of LDA topic modeling is that a newspaper article (or a document) covers different topics in a probabilistic fashion. For example, a news article may deal with three topics with varying weights: Barack Obama (20%), Obamacare (60%), and the U.S. economy (20%), while another article may consist of only two topics: the 2017 Super Bowl (50%) and the New England Patriots football team (50%). Applying LDA topic modeling to sample news articles estimates the probability distributions (i.e., the relative weights) of topics in individual articles. We aggregate the article-

59

level topic weights to the newspaper level. This aggregation procedure computes the weight with which a topic is covered in individual newspapers. If a topic is treated by multiple newspapers (i.e., if the probability distribution of a topic is relatively evenly distributed across many newspapers), the topic is more likely to be a common topic. For example, news on U.S. politics will be covered by multiple newspapers, regardless of whether the newspapers are small-town newspapers or national newspapers. On the other hand, if a topic is treated predominantly by one newspaper (i.e., if the probability distribution of a topic is concentrated in a newspaper), then the topic is more likely to be a unique topic. Thus, by applying the entropy concept (Godes and Mayzlin 2004) to the dispersion of topics across newspapers, we can measure the degree to which a topic is a common (vs. unique) topic. Then, we combine the entropy measure of the topics with the weight that they are covered by each newspaper to compute the content uniqueness index of a newspaper. W1.2. Modeling W1.2.1. Topic Distribution Let 𝑖 represent the newspaper index (𝑖 = 1, 2, … , 𝑁, where 𝑁 is the number of newspapers), 𝑗 represent the article index (𝑗 = 1, 2, … , 𝐽𝑖 , where 𝐽𝑖 is the number of articles collected from newspaper 𝑖), 𝐴𝑗𝑖 represent the 𝑗’s article of newspaper 𝑖, and 𝑘 represent the topic index (𝑘 = 1, 2, … , 𝐾, where 𝐾 is the total number of topics to be identified). An LDA topic modeling analysis estimates the probability distribution of a topic in an article of a newspaper, or the probability that topic 𝑘 is included in article 𝑗 of newspaper 𝑖, Pr(𝑘|𝐴𝑗𝑖 ). It also lists the keywords of each topic so that a researcher can interpret the topic based on the keywords. The researcher should specify the number of topics a priori. Figure W1.1 illustrates schematically the probability distribution of topics, the main output of an LDA topic modeling analysis. Note that

(1) $\sum_{k=1}^{K} \Pr(k \mid A_{ji}) = 1$ because an article contains 𝐾 topics at most; (2) $\sum_{j=1}^{J_i} \sum_{k=1}^{K} \Pr(k \mid A_{ji}) = J_i$, the number of articles of newspaper 𝑖; (3) $\sum_{i=1}^{N} \sum_{j=1}^{J_i} \sum_{k=1}^{K} \Pr(k \mid A_{ji}) = J$, the total number of articles of all newspapers in the sample.

==Figure W1.1 about here==

W1.2.2. Calibration of Content Uniqueness Index

Pr(𝑘|𝐴𝑗𝑖) can be thought of as the extent to which topic 𝑘 is covered in article 𝑗 of newspaper 𝑖. We use these probabilities to compute the extent to which a topic is covered in a newspaper (rather than in an article of a newspaper). We do this by aggregating Pr(𝑘|𝐴𝑗𝑖) at the newspaper level. Let $\overline{\Pr}(k \mid A_{\cdot i})$ be this aggregate measure. Then, $\overline{\Pr}(k \mid A_{\cdot i})$ is defined as in Equation (W1.1).

(W1.1) $\overline{\Pr}(k \mid A_{\cdot i}) = \text{average of } \big(\Pr(k \mid A_{1i}), \Pr(k \mid A_{2i}), \ldots, \Pr(k \mid A_{J_i,i})\big) = \frac{1}{J_i} \sum_{j=1}^{J_i} \Pr(k \mid A_{ji}).$

We use the average, instead of the sum, because the number of articles varies across newspapers. We normalize $\overline{\Pr}(k \mid A_{\cdot i})$ by dividing it by $\sum_{i=1}^{N} \overline{\Pr}(k \mid A_{\cdot i})$. We denote the normalized quantity by 𝑃(𝑘|𝑖):

(W1.2) $P(k \mid i) = \dfrac{\overline{\Pr}(k \mid A_{\cdot i})}{\sum_{i=1}^{N} \overline{\Pr}(k \mid A_{\cdot i})}.$

Note that $\sum_{i=1}^{N} P(k \mid i) = 1$ for each 𝑘. That is, 𝑃(𝑘|𝑖) can be interpreted as the relative weight with which topic 𝑘 is covered by newspaper 𝑖. If 𝑃(𝑘|𝑖) is spread over multiple newspapers, topic 𝑘 is likely to be a common topic (e.g., a national topic that is treated by many newspapers), whereas if 𝑃(𝑘|𝑖) concentrates in newspaper 𝑖, topic 𝑘 is likely to be a unique topic covered mainly by newspaper 𝑖.
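The aggregation in Equations (W1.1) and (W1.2) can be illustrated with a minimal Python sketch. It assumes the article-level topic probabilities are already available as a matrix with one row per article; the function name `newspaper_topic_weights`, the array names, and the randomly generated input are hypothetical and stand in for actual LDA output.

```python
import numpy as np

def newspaper_topic_weights(doc_topic, paper_ids, n_papers):
    """Return an (n_papers, K) array whose entry [i, k] is P(k|i).

    doc_topic : (J, K) array; row j holds Pr(k | article j) and sums to 1.
    paper_ids : length-J integer array with the newspaper index of each article.
    """
    n_topics = doc_topic.shape[1]
    avg = np.zeros((n_papers, n_topics))
    for i in range(n_papers):
        # Eq. (W1.1): average the topic probabilities over newspaper i's articles
        avg[i] = doc_topic[paper_ids == i].mean(axis=0)
    # Eq. (W1.2): normalize each topic's weights across newspapers
    return avg / avg.sum(axis=0, keepdims=True)

# Hypothetical input: 9 articles from 3 newspapers, 4 topics
rng = np.random.default_rng(0)
paper_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
doc_topic = rng.dirichlet(np.ones(4), size=paper_ids.size)

P = newspaper_topic_weights(doc_topic, paper_ids, n_papers=3)
print(P.sum(axis=0))  # one value per topic; each column of P sums to one
```

Averaging before normalizing keeps a newspaper with many sampled articles from dominating a topic simply because of its volume, which is the reason given above for using the average rather than the sum.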


To compute the dispersion of a topic across newspapers, we use the concept of entropy. A large entropy value of a topic means that the topic is dispersed across many newspapers, and thus more likely to be a common topic. Let 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑘) be the entropy level of topic 𝑘. For newspaper 𝑖, we calculate the weighted average of 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑘)’s, with the weights given by 𝑃(𝑘|𝑖)’s. Let us denote it by 𝐶𝑜𝑚𝑚𝑜𝑛(𝑖) as in Equation (W1.3). 𝐶𝑜𝑚𝑚𝑜𝑛(𝑖) represents the extent to which newspaper 𝑖 covers common topics.

(W1.3) $\mathit{Common}(i) = \sum_{k=1}^{K} \mathit{Entropy}(k)\, P(k \mid i),$

where 𝐶𝑜𝑚𝑚𝑜𝑛(𝑖) is positive because 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑘) is positive. Then, our measure of uniqueness of newspaper 𝑖 is computed as in Equation (W1.4):

(W1.4) $\mathit{Uniqueness}(i) = -\mathit{Common}(i) + F,$

where 𝐹 is an arbitrary constant added to ensure that 𝑈𝑛𝑖𝑞𝑢𝑒𝑛𝑒𝑠𝑠(𝑖) is a positive value. Figure W1.2 shows the estimated content uniqueness index with various numbers of topics.

==Figure W1.2 about here==

Appendix W2: Calibration of Content Composition of a Newspaper

Three human coders examined the 300 topics and discussed the news section that each topic is most likely to belong to. The identified keywords in each topic guided the coders to determine the most likely news section for the topic. For example, if a topic contains keywords such as Obama, election, and representative, then the coders classified the topic as a politics topic. The coders identified seven sections: Politics, Business/Economics/Market, Technology/Science/Health/Environment, Lifestyle/Culture/Entertainment, Sports, Social issue, and Other. Each of the 300 topics was assigned at least one of the seven categories. Then, the researchers combined the topic-section matching results (from human coders) with the topic probability of each newspaper (i.e., 𝑃(𝑘|𝑖) in Equation (W1.2)) to calibrate the content


composition measure. Let 𝑠 denote a news section (𝑠 = 𝑃𝑜𝑙𝑖𝑡𝑖𝑐𝑠, 𝐵𝑢𝑠𝑖𝑛𝑒𝑠𝑠/𝐸𝑐𝑜𝑛𝑜𝑚𝑖𝑐𝑠/𝑀𝑎𝑟𝑘𝑒𝑡, … , 𝑂𝑡ℎ𝑒𝑟), let 𝐷(𝑘, 𝑠) be an indicator function that equals 1 if the coders classify topic 𝑘 into section 𝑠 and 0 otherwise, and let 𝑃(𝑠|𝑖) be the proportion of section 𝑠 articles in newspaper 𝑖. 𝑃(𝑠|𝑖) is our measure of content composition of newspaper 𝑖, which is computed as follows:

(W2.1) $P(s \mid i) = \sum_{k=1}^{300} D(k, s)\, P(k \mid i).$

Note that $P(s \mid i) \geq 0$ and $\sum_{\text{all } s} P(s \mid i) = 1$.
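Equations (W1.3), (W1.4), and (W2.1) can be sketched in Python as follows. Two points are assumptions of the sketch rather than statements from the text: 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑘) is taken to be the Shannon entropy of 𝑃(𝑘|𝑖) across newspapers (with 0 log 0 treated as 0), and each topic is mapped to exactly one section. The function names and the toy numbers are hypothetical. The section shares returned by `content_composition` sum to one across sections whenever each newspaper's topic weights sum to one over topics, as the newspaper-level averages of Equation (W1.1) do.

```python
import numpy as np

def uniqueness_index(P, F=0.0):
    """P[i, k] = P(k|i); each column (topic) sums to 1 across newspapers."""
    safe = np.where(P > 0, P, 1.0)              # log(1) = 0 handles zero weights
    entropy = -(P * np.log(safe)).sum(axis=0)   # assumed Shannon entropy per topic
    common = P @ entropy                        # Eq. (W1.3)
    return -common + F                          # Eq. (W1.4)

def content_composition(weights, topic_section, n_sections):
    """weights[i, k]: topic weight of newspaper i; topic_section[k]: section of topic k."""
    n_topics = weights.shape[1]
    D = np.zeros((n_topics, n_sections))
    D[np.arange(n_topics), topic_section] = 1.0  # indicator D(k, s)
    return weights @ D                           # Eq. (W2.1)

# Toy example: 3 newspapers, 3 topics; rows of avg are newspaper-level averages
avg = np.array([[0.5, 0.0, 0.5],
                [0.5, 0.0, 0.5],
                [0.5, 0.5, 0.0]])
P = avg / avg.sum(axis=0, keepdims=True)         # Eq. (W1.2)

u = uniqueness_index(P, F=1.0)
shares = content_composition(avg, topic_section=np.array([0, 1, 0]), n_sections=2)
print(u)
print(shares)
```

In the toy data, topic 1 is covered only by the third newspaper, so its entropy is zero and the third newspaper receives the highest uniqueness value.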


Appendix W3: Content Characteristics of the Newspapers Included in the Analysis

Newspaper (𝑖)                     Content uniqueness  𝑃(𝑃|𝑖)*  𝑃(𝐵𝐸|𝑖)*  𝑃(𝑇𝑆|𝑖)*  𝑃(𝑆𝑝|𝑖)*  𝑃(𝐿𝐸|𝑖)*  𝑃(𝑆𝑜|𝑖)*  𝑃(𝑂𝑡ℎ𝑒𝑟|𝑖)*
Akron Beacon Journal              0.2821              0.0765   0.0567    0.1229    0.2494    0.2154    0.1959    0.0832
Asbury Park Press                 3.1675              0.0593   0.0965    0.0498    0.1909    0.1966    0.2442    0.1628
Chicago Sun-Times                 2.4343              0.1249   0.0928    0.0322    0.2489    0.0722    0.3403    0.0887
Chicago Tribune                   2.0478              0.0886   0.1422    0.0483    0.1511    0.1403    0.3132    0.1163
Daily Herald                      2.2232              0.0948   0.1320    0.0508    0.2071    0.1107    0.2639    0.1407
Democrat and Chronicle            1.0998              0.0876   0.1208    0.0605    0.1613    0.1805    0.3253    0.0640
The Fresno Bee                    2.0112              0.1103   0.0917    0.0511    0.2719    0.1341    0.2549    0.0860
Knoxville News Sentinel           2.2428              0.0920   0.1057    0.0553    0.1465    0.1455    0.3349    0.1200
Lexington Herald-Leader           1.6115              0.1144   0.1057    0.0621    0.2977    0.0952    0.2281    0.0968
Los Angeles Times                 1.5961              0.0955   0.1051    0.0638    0.1835    0.2399    0.2161    0.0961
Miami Herald                      1.9445              0.1332   0.1222    0.0522    0.1456    0.2006    0.2710    0.0753
Milwaukee Journal Sentinel        2.1341              0.0865   0.1426    0.0531    0.2237    0.1722    0.2084    0.1136
Orlando Sentinel                  1.5078              0.0727   0.1320    0.0442    0.1221    0.2333    0.2954    0.1003
Pioneer Press                     2.3740              0.0956   0.1261    0.0531    0.1711    0.1436    0.2991    0.1114
San Jose Mercury News             2.2156              0.0639   0.1152    0.0659    0.1788    0.1466    0.2974    0.1321
Star Tribune                      1.9262              0.1400   0.1752    0.0655    0.1540    0.1238    0.2528    0.0888
Sun Sentinel                      2.0152              0.0863   0.1214    0.0459    0.1596    0.1912    0.3052    0.0904
The Arizona Republic              1.0646              0.0849   0.1001    0.0501    0.1497    0.2897    0.2506    0.0749
The Atlanta Journal Constitution  3.2636              0.0558   0.1123    0.0713    0.0850    0.1227    0.2201    0.3329
The Baltimore Sun                 1.3174              0.1703   0.0908    0.0459    0.2120    0.1375    0.2843    0.0592
The Blade                         2.7597              0.1068   0.1367    0.0556    0.1024    0.1075    0.3062    0.1848
The Buffalo News                  3.1936              0.0591   0.1836    0.0457    0.1439    0.1866    0.3045    0.0766
The Charlotte Observer            4.8119              0.0864   0.0889    0.0356    0.1754    0.1067    0.4568    0.0503
The Cincinnati Enquirer           1.5950              0.0814   0.1188    0.0480    0.1788    0.1992    0.2712    0.1025
The Columbus Dispatch             2.5161              0.0918   0.1286    0.0632    0.1570    0.1075    0.2319    0.2201
The Courier-Journal               1.7982              0.0516   0.0776    0.0412    0.2637    0.2686    0.1932    0.1041
The Dallas Morning News           2.3967              0.0931   0.1407    0.0572    0.2326    0.1259    0.2404    0.1100
The Denver Post                   1.0380              0.1213   0.1168    0.0543    0.2006    0.1576    0.2950    0.0543
The Des Moines Register           3.0729              0.1025   0.1216    0.0549    0.1368    0.1326    0.3737    0.0780
Hartford Courant                  1.3145              0.1044   0.1115    0.0502    0.1381    0.1642    0.3141    0.1175
Indianapolis Star                 2.1453              0.0572   0.0758    0.0325    0.3554    0.1420    0.2552    0.0818
The Kansas City Star              1.6247              0.0933   0.1146    0.0475    0.3030    0.1169    0.2475    0.0772
The Morning Call                  2.2129              0.0753   0.1228    0.0413    0.1603    0.2539    0.2928    0.0536
The New York Times                0.7076              0.1458   0.1217    0.0580    0.1843    0.1551    0.1835    0.1517
The News & Observer               2.3537              0.1048   0.1159    0.0518    0.2765    0.1290    0.2368    0.0852
The News Journal                  2.5026              0.1692   0.1192    0.0531    0.1557    0.1531    0.2865    0.0632
The News Tribune                  1.6589              0.1281   0.1171    0.0602    0.1945    0.1474    0.2742    0.0786
The Orange County Register        2.0569              0.0809   0.1180    0.0477    0.2183    0.1503    0.3100    0.0748
The Palm Beach Post               4.3096              0.0384   0.0765    0.0382    0.1646    0.0929    0.1247    0.4647
The Philadelphia Inquirer         1.6507              0.1314   0.1363    0.0646    0.1899    0.1414    0.2784    0.0579
The Post and Courier              2.9375              0.1022   0.1644    0.0518    0.0683    0.2117    0.3210    0.0806
The Sacramento Bee                1.9583              0.1255   0.1230    0.0615    0.2199    0.1238    0.2454    0.1009
The State                         2.4641              0.1090   0.1050    0.0485    0.2834    0.1220    0.2374    0.0946
Tulsa World                       4.4126              0.0299   0.2012    0.1803    0.1532    0.0741    0.1359    0.2254
Wisconsin State Journal           2.7673              0.1350   0.1067    0.0407    0.1676    0.2012    0.2078    0.1410

* 𝑃(𝑃|𝑖): Proportion of politics related news of newspaper 𝑖; 𝑃(𝐵𝐸|𝑖): Proportion of business/economics/market related news; 𝑃(𝑇𝑆|𝑖): Proportion of technology/science/health/environment related news; 𝑃(𝑆𝑝|𝑖): Proportion of sports related news; 𝑃(𝐿𝐸|𝑖): Proportion of lifestyle/entertainment/culture related news; 𝑃(𝑆𝑜|𝑖): Proportion of news related to general social issues (e.g., car accident in the local area); 𝑃(𝑂𝑡ℎ𝑒𝑟|𝑖): Proportion of all other news (e.g., weather)
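As a quick arithmetic check on the table, the seven section proportions of a newspaper should sum to one up to rounding, as noted after Equation (W2.1). The short Python sketch below does this for three rows; the values are copied from the table (first, thirty-fourth, and last rows, under the assumption that the values in each column follow the order of the newspaper list), and the variable names are hypothetical.

```python
# Section proportions in the column order P, BE, TS, Sp, LE, So, Other
rows = {
    "Akron Beacon Journal":    [0.0765, 0.0567, 0.1229, 0.2494, 0.2154, 0.1959, 0.0832],
    "The New York Times":      [0.1458, 0.1217, 0.0580, 0.1843, 0.1551, 0.1835, 0.1517],
    "Wisconsin State Journal": [0.1350, 0.1067, 0.0407, 0.1676, 0.2012, 0.2078, 0.1410],
}

# Sum the seven shares of each newspaper
totals = {name: sum(shares) for name, shares in rows.items()}
for name, total in totals.items():
    print(f"{name}: {total:.4f}")
```

Each total differs from one by at most 0.0001, which is consistent with four-decimal rounding of the reported proportions.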


Appendix W4: MCMC Procedure to Estimate Long-Run Exogenous Trends

Let $PV_{WSJ,t}$ be the log-transformed daily pageviews of the Wall Street Journal website. We model the long-run exogenous trend ($Trend_t$) as in Equations (W4.1)–(W4.3):

(W4.1) $PV_{WSJ,t} = \alpha_{WSJ,0} + \alpha_{WSJ,2} PV_{WSJ,t-1} + \alpha_{WSJ,3} Trend_t + \alpha_{WSJ,4} News_t + \boldsymbol{\alpha}'_{WSJ,5} \mathbf{DayofWeek}_t + \varepsilon_{WSJ,t}$

(W4.2) $Trend_t = Trend_{t-1} + \Upsilon_{t-1} + \zeta_{T,t}$

(W4.3) $\Upsilon_t = \Upsilon_{t-1} + \zeta_{\Upsilon,t}.$

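Equation (W4.1) is the same as Equation (1-1), except that it does not have $Paywall_{WSJ,t}$ because the Wall Street Journal introduced its paywall before our analysis period. Equations (W4.2) and (W4.3), a local linear trend model with a stochastic level and a stochastic slope $\Upsilon_t$, capture flexible nonlinear long-run trends in $PV_{WSJ,t}$. We represent Equations (W4.1) through (W4.3) in a Bayesian DLM as in Equations (W4.4) and (W4.5), where (W4.4) is the observation equation and (W4.5) is the state equation.

(W4.4) $\underbrace{PV_{WSJ,t}}_{y_t} = \underbrace{\begin{bmatrix} \alpha_{WSJ,3} & 0 \end{bmatrix}}_{\mathbf{F}'} \underbrace{\begin{bmatrix} Trend_t \\ \Upsilon_t \end{bmatrix}}_{\boldsymbol{\theta}_t} + d_{WSJ,t} + \varepsilon_{WSJ,t}$

(W4.5) $\underbrace{\begin{bmatrix} Trend_t \\ \Upsilon_t \end{bmatrix}}_{\boldsymbol{\theta}_t} = \underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} Trend_{t-1} \\ \Upsilon_{t-1} \end{bmatrix}}_{\boldsymbol{\theta}_{t-1}} + \underbrace{\begin{bmatrix} \zeta_{T,t} \\ \zeta_{\Upsilon,t} \end{bmatrix}}_{\boldsymbol{\zeta}_t},$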

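where $d_{WSJ,t} = \alpha_{WSJ,0} + \alpha_{WSJ,2} PV_{WSJ,t-1} + \alpha_{WSJ,4} News_t + \boldsymbol{\alpha}'_{WSJ,5} \mathbf{DayofWeek}_t$, $\varepsilon_{WSJ,t} \sim N(0, V_{WSJ})$, and $\boldsymbol{\zeta}_t \sim N(0, \mathbf{W}_\zeta)$. The observation equation (W4.4) includes the drift term $d_{WSJ,t}$. We apply a Gibbs sampler in which a forward-filtering/backward-sampling (FFBS) procedure (West and Harrison 1997) is nested to draw the time-varying parameter $\boldsymbol{\theta}_t$.

1) Sampling $\boldsymbol{\theta}_t$

Let us assume that the posterior distribution of $\boldsymbol{\theta}_{t-1}$ is $\boldsymbol{\theta}_{t-1} \sim N(\mathbf{m}_{t-1}, \mathbf{C}_{t-1})$. The forward-filtering procedure on day 𝑡 is as follows.

- (a) Posterior on $t-1$: $\boldsymbol{\theta}_{t-1} \sim N(\mathbf{m}_{t-1}, \mathbf{C}_{t-1})$.
- (b) Prior on $t$: $\boldsymbol{\theta}_t \sim N(\mathbf{a}_t, \mathbf{R}_t)$ where $\mathbf{a}_t = \mathbf{H}\mathbf{m}_{t-1}$ and $\mathbf{R}_t = \mathbf{H}\mathbf{C}_{t-1}\mathbf{H}' + \mathbf{W}_\zeta$.
- (c) One-step ahead forecast of $y_t$: $y_t \sim N(f_t, B_t)$ where $f_t = \mathbf{F}'\mathbf{a}_t + d_{WSJ,t}$, $B_t = \mathbf{F}'\mathbf{R}_t\mathbf{F} + V_{WSJ}$.
- (d) Posterior on $t$: $\boldsymbol{\theta}_t \sim N(\mathbf{m}_t, \mathbf{C}_t)$ where $\mathbf{m}_t = \mathbf{a}_t + \mathbf{R}_t\mathbf{F}B_t^{-1}(y_t - f_t)$, $\mathbf{C}_t = \mathbf{R}_t - \mathbf{R}_t\mathbf{F}B_t^{-1}\mathbf{F}'\mathbf{R}_t$.

The above four steps are run forward from $t = 1$ to $T$. Then, the backward-sampling procedure is run from $t = T$ to 1 as follows.

- On $t = T$: $\boldsymbol{\theta}_T \sim N(\mathbf{m}_T, \mathbf{C}_T)$
- On $t = T-1, \ldots, 0$: $\boldsymbol{\theta}_t \mid \boldsymbol{\theta}_{t+1} \sim N(\mathbf{g}_t, \mathbf{K}_t)$ where $\mathbf{g}_t = \mathbf{m}_t + \mathbf{C}_t\mathbf{H}'\mathbf{R}_{t+1}^{-1}(\boldsymbol{\theta}_{t+1} - \mathbf{a}_{t+1})$, $\mathbf{K}_t = \mathbf{C}_t - \mathbf{C}_t\mathbf{H}'\mathbf{R}_{t+1}^{-1}\mathbf{H}\mathbf{C}_t$.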

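2) Sampling $[\alpha_{WSJ,0} \;\; \alpha_{WSJ,2} \;\; \alpha_{WSJ,3} \;\; \alpha_{WSJ,4} \;\; \boldsymbol{\alpha}_{WSJ,5}]$

The relevant regression equation is $PV_{WSJ,t} = \alpha_{WSJ,0} + \alpha_{WSJ,2} PV_{WSJ,t-1} + \alpha_{WSJ,3} Trend_t + \alpha_{WSJ,4} News_t + \boldsymbol{\alpha}'_{WSJ,5} \mathbf{DayofWeek}_t + \varepsilon_{WSJ,t}$, where $\varepsilon_{WSJ,t} \sim N(0, V_{WSJ})$. We apply the usual Bayesian estimation method with a diffuse normal prior (e.g., Congdon 2007).

3) Sampling $V_{WSJ}$

The relevant regression equation is $PV_{WSJ,t} = \alpha_{WSJ,0} + \alpha_{WSJ,2} PV_{WSJ,t-1} + \alpha_{WSJ,3} Trend_t + \alpha_{WSJ,4} News_t + \boldsymbol{\alpha}'_{WSJ,5} \mathbf{DayofWeek}_t + \varepsilon_{WSJ,t}$. We apply the usual Bayesian estimation method with a diffuse Inverse-Gamma prior.

4) Sampling $\mathbf{W}_\zeta$

The relevant regression equation is $\boldsymbol{\theta}_t = \mathbf{H}\boldsymbol{\theta}_{t-1} + \boldsymbol{\zeta}_t$. We apply the usual Bayesian estimation method with a diffuse Inverse-Wishart prior.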
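Step 1 of the sampler above, the FFBS draw of the state vector, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `ffbs`, the initial moments `m0` and `C0`, and all numerical values are hypothetical; the data are simulated rather than Wall Street Journal pageviews; the coefficient on the trend and the variances are treated as known; and the draw of the initial state on day 0 is omitted.

```python
import numpy as np

def ffbs(y, d, F, H, V, W, m0, C0, rng):
    """One joint draw of theta_1..theta_T (array index 0 is day 1).

    Model: y_t = F' theta_t + d_t + eps_t, theta_t = H theta_{t-1} + zeta_t,
    with eps_t ~ N(0, V) and zeta_t ~ N(0, W).
    """
    T, p = len(y), len(m0)
    a = np.zeros((T, p))
    R = np.zeros((T, p, p))
    m = np.zeros((T, p))
    C = np.zeros((T, p, p))
    m_prev, C_prev = m0, C0
    for t in range(T):                           # forward filtering
        a[t] = H @ m_prev                        # (b) prior mean
        R[t] = H @ C_prev @ H.T + W              # (b) prior covariance
        f = F @ a[t] + d[t]                      # (c) forecast mean
        B = F @ R[t] @ F + V                     # (c) forecast variance (scalar)
        gain = R[t] @ F / B                      # R_t F B_t^{-1}
        m[t] = a[t] + gain * (y[t] - f)          # (d) posterior mean
        C[t] = R[t] - np.outer(gain, F @ R[t])   # (d) posterior covariance
        m_prev, C_prev = m[t], C[t]

    theta = np.zeros((T, p))
    theta[-1] = rng.multivariate_normal(m[-1], C[-1])
    for t in range(T - 2, -1, -1):               # backward sampling
        G = C[t] @ H.T @ np.linalg.inv(R[t + 1])
        g = m[t] + G @ (theta[t + 1] - a[t + 1])
        K = C[t] - G @ H @ C[t]
        theta[t] = rng.multivariate_normal(g, (K + K.T) / 2)  # symmetrized
    return theta, m

# Simulated example with hypothetical parameter values
rng = np.random.default_rng(1)
T = 200
H = np.array([[1.0, 1.0], [0.0, 1.0]])   # transition matrix of Eq. (W4.5)
F = np.array([0.5, 0.0])                 # [alpha_3, 0] with alpha_3 = 0.5
W = np.diag([1e-2, 1e-4])
V = 0.04
true = np.zeros((T, 2))
state = np.zeros(2)
for t in range(T):
    state = H @ state + rng.multivariate_normal(np.zeros(2), W)
    true[t] = state
d = np.zeros(T)                          # drift term set to zero for simplicity
y = true @ F + d + rng.normal(0.0, np.sqrt(V), size=T)

draw, filtered = ffbs(y, d, F, H, V, W, m0=np.zeros(2), C0=10.0 * np.eye(2), rng=rng)
print(draw.shape)
```

In a full Gibbs iteration, this draw would alternate with the conjugate updates of steps 2 through 4, each conditioning on the most recent draw of the states.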


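Appendix W5: MCMC Procedure to Estimate Daily News Index

Let us define $\mathbf{y}_t \equiv [\tilde{y}_{1t} \;\; \tilde{y}_{2t} \;\; \cdots \;\; \tilde{y}_{Kt}]'$, $\mathbf{F} \equiv [\lambda_1 \;\; \lambda_2 \;\; \cdots \;\; \lambda_K]$, $\theta_t \equiv News_t$, $\mathbf{v}_t \equiv [v_{1t} \;\; v_{2t} \;\; \cdots \;\; v_{Kt}]'$, $\mathbf{V} \equiv diag(V_1, V_2, \ldots, V_K)$, and $H \equiv 0$. For identification purposes, we set $\lambda_1 = 1$ (Bruce et al. 2012). Then Equation (A.3) in Appendix A can be cast as a Bayesian dynamic linear model (DLM) as follows:

(W5.1) $\mathbf{y}_t = \mathbf{F}'\theta_t + \mathbf{v}_t$

(W5.2) $\theta_t = H\theta_{t-1} + w_t,$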

where $\mathbf{v}_t \sim N(0, \mathbf{V})$ and $w_t \sim N(0, W)$. Equation (W5.1) is called the observation equation and Equation (W5.2) is called the state equation. We use a Gibbs sampler where a Forward-Filtering/Backward-Sampling (FFBS) procedure (West and Harrison 1997) is nested to draw the time-varying parameter $\theta_t$.

1) Sampling $\theta_t$

Let us assume that the posterior distribution of $\theta_{t-1}$ is $\theta_{t-1} \sim N(m_{t-1}, C_{t-1})$. The forward-filtering procedure on day 𝑡 is as follows.

- (a) Posterior on $t-1$: $\theta_{t-1} \sim N(m_{t-1}, C_{t-1})$.
- (b) Prior on $t$: $\theta_t \sim N(a_t, R_t)$ where $a_t = Hm_{t-1} = 0$ and $R_t = HC_{t-1}H' + W = W$.
- (c) One-step ahead forecast of $\mathbf{y}_t$: $\mathbf{y}_t \sim N(\mathbf{f}_t, \mathbf{B}_t)$ where $\mathbf{f}_t = \mathbf{F}'a_t = 0$, $\mathbf{B}_t = \mathbf{F}'R_t\mathbf{F} + \mathbf{V} = \mathbf{F}'W\mathbf{F} + \mathbf{V}$.
- (d) Posterior on $t$: $\theta_t \sim N(m_t, C_t)$ where $m_t = a_t + R_t\mathbf{F}\mathbf{B}_t^{-1}(\mathbf{y}_t - \mathbf{f}_t) = W\mathbf{F}\mathbf{B}_t^{-1}\mathbf{y}_t$, $C_t = R_t - R_t\mathbf{F}\mathbf{B}_t^{-1}\mathbf{F}'R_t = W - W\mathbf{F}\mathbf{B}_t^{-1}\mathbf{F}'W$.

The above four steps are run forward from $t = 1$ to $T$. Then, the backward-sampling procedure is run from $t = T$ to 1 as follows.

- On $t = T$: $\theta_T \sim N(m_T, C_T)$
- On $t = T-1, \ldots, 0$: $\theta_t \mid \theta_{t+1} \sim N(g_t, K_t)$ where $g_t = m_t + C_tH'R_{t+1}^{-1}(\theta_{t+1} - a_{t+1}) = m_t$ (because $H = 0$), $K_t = C_t - C_tH'R_{t+1}^{-1}HC_t = C_t = W - W\mathbf{F}\mathbf{B}_t^{-1}\mathbf{F}'W$.

2) Sampling $V_k$ ($k = 1, 2, \ldots, K$)

The relevant regression equation is $\tilde{y}_{kt} = \lambda_k\theta_t + v_{kt}$. We apply the usual Bayesian estimation method with a diffuse Inverse-Gamma prior (e.g., Congdon 2007).

3) Sampling $W$

The relevant regression equation is $\theta_t = H\theta_{t-1} + w_t$. We apply the usual Bayesian estimation method with a diffuse Inverse-Gamma prior (e.g., Congdon 2007).
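Because $H = 0$, the backward pass adds nothing to the filtered moments, so each $\theta_t$ can be drawn independently from $N(m_t, C_t)$. The Python sketch below illustrates the three steps above under several assumptions that are not in the text: the loadings $\lambda_k$ are treated as known, the Inverse-Gamma prior parameters are set to 0.01, the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def news_posterior(Y, lam, V, W):
    """Posterior means m_t (length T) and common variance C of theta_t when H = 0."""
    B = W * np.outer(lam, lam) + np.diag(V)   # B_t = F' W F + V, the same for all t
    gain = W * np.linalg.solve(B, lam)        # transpose of W F B^{-1}
    return Y @ gain, W - W * (lam @ gain)     # m_t = W F B^{-1} y_t, C = W - W F B^{-1} F' W

def draw_inv_gamma(resid, a0, b0, rng):
    """Draw a variance from InvGamma(a0 + n/2, b0 + sum(resid^2)/2)."""
    shape = a0 + resid.size / 2.0
    rate = b0 + 0.5 * np.sum(resid ** 2)
    return 1.0 / rng.gamma(shape, 1.0 / rate)

# Simulated example with hypothetical loadings (lambda_1 = 1 for identification)
rng = np.random.default_rng(2)
T, K = 300, 4
lam = np.array([1.0, 0.8, 0.6, 0.4])
theta_true = rng.normal(0.0, 1.0, size=T)
Y = np.outer(theta_true, lam) + rng.normal(0.0, 0.5, size=(T, K))

V, W = np.ones(K), 1.0
for _ in range(200):
    m, C = news_posterior(Y, lam, V, W)
    theta = m + np.sqrt(C) * rng.standard_normal(T)        # step 1
    for k in range(K):                                     # step 2
        V[k] = draw_inv_gamma(Y[:, k] - lam[k] * theta, 0.01, 0.01, rng)
    W = draw_inv_gamma(theta, 0.01, 0.01, rng)             # step 3: w_t = theta_t
print(V, W)
```

With a scalar state, the posterior variance also has the closed form $1/(1/W + \sum_k \lambda_k^2/V_k)$, which gives a convenient check on the matrix expression for $C_t$.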


Web Appendix References

Bruce NI, Peters K, Naik PA (2012) Discovering how advertising grows sales and builds brands. Journal of Marketing Research 49(December):793-806.

Congdon P (2007) Bayesian Statistical Modeling (Wiley, Hoboken, NJ).

Godes D, Mayzlin D (2004) Using online conversations to study word-of-mouth communication. Marketing Science 23(4):545-560.

West M, Harrison J (1997) Bayesian Forecasting and Dynamic Models (Springer, New York).


Web Appendix Footnotes

1. We oversample from newspapers that published a small number of articles. The overall sampling rate is 34%.

2. In LDA topic modeling, the number of topics must be pre-specified by the researcher. Assuming that each of the 96 newspapers delivers at least one unique local item, we set the minimum number of topics at 200 (minimum 96 local news items plus national and international news items). Because local news items presumably number more than one per newspaper, we increased the number of topics up to 400.


Figure W1.1 Illustration of Topic Distribution

Newspaper     Article   Topic 1             Topic 2             ⋯   Topic 𝐾
Newspaper 1   𝐴11       Pr(𝑘 = 1|𝐴11)       Pr(𝑘 = 2|𝐴11)       ⋯   Pr(𝑘 = 𝐾|𝐴11)
              𝐴21       Pr(𝑘 = 1|𝐴21)       Pr(𝑘 = 2|𝐴21)       ⋯   Pr(𝑘 = 𝐾|𝐴21)
              ⋮         ⋮                   ⋮                       ⋮
              𝐴𝐽1,1     Pr(𝑘 = 1|𝐴𝐽1,1)     Pr(𝑘 = 2|𝐴𝐽1,1)     ⋯   Pr(𝑘 = 𝐾|𝐴𝐽1,1)
⋮             ⋮         ⋮                   ⋮                       ⋮
Newspaper 𝑁   𝐴1𝑁       Pr(𝑘 = 1|𝐴1𝑁)       Pr(𝑘 = 2|𝐴1𝑁)       ⋯   Pr(𝑘 = 𝐾|𝐴1𝑁)
              𝐴2𝑁       Pr(𝑘 = 1|𝐴2𝑁)       Pr(𝑘 = 2|𝐴2𝑁)       ⋯   Pr(𝑘 = 𝐾|𝐴2𝑁)
              ⋮         ⋮                   ⋮                       ⋮
              𝐴𝐽𝑁,𝑁     Pr(𝑘 = 1|𝐴𝐽𝑁,𝑁)     Pr(𝑘 = 2|𝐴𝐽𝑁,𝑁)     ⋯   Pr(𝑘 = 𝐾|𝐴𝐽𝑁,𝑁)

Figure W1.2 Content Uniqueness Index with Various Numbers of Topics

[Figure: matrix of panels comparing the index computed with 200, 300, and 400 topics. Panel titles: Localization index: 200 topics; Localization index: 300 topics; Localization index: 400 topics. Correlation coefficients shown in the upper diagonal cells: 0.93, 0.92, 0.96.]

Note. The numbers in the upper diagonal cells are the correlation coefficients of the corresponding uniqueness indices.