Opportunities for and Pitfalls of Using Big Data in ...

Apr 24, 2017 - Using Big Data in Advertising Research, Journal of Advertising, 46:2, 227-235 ... This editorial introduces the special section on big data. We.
Journal of Advertising

ISSN: 0091-3367 (Print) 1557-7805 (Online) Journal homepage: http://www.tandfonline.com/loi/ujoa20

Opportunities for and Pitfalls of Using Big Data in Advertising Research Edward C. Malthouse & Hairong Li To cite this article: Edward C. Malthouse & Hairong Li (2017) Opportunities for and Pitfalls of Using Big Data in Advertising Research, Journal of Advertising, 46:2, 227-235 To link to this article: http://dx.doi.org/10.1080/00913367.2017.1299653

Published online: 24 Apr 2017.



Journal of Advertising, 46(2), 227–235 Copyright © 2017, American Academy of Advertising ISSN: 0091-3367 print / 1557-7805 online DOI: 10.1080/00913367.2017.1299653

Special Section: Big Data in Advertising — Editorial

Opportunities for and Pitfalls of Using Big Data in Advertising Research Edward C. Malthouse Northwestern University, Evanston, Illinois, USA

Hairong Li Michigan State University, East Lansing, Michigan, USA

This editorial introduces the special section on big data. We define big data by examining how it is, or will be, created in advertising environments. We propose a conceptual framework for understanding the different types of digital advertising touch points that create big data, and use the framework for identifying research opportunities. We discuss the types of research questions that big data can inform, including developing and testing theories, identifying insights, and optimizing the delivery of messages. New methods that advertisers will need to use big data are identified. Recommendations are provided for how to think about and approach big data. Using the framework, we identify specific opportunities for advertising researchers to use big data. We also discuss pitfalls in using big data.

We are pleased to unveil this special section on big data in advertising, which consists of three high-quality manuscripts showcasing opportunities in this new research area. The response to our call for papers generated 28 submissions. Liu, Burns, and Hou (2017) illustrate how insights about brand associations can be made from text mining Twitter data. Becker, Linzmajer, and Wangenheim (2017) analyze clickstream data to study multichannel shopping behaviors. Huh and colleagues (2017) use social network data to estimate trust scores for members of a social network. The manuscripts followed a rigorous peer-review process.

Address correspondence to Edward C. Malthouse, Spiegel Center on Digital and Database Marketing, Northwestern University, 1870 Campus Drive, Evanston, IL 60208. E-mail: [email protected]

Edward C. Malthouse (PhD, Northwestern University) is the Theodore R. and Annie Laurie Sills Professor of Integrated Marketing Communications, professor of industrial engineering and management sciences, and research director, Spiegel Center on Digital and Database Marketing, Northwestern University.

Hairong Li (PhD, Michigan State University) is a professor of advertising and affiliated faculty of the Center for Business and Social Analytics, Michigan State University.

While these three articles are exemplars in applying big data to advertising problems, this editorial attempts to give a more global overview of the role of big data in advertising research. In our discussion we adopt a broad definition of advertising that includes all types of brand communication, paid and nonpaid, as well as brand- and consumer-initiated. We discuss how big data are created in advertising environments and, in doing so, develop a more specific meaning for the term big data. We propose a conceptual framework for understanding big data and use the framework to identify research opportunities involving big data. We argue that big data are part of a fundamental change to advertising. We close with a discussion of pitfalls.

While many other articles and special issues have been recently published on big data in the business and marketing literature (e.g., Hofacker, Malthouse, and Sultan 2016; Sivarajah et al. 2017; Chintagunta, Hanssens, and Hauser 2016; Skiera 2016; Mulhern 2016), this article focuses specifically on advertising.

WHAT ARE BIG DATA IN ADVERTISING?

To understand what big data are and their role in advertising, one must first understand why we have big data and how they are created. As others have pointed out (e.g., Sivarajah et al. 2017, p. 265), the concept of “big” is difficult to pin down, in part because what may seem big today will likely be routine in the near future, as computing power advances. Big data are often defined by the “3 Vs” (e.g., Laney 2001) of large volumes of data generated at a high velocity from a variety of sources. Sivarajah et al. (2017) summarize and discuss an expanded list of Vs, adding veracity, variability, visualization, and value. Hofacker, Malthouse, and Sultan (2016) discuss similar issues and add volatile to the list of Vs. Big data are also sometimes described as data so large and complex that traditional computing environments and data-processing methods are inadequate for dealing with them. While


E.C. MALTHOUSE AND H. LI

these descriptions of big data characterize key features and distinguish them from “small” data sets, we believe that focusing on the root cause, digitalization, is more informative for understanding their scope within, and predicting their implications for, advertising than defining them in terms of physical size or complexity.

Big data’s primary raison d’être in advertising is digital brand touch points. By a touch point we mean any contact between a (potential) customer and a brand, before, during, or after purchasing it. Traditional offline touch points, such as viewing a print ad in a physical magazine, were (and are) difficult to monitor and record. In contrast, touch points that occur in digital environments, such as via the Internet, social media, and mobile devices, can be recorded over time for millions of consumers, thereby producing big data sets. Moreover, when brand touch points occur in a digital environment, advertising decisions can be made entirely by, or at least informed by, big data. For example, advertisers in many digital environments may have to make decisions such as whether to buy an exposure, how much to bid for the space, and which message to display. All of these decisions can, in theory, be informed by big data. These digital environments are fundamentally transforming advertising because they enable consumers to create and distribute brand messages to large audiences (e.g., see Kumar and Gupta 2016).

Big data are the consequence of the digitalization of some phenomenon. While to many the term big data is a buzzword, the digital environments that create big data are omnipresent in advertising and will likely expand in the future. Digital brand touch points have a symbiotic relationship with big data: Digital interactions create big data, which can then be used to inform subsequent digital touch point decisions.
Therefore, the scope of big data can be understood by surveying the digital environments in which brand touch points occur, or will occur in the near future. Figure 1 shows the four broad categories of touch points:

1. Brand actions create and distribute advertising messages.
2. There can be dialogue with and between consumers.
3. Consumers experience touch points in shopping environments.
4. Consumers can also experience touch points while using or consuming the product or service.

Figure 1 is inspired by the pinball framework for marketing communication (Hennig-Thurau et al. 2010) and the customer engagement ecosystem (Maslowska, Malthouse, and Collinger 2016). The big data generated in each category are discussed in the text that follows.

Brand Actions

Brand actions include all brand-initiated contacts with customers, including advertising (e.g., broadcast, print, outdoor, e-mail, banner ads), press releases to news media, sales promotions, and owned media (content), such as newsletters and magazines. Brand actions also include brand responses to customer messages (e.g., Webcare; Van Noort and Willemsen 2012). Many brand actions are digital and are recorded by the firm. All outbound messages targeted at individual consumers, such as e-mail and direct mail, are commonly recorded in a contact-history database. Exposures to banner and search ads are known at the device level, but it is currently sometimes difficult for brands to link devices to customers, for example, to match a customer’s ad exposures on a computer to exposures on the customer’s smartphone.

Touch points that have traditionally been offline, such as exposure to a TV ad, are becoming digital, as more video content is streamed or viewed using cable TV set-top boxes. It is currently difficult to link such exposures to purchase outcomes because cable TV providers and streaming services own exposure data while, often, various retailers own purchase histories. Services to link such data sources (data fusion or synergistic consolidation; Li and Tan

FIG. 1. The role of big data in advertising.

OPPORTUNITIES FOR AND PITFALLS OF USING BIG DATA

2014) will surely become more widely available in the future. It is also currently difficult to know which household member was watching the device when the ad appeared.

It is often possible to know consumer responses to brand actions. Brands can know whether the consumer received an e-mail, opened it, and clicked on a hyperlink in the e-mail. Clicks on display ads can be known. It is possible to know whether a consumer skips over an ad on YouTube or watches it to completion. Other forms of ad skipping can be known from digital video recorders (DVRs) and digital-streaming services. Brands can track the consumption of content marketing; for example, they can know whether a customer visited the company Web site, downloaded a white paper, or watched a webinar.

Dialogue Behaviors

Dialogue behaviors include all nonpurchase actions by the focal customer concerning a brand, such as liking, sharing, reviewing, blogging, Tweeting, and posting user-generated content (UGC) about the brand to social media (e.g., Van Doorn et al. 2010). Of course, there have long been dialogues between consumers about brands; the difference is that now many dialogues are recorded in big data sets as text, images, and even videos. There are currently difficulties in linking, for example, a Twitter account to a known customer name. Many social media environments will not share data. It can also be useful to know a customer’s dialogue behaviors about nonbrand topics. For example, liking a certain celebrity may be associated with liking the focal brand, and the universe of people liking the celebrity could be used to identify potential customers or inform ad exposure decisions.

Shopping Behaviors

Shopping behaviors include all customer actions that lead up to a purchase. Example data sources from Internet shopping environments include search history, Web logs of browsing, wish lists, and shopping carts. Internet retailers know every product that a prospective customer viewed over time during shopping sessions, which items were placed in shopping carts or on wish lists, which were shared on social media, and which were purchased or abandoned. They can know what search terms consumers are using both on a retail Web site and in search engines, and which are more likely to convert. They can also know whether the prospective customer read consumer and/or expert reviews displayed on the page, watched a video demonstration of the product, used a virtual model, shared the item with friends, and so on.

New technologies are being introduced to monitor physical shopping behaviors, such as mobile and wearable devices that track a customer’s movements. Mobile location data reveal travel patterns and can signal when a customer is within close proximity of a store. Another example is embedding radio


frequency identification (RFID) chips in trade-show conference badges to track a customer’s movements. RFID chips are also being implanted in many products sold by retailers. Video monitoring of retail locations also produces massive data sets. Thus the details of many aspects of the shopping journey to purchase can now be chronicled in big data sets of behaviors. Advertising is an antecedent to many of these behaviors, and big data enable better measurement of the steps leading to purchase.

Brand Use Behaviors

Brand use behaviors include any use of a product or service. The use of many digital products and services is recorded; for example, media services (e.g., Amazon, Netflix, Spotify) keep viewing or listening logs. Internet-connected devices (Internet of Things or IoT), such as cars and washing machines, record every use of the product. Insurance companies offer mobile apps to track driving behaviors. Wearable wristbands record the “use” of, for example, an amusement park. The monitoring of product or service use will surely increase in the future, as more devices are connected to the Internet.

Brand Outcomes

Brand outcomes, such as lead generation and purchases, are tracked in databases. Such outcomes are often key performance indicators (KPIs) to the financial officers of companies, and it is therefore important to show the effects of advertising actions on them.

Other Data

Other data can be added. Exogenous factors such as weather can be overlaid. For example, mobile promotions sent to a tourist should depend on the weather: promote outdoor activities if it is sunny and indoor activities if it is raining. Customer characteristics such as demographics and interest indicators can be purchased from third-party data providers. Competitive activities such as advertising activities and pricing can also be harvested from the Web or purchased from data suppliers such as Kantar or Nielsen.

WHAT ROLE CAN BIG DATA PLAY IN ADVERTISING RESEARCH?

Big data sets record a wide variety of digital behaviors by the brand and by consumers discussing, shopping for, buying, and using the brand. We now discuss how they can be used in advertising research. There are at least three general ways.

One is to reexamine existing theories and frameworks of advertising with the new data sources. Digital environments will often provide more accurate measures at a finer level of resolution that were not available in the past. For example, in


the past one may have asked a survey question about the frequency of past brand use in a certain period of time or intention to use a product in the future, but if the usage occurred in a digital environment the researcher can know exactly when the consumer used the product, for how long, in what context, and specifically how the product was used. With better measurements, researchers should be able to corroborate extant theories, refine them further (e.g., by identifying moderators), and gain insights into new hypotheses. For example, the effects of advertising actions on various customer outcomes have been studied for years but can be reexamined and refined with big data. This opportunity is focused on developing, testing, and refining causal advertising theories.

A second role for big data is to optimize the delivery of ad messages. Big data can be used to improve advertising decisions, such as whether to show some customers an ad at all, which message to display (personalization), and how much to pay for an exposure.

The third role is concerned with uncovering exploratory insights to create better messages and motivate new theories. Social media is the world’s largest focus group and can provide insights on what consumers think and feel about a brand. Monitoring the use of a brand may also provide new insights on the relationship between a brand and its customers. Liu, Burns, and Hou (2017) in this special section give an example of this type of research.

WHAT METHODS DO ADVERTISING RESEARCHERS NEED TO USE BIG DATA?

Big data are usually stored in some sort of database. As a starting point, knowledge of relational databases and structured query language (SQL) is important for being able to prepare and use big data sets. Many other types of data stores are used for specific types of big data (e.g., Hadoop), and knowing about them is especially important for the execution of ad messages. For example, is it computationally feasible to query a database to make personalization decisions in the fraction of a second a company has between a customer visiting a Web site and the company displaying the content of the site? We now discuss how the nature of big data is changing the methods used by researchers. Table 1 summarizes opportunities with big data.
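To make the point about SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The contact_history table, its column names, and the toy rows are all invented for illustration; they are assumptions, not a schema described in this article. The query summarizes e-mail responses by campaign, which is the kind of data preparation step a researcher would typically run before modeling.

```python
import sqlite3


def campaign_response_rates(rows):
    """Return (campaign, n_sent, open_rate, click_rate) for each campaign.

    `rows` are (customer_id, campaign, opened, clicked) tuples. The schema is
    a hypothetical contact-history table, assumed here for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact_history ("
        "customer_id INTEGER, campaign TEXT, opened INTEGER, clicked INTEGER)"
    )
    conn.executemany("INSERT INTO contact_history VALUES (?, ?, ?, ?)", rows)
    # AVG over 0/1 flags gives the share of messages opened or clicked.
    result = conn.execute(
        "SELECT campaign, COUNT(*), AVG(opened), AVG(clicked) "
        "FROM contact_history GROUP BY campaign ORDER BY campaign"
    ).fetchall()
    conn.close()
    return result


# Made-up rows: one row per outbound e-mail.
toy_rows = [
    (1, "spring_sale", 1, 1),
    (2, "spring_sale", 1, 0),
    (3, "spring_sale", 0, 0),
    (4, "spring_sale", 0, 0),
    (1, "newsletter", 1, 0),
    (2, "newsletter", 0, 0),
]
rates = campaign_response_rates(toy_rows)
```

An in-memory database keeps the sketch self-contained; a production contact-history database would be far larger, and whether such a query can run within a real-time budget is exactly the feasibility question raised above.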

Unstructured Data

For decades, quantitative advertising scholars have relied heavily on structured data from surveys and experiments, which lend themselves to analyses with classical multivariate methods such as regression, analyses of variance (ANOVAs), factor analysis, and structural equation modeling (SEM). Many big data sources are unstructured in that they are not a matrix of numbers. Instead, many big data consist of networks, text, images, audio, and video. Methods for analyzing such data are

advancing quickly. Tools for analyzing text and network data are now included in many commercial software packages, and there are opportunities for researchers to explore and develop advertising applications with them. In this issue, Huh et al. (2017) develop network applications and Liu, Burns, and Hou (2017) develop applications of text mining. The field of image processing has recently made advances (e.g., see Google Cloud Vision) using deep learning neural networks, and image data will likely be a new frontier in ad research.

Panel Data

Digital devices create behavioral logs that record interactions over time. For example, every time a customer uses a mobile app, a record is created in a database giving the device identification, date and time of the action, and information about the action itself. Visits to Web sites and purchases create similar logs. A panel is a group of customers that is measured over time. Marketing scientists and econometricians have developed many models for analyzing panel data. Hsiao (2003) provides a good survey. Leeflang et al. (2000) discuss other relevant approaches.
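The step from a raw behavioral log to a panel can be sketched as follows. This is an illustrative example only: the three-field log layout (device identifier, ISO timestamp, action), the monthly period, and the toy records are assumptions chosen to mirror the description above, not a format used by any particular system.

```python
from collections import defaultdict
from datetime import datetime


def build_panel(events):
    """Aggregate (device_id, ISO timestamp, action) log rows into a panel.

    Returns {device_id: {"YYYY-MM": number_of_actions}}. Every device gets an
    entry for every observed month, so inactive months are explicit zeros.
    """
    counts = defaultdict(lambda: defaultdict(int))
    months = set()
    for device_id, timestamp, _action in events:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        counts[device_id][month] += 1
        months.add(month)
    return {
        device: {m: per_month.get(m, 0) for m in sorted(months)}
        for device, per_month in counts.items()
    }


# Made-up log records for two devices.
toy_events = [
    ("dev-a", "2017-01-05T09:30:00", "open_app"),
    ("dev-a", "2017-01-20T18:02:00", "view_offer"),
    ("dev-b", "2017-01-11T12:00:00", "open_app"),
    ("dev-a", "2017-02-03T08:15:00", "purchase"),
]
panel = build_panel(toy_events)
```

Filling in zeros for inactive periods matters because panel models generally need to see when a customer did nothing, not only when an event was logged.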

Statistical Learning Approaches

Much traditional advertising research has focused on attribution, where the goal is to show how some advertising actions cause some outcome. Experimental designs, SEMs, and econometric models are appropriate for such situations. Big data are creating situations that are very different, where atheoretical prediction is required. For example, suppose an advertiser must decide whether to purchase a banner ad impression (e.g., see Perlich et al. 2012; Wang, Zhang, and Yuan 2016). This requires estimating whether the exposed customer will purchase and, if so, how much they will spend in order for the advertiser to do the financial calculations. The advertiser may have thousands of predictor variables available without any strong theory to guide the selection of a model. Moreover, the advertiser may have to build hundreds of such models each day. Statistical learning models are ideal for such situations where the goal is atheoretical prediction. James et al. (2013) is a good introductory textbook from a statistical perspective, and Leskovec, Rajaraman, and Ullman (2014) gives a survey from a computer science perspective. Ekstrand, Riedl, and Konstan (2010) provide an excellent survey for recommendation systems.
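One simple way to read the "financial calculations" in the banner-ad example is as an expected-value rule. The sketch below is an assumption about how such a rule could look, not the method of the cited papers: the purchase probability and expected spend are taken as outputs of some already fitted predictive model, and the margin parameter and all numbers are hypothetical.

```python
def expected_value_of_impression(p_purchase, expected_spend, margin):
    """Expected profit from one impression.

    p_purchase and expected_spend are assumed to come from a fitted
    predictive model; margin is a hypothetical profit share of revenue.
    """
    return p_purchase * expected_spend * margin


def should_buy(p_purchase, expected_spend, margin, price):
    """Buy the impression only if its expected profit exceeds its price."""
    return expected_value_of_impression(p_purchase, expected_spend, margin) > price


# Illustrative inputs: 0.2% purchase probability, 50 currency units of
# expected spend given purchase, 30% margin.
value = expected_value_of_impression(0.002, 50.0, 0.3)
buy_cheap = should_buy(0.002, 50.0, 0.3, price=0.01)
buy_expensive = should_buy(0.002, 50.0, 0.3, price=0.05)
```

The rule itself is trivial; the hard part, as the text notes, is producing the two model inputs from thousands of candidate predictors and doing so many times a day.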

Quality of Big Data Research Evaluated Differently

Advertising research is commonly evaluated on its theoretical foundations and the quality of internal and external validity. While these criteria are equally important for theory-testing research with big data, other criteria can be relevant,


TABLE 1
Summary of Big Data Opportunities

Opportunity: Description

Unstructured data: Develop applications for network, text, image, audio, or video data to identify insight and measure ad environments

Panel data: Use panel data methods to infer causality, measure ad effects, test theories

Statistical learning methods: Use predictive analytics and recommendation systems for personalization, programmatic ad decisions, etc.

Assemble jigsaw puzzle: Acquire and join data sets to get more complete picture of customers

Partner with advertisers: Gain access to novel and unique data sets, field test theories and optimization approaches, link to financial outcomes

Interdisciplinary research: Partner with data/computer scientists, marketing scientists, econometricians to develop algorithms to optimize ad execution and test theories

Mobile devices: Understand consumer experience with mobile technology, optimize customer interactions through mobile channels

Brand listening: Monitor what customers are saying in social media about brand, how they use product with the Internet of Things, respond as appropriate

Ad avoidance: Study ad avoidance behaviors, quantify time-shifted ad value

Contextual data: Use data about the current context that the customer is in to personalize messages and improve effectiveness

Advertising as an iterative process: Understand how relationships develop over time through two-way brand interactions

Addressable TV and streaming: Deliver the right personalized TV message to the right consumers at the right time on the right device and pay the right amount

Financial outcomes: Demonstrate effects of advertising on financial outcomes

especially for optimizing the execution of advertising. Important criteria for new methods in this area include computational complexity, how it “scales” as the size of the data


(e.g., number of customers, number of products) grows, and whether the method can be implemented in “real time” (e.g., requiring a fraction of a second to inform an advertising decision). For example, if n is the number of customers, and the number of computations performed by an algorithm is proportional to n³ (e.g., hierarchical clustering), the method will work fine with hundreds of customers but not millions. Cross-validated predictive accuracy is another important quality measure.

HOW SHOULD ADVERTISING RESEARCHERS THINK ABOUT AND APPROACH BIG DATA?

Assembling the Jigsaw Puzzle

Survey and experimental research is often self-contained, in that a well-designed survey instrument should measure all relevant causal factors. Research with big data is often different: each data set recorded in some digital environment contains some important information, but critical variables are usually missing and must be imported from other big data sets. Research with big data is like assembling a jigsaw puzzle, where each data set is one piece and the combination of pieces gives a complete picture of the customer relationship. Unfortunately, there will usually be missing pieces. There will be opportunities for scholars and companies that can bring pieces together.

As a simple example, social media data may record a customer’s thoughts about a brand but do not have information on the customer’s purchases. Shopping conditions, such as prices offered to a customer, promotional offers, and marketing mix information about competition, will be in other data sets. Contextual information, such as who was with the person at the time of purchase and the weather conditions, comes from other data sets. Some of these issues are illustrated in Liu, Burns, and Hou (2017). Stitching together data sets to fill in the puzzle is a different way to do research than the more traditional approach of designing self-contained surveys and/or experimental designs.
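The jigsaw idea of stitching data sets together can be sketched as an outer join on a shared customer identifier. Everything specific here is hypothetical: the three toy sources, their field names, and the assumption that a common customer key exists at all (which, as noted elsewhere in this editorial, is often the hard part).

```python
def assemble_customer_view(sources):
    """Outer-join several {customer_id: {field: value}} data sets.

    Fields a source does not supply for a customer are left as None, which
    makes the missing puzzle pieces visible instead of silently dropping rows.
    """
    all_fields = sorted(
        {f for table in sources.values() for rec in table.values() for f in rec}
    )
    all_ids = sorted({cid for table in sources.values() for cid in table})
    view = {}
    for cid in all_ids:
        row = dict.fromkeys(all_fields)  # every field starts as None
        for table in sources.values():
            row.update(table.get(cid, {}))
        view[cid] = row
    return view


def missing_pieces(view):
    """List, per customer, the fields that no source could fill."""
    return {
        cid: [field for field, value in row.items() if value is None]
        for cid, row in view.items()
    }


# Made-up data sets, each holding one piece of the puzzle.
toy_sources = {
    "social": {
        "c1": {"brand_sentiment": "positive"},
        "c2": {"brand_sentiment": "negative"},
    },
    "purchases": {"c1": {"total_spend": 120.0}, "c3": {"total_spend": 35.5}},
    "context": {"c1": {"weather": "rain"}},
}
view = assemble_customer_view(toy_sources)
gaps = missing_pieces(view)
```

An outer join is used rather than an inner join so that customers who appear in only one source are kept; the resulting gaps are the candidates for filling from other data sets.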
Gaps in big data sets may have to be filled with data from traditional research designs, such as surveys and qualitative approaches.

Partnerships With Advertisers

While some big data sets (e.g., Twitter data) can be obtained directly through application programming interfaces (APIs), and there are organizations such as the Wharton Customer Analytics Initiative and Kaggle that release advertising-related big data sets (see Liu-Thompkins and Malthouse 2017 for a more complete list), perhaps the best source of big data sets is the brands themselves. Brands also maintain and control the digital environments, where randomized, controlled experiments can


be run, and are likely to have records of financial outcomes. Advertising scholars should seek research partnerships with brands.

Interdisciplinary Research

Different methods are required to analyze panel or unstructured data, and there are academic fields devoted to their study, such as image processing, text mining, natural language processing, marketing science, and econometrics. The sheer volume and complexity of new data sources will also require greater computational and programming skills. While the software tools will likely improve, we anticipate that advertising researchers will have to work in interdisciplinary teams in the future, with data scientists and other specialists, as we already see with the articles in this special section.

Big Data Chronicles the Past

Advertising big data are fundamentally a detailed record of what customers did in the past; such a record, by itself, has limited value. In particular, it usually does not explain why customers behaved as they did, nor does it prescribe future brand actions. To be actionable, it must be combined with theories, models, and understanding of the advertising industry and business situation. Advertising scholars can bring these important ingredients to a big data research project.

WHAT SPECIFIC OPPORTUNITIES EXIST BECAUSE OF BIG DATA?

Our approach to identifying opportunities is to look for where big data and digital devices create new situations in which advertising must be understood. We identify what we think are the most important changes, but there are others. As a rule of thumb, whenever there is something new because of data or technology, there will be research opportunities.

Mobile Devices

Smartphones create an advertising environment that is new in many ways. There has never been an advertising channel that is physically with the consumer all the time and in every location, is as personal, gives off data about the current state of the customer, and is addressable. Kumar and Gupta (2016, pp. 307–10) discuss the disruption of mobile devices in detail; also see Okazaki, Katsukura, and Nishiyama (2007).

Brand Listening

Advertising has historically been focused on creating and delivering brand messages. Big data create opportunities to monitor what individual consumers are saying and doing,

how they are using products or services with the IoT, and respond as appropriate. The topics of search advertising, Webcare, and responses to trigger events were mentioned previously.

Ad Avoidance

The digital technology that enables the delivery of ad messages also creates the possibility of blocking them, which leaves a digital trail. For example, when a consumer skips an ad using a DVR, the action is recorded. When a consumer clicks to skip or truncate an ad on YouTube, the click is recorded. When a consumer clicks to stop or close a video box while reading a news story, the actions are recorded. These detailed behavioral logs create opportunities for scholars to study ad avoidance. For example, are ad exposures during live TV more effective than time-shifted exposures?

Contextual Data

The effectiveness of advertising depends on the context. Big data provide advertisers with important clues about needs and, like a good detective, the advertiser must piece together the clues to understand motivations. An offer for an umbrella on a rainy day will elicit a different response than on a sunny day. Responses to mobile offers and promotions sent to a man will vary depending on whom the man is with. Different restaurant and entertainment offers will be relevant depending on whether the man is with his six-year-old daughter for the afternoon, out with his football buddies, or away with his wife for a romantic weekend. An offer for a washing machine will be more effective immediately after a person’s current machine breaks than three weeks after it broke and has since been replaced or repaired. Big data can provide time- and location-specific information about the context a prospective customer is in and give advertisers information about customers’ needs to a greater extent than they have ever had before. Thus, there are opportunities to harness such contextual data, often combined with mobile devices, to improve the efficiency of advertising by delivering more relevant messages at the right time and place.

Another way that context is relevant to advertising is in exploring how the media vehicle on which an ad is displayed influences the effects of the ad. For example, will an ad shown on program A evoke a different reaction than when it is shown on program B? Current programmatic advertising models tend to focus on estimating the value of exposing a user to an ad without considering interactions with the media vehicle. Big data enable advertisers to know more about different media vehicles and their audiences, and an important research question is whether programmatic models can be improved with such information.


Advertising as a Dynamic, Iterative Process Rather Than a Campaign

Figure 1 shows a new way to think about interactions between touch points. The arrows between brand actions and consumer behaviors go both ways. Brand messages affect customers, but dialogue behaviors may affect subsequent brand actions, where the brand responds privately or publicly to a post from a customer. The digital environments create the possibility of amplification effects where user- or brand-generated messages spread virally and develop in a dynamic way. Hennig-Thurau et al. (2010) develop their pinball metaphor to describe this new environment. Scholars will likely need dynamic models. Brand communication evolves from ad campaigns broadcasting messages to interactive, multichannel dialogues with and between customers. Some of the most influential brand messages may come from other consumers rather than the brand itself.

Addressable Television and Streaming Video as a New Frontier

For decades, television, which often comprises a dominant share of advertising budgets, has been a mass medium, where the same message is sent to all members of some viewing audience. This is starting to change as digital streaming becomes more common and more content producers (e.g., Disney Movies Anywhere) go “over the top” (OTT), offering their content directly to consumers. Streaming services will rely on a combination of subscription fees and advertising revenue. They own device-level viewing logs of all their customers, which is a big data asset that has potential value to advertisers. Companies like Amazon distribute and create TV content and also have records of purchases across many categories, which creates a unique data asset and new opportunities to link media consumption directly to purchases. Google has similar unique data assets, because it knows searches and also distributes media. Such data can inform media planning decisions on which programs to buy.

In addition, the way in which TV advertising is sold will likely evolve. Rather than buying programs, advertisers will increasingly be able to buy households or devices, and there will be opportunities to use viewing logs to target and personalize advertising more precisely. Programmatic approaches, which are currently used for display ads, will have to be adapted for video environments.

Focus on Financial Outcomes

Recent commentary by Stewart (2016) notes, “The accountability issue is especially vexing, and failure to resolve this issue will further diminish the role of advertising in the future” (p. 350). He reminds readers that in his discussion of advertising 25 years ago the need for financial accountability was the first of seven issues. In the same issue, articles by Schultz

233

(2016, pp. 279–80), Kumar and Gupta (2016, p. 316), and Rust (2016, p. 347) make similar points. Big data often enable the linking of brand touch points to financial outcomes such as purchase, often at the individual level. This means that advertisers will increasingly have the ability to link their actions to financial outcomes rather than intermediate outcomes such as attitudes and intentions. Those who can prove the financial return from advertising will have the trust of the financial officers within an organization.

ARE THERE PITFALLS IN USING BIG DATA?

Validity

One pitfall of big data is to be impressed by the size of a data set and ignore the traditional topics of data quality discussed in research methods courses, such as reliability, internal and external validity, and sample design. Big data sets are often very large convenience samples. A big data set may have comprehensive coverage of a population, but it may not be the population of interest. For example, all Twitter users may not represent all customers in a target segment. Big data are usually observational (where subjects self-select into treatment conditions), which increases the risk of there being threats to internal validity. See Li and Tan (2014) and Hofacker, Malthouse, and Sultan (2016) for further discussion of validity issues.

Omitted Variable Biases; Correlation Not Causation

As mentioned in the jigsaw puzzle discussion earlier, big data sets may not measure all of the relevant causal factors and therefore model estimates can be biased. Researchers cannot assume that a big data set has measures of all relevant causal factors. It is more important than ever before to begin with a conceptual framework that identifies the relevant constructs and specifies how they interrelate. For example, a retailer may have data on exposure to customer reviews and purchases, but probably does not have measures of spending on brand advertising. A study relating the number of reviews to purchase would be biased if products that spend on brand advertising also have more reviews. Ad spending and review volume are confounded, and a model that does not account for both will overstate the effect of review volume on purchase. See Liu-Thompkins and Malthouse (2017) for further discussion of the omitted variable bias in advertising and how to avoid it.
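The reviews-and-advertising example can be illustrated with a small simulation. The following is a minimal sketch under invented assumptions, not an analysis from the editorial: the data-generating process, the coefficients (a true review effect of 1.0, a true ad effect of 2.0, and a 0.8 dependence of review volume on ad spending), and all variable names are hypothetical. It fits a naive regression of purchase on review volume alone and compares it with a regression that also includes ad spending.

```python
import random


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data-generating process; every coefficient is an assumption.
rng = random.Random(0)
n = 20000
ad, reviews, purchase = [], [], []
for _ in range(n):
    a = rng.gauss(0, 1)                      # brand ad spending (not observed by the retailer)
    r = 0.8 * a + rng.gauss(0, 1)            # advertised products also attract more reviews
    p = 1.0 * r + 2.0 * a + rng.gauss(0, 1)  # true review effect 1.0, true ad effect 2.0
    ad.append(a)
    reviews.append(r)
    purchase.append(p)

# Naive model: purchase regressed on review volume only (ad spending omitted).
naive = cov(reviews, purchase) / cov(reviews, reviews)

# Full model: purchase regressed on reviews and ad spending,
# solved from the 2x2 normal equations on centered data.
s_rr, s_aa, s_ra = cov(reviews, reviews), cov(ad, ad), cov(reviews, ad)
s_rp, s_ap = cov(reviews, purchase), cov(ad, purchase)
det = s_rr * s_aa - s_ra ** 2
b_reviews = (s_aa * s_rp - s_ra * s_ap) / det
b_ad = (s_rr * s_ap - s_ra * s_rp) / det

print(f"naive review effect (ad spending omitted): {naive:.2f}")
print(f"review effect controlling for ad spending: {b_reviews:.2f}")
print(f"ad spending effect:                        {b_ad:.2f}")
```

Under these assumed coefficients the naive slope converges to roughly 1 + 2(0.8)/1.64, or about 1.98, nearly double the true review effect of 1.0, while the model that includes ad spending recovers both effects. In practice the omitted variable is by definition not in the data set, so the correction shown here is only available once the conceptual framework has identified the missing construct and a measure of it has been obtained.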

Fraud

The digital environments in which big data are gathered also create opportunities for fraud through “nonhuman traffic.” It is easy to program “bots” (e.g., “clickbot.A”) that visit Web sites and act like customers clicking on ads. There are many other variations of click fraud, such as “click farms,” where low-paid workers are hired to click on paid advertising links or “like” brands. One motivation for creating bots is pay-per-click pricing systems, which create an incentive to inflate clicks. A pitfall is analyzing big data sets without removing nonhuman and other fraudulent traffic. At the same time, this pitfall is also a research opportunity to devise, for example, better ways to detect which clicks are fraudulent, and compensation systems that do not create incentives for fraud.

Data Quality

While it is often easy to record interactions and create big data sets, a pitfall is to assume that the data are free of errors. There are many ways that data quality can be compromised, and substantial effort is usually necessary to obtain data that can be used in analyses. See Liu-Thompkins and Malthouse (2017) for a discussion of some common data-quality problems and references.

Privacy, Security, and Trust

Big data can give advertisers sensitive, personal information, and there is a fine line between impressing a customer with a highly targeted, relevant offer and giving the creepy sense that the brand has violated the customer’s privacy. More research is needed on avoiding this pitfall. Trust between the customer and brand is a critical factor: a personalized message from a trusted brand is appreciated by the customer, but a similar personalized message from a brand that is not trusted can prompt complaints to government regulators and/or the media.

CONCLUSION

The world of advertising will become more digital, with more brand messages delivered in environments where exposure and consumer responses can easily be recorded. Advertising budgets allocated to digital activities have increased and will continue to increase.
Customer actions that reveal brand opportunities, such as an expression of dissatisfaction or a signal that the consumer is in the market for some product or service, will increasingly be knowable from digital sources, such as social media, search history, or consumption logs. Consequently, advertising activities will be performed by those with access to the data and the ability to optimize advertising decisions. Fields such as computer science and information technology are increasingly studying advertising problems. Data-intensive fields such as recommender systems, search advertising, programmatic buying and real-time bidding, and optimization algorithms for matching advertisers to advertising inventory (the matching problem; Leskovec, Rajaraman, and Ullman 2014, section 8.3) have largely developed outside of the academic advertising community.

Advertising scholars should embrace big data opportunities either on their own or in partnership with scholars from other fields. We also hope the emergence of big data will prompt the building of bridges between advertising scholars and practitioners. Advertising scholars bring decades of research and theories that can be used to make sense of big data. By embracing big data they will have the ability to measure customer behaviors at a finer resolution than in the past, execute experiments in digital environments with high internal and external validity, and demonstrate effects on financial outcomes of interest to advertisers. Getting access to and using big data will likely require close collaborations with industry partners as well as scholars from related fields. In conclusion, we thank Shintaro Okazaki for his leadership in organizing this special section and all of the anonymous reviewers whose efforts improved the quality of the manuscripts. We hope this special section will inspire advertising researchers, especially young scholars, in carrying out more studies of advertising issues using big data and associated approaches and methods.

ACKNOWLEDGMENTS

The authors thank Professors Yuping Liu-Thompkins, Tom Collinger, and Shintaro Okazaki for helpful discussions on this editorial.

REFERENCES

Becker, Ingo, Marc Linzmajer, and Florian von Wangenheim (2017), “Channels and Categories: User Browsing Preferences on the Path to Purchase,” Journal of Advertising, 46, XXX–XXX.
Chintagunta, Pradeep, Dominique M. Hanssens, and John R. Hauser (2016), “Marketing Science and Big Data,” Marketing Science, 32 (1), 4–7.
Ekstrand, Michael D., John T. Riedl, and Joseph A. Konstan (2010), “Collaborative Filtering Recommender Systems,” Foundations and Trends in Human-Computer Interaction, 4 (2), 81–173.
Hennig-Thurau, Thorsten, Edward C. Malthouse, Christian Friege, Sonja Gensler, Lara Lobschat, Arvind Rangaswamy, and Bernd Skiera (2010), “The Impact of New Media on Customer Relationships,” Journal of Service Research, 13 (3), 311–30.
Hofacker, Charles F., Edward C. Malthouse, and Fareena Sultan (2016), “Big Data and Consumer Behavior: Imminent Opportunities,” Journal of Consumer Marketing, 33 (2), 89–97.
Hsiao, Cheng (2003), Analysis of Panel Data, 2nd ed., Cambridge, UK: Cambridge University Press.
Huh, Jisu, Atanu Roy, Alexander Pfeuffer, and Jaideep Srivastava (2017), “Development of Trust Scores in Social Media (TSM) Algorithm and Application to Advertising Practice and Research,” Journal of Advertising, 46, XXX–XXX.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani (2013), An Introduction to Statistical Learning, New York: Springer.
Kumar, V., and Shaphali Gupta (2016), “Conceptualizing the Evolution and Future of Advertising,” Journal of Advertising, 45 (3), 302–17.
Laney, Doug (2001), “3D Data Management: Controlling Data Volume, Velocity, and Variety,” Application Delivery Strategies from META Group, February 6, https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
Leeflang, Peter, Dick Wittink, Michel Wedel, and Philippe Naert (2000), Building Models for Marketing Decisions, Dordrecht, the Netherlands: Kluwer Academic.
Leskovec, Jure, Anand Rajaraman, and Jeffrey D. Ullman (2014), Mining of Massive Datasets, 2nd ed., Cambridge, UK: Cambridge University Press.
Li, Hairong, and Peking Tan (2014), “Big Data and Small Data: Innovative Paths to Integration,” in Proceedings of the European Advertising Academy Conference, P. Verlegh and C. Segijn, eds., Amsterdam: University of Amsterdam, pp. 1–7.
Liu, Xia, Alvin Burns, and Yingjian Hou (2017), “An Investigation of Brand-Related User Generated Content on Twitter,” Journal of Advertising, 46, XXX–XXX.
Liu-Thompkins, Yuping, and Edward C. Malthouse (2017), “A Primer on Using Behavioral Data for Testing Theories in Advertising Research,” Journal of Advertising, 46 (1), 1–13.
Maslowska, Ewa, Edward C. Malthouse, and Tom Collinger (2016), “The Customer Engagement Ecosystem,” Journal of Marketing Management, 32 (5–6), 469–501.
Mulhern, Frank J. (2016), “Big Data and the Digital Transformation in Advertising,” in The New Advertising: Branding, Content, and Consumer Relationships in the Data-Driven Social Media Era, Ruth E. Brown, Valerie K. Jones, and Ming Wang, eds., Santa Barbara, CA: Praeger, 95–108.
Okazaki, Shintaro, Akihiro Katsukura, and Mamoru Nishiyama (2007), “How Mobile Advertising Works: The Role of Trust in Improving Attitudes and Recall,” Journal of Advertising Research, 47 (2), 165–78.
Perlich, Claudia, Brian Dalessandro, Rod Hook, Ori Stitelman, Troy Raeder, and Foster Provost (2012), “Bid Optimizing and Inventory Scoring in Targeted Online Advertising,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: ACM, 804–12.
Rust, Roland (2016), “Comment: Is Advertising a Zombie?,” Journal of Advertising, 45 (3), 346–47.
Schultz, Don (2016), “The Future of Advertising or Whatever We’re Going to Call It,” Journal of Advertising, 45 (3), 276–85.
Sivarajah, Uthayasankar, Muhammad Mustafa Kamal, Zahir Irani, and Vishanth Weerakkody (2017), “Critical Analysis of Big Data Challenges and Analytical Methods,” Journal of Business Research, 70, 263–86.
Skiera, Bernd (2016), “Data, Data, and Even More Data: Harvesting Insights from the Data Jungle,” GfK Marketing Intelligence Review, 8 (2), 10–17.
Stewart, David W. (2016), “Comment: Speculations of the Future of Advertising Redux,” Journal of Advertising, 45 (3), 348–50.
Van Doorn, Jenny, Katherine N. Lemon, Vikas Mittal, Stephan Nass, Doreen Pick, Peter Pirner, and Peter C. Verhoef (2010), “Customer Engagement Behavior: Theoretical Foundations and Research Directions,” Journal of Service Research, 13 (3), 253–66.
Van Noort, Guda, and Lotte Willemsen (2012), “Online Damage Control: The Effects of Proactive versus Reactive Webcare Interventions in Consumer-Generated and Brand-Generated Platforms,” Journal of Interactive Marketing, 26 (3), 131–40.
Wang, Jun, Weinan Zhang, and Shuai Yuan (2016), “Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting,” preprint arXiv:1610.03013, available https://arxiv.org/abs/1610.03013.
