
Big Data, Volume 3, Number 2, 2015 © Mary Ann Liebert, Inc. DOI: 10.1089/big.2015.0006

ORIGINAL ARTICLE

Evaluating and Optimizing Online Advertising: Forget the Click, but There Are Good Proxies

Brian Dalessandro,1,* Rod Hook,1 Claudia Perlich,1 and Foster Provost1,2

Abstract
Online systems promise to improve advertisement targeting via the massive and detailed data available. However, there is often too little data on exactly the outcome of interest, such as purchases, for accurate campaign evaluation and optimization (due to low conversion rates, cold-start periods, lack of instrumentation of offline purchases, and long purchase cycles). This paper presents a detailed treatment of proxy modeling, which is based on the identification of a suitable alternative (proxy) target variable when data on the true objective is in short supply (or even completely nonexistent). The paper makes a two-fold contribution. First, the potential of proxy modeling is demonstrated clearly, based on a massive-scale experiment across 58 real online advertising campaigns. Second, we assess the value of different specific proxies for evaluating and optimizing online display advertising, showing striking results. The results include bad news and good news. The most commonly cited and used proxy is a click on an ad. The bad news is that, across a large number of campaigns, clicks are not good proxies for evaluation or for optimization: clickers do not resemble buyers. The good news is that an alternative sort of proxy performs remarkably well: observed visits to the brand's website. Specifically, predictive models built based on brand site visits, which are much more common than purchases, do a remarkably good job of predicting which browsers will make a purchase. The practical bottom line: evaluating and optimizing campaigns using clicks seems wrongheaded; however, there is an easy and attractive alternative: use a well-chosen site-visit proxy instead.

Key words: data science; online advertising; proxy modeling; CTR; evaluation

1Dstillery, New York, New York.
2New York University Stern School of Business, New York, New York.

This manuscript was adapted from Working Paper CBA-12-02, NYU/Stern School of Business, Center for Business Analytics.

*Address correspondence to: Brian Dalessandro, Dstillery, 470 Park Avenue South, New York, NY 10016, E-mail: [email protected]

Introduction
One of the grand promises of online advertising is that measurement and optimization can be conducted much more easily than through many traditional advertising channels, because the targeting is embedded in large-scale, real-time computer systems. The promise is that such systems enable the collection of every interaction with every individual user. This generally includes all ad impressions served, associated digital conversion events, and information from various digital touch points not directly associated with the delivery of the advertisements. The scale and breadth of this data promises that targeting approaches can be increasingly sophisticated, and that the efficacy of the system can be measured and iteratively improved.

Specifically, predictive marketing looks to utilize individual-level data to predict the degree of interest of a consumer in a given product and to use this information to target, in a very granular fashion, only the consumers who are most likely to buy the specific product after having seen the ad. Despite these advances, optimization and measurement remain challenging, primarily because the actions worth measuring (e.g., product purchase) often are exceedingly rare, not measurable at all, or only partially traceable to the digital identity of the consumer who was exposed to the ad. There also is an intimate relationship between measurement and the optimization of campaign targeting. In principle, optimizing toward the metric that is used for performance measurement should provide
the best possible performance. This seemingly obvious assertion, however, may no longer hold when the optimization is based on massive amounts of data and rare outcomes. For example, one of the most effective optimization methods is predictive modeling, where the probability of a desirable outcome is predicted and only consumers with the highest probability are targeted. For predictive modeling, collecting all of the available data may hold disappointingly little value when positive instances of the event to be predicted are in very short supply. Powerful modeling from small amounts of data increases the risk of overfitting, wherein models fit well to the data used to build them but actually do not generalize to future cases.1

This paper presents a detailed treatment of proxy modeling, which is based on the identification of a suitable alternative (proxy) target variable when data on the true objective is in short supply (or even completely nonexistent).1,2 The paper has a two-fold goal. First, the potential of proxy modeling is demonstrated clearly, based on a massive-scale experiment across 58 real online advertising campaigns. Second, we assess the value of different specific proxies for evaluating and optimizing online display advertising, showing striking results.

The standard formulation for a supervised, predictive modeling application comprises a target variable Y (the unknown quantity to be predicted) and a set of features X, based on which we would like to learn a function to estimate E[Y | X = x], where x represents the feature values for a particular prediction case. For targeted online display advertising there are many choices for X and Y. These choices usually are determined by the goals of the campaign and by the data assets of the company building the predictive model, which vary widely from company to company. It is not uncommon for X to include some combination of demographic, psychographic, and behavioral information about an individual.
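As a concrete illustration of this formulation, the sketch below fits a probabilistic classifier to estimate E[Y | X = x]. Everything here is hypothetical: the data is synthetic, and the feature-generating process is invented for illustration rather than drawn from the paper's campaigns.

```python
# Minimal sketch: estimating E[Y | X = x] with a probabilistic classifier.
# All data is synthetic; the data-generating process is purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# X: behavioral features per browser (e.g., counts of site-category visits).
X = rng.poisson(lam=2.0, size=(n, 5)).astype(float)

# Y: a rare binary outcome whose probability rises with the first feature.
p_true = 1 / (1 + np.exp(-(X[:, 0] - 6.0)))   # the true E[Y | X]
y = rng.binomial(1, p_true)

model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns the model's estimate of E[Y | X = x] for a new case x.
x_new = np.array([[8.0, 2.0, 1.0, 0.0, 3.0]])
print(model.predict_proba(x_new)[0, 1])
```

In the display-advertising setting, the estimated probability would then be used to rank browsers and target only those most likely to convert.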
The choice of Y depends on the goals of the advertising campaign; currently most marketers prefer that Y be some type of conversion event related to the revenue-generating mechanism of the firm (e.g., a product purchase, subscription, account registration, etc.). Campaign performance usually is measured by how well ad impressions correlate with the target conversion event. In addition, and quite logically, targeters would like their predictive models to be optimized toward this target as well. Unfortunately, standard measurement and related optimization strategies often are handicapped by the fact that the ultimate conversion event of interest is extremely rare (buying a car) or simply unobservable. In such situations, the targeters and the marketers (as well as analytics providers) would benefit from a high-quality, observable, and common proxy for measuring campaign effectiveness.

A high-quality proxy would correlate well enough with the outcome of interest that (i) comparative evaluations based on the proxy would give a good approximation of what comparative evaluations based on the true target would have given, and (ii) optimizing based on the proxy would give a good approximation of what optimizing based on the true target would have given.

For online display advertising, the most convenient proxies are based on interactions with the ad creative. The most commonly used proxy has been the click on the advertisement. Campaigns are often evaluated on their "click-through rates" (CTRs). In some cases the marketer pays only for clicks; as a result, it is in the targeter's (myopic) "best interest" to optimize to increase CTR. Accordingly, the vast majority of academic research papers on improving the effectiveness of online advertising focus on CTR as the measure of effectiveness.

This paper's results show that for display ad campaigns like the 58 in the experiment, evaluation and optimization based on clicks is wrongheaded: clicks are a poor proxy for revenue-generating actions. In fact, across a surprisingly large number of campaigns, targeting based on click-based optimization is no better than random guessing. However, clicks are not the only possible proxy; they are just the proverbial streetlight under which the drunk searches for his keys. We next show that some proxies are measurably better than others. In particular, well-chosen brand-related actions (actions that indicate brand affinity or product interest but are more common than purchases) make good proxies for the true event of interest.
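As a toy illustration of criterion (ii), the sketch below trains one model per candidate target variable and scores every model on the same task: ranking holdout cases by the true, rare outcome. The data and the data-generating process are entirely synthetic and invented for this illustration; in the invented setup, "visits" share a latent signal with "purchases," while "clicks" are driven by an unrelated signal.

```python
# Synthetic illustration: a well-correlated proxy (visit) supports purchase
# targeting; an uncorrelated proxy (click) does not. All data is invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 50_000
X = rng.normal(size=(n, 10))

# Hypothetical data-generating process: purchase and visit share a latent
# "brand interest" signal; click depends on a different, unrelated feature.
interest = X[:, 0] + 0.5 * X[:, 1]
purchase = rng.binomial(1, 1 / (1 + np.exp(-(interest - 6))))  # very rare
visit = rng.binomial(1, 1 / (1 + np.exp(-(interest - 3))))     # more common
click = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 2] - 5))))      # unrelated

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.5,
                                       random_state=0)

aucs = {}
for name, y in [("click", click), ("visit", visit), ("purchase", purchase)]:
    model = LogisticRegression(max_iter=1000).fit(X[idx_train], y[idx_train])
    scores = model.predict_proba(X[idx_test])[:, 1]
    # Every model is judged on the same holdout task: predicting purchase.
    aucs[name] = roc_auc_score(purchase[idx_test], scores)

print(aucs)
```

In this synthetic setup the visit-trained model ranks eventual purchasers far better than the click-trained one, loosely mirroring the experimental design and findings described later in the paper.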
In what follows we define more precisely what it means to be a "good" or "better" proxy, and we present a study of the comparative effectiveness of two different proxies for optimizing a campaign toward more purchases. Although our examples focus on the optimization of display ad campaigns, the proxy modeling approach should be considered for any application of predictive modeling that suffers the same lack of data on the ideal target variable. The approach, of course, hinges on the identification of a suitable proxy. In line with the discussion above, a good proxy should be reasonably well correlated with the true target variable. Also, a good proxy should be readily observable and available (at low cost) in relatively large quantities.


These conditions serve a statistical purpose: ultimately, the fact that the proxy is only well correlated with the true target, rather than being the true target itself, will introduce bias into the modeling, and bias increases the error in a predictive model. However, the introduction of bias will come with a reduction in model estimation variance, because of the larger number of positive instances. The condition of being readily observable/available at low cost is pragmatic: if significant additional investment is needed to collect a good proxy in sufficient quantity, one might instead consider collecting more of the ideal target variable (if possible). These conditions will be illustrated in further detail in our advertising case study.

More specifically, the experiment's 58 campaigns span major brands across all industries that use display advertising. As will be described in detail in the "Experiments and Results" section, for each campaign we randomly expose large numbers of users to display ads and collect the data on who subsequently makes a purchase.* In addition, we also observe two potential proxies for purchase: (1) clicking on the ad, and (2) visiting a designated website associated with the brand or product following exposure to the ad.

In each experiment we train a separate model on each of the two proxy target variables (clicks and website visits) and also on the primary target variable of purchase. We then evaluate each of these three models on holdout data and test each model's ability to predict purchases. If an alternative variable is a good proxy, models trained using it as a target variable should perform well at predicting the ideal target. Our approach allows us to ask two related questions for each campaign and to draw generalizations across the 58 campaigns.

Question 1: Are clicks a good proxy for evaluating and optimizing online display advertising campaigns where the ultimate goal is purchase?
This question is relevant, first, for challenging standard practices and assumptions within the advertising industry, and our results provide a clear answer of no. Generally, clicking does not correlate well with purchasing. This complements prior studies showing evidence that few clicks lead to actual product purchases3,4,5 and that most clicks are generated by a very small subset of browsers.6 Second, and more importantly, clicks are unsuitable as a criterion for designing and optimizing targeting strategies (i.e., finding the best browsers to show an ad to). Targeting models built on clicks do a poor job of identifying browsers who later will purchase. This is consistent with the results of the study by Pandy et al.7 Strikingly, in our study, generalizing across all the campaigns, the targeting performance of click-trained models is statistically indistinguishable from random guessing!

Question 2: Are branded website visits a good proxy for evaluating and optimizing online display advertising campaigns?

In contrast to clicks, our results show that certain website visits turn out generally to be good proxies for purchases. Specifically, site visits do remarkably well as the basis for building models to target browsers who will purchase subsequent to being shown the ad. A very interesting result in our analysis is that in many cases, site visits produce better models than actually training on purchases. Looking deeper, we show this to be largely driven by cases where few purchases are available. The fact that the impact of using a good proxy like site visits is more dramatic in cases where purchases are few and far between has important strategic implications for campaigns advertising big-ticket items (like vacations), high-consideration purchases (like bank accounts or 401ks), or products where purchases are mostly offline. One limitation to the generalizability of our results is that our study does not include products purchased only offline (such as automobiles or typical consumer packaged goods), and thus the claims we make vis-à-vis different proxy modeling strategies may not hold for such products.

In the remainder of the paper we discuss the display advertising setting in more detail and then discuss why empirical assessment is necessary to judge how good candidate proxies are. After that we describe in detail the data, the empirical setting, the design of the experiments, and the empirical results.

*To perform the comparison we used campaigns for which conversions are available. These results may generalize to campaigns for which conversions are not available, for one or more of the reasons discussed below in the conclusions.
Finally, the paper will discuss the generalizability and implications of these results more broadly.

Background: Display Advertising, Evaluation, and Targeting
In recent years, online advertising has seen major growth in the use of display ads (e.g., banner ads) for both direct-response and branding styles of marketing. This growth has been facilitated by the automation of the ad-buying process, where entities called ad exchanges connect ad sellers and ad buyers via real-time bidding systems. This machine-automated ad environment enables sophisticated data exchange, audience targeting, and ad optimization in a way that has little precedent in offline and more rudimentary online advertising systems.

Display advertising is usually contracted via two types of business arrangements: (1) direct buys between an advertising brand and a publisher, and (2) exchange-facilitated buys, where advertisers, or the agents thereof, purchase advertisements via ad exchanges that represent publishers. For this study, all experiments were set up via the latter arrangement.8

In online display campaigns, it is often possible to observe whether or not a user makes a purchase after an ad exposure, even when the path to purchase does not include a click. These observed, post-impression, clickless purchases are often called "post-view" in the industry and can have any time horizon (typically 7 days, but ranging from 1 hour to 90 days). Our analysis assumes that a purchase, whether following a click or not, and whether made online or not, is the major goal of the advertising campaign. An analysis of post-view purchases usually warrants a discussion of attribution and ad effectiveness. These are extensive and important topics, but largely out of the scope of this paper; we refer the interested reader to Dalessandro et al., 2012,9 Stitelman et al., 2011,10 and Papadimitriou et al., 2011,11 for a thorough treatment.

When evaluating and optimizing online display advertising campaigns, different marketers have different goals. Often the explicit goal of the campaign, as described to the campaign targeters and traffickers, is some variant of maximizing the cost-adjusted purchase rate. For example, some campaigns explicitly evaluate and pay advertising companies using a so-called cost-per-acquisition (CPA) model, where the targeting/trafficking firm gets paid for each purchase following an ad impression.
Since the targeting firm pays for the actual ad placements (‘‘buying inventory’’) and often gets to choose more or less expensive placements, the firm would like to maximize the purchases per dollar spent on buying inventory. In other campaigns, the payment model is cost per impression or cost per one thousand impressions (called CPM, for cost per mille). However, many savvy brands nevertheless evaluate CPM campaigns’ purchase rates or the ‘‘effective CPA’’ or other similar measures. They will use these evaluations to decide which online advertising contracts to renew or to expand. Therefore, accurate comparative evaluation (between similar campaigns) is vital.
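As a worked example of the "effective CPA" arithmetic just described, the sketch below compares two CPM-priced campaigns by media spend per observed purchase. All spend, impression, and purchase figures are hypothetical.

```python
# Illustrative arithmetic (hypothetical numbers): comparing CPM campaigns by
# "effective CPA", i.e., media spend divided by attributed purchases.
def effective_cpa(impressions: int, cpm_dollars: float, purchases: int) -> float:
    """Media spend per purchase; CPM is cost per mille (1,000 impressions)."""
    spend = impressions / 1000 * cpm_dollars
    return spend / purchases

# Two hypothetical campaigns with identical spend but different purchase counts.
cpa_a = effective_cpa(impressions=10_000_000, cpm_dollars=2.0, purchases=50)
cpa_b = effective_cpa(impressions=10_000_000, cpm_dollars=2.0, purchases=20)
print(cpa_a, cpa_b)  # 400.0 1000.0 -> campaign A delivers purchases more cheaply
```

A brand comparing the two campaigns on effective CPA would renew or expand the first, even though both are paid on impressions rather than acquisitions.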


All campaigns we examine were conducted via standard industry practices for large campaigns. For all campaigns, the campaign parameters (e.g., the time frame within which to determine purchase, how much to bid, etc.) were determined in advance between the advertising company and the client advertiser, completely separate from any knowledge of this study. The one exception to standard industry practices is that for this paper's experiments, browsers were targeted randomly rather than selectively, as described in the "Experiments and Results" section.

Purchases, Proxies, and Proxy Modeling
Before presenting an empirical analysis of proxy modeling and optimization, we first discuss in more detail the motivation for using proxies in the first place. The key problem for evaluating and optimizing online display advertising is that ultimate conversions are scarce. The term "ultimate conversion" covers sets of alternative, tangible (online) brand- or product-related actions that the advertiser desires; we are primarily interested in the purchase of a product online, but other ultimate conversions include registering an offline purchase, filing for a rebate on an offline purchase, registering oneself on a site, joining a loyalty club, downloading a free version of a product, etc. For this paper, we will call these all "purchases," since "ultimate conversions" is awkward and "conversions" has various meanings in the industry (e.g., in some contexts a click on an ad is regarded as a conversion).

Figure 1 shows the purchase rate frequencies across the campaigns in the experiments discussed below. In about half of the campaigns, purchase rates are less than one in a million, and none are more than one in a thousand. This distribution of conversion rates represents a wide variety of product categories across many industries.
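To make the scarcity concrete, a quick back-of-the-envelope computation (with hypothetical impression counts) shows how few positive training examples the purchase rates in Figure 1 imply:

```python
# Back-of-the-envelope illustration (hypothetical numbers): how many purchase
# events a campaign can expect at the rates shown in Figure 1.
def expected_purchases(impressions: int, purchase_rate: float) -> float:
    """Expected number of observed purchases at a fixed per-impression rate."""
    return impressions * purchase_rate

for rate in (1e-6, 1e-5, 1e-4):
    n = expected_purchases(5_000_000, rate)
    print(f"rate={rate:g}: ~{n:.0f} expected purchases")
# At one-in-a-million (roughly half the campaigns), a 5M-impression campaign
# yields only about 5 purchases -- far too few to train or evaluate a model.
```

This is the arithmetic behind the sparsity problem: even very large campaigns produce only a handful of positive examples of the true target.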
It is typical to serve at least several million impressions in a given campaign, but despite this high number, the low purchase conversion rate results in very few purchases in absolute terms. Such low event occurrence presents many challenges in model building and evaluation, and is often a primary driver of the development of conversion proxies. The sparsity (and, in many cases, complete absence) of purchase data has two implications: (i) it is unclear how to evaluate campaign effectiveness, and (ii) it is difficult to add data-driven modeling to campaign targeting. For evaluation, in many campaigns one would have to deliver millions of ads just to begin to get


[Figure 1. Purchase rate frequencies across the campaigns; y-axis: Percent of Campaigns (0% to 50%).]
