This is a working paper. Please do not distribute without permission from the author(s).

Submitted to Operations Research manuscript OPRE-2008-07-354-R3

Scheduling of Dynamic In-Game Advertising

John Turner, Alan Scheller-Wolf, Sridhar Tayur
Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected] • [email protected] • [email protected]

Dynamic in-game advertising is a new form of advertising in which ads are served to video game consoles in real-time over the Internet. We present a model for the in-game ad scheduling problem faced by Massive Inc., a wholly-owned subsidiary of Microsoft, and a leading global network provider of in-game ad space. Our model has two components: 1) a linear program (solved periodically) establishes target service rates, and 2) a real-time packing heuristic (run whenever a player enters a new level) tracks these service rates. We benchmark our model against Massive’s legacy algorithm: When tested on historical data, we observe 1) an 80-87% reduction in make-good costs (depending on forecast accuracy), and 2) a shift in the age distribution of served ad space, leaving more premium inventory open for future sales. As a result of our work, Massive has increased the number of unique individuals that see each campaign by on average 26% per week, and achieved 33% smoother campaign delivery, as measured by standard deviation of hourly impressions served. Key words : dynamic in-game advertising, video game advertising, display advertising, revenue management, linear programming, goal programming

1. Introduction

Video games have incorporated static ads for decades: for example, an old racing game may have a billboard that always displays the same Sunoco ad. But recently, technology and business relationships have matured to allow Internet-enabled consoles (e.g. Xbox, PC’s) to dynamically serve ads over time, creating an entirely new ad market: Dynamic in-game ad technology allows in-game billboards to display different ads to different players based on their demographic, the time-of-day, the day-of-week, and possibly other parameters. The in-game advertising industry is growing quickly; at present, revenues are projected to reach $800 million by 2012 (Cai 2007). At the heart of an in-game ad system is the ad server: When a player enters a new level of a game, his console connects to the ad server via the Internet and requests new ad graphics for billboards, stadium walls, and other locations where ads are shown in the level. The ad server
decides which ads to serve to this player, functioning like a web server delivering banner ads. But, unlike on the web, where a selected ad is almost certainly seen, it is common for only a fraction of selected in-game ads to be seen. Billable ad time is thus recorded only when, as part of game play, the player navigates through the level and passes locations where ads are displayed. For this reason, and also because of additional constraints (such as saturation, competition, and context, discussed below), scheduling in-game ads is significantly more complicated than scheduling banner ads. The ad server is operated by a network provider, an intermediary between game publishers and advertising agencies. We focus on the scheduling problem faced by Massive Inc., a wholly-owned subsidiary of Microsoft and a leading global network provider. Game publishers allow Massive to sell and serve ads in their games, and consequently receive a portion of the generated revenues; ad agencies buy ad campaigns directly from Massive. The operational problem network providers like Massive face is how to schedule and serve ads to players over time so as to make the best use of their inventory of ad space. Campaigns purchased by an ad agency specify a target number of impressions (ads seen by gamers), a rough schedule for serving these impressions over time, and also a desired mix (e.g. 60% in sports games, 40% in the rest of the games). A campaign’s delivery may also be restricted to certain geographic areas and/or times of the day. In addition, the network provider must also manage 1) saturation: it is undesirable for a single player to simultaneously see many copies of the same campaign, 2) competition: campaigns of two competing brands – e.g. Coke and Pepsi – should not be served to the same player, and 3) context: ads should not seem out of place within the game – e.g. Coke ads belong on virtual soda machines, Tide ads do not. 
The size, scope, and complexity of Massive’s problem are such that even if there were no system uncertainty, optimization of their ad server would be difficult. But of course, uncertainty is present – there are three primary sources: 1) the acquisition of new games, 2) the sale of new campaigns, and 3) error in inventory forecasts of ad space. This last factor, uncertainty in the amount of ad space, arises because the number of players, the types (demographics) of players, and the ad space
that the players actually see during game play are not known when the scheduling problem needs to be solved. Thus, campaigns sometimes fall short of their impression goals or deviate from the desired pattern of delivery; in that case, the network provider offers the advertiser a make-good: the campaign is extended or the advertiser is offered a refund or credit for future use. We present the first planning/scheduling model and algorithm for dynamic in-game advertising. Our model has two components: 1) a linear program called the Weekly Plan is solved periodically to establish target service rates, and 2) a packing heuristic called the Real-time Algorithm is run whenever a player enters a new level, to serve impressions in accordance with these service rates. Benchmarking our model against Massive’s legacy algorithm using historical data, we observe 1) an 80-87% reduction in make-good costs (depending on forecast accuracy), and 2) a shift in the age distribution of served ad space, leaving more premium inventory open for future sales. Massive has begun a staged implementation of our model, and to date has benefitted from a 26% average increase in the number of unique individuals that see each campaign each week, and 33% smoother campaign delivery, as measured by standard deviation of hourly impressions served. We proceed as follows: In §2 we review the literature on media planning and scheduling, and describe existing models for broadcast television and web page banner ads. We introduce the problem in §3, the Weekly Plan LP in §4, and the Real-time Algorithm in §5. We benchmark our algorithm against Massive’s legacy algorithm in §6, and comment on our implementation learnings in §7. We conclude with comments and list future work in §8.

2. Literature Review

To the best of our knowledge, ours is the first academic treatment of the scheduling of in-game advertisements. The process of dynamic in-game advertising is well documented (Chambers 2005, Svahn 2005); however, operational problems, such as the scheduling problem we examine, have not been studied. In the traditional media planning literature, the optimization problem is usually that of a single advertiser and not of a network provider serving many advertisers. In these models, the advertiser chooses among advertising vehicles (e.g. newsprint, television, radio) to maximize
some combination of reach (audience size), frequency, and campaign duration subject to a budget constraint (see textbooks Rossiter and Danaher 1998, Gensch 1973). One typical assumption is wearout (the effectiveness of an ad decreases as the number of exposures to the same person increases); papers in this line of research include Thompson (1981) and Simon (1982), which arrive at an optimal pulsing strategy of advertising expenditure over time. Although some of these concepts are relevant, it is most instructive to compare our model with those that take the perspective of a third party scheduling many advertisers, such as models that plan TV commercials and web page banner ads. Structurally, in-game advertising sits between TV and web advertising: It has well-defined contracts like TV, yet decisions are made at a very fine granularity, as on the web. TV Commercials: Advertisers purchase 60-80% of the year’s ad space during a 2-3 week period in May called the upfront market; the remaining ad space is sold first-come-first-serve in the scatter market. In contrast, in-game ads are sold throughout the year, so the division between upfront and scatter markets is not profound. Therefore, papers (e.g. Araman and Popescu 2009) that determine the optimal upfront vs. scatter tradeoff are not directly applicable. Bollapragada et al. (2002) use goal programming to produce a sales plan for a single campaign that allocates commercial slots from TV shows such that the advertiser’s preferences are honored as closely as possible. Inventory is booked for each advertiser in sequence, allowing each advertiser to request changes to their plan before it is finalized. Audience size is assumed deterministic and there is no mention of make-goods allocation when schedules are not executed as-planned. This is a static upfront-market problem; in contrast, we consider the dynamic problem in which new campaigns and games materialize over a rolling horizon. 
Furthermore, our Real-time Algorithm is significantly different from the low-level ad slotting “ISCI Rotator” algorithms used for TV, since in our case each viewer can be shown different ads. Finally, Zhang (2006) also solves the (static, deterministic, upfront market) TV commercial scheduling problem, but assumes that client negotiations are minimal, which allows all campaigns to be scheduled simultaneously via a mixed-integer program.

Web Advertising: The most common objective in web advertising is maximizing click-through (e.g. Chickering and Heckerman 2003, Nakamura and Abe 2005), a concept that does not apply to in-game ads. However, some papers explore the essential dynamic tradeoff: choosing between serving the campaign with the highest bid versus the one farthest from achieving its contracted impression goal. Approaches include the online algorithm of Mehta et al. (2007) and the large LP solved by column generation by Abrams et al. (2007). Of these, Abrams et al. (2007) includes promising computational results on historical data; however, the authors note some challenges to implementing the full-scale algorithm in practice. In addition to tailoring our focus to in-game advertising, our problem definition is more comprehensive than either of these papers: we solve a multiobjective problem with many practical constraints not present in Abrams et al. (2007) or Mehta et al. (2007) (e.g. to spread impressions over time and across various ad spaces).

3. Definitions and Problem Statement

Games, Zones, Inventory Elements, and Regions: A video game is subdivided into many zones, each having one or more inventory elements (the generic name given to in-game billboards and other locations where ads are displayed). Typically, each level of a game is a separate zone. Within each zone, inventory elements are grouped into regions based on their spatial proximity; all inventory elements visible from the same vantage point are typically in the same region.

Requests, Arrivals, Game Instances, and Spots: When a gamer enters a new zone, his console sends a request to the ad server to select ad campaigns for all inventory elements in the zone. From the ad server’s perspective, an arrival of a gamer has just occurred. This arrival spawns a game instance; the gamer will continue to see the ads that were selected by the ad server for this game instance until the gamer departs from this zone. Although some inventory elements can only show one ad per game instance, others (e.g. an in-game Jumbotron) cycle through a sequence of ads; we call each element of the sequence an ad spot.

Adtime: An inventory element in one game instance provides adtime equal to the total amount of time – not necessarily contiguous – the inventory element appears on-screen. Adtime can be
aggregated over multiple inventory elements and over multiple game instances; thus we can consider quantities such as the expected adtime of Zone 3 of Game A on Monday.

Impressions: We measure billable adtime in impressions. Massive defines an impression as 10 seconds of time – not necessarily contiguous – in which a gamer sees the same ad campaign on-screen, possibly across multiple inventory elements in the same game instance. For example, given a zone with two single-spot inventory elements, if these elements log 7 and 8 seconds of adtime respectively in one game instance, then either: 1) the same campaign was served in both inventory elements, so $\lfloor (7 + 8)/10 \rfloor = 1$ impression is counted toward that campaign’s impression goal, or 2) different campaigns were served in these inventory elements, and no impressions are counted for either campaign ($\lfloor 7/10 \rfloor = \lfloor 8/10 \rfloor = 0$).

Paying and Nonpaying Campaigns: Paying campaigns specify an impression goal of $q_k$ and a unit price of $p_k$, providing revenue of $p_k \times q_k$. We can assume that Massive receives this revenue up-front when the campaign is negotiated. Subsequent scheduling decisions affect Massive’s ability to deliver the campaign; thus penalty costs (make-goods) may be incurred should campaign k not be delivered as promised. Nonpaying campaigns include house ads, Public Service Announcements (PSA’s), and default graphics; these do not generate any revenue, do not have minimum impression goals, and are served as “filler” when no paying campaigns can be served.

Campaign Targeting – Audience Demographics and Context: Paying campaigns currently specify targeting constraints along three dimensions: geography, time, and context; i.e. impressions must come from specific geographic regions, time slots, and inventory elements, respectively.
The geographic dimension is indexed by 300+ Designated Market Area (DMA) codes which allow Massive to constrain service to metropolitan areas across the United States and/or key areas in other parts of the world. There are 42 combinations of {Weekday, DayPart} tuples for the time dimension, one for each of 6 DayParts (contiguous groups of hours, e.g. Prime Time = 7PM-12AM) in each of the 7 weekdays. Finally, there are 1 to 50 Inventory Element Classes (IEC’s) per game, which group inventory elements by context (e.g. ‘all virtual soda machines’). By targeting specific games, geographic regions, and times, a campaign can be tailored for specific
audience demographics. In contrast, IEC targeting ensures proper context by integrating ads in realistic places: Coke ads, not Tide ads, belong on virtual soda machines.

Competition and Saturation: Within each game instance, 1) campaigns of competing brands (e.g. Coke and Pepsi) cannot be shown, 2) each campaign can be served in at most $\omega_z$ spots, and 3) when a campaign is served in multiple spots, the spots should preferably be in different regions. The quantity $\omega_z$ is called the saturation cap, and is typically between a quarter and half the number of inventory elements in zone z.

Campaign Delivery Quality: In addition to a campaign’s aggregate impression goal $q_k$, the advertiser usually specifies other impression goals either explicitly or implicitly. These include mix requirements (e.g. at least 40% of impressions should come from sports games) and delivery schedules (e.g. uniformly spread delivery over time, or deliver more impressions the week before a new product launch). Mix requirements are modeled with an impression goal $q_{(I,k)}$ for certain groups of games I that campaign k desires; we call I a MixComponent. Delivery schedules can include weekly impression goals $q_k^t$ and, if needed, weekly impression goals for each MixComponent $q_{(I,k)}^t$. It is usually also important to serve each campaign impressions from many different games,
since part of the value of contracting with a network provider such as Massive is derived from obtaining access to advertising space in a breadth of games from different publishers.

Publisher Concerns: Since game publishers receive a portion of the revenues generated by the impressions served in their games, the network provider must attempt to schedule ads across games to avoid negatively affecting the revenue stream of a particular publisher.

Supply: Inventory is not games, or zones, or inventory elements, but rather eyeballs: the total inventory available to fulfill a campaign is the number of impressions generated by gamers that match that campaign’s targeting. We use point forecasts of supply in our Weekly Plan LP, which are computed from the following point estimates of adtime:
• $s_i^t$ = the expected adtime of game i, week t;
• $s_{id}^t$ = the expected adtime of game i, week t, DMA d;
• $s_{ie}^t$ = the expected adtime of game i, week t, inventory element e; and
• $s_{iw}^t$ = the expected adtime of game i, week t, {Weekday, DayPart} w.

Defining the set of zones in game i as $Z_i$, the set of inventory elements in zone z as $E_z$, and the sets of all DMA’s and all {Weekday, DayPart} tuples as D and W respectively, we compute “breakouts” – proportions of adtime that come from a single inventory element, DMA, or {Weekday, DayPart}:
• $b_{id}^t = s_{id}^t / \sum_{d' \in D} s_{id'}^t$ = the proportion of game i, week t’s adtime that comes from DMA d;
• $b_{ie}^t = s_{ie}^t / \sum_{z \in Z_i, e' \in E_z} s_{ie'}^t$ = the proportion of game i, week t’s adtime that comes from inventory element e; and
• $b_{iw}^t = s_{iw}^t / \sum_{w' \in W} s_{iw'}^t$ = the proportion of game i, week t’s adtime that comes from {Weekday, DayPart} w.

We use these breakouts to generate estimates of supply that are used by the Weekly Plan LP. This final step requires the additional notation:
• $E_{zk}$ = the set of inventory elements in zone z that match the targeting of campaign k;
• $D_k$, $W_k$ = the sets of DMA’s and {Weekday, DayPart} tuples that match the targeting of campaign k, respectively;
• $P_e$ = the set of spots in inventory element e;
• $s_e$ = the expected adtime of inventory element e in one game instance;
• $s_p(e) = s_e / |P_e|$ = the expected adtime of any spot $p \in P_e$, assuming each spot of $P_e$ is equally likely to be on-screen; and
• $\varpi \in [0, 1]$ = a factor that approximates the conversion ratio between adtime and impressions.
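
As a minimal sketch of the breakout computation above, the following Python normalizes point estimates of adtime into proportions and sums the proportions that match a campaign's targeting. The function names and the sample adtime figures are illustrative assumptions and do not come from Massive's system.

```python
def breakouts(adtime):
    """Turn {key: expected adtime} into proportions of total adtime.

    `adtime` plays the role of s_id^t (or s_ie^t, s_iw^t) for one game and
    one week; keys are DMAs, inventory elements, or {Weekday, DayPart} tuples.
    """
    total = sum(adtime.values())
    if total <= 0:
        raise ValueError("total adtime must be positive")
    return {key: value / total for key, value in adtime.items()}


def targeted_share(proportions, targeted_keys):
    """Sum the breakouts over the keys a campaign targets (e.g. d in D_k)."""
    return sum(proportions[key] for key in targeted_keys)


# Hypothetical adtime (in seconds) by DMA for one game in one week.
adtime_by_dma = {"Pittsburgh": 300.0, "Seattle": 500.0, "Other": 200.0}
b = breakouts(adtime_by_dma)

# Fraction of the game's adtime that falls in the campaign's targeted DMAs.
share = targeted_share(b, ["Pittsburgh", "Seattle"])
```

The same two helpers apply unchanged to the inventory-element and {Weekday, DayPart} dimensions, since each breakout is a plain normalization over one dimension.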

We assume that when measuring aggregate supply over many inventory elements and/or large periods of time, adtime and impressions are approximately equal modulo this scaling constant. Note that if adtime were not rounded down to compute billable impressions, then $\varpi = 1$. We used $\varpi = 0.9$, which was consistent with the aggregate amount of rounding down in our dataset. Labeling the inventory elements of $E_{zk}$ such that $s_p(1) \ge s_p(2) \ge \dots \ge s_p(|E_{zk}|)$, we define $\hat{E}_{zk} = \{1, \dots, \omega_z\}$ as the set of the “largest” $\omega_z$ inventory elements in zone z that match the targeting of campaign k. Finally, we compute the supply estimates for the Weekly Plan LP:
• $s_i^t = \varpi \times (\text{expected adtime of game } i, \text{ week } t)$ = the estimated number of impressions provided by game i in week t;
• $s_{ik}^t = \varpi \times (\text{expected adtime of game } i, \text{ week } t) \times \left( \sum_{d \in D_k} b_{id}^t \right) \left( \sum_{w \in W_k} b_{iw}^t \right) \left( \sum_{z \in Z_i} \sum_{e \in E_{zk}} b_{ie}^t \right)$ = the estimated number of impressions provided by game i in week t that match the targeting requirements of campaign k; and
• $\hat{s}_{ik}^t = \varpi \times (\text{expected adtime of game } i, \text{ week } t) \times \left( \sum_{d \in D_k} b_{id}^t \right) \left( \sum_{w \in W_k} b_{iw}^t \right) \left( \sum_{z \in Z_i} \sum_{e \in \hat{E}_{zk}} b_{ie}^t \right)$ = the estimated number of nonsaturated impressions provided by game i in week t that match the targeting requirements of campaign k.

Demand: The demand for ad space arises from ad agencies purchasing campaigns. For each week $t \in T$ of the planning horizon, Massive uses a proprietary method to estimate the distributions of aggregate demand-to-come for three groups of games: premium, middle, and discount tiers.

Tightness of Capacity – Sell-through: We first define:
• $\iota_{ae}$ = the adtime of inventory element e in the game instance started by arrival a (measured ex-post);
• $A_z^t$, $A_{zk}^t$ = the sets of arrivals in week t, and in week t that match the targeting of campaign k, respectively;
• $K_{ae}$ = the set of campaigns with targeting that matches both arrival a and inventory element e; and
• $I_k$ = the set of games that match the targeting of campaign k.

Then

$$\iota_k^t = \varpi \sum_{i \in I_k} \sum_{z \in Z_i} \sum_{a \in A_{zk}^t} \sum_{e \in E_{zk}} \iota_{ae}$$

is the number of impressions counted ex-post in week t that could have been used by campaign k (assuming we disregard competition and saturation constraints). The sell-through of inventory element e in the game instance started by arrival a in week t is

$$\gamma_{ae} = \sum_{k \in K_{ae}} \frac{q_k^t}{\iota_k^t},$$

the proportion of inventory element e in arrival a that would have been allocated to paying campaigns if each campaign were served at the same rate across the entire inventory space it targeted. The population mean $\mu_\gamma$ and population standard deviation $\sigma_\gamma$ describe sell-through over aggregated blocks of inventory; for example, the network sell-through over the planning horizon (i.e. computed over all games $\Upsilon$ and weeks $T$) has mean, second moment, and standard deviation:

$$\mu_\gamma = \frac{\sum_{t \in T} \sum_{i \in \Upsilon} \sum_{z \in Z_i} \sum_{a \in A_z^t} \sum_{e \in E_z} \iota_{ae} \gamma_{ae}}{\sum_{t \in T} \sum_{i \in \Upsilon} \sum_{z \in Z_i} \sum_{a \in A_z^t} \sum_{e \in E_z} \iota_{ae}}, \qquad \mu_{\gamma^2} = \frac{\sum_{t \in T} \sum_{i \in \Upsilon} \sum_{z \in Z_i} \sum_{a \in A_z^t} \sum_{e \in E_z} \iota_{ae} \gamma_{ae}^2}{\sum_{t \in T} \sum_{i \in \Upsilon} \sum_{z \in Z_i} \sum_{a \in A_z^t} \sum_{e \in E_z} \iota_{ae}}, \qquad \sigma_\gamma = \sqrt{\mu_{\gamma^2} - \mu_\gamma^2}.$$
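
The sell-through statistics above are adtime-weighted moments, which can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names, the dictionary keys, and the sample numbers are hypothetical, and the real computation runs over every (arrival, inventory element) pair in the planning horizon.

```python
import math


def sell_through(weekly_goals, usable_impressions):
    """gamma_ae: sum of q_k^t / iota_k^t over the campaigns k in K_ae."""
    return sum(weekly_goals[k] / usable_impressions[k] for k in weekly_goals)


def sell_through_stats(observations):
    """Adtime-weighted mean, second moment, and standard deviation.

    `observations` holds (iota_ae, gamma_ae) pairs, one per
    (arrival, inventory element) in the block being aggregated.
    """
    observations = list(observations)
    total_adtime = sum(adtime for adtime, _ in observations)
    if total_adtime <= 0:
        raise ValueError("total adtime must be positive")
    mean = sum(adtime * g for adtime, g in observations) / total_adtime
    second = sum(adtime * g * g for adtime, g in observations) / total_adtime
    variance = max(second - mean * mean, 0.0)  # guard against rounding error
    return mean, second, math.sqrt(variance)


# Hypothetical: two campaigns match this arrival and inventory element.
gamma = sell_through({"k1": 200.0, "k2": 100.0}, {"k1": 1000.0, "k2": 400.0})

# Hypothetical block with two observations of (adtime, sell-through).
mu, mu2, sigma = sell_through_stats([(10.0, 0.5), (30.0, 1.0)])
```

Weighting by adtime means that inventory elements that are on-screen longer contribute proportionally more to the reported mean and spread.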

Legacy Algorithm: Massive’s existing scheduling algorithm – the legacy algorithm – uses a
single impression goal qk for each campaign, and assumes that impressions should be spread uniformly across all weeks (i.e. it does not support delivery schedules that ramp up impressions the week before a product launch). When a new arrival spawns a game instance, the legacy algorithm checks whether each campaign k is ahead or behind schedule, and adjusts the service rates accordingly. Because the legacy algorithm serves campaign k at the same rate in each game i that its targeting matches, it is usually good at spreading impressions across MixComponents, providing impressions in a broad set of games, and satisfying publisher concerns. The most systematic shortcoming of the legacy algorithm is that it is myopic; since it does not use supply estimates, it is unable to proactively increase the service rate of a campaign if shortages are anticipated later.
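
To make the impression definition from earlier in this section concrete (10 seconds of adtime for the same campaign within one game instance, rounded down), here is a small Python sketch. The function name and the campaign labels are hypothetical; only the 10-second rule and the 7-second and 8-second example come from the text.

```python
from collections import defaultdict

IMPRESSION_SECONDS = 10  # Massive's definition: 10 seconds per impression


def count_impressions(served):
    """Count impressions per campaign for one game instance.

    `served` is a list of (campaign, adtime_seconds) pairs, one per
    inventory element; adtime for the same campaign is pooled before
    rounding down, as in the two-element example in the text.
    """
    seconds = defaultdict(float)
    for campaign, adtime in served:
        seconds[campaign] += adtime
    return {c: int(total // IMPRESSION_SECONDS) for c, total in seconds.items()}


# Same campaign in both elements: floor((7 + 8) / 10) = 1 impression.
same = count_impressions([("A", 7), ("A", 8)])

# Different campaigns: floor(7 / 10) = floor(8 / 10) = 0 impressions each.
different = count_impressions([("A", 7), ("B", 8)])
```

The second case shows why adtime and billable impressions differ in aggregate: adtime that never reaches a full 10 seconds for one campaign is lost to rounding.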

4. Weekly Plan LP

We periodically solve a linear program called the Weekly Plan to get $x_{ik}^t$, the number of impressions allocated to campaign k from game i in week t. We update the target service rates $\lambda_{ik}^t = x_{ik}^t / s_{ik}^t$ whenever the Weekly Plan is re-solved; the Real-time Algorithm (see §5) uses these rates to select specific campaigns for the inventory elements of a given game instance.

The Weekly Plan is similar to the goal program of Bollapragada et al. (2002), which is used to generate TV ad sales plans at NBC. Both formulations have an objective that reserves premium inventory for future sales while minimizing penalty costs for relaxing the many goal constraints. The NBC sales plan has mix constraints and weekly weighting constraints, which take a client’s goal $x = q$ (# of commercial slots assigned from this subset of inventory = target) and relax it to $l - y \le x \le u + y$, where $y \ge 0$ is a slack variable penalized in the objective and $(l, u)$ are lower and upper bounds. We follow the same construction; however, our mix constraints specify an impression goal $q_{(I,k)}$ for MixComponent I (which is a group of games), whereas at NBC the mix targets are for a single TV show. We also add a weekly mix constraint based on the target $q_{(I,k)}^t$, a constraint that spreads impressions across all games within a MixComponent, and a constraint that ensures that each game is assigned a minimum dollar value in revenue. The NBC sales plan bounds the number of slots allocated to a given {Show, Week}, which partially enforces saturation and
competition constraints; saturation is then handled by a heuristic they call the ISCI Rotator, which spreads commercials of the same campaign apart chronologically. We follow a similar approach by including bounds on $x_{ik}^t$ that partially enforce the saturation cap, and defer the consideration of competition and saturation to the Real-time Algorithm. Finally, we note that NBC’s goal program is an integer program (as discrete commercial slots are allocated) that generates a sales plan for a single campaign, whereas we allocate impressions for many campaigns simultaneously using a linear program.

The Weekly Plan LP appears in Figure 1. For compactness, some quantities are shown with lower and upper bounds within the same constraint; hence, slack variables exist on both sides of some constraints. The variables are $\lambda, v, w, x, y, z$; all other quantities are constants; see Table 1 for the full list of notation. Since we re-solve the Weekly Plan LP over time, we divide time into two parts: The constant $a$ denotes all impressions of $x$ achieved up to the present, and $\dot{x}$ denotes all impressions of $x$ planned into the future. Hence, $x = a + \dot{x}$. Similarly, dots on other constants and variables indicate quantities for the remainder of the relevant time interval.
Constraints 1 through 5 are goal-type constraints that model campaign contract requirements which may be relaxed and penalized in the objective: Constraint 1 (end-of-horizon impression goal) ensures that for each campaign k, the number of impressions planned in the remainder of the horizon $\dot{x}_k$, plus the planned shortfall $y_k$, equals the remaining end-of-horizon impression goal $(q_k - a_k)^+$; constraint 2 (weekly impression goals) ensures the number of planned impressions $x_k^t$ is between the bounds $l_k^t$ and $u_k^t$ for each campaign k and week t; constraint 3 (end-of-horizon mix target) ensures at least $l_{(I,k)}$ impressions from MixComponent I are allocated to campaign k; and constraint 4 (weekly mix targets) ensures at least $l_{(I,k)}^t$ impressions from MixComponent I are allocated to campaign k in week t. Constraint 5 (spread impressions to all games within a MixComponent) models the following idea: “For each {MixComponent, Campaign, Week}, try to set the same service rate for all games.” Defining the service rate for a {MixComponent, Campaign, Week} as the variable $\lambda_{(I,k)}^t$, serving all games in a MixComponent at the same rate requires
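
Figure 1. The Weekly Plan LP

$$\max \; -\sum_{k \in K} \pi_k y_k - \sum_{k \in K, t \in T_k} \beta_k y_k^t - \sum_{k \in K, I \in \Phi_k} \tau_k y_{(I,k)} - \sum_{k \in K, I \in \Phi_k, t \in T_k} \eta_k y_{(I,k)}^t - \sum_{k \in K, I \in \Phi_k, t \in T_k} \xi_k v_{(I,k)}^t - \sum_{t \in T, i \in \Upsilon} \zeta_i z_i^t + \sum_{I \in \Psi, t \in T, m = 1..n} \theta_{(I,m)}^t \dot{w}_{(I,m)}^t$$

s.t.

$$
\begin{array}{lll}
\dot{x}_k + y_k = (q_k - a_k)^+ & \forall k \in K & (1) \\
l_k^t - y_k^t \le x_k^t \le u_k^t + y_k^t & \forall t \in T_k, \forall k \in K & (2) \\
l_{(I,k)} - y_{(I,k)} \le x_{(I,k)} & \forall I \in \Phi_k, \forall k \in K & (3) \\
l_{(I,k)}^t - y_{(I,k)}^t \le x_{(I,k)}^t & \forall I \in \Phi_k, t \in T_k, \forall k \in K & (4) \\
s_{ik}^t \lambda_{(I,k)}^t - v_{(I,k)}^t \le x_{ik}^t \le s_{ik}^t \lambda_{(I,k)}^t + v_{(I,k)}^t & \forall i \in I, \forall I \in \Phi_k, t \in T_k, \forall k \in K & (5) \\
\sum_{k \in K} p_k x_{ik}^t + z_i^t \ge r_i^t & \forall t \in T, i \in \Upsilon & (6) \\
\dot{x}_i^t + \dot{w}_i^t = \dot{s}_i^t & \forall t \in T, i \in \Upsilon & (7) \\
\dot{x}_{ik}^t \le \dot{\hat{s}}_{ik}^t & \forall i \in I_k, t \in T_k, \forall k \in K & (8) \\
x_{ik}^t - \dot{x}_{ik}^t = a_{ik}^t & \forall i \in I_k, t \in T_k, \forall k \in K & (9) \\
\dot{w}_{(I,m)}^t \le \dot{\vartheta}_{(I,m)}^t & \forall I \in \Psi, m = 1..(n-1), t \in T & (10) \\
\sum_{m = 1..n} \dot{w}_{(I,m)}^t - \sum_{i \in I} \dot{w}_i^t = 0 & \forall I \in \Psi, t \in T & (11) \\
\dot{x}_k - \sum_{i \in I_k, t \in T_k} \dot{x}_{ik}^t = 0 & \forall k \in K & (12) \\
x_k - \dot{x}_k = a_k & \forall k \in K & (13) \\
x_k^t - \sum_{i \in I_k} x_{ik}^t = 0 & \forall t \in T_k, \forall k \in K & (14) \\
x_{(I,k)} - \sum_{i \in I, t \in T_k} \dot{x}_{ik}^t = a_{(I,k)} & \forall I \in \Phi_k, \forall k \in K & (15) \\
x_{(I,k)}^t - \sum_{i \in I} x_{ik}^t = 0 & \forall I \in \Phi_k, t \in T_k, \forall k \in K & (16) \\
\dot{x}_i^t - \sum_{k \in K} \dot{x}_{ik}^t = 0 & \forall t \in T, i \in \Upsilon & (17) \\
\lambda, v, w, x, y, z \ge 0 & \text{for all forms of these variables} & (18)
\end{array}
$$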

satisfying $\lambda_{ik}^t = \lambda_{(I,k)}^t \;\; \forall i \in I$. By multiplying $x_{ik}^t / s_{ik}^t = \lambda_{(I,k)}^t$ through by $s_{ik}^t$ and introducing the slack $v_{(I,k)}^t$, we obtain constraint 5. Note that $s_{ik}^t \lambda_{(I,k)}^t$ can be interpreted as the nominal number of impressions to plan in game i, and so $v_{(I,k)}^t$ penalizes the maximum absolute deviation from the nominal number of impressions across all games in MixComponent I.

Constraint 6 (minimum revenue target for each game) is the only goal-type constraint that models publisher contract requirements. This constraint ensures that enough impressions are allocated to

Table 1. Notation for Weekly Plan LP

Indices
• $i, k, t$ = a game, a campaign, a week
• $m$ = a segment of the piecewise-linear expected revenue function for unused inventory
• $I$ = a set of games (can be either a MixComponent or a tier)

Sets
• $\Upsilon, K, T$ = all games, all paying campaigns, all weeks
• $I_k$ = games that campaign k can be served in
• $T_k$ = weeks that campaign k can be served in

Collections (sets of sets)
• $\Phi$ = all MixComponents
• $\Phi_k$ = MixComponents for campaign k
• $\Psi$ = all tiers of games

Parameters
• $a_k, a_{(I,k)}, a_{ik}^t$ = # of impressions served to date for campaign k, for campaign k in MixComponent I, and for campaign k in game i during week t
• $l_k^t, u_k^t$ = bounds on the # of impressions to show for campaign k in week t
• $l_{(I,k)}, l_{(I,k)}^t$ = lower bounds for the # of impressions to serve to campaign k of MixComponent I over all weeks, to campaign k of MixComponent I in week t
• $n$ = # of piecewise-linear segments in expected revenue function
• $p_k$ = price per impression paid by campaign k
• $q_k$ = end-of-horizon impression goal for campaign k
• $r_i^t$ = minimum revenue amount for game i, week t
• $s_i^t$ = expected supply of game i in week t
• $s_{ik}^t$ = expected supply of game i in week t that matches campaign k
• $\hat{s}_{ik}^t$ = expected nonsaturated supply of game i in week t that matches campaign k
• $\theta_{(I,m)}^t$ = marginal revenue in the mth segment of the piecewise-linear expected revenue function for tier I, week t
• $\vartheta_{(I,m)}^t$ = width of the mth segment of the piecewise-linear expected revenue function for tier I, week t

Slack Variables and Penalty Costs
• $v_{(I,k)}^t$ / $\xi_k$ = maximum absolute deviation from the nominal number of impressions to serve campaign k in game i at week t, taken over all games i in MixComponent I / penalty cost for not spreading impressions across all games in a MixComponent
• $y_k$ / $\pi_k$ = shortfall amount / penalty cost for the end-of-horizon impression goal for campaign k
• $y_k^t$ / $\beta_k$ = amount bounds violated / penalty cost for the impression goal for campaign k, week t
• $y_{(I,k)}$ / $\tau_k$ = amount bounds violated / penalty cost for the end-of-horizon impression goal for campaign k, MixComponent I
• $y_{(I,k)}^t$ / $\eta_k$ = amount bounds violated / penalty cost for the impression goal for campaign k, MixComponent I, week t
• $z_i^t$ / $\zeta_i$ = shortfall amount / penalty cost for the minimum revenue target of game i, week t

Decision Variables
• $w_i^t$ = impressions of game i available to satisfy future demand in week t
• $w_{(I,m)}^t$ = impressions of tier I available to satisfy future demand in week t, placed in segment m of the piecewise-linear expected revenue function
• $x_k$ = impressions allocated to campaign k over the planning horizon
• $x_k^t$ = impressions allocated to campaign k from week t
• $x_i^t$ = impressions allocated from game i, week t
• $x_{ik}^t$ = impressions allocated to campaign k from game i, week t
• $x_{(I,k)}$ = impressions allocated to campaign k from MixComponent I over the planning horizon
• $x_{(I,k)}^t$ = impressions allocated to campaign k from MixComponent I, week t
• $\lambda_{(I,k)}^t$ = nominal rate to serve impressions of MixComponent I to campaign k in week t

Note: Variables superscripted with a dot indicate “the component of this quantity from now until the relevant time period.” Hence, $x_{ik}^t = a_{ik}^t + \dot{x}_{ik}^t$.

game i in week t to guarantee at least $r_i^t$ dollars in revenue. Constraint 7 (supply) states that for the remaining portion of week t in game i, the number of impressions planned, plus the number of impressions left unplanned, must equal the forecast supply.

Constraint 8 (saturation cap bound) partially enforces the saturation cap by bounding the number of impressions assigned to campaign k in game i, week t by the estimated nonsaturated supply $\hat{s}_{ik}^t$. Constraint 9 (link total planned impressions with remaining planned impressions) ensures that for each {Game, Campaign, Week}, the total number of impressions planned, $x_{ik}^t$, must equal the actual number of impressions achieved up to the present, $a_{ik}^t$, plus the number of impressions planned from the present until the end of the week, $\dot{x}_{ik}^t$. For weeks $t \ge 2$, we always have $a_{ik}^t = 0$, and so this constraint reduces to $x_{ik}^t = \dot{x}_{ik}^t$.

Constraints 10-11 model a piecewise-linear component of the objective function, and are described with the objective in §4.1. Constraints 12-17 link the various forms of $x$. In an actual implementation, only the variables $x_{ik}^t$ and $\dot{x}_{ik}^t$ are required; the other forms of $x$ can be written as linear combinations of $x_{ik}^t$ and $\dot{x}_{ik}^t$. We use the other forms of $x$ to keep constraints 1-11 readable.

Constraints 3 and 4 do not have upper bounds because MixComponents are not mutually exclusive: game i may be both a Sports Game and a Racing Game; in that case, impressions allocated to game i count toward two mix targets. Had upper bounds existed, they might have needed to be violated in order to satisfy the end-of-horizon goal (constraint 1).
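
The goal-constraint relaxation used in this section can be illustrated without an LP solver. The sketch below assumes an allocation is already fixed and computes the smallest slack values that make constraints of the forms l − y ≤ x ≤ u + y (as in constraint 2) and s·λ − v ≤ x ≤ s·λ + v (as in constraint 5) hold, along with the per-game service rates x/s. All function names and numbers are hypothetical; in the Weekly Plan itself these slacks are decision variables chosen jointly with the allocation.

```python
def goal_slack(x, lower, upper=float("inf")):
    """Smallest y >= 0 such that lower - y <= x <= upper + y."""
    return max(0.0, lower - x, x - upper)


def spread_deviation(planned, supply, rate):
    """Smallest v >= 0 with |x_i - s_i * rate| <= v for every game i."""
    return max(abs(planned[g] - supply[g] * rate) for g in planned)


def service_rates(planned, supply):
    """Per-game service rate: planned impressions over matching supply."""
    return {g: planned[g] / supply[g] for g in planned}


# Hypothetical weekly goal with bounds [100, 120] impressions.
under = goal_slack(80.0, 100.0, 120.0)    # 20 impressions below the lower bound
over = goal_slack(130.0, 100.0, 120.0)    # 10 impressions above the upper bound
inside = goal_slack(110.0, 100.0, 120.0)  # within bounds, so no slack needed

# Hypothetical MixComponent with two games and a nominal rate of 0.4.
planned = {"A": 50.0, "B": 30.0}
supply = {"A": 100.0, "B": 100.0}
v = spread_deviation(planned, supply, 0.4)
rates = service_rates(planned, supply)
```

Because each slack enters the objective with a positive penalty, an optimal LP solution never sets a slack above these minimal values for the allocation it chooses.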

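4.1. The Objective Function
The objective is to maximize revenues that can be affected by scheduling; it contains two terms. Objective Term 1 (penalty costs for breaking contracts) is:

$$-\sum_{k \in K} \pi_k y_k \;-\; \sum_{k \in K,\, t \in T_k} \beta_k y^t_k \;-\; \sum_{k \in K,\, I \in \Phi_k} \tau_k y_{(I,k)} \;-\; \sum_{k \in K,\, I \in \Phi_k,\, t \in T_k} \eta_k y^t_{(I,k)} \;-\; \sum_{k \in K,\, I \in \Phi_k,\, t \in T_k} \xi_k v^t_{(I,k)} \;-\; \sum_{t \in T,\, i \in \Upsilon} \zeta_i z^t_i .$$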

We model the costs of falling short of the goals expressed by constraints 1-6 as negative revenues from the delivery of make-goods for broken contracts. Make-good costs include direct compensation for underdelivery, transaction costs, and loss of goodwill. Direct compensation is either the dollar amount refunded to the advertiser, or the (shadow) cost of assigning additional inventory to extend the campaign past its scheduled end date. Transaction costs include all manual processing to issue the make-good, including getting approval to extend the campaign. Loss of goodwill models the advertiser’s displeasure with the quality of service they received, and reflects the estimated loss of repeat business. Transaction costs and goodwill costs are likely significant, yet are extremely hard to estimate; fortunately, for the linear program, exact estimation of the penalty costs is not as important as their relative ranking.

Massive suggested the following penalty costs, which are proportional to campaign prices $p_k$: $\pi_k = p_k$ per impression short of the end-of-horizon goal for campaign $k$; $\beta_k = 0.1 p_k$ per impression over/under the allowed deviation from the weekly goal; $\tau_k = 0.1 p_k$ over/under the allowed deviation from the end-of-horizon goal for each MixComponent; $\eta_k = 0.05 p_k$ over/under the allowed deviation from the weekly goal for each MixComponent; and $\xi_k = 0.05 p_k$ for not spreading impressions evenly across all games in a MixComponent. We also used $\zeta_i \in \{0.5, 0.25, 0.1\}$ as the per-dollar penalties of falling short of the minimum revenue target of game $i$ (penalties are differentiated by tier). Note that in practice, make-good costs should also be increased for preferred customers (which are often given quantity discounts) to compensate for the fact that the above formulas assign low penalty costs to campaigns with low prices.

Objective Term 2 - general form (revenue potential of unscheduled inventory) is:

$$\sum_{I \in \Psi,\, t \in T} f^t_I(w^t_I).$$

Since supply and demand change over time, the scarcity of inventory, and therefore the value of unscheduled inventory, will change over time. By scheduling appropriately, we maximize the total dollar value of unscheduled inventory, i.e. the expected future sales revenue. The general form of objective term 2 is as listed above, where $\Psi$ is the set of all tiers of games (premium, middle, discount), $T$ is the set of all weeks, and $f^t_I$ is a function that values the quantity of unscheduled inventory $w^t_I$ for tier $I$ in week $t$. Suppressing subscripts and superscripts, the value function for unscheduled inventory $f(w)$ of a particular {Tier, Week} is:

$$f(w) := p\,E[\min(X, w)] = p\,E[X - (X - w)^+] = p \int_0^w \bar{G}(x)\,dx,$$

where $p$ is the fixed price for the type of inventory under consideration, $X$ is a random variable that models market demand, $G(x)$ is the cumulative distribution function of demand, and $\bar{G}(x) = 1 - G(x)$. Thus if we leave $w$ units of inventory available, expected sales is $E[\min(X, w)]$, yielding $f(w)$ dollars in expected revenue.

Since $f'(w) = p \frac{d}{dw} \int_0^w \bar{G}(x)\,dx = p\,\bar{G}(w)$ and $\bar{G}$ is a nonincreasing function, $f(w)$ is concave increasing for all demand distributions of $X$. Therefore, we can approximate $f(w)$ by a piecewise-linear function with $n$ segments of successively smaller positive slope. Denoting the marginal revenues (slopes) as $\theta_m$ and segment widths as $\vartheta_m$ for the $m = 1..n$ segments, the expected revenue from future sales is modeled by the linear form:

Objective Term 2 - linear form:

$$\sum_{I \in \Psi,\, t \in T,\, m = 1..n} \theta^t_{(I,m)}\, \dot{w}^t_{(I,m)}$$

and is accompanied by constraint 10 (bound the unplanned impressions in each segment) and constraint 11 (link unplanned impressions by {Tier, Week, Segment} to {Game, Week}). Since the slope parameters $\theta^t_{(I,m)}$ are decreasing in $m$ and we are maximizing, $\dot{w}^t_{(I,m)} = 0 \implies \dot{w}^t_{(I,m+1)} = 0$. The bounds on $\dot{w}^t_{(I,m)}$, namely $\dot{\vartheta}^t_{(I,m)}$, ensure that impressions are accounted for in the correct segment, where $\dot{\vartheta}^t_{(I,m)}$ is defined as follows ($\rho$ is the percentage of week 1 remaining):

$$\dot{\vartheta}^t_{(I,m)} := \begin{cases} \rho\,\vartheta^t_{(I,m)} & \text{for } t = 1 \\ \vartheta^t_{(I,m)} & \text{for } t \geq 2. \end{cases}$$

We assumed stationarity in the tests we performed, and computed the required cdf’s from historical data. In practice, we expect the historical cdf to be used as a baseline that Massive can manually align with their sales projections (keeping the general shape of the cdf the same, but shifting its mean). For an example that uses a discrete demand distribution to compute $\theta$ and $\vartheta$, see §EC.1.

Alternatively, when only $E[X]$ is specified, we can use the step function $G(x) = 0$ if $x < E[X]$, $G(x) = 1$ if $x \geq E[X]$, to compute (taking $p = 1$ for simplicity) $f(w \mid E[X]) = w$ if $w < E[X]$, and $f(w \mid E[X]) = E[X]$ if $w \geq E[X]$. By Jensen’s Inequality, we know $f(w \mid E[X]) \equiv \min(E[X], w) \geq E[\min(X, w)] \equiv f(w)$; thus, $f(w)$ is a function that is bounded above by $f(w \mid E[X])$ yet has similar asymptotic behavior; i.e. $f(w) \to w$ as $w \to 0$, and $f(w) \to E[X]$ as $w \to \infty$. An appropriately chosen spline can be approximated with a few piecewise-linear segments to get the piecewise-linear form of $f(w)$ used by our model.

Finally, we point out that in theory, censored demand can be handled by the Kaplan-Meier estimator (Talluri and Van Ryzin 2005); however, this is not necessary when working with tiers: individual games may get sold out, but since tiers are large aggregations of games, excess inventory usually exists.
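The construction of the slopes $\theta_m$ and widths $\vartheta_m$ can be illustrated with a small sketch. The following Python fragment is not the example from §EC.1; it is a minimal illustration, assuming a discrete demand distribution, in which each segment ends at a demand level and its slope is $p\,\bar{G}$ evaluated on that segment. The function names, the price, and the demand values are hypothetical.

```python
def segments_from_discrete_demand(price, demand_levels, probs):
    """Slopes (theta) and widths (vartheta) of f(w) = price * E[min(X, w)]
    for a discrete demand X taking demand_levels[j] with probability probs[j]."""
    theta, vartheta = [], []
    tail = 1.0   # P(X >= current level), i.e. G-bar on the segment below it
    prev = 0.0
    for level, prob in sorted(zip(demand_levels, probs)):
        width = level - prev
        if width > 0:
            theta.append(price * tail)   # marginal revenue on this segment
            vartheta.append(width)
        tail -= prob
        prev = level
    return theta, vartheta


def expected_revenue(w, theta, vartheta):
    """Evaluate the piecewise-linear f(w) by filling segments in order."""
    total = 0.0
    for slope, width in zip(theta, vartheta):
        used = min(max(w, 0.0), width)
        total += slope * used
        w -= used
    return total


# Hypothetical {Tier, Week}: price 2.0, demand 100/200/300 w.p. 0.25/0.5/0.25.
theta, vartheta = segments_from_discrete_demand(2.0, [100, 200, 300], [0.25, 0.5, 0.25])
f_150 = expected_revenue(150, theta, vartheta)
jensen_bound = 2.0 * min(200, 150)   # price * min(E[X], w), with E[X] = 200
```

For these hypothetical inputs the slopes are 2.0, 1.5, and 0.5 (decreasing, as concavity requires), $f(150) = 275$, and the Jensen upper bound is 300.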

5. Real-time Algorithm
Each arrival to the ad server invokes the Real-time Algorithm, which assigns campaigns to inventory elements in the player’s game instance. The Real-time Algorithm makes this allocation in accordance with the service rates $\lambda^t_{ik}$ computed by the Weekly Plan LP, while also obeying campaign targeting, saturation, and competition constraints.

The Real-time Algorithm serves the same purpose as the ISCI Rotator algorithm for scheduling commercials in broadcast television (Bollapragada et al. 2004, Bollapragada and Garbiras 2004). A broadcast TV sales plan (analogous to our Weekly Plan) specifies the set of commercials to display during each airing of each show, leaving the ISCI Rotator to choose, for each ad, the commercial breaks and positions within those breaks where it should be aired. The ISCI Rotator allocates ads to slots such that saturation is managed (two airings of the same commercial are as evenly spaced as possible) and competition constraints are obeyed (competing brands are not displayed in the same commercial break). Although there are some high-level similarities, the Real-time Algorithm operates differently from the ISCI Rotator. This is because targeting constraints add complexity to the Real-time Algorithm, and because the Real-time Algorithm sacrifices optimality for speed: it is run millions of times per day compared to the ISCI Rotator’s nightly execution.

The Real-time Algorithm performs two operations: 1) it adjusts the service rates for the current game instance, and 2) it assigns campaigns to inventory elements in accordance with the adjusted service rates. Notation for the Real-time Algorithm is summarized in Table 2.


Indices
  $a$ = an arrival
  $b, b', b^*$ = a bucket
  $b_u$ = the “unpaid” bucket
  $c$ = an IEC
  $k, k', k'', k^*$ = a campaign
  $p, p^*$ = a spot
  $r$ = a region
  $u$ = all nonpaying campaigns

Functions of Indices
  $b(c, k)$ = bucket for IEC $c$, paying campaign $k$
  $c(p)$ = IEC corresponding to spot $p$
  $k(b)$ = campaign corresponding to bucket $b$
  $e(p)$ = inventory element corresponding to spot $p$
  $r(p)$ = region corresponding to spot $p$

Input: Parameters
  $s^e$ = expected adtime of inventory element $e$
  $s^p = s^{e(p)} / |P_{e(p)}|$ = expected adtime of spot $p$
  $s_{cz} = \sum_{e \in E_{cz}} s^e$ = expected adtime of IEC $c$ in one game instance of zone $z$
  $\chi_k$ = contracted end date of campaign $k$
  $\omega_z$ = saturation cap of zone $z$

Input: Sets
  $C_z$ = IEC’s in zone $z$
  $E_z$ = inventory elements in zone $z$
  $E_{cz}$ = inventory elements in zone $z$ and IEC $c$
  $K_a$ = campaigns with targeting matching arrival $a$
  $K_{ac}$ = campaigns with targeting matching both arrival $a$ and IEC $c$
  $P_b \equiv P_{c(b),z}$ = spots that can be placed in bucket $b$
  $P_e$ = spots of inventory element $e$
  $P_z = \cup_{e \in E_z} P_e$ = spots in zone $z$
  $P_{cz} = \cup_{e \in E_{cz}} P_e$ = spots in zone $z$ and IEC $c$
  $R_z$ = regions in zone $z$

Input: Collections (sets of sets)
  $\mathcal{K}_a$ = groups of campaigns that are competing brands with targeting that matches arrival $a$

Inputs/Variables (inputs that get adjusted)
  $\lambda_{ck}$ = service rate for campaign $k$ in IEC $c$
  $\Lambda_a$ = service rate matrix for arrival $a$

Variables: Scalars
  $n_{r,k}$ = number of times a spot from region $r$ was allocated to campaign $k$
  $v_b$ = unallocated expected adtime of bucket $b$
  $x_p$ = unallocated expected adtime of spot $p$
  $y_{b,p}$ = expected adtime of spot $p$ allocated to bucket $b$
  $\lambda^{WEIGHTED}_k$ = weighted average service rate for campaign $k$, computed over all IEC’s
  $\phi, \psi$ = used in local computations

Variables: Sets
  $K^*$ = campaigns of competing brands
  $K^U$ = campaigns strictly under the saturation cap
  $B$ = buckets for paying campaigns
  $P^*$ = spots $p^*$ that can be placed in the chosen bucket $b'$, and are in a region $r(p^*)$ where we have allocated campaign $k(b')$ the fewest number of times
  $P^{**}$ = spots not completely allocated
  $\tilde{P}_k$ = spots in which campaign $k$ should be served

Table 2   Notation for Real-time Algorithm

5.1. Stage 1: Adjust Service Rates for the Current Game Instance
An arrival of type $a$ = {Zone, DMA, Weekday, DayPart} is initially assigned the service rate matrix $\Lambda_a$, the elements of which are:

$$\lambda_{ck} := \begin{cases} \lambda^1_{i(a),k} & \text{if campaign } k \text{ matches the targeting of both IEC } c \text{ and the \{Zone, DMA, Weekday, DayPart\} specified by } a \\ 0 & \text{otherwise} \end{cases}$$

where $i(a)$ is the game that includes the zone specified in $a$, and $\lambda^1_{i(a),k}$ is the service rate for campaign $k$, game $i(a)$, week 1 from the Weekly Plan.

Algorithm 1 modifies the rate matrix $\Lambda_a$ in order to satisfy 1) competition constraints, and 2) the constraint $\sum_{k \in K_{ac}} \lambda_{ck} \leq 1 \;\forall\, c \in C_z$, which states that for each IEC $c$ in the set of IEC’s $C_z$ that intersect the current zone $z$, the sum of the service rates of campaigns $K_{ac}$ eligible for service to this {Arrival, IEC} must not exceed 1. In step 1, we compute a weighted average rate $\lambda^{WEIGHTED}_k$ for each campaign over all IEC’s in the zone. Next, we use the weighted average rates to assign one random campaign from each set of competing brands a nonzero service rate (steps 2-5). We have assumed that the groups of competing brands are mutually exclusive; however, a straightforward extension of this algorithm applies to overlapping groups of competing brands. Finally, in steps 6-12, the algorithm decreases the service rates of some campaigns (if necessary) in order for $\sum_{k \in K_{ac}} \lambda_{ck} \leq 1 \;\forall\, c \in C_z$ to hold; note that the campaigns with the latest end dates (which have the most time to catch up later) are chosen. See §EC.2.1 for a detailed example.

Algorithm 1 AdjustServiceRates
 1: $\lambda^{WEIGHTED}_k \leftarrow \sum_{c \in C_z} s_{cz} \lambda_{ck} \,/\, \sum_{c \in C_z} s_{cz} \;\;\forall\, k \in K_a$
 2: for all $K^* \in \mathcal{K}_a$ do
 3:     $k' \leftarrow k \in K^*$ with probability $\lambda^{WEIGHTED}_k \,/\, \sum_{k^* \in K^*} \lambda^{WEIGHTED}_{k^*}$
 4:     $\lambda_{ck'} \leftarrow \sum_{k \in K^*} \lambda_{ck} \;\forall\, c \in C_z$, and $\lambda_{ck} \leftarrow 0 \;\forall\, k \in K^* \setminus \{k'\},\, c \in C_z$
 5: end for
 6: for all $c \in C_z$ do
 7:     Label the campaigns $k \in K_{ac}$ such that $\chi_1 \geq \chi_2 \geq \ldots \geq \chi_{|K_{ac}|}$
 8:     $\phi \leftarrow \left( \sum_{k \in K_{ac}} \lambda_{ck} - 1 \right)^+$; $k \leftarrow 1$
 9:     while $\phi > 0$ do
10:         $\psi \leftarrow \min\{\lambda_{ck}, \phi\}$; $\lambda_{ck} \leftarrow \lambda_{ck} - \psi$; $\phi \leftarrow \phi - \psi$; $k \leftarrow k + 1$
11:     end while
12: end for
13: return $\Lambda_a$
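The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative translation rather than the production implementation: the dictionary-based data layout, the function name, and the parameter names are assumptions introduced here, and competing groups are taken to be mutually exclusive, as in the text.

```python
import random


def adjust_service_rates(rates, adtime, competing_groups, end_date, rng=random):
    """Sketch of Algorithm 1 (AdjustServiceRates).

    rates[c][k]      -- lambda_ck for IEC c and eligible campaign k (not modified)
    adtime[c]        -- s_cz, expected adtime of IEC c in this zone
    competing_groups -- disjoint lists of competing campaigns
    end_date[k]      -- chi_k, contracted end date (any sortable value)
    """
    rates = {c: dict(row) for c, row in rates.items()}
    campaigns = {k for row in rates.values() for k in row}
    total_adtime = sum(adtime[c] for c in rates)

    # Step 1: adtime-weighted average rate of each campaign over all IECs.
    weighted = {
        k: sum(adtime[c] * row.get(k, 0.0) for c, row in rates.items()) / total_adtime
        for k in campaigns
    }

    # Steps 2-5: one campaign per competing group inherits the group's rates.
    for group in competing_groups:
        members = [k for k in group if k in campaigns]
        weights = [weighted[k] for k in members]
        if sum(weights) <= 0:
            continue  # nothing to serve from this group
        winner = rng.choices(members, weights=weights)[0]
        for row in rates.values():
            group_total = sum(row.get(k, 0.0) for k in members)
            for k in members:
                if k in row:
                    row[k] = 0.0
            row[winner] = group_total

    # Steps 6-12: trim each IEC to a total rate of 1, latest end date first.
    for row in rates.values():
        excess = max(sum(row.values()) - 1.0, 0.0)
        for k in sorted(row, key=lambda k: end_date[k], reverse=True):
            if excess <= 0:
                break
            cut = min(row[k], excess)
            row[k] -= cut
            excess -= cut
    return rates
```

Passing a seeded `random.Random` instance as `rng` makes the random draw in step 3 reproducible, which is convenient when testing.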

5.2. Stage 2: Assign Campaigns to Inventory Elements
Let $K_a$ be the set of paying campaigns with targeting that matches arrival $a$, and let $u$ represent all unpaid campaigns. For each campaign $k \in K_a \cup \{u\}$, Algorithm 2 outputs a set of spots $\tilde{P}_k$ in which campaign $k$ should be served. The algorithm begins by creating a set of buckets $B$; the size of bucket $b(c, k) \in B$ is the expected adtime for paying campaign $k$ in IEC $c$, assuming campaign $k$ is served at rate $\lambda_{ck}$. An infinitely large bucket $b_u \notin B$ called the “unpaid bucket” is also created; it is used when service rates are too low to assign all spots to paying campaigns. The main variables are: $v_b$ = the unallocated expected adtime of bucket $b$; $x_p$ = the unallocated expected adtime of spot $p$; $y_{b,p}$ = the expected adtime of spot $p$ allocated to bucket $b$; and $n_{r,k}$ = the number of times a spot from region $r$ has been allocated to campaign $k$. We will sometimes use index functions such as $r(p)$ = the region corresponding to spot $p$; and $k(b)$ = the campaign corresponding to bucket $b$. Recall from §3 that the expected adtime of any spot $p$ of inventory element $e$ is $s^p \equiv s^p(e) = s^e / |P_e|$. Steps 1-5 comprise the initialization phase.

Algorithm 2 AssignCampaigns
 1: $v_{b_u} \leftarrow \infty$; $y_{b_u,p} \leftarrow 0 \;\forall\, p \in P_z$; $B \leftarrow \emptyset$
 2: for all $c \in C_z$, $k \in K_{ac}$ do
 3:     Create bucket $b \equiv b(c, k)$; $v_b \leftarrow \lambda_{ck} s_{cz}$; $y_{b,p} \leftarrow 0 \;\forall\, p \in P_z$; $B \leftarrow B \cup \{b\}$
 4: end for
 5: $x_p \leftarrow s^p \;\forall\, p \in P_z$; $n_{r,k} \leftarrow 0 \;\forall\, r \in R_z,\, k \in K_a$
 6: while $\sum_{b \in B} v_b > 0$ do
 7:     $b' \leftarrow b \in B$ with probability $v_b \,/\, \sum_{b^* \in B} v_{b^*}$
 8:     $P^* \leftarrow \{ p^* \in P_{b'} : n_{r(p^*),k(b')} = \min_{p \in P_{b'}} n_{r(p),k(b')} \}$; $p' \leftarrow \arg\max_{p \in P^*} x_p$
 9:     $\psi \leftarrow \min\{v_{b'}, x_{p'}\}$; $y_{b',p'} \leftarrow y_{b',p'} + \psi$; $v_{b'} \leftarrow v_{b'} - \psi$; $x_{p'} \leftarrow x_{p'} - \psi$; $n_{r(p'),k(b')} \leftarrow n_{r(p'),k(b')} + 1$
10: end while
11: $P^{**} \leftarrow \{ p \in P_z \mid x_p > 0 \}$
12: if $P^{**} \neq \emptyset$ then
13:     $y_{b_u,p} \leftarrow x_p \;\forall\, p \in P^{**}$, $x_p \leftarrow 0 \;\forall\, p \in P^{**}$
14: end if
15: $\tilde{P}_k \leftarrow \emptyset \;\forall\, k \in K_a \cup \{u\}$
16: for all $p \in P_z$ do
17:     $b' \leftarrow \{ b \in B \cup \{b_u\}$ with probability $y_{b,p} / s^p \}$
18:     $y_{b',p} \leftarrow s^p$; $y_{b,p} \leftarrow 0 \;\forall\, b \in B \cup \{b_u\} \setminus \{b'\}$; $\tilde{P}_{k(b')} \leftarrow \tilde{P}_{k(b')} \cup \{p\}$
19: end for
20: for all $k' \in \{ k : |\tilde{P}_k| > \omega_z,\, k \neq u \}$ do
21:     while $|\tilde{P}_{k'}| > \omega_z$ do
22:         $p' \leftarrow \arg\min_{p \in \tilde{P}_{k'}} s^p$; $K^U \leftarrow \{ k : |\tilde{P}_k| < \omega_z,\, k \neq u \}$
23:         if $\sum_{k \in K^U} \lambda_{c(p'),k} > 0$ then
24:             $k'' \leftarrow k \in K^U$ with probability $\lambda_{c(p'),k} \,/\, \sum_{k^* \in K^U} \lambda_{c(p'),k^*}$
25:         else
26:             $k'' \leftarrow u$
27:         end if
28:         $\tilde{P}_{k'} \leftarrow \tilde{P}_{k'} \setminus \{p'\}$; $\tilde{P}_{k''} \leftarrow \tilde{P}_{k''} \cup \{p'\}$
29:     end while
30: end for
31: return $\tilde{P}$

In steps 6-10, spots are fractionally assigned to buckets $b \in B$; i.e. a spot may be assigned to more than one bucket at this stage. At each iteration of the loop, we randomly draw a non-full bucket $b'$, select a non-completely-assigned spot $p'$, and assign a portion (or all, if possible) of spot $p'$ to bucket $b'$. We bias toward assigning large spots to large buckets, which achieves a tight packing; yet we do not always pick the largest bucket and spot: buckets are chosen randomly so the visual placement of ads within the game is randomized, and spots are chosen to spread multiple copies of the same campaign to spots in different regions. The chosen spot $p'$ is the spot with the largest amount of unallocated expected adtime among those in the set $P^*$, where $P^*$ is the set of all spots $p^*$ that can be placed in the chosen bucket $b'$ and are in a region $r(p^*)$ where we have allocated campaign $k(b')$ the fewest number of times.

In steps 11-14, any spots that did not fit into the buckets for paying campaigns $B$ are assigned (in whole or in part) to the unpaid bucket $b_u$. It can be shown that when the algorithm is initialized, $\sum_{b \in B} v_b \leq \sum_{p \in P_z} x_p$, so all buckets $b \in B$ are always filled at this point.

In steps 15-19, the algorithm executes randomized rounding (see Raghavan and Thompson 1987): each spot $p$ gets assigned in whole to bucket $b$ (and removed from all other buckets) with probability $y_{b,p} / s^p$ (the proportion of spot $p$ currently assigned to bucket $b$). At this point, buckets may be perfectly filled, over-filled, or under-filled; all cases are feasible. A perfectly filled (resp. over-filled / under-filled) bucket $b(c, k)$ corresponds to serving campaign $k$ in IEC $c$ exactly at (resp. above / below) the service rate $\lambda_{ck}$ in expectation.

In steps 20-30, the algorithm adjusts the final solution $\tilde{P}_k$ (the set of spots in which to serve campaign $k$) if the saturation cap is exceeded (the same campaign appears in more than $\omega_z$ spots). We reassign the smallest spots $p'$ to minimize the resulting change to the service rates; we first reassign spots in whole to paying campaigns that are under the saturation cap, drawn proportionally to $\lambda_{c(p'),k}$ when there are multiple such campaigns, and then to the unpaid campaign bucket. See §EC.2.2 for an example of Algorithm 2.
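The four phases of Algorithm 2 (initialization, fractional packing, randomized rounding, and saturation-cap repair) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not Massive's implementation: the data layout and all names are hypothetical, the set $P_b$ is taken to be all spots in the bucket's IEC, fully allocated spots are skipped when forming $P^*$ (following the prose description of drawing a non-completely-assigned spot), and a small tolerance `EPS` guards against floating-point residue.

```python
import random
from collections import defaultdict

UNPAID = "u"   # label for the unpaid campaign (illustrative name)
EPS = 1e-12    # round-off tolerance; not part of the paper's pseudocode


def assign_campaigns(spots, rates, cap, rng=random):
    """Sketch of Algorithm 2 (AssignCampaigns).

    spots -- {p: (iec, region, adtime)} with adtime s^p > 0
    rates -- {(iec, campaign): lambda_ck}, per-IEC sums at most 1
    cap   -- saturation cap omega_z
    Returns {campaign: set of spots}.
    """
    iec_adtime = defaultdict(float)
    for c, _, s in spots.values():
        iec_adtime[c] += s

    # Steps 1-5: bucket b(c, k) holds lambda_ck * s_cz; all spots start unallocated.
    v = {(c, k): lam * iec_adtime[c] for (c, k), lam in rates.items() if lam > 0}
    x = {p: s for p, (_, _, s) in spots.items()}
    share = {p: defaultdict(float) for p in spots}   # y_{b,p}, keyed by campaign
    n = defaultdict(int)                              # n_{r,k}

    # Steps 6-10: fractional packing of spots into buckets.
    while True:
        open_buckets = [b for b in v if v[b] > EPS]
        if not open_buckets:
            break
        b = rng.choices(open_buckets, weights=[v[q] for q in open_buckets])[0]
        c, k = b
        free = [p for p in spots if spots[p][0] == c and x[p] > EPS]
        if not free:           # only reachable through round-off residue
            v[b] = 0.0
            continue
        fewest = min(n[(spots[p][1], k)] for p in free)
        best = max((p for p in free if n[(spots[p][1], k)] == fewest), key=x.get)
        psi = min(v[b], x[best])
        share[best][k] += psi
        v[b] -= psi
        x[best] -= psi
        n[(spots[best][1], k)] += 1

    # Steps 11-19: leftovers go to the unpaid bucket, then randomized rounding.
    chosen = defaultdict(set)
    for p in spots:
        if x[p] > EPS:
            share[p][UNPAID] += x[p]
        names = list(share[p])
        chosen[rng.choices(names, weights=[share[p][q] for q in names])[0]].add(p)

    # Steps 20-30: enforce the saturation cap by moving the smallest spots.
    paying = sorted({k for (_, k) in rates})
    for k1 in [k for k in paying if len(chosen[k]) > cap]:
        while len(chosen[k1]) > cap:
            p = min(chosen[k1], key=lambda q: spots[q][2])
            under = [k for k in paying if len(chosen[k]) < cap]
            weights = [rates.get((spots[p][0], k), 0.0) for k in under]
            k2 = rng.choices(under, weights=weights)[0] if sum(weights) > 0 else UNPAID
            chosen[k1].remove(p)
            chosen[k2].add(p)
    return dict(chosen)
```

Because the output is random, a caller would typically check invariants (every spot assigned exactly once, no paying campaign above the cap) rather than a specific assignment.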

5.3. Properties of the Solution
At its conclusion, the Real-time Algorithm has assigned a campaign to each spot such that:

1. DMA, Weekday, DayPart, and IEC targeting are satisfied, since only campaigns that target the current arrival $a$ receive nonzero service rates;
2. Competition constraints are satisfied, due to Algorithm 1;
3. The saturation cap is satisfied and local oversaturation is controlled, due to Algorithm 2;
4. Each campaign is served in expectation at the rate specified by the Weekly Plan LP (except perhaps for a few campaigns with later due dates that had their service rates reduced by steps 6-12 of Algorithm 1). Thus, in expectation, our algorithm meets the impression goals, mix requirements, spread constraints, and revenue targets specified in the Weekly Plan, while maximizing revenues affected by scheduling decisions. This follows because, excluding adjustments made to satisfy the saturation cap, Algorithm 2 fills buckets exactly to their fill lines in expectation; and
5. Variance in the service rates is low, since buckets are packed close to their fill lines by using small spots to finish the packing. (These are also the spots that get reassigned by randomized rounding and to enforce the saturation cap.)

6. Experimental Findings
6.1. Primary Benefit
Using 26 weeks of historical data, we experimentally evaluate (backtest) the performance of our algorithm against a version of Massive’s legacy algorithm that was provided as the benchmark. We set the horizon of the Weekly Plan LP at 13 weeks; therefore, at each point during the backtest we need 13 weeks of inventory supply forecasts (for games, as well as inventory elements, DayPart/Weekdays, and DMA’s within games) as input. We generate forecasts from actual data by applying a random multiplicative noise term, the magnitude of which is parameterized by $\alpha$. We backtest through weeks $t = 1..14$, using generated forecasts from weeks $t$ through $t + 12$; the Weekly Plan LP is re-solved hourly. We use five forecast instances, which range from a perfect forecast ($\alpha = 0$) to a highly variable forecast ($\alpha = 0.2$).

The full description of our forecast generation scheme, which produces adtime estimates $s^t_i$, $s^t_{id}$, $s^t_{ie}$, and $s^t_{iw}$, defined in §3, is given in §EC.3. To summarize, we generate multiplicative forecast errors for a forecast $h$ weeks from today by taking the product of $h$ lognormal random variables; the lognormal factors have median $e^{\mu}$, where $\mu$ is a random variable dependent on $\alpha$ with $E[\mu] = 0$. When $\mu = 0$, forecasts for one week into the future have a 95% confidence interval of $[1/\varsigma, \varsigma]$, where $\varsigma = e^{1.96\sigma}$ and $\sigma = 2\alpha$ (this confidence interval is symmetric on the logarithmic scale). For $\alpha = 0.04$ and $\alpha = 0.2$, this confidence interval is $[0.85, 1.17]$ and $[0.46, 2.19]$ respectively; i.e. when $\alpha = 0.2$ and $\mu = 0$, a one-week advance forecast for the actual value of 1000 is between 460 and 2190 with probability 0.95. For the same $\alpha$, confidence intervals grow wider as the forecasted period moves farther into the future.

Our tests on historical data show that we significantly reduce make-good costs and increase sales revenue potential relative to the legacy algorithm. Make-good costs (objective term 1 in §4.1) decrease by 80-87%, depending on the magnitude of forecast error (see Figure 2). In the most conservative case of $\alpha = 0.2$ (it is expected that forecasts are at least this accurate), make-good costs decrease by 80%.

[Figure 2 (image not reproduced): “Ex-Post Objective Comparison - Penalty Costs.” Vertical axis: Penalty Cost. Horizontal axis: the model at $\alpha$ = 0, 0.0016, 0.008, 0.04, and 0.2, and the legacy algorithm. Penalty categories shown: End of Campaign Impression Goal; End of Campaign Mix Goal; Weekly Campaign Impression Goal; Weekly Campaign Mix Goal; Balance Impressions Across Games In Same MixComponent; Minimum Weekly Revenue Target by Game.]

Figure 2   Performance of our algorithm compared to the legacy algorithm for different levels of forecast error $\alpha$. The legacy algorithm does not use forecast data.
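The one-week confidence intervals quoted in §6.1 for the forecast error follow directly from $\varsigma = e^{1.96\sigma}$ with $\sigma = 2\alpha$. The short Python sketch below reproduces them; the function names are hypothetical. The second function goes beyond what is stated here: it assumes the $h$ lognormal factors are independent with $\mu = 0$, which is only one possible reading of the scheme detailed in §EC.3.

```python
import math


def one_week_interval(alpha, z=1.96):
    """95% interval [1/varsigma, varsigma] of the one-week multiplicative error
    when mu = 0, using sigma = 2 * alpha and varsigma = exp(z * sigma)."""
    varsigma = math.exp(z * 2.0 * alpha)
    return 1.0 / varsigma, varsigma


def h_week_interval(alpha, h, z=1.96):
    """Interval for a forecast h weeks ahead, ASSUMING the h lognormal factors
    are independent with mu = 0, so the log-error has std. dev. sigma * sqrt(h)."""
    varsigma = math.exp(z * 2.0 * alpha * math.sqrt(h))
    return 1.0 / varsigma, varsigma


for alpha in (0.04, 0.2):
    low, high = one_week_interval(alpha)
    print(f"alpha={alpha}: [{low:.2f}, {high:.2f}]")
```

The loop prints the intervals [0.85, 1.17] and [0.46, 2.19], matching the values given in the text.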

Revenues from the future sales of ad space (objective term 2 in §4.1) cannot be evaluated directly: since we are backtesting, the sales process in our actual data cannot be affected by our scheduling algorithm. Therefore, we use the distribution of the age of served impressions (the time between the sale of the campaign and when the impression was served) as a proxy. A good scheduling algorithm will try to use the least costly inventory to satisfy existing campaigns, leaving premium inventory reserved for future sales. Therefore, a good scheduling algorithm should shift impressions from tier 1 (premium) games into tier 2 (middle) or tier 3 (discount) games when mix constraints allow. When impressions cannot be appreciably shifted between tiers, a good scheduling algorithm should try to serve tier 1 inventory as early as possible, and serve tier 2 and 3 inventory later on, again saving the most valuable inventory for future sales.

In Figure 3, we compare the age distribution from our algorithm with that of the legacy algorithm; in all cases, the forecast error parameter is set to $\alpha = 0.2$ (the most conservative scenario). As expected, our algorithm shifts a substantial portion of impressions from tier 1 into tiers 2 and 3. The legacy algorithm serves 92% of its impressions from tier 1 games, whereas our algorithm serves just under 80% from tier 1. Furthermore, our algorithm serves tier 1 and tier 2 impressions earlier in a campaign’s life, while tier 3 impressions are served later: 63% of tier 1 impressions are 6 weeks old or younger using our algorithm, compared to 56% under the legacy algorithm.

6.2. Stability of our Algorithm
Figure 4(a) shows that our algorithm makes steady, continuous progress toward meeting the end-of-campaign impression goals; the plot shows the cumulative number of impressions logged by a single campaign from its start date to its end date. Figure 4(b) plots the cumulative number of impressions logged by this same campaign in its first week. Notice that the LP’s plan is revised midweek, at which point the LP decides it is less costly to overrun the weekly goal a bit than to use inventory from future weeks to serve this campaign. Our algorithm successfully meets the end-of-week target, despite the fact that this campaign starts mid-week.


[Figure 3 (image not reproduced): four panels. Top left: “% of Impressions by Tier” (Tier 1, Tier 2, Tier 3) for the Proposed and Legacy algorithms. Remaining panels: “Tier 1 Age CDF,” “Tier 2 Age CDF,” and “Tier 3 Age CDF,” each plotting the Proposed and Legacy algorithms against age in Weeks (1 to 25).]

Figure 3   The tier distribution of impressions served (top left) and age distributions of impressions served for each tier, for the legacy algorithm and our proposed algorithm (using $\alpha = 0.2$).

[Figure 4 (image not reproduced): two panels of cumulative Impressions over time. (a) “LP Solution Performance: Tracking Campaign Goal,” series Goal and Actual, 01/01/07 to 03/26/07. (b) “LP Solution Performance: Tracking Weekly Campaign Goal,” series Goal, Planned, and Actual, 01/01/07 to 01/07/07.]

Figure 4   The progress of the Real-time Algorithm (labelled “actual”) plotted over time for the largest campaign in the system. The forecast error parameter is the most conservative ($\alpha = 0.2$). In (a) the goal is the end-of-campaign impression goal; in (b) we show the end-of-week impression goal and the number of impressions planned this week by the Weekly Plan LP.

6.3. Performance under Higher Sell-through and Tighter Targeting
We now evaluate our algorithm as the number of campaigns in the system increases and the supply constraints become tighter (or infeasible in some cases), thereby validating our approach as sales volume increases. We use sell-through and targeting percentage to measure the tightness of the supply constraints. As defined in §3, the mean and standard deviation of sell-through at the network level describe the tightness of supply: a high standard deviation of sell-through indicates that in the space of all impressions in the network, there are pockets of inventory with high sell-through, corresponding to intersections of games, DMA’s, IEC’s, weekdays, and dayparts that are in high demand. The targeting percentage of campaign $k$, computed as

$$\frac{\sum_{t \in T_k} \sum_{i \in I_k} s^t_{ik}}{\sum_{t \in T_k} \sum_{i \in \Upsilon} s^t_i},$$

represents the percentage of impressions in the network that campaign $k$ targets. A low targeting percentage (i.e. tight targeting) alone doesn’t make a campaign difficult to serve; but when a tightly targeted campaign targets inventory that also has high sell-through, serving this campaign is difficult.

We generate six additional test cases with different levels of sell-through and targeting and benchmark the performance of our algorithm relative to the legacy algorithm. Each test case is built by augmenting the original problem instance from §6.1 (hereafter called ‘status quo’) with a set of 200 randomly generated campaigns. The test cases differ in two properties of the added campaigns: 1) the mean daily impression goal relative to the empirical distribution (which affects sell-through), and 2) the number of DMA’s that match (which affects targeting). The other campaign parameters are consistent with the empirical distributions. The six test cases, which are listed in Table 3, have targeting percentages and coefficients of variation of sell-through in the same general range as the ‘status quo’ test case, while some aspect of the instance (most often mean sell-through) is more highly constrained. (The precise values of these parameters for the status quo case are proprietary.) The test cases were constructed with mean sell-through at either (approximately) 27% or 52%, and targeting percentages from 1% to 10%; as a result, there is a large range for the standard deviation of sell-through. We use $\alpha = 0.2$ in all test cases.

Instance   Impression Goal   Number of        Mean           Standard Deviation   Targeting
           Multiplier        Matching DMA’s   Sell-Through   of Sell-Through      Percentage
1          1.2               1                0.27           6.75                 0.018
2          2.4               1                0.50           11.88                0.011
3          1.2               10               0.28           0.47                 0.037
4          2.4               10               0.52           0.88                 0.034
5          1.2               40               0.28           0.28                 0.097
6          2.4               40               0.52           0.60                 0.097

Table 3   Test Cases for Tight Supply Constraint Tests


As Figure 5(a) shows, our algorithm outperforms the legacy algorithm in all six high-load test cases, confirming that our approach is robust. Relative outperformance, however, begins to decrease when the impression goal multiplier is high (i.e. mean sell-through is high) and the number of matching DMA’s is low (i.e. targeting percentage is low). There is a nearly 60% make-good cost reduction for test case 5 (impression goal multiplier = 1.2, number of matching DMA’s = 40), yet for test case 2 (impression goal multiplier = 2.4, number of matching DMA’s = 1) the make-good cost reduction is only 5%.

The relative performance of our algorithm diminishes because there are regions of the network in which the sell-through is above 100%; that is, demand exceeds supply, and therefore make-good costs are inevitable. We therefore compute an approximate lower bound on the make-good costs by solving the Weekly Plan with perfect forecasts ($\alpha = 0$) and only the make-good cost term in the objective. This is not a true lower bound because, as described in §3, we compute $s^t_{ik}$ and $\hat{s}^t_{ik}$ using assumed independent adtime breakouts $b^t_{id}$, $b^t_{ie}$, and $b^t_{iw}$.

Using this bound (LB), we can measure the percentage of the gap that we close relative to the legacy algorithm, defined as

$$1 - \frac{\text{Our Algorithm Cost} - LB}{\text{Legacy Algorithm Cost} - LB}.$$

As seen in Figure 5(b), the percentage gap closed is 30% or better for most cases, dropping only when targeting gets very tight. This can be interpreted as a 30%+ cost savings on the part of the network that can be affected by good scheduling under reasonably high sell-through and tight targeting. In particular, if we order these test cases by the standard deviation of the sell-through as in Figure 5(c), we can see a clear relationship between relative outperformance and increasing variability of sell-through.

Even though the relative improvement is smaller, because the absolute magnitudes of the make-good costs in the highly constrained test cases are higher, a 5% cost savings for test case 2 translates into a greater absolute dollar amount saved when using our algorithm than the 60% cost savings for test case 5 (see Figure 5(d)). Thus, our algorithm significantly reduces make-good costs in all of the scenarios with high sell-through and tight targeting that we consider.
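To make the distinction between percentage cost savings and percentage of gap closed concrete, the following Python sketch evaluates both measures. The cost figures are purely hypothetical (actual costs are not reported), the function names are illustrative, and the savings measure is assumed to be one minus the ratio of our cost to the legacy cost.

```python
def cost_savings(our_cost, legacy_cost):
    """Fractional make-good cost reduction relative to the legacy algorithm."""
    return 1.0 - our_cost / legacy_cost


def gap_closed(our_cost, legacy_cost, lower_bound):
    """Fraction of the gap between the legacy cost and the (approximate)
    lower bound LB that is closed: 1 - (ours - LB) / (legacy - LB)."""
    gap = legacy_cost - lower_bound
    if gap <= 0:
        raise ValueError("legacy cost must exceed the lower bound")
    return 1.0 - (our_cost - lower_bound) / gap


# Hypothetical, highly constrained instance: most of the cost is unavoidable.
legacy, ours, lb = 100.0, 95.0, 85.0
savings = cost_savings(ours, legacy)      # 5% of total cost
closed = gap_closed(ours, legacy, lb)     # one third of the avoidable cost
```

In this made-up instance a 5% cost reduction corresponds to closing one third of the gap, which illustrates why a small relative saving can still represent a large share of the cost that scheduling is able to influence.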


[Figure 5 (image not reproduced): four panels, each plotting the status quo instance and test cases 1 = {1.2, 1}, 2 = {2.4, 1}, 3 = {1.2, 10}, 4 = {2.4, 10}, 5 = {1.2, 40}, 6 = {2.4, 40}, labelled {impression goal multiplier, number of matching DMA’s}. Panel titles: “Percentage Cost Savings Relative to Legacy Model,” “Percentage of Gap Closed Relative to Legacy Model” (two panels), and “Absolute Cost Savings Relative to Legacy Model.” Horizontal axes recovered: “Imp Goal Multiplier for Added Campaigns” and “Standard Deviation of Sell-Through.”]

Figure 5   Performance of all 6 high-load test cases + status quo. Subfigures are (a) top left, (b) top right, (c) bottom left, and (d) bottom right (vertical scale omitted for confidentiality).

7. Implementation and Learnings
7.1. Implementation
Our collaboration has led to a paradigm shift at Massive: previously, their ad-serving architecture was entirely execution-based. After our project, Massive realized the benefits of hierarchical planning, in which a plan is first produced and then executed. As of December 2009, a staged implementation of our model is underway. When complete, an LP-based planning module will be run nightly to produce daily targets for each campaign in each game, and an improved legacy algorithm that exploits structural ideas from our Real-time Algorithm will track these daily targets. From a technology standpoint, Massive chose to use C# and the Microsoft Solver Foundation API.

Massive decided to switch from weekly to daily granularity because they discovered advertisers place a higher value on smooth delivery than previously thought, and Massive expects the benefits of explicit daily goals to outweigh the increase in forecast error caused by planning at a finer granularity. Moreover, due to our work, Massive knows that a granularity finer than {Campaign, Game, Day} is unnecessary. Recall from §6.3 that we tested how our model performs when many campaigns request tightly targeted inventory (e.g. Pittsburgh on Tuesday afternoon between 4-7PM), and found that even though our Weekly Plan LP used the coarse granularity of {Campaign, Game, Week}, and did not explicitly resolve inventory conflicts for overcapacitated geographic regions and times-of-day, performance was quite good. We ascribe this to the fact that the schedule was being updated periodically, so shortfalls in a specific geographic region could be compensated for by increasing the overall service rate in that game.

In terms of the implementation’s rollout, Massive decided the legacy algorithm should be adapted first, and the LP implemented afterward. Specifically, as a crucial intermediate step toward LP-based planning, Massive created a database to store impression goals for each {Campaign, Game, Day}, and modified the legacy ad-serving logic to track these daily goals. Presently, a greedy heuristic computes the daily goals and populates this database; when the LP is complete, this database will instead be populated by the LP, and the real-time ad-serving logic will benefit from improved guidance without additional modification. This change to the legacy algorithm has been substantial: the explicit goals for each game have increased the reach of Massive’s network; that is, by more effectively spreading impressions across many games, Massive has increased the number of unique individuals that see each campaign, on average by 26% per week.

Another substantial improvement made to Massive’s legacy algorithm involves the use of adtime estimates for each inventory element.
Originally, Massive’s legacy algorithm did not use forecasts of any kind. As a result, the system tended to serve impressions erratically: the system would over-serve, learn only after the fact that over-serving had occurred (since there is a significant delay in counting impressions), compensate by stopping the service of the campaign for a while, and then repeat the process later as the campaign began to starve. Our Real-time Algorithm does not suffer from this effect, since it uses adtime estimates to track service rates which stay fixed between re-solves of the



LP. To achieve smoother service rates in their existing framework, Massive adapted their legacy algorithm to use adtime estimates for each inventory element. In particular, Massive now assumes that when a campaign is placed into an inventory element, a number of impressions will be served in expectation. These “expected impressions” get accumulated, and are part of the service rate calculation that Massive uses to throttle the service rate of a campaign up or down. As a result, service rates are now much more stable: the standard deviation of hourly impressions served is 33% lower than before the use of adtime estimates. In addition to the measurable benefits of increased reach and smoother campaign delivery, Massive has benefited from our collaboration in several intangible ways. First and foremost, the discovery process that we undertook with Massive to formulate the constraints of the linear program has been very valuable; in some cases, unwritten business rules were uncovered and communicated to business analysts and developers. The language of objectives, variables, and constraints has facilitated many engineering discussions, and has been instrumental in building confidence in the new, more transparent, system. Furthermore, the visibility of sold inventory has been improved: sales associates can view the daily plan, allowing them to price games based on capacity and to encourage advertisers to purchase impressions from undersold games.

7.2. Learnings Our learnings throughout the project led to several refinements of our model. For example, our Weekly Plan LP did not initially include constraints to guarantee a minimum amount of revenue per game. This posed a problem when a very popular game was added to Massive’s network: the new game was allocated a sizable number of impressions that previously would have gone to incumbent games, thereby capturing their revenue. To be fair to all game publishers, we introduced minimum revenue targets for each game. We also learned that Massive sells some campaigns called “share of voice” that contractually require a percentage of the total impressions generated by a game over a time period without having



an absolute impression goal, e.g. an advertiser may request 25% of all impressions from Game A. This type of campaign is easy to serve in our framework; we just set the service rate to 25%. Finally, we learned that the yield factor $ (see §3) used to convert between adtime and impressions varies from one game to the next, and also changes when the scheduling algorithm changes (i.e. from legacy to ours). We have recommended that yield factors be empirically measured over time in order to have accurate values of $ for each game. With regard to hierarchical planning systems in general, we learned that it is usually not possible to have a complete separation of tasks from one stage to the next. For example, both the Weekly Plan LP and the Real-time Algorithm deal with saturation: the Weekly Plan LP bounds service rates (constraint 8 in §4) so that with high likelihood the rates remain feasible when the Real-time Algorithm enforces the saturation cap. This type of dependency between stages of a hierarchical planning system is common in other areas of practice as well; for example, in broadcast TV, commercials are spread over time (thus dealing with saturation) in both the high-level sales plan and by the lower-level ISCI Rotator algorithm.

8. Conclusions We studied the scheduling problem of Massive Inc., a network provider of dynamic in-game advertising, and developed 1) an LP to compute service rates for each {Campaign, Game} pair, as well as 2) a Real-time Algorithm which uses these service rates to assign campaigns to inventory elements whenever a gamer begins playing a new level. We benchmarked our algorithm against Massive’s legacy algorithm, and found that when tested on historical data 1) our algorithm reduces make-good costs by 80-87%; 2) our algorithm reserves more impressions from premium games for future sales; 3) performance is good for various levels of forecast accuracy; and 4) results hold as sell-through increases and targeting becomes tighter, as is anticipated in the future. Future work may include: 1) explicitly modeling supply with random variables; 2) extending the model to plan at a finer granularity than {Campaign, Game, Week} using Benders decomposition and/or column generation; 3) extending the model to incorporate reach goals by requiring a minimum number of unique game players to see a campaign; and 4) generalizing the model for other



ad networks with dynamic display content, such as digital TV and the web. As a result of our work, Massive has increased the number of unique individuals that see each campaign by on average 26% per week, and achieved 33% smoother campaign delivery, as measured by standard deviation of hourly impressions served. Massive continues to improve their ad serving technology based on the models and algorithms we developed, and the insights we generated.

Acknowledgments The authors are grateful to Frank O’Donnell, David Sturman, Joe Ciaramitaro, and the rest of the team at Massive for numerous comments and support. This research was supported by Massive Incorporated and a William Larimer Mellon Fellowship.

References
Abrams, Z., O. Mendelevitch, J. Tomlin. 2007. Optimal delivery of sponsored search advertisements subject to budget constraints. Proceedings of the 8th ACM conference on Electronic commerce 272–278.
Araman, V. F., I. Popescu. 2009. Media Revenue Management with Audience Uncertainty: Balancing Upfront and Spot Market Sales. Manufacturing & Service Operations Management EPub ahead of print June 12, http://msom.journal.informs.org/cgi/content/abstract/msom.1090.0262v1.
Bollapragada, S., M. R. Bussieck, S. Mallik. 2004. Scheduling Commercial Videotapes in Broadcast Television. Operations Research 52(5) 679–689.
Bollapragada, S., H. Cheng, M. Phillips, M. Garbiras, M. Scholes, T. Gibbs, M. Humphreville. 2002. NBC’s optimization systems increase its revenues and productivity. Interfaces 32(1) 47–60.
Bollapragada, S., M. Garbiras. 2004. Scheduling Commercials on Broadcast Television. Operations Research 52(3) 337–345.
Cai, Y. 2007. Electronic gaming in the digital home: Game advertising. Parks Associates Industry Report Q2.
Chambers, J. 2005. The sponsored avatar: Examining the present reality and future possibilities of advertising in digital games. Available at http://www.gamesconference.org/digra2005/viewabstract.php?id=270.
Chickering, D. M., D. Heckerman. 2003. Targeted Advertising on the Web with Inventory Management. Interfaces 33(5) 71–77.
Dixit, A. K., R. S. Pindyck. 1994. Investment under uncertainty. Princeton University Press, Princeton, NJ.
Gensch, D. H. 1973. Advertising Planning: Mathematical Models in Advertising Media Planning. Elsevier Scientific.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis. Springer.
Mehta, A., A. Saberi, U. Vazirani, V. Vazirani. 2007. Adwords and generalized online matching. Journal of the ACM 54(5).
Nakamura, A., N. Abe. 2005. Improvements to the Linear Programming Based Scheduling of Web Advertisements. Electronic Commerce Research 5(1) 75–98.
Raghavan, P., C. D. Tompson. 1987. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica 7(4) 365–374.
Rossiter, J. R., P. J. Danaher. 1998. Advanced Media Planning. Kluwer Academic Publishers.
Simon, H. 1982. ADPULS: An Advertising Model with Wearout and Pulsation. Journal of Marketing Research 19(3) 352–363.
Svahn, M. 2005. Future-proofing advergaming: a systematisation for the media buyer. Proceedings of the second Australasian conference on Interactive entertainment 187–191.
Talluri, K. T., G. Van Ryzin. 2005. The theory and practice of revenue management. Springer Verlag.
Thompson, G. 1981. An optimal control model of advertising pulsations and wearout. J. W. Keon, ed., Marketing Measurement and Analysis. The Institute of Management Sciences.
Zhang, X. 2006. Mathematical models for the television advertising allocation problem. International Journal of Operational Research 1(3) 302–322.

e-companion to Turner, Scheller-Wolf, and Tayur: Scheduling of Dynamic In-Game Advertising



Supplementary Material This e-companion contains an example of the value function for unscheduled inventory for a discrete demand distribution (§EC.1), an example that illustrates the Real-time Algorithm’s logic (§EC.2), and details on how the forecast errors were generated (§EC.3).

EC.1. Example: Valuing Unplanned Inventory Over a Discrete Demand Distribution As described in §4.1, there is a term in the objective that values unplanned inventory. For each {Tier, Week}, the value function for unplanned inventory f (w) := pE[min(X, w)] was shown to be

concave increasing. In this example we illustrate the shape of f (w) when X is a discrete distribution with limited support. Although the price of a campaign pk is determined through negotiation, we assume that Massive has a target price for each tier of games that serves as a baseline for the negotiation process. This price, which we denote p∗ , is considered fixed since it is updated less frequently than the horizon of the optimization model. Given the fixed price p∗ , Massive expects to be able to sell w∗ impressions and earn expected revenues of R∗ = p∗ w∗ . However, the optimization model considers many tradeoffs, and may restrict the available inventory to an amount w < w∗ . In this model, the inventory set aside to satisfy future demand is the decision variable w, and the expected revenue that comes from this inventory is described by the piecewise-linear function f (w) seen in Figure EC.1.

Figure EC.1 Revenue Function. [Plot of f(w), in $, against w; labeled quantities: p∗, w∗, and R∗ = p∗w∗.]



Suppose that at the fixed price p∗, the demand distribution X is described by these three scenarios:
1. with probability 0.2 we will sell 10K impressions
2. with probability 0.6 we will sell 20K impressions
3. with probability 0.2 we will sell 30K impressions
Graphically, these three scenarios produce three different expected revenue functions, since the point at which we expect to stop earning revenue depends on how much we think we can sell (see Figure EC.2).

Figure EC.2 Three Revenue Functions. [Three panels, each plotting revenue in $ against w: f1(w) with Prob = 0.2, reaching R∗ = 10p∗ at w = 10K; f2(w) with Prob = 0.6, reaching R∗ = 20p∗ at w = 20K; f3(w) with Prob = 0.2, reaching R∗ = 30p∗ at w = 30K.]

Thus our expected revenue function is f (w) = 0.2f1 (w) + 0.6f2 (w) + 0.2f3 (w), shown in Figure EC.3.



In this example, θ1 = p∗ , θ2 = 0.8p∗ , θ3 = 0.2p∗ , θ4 = 0, ϑ1 = ϑ2 = ϑ3 = 10K, ϑ4 = +∞.

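Figure EC.3 Expected Revenue Function. [Plot of f(w), in $, against w, with breakpoints at w = 10K, 20K, and 30K, function values 10p∗, 18p∗, and 20p∗ at those breakpoints, and segment slopes p∗, 0.8p∗, and 0.2p∗.]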
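The breakpoints and slopes quoted for this example can be checked numerically. The following is a minimal sketch, not part of the original model code: the function names are ours, the price p∗ is normalized to 1 for illustration, and demand is measured in thousands of impressions.

```python
def expected_revenue(w, price, scenarios):
    """f(w) = price * E[min(X, w)] for a discrete demand distribution.

    scenarios is a list of (probability, demand) pairs.
    """
    return price * sum(prob * min(demand, w) for prob, demand in scenarios)


def segment_slopes(price, scenarios):
    """Return (slope, width) for each linear piece of f, in order of increasing w."""
    pieces = []
    previous_breakpoint = 0.0
    tail_probability = 1.0  # P(X > w) on the current piece
    for prob, demand in sorted(scenarios, key=lambda s: s[1]):
        pieces.append((price * tail_probability, demand - previous_breakpoint))
        tail_probability -= prob
        previous_breakpoint = demand
    # Beyond the largest demand scenario no further revenue is earned.
    pieces.append((0.0, float("inf")))
    return pieces


P_STAR = 1.0  # assumed normalization of the target price p*
SCENARIOS = [(0.2, 10.0), (0.6, 20.0), (0.2, 30.0)]  # demand in thousands

for w in (10.0, 20.0, 30.0, 40.0):
    print(w, expected_revenue(w, P_STAR, SCENARIOS))
print(segment_slopes(P_STAR, SCENARIOS))
```

Under these assumptions the slopes come out as p∗, 0.8p∗, 0.2p∗, and 0 over widths of 10K, 10K, 10K, and +∞, which is the pattern of θ and ϑ values stated in the example.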

EC.2. Real-time Algorithm Example EC.2.1. Computing the Service Rates We provide an example illustrating the algorithm for adjusting the service rates, as described in §5.1.

Suppose a player enters a new level of a game, invoking the Real-time Algorithm with the arrival type a = {Game A Level 3, Pittsburgh, Tuesday, Evening}. Four campaigns in the system have targeting that allows them to be shown in Pittsburgh on Tuesday evening; we will index them as k = 1..4. Campaigns k = 1..4 have end dates of Jan 5, Jan 12, Mar 12, and Feb 28 respectively. Campaign 1 is for Coke and campaign 2 is for Pepsi; these campaigns are competing brands and so the competition constraint will be imposed on them. Game A has two IEC’s: the first matches all four campaigns, while the second matches only campaigns 1, 2, and 3. IEC 1 is expected to supply 30% of the adtime in Level 3, while IEC 2 is expected to yield the other 70%. Assume the service rate matrix Λa is:

         k = 1   k = 2   k = 3   k = 4
c = 1    0.11    0.45    0.60    0.80
c = 2    0.11    0.10    0.60    0



Note that in this example, campaign 2 is served at rate 0.45 in IEC 1 and at rate 0.10 in IEC 2. In §5.1, we claimed that each campaign k is served at the same rate $\lambda^1_{i(a),k}$ into all the IEC’s c that it matches (and at rate 0 if it does not match). However, in general the service rates $\lambda_{ck}$ can take any values without hindering the Real-time Algorithm; thus we present a more general example. Using the given expected adtime breakouts for IEC’s 1 and 2 of 0.3 and 0.7 respectively, we compute:

$\lambda_k^{WEIGHTED} := 0.3\lambda_{(1,k)} + 0.7\lambda_{(2,k)} = [0.11, 0.205, 0.60, 0.24]$

We now modify the rates $\lambda_{ck}$ so that only one campaign in each set of competing brands has a nonzero service rate. In this example, the only set of campaigns that requires a competition constraint is {1,2}. With probability $\frac{0.11}{0.11+0.205}$ we serve campaign 1 at rates 0.11 + 0.45 = 0.56 and 0.11 + 0.10 = 0.21 in IEC’s 1 and 2 respectively, and campaign 2 at rate 0 in both IEC’s 1 and 2. And with probability $\frac{0.205}{0.11+0.205}$ we serve campaign 2 at rates 0.56 and 0.21 in IEC’s 1 and 2 respectively, and campaign 1 at rate 0 in both IEC’s 1 and 2. Let’s say that in this instance of the Real-time Algorithm, campaign 1 (Coke) is selected by this random scheme and so the service rates $\lambda_{ck}$ are now:

         k = 1   k = 2   k = 3   k = 4
c = 1    0.56    0       0.6     0.8
c = 2    0.21    0       0.6     0

We observe that $\sum_k \lambda_{(1,k)} > 1$, and so we need to lower some of the service rates for IEC 1. Since campaign 3’s end date is the latest (Mar 12), we first decrease $\lambda_{(1,3)}$. After decreasing $\lambda_{(1,3)}$ all the way to 0, we still have $\sum_k \lambda_{(1,k)} > 1$, and so we find the campaign with the next latest end date (campaign 4, which ends Feb 28). Decreasing $\lambda_{(1,4)}$ to 0.44 suffices to satisfy $\sum_k \lambda_{(1,k)} \le 1$, and so we stop. The constraint $\sum_k \lambda_{(2,k)} \le 1$ already holds, and so no rate adjustments are required for IEC 2. The final service rates $\lambda_{ck}$ are:

         k = 1   k = 2   k = 3   k = 4
c = 1    0.56    0       0       0.44
c = 2    0.21    0       0.6     0

Note that since $\sum_k \lambda_{(2,k)} = 0.81$, 19% of IEC 2 is planned to be filled with nonpaying campaigns.
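The rate adjustments in this example can be traced with a short script. This is only a sketch of the steps as described in the example, not Massive's or the authors' code: all function names are hypothetical, and the year attached to the end dates is an assumption (the example gives only month and day).

```python
import random
from datetime import date


def weighted_rates(rates, iec_share):
    """Adtime-weighted service rate of each campaign across all IECs."""
    campaigns = next(iter(rates.values()))
    return {k: sum(iec_share[c] * rates[c][k] for c in rates) for k in campaigns}


def pick_competitor(rates, iec_share, competing, rng=random):
    """Draw the one campaign of a competing set that will be served,
    with probability proportional to its weighted rate."""
    weights = weighted_rates(rates, iec_share)
    members = sorted(competing)
    return rng.choices(members, weights=[weights[k] for k in members])[0]


def apply_competition(rates, competing, winner):
    """Give the winner the combined rate of its competing set in every IEC."""
    adjusted = {c: dict(row) for c, row in rates.items()}
    for c, row in adjusted.items():
        combined = sum(rates[c][k] for k in competing)
        for k in competing:
            row[k] = combined if k == winner else 0.0
    return adjusted


def cap_rates(rates, end_dates):
    """In each IEC, lower rates (latest end date first) until they sum to at most 1."""
    capped = {c: dict(row) for c, row in rates.items()}
    for row in capped.values():
        excess = sum(row.values()) - 1.0
        for k in sorted(row, key=lambda k: end_dates[k], reverse=True):
            if excess <= 1e-12:
                break
            cut = min(row[k], excess)
            row[k] -= cut
            excess -= cut
    return capped


# Example data, keyed as RATES[iec][campaign].
RATES = {1: {1: 0.11, 2: 0.45, 3: 0.60, 4: 0.80},
         2: {1: 0.11, 2: 0.10, 3: 0.60, 4: 0.0}}
IEC_SHARE = {1: 0.3, 2: 0.7}
END_DATES = {1: date(2009, 1, 5), 2: date(2009, 1, 12),   # year is assumed
             3: date(2009, 3, 12), 4: date(2009, 2, 28)}

print(weighted_rates(RATES, IEC_SHARE))
adjusted = apply_competition(RATES, {1, 2}, winner=1)  # Coke selected, as in the text
print(cap_rates(adjusted, END_DATES))
```

In a live run the winner would come from `pick_competitor`; it is fixed to campaign 1 here so that the output can be compared with the final service rate table above.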

EC.2.2. Assigning Campaigns to Inventory Elements We now illustrate how campaigns are assigned to inventory elements (IE’s), as described in §5.2. For this, we need some additional input data. Recall that IEC 1 is expected to supply 30% of the adtime, while IEC 2 is expected to yield the other 70%. The detailed breakdown is given as follows:

IEC   IEC Size   IE   IE Size   Spot   Spot Size   Region
1     0.3        1    0.07      1      0.07        1
1     0.3        2    0.06      2      0.06        1
1     0.3        3    0.08      3      0.08        2
1     0.3        4    0.04      4      0.04        2
1     0.3        5    0.05      5      0.05        1
2     0.7        6    0.51      6      0.17        3
2     0.7        6    0.51      7      0.17        3
2     0.7        6    0.51      8      0.17        3
2     0.7        7    0.09      9      0.03        4
2     0.7        7    0.09      10     0.03        4
2     0.7        7    0.09      11     0.03        4
2     0.7        8    0.1       12     0.1         1

The table lists the spots and their respective sizes, along with the region that each IE is in (regions are defined according to spatial proximity within a level). Let the saturation cap be 3 (so at most 3 spots can be assigned to the same campaign).

The buckets for this problem instance are:

Bucket    (IEC, Campaign) = (c, k)    Size
1         (1, 1)                      0.3 × 0.56 = 0.168
2         (1, 4)                      0.3 × 0.44 = 0.132
3         (2, 1)                      0.7 × 0.21 = 0.147
4         (2, 3)                      0.7 × 0.6 = 0.42
UNPAID                                ∞

Note that we add an ‘unpaid’ bucket of infinite size. We now run the algorithm to fractionally assign the spots to the buckets. The first few iterations could be as follows:



Iteration 1: A bucket is chosen at random from the remaining size distribution (0.168, 0.132, 0.147, 0.42). Let’s say bucket 4 is chosen (this occurs with probability $\frac{0.42}{0.867} = 0.484$). Bucket 4 represents {IEC 2, Campaign 3}. Since no spots have yet been assigned to campaign 3, we can pick a spot from any region, and so we pick the largest overall spot. The remaining sizes of the spots in IEC 2 are:

Spot             6      7      8      9      10     11     12
Remaining Size   0.17   0.17   0.17   0.03   0.03   0.03   0.1

Breaking ties arbitrarily, we select spot 6 since it has the largest remaining size. We place spot 6 into bucket 4. Since it fits completely, we set the remaining size of spot 6 to zero and the remaining size of bucket 4 to 0.42 − 0.17 = 0.25.

Iteration 2: A bucket is chosen at random from the remaining size distribution (0.168, 0.132, 0.147, 0.25). Let’s say bucket 3 is chosen (this occurs with probability $\frac{0.147}{0.697} = 0.211$). Bucket 3 represents {IEC 2, Campaign 1}. Since no spots have yet been assigned to campaign 1, we can pick a spot from any region, and so we pick the largest overall spot. The remaining sizes of the spots in IEC 2 are:

Spot             6      7      8      9      10     11     12
Remaining Size   0      0.17   0.17   0.03   0.03   0.03   0.1

Breaking ties arbitrarily, we select spot 7 since it has the largest remaining size. Since spot 7 cannot fit completely into bucket 3, we fill bucket 3, setting the remaining size of bucket 3 to 0 and the remaining size of spot 7 to 0.17 − 0.147 = 0.023.

Iteration 3: A bucket is chosen at random from the remaining size distribution (0.168, 0.132, 0, 0.25). Let’s say bucket 4 is chosen (this occurs with probability 0.4545). Bucket 4 represents {IEC 2, Campaign 3}. At this point in the algorithm, campaign 3 has only been served once, and that was in region 3. We consider spots from regions that have never been assigned campaign 3. There are several such spots, which are compared on the basis of their remaining size. The remaining sizes of these spots in IEC 2 are:

Spot             9      10     11     12
Remaining Size   0.03   0.03   0.03   0.1

We select spot 12, since it has the largest remaining size. Since spot 12 fits completely into bucket 4, we set the remaining size of spot 12 to 0 and the remaining size of bucket 4 to 0.25 − 0.1 = 0.15.

Iteration 4: A bucket is chosen at random from the remaining size distribution (0.168, 0.132, 0, 0.15). Let’s say bucket 4 is chosen (this occurs with probability 0.33). Bucket 4 represents {IEC 2, Campaign 3}. At this point in the algorithm, the count of how many times campaign 3 has been served in regions 1-4 is (1, 0, 1, 0). We give priority to all spots that are in regions that have never been assigned campaign 3. There are several such spots, which are compared on the basis of their remaining size. The remaining sizes of these spots in IEC 2 are:

Spot             9      10     11
Remaining Size   0.03   0.03   0.03

Breaking ties arbitrarily, we select spot 9. Since it fits completely into bucket 4, we set the remaining size of spot 9 to 0 and the remaining size of bucket 4 to 0.15 − 0.03 = 0.12.

Iteration 5: A bucket is chosen at random from the remaining size distribution (0.168, 0.132, 0, 0.12). Let’s say bucket 4 is chosen (this occurs with probability 0.286). Bucket 4 represents {IEC 2, Campaign 3}. At this point in the algorithm, the count of how many times campaign 3 has been served in regions 1-4 is (1, 0, 1, 1). Since all spots in IEC 2 are in regions in which campaign 3 has been served exactly once, we can pick a spot from any region, and so we pick the largest overall spot. The remaining sizes of spots in IEC 2 are:

Spot             6      7       8      9      10     11     12
Remaining Size   0      0.023   0.17   0      0.03   0.03   0

We select spot 8, since it has the largest remaining size. Since it cannot fit completely into bucket 4, we fill bucket 4, setting the remaining size of bucket 4 to 0 and the remaining size of spot 8 to 0.17 − 0.12 = 0.05.

Iterations 6-11: At this point, the remaining size distribution of the buckets is (0.168, 0.132, 0, 0). Buckets corresponding to IEC 2 are completely filled, while IEC 1’s buckets still need to be filled. Since buckets are chosen at random, it is not usually the case that IEC’s are assigned in sequence; however, to keep this example brief we have assigned IEC 2 completely before IEC 1 so that we can skip the details of assigning IEC 1. At the end of Iteration 5, the count of the number of times each region has been allocated is (0, 0, 1, 0), (1, 0, 2, 1), and (0, 0, 0, 0) for campaigns 1, 3, and 4 respectively. And the remaining sizes of all spots are:

IEC              1      1      1      1      1      2      2       2      2      2      2      2
Spot             1      2      3      4      5      6      7       8      9      10     11     12
Remaining Size   0.07   0.06   0.08   0.04   0.05   0      0.023   0.05   0      0.03   0.03   0

Without getting into the details, assume iterations 6-11 pick (bucket, spot) pairs {(1,3), (2,1), (1,2), (2,4), (1,5), (2,5)} in sequence. At the end of iteration 11, the region usage count is (2, 1, 1, 0), (1, 0, 2, 1), and (2, 1, 0, 0) for campaigns 1, 3, and 4 respectively. And the remaining size of all spots in IEC 1 is exactly 0 (this happens since $\sum_k \lambda_{(1,k)} = 1$).

Iteration 12: At this point, the remaining size distribution of the buckets is (0, 0, 0, 0). Yet spots 7, 8, 10, and 11 (all in IEC 2) still have remaining portions to be allocated (this is due to the fact that $\sum_k \lambda_{(2,k)} < 1$). We assign the remaining portions of these spots to the unpaid bucket.
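The bucket sizes and the bucket-selection probabilities quoted in the iterations above can be reproduced with a few lines. This is a sketch for checking the arithmetic only; the function names are ours, and the region-priority rule for choosing spots is deliberately left out.

```python
def bucket_sizes(final_rates, iec_share):
    """One bucket per (IEC, campaign) pair with a positive final rate.

    Size = expected adtime share of the IEC x service rate of the campaign.
    """
    return {
        (c, k): iec_share[c] * rate
        for c, row in final_rates.items()
        for k, rate in row.items()
        if rate > 0
    }


def selection_probabilities(remaining):
    """Probability of drawing each bucket, proportional to its remaining size."""
    total = sum(remaining.values())
    return {bucket: size / total for bucket, size in remaining.items() if size > 0}


# Final service rates from EC.2.1, keyed as FINAL_RATES[iec][campaign].
FINAL_RATES = {
    1: {1: 0.56, 2: 0.0, 3: 0.0, 4: 0.44},
    2: {1: 0.21, 2: 0.0, 3: 0.6, 4: 0.0},
}
IEC_SHARE = {1: 0.3, 2: 0.7}

remaining = bucket_sizes(FINAL_RATES, IEC_SHARE)
print(remaining)                                   # sizes of buckets 1 to 4
print(selection_probabilities(remaining)[(2, 3)])  # chance of drawing bucket 4 first

remaining[(2, 3)] -= 0.17                          # spot 6 placed in Iteration 1
print(selection_probabilities(remaining)[(2, 1)])  # chance of drawing bucket 3 next
```

Buckets are keyed by (IEC, campaign) rather than by the bucket numbers 1 to 4 used in the text, so bucket 4 corresponds to the key (2, 3) and bucket 3 to (2, 1).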

Randomized Rounding: At this point, the buckets appear as in Figure EC.4. Spot 5 gets assigned to bucket 1 with probability $\frac{0.028}{0.028+0.022} = 0.56$ and to bucket 2 with probability 0.44. Let’s assign spot 5 to bucket 2. And let’s assume that spots 7 and 8 get assigned to buckets 3 and 4 respectively (these are the most likely positions for spots 7 and 8 to end up; the relevant probabilities are 0.865 and 0.706 respectively). After randomized rounding, the assignment of spots to buckets is as depicted in Figure EC.5.

Figure EC.4 Before Randomized Rounding: The gray spots have been assigned fractionally and need to be placed in only one of the buckets to which they are currently assigned. [Bucket contents, given as spot (portion): Bucket 1 (size 0.168): 3 (0.08), 2 (0.06), 5 (0.028). Bucket 2 (size 0.132): 1 (0.07), 4 (0.04), 5 (0.022). Bucket 3 (size 0.147): 7 (0.147). Bucket 4 (size 0.42): 6 (0.17), 12 (0.1), 9 (0.03), 8 (0.12). Unpaid Bucket: 7 (0.023), 8 (0.05), 10 (0.03), 11 (0.03).]

Figure EC.5 After Randomized Rounding: Buckets 2, 3, and 4 are overfilled (the dotted lines are the bucket fill lines) while bucket 1 is underfilled (the dotted area is the remaining space in bucket 1). [Bucket contents, given as spot (size): Bucket 1 (size 0.168): 3 (0.08), 2 (0.06). Bucket 2 (size 0.132): 1 (0.07), 4 (0.04), 5 (0.05). Bucket 3 (size 0.147): 7 (0.17). Bucket 4 (size 0.42): 6 (0.17), 12 (0.1), 9 (0.03), 8 (0.17). Unpaid Bucket: 10 (0.03), 11 (0.03).]
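The rounding step can be sketched as follows, assuming each fractionally assigned spot is placed into one of its buckets with probability proportional to the portion it holds there (which reproduces the probabilities 0.56, 0.865, and 0.706 quoted above). The names and the data layout are illustrative, not taken from the original implementation.

```python
import random


def rounding_probabilities(portions):
    """Probability that a spot lands in each bucket, proportional to its portion there."""
    total = sum(portions.values())
    return {bucket: portion / total for bucket, portion in portions.items()}


def round_spots(fractional, rng=random):
    """Place every fractionally assigned spot into exactly one of its buckets.

    fractional maps spot -> {bucket: portion of the spot held by that bucket}.
    """
    assignment = {}
    for spot, portions in fractional.items():
        buckets = list(portions)
        weights = [portions[b] for b in buckets]
        assignment[spot] = rng.choices(buckets, weights=weights)[0]
    return assignment


# Fractionally assigned spots before rounding (see Figure EC.4).
FRACTIONAL = {
    5: {1: 0.028, 2: 0.022},
    7: {3: 0.147, "unpaid": 0.023},
    8: {4: 0.12, "unpaid": 0.05},
}

for spot, portions in FRACTIONAL.items():
    print(spot, rounding_probabilities(portions))
print(round_spots(FRACTIONAL, random.Random(2009)))  # seed chosen arbitrarily
```

Spots that sit wholly in one bucket need no rounding and are omitted from `FRACTIONAL`.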

Enforce Saturation Cap: We now create one large bucket per campaign. Since buckets 1 and 3 were both associated with campaign 1, these get aggregated together. Figure EC.6 shows the aggregated buckets. Since the saturation cap for this level is 3, no single campaign can be served in more than 3 spots. Yet campaign 3 is being served in 4 spots: 6, 8, 9, 12. Since spot 9 is the smallest, it is reassigned. And since no paying campaigns are under the saturation cap, spot 9 is reassigned to the unpaid campaign bucket. If campaigns 1 and 4 had only 2 spots assigned each,



they both would have been contenders to receive spot 9. In that case, since spot 9 is in IEC 2, we would draw from the distribution [0.21, 0, 0, 0] and note that campaign 1 would get spot 9. Note that this distribution comes from taking λck and setting all entries that correspond to campaigns that are at or over their saturation cap to zero.

Figure EC.6 Campaign Buckets: One bucket is created for each campaign. [Bucket contents, given as spot (size): Campaign 1: 3 (0.08), 2 (0.06), 7 (0.17). Campaign 4: 1 (0.07), 4 (0.04), 5 (0.05). Campaign 3: 6 (0.17), 12 (0.1), 9 (0.03), 8 (0.17). Unpaid Campaigns: 10 (0.03), 11 (0.03).]

At the conclusion of this example, the following campaigns have been assigned to the spots in this zone (U denotes an unpaid campaign):

Spot       1   2   3   4   5   6   7   8   9   10   11   12
Campaign   4   1   1   4   4   3   1   3   U   U    U    3
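The saturation-cap step that produces this final assignment can be sketched as below. It follows the rule as described in the example (reassign the smallest spot of an over-cap campaign, drawing the recipient from that IEC's service rates with at-or-over-cap campaigns zeroed out, and falling back to an unpaid campaign); the function name and data layout are our own assumptions.

```python
import random


def enforce_saturation_cap(assignment, spot_size, spot_iec, rates, cap, rng=random):
    """Return a copy of assignment in which no paying campaign holds more than cap spots.

    assignment maps spot -> campaign id, or "U" for an unpaid campaign.
    """
    result = dict(assignment)

    def spots_of(campaign):
        return [s for s, owner in result.items() if owner == campaign]

    for k in sorted({owner for owner in assignment.values() if owner != "U"}):
        while len(spots_of(k)) > cap:
            spot = min(spots_of(k), key=lambda s: spot_size[s])  # smallest spot moves
            row = rates[spot_iec[spot]]
            # Campaigns at or over their cap are excluded from the draw.
            candidates = [j for j, rate in row.items()
                          if rate > 0 and len(spots_of(j)) < cap]
            if candidates:
                weights = [row[j] for j in candidates]
                result[spot] = rng.choices(candidates, weights=weights)[0]
            else:
                result[spot] = "U"
    return result


SPOT_SIZE = {1: 0.07, 2: 0.06, 3: 0.08, 4: 0.04, 5: 0.05, 6: 0.17,
             7: 0.17, 8: 0.17, 9: 0.03, 10: 0.03, 11: 0.03, 12: 0.1}
SPOT_IEC = {s: 1 if s <= 5 else 2 for s in range(1, 13)}
FINAL_RATES = {1: {1: 0.56, 2: 0.0, 3: 0.0, 4: 0.44},
               2: {1: 0.21, 2: 0.0, 3: 0.6, 4: 0.0}}
# Assignment after randomized rounding, aggregated per campaign (Figure EC.6).
ROUNDED = {1: 4, 2: 1, 3: 1, 4: 4, 5: 4, 6: 3, 7: 1, 8: 3, 9: 3,
           10: "U", 11: "U", 12: 3}

final = enforce_saturation_cap(ROUNDED, SPOT_SIZE, SPOT_IEC, FINAL_RATES, cap=3)
print([final[s] for s in range(1, 13)])
```

With the example data no paying campaign is under the cap, so spot 9 moves to the unpaid campaign regardless of the random draw.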

EC.3. Generation of Forecasts The Weekly Plan LP uses point estimates of supply (sti , stik , and sbtik ), which are based on adtime estimates (sti , stid , stie , and stiw ), as defined in §3. In this section, we describe how we generate noisy adtime estimates for sti by combining the actual value of adtime (known, since we are backtesting) with a random error term. The other adtime estimates (stid , stie , and stiw ) are generated in exactly the same way. We generate adtime estimates that have several properties (e.g. correlations) that we believe actual forecasts are likely to exhibit. We assume that forecasts are revised on a weekly basis; at any time the most current forecast that our model can use is one that was generated at the beginning of the week. At the beginning of week t (the week that starts at time t), we generate supply forecasts for game i for each of the upcoming 13 weeks.



Notice that the actual adtime in week t is not known until time t + 1 (i.e. the beginning of week t + 1). Accordingly, let $A_{i,t+1}$ be the actual adtime for game i in week t. Following the notation for forecasting using h-step predictors given in Lütkepohl (1993), we denote the h-step predictor for the adtime of game i as of time t as $A_{it}(h)$. This means that at the beginning of week t, our forecast for the quantity $A_{i,t+h}$ is $A_{it}(h)$. At the beginning of each week t, we generate forecasts $A_{it}(h)$ for h = 1..13. Fixing j to be some time in the future (i.e. strictly greater than t), we can observe how the forecasts for the quantity $A_{ij}$ evolve as t increases and h = j − t decreases. At time t, the forecast for $A_{ij}$ is $A_{it}(j-t)$. The ratio of the forecast to the actual value gives us a multiplicative forecast error for the quantity $A_{ij}$ computed at time t:

$E_{ij}^{t} := \frac{A_{it}(j-t)}{A_{ij}}.$

Thus, the adtime estimate for game i, week j, computed at the beginning of week t, is the product of the actual adtime and a noisy error term:

$s_i^j = A_{it}(j-t) = A_{ij} E_{ij}^{t}.$

Defining the forecast adjustment for quantity $A_{ij}$ made at time t as

$P_{ij}^{t} := \frac{A_{it}(j-t)}{A_{i,t-1}(j-t+1)},$

we have that:

$E_{ij}^{j-1} := E_{ij}^{t} \cdot P_{ij}^{t+1} \cdot P_{ij}^{t+2} \cdots P_{ij}^{j-2} \cdot P_{ij}^{j-1}$

Rearranging, we find that:

$E_{ij}^{t} := \frac{1}{P_{ij}^{t+1}} \cdot \frac{1}{P_{ij}^{t+2}} \cdots \frac{1}{P_{ij}^{j-2}} \cdot \frac{1}{P_{ij}^{j-1}} \cdot E_{ij}^{j-1}$    (†)

We make several assumptions:
1. $P_{ij}^{t} \sim \text{Log-N}(-\mu_{ij}, \sigma^2)$
2. $E_{ij}^{j-1} \sim \text{Log-N}(\mu_{ij}, \sigma^2)$
3. $E_{ij}^{j-1}$, $P_{ij}^{t}$ are all independent

These assumptions allow us to consider the h-step error $E_{ij}^{t}$ to be the product of h = j − t independent $\text{Log-N}(\mu_{ij}, \sigma^2)$ random variables. The intuition behind this assumption is the following:
• If the 1-step error $E_{ij}^{j-1} = 1$, then the forecast made 1 week in advance of time j would be perfect.
• If instead, we take $E_{ij}^{j-1}$ to be a random variable with median 1, then the 1-step forecast will be too high half of the time, and too low the other half of the time.
• Given that we would like a forecast to be double the actual amount as often as it is half the actual amount, the lognormal distribution works nicely: taking $E_{ij}^{j-1} \sim \text{Log-N}(0, \sigma^2)$ yields 1-step errors $E_{ij}^{j-1}$ with median 1, variance parameterized by $\sigma^2$, and a 95% confidence interval of $[1/\varsigma, \varsigma]$ for $\varsigma = e^{1.96\sigma}$.
• Finally, we generalize this model by taking $E_{ij}^{j-1} \sim \text{Log-N}(\mu_{ij}, \sigma^2)$; in this case, the median of $E_{ij}^{j-1}$ is $e^{\mu_{ij}}$, and so errors are median-biased whenever $\mu_{ij} \neq 0$; the amount of bias depends on game i and week j. The 95% confidence interval for $E_{ij}^{j-1}$ is $e^{\mu_{ij}}[1/\varsigma, \varsigma]$, where again $\varsigma = e^{1.96\sigma}$. We generate $\mu_{ij}$ so that $E[\mu_{ij}] = 0$.

Now that we have described how the 1-step errors are computed, we proceed to describe how the h-step errors are generated; in general, h-step predictors should be stochastically less accurate than (h − 1)-step predictors. Since the h-step error $E_{ij}^{t}$ can be defined as the product of h = j − t terms (see Equation †), we generate $E_{ij}^{t}$ by assuming $\frac{1}{P_{ij}^{t}} \sim \text{Log-N}(\mu_{ij}, \sigma^2)$ as well, generate h independent such lognormals, and multiply them together. Note that $\frac{1}{P_{ij}^{t}} \sim \text{Log-N}(\mu_{ij}, \sigma^2) \Leftrightarrow P_{ij}^{t} \sim \text{Log-N}(-\mu_{ij}, \sigma^2)$. When $\mu_{ij}$ is nonzero, the median-bias causes the h-step error to drift away from 1 as h increases.

We now elaborate on how the $\mu_{ij}$ values are generated. In addition to requiring $E[\mu_{ij}] = 0$, we wish to generate the $\mu_{ij}$ values so that they are positively correlated in the j-dimension. This allows us to model the practical assumption that when forecasting h weeks into the future, our estimates for week j and j + 1 should look similar (assuming we are at week t and j = t + h), and so will likely both be above or below the actuals for those weeks due to the same underlying cause. We impose



the appropriate correlations by defining $\mu_{ij}$ as the following AR(1) process in the j-dimension (the reader may wish to consult Dixit and Pindyck 1994 for a primer on autoregressive processes):

$\mu_{ij} := \varphi \mu_{(i,j-1)} + \epsilon_{ij}$

where $\epsilon_{ij} \sim N(0, \Delta^2)$ and $0 < \varphi < 1$. The process $\mu_{ij}$ thus defined is mean-reverting (with mean 0), satisfying $E[\mu_{ij}] = 0$. We generate $\mu_{(i,0)} \sim N\left(0, \frac{\Delta^2}{1-\varphi^2}\right)$ from the stationary distribution of the process $\mu_{ij}$. Figure EC.7 illustrates the components of our error-generating scheme.

Figure EC.7 Generating Random Errors: In (a) we plot the mean-reverting process $\mu_{ij}$ (against j) for α = 0.04 (Δ = 0.0133, φ = 0.9428). In (b) we plot the process $E_{ij}^{t}$ (against h) for j = 14, $\mu_{ij}$ = −0.05, α = 0.04 (σ = 0.08). Notice $E_{ij}^{t}$ stochastically approaches 1 as time passes and h = j − t decreases.
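The error-generating scheme can be sketched in a few lines using only the standard library. This is an illustrative reading of the procedure described above rather than the code used in the experiments: the function names, the seed, and the actual adtime value are hypothetical.

```python
import math
import random


def simulate_mu(num_weeks, phi, delta, rng):
    """AR(1) bias process mu_j = phi * mu_{j-1} + eps_j with eps_j ~ N(0, delta^2).

    mu_0 is drawn from the stationary distribution N(0, delta^2 / (1 - phi^2)).
    Returns the list [mu_0, mu_1, ..., mu_num_weeks].
    """
    mu = [rng.gauss(0.0, delta / math.sqrt(1.0 - phi ** 2))]
    for _ in range(num_weeks):
        mu.append(phi * mu[-1] + rng.gauss(0.0, delta))
    return mu


def h_step_error(h, mu_j, sigma, rng):
    """Product of h independent Log-N(mu_j, sigma^2) factors.

    A product of lognormals is the exponential of a sum of normals.
    """
    return math.exp(sum(rng.gauss(mu_j, sigma) for _ in range(h)))


def noisy_estimate(actual_adtime, h, mu_j, sigma, rng):
    """Adtime estimate made h weeks ahead: actual value times the h-step error."""
    return actual_adtime * h_step_error(h, mu_j, sigma, rng)


rng = random.Random(7)           # arbitrary seed, for repeatability only
ALPHA = 0.04
PHI, DELTA, SIGMA = 0.9428, ALPHA / 3, 2 * ALPHA
mu = simulate_mu(13, PHI, DELTA, rng)
actual = 1000.0                  # hypothetical actual adtime for one game-week
for h in (13, 6, 1):
    print(h, noisy_estimate(actual, h, mu[13], SIGMA, rng))
```

Because the h-step error is a product of h factors, its spread around the median $e^{h\mu_{ij}}$ grows with h, which is the sense in which longer-range forecasts are less accurate.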

Notice that the parameters (σ, Δ, φ) fully characterize how we generate the random errors. There are some insightful relationships between these parameters that will help us choose good parameter values for testing our model. Denoting the variance of the stationary distribution of $\mu_{ij}$ as $\alpha^2 := \frac{\Delta^2}{1-\varphi^2}$, the ratio $\frac{\Delta}{\alpha}$ is a measure of how large the per-period movements of $\mu_{ij}$ are relative to the long-term magnitude of the process. Given that we are working with a 13-week horizon in our testing, we hold this ratio fixed at $\frac{1}{3}$. We can now solve for φ:

$\frac{1}{9} = \frac{\Delta^2}{\frac{\Delta^2}{1-\varphi^2}} = 1 - \varphi^2.$

After simplifying, this fixes φ at 0.9428. And $\Delta = \frac{\alpha}{3}$. Consider also the ratio $\frac{\sigma}{\alpha}$, which describes the relative magnitude of the random per-period forecast adjustments of $E_{ij}^{t}$ versus a measure of the drift of this process. Fixing $\frac{\sigma}{\alpha} = 2$ gives us σ = 2α. We vary only α and compute the dependent σ, Δ, and φ values. Our test cases for varying degrees of forecasting error are:

Instance   α        σ        Δ          φ
1          0        0        0          0.9428
2          0.0016   0.0032   0.000533   0.9428
3          0.008    0.016    0.002666   0.9428
4          0.04     0.08     0.013333   0.9428
5          0.2      0.4      0.066666   0.9428

Table EC.1 Test Cases for Forecast Accuracy Tests
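The dependent parameters in Table EC.1 follow mechanically from α and the two fixed ratios. A small sketch, with a function name and argument names of our own choosing:

```python
import math


def error_parameters(alpha, delta_ratio=1.0 / 3.0, sigma_ratio=2.0):
    """Derive (sigma, delta, phi) from alpha, holding delta/alpha and sigma/alpha fixed.

    From alpha^2 = delta^2 / (1 - phi^2) it follows that phi = sqrt(1 - (delta/alpha)^2).
    """
    return {
        "alpha": alpha,
        "sigma": sigma_ratio * alpha,
        "delta": delta_ratio * alpha,
        "phi": math.sqrt(1.0 - delta_ratio ** 2),
    }


for alpha in (0.0, 0.0016, 0.008, 0.04, 0.2):
    print(error_parameters(alpha))
```

Note that φ depends only on the ratio Δ/α, which is why it is the same in every row of the table, including the row with α = 0.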


The Planning of Guaranteed Targeted Display Advertising John Turner Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213 [email protected]

As targeted advertising becomes predominant across a wide variety of media vehicles, planning models become increasingly important to ad networks that need to match ads to appropriate audience segments, provide a high quality of service (meet advertisers’ goals), and ensure opportunities to serve advertising are not wasted. We define Guaranteed Targeted Display Advertising (GTDA) as a class of media vehicles that include webpage banner ads, video games, electronic outdoor billboards, and the next generation of digital television, and formulate the GTDA planning problem as a transportation problem with a quadratic objective. By modeling audience uncertainty, forecast errors, and the ad server’s execution of the plan, we derive sufficient conditions for optimality of the quadratic objective with respect to common advertising metrics. In addition to providing insights that help managers of ad networks understand if and why their ad server is operating optimally, we study the aggregation of viewer types as a means to solve the large planning problems that result from richly-defined targeting. We find that in many cases, a little bit of disaggregation goes a long way: near-optimal schedules can often be produced despite significant aggregation of audience segments. We use duality to bound the optimality gap of feasible solutions constructed via our aggregation/disaggregation algorithm, allowing any given aggregation of audience segments to be tested for quality. In addition, we give an algorithm that starts with an aggregate problem and successively disaggregates at each iteration, terminating with an optimal solution without ever solving the full disaggregate problem. Key words : guaranteed targeted display advertising, advertisement scheduling, aggregation, math programming

1. Introduction Modern targeted advertising is becoming increasingly important to advertisers, as new technologies provide higher degrees of ad campaign customization and more accurate performance tracking.


Increasingly, targeted ads – ads that are shown only to audience segments requested by advertisers – are being embedded in a wide spectrum of media: webpages display banner ads and video ads; video games that run on PCs and on consoles like the Xbox seamlessly integrate ads (Turner et al. 2008); electronic billboards in public places cycle through ads, some even tracking the number of people that look at the screen (Mandel 2007); and the six largest cable companies in the United States are cooperating on bringing targeted advertising to digital television (Arango 2008). According to the 2008 revenue survey compiled by the Interactive Advertising Bureau (2009), U.S. Internet ad sales totaled $23.4B, or 8.7% of the total $268B U.S. annual advertising market. Moreover, online forms of advertising are expected to nearly double their market share in five years, reaching a projected $37.2B – i.e. 15.2% market share – by 2013 (Hallerman 2009). Coupled with technology improvements that enable targeted ads in previously non-targetable media, there is tremendous growth potential for targeted advertising.

In this environment, ad networks – firms which aggregate ad space across multiple websites, video games, or other media vehicles – increasingly make use of planning and scheduling algorithms to match ads to appropriate audience segments, provide a high quality of service (meet advertisers’ goals), ensure opportunities to serve advertising are not wasted, and lower the costs of serving advertising.

Many forms of targeted advertising possess similar properties, and can therefore be planned and scheduled in a similar manner. In this paper, we focus on what we call Guaranteed Targeted Display Advertising, or GTDA: targeted advertising that has the following properties:

• CPM Sales Model: Advertisers pay for a number of “eyeballs,” called impressions. Each impression corresponds to an individual that sees an ad at a particular point in time, for example, if they load a webpage and see a banner ad. Prices are quoted in cost-per-thousand (CPM); e.g., $30 CPM means $30 buys 1000 impressions. This is a widely used sales model, often used to price webpage banner ads and dynamic ads in video games (Surmanek 1995).
• Measurable Progress: The exact number of impressions served to date is known.
• Targeting Control: Ads shown to a specific individual can be chosen based on that individual’s characteristics (demographic, geographic, and/or behavioral).



• Guaranteed Delivery: The network provider promises to serve each advertiser an agreed-upon number of impressions over a fixed time period. In this sense, delivery is “guaranteed,” since ad networks do whatever they can to avoid under-delivery. Ad networks still have considerable degrees of freedom in deciding exactly which individuals get served which ads.

Although we assume all of the preceding properties hold, extensions to our model are possible to handle other classes of targeted advertising, such as ads that are sold by the number of clicks instead of the number of impressions.

In this paper, we solve the ad network’s single-period planning problem of allocating impressions from audience segments to ad campaigns, formulating the planning problem as a transportation problem. We introduce a quadratic objective which spreads impressions proportionally across audience segments; such spreading, although common in practice, has yet to be modeled in the ad planning literature. By modeling audience uncertainty, forecast errors, and the ad server’s execution of the plan, we derive sufficient conditions for when spreading impressions proportionally across audience segments minimizes the variance of the number of impressions served and maximizes expected reach – the number of unique individuals that see each ad. These conditions provide insights to ad managers who may already be proportionally spreading impressions in practice: the optimality of existing systems can be verified, or appropriate changes can be made to restore optimality if some conditions are not met.

The first half of the paper focuses on defining the model and providing managerial insights: in particular, §2 provides a literature review, §3 defines the model, §4 defines variance and reach, and §5 illustrates several managerial insights, including the sensitivity to the number of ads seen per viewer and the sensitivity to the choice of algorithm the ad server uses to execute the plan. The second half of the paper studies the aggregation of the planning problem introduced in the first half.
As technologies continue to develop that can execute ever-more precise targeting, the number of unique audience segments that need to be considered in the planning problem grows combinatorially. We suggest intelligent methods for aggregating audience segments, and find that in many cases a little bit of disaggregation goes a long way: near-optimal schedules can often be produced despite significant aggregation of audience segments. In §6.2.1 we assume management specifies a fixed clustering of audience segments, and use the resulting aggregate problem to find a feasible solution to the disaggregate problem by expressing the solution of the aggregate problem in the disaggregate space. Moreover, we bound the distance from optimality of any such feasible solution. Finally, in §6.2.2, we allow the clusters of audience segments to be split during the solution process, and show that if the clusters are split in an intelligent way, we can always find an optimal solution to the disaggregate planning problem by solving an aggregate problem that has noticeably fewer audience segments.

2. Literature Review Our problem – the single-period planning problem of allocating impressions from audience segments to ad campaigns – has been studied by Langheinrich et al. (1999), Tomlin (2000), and Nakamura and Abe (2005). Just as we do, these authors formulate the ad planning problem as a transportation problem; however, they use other objectives rather than the quadratic objective which we adopt. Since it is the quadratic objective that leads to our managerial insights and aggregation results, our paper represents a significant contribution. We elaborate on these papers in detail in §3.1.1, after formally defining our model. The second half of this paper uses the theory of aggregation of large math programs, and in particular transportation problems. In the context of transportation planning, Zipkin (1980a) defines the basic framework, which is refined by Zipkin and Raimer (1983). In our case, a crucial assumption of Zipkin (1980a), namely that the adjacency structure of all aggregated nodes should be the same, does not hold. Without this assumption, solutions of the aggregate problem may be infeasible after disaggregation. Much of §6 focuses on restoring feasibility; thus, our work contributes to the aggregation literature for the special case of quadratic transportation problems that arise in GTDA planning. We should note that although the paper by Zipkin (1980a) seems to be the most relevant for GTDA, aggregation has been studied in the context of generalized transportation networks



(Litvinchev and Rangel 2006), linear programs (Zipkin 1980b, Zipkin 1980c, and Leisten 1997), stochastic programs (Birge 1985, Wright 1994), and convex network programs (Zipkin 1982). The surveys by Rogers et al. (1991) and Vakhutinsky et al. (1979) list the bounds known at that time, and the book by Litvinchev and Tsurkov (2003) is also a good reference. Finally, there is a significant body of online advertising literature which is tangential to this paper. Problems studied include the design of incentive-compatible auctions for allocating advertising (Edelman et al. 2007), allocating advertising to maximize revenue subject to budget constraints using online algorithms (Mehta et al. 2007), packing 2D ads of different shapes and sizes into 2D areas (Adler et al. 2002, Dawande et al. 2003, Menon and Amiri 2004, Kumar et al. 2006), revenue management (Roels and Fridgeirsdottir 2008), and pricing (Araman and Fridgeirsdottir 2008, Fridgeirsdottir and Najafi-Asadolahi 2008). Applications of targeted advertising include scheduling of ads in video games (Turner et al. 2008) and SMS phone messages (De Reyck and Degraeve 2003).

3. The Model 3.1. Definitions and Notation We study the planning problem of a firm responsible for serving ads to multiple advertisers; we call this firm the network provider. In some cases, the publisher or content provider is also the network provider; however, in many cases, advertisers purchase ad slots from ad networks that aggregate ad space from multiple publishers; in that case, the ad network is the network provider. For example, DoubleClick (www.doubleclick.com) is an ad network, and websites that lease ad space to DoubleClick are content providers. The network provider is responsible for inventory management, where in this setting inventory refers to the impressions, i.e. eyeballs or page views, generated by the content’s audience. Advertisers buy impressions from the network provider by purchasing a campaign: a contract that specifies the number of impressions to be served over a fixed time period. A campaign’s targeting constrains the inventory it can be allocated to specific audience segments, which we call viewer types. For example, a campaign may require only viewers from specific media assets (e.g., www.espn.com/golf/),



that are viewing in specific time periods (e.g., Saturday evening), or require all viewers to be from certain geographic regions (e.g., France), demographic profiles (e.g., male, age 18-25), or behavioral categories (e.g., golfers). The number of impressions viewer type v generates is called its supply. We use the following notation:

• $V$ = the set of all viewer types
• $K$ = the set of all campaigns
• $V_k$ = the set of viewer types that campaign k targets
• $K_v$ = the set of campaigns that target viewer type v
• $g_k$ = the impression goal of campaign k
• $S_v$ = the supply of viewer type v (a random variable)
• $s_v := E[S_v]$ = the expected supply of viewer type v
• $c_v := 1/s_v$ = the reciprocal of the expected supply of viewer type v

The viewer type partition is the partition of the audience space induced by the targeting constraints of all campaigns managed by the network provider.

Example 1. Figure 4(a) displays a viewer type partition induced by three campaigns: campaign A targets Pittsburgh (viewer types a, d, e, and g), campaign B targets males (viewer types b, d, f, and g), and campaign C targets the 18-25 year old age group (viewer types c, e, f, and g). The viewer types in this example are: a = {from Pittsburgh, female, not aged 18-25}, b = {not from Pittsburgh, male, not aged 18-25}, c = {not from Pittsburgh, female, aged 18-25}, d = {from Pittsburgh, male, not aged 18-25}, e = {from Pittsburgh, female, aged 18-25}, f = {not from Pittsburgh, male, aged 18-25}, and g = {from Pittsburgh, male, aged 18-25}.

The planning problem we study can be cast as a quadratic transportation problem: we represent each viewer type as a source node, each campaign as a sink node, and for each viewer type that a campaign targets, we connect the corresponding source to sink with an uncapacitated arc. There are two equivalent formulations of this planning problem $(PP)$: an impression formulation $(PP^{IMP})$ in which the amount of flow on arc (v, k) represents the absolute number of impressions of viewer type v allocated to campaign k, and the proportion formulation $(PP^{PROP})$ in which the amount of flow on arc (v, k) represents the proportion of viewer type v allocated to campaign k:

\[
\begin{aligned}
(PP^{IMP}) \quad \min\ & \sum_{k \in K,\, v \in V_k} c_v x_{vk}^2 \\
\text{s.t.}\ & \sum_{v \in V_k} x_{vk} = g_k && \forall k \in K && \text{(impression goals)} \\
& \sum_{k \in K_v} x_{vk} \le s_v && \forall v \in V && \text{(supply constraints)} \\
& x_{vk} \ge 0 && \forall k \in K,\ v \in V_k && \text{(nonnegativity)}
\end{aligned}
\]

\[
\begin{aligned}
(PP^{PROP}) \quad \min\ & \sum_{k \in K,\, v \in V_k} s_v p_{vk}^2 \\
\text{s.t.}\ & \sum_{v \in V_k} s_v p_{vk} = g_k && \forall k \in K && \text{(impression goals)} \\
& \sum_{k \in K_v} p_{vk} \le 1 && \forall v \in V && \text{(supply constraints)} \\
& p_{vk} \ge 0 && \forall k \in K,\ v \in V_k && \text{(nonnegativity)}
\end{aligned}
\]
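As a rough illustration of how the two formulations relate, the following Python sketch rebuilds the viewer type partition of Example 1 and evaluates one candidate plan in both forms. The supplies, goals, and all names (`SUPPLY`, `GOAL`, `is_feasible`, and so on) are hypothetical values chosen for this sketch; the plan checked simply gives each campaign the same proportion of every viewer type it targets, and no optimization is performed.

```python
# Viewer types of Example 1 as (from Pittsburgh, male, aged 18-25) flags.
VIEWER_TYPES = {
    "a": (True, False, False), "b": (False, True, False),
    "c": (False, False, True), "d": (True, True, False),
    "e": (True, False, True), "f": (False, True, True),
    "g": (True, True, True),
}
# Campaign A targets Pittsburgh, B targets males, C targets ages 18-25.
TARGET_FLAG = {"A": 0, "B": 1, "C": 2}

# Hypothetical expected supplies s_v and impression goals g_k.
SUPPLY = {"a": 400.0, "b": 900.0, "c": 300.0, "d": 500.0,
          "e": 200.0, "f": 600.0, "g": 100.0}
GOAL = {"A": 300.0, "B": 420.0, "C": 240.0}

# V_k: viewer types each campaign targets; K_v: campaigns targeting each type.
V_k = {k: [v for v, flags in VIEWER_TYPES.items() if flags[i]]
       for k, i in TARGET_FLAG.items()}
K_v = {v: [k for k in TARGET_FLAG if v in V_k[k]] for v in VIEWER_TYPES}


def to_impressions(p):
    """Convert proportions p_vk into expected impressions x_vk = s_v * p_vk."""
    return {(v, k): SUPPLY[v] * pvk for (v, k), pvk in p.items()}


def objective_prop(p):
    """Objective of the proportion formulation: sum of s_v * p_vk^2."""
    return sum(SUPPLY[v] * pvk ** 2 for (v, k), pvk in p.items())


def objective_imp(x):
    """Objective of the impression formulation: sum of c_v * x_vk^2, c_v = 1/s_v."""
    return sum(xvk ** 2 / SUPPLY[v] for (v, k), xvk in x.items())


def is_feasible(p, tol=1e-6):
    """Check impression goals, supply constraints, and nonnegativity."""
    goals_met = all(
        abs(sum(SUPPLY[v] * p[v, k] for v in V_k[k]) - GOAL[k]) <= tol
        for k in GOAL)
    supply_ok = all(sum(p[v, k] for k in K_v[v]) <= 1 + tol for v in SUPPLY)
    nonneg = all(pvk >= 0 for pvk in p.values())
    return goals_met and supply_ok and nonneg


# Candidate plan: each campaign takes the same proportion of every targeted type.
plan = {(v, k): GOAL[k] / sum(SUPPLY[u] for u in V_k[k])
        for k in GOAL for v in V_k[k]}

print(is_feasible(plan), objective_prop(plan), objective_imp(to_impressions(plan)))
```

Because $x_{vk} = s_v p_{vk}$ and $c_v = 1/s_v$, the two objective values agree for any plan, which is what the final line lets a reader confirm on this instance.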

The impression formulation $(PP^{IMP})$ has a deterministic interpretation: If we assume supply is known (i.e. $S_v = s_v$), then the decision variable $x_{vk}$ is interpreted as the number of impressions of viewer type v allocated to campaign k. In this case, the impression goal constraint makes sure each campaign is allocated exactly the number of impressions that it requires, and the supply constraint ensures that none of the viewer types are overallocated. The quadratic objective tends to spread impressions proportionally across all viewer types that a campaign targets; we will see in §5 why this is important.

However, since audience sizes are not known in advance, in general $S_v \neq s_v$. By defining the decision variable $p_{vk} = x_{vk}/s_v$, we can substitute $p_{vk}$ for $x_{vk}$ to get the proportion formulation $(PP^{PROP})$. The advantage of the proportion formulation is that for every realization of $S_v$, we can consider $S_v p_{vk}$ to be the number of impressions generated by viewer type v that get served to campaign k. Under this interpretation, $E[S_v p_{vk}] = s_v p_{vk} = x_{vk}$; i.e. $x_{vk}$ is the number of impressions of viewer type v that will be served to campaign k in expectation. Therefore, the impression goal constraint states that in expectation, each campaign must be served exactly the number of impressions that it requires. And the supply constraint ensures that no more than 100% of each viewer type is allocated.

3.1.1. Relating our Model to the Literature. Ad planning problems with this constraint structure have been studied by Langheinrich et al. (1999), Tomlin (2000), and Nakamura and Abe



(2005); however, there is disagreement in the research community about what constitutes a natural objective function, especially when the advertising product is impressions, not clicks. The paper by Langheinrich et al. (1999), as well as later work by a subset of the original authors (see Nakamura and Abe 2005), employs the linear objective $\sum_{k \in K,\, v \in V_k} r_{vk} x_{vk}$ to maximize the total number of clicks their plan is expected to generate, where $r_{vk}$ is the click-through rate of campaign k for viewer type v. Note that their objective cannot be used for GTDA problems, since in our case a viewer only needs to see an ad – not click on it – for an impression to be counted. It is tempting to consider modifying their objective by taking $r_{vk} = 1\ \forall k \in K, v \in V_k$; or more generally, assuming that $r_{vk}$ is the marginal revenue collected by serving one impression of viewer type v to campaign k. However, neither modification works: Taking $r_{vk} = 1$ causes the objective to simplify to a constant, since $\sum_{k \in K,\, v \in V_k} x_{vk} = \sum_{k \in K} g_k$ due to the impression goal constraints. And because GTDA campaign contracts are settled in advance for a lump sum, the marginal revenues $\{r_{vk}\ \forall k \in K, v \in V_k\}$ are not explicitly available to use in the objective.

Tomlin (2000) notes that the linear objective of Langheinrich et al. (1999) isn’t robust against unexpected changes in supply, and therefore advocates maximizing entropy using the objective $\sum_{k \in K,\, v \in V_k} -x_{vk} \ln(x_{vk})$. As is known in the traffic modeling literature, the entropy function is a robust objective which leads to allocations that are well spread. Tomlin (2000) follows Langheinrich et al. (1999) in assuming that the network provider is interested in maximizing total clicks, and so suggests an objective that has two terms: the first measures total entropy, and the second is the linear term of Nakamura and Abe (2005). As already discussed, the objective of Nakamura and Abe (2005) does not apply to GTDA problems, suggesting that we may want to simply maximize total entropy. The resulting objective would tend to spread impressions of each campaign across many viewer types, and is in the spirit of what we propose in this paper; however, our quadratic objective, which is not equivalent to the max entropy objective, is better motivated in the context of GTDA.



3.2. Motivating the Quadratic Objective
To motivate the quadratic objective in the problem $(PP)$, we require additional notation: Let $S_k := \sum_{v \in V_k} S_v$ be the total supply (in impressions) that campaign k targets, and let $q_k := g_k / E[S_k]$. We define the equal-proportion allocation as $\{p_{vk} = q_k\ \forall k \in K, v \in V_k\}$; i.e. all impressions that a campaign targets are assigned an equal proportion of the campaign’s impression goal. First, we show that the quadratic objective in $(PP)$ is equivalent to minimizing the L2 distance from the equal-proportion allocation. Afterward, we comment on why the equal-proportion allocation is desirable.

Let $q$ be the vector with components $\{q_{vk} = q_k\ \forall k \in K, v \in V_k\}$, and $p$ be the vector of decision variables $\{p_{vk}\ \forall k \in K, v \in V_k\}$. The distance function

\[ d(p, q) = \sqrt{\sum_{k \in K,\, v \in V_k} s_v (p_{vk} - q_{vk})^2} \]

is a simple modification to the usual L2 distance that accounts for the fact that viewer types with a larger audience size should be given more weight. In this regard, when we expect $s_v$ impressions to be generated by viewer type v, it is possible to think of $d(p, q)$ as a distance function in which the summation is over all campaigns and all impressions. Formally, the inner product on which this distance function is based is $\langle a, b \rangle_S = \sum_{k \in K,\, v \in V_k} s_v a_{vk} b_{vk}$; thus, the corresponding norm is $\|\cdot\|_S = \sqrt{\langle \cdot, \cdot \rangle_S}$, and we have $d(p, q) = \|p - q\|_S$. Note that usually we assume $s_v > 0\ \forall v \in V$, but if $s_v = 0$ is permitted, then technically $\|\cdot\|_S$ is a seminorm.

Proposition 1. For the problem $(PP^{PROP})$, the objectives $\min \sum_{k \in K,\, v \in V_k} s_v p_{vk}^2$ and $\min d(p, q)$ are equivalent.

Proof. The objectives $\min d(p, q)$ and $\min d(p, q)^2$ are equivalent over our domain, since the distance function d is nonnegative. And we have:

\[
\begin{aligned}
d(p, q)^2 &= \sum_{k \in K,\, v \in V_k} s_v (p_{vk} - q_{vk})^2 = \sum_{k \in K,\, v \in V_k} s_v \left(p_{vk}^2 - 2 q_k p_{vk} + q_k^2\right) \\
&= \sum_{k \in K,\, v \in V_k} s_v p_{vk}^2 - 2 \sum_{k \in K} q_k \sum_{v \in V_k} s_v p_{vk} + \sum_{k \in K,\, v \in V_k} s_v q_k^2 \\
&= \sum_{k \in K,\, v \in V_k} s_v p_{vk}^2 - 2 \sum_{k \in K} q_k g_k + \sum_{k \in K,\, v \in V_k} s_v q_k^2,
\end{aligned}
\]

where the last line follows from the fact that the impression goal constraint yields $g_k = \sum_{v \in V_k} s_v p_{vk}$. Minimizing the final expression is equivalent to minimizing $\sum_{k \in K,\, v \in V_k} s_v p_{vk}^2$ because the last two terms are constants.

Thus, the objective $\min \sum_{k \in K,\, v \in V_k} s_v p_{vk}^2$ attempts to spread impressions proportionally across all viewer types that each campaign targets. This seems to be a reasonable choice, and indeed the practice of spreading exposures across media assets is widespread in media planning: For an example in which ads are spread across multiple TV commercials, see Bollapragada et al. (2002); for an example where ads are spread across multiple online video games, see Turner et al. (2008). Intuitively, spreading exposures across media assets – or across viewer types – makes sense: an advertiser wants a large number of people to see their ad, and not everyone in the advertiser’s target audience watches the same TV show, visits the same website, or plays the same video game.

Acknowledging the importance of spreading, however, does not answer the fundamental question: What is the optimal way to spread impressions? It turns out that in many cases, the equal-proportion allocation is a good way of spreading, since it tends to minimize the variability of the number of impressions served while maximizing reach – the number of unique individuals that see an ad campaign over a fixed time period. To understand why, we need to first introduce distributional assumptions for supply, thereby modeling audience uncertainty, and to introduce a model for how the network provider executes the plan $(PP)$.
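Proposition 1 can be sanity-checked numerically: for every plan that satisfies the impression goal constraints, the gap between the squared distance to the equal-proportion allocation and the quadratic objective should be the same constant. The Python sketch below does this on a toy instance; the two viewer types, two campaigns, supplies, goals, and function names are all hypothetical and chosen only for illustration.

```python
SUPPLY = {"u": 100.0, "v": 300.0}           # hypothetical expected supplies s_v
TARGETS = {"k1": ["u", "v"], "k2": ["v"]}   # hypothetical targeting sets V_k
GOAL = {"k1": 80.0, "k2": 60.0}             # hypothetical impression goals g_k


def equal_proportion():
    """q_k = g_k / E[S_k], the equal-proportion rate of each campaign."""
    return {k: GOAL[k] / sum(SUPPLY[v] for v in TARGETS[k]) for k in GOAL}


def quadratic_objective(p):
    """Sum of s_v * p_vk^2 over all targeted (v, k) pairs."""
    return sum(SUPPLY[v] * p[v, k] ** 2 for k in TARGETS for v in TARGETS[k])


def dist_sq(p, q):
    """Squared weighted distance d(p, q)^2 from the equal-proportion allocation."""
    return sum(SUPPLY[v] * (p[v, k] - q[k]) ** 2
               for k in TARGETS for v in TARGETS[k])


def meets_goals(p, tol=1e-9):
    """True if every campaign receives its goal in expectation."""
    return all(abs(sum(SUPPLY[v] * p[v, k] for v in TARGETS[k]) - GOAL[k]) <= tol
               for k in GOAL)


q = equal_proportion()
# Two plans that both satisfy the impression goal constraints.
skewed = {("u", "k1"): 0.5, ("v", "k1"): 0.1, ("v", "k2"): 0.2}
even = {(v, k): q[k] for k in TARGETS for v in TARGETS[k]}

for plan in (skewed, even):
    # The printed gap should be identical for both plans.
    print(meets_goals(plan), dist_sq(plan, q) - quadratic_objective(plan))
```

Since the gap does not depend on the plan, ranking goal-feasible plans by $d(p, q)$ or by $\sum s_v p_{vk}^2$ gives the same ordering on this instance, as the proposition states.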

3.3. Model of Audience Uncertainty The network provider’s ad server is the computer system responsible for selecting which ads to serve at each point in time. We say an arrival occurs at the instant the ad server must select and push ads to a single viewer. In the context of webpage banner ads, an arrival occurs when a viewer loads a webpage in their browser; at this point, the ad server selects one ad for each of the n banner ad slots on the page. In the context of dynamic in-game advertising, an arrival occurs when a video game player loads a new game level; at this point, the ad server selects one ad for each of the n ad slots in the level.



We assume viewers of type v arrive into the system according to a Poisson process with rate $\lambda_v$. It is known that arrivals to home pages of websites can be accurately modeled with a nonhomogeneous Poisson process (e.g., Liu et al. 2001, Chlebus and Brazier 2007), as can arrivals of people playing video games (Turner 2010). Further, we assume each arrival r of viewer type v generates an i.i.d. random number of impressions $Y_{vr}$, which has mean $\mu_v$ and standard deviation $\sigma_v$. Therefore, the supply of viewer type v can be written as $S_v = Y_{v1} + \cdots + Y_{vM_v}$, where $M_v \sim \text{Poisson}(\lambda_v)$. On the web, it is common for exactly one impression to be counted for each banner ad served, and so $Y_{vr} = n$ if there are n ad slots on the page. However, more generally, it is possible to count impressions in different ways, e.g., logging an impression once an ad is on-screen for 10 seconds, as is the case in dynamic in-game advertising. Since $S_v$ is a compound Poisson random variable (see footnote 1), its expectation and variance are:

\[ E[S_v] = \mu_v \lambda_v, \qquad \mathrm{Var}[S_v] = (\mu_v^2 + \sigma_v^2)\lambda_v. \tag{1} \]
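The moments in Equation (1) can be compared against a small simulation. The Python sketch below is only illustrative: it assumes exponentially distributed impressions per arrival (so that the standard deviation equals the mean, and impressions are treated as continuous rather than integer counts), uses a textbook multiplication method to draw Poisson counts, and all function names are hypothetical.

```python
import math
import random


def supply_moments(lam, mu, sigma):
    """Mean and variance of S_v as given by Equation (1)."""
    return mu * lam, (mu ** 2 + sigma ** 2) * lam


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by multiplying uniforms (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_supply(lam, mu, rng):
    """One draw of S_v. Assumption: impressions per arrival are exponential
    with mean mu, hence standard deviation mu as well."""
    arrivals = sample_poisson(lam, rng)
    return sum(rng.expovariate(1.0 / mu) for _ in range(arrivals))


def simulate(lam, mu, draws=20000, seed=0):
    """Sample mean and sample variance of S_v over many independent draws."""
    rng = random.Random(seed)
    samples = [sample_supply(lam, mu, rng) for _ in range(draws)]
    mean = sum(samples) / draws
    var = sum((s - mean) ** 2 for s in samples) / (draws - 1)
    return mean, var


print(supply_moments(10.0, 5.0, 5.0))  # closed-form mean and variance
print(simulate(10.0, 5.0))             # simulated values, for comparison
```

The simulated moments should land near the closed-form values, with the usual sampling noise.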

3.4. Model of Ad Server Execution
Consider what happens when the ad server processes a single arrival of viewer type v. To exactly track the plan $(PP)$, the ad server would like to assign a $p_{vk}$-fraction of the impressions that this arrival will generate to campaign k. However, such an exact assignment is usually not possible, because each arrival has a discrete number of slots, and, in general, each slot generates a random number of impressions. In particular, if the number of ad slots is less than the number of campaigns k with $p_{vk} > 0$, the ad server must randomly pick a subset of the eligible campaigns to serve to this arrival. Modeling these details allows us to measure how well the ad server executes the plan $(PP)$.

Footnote 1. A random variable $Y$ is compound Poisson if it can be written as the sum $Y = \sum_{n=1..N} X_n$, where $N$ is a Poisson random variable and the $X_n$’s are i.i.d. Since $N$ and $X_n$ are independent, from first principles we have $E[Y] = E[N]E[X]$ and $\mathrm{Var}[Y] = E[N](\mathrm{Var}[X] + E[X]^2)$.



Let $F^r_{vk}$ be the random fraction of the impressions generated by the rth arrival of viewer type v that get served to campaign k. Since the ad server is trying to execute the plan as closely as possible, we require $E[F^r_{vk}] = p_{vk}$ to hold for any campaign k that, according to the plan, is scheduled to receive a $p_{vk}$-fraction of viewer type v’s impressions. If any two campaigns, according to the plan, should be served the same p-fraction of viewer type v’s impressions, we require the ad server to treat these two campaigns in a symmetric fashion. Therefore, for all campaigns which the plan allocates a p-fraction of viewer type v, we assume the random variables $F^r_{vk}$ are i.i.d. over all arrivals r and campaigns k. This allows us to define $\alpha_v(p) := E[(F^r_{vk})^2]$ as the second moment of the random fraction of impressions awarded to a campaign that is allotted a p-fraction of viewer type v. As well, we define $\beta_{vk} := \text{Prob}(F^r_{vk} > 0)$ as the probability that campaign k is served in one or more ad slots of an arrival of type v. Since we require campaigns that are allocated the same p-fraction of viewer type v to be treated the same way, we often write $\beta_v(p)$ in place of $\beta_{vk}$.

Example 2. Banner ads are being served by a web server. Arrivals of all viewer types see the same web page, which has n ad slots to be filled, and we assume that each of the n slots will generate exactly one impression. For an arrival of type v, the ad server would like to assign campaign k to $np_{vk}$ ad slots; however, $np_{vk}$ may not be integral, and so the number of slots assigned to campaign k is rounded up or down. The random fractions are defined as:

\[
F^r_{vk} =
\begin{cases}
\lfloor np_{vk} \rfloor / n & \text{with probability } \lfloor np_{vk} \rfloor + 1 - np_{vk} \\
\lceil np_{vk} \rceil / n & \text{with probability } np_{vk} - \lfloor np_{vk} \rfloor
\end{cases}
\tag{2}
\]

It is straightforward to compute $E[F^r_{vk}] = p_{vk}$; $\alpha_v(p) = \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) + \frac{\lfloor np \rfloor^2}{n^2}$; and $\beta_v(p) = \min(np, 1)$ (see Appendix A). Note that $\alpha_v(p)$ is piecewise-linear convex increasing with n segments, while $\beta_v(p)$ is piecewise-linear concave increasing with 2 segments.

In practice, it may be difficult to choose a suitable analytical expression for $F^r_{vk}$ that closely matches the complex ad slotting heuristics of a given ad server; however, historical data can be used to estimate the shape of the $\alpha_v(p)$ and $\beta_v(p)$ functions, which are the crucial model components.
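The closed forms quoted in Example 2 can be cross-checked by enumerating the two-point distribution of Equation (2) directly. The following Python sketch does so with exact rational arithmetic; the function names are hypothetical, and the sketch covers only the rounding rule of Example 2, not any particular ad server.

```python
from fractions import Fraction
from math import floor


def fraction_distribution(p, n):
    """Return [(value, probability)] for F under the rounding rule of Example 2."""
    p = Fraction(p)
    low = floor(n * p)            # slots assigned when rounding down
    up_prob = n * p - low         # probability of rounding up
    if up_prob == 0:              # n*p is integral, so no rounding is needed
        return [(Fraction(low, n), Fraction(1))]
    return [(Fraction(low, n), 1 - up_prob), (Fraction(low + 1, n), up_prob)]


def moments_by_enumeration(p, n):
    """E[F], E[F^2], and Prob(F > 0) computed directly from the distribution."""
    dist = fraction_distribution(p, n)
    mean = sum(value * prob for value, prob in dist)
    second = sum(value * value * prob for value, prob in dist)
    prob_served = sum(prob for value, prob in dist if value > 0)
    return mean, second, prob_served


def alpha(p, n):
    """Closed-form second moment quoted in Example 2."""
    p = Fraction(p)
    low = floor(n * p)
    return Fraction(2 * low + 1, n) * (p - Fraction(low, n)) + Fraction(low * low, n * n)


def beta(p, n):
    """Closed-form probability that the campaign is served at least once."""
    return min(n * Fraction(p), Fraction(1))


for p in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    # Each row pairs the enumerated moments with the closed forms.
    print(p, moments_by_enumeration(p, 3), alpha(p, 3), beta(p, 3))
```

Using `Fraction` avoids floating-point ambiguity at the breakpoints where $np$ is an integer.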



Our framework assumes the plan provides high-level guidance to the ad server, but does not fully specify how the ad server should execute the plan. The alternative is to generate plans that also provide lower-level direction to the ad server; i.e. plans that implicitly or explicitly include the exact allocation of ads to slots for each arrival. Abrams et al. (2008) call the pattern of ads an individual viewer is served a slate, and generate a plan with one or more slates for each viewer type. When their plan is executed, slate i is shown with probability $f_i$, exposing the viewer to the set of campaigns in the slate. In our framework, $F^r_{vk}$ can be defined to model the random selection of slates; however, since we require all campaigns assigned the same p-fraction of a viewer type to be treated in a symmetric fashion, the slates must be generated in a way that ensures this symmetry.

4. Performance Metrics
Using our models of audience uncertainty and ad server execution, we derive expressions for two important performance metrics: reach and variance. Later in §5, we will show that solutions to $(PP)$ have good performance with respect to these metrics; i.e. such solutions tend to have high reach and low variance.

4.1. Reach
Let $U_{vk}$ be the reach of campaign k in viewer type v – i.e. the number of unique individuals from viewer type v that see campaign k over a fixed time period. To derive an expression for expected reach $E[U_{vk}]$, we assume that the population of viewers (across all viewer types) is homogeneous, so that each individual viewer arrives at rate $\eta$. Let $m_v$ be the number of individuals that belong to viewer type v, and let $A^i_v \sim \text{Poisson}(\eta)$ be the number of arrivals of individual i in viewer type v. Then by definition, $\lambda_v = E[A^1_v + \cdots + A^{m_v}_v] = m_v E[A^i_v] = m_v \eta \implies m_v = \lambda_v / \eta$. Furthermore, for each arrival of individual i, campaign k is served with probability $\beta_{vk}$; thus, individual i sees campaign k in $W^i_{vk} \sim \text{Poisson}(\eta \beta_{vk})$ arrivals. Using the indicator variable $Z^i_{vk} := \{1 \text{ if } W^i_{vk} \ge 1,\ 0 \text{ otherwise}\}$, we have $U_{vk} := \sum_{i=1..m_v} Z^i_{vk} \sim \text{Binomial}(m_v, \text{Prob}(Z^i_{vk} = 1)) \equiv \text{Binomial}(\lambda_v/\eta,\ 1 - e^{-\eta \beta_{vk}})$. Therefore,

\[ E[U_{vk}] = \frac{\lambda_v}{\eta}\left(1 - e^{-\eta \beta_{vk}}\right), \tag{3} \]

and the expected reach of a campaign allocated a p-fraction of viewer type v according to the plan is

\[ E[U_v(p)] = \frac{\lambda_v}{\eta}\left(1 - e^{-\eta \beta_v(p)}\right). \tag{4} \]
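To see how Equation (4) behaves, the Python sketch below evaluates expected reach under the additional assumption that $\beta_v(p) = \min(np, 1)$, as in the banner-ad rounding model of Example 2. The parameter values match those listed for Figure 1; the function names are hypothetical.

```python
import math


def beta(p, n):
    """Probability of being served at least once, assuming the Example 2 model."""
    return min(n * p, 1.0)


def expected_reach(p, lam, eta, n):
    """Equation (4): (lam / eta) * (1 - exp(-eta * beta(p)))."""
    return (lam / eta) * (1.0 - math.exp(-eta * beta(p, n)))


LAM, ETA, N = 10.0, 2.2, 3  # parameter values listed for Figure 1

for p in (0.1, 0.2, 1.0 / 3.0, 0.5, 1.0):
    # Reach grows with p until p = 1/n and is flat afterwards.
    print(round(p, 3), expected_reach(p, LAM, ETA, N))
```

Expected reach is bounded above by the population size $m_v = \lambda_v/\eta$, since some individuals make no arrivals during the period.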

Although the above expected reach function assumes individuals are homogeneous in their arrival rates, we could allow for some heterogeneity without much complication. For example, assume the population can be segmented into two classes: “high use” individuals that arrive at rate $\eta^h$, and “low use” individuals that arrive at the lower rate $\eta^l$. Defining $m^h_v$ and $m^l_v$ as the number of individuals of each class in viewer type v, and $\pi^h_v := \frac{m^h_v}{m^l_v + m^h_v}$ and $\pi^l_v := \frac{m^l_v}{m^l_v + m^h_v}$ as the proportions of individuals considered “high use” and “low use” respectively, we can easily compute $m^h_v = \frac{\pi^h_v}{\pi^h_v \eta^h + \pi^l_v \eta^l}\lambda_v$ and $m^l_v = \frac{\pi^l_v}{\pi^h_v \eta^h + \pi^l_v \eta^l}\lambda_v$ using the identities $\lambda_v = m^h_v \eta^h + m^l_v \eta^l$ and $\pi^h_v + \pi^l_v = 1$. Therefore, expected reach is

\[ E[U_v(p)] = m^h_v\left(1 - e^{-\eta^h \beta_v(p)}\right) + m^l_v\left(1 - e^{-\eta^l \beta_v(p)}\right). \]

Segmenting the population into any number of classes in the preceding manner is theoretically possible; however, in practice it may not be necessary: Even though the 2-segment expected reach function does not model the possibility of an individual switching from being “high use” to “low use” (as most certainly occurs in practice), a reasonably good fit for the empirical aggregate reach curve of dynamic in-game advertising can be obtained using the 2-segment reach function (Turner 2010). Throughout this paper, we use the single-segment expected reach function because it is simple, yet it maintains the essential structure of an expected reach function: it is concave increasing in $\beta_v(p)$ and is directly proportional to $\lambda_v$. Note that Jensen’s Inequality implies that successive segmentation yields successively lower estimates for expected reach; thus, the single-segment expected reach function overestimates the true expected reach.
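The two-class reach function and the Jensen's Inequality remark can be illustrated with a short Python sketch. The class proportion and arrival rates below are hypothetical and were picked so that the population-average rate is 2.2; the single-segment function is evaluated at that average rate, which is the comparison the inequality speaks to.

```python
import math


def class_sizes(lam, pi_high, eta_high, eta_low):
    """Numbers of high-use and low-use individuals implied by lam and the mix."""
    pi_low = 1.0 - pi_high
    mean_rate = pi_high * eta_high + pi_low * eta_low
    return pi_high * lam / mean_rate, pi_low * lam / mean_rate


def reach_two_segment(beta, lam, pi_high, eta_high, eta_low):
    """Expected reach when the population splits into two usage classes."""
    m_high, m_low = class_sizes(lam, pi_high, eta_high, eta_low)
    return (m_high * (1.0 - math.exp(-eta_high * beta))
            + m_low * (1.0 - math.exp(-eta_low * beta)))


def reach_single_segment(beta, lam, eta):
    """Expected reach when every individual arrives at the same rate eta."""
    return (lam / eta) * (1.0 - math.exp(-eta * beta))


LAM, PI_HIGH, ETA_HIGH, ETA_LOW = 10.0, 0.25, 5.5, 1.1   # hypothetical values
ETA_AVG = PI_HIGH * ETA_HIGH + (1.0 - PI_HIGH) * ETA_LOW  # population-average rate

for b in (0.25, 0.5, 1.0):
    # The single-segment value should exceed the two-segment value.
    print(b, reach_single_segment(b, LAM, ETA_AVG),
          reach_two_segment(b, LAM, PI_HIGH, ETA_HIGH, ETA_LOW))
```

Both functions describe the same total population and the same total arrival rate; only the spread of individual arrival rates differs.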

Finally, we assume that the reach of campaign k in plan $(PP)$, denoted $U_k$, is simply $U_k \overset{\text{def}}{=} \sum_{v \in V_k} U_{vk}$; i.e. reach is additive across viewer types. Although this assumption is common in the media industry (see, for example, Surmanek 1995), in general $U_k \le \sum_{v \in V_k} U_{vk}$ due to audience duplication: individual viewers may belong to more than one viewer type, and thus end up double-counted; for example, if viewer type 1 is {Males that visit www.espn.com/golf/} and viewer type 2 is {Males that visit http://finance.yahoo.com/}, some viewers will fall into both categories, yet should only be counted once. For this reason, $E[U_k] \overset{\text{def}}{=} \sum_{v \in V_k} E[U_{vk}]$ is an overestimate of the true expected reach that campaign k gets from plan $(PP)$. Of course, when viewer types are defined using only properties of the audience members, and not properties of the media vehicles, audience duplication does not occur and our definition of $U_k$ is exact. The total expected reach of plan $(PP)$ is the expected reach summed up across all campaigns:

\[ E[U(p)] = \sum_{k \in K} E[U_k] = \sum_{k \in K,\, v \in V_k} E[U_v(p_{vk})] = \sum_{k \in K,\, v \in V_k} \frac{\lambda_v}{\eta}\left(1 - e^{-\eta \beta_v(p_{vk})}\right). \tag{5} \]

4.2. Variance
Let $X_{vk}$ be the actual number of impressions served to campaign k from viewer type v. If $\lambda_v$ is known, then $X_{vk}$ is a random variable of the form:

\[ X_{vk} = \sum_{r=1..M_v} F^r_{vk} Y_{vr}, \quad \text{where } M_v \sim \text{Poisson}(\lambda_v). \tag{6} \]

To compute $E[X_{vk}]$ and $\mathrm{Var}[X_{vk}]$, we need estimates for the first and second moments of $F^r_{vk}$ and $Y_{vr}$, as well as an estimate for $\lambda_v$. We assume the only parameter subject to significant estimation error is $\lambda_v$; in other words, we have a reasonable understanding of the dynamics of the ad server on which the first and second moments of $F^r_{vk}$ and $Y_{vr}$ depend, but audience size is subject to forecast error. To model the estimation error of $\lambda_v$, we assume that a forecasting system uses historical data and forward-looking statements from management to compute the estimator $\Lambda_v$. We insist that $\Lambda_v$ is unbiased ($E[\Lambda_v] = \lambda_v$), and that $\mathrm{Var}[\Lambda_v]$ – the variance of the forecasting system’s point estimate for $\lambda_v$ – can be computed. With forecast error of audience size taken into account, Equation (6) generalizes to:

\[ X_{vk} = \sum_{r=1..M_v} F^r_{vk} Y_{vr}, \quad \text{where } M_v \sim \text{Poisson}(\Lambda_v). \tag{7} \]

Using this definition of $X_{vk}$, the mean and variance of the number of impressions served to campaign k from viewer type v are (see Appendix B for the derivations):

\[ E[X_{vk}] = \mu_v \lambda_v p_{vk} = s_v p_{vk} \tag{8} \]

\[ \mathrm{Var}[X_{vk}] = \underbrace{(\sigma_v^2 + \mu_v^2)\lambda_v \alpha_v(p_{vk})}_{\text{Stochastic Variance}} + \underbrace{\mu_v^2 p_{vk}^2 \mathrm{Var}[\Lambda_v]}_{\text{Forecast Variance}}. \tag{9} \]

In general, $\mathrm{Var}[X_{vk}]$ has two components: forecast variance, caused by uncertainty in the forecasted arrival rate; and stochastic variance which, assuming a known arrival rate, is caused by uncertainty in the number of arrivals, the number of impressions per arrival, and the number of impressions assigned to each campaign from each arrival.

Example 3. Consider the case where a forecasting system uses $\tau$ periods of historical data from a server log to compute $\Lambda_v$; in other words, audience size distributions are assumed to be stationary. Furthermore, since server logs can be very large, we assume the forecasting system uses a sampled server log, where each arrival in the full log is sampled independently with probability $\gamma$. Let $Z^t_v$ be the number of arrivals of viewer type $v$ in the sampled log from period $t$; we treat $Z^t_v$ as an estimator, that is, a random variable that encapsulates the variation in the number of arrivals that could have occurred. Therefore, $\Lambda_v$ (the maximum likelihood estimator for the arrival rate parameter $\lambda_v$ under the assumption that arrivals are i.i.d. Poisson($\lambda_v$) in all periods) is computed by averaging the number of sampled arrivals $Z^t_v$ over time periods $t = 1..\tau$ and scaling by $1/\gamma$:

\[
\Lambda_v = \frac{1}{\gamma\tau} \sum_{t=1}^{\tau} Z^t_v, \quad \text{where } Z^t_v \sim \mathrm{Poisson}(\gamma\lambda_v), \tag{10}
\]

\[
E[\Lambda_v] = \frac{1}{\gamma\tau} \sum_{t=1}^{\tau} E[Z^t_v] = \frac{1}{\gamma\tau} \times \tau(\gamma\lambda_v) = \lambda_v, \quad \text{and} \tag{11}
\]

\[
\mathrm{Var}[\Lambda_v] = \frac{1}{\gamma^2\tau^2} \sum_{t=1}^{\tau} \mathrm{Var}[Z^t_v] = \frac{1}{\gamma^2\tau^2} \times \tau(\gamma\lambda_v) = \frac{\lambda_v}{\gamma\tau}; \tag{12}
\]

and the variance of the number of impressions served to campaign $k$ from viewer type $v$ is computed by substituting (12) into (9):

\[
\mathrm{Var}[X_{vk}] = \underbrace{(\sigma_v^2 + \mu_v^2)\,\lambda_v\,\alpha_v(p_{vk})}_{\text{Stochastic Variance}} + \underbrace{\frac{\mu_v s_v p_{vk}^2}{\gamma\tau}}_{\text{Forecast Variance}}. \tag{13}
\]
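As an illustration, the two variance terms of Equation (13) can be evaluated numerically. The sketch below is not the authors' code: the function name `impression_variance` and its argument names are hypothetical, and the value of $\alpha_v(p_{vk})$ is passed in by the caller because its form depends on the ad server execution model defined earlier in the paper.

```python
def impression_variance(p, alpha_p, mu, sigma, lam, gamma, tau):
    """Return the (stochastic, forecast) variance terms of Equation (13).

    p       : planned proportion p_vk
    alpha_p : value of alpha_v(p_vk), supplied by the caller (assumption)
    mu, sigma : mean and std. dev. of impressions per arrival
    lam     : arrival rate lambda_v
    gamma   : log sampling probability; tau : number of historical periods
    """
    s = mu * lam                                   # expected supply s_v = mu_v * lambda_v
    stochastic = (sigma ** 2 + mu ** 2) * lam * alpha_p
    forecast = mu * s * p ** 2 / (gamma * tau)     # mu_v^2 p^2 Var[Lambda_v], using Eq. (12)
    return stochastic, forecast


# Parameters used for Figure 1; alpha_p = 1.0 at p = 1 is an assumption here.
stoch, fore = impression_variance(p=1.0, alpha_p=1.0, mu=5, sigma=5,
                                  lam=10, gamma=0.1, tau=3)
print(stoch, fore)
```

Under these inputs the forecast term grows with $p^2$ and shrinks as either the sampling probability or the number of historical periods increases, which is the behavior Equation (12) predicts.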



Finally, we define $X_k = \sum_{v \in V_k} X_{vk}$ as the total number of impressions served to campaign $k$, and $X(p) = \sum_{k \in K} X_k$ as the total number of impressions served to all campaigns under the plan (PP).

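Since variance is additive, the total stochastic variance of plan (PP) is:

\[
\mathrm{StochVar}[X(p)] = \sum_{k \in K} \mathrm{StochVar}[X_k] = \sum_{k \in K, v \in V_k} \mathrm{StochVar}[X_{vk}] = \sum_{k \in K, v \in V_k} (\sigma_v^2 + \mu_v^2)\,\lambda_v\,\alpha_v(p_{vk}), \tag{14}
\]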

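and the total forecast variance of plan (PP) is:

\[
\mathrm{ForecastVar}[X(p)] = \sum_{k \in K} \mathrm{ForecastVar}[X_k] = \sum_{k \in K, v \in V_k} \mathrm{ForecastVar}[X_{vk}] = \sum_{k \in K, v \in V_k} \mu_v^2\, p_{vk}^2\, \mathrm{Var}[\Lambda_v]. \tag{15}
\]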

5. Managerial Insights

5.1. Visualizing the Important Functions

Figure 1 displays plots of $\alpha(p)$ and $\beta(p)$, as well as the expected reach, stochastic variance, and forecast variance of a single campaign $k$ served to a single viewer type $v$ as a function of $p \equiv p_{vk}$. For these plots, we assume the ad server execution model of Equation (2), the definition of $\Lambda_v$ from Equation (10), and fix the parameters of this instance at $\gamma = 0.1$, $\tau = 3$, $\mu = 5$, $\sigma = 5$, $\eta = 2.2$, $\lambda_v = 10$, and $n = 3$. Notice that since there are $n = 3$ ad slots, $\alpha(p)$ and StochVar($p$) have $n = 3$ piecewise-linear segments. As well, $\beta(p)$ and Reach($p$) increase ($\beta(p)$ linearly) until $p = 1/n = 1/3$, beyond which expected reach cannot increase because $\beta(p) = 1$ implies all individuals of this viewer type see this campaign. Finally, ForecastVar($p$) is quadratic in $p$, and is independent of $\alpha(p)$ and $\beta(p)$.

[Figure 1: five panels plotting α(p), β(p), Reach(p), StochVar(p), and ForecastVar(p) against p ∈ [0, 1].]

Figure 1. Important functions of the proportion $p \equiv p_{vk}$ of viewer type $v$ allocated to campaign $k$. Parameters are fixed at $\gamma = 0.1$, $\tau = 3$, $\mu = 5$, $\sigma = 5$, $\eta = 2.2$, $\lambda_v = 10$, and $n = 3$.



5.2. When is the Equal-Proportion Allocation Optimal?

Because spreading impressions proportionally across viewer types is a natural way to spread impressions, managers of ad networks want to know under what conditions the equal-proportion allocation maximizes reach, minimizes stochastic variance, and minimizes forecast variance. To answer this question, we first define a planning problem related to (PP) and show that the equal-proportion allocation is optimal in this problem for all objective functions of a certain form:

Theorem 1. Consider the objective function $f(p) = \sum_{k \in K, v \in V_k} s_v h(p_{vk})$, where $h : \mathbb{R} \to \mathbb{R}$ is convex (but possibly nondifferentiable). The equal-proportion allocation $p = q$ is optimal for the optimization problem:

\[
\begin{array}{llll}
(P1) & \min & f(p) & \\
& \text{s.t.} & \sum_{v \in V_k} s_v p_{vk} = g_k \quad \forall k \in K & \text{(impression goals)} \\
& & p_{vk} \ge 0 \quad \forall k \in K, \forall v \in V_k & \text{(non-negativity)}
\end{array}
\]

Proof. See Theorem 4 in the Appendix.
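The intuition behind Theorem 1 can be sketched with Jensen's inequality. This is only an illustrative argument, not the proof given in the Appendix, and it assumes the equal-proportion allocation is $q_k = g_k / \sum_{v \in V_k} s_v$ with the shorthand $S_k := \sum_{v \in V_k} s_v$ (a symbol introduced here for convenience). Since (P1) separates by campaign, fix a campaign $k$ and any feasible $p$:

```latex
\begin{align*}
\sum_{v \in V_k} s_v\, h(p_{vk})
  &= S_k \sum_{v \in V_k} \frac{s_v}{S_k}\, h(p_{vk})
  && \text{(weights $s_v / S_k$ sum to 1)} \\
  &\ge S_k\, h\!\left( \sum_{v \in V_k} \frac{s_v}{S_k}\, p_{vk} \right)
  && \text{(Jensen, $h$ convex)} \\
  &= S_k\, h\!\left( \frac{g_k}{S_k} \right) = S_k\, h(q_k)
  && \text{(impression goal constraint)} \\
  &= \sum_{v \in V_k} s_v\, h(q_k).
\end{align*}
```

The lower bound is attained by setting $p_{vk} = q_k$ for every $v \in V_k$, which is feasible whenever $g_k \ge 0$; summing over campaigns gives the claim under these assumptions.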

Next, we show that under certain conditions, the expressions for expected reach, stochastic variance, and forecast variance have the functional form required by Theorem 1; therefore, in this case, the equal-proportion allocation is optimal in (P1) with respect to expected reach, stochastic variance, and forecast variance:

Corollary 1. Assume that all viewer types share the same mean number of impressions per arrival ($\mu_v = \mu$ $\forall v \in V$), and that as a function of the planned proportion $p$, the probability that a given campaign is served to a given arrival is the same across all viewer types ($\beta_v(p) = \beta(p)$ $\forall v \in V$). Then if $\beta(p)$ is concave in $p$, the equal-proportion allocation $p = q$ is optimal for (P1) under the objective of maximizing expected reach.

Proof. From Equation (5), total expected reach is $f_0(p) = \sum_{k \in K, v \in V_k} \frac{\lambda_v}{\eta}\left(1 - e^{-\eta\beta(p_{vk})}\right)$. Because

\[
\max f_0(p) \equiv \min\, (-\mu\eta) f_0(p) \equiv \min \sum_{k \in K, v \in V_k} \mu\lambda_v \left(e^{-\eta\beta(p_{vk})} - 1\right) \equiv \min \sum_{k \in K, v \in V_k} s_v \left(e^{-\eta\beta(p_{vk})} - 1\right),
\]

the result follows from Theorem 1 using $h(p) := e^{-\eta\beta(p)} - 1$. Note that $\beta(p)$ concave $\Rightarrow$ $-\eta\beta(p)$ convex $\Rightarrow$ $e^{-\eta\beta(p)}$ convex $\Rightarrow$ $h(p)$ convex.



Corollary 2. Assume that all viewer types share the same mean and standard deviation for the number of impressions per arrival ($\mu_v = \mu$, $\sigma_v = \sigma$ $\forall v \in V$), and that as a function of the planned proportion $p$, the second moment of the random fraction of impressions awarded to a given campaign is the same across all viewer types ($\alpha_v(p) = \alpha(p)$ $\forall v \in V$). Then if $\alpha(p)$ is convex in $p$, the equal-proportion allocation $p = q$ is optimal for (P1) under the objective of minimizing stochastic variance.

Proof. From Equation (14), total stochastic variance is $f_0(p) = \sum_{k \in K, v \in V_k} (\sigma^2 + \mu^2)\lambda_v \alpha(p_{vk})$. Since

\[
\min f_0(p) \equiv \min \frac{\mu}{\sigma^2 + \mu^2} f_0(p) \equiv \min \sum_{k \in K, v \in V_k} \mu\lambda_v \alpha(p_{vk}) \equiv \min \sum_{k \in K, v \in V_k} s_v \alpha(p_{vk}),
\]

the result follows from Theorem 1 using $h(p) := \alpha(p)$.

Corollary 3. Assume that all viewer types share the same mean number of impressions per arrival ($\mu_v = \mu$ $\forall v \in V$). Then if $\mathrm{Var}[\Lambda_v] = \theta\lambda_v$ $\forall v \in V$, where $\theta \ge 0$ is a constant, the equal-proportion allocation $p = q$ is optimal for (P1) under the objective of minimizing forecast variance.

Proof.

From Equation (15), total forecast variance is $f_0(p) = \sum_{k \in K, v \in V_k} \mu^2 p_{vk}^2 \mathrm{Var}[\Lambda_v]$. Since

\[
\min f_0(p) \equiv \min \frac{1}{\mu} f_0(p) \equiv \min \sum_{k \in K, v \in V_k} \mu\theta\lambda_v p_{vk}^2 \equiv \min \sum_{k \in K, v \in V_k} s_v p_{vk}^2,
\]

the result follows from Theorem 1 using $h(p) := p^2$.

Although the real planning problem we wish to solve is (PP) and not (P1), Corollaries 1, 2, and 3 are nevertheless important. Indeed, whenever the equal-proportion allocation is feasible in (PP), and the conditions of Corollaries 1, 2, and 3 hold, the equal-proportion allocation is optimal with respect to all three performance metrics in (PP) as well; this follows immediately from the fact that (P1) is the relaxation of (PP) obtained by dropping the supply constraints. Moreover, when the equal-proportion allocation is not feasible in (PP) and the conditions of Corollary 3 hold, forecast variance is still minimized by solving the planning problem (PP), since in this case minimizing the quadratic objective of (PP) is equivalent to minimizing forecast variance (this is clear in the proof of Corollary 3).

With respect to maximizing expected reach and minimizing stochastic variance when there are supply constraints, the results are not as clean. However, when the equal-proportion allocation is not feasible in (PP) and the conditions of Corollaries 1 and 2 hold, the quadratic objective of (PP) is a good surrogate objective for expected reach and stochastic variance. This is because the quadratic objective of (PP) produces feasible plans as close to the equal-proportion allocation as possible (cf. Proposition 1), which we know would be optimal if the supply constraints were dropped. In this case, managers can be confident that plans generated by (PP) will be executed in a manner that achieves near-optimal expected reach and stochastic variance.

5.2.1. Interpreting the Conditions that Imply Optimality. We now interpret the assumptions of Corollaries 1, 2, and 3, and show that they can often be satisfied in practice. Let us call a cluster of viewer types homogeneous if all viewer types in the cluster share the same values for $\mu_v$, $\sigma_v$, $\alpha_v(p)$, and $\beta_v(p)$. The three sets of assumptions are:

• Conditions that require viewer types to be (partially) homogeneous. Since Corollaries 1, 2, and 3 assume some degree of homogeneity of the viewer types, instead of solving one large planning problem (PP), it makes sense to decompose the full planning problem into clusters of homogeneous viewer types, and solve a planning problem of the form (PP) for each. In practice, a cluster of homogeneous viewer types represents all audience segments in a group of similarly-structured media vehicles: e.g., all audience segments in all websites that have the same number of ad slots, or all audience segments in all video games that induce similar patterns of game play. Note that viewer types from media vehicles with different content classifications can be clustered together; i.e. viewer types from a finance website can be clustered with viewer types from a golf website if the number of ad slots on both webpages is the same. As well, the arrival rates $\lambda_v$ are allowed to be heterogeneous across all viewer types (all audience segments of all media vehicles) within each cluster.

• Convexity of $\alpha(p)$ and concavity of $\beta(p)$. If the ad server's execution of the plan is modeled by Equation (2), both of these assumptions hold, as can be seen in panels 1 and 2 of Figure 1.

• $\mathrm{Var}[\Lambda_v] = \theta\lambda_v$ $\forall v \in V$ for some constant $\theta \ge 0$. When arrival rates are assumed to be stationary, i.e. if $\Lambda_v$ is computed from $\tau$ periods of historical data sampled with frequency $\gamma$ as in Example 3, this assumption holds with $\theta = 1/(\gamma\tau)$.



5.3. Sensitivity to the Number of Ad Slots

A manager of an ad network may want to know how the number of ad slots on a webpage or in a game level affects reach and variance. Typically, $\alpha(p)$ and $\beta(p)$ depend on the number of ad slots $n$; thus, expected reach and stochastic variance are affected by the number of ad slots, but forecast variance is not. Figures 2 and 3 show the sensitivity of expected reach and stochastic variance to the number of ad slots. For these plots, we make the following assumptions: ad server execution is defined by Equation (2); $\Lambda_v$ is defined by Equation (10); $\mu = \sigma = 5$; there are two viewer types with arrival rates $\lambda_1 = \lambda_2 = 10$; and a single campaign $k$ with impression goal $g_k = 50$ targets both viewer types. Therefore, we have $s_1 = s_2 = \mu\lambda_1 = \mu\lambda_2 = 50$ and the equal-proportion allocation is $q_k = g_k/(s_1 + s_2) = 0.5$. The impression goal constraint requires $s_1 p_{1k} + s_2 p_{2k} = g_k$; thus, for any $p_{1k} \in [0, 1]$, we take $p_{2k} = 1 - p_{1k}$. The total expected reach and total stochastic variance from both viewer types are plotted as functions of $p_{1k}$.

[Figure 2: five panels, one each for n = 1, n = 2, n = 4, n = 8, and n = ∞, plotting total expected reach against p1k ∈ [0, 1].]

Figure 2. Total expected reach as a function of $p_{1k}$ for varying numbers of ad slots $n$. Parameters are fixed at $\gamma = 0.1$, $\tau = 3$, $\mu = 5$, $\eta = 2.2$, and $\lambda_1 = \lambda_2 = 10$.

[Figure 3: five panels, one each for n = 1, n = 2, n = 4, n = 8, and n = 100, plotting total stochastic variance against p1k ∈ [0, 1].]

Figure 3. Total stochastic variance as a function of $p_{1k}$ for varying numbers of ad slots $n$. Parameters are fixed at $\mu = 5$, $\sigma = 5$, and $\lambda_1 = \lambda_2 = 10$.



There are several structural properties to notice. First, we see that the equal-proportion allocation $p_{1k} = p_{2k} = q_k = 0.5$ maximizes expected reach and minimizes stochastic variance in all cases. However, it is also interesting to note that at one extreme ($n \to \infty$), expected reach is not affected by the choice of $(p_{1k}, p_{2k})$, while at the other extreme ($n = 1$), it is stochastic variance that is unaffected by the solution $(p_{1k}, p_{2k})$. Thus, the quadratic objective of (PP) not only has a simpler form than the expected reach and stochastic variance functions, but it is more robust: it delivers the optimal expected reach and stochastic variance for all values of $n$.

It is also important to note that when $n$ is low, simply increasing the number of ad slots may increase expected reach: as shown in Figure 2, when $n = 1$, total expected reach is maximized at a value of 6, while for $n \ge 2$, total expected reach can be as high as 8. This is because $\beta(p) = \min(np, 1) = 1$ for all $p \ge 1/n$. When $\beta(p^*) < 1$ at the reach-optimal proportion $p^*$ (in this case the equal-proportion allocation $p^* = 0.5$), increasing $n$ increases $\beta(p^*)$. In this case, we can see that when $n = 1$, $\beta(0.5) = 0.5$, while when $n = 2$, $\beta(0.5) = 1$.
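The reach values quoted above can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes $\beta(p) = \min(np, 1)$ as stated in the text and the per-viewer-type expected reach $\frac{\lambda_v}{\eta}(1 - e^{-\eta\beta(p)})$ that appears in the proof of Corollary 1; the function names are hypothetical.

```python
import math


def beta(p, n):
    """Probability that an arrival sees the campaign, beta(p) = min(n*p, 1)."""
    return min(n * p, 1.0)


def expected_reach(p, n, lam=10.0, eta=2.2):
    """Expected reach from one viewer type: (lam/eta) * (1 - exp(-eta*beta(p)))."""
    return lam / eta * (1.0 - math.exp(-eta * beta(p, n)))


def total_reach(p1, n, lam=10.0, eta=2.2):
    """Total reach over two identical viewer types with p2 = 1 - p1."""
    return expected_reach(p1, n, lam, eta) + expected_reach(1.0 - p1, n, lam, eta)


for n in (1, 2, 4, 8):
    # Compare the equal-proportion split with a lopsided split.
    print(n, round(total_reach(0.5, n), 2), round(total_reach(0.1, n), 2))
```

With the parameters of Figure 2, the equal-proportion split yields a total of roughly 6.1 for $n = 1$ and roughly 8.1 for $n \ge 2$, in line with the values of 6 and 8 read from the figure.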

5.4. Sensitivity to the Ad Server's Execution Algorithm

Although the ad server may execute the plan (PP) using any of a number of heuristics, there are two opposing execution strategies that deserve mention. The first is a reach strategy that serves as few impressions to each arrival as possible, thereby spreading impressions over the largest number of individuals; this is the strategy we have assumed thus far, and is well-modeled by Equation (2). The second is a saturation strategy that serves each campaign to fewer individuals, but each individual served sees as many impressions as permitted. The saturation strategy, which tends to increase individuals' recall of the ads they see, can be modeled with the random fractions:

\[
F^r_{vk} = \begin{cases} \omega & \text{with probability } p_{vk}/\omega \\ 0 & \text{with probability } 1 - p_{vk}/\omega \end{cases}, \tag{16}
\]

where the constant $\omega$ is the proportion of an arrival's impressions to allocate to campaign $k$, conditional on campaign $k$ being selected for service to this arrival. For example, if it is visually unappealing for more than 5 of 20 ad slots in a video game level to show the same campaign, $\omega = 0.25$ would be appropriate. Finally, we note that $p_{vk} \le \omega$ $\forall k \in K, v \in V_k$ should be added to the constraint set of (PP) to ensure the probabilities in (16) are at most 1.

Structurally, the saturation strategy is very similar to the reach strategy with one ad slot per arrival. Under the saturation strategy, it is easy to derive $\alpha_v(p) = p$ and $\beta_v(p) = p/\omega$ from Equation (16). And under the reach strategy with $n = 1$, we know that $\alpha_v(p) = p$ and $\beta_v(p) = p$ from Example 2. Since $\alpha_v(p) = p$ in both cases, we know that under the saturation strategy, stochastic variance takes the same form as under the reach strategy with one ad slot: it is a constant for all feasible solutions to (PP), as depicted in panel 1 of Figure 3. Thus, if the saturation strategy is employed by the ad server, it is not possible to affect stochastic variance. Furthermore, since $\beta_v(p)$ is linear in $p$ in both cases, we know that under the saturation strategy, expected reach takes the same form as under the reach strategy with one ad slot: it is smooth and concave as in panel 1 of Figure 2; i.e. it does not have a kink or a flat plateau as in panels 2 through 5.

6. Aggregation

In this section, we study aggregation of the viewer type space as a way to manage the potentially large number of viewer types, many of which correspond to very small populations of viewers. This is especially important when targeting varies widely across campaigns, since in that case the number of viewer types grows exponentially with the number of campaigns (up to a point: the world's population, and thus the number of viewer types, is indeed bounded!). In the context of transportation planning, Zipkin (1980a) introduces the aggregation framework, which is refined by Zipkin and Raimer (1983): An Aggregated Transportation Problem (ATP) is produced from an original Transportation Problem (TP) by grouping source and destination nodes into aggregated mega-source and mega-destination nodes, and adjusting the arc costs and capacities accordingly. An optimal solution for (ATP) is disaggregated into a "good" feasible solution for (TP), and the quality of this solution is assessed using a duality-based bound. The advantage of using an aggregation algorithm is that near-optimal solutions to (TP) can be found by solving (ATP), which is a much smaller problem.



In our case, a crucial assumption of Zipkin (1980a) does not hold: namely, the adjacency structure of all aggregated nodes should be the same. Without this assumption, solutions of (ATP) may be infeasible for (TP) after disaggregation. Much of this section focuses on restoring feasibility. In terms of exposition, we deviate from Zipkin (1980a) by including the disaggregation formula which transforms an aggregate solution into a disaggregate solution explicitly in the aggregate problem. Therefore, the aggregate problem of Zipkin (1980a) is analogous to what we define as an auxiliary transportation problem, and the extended formulation that we call the aggregate problem is not explicitly defined by Zipkin (1980a). Furthermore, we only consider aggregation of viewer types, and not campaigns; this is because the viewer type space grows exponentially in the number of campaigns, and is therefore the most important dimension for us to aggregate. Future work may consider aggregation of campaigns as well.

6.1. Notation and Definitions

Aggregation is accomplished by clustering viewer types into groups, which we call inventory blocks. An inventory block partition is a clustering in which each viewer type is assigned to exactly one inventory block. We extend the notation of §3.1 as follows:

• $I$ = the set of all inventory blocks
• $i(v)$ = the inventory block to which viewer type $v$ belongs
• $V_i$ = the set of viewer types in inventory block $i$
• $V_{ik}$ = the set of viewer types that campaign $k$ targets in inventory block $i$
• $I_k$ = the set of inventory blocks that campaign $k$ targets
• $K_i$ = the set of campaigns that target inventory block $i$

For example, Figure 4(b) shows one possible inventory block partition of the viewer type space introduced in Example 1; in this partition, all inventory from viewers in Pittsburgh is considered inventory block 1, while all remaining inventory is considered inventory block 2. Thus, i(a) = i(d) = i(e) = i(g) = 1, i(b) = i(c) = i(f ) = 2, V2 = {b, c, f }, V2B = {b, f }, K2 = {B, C }, and IB = I = {1, 2}.



[Figure 4: two diagrams of the viewer type space for campaigns A, B, and C. (a) An example of a viewer type partition. (b) An example of an inventory block partition.]

Figure 4. Viewer Type and Inventory Block Partitions

Naturally, the supply of inventory block $i$ is $S_i = \sum_{v \in V_i} S_v$, and so $s_i := E[S_i] = \sum_{v \in V_i} E[S_v] = \sum_{v \in V_i} s_v$. Analogously, the supply of the subset of inventory block $i$ that campaign $k$ targets is $S_{ik} = \sum_{v \in V_{ik}} S_v$; thus, the expected supply of inventory block $i$ available to campaign $k$ is $s_{ik} := E[S_{ik}] = \sum_{v \in V_{ik}} s_v$.

Recall from §3.1 that the planning problem (PP) has two equivalent formulations: the impression formulation, which uses decision variables $x_{vk}$, and the proportion formulation, which uses decision variables $p_{vk} := x_{vk}/s_v$. In this section, we refer to (PP) as the Original Planning Problem, denoted (OPP), and write (OPP_IMP) and (OPP_PROP) for the impression and proportion formulations respectively.

One way of aggregating (OPP) is by adding constraints $p_{vk} = p_{i(v),k}$ $\forall k \in K, \forall v \in V_k$; i.e. substituting the variable $p_{ik} \equiv p_{i(v),k}$ for $p_{vk}$ for all $v \in V_i$, thereby forcing the proportional allocations for each viewer type within an inventory block to be the same. We take a slightly more general approach. Instead, we consider the Aggregate Planning Problem (APP) produced from (OPP) by adding the constraints $y_v = \min\left\{1, 1 / \sum_{k \in K_v} p_{i(v),k}\right\}$ $\forall v \in V$, and $p_{vk} = y_v p_{i(v),k}$ $\forall k \in K, v \in V_k$. The quantity $y_v$ is called the yield of viewer type $v$, and the disaggregation formula $p_{vk} = y_v p_{i(v),k}$ is used to convert from inventory block weights $p_{ik}$ to viewer type proportions $p_{vk}$ (we prefer to call $p_{ik}$ a weight rather than a proportion, since $p_{ik} > 1$ is allowed). Note that we do not intend to solve (APP) directly, but will use it to structurally compare different solution approaches.

Example 4. Consider an instance with 2 viewer types and 2 campaigns. Viewer types $v$ and $w$ each supply 1 impression in expectation ($s_v = s_w = 1$). Campaign A targets viewer type $v$ only, while campaign B targets both viewer types $v$ and $w$; thus, $V_A = \{v\}$, $V_B = \{v, w\}$, $K_v = \{A, B\}$, and $K_w = \{B\}$. Impression goals are $g_A = 3/4$ and $g_B = 1$. The solution $\{p_{vA} = 3/4, p_{vB} = 1/4, p_{wB} = 3/4\}$ is optimal for (OPP_PROP). This follows because, first, all feasible solutions have $p_{vA} = 3/4$, reducing the objective to $\min p_{vB}^2 + p_{wB}^2$. Second, since $p_{vA} + p_{vB} \le 1$ is a constraint, maximum spreading of campaign B across $p_{vB}$ and $p_{wB}$ is achieved with $p_{vB} = 1/4$ and $p_{wB} = 3/4$. Now consider aggregating both viewer types into inventory block 1. The aggregate solution $\{p_{1A} = 9/4, p_{1B} = 3/4\}$ is equivalent to the disaggregate solution $\{p_{vA} = 3/4, p_{vB} = 1/4, p_{wB} = 3/4\}$, as can be seen from the following substitution. Yields are $y_v = \min(1, 1/(p_{1A} + p_{1B})) = 1/3$ and $y_w = \min(1, 1/p_{1B}) = 1$; hence $p_{vA} = y_v p_{1A} = (1/3)(9/4) = 3/4$, $p_{vB} = y_v p_{1B} = (1/3)(3/4) = 1/4$, and $p_{wB} = y_w p_{1B} = (1)(3/4) = 3/4$.

Denote the optimal values of (OPP) and (APP) as $z^{OPP}$ and $z^{APP}$ respectively.

Proposition 2. (APP) is a restriction of (OPP). Hence, any point that is feasible in (APP) is feasible in (OPP). Thus, $z^{OPP} \le z^{APP}$.

Proposition 3. The supply constraints $\sum_{k \in K_v} p_{vk} \le 1$ $\forall v \in V$ are redundant in (APP).

Proof.

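Notice that

\[
\sum_{k \in K_v} p_{vk} = \sum_{k \in K_v} y_v\, p_{i(v),k} = \sum_{k \in K_v} \min\left\{1, \frac{1}{\sum_{k' \in K_v} p_{i(v),k'}}\right\} p_{i(v),k} = \min\left\{1, \frac{1}{\sum_{k \in K_v} p_{i(v),k}}\right\} \sum_{k \in K_v} p_{i(v),k} = \min\left\{\sum_{k \in K_v} p_{i(v),k},\, 1\right\} \le 1.
\]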
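The yield and disaggregation formulas can be checked on Example 4 with exact rational arithmetic. The following is a minimal sketch; the function `disaggregate` and the dictionary layout are illustrative choices, not part of the paper.

```python
from fractions import Fraction as F


def disaggregate(weights, block_of, campaigns_of):
    """Convert inventory block weights p_ik into viewer type proportions p_vk.

    weights      : {(i, k): p_ik}
    block_of     : {v: i(v)}
    campaigns_of : {v: list of campaigns K_v targeting v}
    Uses y_v = min(1, 1 / sum_{k in K_v} p_{i(v),k}) and p_vk = y_v * p_{i(v),k}.
    """
    yields, proportions = {}, {}
    for v, ks in campaigns_of.items():
        total = sum(weights[(block_of[v], k)] for k in ks)
        yields[v] = min(F(1), 1 / total) if total > 0 else F(1)
        for k in ks:
            proportions[(v, k)] = yields[v] * weights[(block_of[v], k)]
    return yields, proportions


# Example 4: both viewer types are aggregated into inventory block 1.
weights = {(1, "A"): F(9, 4), (1, "B"): F(3, 4)}
block_of = {"v": 1, "w": 1}
campaigns_of = {"v": ["A", "B"], "w": ["B"]}

yields, proportions = disaggregate(weights, block_of, campaigns_of)
print(yields)
print(proportions)
```

In this instance the proportions assigned to viewer type $v$ sum to exactly 1 even though the block weights sum to 3, which is the redundancy that Proposition 3 describes.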

We introduce the quantities $a_{ik} := \sum_{v \in V_{ik}} s_v y_v^2$, $b_{ik} := \sum_{v \in V_{ik}} s_v y_v$, and $c_{ik} := a_{ik}/b_{ik}^2$ so that we may represent (APP) succinctly. The quantity $c_{ik}$ is a measure of yield variability in the viewer types of inventory block $i$ that campaign $k$ targets. To see this, let $Y_{ik}$ be a random variable that takes value $y_v$ with probability $s_v/s_{ik}$ for all $v \in V_{ik}$. Then $b_{ik}/s_{ik}$ and $a_{ik}/s_{ik}$ are the first and second moments of $Y_{ik}$, respectively. A useful result, which follows directly from the fact that $\mathrm{Var}[Y_{ik}] = E[Y_{ik}^2] - E[Y_{ik}]^2 \ge 0$, is that $c_{ik} \ge 1/s_{ik}$. Note that if all yields are 100% (i.e. $y_v = 1$ $\forall v \in V_{ik}$), then $a_{ik} = b_{ik} = s_{ik}$, and $c_{ik} = 1/s_{ik}$.

Using these definitions, we can write the number of impressions planned for campaign $k$ in inventory block $i$ as

\[
x_{ik} = \sum_{v \in V_{ik}} x_{vk} = \sum_{v \in V_{ik}} s_v p_{vk} = \sum_{v \in V_{ik}} s_v y_v p_{i(v),k} = p_{ik} \sum_{v \in V_{ik}} s_v y_v = b_{ik} p_{ik},
\]

the objective function as

\[
\sum_{k \in K, v \in V_k} c_v x_{vk}^2 = \sum_{k \in K, v \in V_k} s_v p_{vk}^2 = \sum_{k \in K, i \in I_k} \sum_{v \in V_{ik}} s_v (y_v p_{i(v),k})^2 = \sum_{k \in K, i \in I_k} p_{ik}^2 \sum_{v \in V_{ik}} s_v y_v^2 = \sum_{k \in K, i \in I_k} a_{ik} p_{ik}^2 = \sum_{k \in K, i \in I_k} c_{ik} x_{ik}^2,
\]

and the impression goal constraint as

\[
\sum_{v \in V_k} x_{vk} = \sum_{i \in I_k} \sum_{v \in V_{ik}} x_{vk} = \sum_{i \in I_k} x_{ik} = g_k.
\]

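Therefore, as with (OPP), there are two equivalent formulations for (APP): the proportion formulation (APP_PROP), which has decision variables $p_{ik}$, $a_{ik}$, and $b_{ik}$, and the impression formulation (APP_IMP), which has decision variables $x_{ik}$ and $c_{ik}$:

\[
\begin{array}{llll}
(APP_{PROP}) & \min & \sum_{k \in K, i \in I_k} a_{ik} p_{ik}^2 & \\
& \text{s.t.} & \sum_{i \in I_k} b_{ik} p_{ik} = g_k \quad \forall k \in K & \text{(impression goals)} \\
& & p_{ik} \ge 0 \quad \forall k \in K, i \in I_k & \text{(nonnegativity)} \\
& & (p, a, b) \in Y^{PROP} & \text{(nonlinear yield constraints)}
\end{array}
\]

where $Y^{PROP}$ is the set of points $\{p_{ik}, a_{ik}, b_{ik}\ \forall k \in K, i \in I_k\}$ that can be extended to a solution of the system

\[
\begin{array}{ll}
y_v - \min\left\{1, 1 / \sum_{k \in K_v} p_{i(v),k}\right\} = 0 & \forall v \in V \\
a_{ik} - \sum_{v \in V_{ik}} s_v y_v^2 = 0 & \forall k \in K, i \in I_k \\
b_{ik} - \sum_{v \in V_{ik}} s_v y_v = 0 & \forall k \in K, i \in I_k
\end{array}
\]

by picking appropriate values for the variables $y_v$ $\forall v \in V$. Similarly, the impression formulation is

\[
\begin{array}{llll}
(APP_{IMP}) & \min & \sum_{k \in K, i \in I_k} c_{ik} x_{ik}^2 & \\
& \text{s.t.} & \sum_{i \in I_k} x_{ik} = g_k \quad \forall k \in K & \text{(impression goals)} \\
& & x_{ik} \ge 0 \quad \forall k \in K, i \in I_k & \text{(nonnegativity)} \\
& & (x, c) \in Y^{IMP} & \text{(nonlinear yield constraints)}
\end{array}
\]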



where $Y^{IMP}$ is the set of points $\{x_{ik}, c_{ik}\ \forall k \in K, i \in I_k\}$ that can be extended to a solution of the system

\[
\begin{array}{ll}
a_{ik} - \sum_{v \in V_{ik}} s_v y_v^2 = 0 & \forall k \in K, i \in I_k \\
b_{ik} - \sum_{v \in V_{ik}} s_v y_v = 0 & \forall k \in K, i \in I_k \\
c_{ik} - a_{ik}/b_{ik}^2 = 0 & \forall k \in K, i \in I_k \\
y_v - \min\left\{1, 1 / \sum_{k \in K_v} p_{i(v),k}\right\} = 0 & \forall v \in V \\
x_{ik} - b_{ik} p_{ik} = 0 & \forall k \in K, i \in I_k
\end{array}
\]

by picking appropriate values for the variables $y_v$ $\forall v \in V$ and $(p_{ik}, a_{ik}, b_{ik})$ $\forall k \in K, i \in I_k$.

Although the supply constraints $\sum_{k \in K_v} p_{vk} \le 1$ $\forall v \in V$ are not explicitly written, they are in fact satisfied, due to Proposition 3. As well, we can think of (APP) as having solutions in both the inventory block space and the viewer type space, since the disaggregation formula $p_{vk} = y_v p_{i(v),k}$ can be used to express the solution in the viewer type space.

6.2. Solution Approaches

There are two main problems that we consider: solving (APP) when the inventory block partition is fixed, and solving (APP) while simultaneously refining the inventory block partition. For the case in which the partition is fixed, we give a heuristic that finds either a feasible solution with bounded distance from the (OPP) optimum or an infeasible solution with measurable shortfall for each impression goal. For the case where we are allowed to refine the inventory block partition, we give an algorithm that always terminates with an optimal solution to (OPP).

6.2.1. Fixed Inventory Block Partition. Management may insist on using a specific inventory block partition if, for example, the partition for planning ad server execution must coincide with the partition used by the sales team to price and bundle ad inventory. Using a fixed partition may also be desirable if the most accurate method available for estimating viewers' arrival rates depends on a specific partition; see for example Agarwal et al. (2007), which describes a method to estimate impression counts using a prior generated from existing data, which essentially includes a given partition.

We make use of the following family of linear programs, parameterized by the objective coefficients $c = \{c_{ik} : k \in K, i \in I_k\}$. We call this the Auxiliary Transportation Problem:



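\[
\begin{array}{llll}
(AUX(c)) & \min & \sum_{k \in K, i \in I_k} c_{ik} x_{ik}^2 & \\
& \text{s.t.} & \sum_{i \in I_k} x_{ik} = g_k \quad \forall k \in K & \text{(impression goals)} \\
& & \sum_{k \in K_i} x_{ik} \le s_i \quad \forall i \in I & \text{(supply constraints)} \\
& & 0 \le x_{ik} \le s_{ik} \quad \forall k \in K, i \in I_k & \text{(arc flow bounds)}
\end{array}
\]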

Note that (AUX(c)) is equivalent to (APP_IMP) with objective coefficients fixed, nonlinear yield constraints dropped, supply constraints added, and upper bounds on the variables introduced. Although the supply constraints and upper bounds are redundant in (APP_IMP), they are not in (AUX(c)).

Proposition 4. The supply constraints and upper bounds present in (AUX(c)) can be derived from (OPP), thereby proving their validity.

Proof. The supply constraints are aggregated from (OPP_IMP) as follows:

\[
\sum_{k \in K_v} x_{vk} \le s_v \;\Longrightarrow\; \sum_{v \in V_i} \sum_{k \in K_v} x_{vk} \le \sum_{v \in V_i} s_v \;\Longrightarrow\; \sum_{k \in K_i} \sum_{v \in V_{ik}} x_{vk} \le s_i \;\Longrightarrow\; \sum_{k \in K_i} x_{ik} \le s_i.
\]

The upper bound on $x_{ik}$ is derived from (OPP_IMP) as follows:

\[
x_{vk} \le s_v \;\Longrightarrow\; \sum_{v \in V_{ik}} x_{vk} \le \sum_{v \in V_{ik}} s_v \;\Longrightarrow\; x_{ik} \le s_{ik}.
\]

Our heuristic for finding a feasible solution to (APP), called GetCloseAndScaleUp, begins by assuming all yields are 100%; i.e. $y_v = 1$ $\forall v \in V$. Therefore, $a_{ik} = b_{ik} = s_{ik}$ $\forall k \in K, i \in I_k$ and $c_{ik} = 1/s_{ik}$ $\forall k \in K, i \in I_k$. With a slight abuse of notation, we will refer to (AUX(1/s)) as the problem (AUX(c)) with $c_{ik} = 1/s_{ik}$ $\forall k \in K, i \in I_k$. The linear program (AUX(1/s)) is solved to get an impression allocation $x^{AUX(1/s)}$ that is "close" to optimal for (OPP), and is converted to inventory block weights $p^0_{ik} = x^{AUX(1/s)}_{ik} / b_{ik}$. A scaling algorithm (Algorithm 1: ScaleInvBlockWeights) is then used to successively increase the inventory block weights $p^j_{ik}$ at each iteration $j$ until they hopefully converge to a feasible solution for (APP). The inputs for ScaleInvBlockWeights are the inventory block weights $p^0$ from (AUX(1/s)), as well as a threshold value $\psi$ that limits the magnitude any element of $p^j$ can take before the algorithm concludes that $p^j$ is not converging to a feasible solution.



Algorithm 1 ScaleInvBlockWeights
Input(s): $p^0$, $\psi$
Output: $p^j$
1: Initialize the iteration counter at $j = 0$
2: Compute yields $y^j_v := \min\left\{1, 1 / \sum_{k \in K_v} p^j_{i(v),k}\right\}$ $\forall v \in V$
3: Compute $b^j_{ik} := \sum_{v \in V_{ik}} s_v y^j_v$ $\forall k \in K, i \in I_k$
4: Compute scaling factors $f^j_k := g_k / \sum_{i \in I_k} b^j_{ik} p^j_{ik}$ $\forall k \in K$
5: if $f^j_k = 1$ $\forall k \in K$ then
6:     $p^j$ is feasible in (APP_PROP)
7:     return $p^j$
8: else if $\|p^j\|_\infty > \psi$ (i.e. $p^j$ is not converging to a feasible solution) then
9:     $p^j$ is infeasible in (APP_PROP)
10:     return $p^j$
11: else
12:     Set $p^{j+1}_{ik} := f^j_k p^j_{ik}$ $\forall k \in K, i \in I_k$; set $j := j + 1$; and go to step 2
13: end if

ScaleInvBlockWeights terminates with $p^j_{ik} = f^{CUM}_k p^0_{ik}$, where $f^{CUM}_k = f^{j-1}_k f^{j-2}_k \cdots f^1_k f^0_k \ge 1$. When $p^j$ is feasible, the interpretation of $f^{CUM}_k$ is the following: $p^0_{ik}$ is the correct inventory block weight to use if all yields end up being 100%; but since yields are often lower, inventory block weights must be increased by a factor of $f^{CUM}_k$ to compensate.

Note that $\sum_{i \in I_k} b^j_{ik} p^j_{ik} = \sum_{i \in I_k} \sum_{v \in V_{ik}} s_v y^j_v p^j_{ik} = \sum_{v \in V_k} s_v p^j_{vk} = \sum_{v \in V_k} x^j_{vk}$ is the total number of impressions allocated to campaign $k$ at iteration $j$. Therefore, when ScaleInvBlockWeights terminates, we have $\Delta_k = g_k - \sum_{i \in I_k} b^j_{ik} p^j_{ik} \ge 0$ as the number of impressions that campaign $k$ is underallocated. When $p^j$ is infeasible, $\Delta_k > 0$ for at least one campaign $k$. When this happens, management can either choose to execute this plan as-is (i.e. accept some reduction in impression goals) or refine the partition to recover feasibility, as we do in §6.2.2.

Other variants of GetCloseAndScaleUp are also reasonable to consider. In general, this class of iterative algorithm guesses yields $y$, computes yield-dependent parameters $(a, b, c)$, solves (AUX(c)), evaluates the actual yields $y$ and actual yield-dependent parameters $(a, b, c)$, adjusts $(a, b, c)$, and iterates, re-solving (AUX(c)) until an acceptable solution is found.
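The scaling loop of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the dictionary-based data layout, the numerical tolerance `tol` (used in place of the exact test $f^j_k = 1$), and the iteration cap `max_iter` are all assumptions, and every campaign is assumed to start with a positive allocation.

```python
def scale_inv_block_weights(p0, psi, supply, block_of, targets, goals,
                            tol=1e-9, max_iter=10_000):
    """Sketch of ScaleInvBlockWeights; returns (weights, feasible_flag).

    p0       : {(i, k): initial weight p^0_ik}
    psi      : threshold on the largest weight
    supply   : {v: s_v};  block_of : {v: i(v)}
    targets  : {k: set of viewer types V_k};  goals : {k: g_k}
    """
    p = dict(p0)
    campaigns_of = {v: [k for k in targets if v in targets[k]] for v in supply}
    for _ in range(max_iter):
        # Step 2: yields y_v = min(1, 1 / sum_{k in K_v} p_{i(v),k})
        y = {}
        for v, ks in campaigns_of.items():
            load = sum(p[(block_of[v], k)] for k in ks)
            y[v] = min(1.0, 1.0 / load) if load > 0 else 1.0
        # Step 3: b_ik = sum_{v in V_ik} s_v y_v
        b = {key: 0.0 for key in p}
        for k, vs in targets.items():
            for v in vs:
                b[(block_of[v], k)] += supply[v] * y[v]
        # Step 4: scaling factors f_k = g_k / sum_i b_ik p_ik
        served = {k: 0.0 for k in goals}
        for (i, k), w in p.items():
            served[k] += b[(i, k)] * w
        f = {k: goals[k] / served[k] for k in goals}
        if all(abs(fk - 1.0) <= tol for fk in f.values()):   # steps 5-7
            return p, True
        if max(p.values()) > psi:                            # steps 8-10
            return p, False
        p = {(i, k): f[k] * w for (i, k), w in p.items()}    # step 12
    return p, False


# Example 4 with a single inventory block; p0 = x / s_ik with x_A = 3/4, x_B = 1.
p_final, ok = scale_inv_block_weights(
    p0={(1, "A"): 0.75, (1, "B"): 0.5}, psi=10.0,
    supply={"v": 1.0, "w": 1.0}, block_of={"v": 1, "w": 1},
    targets={"A": {"v"}, "B": {"v", "w"}}, goals={"A": 0.75, "B": 1.0})
print(ok, p_final)
```

On this instance the weights approach the aggregate solution of Example 4, $p_{1A} = 9/4$ and $p_{1B} = 3/4$, which suggests the scaling step behaves as described, at least on this small case.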
Note that GetCloseAndScaleUp is the special case with $b^{j+1}_{ik} = b^j_{ik}/f^j_k$, $a^{j+1}_{ik} = a^j_{ik}/(f^j_k)^2$, and $c^{j+1}_{ik} = c^j_{ik} = 1/s_{ik}$.

Regardless of how a feasible solution to (APP) is found, we can always bound its distance from optimality using the optimal value from (AUX(1/s)), as we will now show. Let $z^{AUX(1/s)}$ denote the optimal value of (AUX(1/s)) and $z^{OPP}$ denote the optimal value of (OPP).

Theorem 2. $z^{AUX(1/s)} \le z^{OPP}$.

We show that any optimal dual solution for (AUX(1/s)) is dual feasible in (OPP) with dual objective value $z^{AUX(1/s)}$. Since the dual problem of (OPP) is a maximization problem, $z^{OPP} \ge z^{AUX(1/s)}$.

Proof.

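Using $\eta$, $\gamma$, $\theta$, and $\phi$ as the Lagrange multipliers for the impression goal constraints, supply constraints, variable lower bounds (nonnegativity), and variable upper bounds, respectively, the Lagrangian dual for (AUX(1/s)) is (see Appendix D for the full derivation):

\[
\begin{aligned}
(DAUX(1/s)) \quad \max \quad & -\frac{1}{2} \sum_{k \in K, i \in I_k} s_{ik} \left(-\gamma_i + \eta_k + \theta_{ik} - \phi_{ik}\right)^2 - \sum_{i \in I} s_i \gamma_i + \sum_{k \in K} g_k \eta_k - \sum_{k \in K, i \in I_k} s_{ik} \phi_{ik} \\
\text{s.t.} \quad & \gamma_i \ge 0 \ \forall i \in I, \text{ and } \theta_{ik} \ge 0,\ \phi_{ik} \ge 0 \ \forall k \in K, i \in I_k
\end{aligned}
\]

Similarly, the dual of (OPP_IMP) with redundant variable upper bounds $x_{vk} \le s_v$ $\forall k \in K, v \in V_k$ is:

\[
\begin{aligned}
(DOPP_{IMP}) \quad \max \quad & -\frac{1}{2} \sum_{k \in K, v \in V_k} s_v \left(-\gamma_v + \eta_k + \theta_{vk} - \phi_{vk}\right)^2 - \sum_{v \in V} s_v \gamma_v + \sum_{k \in K} g_k \eta_k - \sum_{k \in K, v \in V_k} s_v \phi_{vk} \\
\text{s.t.} \quad & \gamma_v \ge 0 \ \forall v \in V, \text{ and } \theta_{vk} \ge 0,\ \phi_{vk} \ge 0 \ \forall k \in K, v \in V_k
\end{aligned}
\]

Consider an optimal solution $(\eta^*, \gamma^*, \theta^*, \phi^*)$ of (DAUX(1/s)) which has value $z^{AUX(1/s)}$. Let $\hat{\eta}_k = \eta^*_k$ $\forall k \in K$, $\hat{\gamma}_v = \gamma^*_{i(v)}$ $\forall v \in V$, $\hat{\theta}_{vk} = \theta^*_{i(v),k}$ $\forall k \in K, v \in V_k$, and $\hat{\phi}_{vk} = \phi^*_{i(v),k}$ $\forall k \in K, v \in V_k$. Clearly $(\hat{\eta}, \hat{\gamma}, \hat{\theta}, \hat{\phi})$ is feasible for (DOPP_IMP). Evaluating $(\hat{\eta}, \hat{\gamma}, \hat{\theta}, \hat{\phi})$ in the objective function of (DOPP_IMP), we have:

\[
\begin{aligned}
& -\frac{1}{2} \sum_{k \in K, v \in V_k} s_v \left(-\hat{\gamma}_v + \hat{\eta}_k + \hat{\theta}_{vk} - \hat{\phi}_{vk}\right)^2 - \sum_{v \in V} s_v \hat{\gamma}_v + \sum_{k \in K} g_k \hat{\eta}_k - \sum_{k \in K, v \in V_k} s_v \hat{\phi}_{vk} \\
&= -\frac{1}{2} \sum_{k \in K, i \in I_k} \sum_{v \in V_{ik}} s_v \left(-\gamma^*_{i(v)} + \eta^*_k + \theta^*_{i(v),k} - \phi^*_{i(v),k}\right)^2 - \sum_{i \in I} \sum_{v \in V_i} s_v \gamma^*_{i(v)} + \sum_{k \in K} g_k \eta^*_k - \sum_{k \in K, i \in I_k} \sum_{v \in V_{ik}} s_v \phi^*_{i(v),k} \\
&= -\frac{1}{2} \sum_{k \in K, i \in I_k} \left(-\gamma^*_i + \eta^*_k + \theta^*_{ik} - \phi^*_{ik}\right)^2 \sum_{v \in V_{ik}} s_v - \sum_{i \in I} \gamma^*_i \sum_{v \in V_i} s_v + \sum_{k \in K} g_k \eta^*_k - \sum_{k \in K, i \in I_k} \phi^*_{ik} \sum_{v \in V_{ik}} s_v \\
&= -\frac{1}{2} \sum_{k \in K, i \in I_k} s_{ik} \left(-\gamma^*_i + \eta^*_k + \theta^*_{ik} - \phi^*_{ik}\right)^2 - \sum_{i \in I} s_i \gamma^*_i + \sum_{k \in K} g_k \eta^*_k - \sum_{k \in K, i \in I_k} s_{ik} \phi^*_{ik} \\
&= z^{AUX(1/s)}.
\end{aligned}
\]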
Hence, there exists a dual feasible point with objective value $z^{AUX(1/s)}$ in $(DOPP^{IMP})$. Since $(DOPP^{IMP})$ is a maximization problem, $z^{OPP} \ge z^{AUX(1/s)}$.

Corollary 4. Let $z^{FEAS}$ be the value of a feasible solution to $(OPP)$. Then the optimality gap is bounded: $z^{FEAS} - z^{OPP} \le z^{FEAS} - z^{AUX(1/s)}$.

Corollary 4 is important, because any feasible solution to $(APP)$ is feasible for $(OPP)$, and hence given any feasible aggregate solution, we have a bound on suboptimality that can be computed without solving the disaggregate (original) problem.

6.2.2. Partition Refinement. If we are allowed to refine the inventory block partition as part of the solution process, we can always find an optimal solution to $(OPP)$ by using Algorithm 2: RefinePartitionAndSolve, which successively creates new inventory blocks for groups of viewer types that are overallocated (i.e. have $\sum_{k \in K_v} p_{i(v),k} > 1$). We assume no partition is given, and start with a partition that has a single inventory block which contains all viewer types. Alternatively, we could begin with any partition and run the algorithm from that point forward. We have assumed that $(OPP)$ is feasible; if it is not, RefinePartitionAndSolve will detect infeasibility of $(OPP)$ at Step 4, since eventually $(AUX(1/s))$ will become infeasible. As well, note that we do not need to run RefinePartitionAndSolve to completion; we can always stop early, fix the inventory block partition, and run GetCloseAndScaleUp to get a near-optimal or near-feasible solution.

Algorithm 2 RefinePartitionAndSolve
 1: Initialize the iteration counter at $j = 0$
 2: Initialize the partition to one big inventory block: $I^0 := \{1\}$, $V_1^0 := V$; i.e. $i(v) = 1 \ \forall v \in V$
 3: loop
 4:   Solve $(AUX(1/s))$ with partition $I^j$ to get $x^j$
 5:   Compute inventory block weights $p_{ik}^j := x_{ik}^j / s_{ik} \ \forall k \in K, i \in I_k^j$
 6:   Compute yields $y_v^j := \min\left\{1,\ 1 \big/ \sum_{k \in K_v} p_{i(v),k}^j\right\} \ \forall v \in V$
 7:   For each $i \in I^j$, find the set of overallocated viewer types: $\hat{V}_i^j := \{v \in V_i^j \mid y_v^j < 1\}$
 8:   Let $n^j := |\{i \in I^j \text{ s.t. } \hat{V}_i^j \ne \emptyset\}|$ be the number of inventory blocks with overallocated viewer types
 9:   if $n^j = 0$ then
10:     $p^j$ is optimal in $(OPP)$
11:     return $p^j$
12:   else
13:     Set $I^{j+1} := I^j \cup \{|I^j| + 1, |I^j| + 2, \ldots, |I^j| + n^j\}$ and $m := |I^j| + 1$
14:     for all $i \in I^j$ do
15:       if $\hat{V}_i^j = \emptyset$ then
16:         Keep inventory block $i$ unchanged: $V_i^{j+1} := V_i^j$
17:       else
18:         Split inventory block $i$ in two: $V_i^{j+1} := V_i^j \setminus \hat{V}_i^j$ and $V_m^{j+1} := \hat{V}_i^j$
19:         $m := m + 1$
20:       end if
21:     end for
22:   end if
23:   $j := j + 1$
24: end loop

Theorem 3. The solution $p^j$ returned by RefinePartitionAndSolve is optimal in $(OPP)$.

Proof. Say RefinePartitionAndSolve terminates at iteration $j$ with partition $I^j$. Furthermore, let $(x^j, z^{AUX(1/s)})$ be an optimal solution and corresponding optimal value for $(AUX(1/s))$ under partition $I^j$, as computed in iteration $j$. We now show that $(x^j, s)$ is feasible for problem $(APP^{IMP})$ with partition $I^j$. First, since $(AUX(1/s))$ has impression goal and nonnegativity constraints, those constraints in $(APP^{IMP})$ are satisfied by $x^j$. We just need to verify that $(x^j, s)$ satisfies the nonlinear yield constraints $Y^{IMP}$ of $(APP^{IMP})$. But since RefinePartitionAndSolve always terminates with no viewer types overallocated (i.e. $y_v^j = 1 \ \forall v \in V$), we know that $(x^j, s) \in Y^{IMP}$: to verify, substitute $y_v = 1 \ \forall v \in V$, $a_{ik} = b_{ik} = s_{ik} \ \forall k \in K, i \in I_k$, and $p_{ik} = p_{ik}^j = x_{ik}^j / s_{ik} \ \forall k \in K, i \in I_k$ into $Y^{IMP}$. Thus, $(x^j, s)$ is feasible for problem $(APP^{IMP})$ with partition $I^j$.

Now we evaluate $(x^j, s)$ in the objective of $(APP^{IMP})$. Since $c_{ik} = 1/s_{ik} \ \forall k \in K, i \in I_k$, the objectives of $(AUX(1/s))$ and $(APP^{IMP})$ are identical. Hence, not only is $(x^j, s)$ feasible in $(APP^{IMP})$, but this solution has value $z^{AUX(1/s)}$. Therefore, $z^{APP} \le z^{AUX(1/s)}$. But from Proposition 2 and Theorem 2 we know that for the fixed inventory block partition $I^j$, $z^{AUX(1/s)} \le z^{OPP} \le z^{APP}$. Hence, $z^{AUX(1/s)} = z^{OPP} = z^{APP}$. Therefore, we have shown that the disaggregation of $x^j$ is optimal in $(OPP^{IMP})$; correspondingly, the disaggregation of $p^j$ is optimal in $(OPP^{PROP})$.
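One refinement pass of Algorithm 2 (Steps 5 to 21, taking the aggregate solution from Step 4 as given) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `refine_partition`, the dictionary-based data layout, and the toy instance are hypothetical, and $s_{ik}$ is read as the total supply of the viewer types in block $i$ that are eligible for campaign $k$. Solving $(AUX(1/s))$ itself requires a quadratic programming solver and is not shown.

```python
from fractions import Fraction


def refine_partition(blocks, campaigns_for_viewer, supply, x):
    """One pass of Steps 5-21 of Algorithm 2 (hypothetical data layout).

    blocks               -- list of sets; blocks[i] is V_i, the viewer types in block i
    campaigns_for_viewer -- dict: viewer type v -> set K_v of eligible campaigns
    supply               -- dict: viewer type v -> s_v (integers or Fractions)
    x                    -- dict: (i, k) -> aggregate impressions x_ik from Step 4
    Returns (new_blocks, n), where n is the number of blocks that were split.
    """
    kept, split_off = [], []
    for i, block in enumerate(blocks):

        def weight(k):
            # Step 5: p_ik = x_ik / s_ik. Assumption: s_ik is the supply of the
            # viewer types in block i that campaign k may be shown to.
            s_ik = sum(supply[v] for v in block if k in campaigns_for_viewer[v])
            return Fraction(x.get((i, k), 0), s_ik)

        overallocated = set()
        for v in block:
            total = sum(weight(k) for k in campaigns_for_viewer[v])
            # Step 6: y_v = min(1, 1 / sum_k p_{i(v),k}); no load means y_v = 1.
            y_v = min(Fraction(1), 1 / total) if total > 0 else Fraction(1)
            if y_v < 1:  # Step 7: v is overallocated
                overallocated.add(v)

        if overallocated:  # Step 18: split block i in two
            kept.append(block - overallocated)
            split_off.append(overallocated)
        else:  # Step 16: keep block i unchanged
            kept.append(set(block))

    # Newly created blocks are appended after the existing ones (Step 13).
    return kept + split_off, len(split_off)


# Toy instance: one block, three viewer types, two campaigns.
blocks = [{"a", "b", "c"}]
campaigns_for_viewer = {"a": {1}, "b": {1, 2}, "c": {2}}
supply = {"a": 10, "b": 10, "c": 10}
x = {(0, 1): 12, (0, 2): 12}

new_blocks, n_split = refine_partition(blocks, campaigns_for_viewer, supply, x)
```

In this toy instance each campaign has weight 12/20 = 3/5 in the single block, so viewer type `b`, which is eligible for both campaigns, carries total weight 6/5 > 1 and is moved into its own block, while `a` and `c` stay together.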

7. Conclusions

Among the media vehicles that can be considered Guaranteed Targeted Display Advertising, allocating impressions from campaigns to audience segments can be done with a transportation problem with quadratic objective. Models of audience uncertainty, forecast error, and the random slotting of the ad server were used to derive sufficient conditions for when the quadratic objective minimizes the variance of the number of impressions served and maximizes expected reach, so that ad managers can understand if and why their ad server is optimal with respect to variance and reach. In addition, we studied the aggregation of the viewer type space and gave two algorithms to solve the original large planning problem: GetCloseAndScaleUp, which assumes a fixed partition of the viewer type space, and attempts to find a feasible solution with bounded optimality gap; and RefinePartitionAndSolve, which successively refines the partition at each iteration, terminating with an optimal solution if one exists.

References

Abrams, Z., S. S. Keerthi, O. Mendelevitch, J. A. Tomlin. 2008. Ad Delivery with Budgeted Advertisers: A Comprehensive LP Approach. Journal of Electronic Commerce Research 9(1).
Adler, M., P. B. Gibbons, Y. Matias. 2002. Scheduling space-sharing for internet advertising. Journal of Scheduling 5(2) 103–119.
Agarwal, D., A. Z. Broder, D. Chakrabarti, D. Diklic, V. Josifovski, M. Sayyadian. 2007. Estimating rates of rare events at multiple resolutions. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM New York, NY, USA, 16–25.
Araman, V. F., K. Fridgeirsdottir. 2008. Online Advertising: Revenue Management Approach. Working paper. London Business School.
Arango, T. 2008. Cable firms join forces to attract focused ads. The New York Times March 10.
Birge, J. R. 1985. Aggregation bounds in stochastic linear programming. Mathematical Programming 31(1) 25–41.
Bollapragada, S., H. Cheng, M. Phillips, M. Garbiras, M. Scholes, T. Gibbs, M. Humphreville. 2002. NBC's optimization systems increase its revenues and productivity. Interfaces 32(1) 47–60.
Chlebus, E., J. Brazier. 2007. Nonstationary Poisson modeling of web browsing session arrivals. Information Processing Letters 102(5) 187–190.
Dawande, M., S. Kumar, C. Sriskandarajah. 2003. Performance bounds of algorithms for scheduling advertisements on a web page. Journal of Scheduling 6(4) 373–394.
De Reyck, B., Z. Degraeve. 2003. Broadcast scheduling for mobile advertising. Operations Research 51(4) 509–517.
Edelman, B., M. Ostrovsky, M. Schwarz. 2007. Internet advertising and the generalized second-price auction: selling billions of dollars worth of keywords. American Economic Review 97(1) 242–259.
Florenzano, M., C. Le Van, P. Gourdel. 2001. Finite dimensional convexity and optimization. Springer Verlag.
Fridgeirsdottir, K., S. Najafi-Asadolahi. 2008. Revenue management for online advertising: impatient advertisers. Working paper. London Business School.
Hallerman, D. 2009. U. S. advertising spending: the new reality. eMarketer May 2009.
Hiriart-Urruty, J. B., C. Lemaréchal. 2001. Fundamentals of convex analysis. Springer.
Interactive Advertising Bureau. 2009. 2008 IAB Internet Advertising Revenue Report. http://www.iab.net/media/file/IAB_PwC_2008_full_year.pdf.
Kumar, S., V. S. Jacob, C. Sriskandarajah. 2006. Scheduling advertisements on a web page to maximize revenue. European Journal of Operational Research 173(3) 1067–1089.
Langheinrich, M., A. Nakamura, N. Abe, T. Kamba, Y. Koseki. 1999. Unintrusive customization techniques for Web advertising. Computer Networks: The International Journal of Computer and Telecommunications Networking 31(11) 1259–1272.
Leisten, R. 1997. A posteriori error bounds in linear programming aggregation. Computers and Operations Research 24(1) 1–16.
Litvinchev, I., V. Tsurkov. 2003. Aggregation in large-scale optimization. Kluwer Academic Publishers.
Litvinchev, I. S., S. Rangel. 2006. Using error bounds to compare aggregated generalized transportation models. Annals of Operations Research 146(1) 119–134.
Liu, Z., N. Niclausse, C. Jalpa-Villanueva. 2001. Traffic model and performance evaluation of Web servers. Performance Evaluation 46(2-3) 77–100.
Mandel, C. 2007. Eye-catching glance at the future. Globe and Mail June 28.
Mehta, A., A. Saberi, U. Vazirani, V. Vazirani. 2007. Adwords and generalized online matching. Journal of the ACM 54(5).
Menon, S., A. Amiri. 2004. Scheduling banner advertisements on the web. INFORMS Journal on Computing 16(1).
Nakamura, A., N. Abe. 2005. Improvements to the linear programming based scheduling of web advertisements: world wide web electronic commerce, security and privacy. Electronic Commerce Research 5(1) 75–98.
Roels, G., K. Fridgeirsdottir. 2008. Dynamic revenue management for online display advertising. UCLA Working Paper GR10. Available at http://repositories.cdlib.org/anderson/dotm/GR10.
Rogers, D. F., R. D. Plante, R. T. Wong, J. R. Evans. 1991. Aggregation and disaggregation techniques and methodology in optimization. Operations Research 553–582.
Surmanek, J. 1995. Media planning: a practical guide. McGraw-Hill.
Tomlin, J. A. 2000. An entropy approach to unintrusive targeted advertising on the Web. Computer Networks 33(1-6) 767–774.
Turner, J. 2010. Ad slotting & pricing: new media planning models for new media. Ph.D. thesis, Tepper School of Business, Carnegie Mellon University. Unpublished - in progress.
Turner, J., A. Scheller-Wolf, S. Tayur. 2008. Scheduling of dynamic in-game advertising. Submitted to Operations Research.
Vakhutinsky, I. Y., L. M. Dudkin, A. A. Ryvkin. 1979. Iterative aggregation: a new approach to the solution of large-scale problems. Econometrica 821–841.
Wright, S. E. 1994. Primal-dual aggregation and disaggregation for stochastic linear programs. Mathematics of Operations Research 893–908.
Zipkin, P. H. 1980a. Bounds for aggregating nodes in network problems. Mathematical Programming 19(1) 155–177.
Zipkin, P. H. 1980b. Bounds for row-aggregation in linear programming. Operations Research 903–916.
Zipkin, P. H. 1980c. Bounds on the effect of aggregating variables in linear programs. Operations Research 403–418.
Zipkin, P. H. 1982. Aggregation and disaggregation in convex network problems. Networks 12(2).
Zipkin, P. H., K. Raimer. 1983. An improved disaggregation method for transportation problems. Mathematical Programming 26(2) 238–242.

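Appendix A: Derivations for Example 2

In §3.4 we presented a specific example of ad server execution, and claimed that $E[F^r_{vk}] = p_{vk}$ and $\alpha_v(p) = \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) + \frac{\lfloor np \rfloor^2}{n^2}$. These derivations are included here. For clarity, we take $p \equiv p_{vk}$.

\[
\begin{aligned}
E[F^r_{vk}] &= \frac{\lfloor np \rfloor}{n}\left(\lfloor np \rfloor + 1 - np\right) + \frac{\lceil np \rceil}{n}\left(np - \lfloor np \rfloor\right) \\
&= \frac{\lfloor np \rfloor}{n}\left(\lfloor np \rfloor + 1 - np\right) + \frac{\lfloor np \rfloor + 1}{n}\left(np - \lfloor np \rfloor\right) \\
&= \frac{\lfloor np \rfloor}{n}\left(\lfloor np \rfloor + 1 - np + np - \lfloor np \rfloor\right) + \frac{1}{n}\left(np - \lfloor np \rfloor\right) \\
&= \frac{\lfloor np \rfloor}{n}(1) + \frac{1}{n}\left(np - \lfloor np \rfloor\right) \\
&= \frac{\lfloor np \rfloor}{n} + p - \frac{\lfloor np \rfloor}{n} \\
&= p
\end{aligned}
\]

\[
\begin{aligned}
\alpha(p) := E[(F^r_{vk})^2] &= \left(\frac{\lfloor np \rfloor}{n}\right)^2 \left(\lfloor np \rfloor + 1 - np\right) + \left(\frac{\lfloor np \rfloor + 1}{n}\right)^2 \left(np - \lfloor np \rfloor\right) \\
&= \left(\frac{\lfloor np \rfloor}{n}\right)^2 \left(\lfloor np \rfloor + 1 - np\right) + \left(\left(\frac{\lfloor np \rfloor}{n}\right)^2 + \frac{2\lfloor np \rfloor}{n^2} + \frac{1}{n^2}\right)\left(np - \lfloor np \rfloor\right) \\
&= \left(\frac{\lfloor np \rfloor}{n}\right)^2 \left(\lfloor np \rfloor + 1 - np + np - \lfloor np \rfloor\right) + \left(\frac{2\lfloor np \rfloor}{n^2} + \frac{1}{n^2}\right)\left(np - \lfloor np \rfloor\right) \\
&= \left(\frac{\lfloor np \rfloor}{n}\right)^2 (1) + \left(\frac{2\lfloor np \rfloor}{n^2} + \frac{1}{n^2}\right)\left(np - \lfloor np \rfloor\right) \\
&= \frac{\lfloor np \rfloor^2}{n^2} + \frac{2\lfloor np \rfloor p}{n} - \frac{2\lfloor np \rfloor^2}{n^2} + \frac{p}{n} - \frac{\lfloor np \rfloor}{n^2} \\
&= \frac{2\lfloor np \rfloor p}{n} - \frac{\lfloor np \rfloor^2}{n^2} + \frac{p}{n} - \frac{\lfloor np \rfloor}{n^2} \\
&= \frac{2\lfloor np \rfloor + 1}{n}\, p - \frac{\lfloor np \rfloor^2 + \lfloor np \rfloor}{n^2} \\
&= \frac{2\lfloor np \rfloor + 1}{n}\, p - \frac{\lfloor np \rfloor\left(\lfloor np \rfloor + 1\right)}{n^2} \\
&= \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) - \frac{\lfloor np \rfloor\left(\lfloor np \rfloor + 1\right)}{n^2} + \frac{2\lfloor np \rfloor + 1}{n}\left(\frac{\lfloor np \rfloor}{n}\right) \\
&= \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) - \frac{\lfloor np \rfloor\left(\lfloor np \rfloor + 1\right)}{n^2} + \frac{2\lfloor np \rfloor^2}{n^2} + \frac{\lfloor np \rfloor}{n^2} \\
&= \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) - \frac{\lfloor np \rfloor^2 + \lfloor np \rfloor - 2\lfloor np \rfloor^2 - \lfloor np \rfloor}{n^2} \\
&= \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) + \frac{\lfloor np \rfloor^2}{n^2}
\end{aligned}
\]

Furthermore, $\alpha(p) = \frac{2\lfloor np \rfloor + 1}{n}\left(p - \frac{\lfloor np \rfloor}{n}\right) + \frac{\lfloor np \rfloor^2}{n^2}$ is convex in $p$.

Proof. We show that $\alpha(p)$ is continuous and piecewise-linear, with segment slopes that increase in $p$ (and hence convex). Let $\alpha_j(p) = \frac{2j+1}{n}\left(p - \frac{j}{n}\right) + \frac{j^2}{n^2}$, $j \in \{0..n-1\}$. Then $\alpha(p) = \alpha_j(p)$ when $p \in \left[\frac{j}{n}, \frac{j+1}{n}\right)$. Each segment $\alpha_j(p)$ is linear in $p$. As well, $\alpha(p)$ is continuous: at the point $p = \frac{j+1}{n}$ between segments $j$ and $j+1$, we have $\lim_{p \uparrow \frac{j+1}{n}} \alpha_j(p) = \lim_{p \downarrow \frac{j+1}{n}} \alpha_{j+1}(p) = \alpha\left(\frac{j+1}{n}\right) = \frac{(j+1)^2}{n^2}$, since

\[
\begin{aligned}
\lim_{p \uparrow \frac{j+1}{n}} \alpha_j(p) &= \lim_{p \uparrow \frac{j+1}{n}} \frac{2j+1}{n}\left(p - \frac{j}{n}\right) + \frac{j^2}{n^2} \\
&= \frac{2j+1}{n}\left(\frac{j+1}{n} - \frac{j}{n}\right) + \frac{j^2}{n^2} \\
&= \frac{2j+1}{n}\left(\frac{1}{n}\right) + \frac{j^2}{n^2} \\
&= \frac{2j + 1 + j^2}{n^2} \\
&= \frac{(j+1)^2}{n^2} \\
&= \frac{2(j+1)+1}{n}\left(\frac{j+1}{n} - \frac{j+1}{n}\right) + \frac{(j+1)^2}{n^2} \\
&= \lim_{p \downarrow \frac{j+1}{n}} \frac{2(j+1)+1}{n}\left(p - \frac{j+1}{n}\right) + \frac{(j+1)^2}{n^2} \\
&= \lim_{p \downarrow \frac{j+1}{n}} \alpha_{j+1}(p).
\end{aligned}
\]

Finally, the slope of segment $j$ is $\frac{2j+1}{n}$, which is positive and increasing in $j$; thus $\alpha(p)$ is increasing and convex in $p$.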
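The identities derived in this appendix can be checked with exact rational arithmetic. The sketch below is illustrative only: the helper names `slot_distribution` and `alpha`, the choice $n = 4$, and the grid of $p$ values are assumptions made for the example. It encodes the two-point distribution implied by the first line of each derivation (value $\lfloor np \rfloor / n$ with probability $\lfloor np \rfloor + 1 - np$, value $(\lfloor np \rfloor + 1)/n$ with probability $np - \lfloor np \rfloor$) and compares its first two moments with $p$ and $\alpha(p)$.

```python
from fractions import Fraction
from math import floor


def slot_distribution(n, p):
    """(value, probability) pairs of the two-point distribution used in the derivation."""
    lo = floor(n * p)
    frac = n * p - lo  # np - floor(np)
    return [(Fraction(lo, n), 1 - frac), (Fraction(lo + 1, n), frac)]


def alpha(n, p):
    """Closed form derived above for the second moment."""
    lo = floor(n * p)
    return Fraction(2 * lo + 1, n) * (p - Fraction(lo, n)) + Fraction(lo * lo, n * n)


n = 4  # assumed value, for illustration only
grid = [Fraction(i, 40) for i in range(41)]  # p = 0, 1/40, ..., 1

first = [sum(v * q for v, q in slot_distribution(n, p)) for p in grid]
second = [sum(v * v * q for v, q in slot_distribution(n, p)) for p in grid]

mean_ok = all(m == p for m, p in zip(first, grid))
second_ok = all(m2 == alpha(n, p) for m2, p in zip(second, grid))
# Midpoint convexity on the uniform grid: alpha(p_i) <= (alpha(p_{i-1}) + alpha(p_{i+1})) / 2.
convex_ok = all(
    2 * alpha(n, grid[i]) <= alpha(n, grid[i - 1]) + alpha(n, grid[i + 1])
    for i in range(1, len(grid) - 1)
)
```

Using `Fraction` avoids floating-point rounding, so the comparisons are exact equalities rather than tolerance checks.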

Appendix B: Derivations for the Mean and Variance of $X_{vk}$

We require the following technical lemmas:

Lemma 1. If $Y = \sum_{n=1..N} X_n$ is the sum of a random number of i.i.d. random variables, where the $X_n$'s and $N$ are mutually independent, then from first principles we have $E[Y] = E[N]E[X]$ and $\mathrm{Var}[Y] = E[N]E[X^2] + E[X]^2(\mathrm{Var}[N] - E[N])$.

Lemma 2. If $M \sim \mathrm{Poisson}(L)$, where $L$ is a random variable, then $\mathrm{Var}[M] = \mathrm{Var}[L] + E[L]$.

Proof.
\[
\begin{aligned}
\mathrm{Var}[M] &= \mathrm{Var}[E[M|L]] + E[\mathrm{Var}[M|L]] && \text{by the law of total variance} \\
&= \mathrm{Var}[L] + E[L] && \text{since } E[M|L] = \mathrm{Var}[M|L] = L \text{ for } M \sim \mathrm{Poisson}(L).
\end{aligned}
\]

These quantities are used in the main derivations:

Lemma 3. $E[F^r_{vk} Y^r_v] = \mu_v p_{vk}$.

Proof. $E[F^r_{vk}] = p_{vk}$, $E[Y^r_v] = \mu_v$, and $E[F^r_{vk} Y^r_v] = E[F^r_{vk}]E[Y^r_v]$ since $F^r_{vk}$ and $Y^r_v$ are independent.

Lemma 4. $E[(F^r_{vk} Y^r_v)^2] = (\sigma_v^2 + \mu_v^2)\alpha_v(p_{vk})$.

Proof. $E[(F^r_{vk})^2] = \alpha_v(p_{vk})$, $E[(Y^r_v)^2] = \sigma_v^2 + \mu_v^2$, and $E[(F^r_{vk} Y^r_v)^2] = E[(F^r_{vk})^2]E[(Y^r_v)^2]$ since $F^r_{vk}$ and $Y^r_v$ are independent.
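The random-sum formulas of Lemma 1 can be checked by exact enumeration on a small example. The sketch below is illustrative only: the helper names and the two toy distributions for $N$ and $X$ are assumptions, not quantities from the paper. It builds the distribution of $Y$ directly and compares its mean and variance with the right-hand sides of Lemma 1.

```python
from fractions import Fraction
from itertools import product


def moments(dist):
    """Return (mean, second moment, variance) of a finite distribution {value: prob}."""
    mean = sum(v * p for v, p in dist.items())
    second = sum(v * v * p for v, p in dist.items())
    return mean, second, second - mean ** 2


def compound_distribution(n_dist, x_dist):
    """Distribution of Y = X_1 + ... + X_N with N ~ n_dist and i.i.d. X_n ~ x_dist."""
    y_dist = {}
    for n, p_n in n_dist.items():
        # Enumerate every outcome of (X_1, ..., X_n); n = 0 gives the empty sum Y = 0.
        for draws in product(x_dist.items(), repeat=n):
            total = sum(value for value, _ in draws)
            prob = p_n
            for _, p_x in draws:
                prob *= p_x
            y_dist[total] = y_dist.get(total, 0) + prob
    return y_dist


# Toy distributions (assumed for illustration).
n_dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
x_dist = {1: Fraction(1, 3), 2: Fraction(2, 3)}

e_n, _, var_n = moments(n_dist)
e_x, e_x2, _ = moments(x_dist)
e_y, _, var_y = moments(compound_distribution(n_dist, x_dist))

lemma1_mean = e_n * e_x
lemma1_var = e_n * e_x2 + e_x ** 2 * (var_n - e_n)
```

Here `e_y` and `var_y` come from the enumerated distribution of $Y$, while `lemma1_mean` and `lemma1_var` come from the closed forms; with rational arithmetic the two pairs agree exactly.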

We now derive the mean and variance of the number of impressions served to campaign $k$ from viewer type $v$.

Lemma 5. The expected number of impressions served to campaign $k$ from viewer type $v$ is $E[X_{vk}] = \lambda_v \mu_v p_{vk} = s_v p_{vk}$.

Proof. Taking the expectation of Equation (7), we get:
\[
\begin{aligned}
E[X_{vk}] &= E\left[\sum_{r=1..M_v} F^r_{vk} Y^r_v\right] \\
&= E[M_v]\, E[F^r_{vk} Y^r_v] && \text{by Lemma 1} \\
&= E[M_v]\, \mu_v p_{vk} && \text{by Lemma 3} \\
&= \lambda_v \mu_v p_{vk} = s_v p_{vk} && \text{by iterating the expectation } E[M_v] = E[\Lambda_v] = \lambda_v.
\end{aligned}
\]



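Lemma 6. The variance of the number of impressions served to campaign $k$ from viewer type $v$ is $\mathrm{Var}[X_{vk}] = (\sigma_v^2 + \mu_v^2)\lambda_v \alpha_v(p_{vk}) + \mu_v^2 p_{vk}^2 \mathrm{Var}[\Lambda_v]$.

Proof. Taking the variance of Equation (7), we get:
\[
\begin{aligned}
\mathrm{Var}[X_{vk}] &= \mathrm{Var}\left[\sum_{r=1..M_v} F^r_{vk} Y^r_v\right] \\
&= E[M_v]\, E[(F^r_{vk} Y^r_v)^2] + E[F^r_{vk} Y^r_v]^2 \left(\mathrm{Var}[M_v] - E[M_v]\right) && \text{by Lemma 1} \\
&= E[M_v]\, E[(F^r_{vk} Y^r_v)^2] + E[F^r_{vk} Y^r_v]^2 \left(\mathrm{Var}[\Lambda_v] + E[\Lambda_v] - E[M_v]\right) && \text{by Lemma 2} \\
&= E[\Lambda_v]\, E[(F^r_{vk} Y^r_v)^2] + E[F^r_{vk} Y^r_v]^2 \left(\mathrm{Var}[\Lambda_v] + E[\Lambda_v] - E[\Lambda_v]\right) \\
&= \lambda_v E[(F^r_{vk} Y^r_v)^2] + E[F^r_{vk} Y^r_v]^2\, \mathrm{Var}[\Lambda_v] \\
&= \lambda_v (\sigma_v^2 + \mu_v^2)\alpha_v(p_{vk}) + \mu_v^2 p_{vk}^2 \mathrm{Var}[\Lambda_v] && \text{by Lemmas 3 and 4.}
\end{aligned}
\]

Appendix C: Optimality Results for the Equal-Proportion Allocation

Lemma 7. Let $h : \mathbb{R} \to \mathbb{R}$ be a (possibly nondifferentiable) function, $x \in \mathbb{R}^n$, and $s \in \mathbb{R}^n$ with $s_i \ge 0 \ \forall i = 1..n$. Consider the function $f(x) = \sum_{i=1..n} s_i h(x_i)$. The set $D(x) = \{d \in \mathbb{R}^n : d_i = s_i t_i,\ t_i \in \partial h(x_i),\ i = 1..n\}$ is contained in the subdifferential of $f$ at $x$; i.e. $D(x) \subseteq \partial f(x)$.

Proof. Given a point $v \in \mathbb{R}$, by the definition of subdifferential (Hiriart-Urruty and Lemaréchal 2001) we have: $h(w) - h(v) \ge t(w - v) \ \forall w \in \mathbb{R}, \forall t \in \partial h(v)$. Hence, by substitution and simplification:
\[
\begin{aligned}
& s_i [h(y_i) - h(x_i)] \ge s_i t_i (y_i - x_i) \quad \forall y_i \in \mathbb{R},\ \forall t_i \in \partial h(x_i),\ \forall i = 1..n \\
\implies\ & \sum_{i=1..n} s_i h(y_i) - \sum_{i=1..n} s_i h(x_i) \ge \sum_{i=1..n} s_i t_i (y_i - x_i) \\
\implies\ & f(y) - f(x) \ge d^T (y - x),
\end{aligned}
\]
where the last inequality holds for all $y \in \mathbb{R}^n$, and all $d \in D(x)$, where $D(x) = \{d \in \mathbb{R}^n : d_i = s_i t_i,\ t_i \in \partial h(x_i),\ i = 1..n\}$. Thus, all vectors $d \in D(x)$ are subgradients of $f$ at $x$. So $D(x) \subseteq \partial f(x)$.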
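Lemma 7 can be illustrated numerically with a nondifferentiable choice of $h$. The sketch below is an example under stated assumptions, not part of the paper's argument: it takes $h(x) = |x|$ (whose subdifferential is $\{\mathrm{sign}(x)\}$ for $x \ne 0$ and $[-1, 1]$ at $x = 0$), picks hypothetical weights `s`, a point `x`, and subgradients `t`, and checks the subgradient inequality $f(y) - f(x) \ge d^T(y - x)$ on a grid of test points `y`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical data: weights s_i >= 0, a point x, and t_i in the subdifferential of |.| at x_i.
s = [Fraction(2), Fraction(0), Fraction(3)]
x = [Fraction(0), Fraction(3, 2), Fraction(-2)]
t = [Fraction(3, 10), Fraction(1), Fraction(-1)]  # 3/10 lies in [-1, 1]; the others are signs


def f(point):
    """f(x) = sum_i s_i * h(x_i) with h = absolute value."""
    return sum(s_i * abs(p_i) for s_i, p_i in zip(s, point))


# Candidate subgradient from Lemma 7: d_i = s_i * t_i.
d = [s_i * t_i for s_i, t_i in zip(s, t)]


def subgradient_gap(y):
    """f(y) - f(x) - d^T (y - x); nonnegative for every y when d is a subgradient at x."""
    return f(y) - f(x) - sum(d_i * (y_i - x_i) for d_i, y_i, x_i in zip(d, y, x))


grid = [Fraction(k, 2) for k in range(-6, 7)]  # -3, -5/2, ..., 3
min_gap = min(subgradient_gap(y) for y in product(grid, repeat=3))
```

The smallest gap over the grid is attained at $y = x$, where it is zero; a negative value anywhere would contradict the lemma.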



Lemma 8. Consider a convex optimization problem of the form:
\[
\min\ f_0(x) \quad \text{s.t.} \quad f_k(x) = 0 \quad \forall k = 1..m,
\]
where $x \in \mathbb{R}^n$, $f_0 : \mathbb{R}^n \to \mathbb{R}$ is a convex (possibly nondifferentiable) function, and $f_k : \mathbb{R}^n \to \mathbb{R}$ are linear functions for all $k = 1..m$. Then $(x^*, \eta^*)$ is a primal-dual optimal solution iff the following KKT conditions hold (Florenzano et al. 2001):
\[
\begin{aligned}
&\text{(Stationarity)} && 0 \in \partial f_0(x^*) + \sum_{k=1..m} \eta_k^* \nabla f_k(x^*) \\
&\text{(Primal feasibility)} && f_k(x^*) = 0 \quad \forall k = 1..m,
\end{aligned}
\]
where $\partial f_0(x)$ is the subdifferential of $f_0$ at $x$. Note that we do not need to explicitly assume the Slater condition, since it is equivalent to primal feasibility for this problem class. This follows from the fact that the domain of $f_0$ is open and $f_k$ is linear for all $k = 1..m$.

Theorem 4. Consider the objective function $f(p) = \sum_{k \in K, v \in V_k} s_v h(p_{vk})$, where $h : \mathbb{R} \to \mathbb{R}$ is convex (but possibly nondifferentiable). The equal-proportion allocation $p = q$ is optimal for the optimization problem:
\[
\begin{aligned}
(P1) \quad \min\ & f(p) \\
\text{s.t.}\ & \sum_{v \in V_k} s_v p_{vk} = g_k && \forall k \in K && \text{(impression goals)} \\
& p_{vk} \ge 0 && \forall k \in K,\ \forall v \in V_k && \text{(non-negativity)}
\end{aligned}
\]

Proof.

We show that $p = q$ is optimal for the following optimization problem:
\[
(P2) \quad \min\ f_0(p) \quad \text{s.t.} \quad f_k(p) = 0 \quad \forall k \in K,
\]
where $f_0 \equiv f$, and $f_k(p) = \sum_{v \in V_k} s_v p_{vk} - g_k$. This is exactly $(P1)$ with the non-negativity constraints dropped. Since $q \ge 0$, if $p = q$ is optimal for $(P2)$, it is also optimal for $(P1)$.

Throughout this proof, $p$ is the vector of planned proportions $\{p_{vk} : k \in K, v \in V_k\}$ with entries ordered as follows: Let $p_k = [p_{v_1,k}, p_{v_2,k}, \ldots, p_{v_m,k}]^T$ where $v_j \in V_k$, $j = 1..m$ indexes the set $V_k$. Then $p = [p_1, p_2, \ldots, p_{|K|}]^T$ is a column vector with $n = \sum_{k \in K} |V_k|$ entries. We also write $\eta = \{\eta_k : k \in K\}$. By Lemma 8, $(p^*, \eta^*)$ is primal-dual optimal for $(P2)$ iff the following KKT conditions hold:
\[
\begin{aligned}
&\text{(Stationarity)} && 0 \in \partial f_0(p^*) + \sum_{k \in K} \eta_k^* \nabla f_k(p^*) \\
&\text{(Primal feasibility)} && f_k(p^*) = 0 \quad \forall k \in K
\end{aligned}
\]

Define the vector $s^k = [s_{v_1}, s_{v_2}, \ldots, s_{v_m}]^T$ where $v_j \in V_k$, $j = 1..m$ indexes the set $V_k$. Given that the partial derivative is
\[
\frac{\partial f_k}{\partial p_{vk}} = \begin{cases} s_v & \text{if } v \in V_k \\ 0 & \text{otherwise,} \end{cases}
\]
we have $\nabla f_k(p) = [0, \ldots, 0, (s^k)^T, 0, \ldots, 0]^T$. The stationarity condition is therefore:
\[
0 \in \partial f_0(p^*) + [\eta_1^* (s^1)^T, \eta_2^* (s^2)^T, \ldots, \eta_{|K|}^* (s^{|K|})^T]^T.
\]
By Lemma 7, the following condition implies the stationarity condition:
\[
0 \in D(p^*) + [\eta_1^* (s^1)^T, \eta_2^* (s^2)^T, \ldots, \eta_{|K|}^* (s^{|K|})^T]^T, \qquad (17)
\]
where $D(p) = \{d \in \mathbb{R}^n : d_{vk} = s_v t_{vk},\ t_{vk} \in \partial h(p_{vk}),\ k \in K, v \in V_k\}$. Since a convex function has at least one subgradient at each point in the interior of its domain, we can arbitrarily pick a point $t = \{t_1, t_2, \ldots, t_{|K|}\}$ where $t_k \in \partial h(q_k) \ \forall k \in K$. Since $d = [t_1 (s^1)^T, t_2 (s^2)^T, \ldots, t_{|K|} (s^{|K|})^T]^T \in D(q)$, condition (17) is satisfied by the primal-dual solution $(p^*, \eta^*) = (q, -t)$. Finally, we verify that $f_k(q) = 0 \ \forall k \in K$; i.e. that $q$ is primal feasible:
\[
f_k(q) = \left(\sum_{v \in V_k} s_v q_k\right) - g_k = q_k \left(\sum_{v \in V_k} s_v\right) - g_k = \left(g_k \Big/ \sum_{v \in V_k} s_v\right) \left(\sum_{v \in V_k} s_v\right) - g_k = g_k - g_k = 0.
\]
Thus, the KKT conditions are satisfied for $p^* = q$; hence, the equal-proportion allocation is optimal.

Appendix D: Quadratic Transportation Problem Duality

Consider the following transportation problem with quadratic objective. Each source $s \in S$ is connected to the set of sinks $t \in T_s$; likewise, each sink $t \in T$ is connected to the set of sources $s \in S_t$. The cost of transporting $x_{st}$ units from source $s$ to sink $t$ is $c_{st} x_{st}^2$, where we assume $c_{st} > 0$. Source $s$ can supply up to $a_s$ units, and sink $t$ demands exactly $b_t$ units. The flow $x_{st}$ on arc $(s, t)$ is limited to the upper bound $d_{st}$. This transportation problem is written:
\[
\begin{aligned}
\min\ & \frac{1}{2} \sum_{s \in S, t \in T_s} c_{st} x_{st}^2 \\
\text{s.t.}\ & \sum_{t \in T_s} x_{st} \le a_s && \forall s \in S \\
& \sum_{s \in S_t} x_{st} = b_t && \forall t \in T \\
& 0 \le x_{st} \le d_{st} && \forall s \in S, t \in T_s
\end{aligned}
\]



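In standard form, the problem is:
\[
\begin{aligned}
\min\ & \frac{1}{2} \sum_{s \in S, t \in T_s} c_{st} x_{st}^2 && && \text{Dual Vars} \\
\text{s.t.}\ & \sum_{t \in T_s} x_{st} - a_s \le 0 && \forall s \in S && \ldots u_s \\
& b_t - \sum_{s \in S_t} x_{st} = 0 && \forall t \in T && \ldots v_t \\
& -x_{st} \le 0 && \forall s \in S, t \in T_s && \ldots w_{st} \\
& x_{st} - d_{st} \le 0 && \forall s \in S, t \in T_s && \ldots z_{st}
\end{aligned}
\]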

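The Lagrangian is therefore:
\[
\begin{aligned}
L(x, u, v, w, z) &= \frac{1}{2} \sum_{s \in S, t \in T_s} c_{st} x_{st}^2 + \sum_{s \in S} \left(\sum_{t \in T_s} x_{st} - a_s\right) u_s + \sum_{t \in T} \left(b_t - \sum_{s \in S_t} x_{st}\right) v_t \\
&\quad + \sum_{s \in S, t \in T_s} (-x_{st})\, w_{st} + \sum_{s \in S, t \in T_s} (x_{st} - d_{st})\, z_{st} \\
&= L_0 + \sum_{s \in S, t \in T_s} L_{st}(x_{st}),
\end{aligned}
\]
where
\[
\begin{aligned}
L_0 &= -\sum_{s \in S} a_s u_s + \sum_{t \in T} b_t v_t - \sum_{s \in S, t \in T_s} d_{st} z_{st}; \\
L_{st}(x_{st}) &= \frac{1}{2} c_{st} x_{st}^2 + (u_s - v_t - w_{st} + z_{st})\, x_{st}.
\end{aligned}
\]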

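Since $\partial L / \partial x_{st} = \partial L_{st}(x_{st}) / \partial x_{st} = c_{st} x_{st} + u_s - v_t - w_{st} + z_{st}$, the Karush-Kuhn-Tucker conditions are:
\[
\begin{aligned}
&\text{(Stationarity)} && c_{st} x_{st} = -u_s + v_t + w_{st} - z_{st} && \forall s \in S, t \in T_s \\
&\text{(Primal feasibility)} && \sum_{t \in T_s} x_{st} \le a_s && \forall s \in S \\
& && \sum_{s \in S_t} x_{st} = b_t && \forall t \in T \\
& && 0 \le x_{st} \le d_{st} && \forall s \in S, t \in T_s \\
&\text{(Dual feasibility)} && u_s \ge 0 && \forall s \in S \\
& && w_{st} \ge 0 && \forall s \in S, t \in T_s \\
& && z_{st} \ge 0 && \forall s \in S, t \in T_s \\
&\text{(Complementary slackness)} && \left(\sum_{t \in T_s} x_{st} - a_s\right) u_s = 0 && \forall s \in S \\
& && x_{st} w_{st} = 0 && \forall s \in S, t \in T_s \\
& && (x_{st} - d_{st})\, z_{st} = 0 && \forall s \in S, t \in T_s
\end{aligned}
\]

We now solve for the dual objective. First, we note the Lagrangian dual function is $g(u, v, w, z) := \inf_x L(x, u, v, w, z) = L_0 + \sum_{s \in S, t \in T_s} \inf_{x_{st}} L_{st}(x_{st})$. Since $L_{st}$ is convex in $x_{st}$, a minimum is obtained by solving the first order condition $\partial L_{st} / \partial x_{st} = 0$. Since $c_{st} > 0$, the first order condition yields $x_{st}^* = \frac{1}{c_{st}}(-u_s + v_t + w_{st} - z_{st})$. Therefore, at optimality, $L_{st}(x_{st}^*) = -\frac{1}{2 c_{st}}(-u_s + v_t + w_{st} - z_{st})^2$.

Hence, the Lagrangian dual function is:
\[
g(u, v, w, z) = -\frac{1}{2} \sum_{s \in S, t \in T_s} \frac{1}{c_{st}} (-u_s + v_t + w_{st} - z_{st})^2 - \sum_{s \in S} a_s u_s + \sum_{t \in T} b_t v_t - \sum_{s \in S, t \in T_s} d_{st} z_{st}.
\]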



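The dual problem is therefore:
\[
\begin{aligned}
\max\ & -\frac{1}{2} \sum_{s \in S, t \in T_s} \frac{1}{c_{st}} (-u_s + v_t + w_{st} - z_{st})^2 - \sum_{s \in S} a_s u_s + \sum_{t \in T} b_t v_t - \sum_{s \in S, t \in T_s} d_{st} z_{st} \\
\text{s.t.}\ & u_s \ge 0 && \forall s \in S \\
& w_{st} \ge 0 && \forall s \in S, t \in T_s \\
& z_{st} \ge 0 && \forall s \in S, t \in T_s
\end{aligned}
\]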
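The primal-dual pair can be exercised on a tiny instance. The sketch below is illustrative only: the function names, the dictionary layout keyed by arcs $(s, t)$, and the two-source, one-sink instance are assumptions made for the example. It evaluates the primal objective, the dual objective $g(u, v, w, z)$, and the primal flow $x^*_{st} = \frac{1}{c_{st}}(-u_s + v_t + w_{st} - z_{st})$ recovered from a dual point.

```python
def primal_objective(c, x):
    """(1/2) * sum over arcs of c_st * x_st^2."""
    return 0.5 * sum(c[arc] * x[arc] ** 2 for arc in c)


def dual_objective(c, a, b, d, u, v, w, z):
    """Lagrangian dual function g(u, v, w, z) for the quadratic transportation problem."""
    quad = sum((-u[s] + v[t] + w[s, t] - z[s, t]) ** 2 / c[s, t] for (s, t) in c)
    return (-0.5 * quad
            - sum(a[s] * u[s] for s in a)
            + sum(b[t] * v[t] for t in b)
            - sum(d[arc] * z[arc] for arc in d))


def primal_from_dual(c, u, v, w, z):
    """Flow that minimizes the Lagrangian for fixed dual variables."""
    return {(s, t): (-u[s] + v[t] + w[s, t] - z[s, t]) / c[s, t] for (s, t) in c}


# Hypothetical instance: sources 0 and 1 both feed sink 0, which demands 3 units.
c = {(0, 0): 1.0, (1, 0): 2.0}
a = {0: 10.0, 1: 10.0}          # supplies (not binding here)
b = {0: 3.0}                    # demand
d = {(0, 0): 10.0, (1, 0): 10.0}  # arc capacities (not binding here)

# Dual point satisfying the KKT conditions: only the demand multiplier is nonzero.
u = {0: 0.0, 1: 0.0}
v = {0: 2.0}
w = {(0, 0): 0.0, (1, 0): 0.0}
z = {(0, 0): 0.0, (1, 0): 0.0}

x_star = primal_from_dual(c, u, v, w, z)
primal_value = primal_objective(c, x_star)
dual_value = dual_objective(c, a, b, d, u, v, w, z)

# Any other dual feasible point gives a lower bound (weak duality), e.g. v_0 = 1.
weaker_bound = dual_objective(c, a, b, d, u, {0: 1.0}, w, z)
```

In this instance the recovered flow sends 2 units on the cheaper arc and 1 unit on the costlier one, which meets the demand of 3 and makes the primal and dual objective values coincide; the dual value at the second point is strictly smaller, as weak duality requires.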