Measuring advertising incrementality using Ghost Ads

Ghost ads measurement allows advertisers to understand whether their advertising spend is producing incremental revenue.

Incrementality is the measurement of revenue that can be reasonably attributed to advertising spend; it has existed as a critical appraisal of advertising effectiveness for decades but has developed added consequence in the age of digital advertising. The concept of incrementality is fairly simple: contrasted against a baseline of activity that would be expected anyway, how much additional benefit does advertising yield?

To illustrate the concept, imagine some scenario where a company reliably generates a measurably consistent amount of revenue every period. After two periods of observing this stable stream of revenues, the company commences an advertising campaign and experiences a substantial increase in their revenue. The measured revenue stream that existed before the campaign would be considered the company’s revenue baseline, and the increase above that revenue following the start of the advertising campaign could be considered incremental.
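The arithmetic behind this idealized scenario can be sketched in a few lines of Python. The revenue figures and the function name are hypothetical, and the sketch assumes the pre-campaign baseline would have continued unchanged, which is exactly the assumption that rarely holds in practice.

```python
from statistics import mean

def incremental_revenue(baseline_periods, campaign_periods):
    """Revenue above the pre-campaign baseline, summed over campaign periods.

    Assumes (naively) that revenue would have stayed at the baseline
    level had the campaign never run.
    """
    baseline = mean(baseline_periods)
    return sum(revenue - baseline for revenue in campaign_periods)

# Hypothetical figures: two stable periods, then two periods with ads running.
pre_campaign = [100_000, 100_000]
during_campaign = [130_000, 140_000]
print(incremental_revenue(pre_campaign, during_campaign))
```

A before-and-after comparison like this credits every change in revenue to the campaign, which is why the experiment designs discussed later rely on a concurrent control group instead.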

Of course, such a clean and obvious example of direct revenue attribution to advertising is hardly commonplace, although many CMOs wish this type of campaign performance were prevalent. In reality, incrementality of advertising spend can be difficult to achieve. In their paper on the subject of digital advertising incrementality, Blake et al. describe an experiment that eBay ran in 2012 related to its paid search advertising activities. The company suspected that it was buying click traffic on search engines that would have ended up on its site anyway; to test that hypothesis, it paused those search ad campaigns and found that its organic inbound traffic increased by almost the exact same amount that its paid click traffic decreased. In other words, the natural search results for the terms it was bidding on were an almost perfect substitute for the sponsored results it was paying for, and the money it had been spending on those campaigns had essentially been wasted.

Certainly a situation like eBay’s is unfortunate, but achieving true incrementality in advertising is challenging. While most people understand and appreciate the notion of incrementality, for both political and logistical reasons, it’s not always easy or straightforward to accomplish perfectly additive revenue generation from advertising spend. As with most tasks in advertising, producing incrementality is a process of continuous, iterative experimentation and measurement in order to ensure that campaigns produce new revenue. But what tools are available to test for that?

Measuring Incrementality

The two most common experiment designs used in measuring incrementality are the use of Public Service Announcements (PSAs) as control group ads and the “intent-to-treat” assignment approach; both of these experiment designs are described in a paper by Johnson, Lewis, and Nubbemeyer called Ghost Ads: Improving the Economics of Measuring Online Ad Effectiveness. These experiments are essentially A/B tests meant to observe a difference in behavior between groups; the two designs differ in how the test and control groups are constructed.

With the PSA design, an advertiser will split their target audience into two groups of equal size, the test and control groups, and serve normal ads to the test group and PSA ads (e.g. for the SPCA or Red Cross) to the control. After some time, the advertiser will measure the difference in activity between users in the test and control groups and make a determination about the effectiveness of their advertising (e.g. if the test group spent 50% more money than the control group over the course of the experiment, advertising “lift” could be described as 50%).
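As a minimal sketch of that lift calculation, the function below compares average spend per user between the two groups. The function name, parameter names, and figures are illustrative assumptions rather than anything taken from the paper.

```python
def lift(test_spend, test_users, control_spend, control_users):
    """Relative lift in average spend per user, test group vs. control group."""
    test_avg = test_spend / test_users
    control_avg = control_spend / control_users
    return (test_avg - control_avg) / control_avg

# Hypothetical experiment: equal-sized groups, test group spends 50% more.
result = lift(test_spend=15_000, test_users=1_000,
              control_spend=10_000, control_users=1_000)
print(f"lift: {result:.0%}")
```

Normalizing by group size keeps the comparison meaningful even if the two groups end up slightly unequal.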

One problem with the PSA approach is that most advertising platforms algorithmically optimize campaign targeting based on user feedback (e.g. clicks). Because users respond differently to PSAs than to commercial ads, the platform ends up steering the two creatives toward different kinds of users, which distorts the group definitions and undermines any comparison of behaviors across groups. Another problem with the PSA approach is that, while it is statistically valid and intuitive, it can be expensive: it involves serving ads with zero benefit to the advertiser to half of an experiment group.

To avoid this expenditure, the “intent-to-treat” (ITT) design involves simply dividing an advertising target population into two groups and only serving ads to one. If it is possible to know, at the user level, the construction of both groups (e.g. via advertising IDs), then the behaviors across both groups can be compared without needing to serve irrelevant ads to the control group to establish a counterfactual. The ITT paradigm rose to prominence in medical experimentation; since measuring the differences between the groups who completed a study introduced biases into the experiment, the groups were defined from the start and measured over the course of the study whether they completed it or not (“once randomized, always analyzed”).

The problem with this approach is that some users in the test group will not actually see ads; in establishing the test groups before starting the experiment, the possibility arises that some of the people in the group meant to see ads will not come online in the experiment timeline and thus will not actually see ads (but might still perform whatever activity, such as making a purchase, that is being measured). The extra variance created by the subset of the test group that doesn’t see ads means the population being considered needs to be very large in order to produce enough statistical power to make the experiment results reliable.
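To make the sample-size point concrete, the sketch below uses the textbook two-sample normal approximation for required group size. This is a standard power calculation, not a method from the paper; the function name and inputs are hypothetical, and it simplifies by assuming the outcome's standard deviation is the same whether or not users are exposed.

```python
from math import ceil
from statistics import NormalDist

def users_per_group(effect_on_exposed, exposure_rate, sd,
                    alpha=0.05, power=0.8):
    """Approximate users needed per group to detect the diluted ITT effect.

    Only a fraction `exposure_rate` of the test group sees ads, so the
    average effect across the whole group shrinks proportionally.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    diluted_effect = effect_on_exposed * exposure_rate
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diluted_effect ** 2)

# Hypothetical inputs: ads add 1.0 in spend per exposed user, sd of spend is 10.
everyone_exposed = users_per_group(1.0, exposure_rate=1.0, sd=10.0)
half_exposed = users_per_group(1.0, exposure_rate=0.5, sd=10.0)
print(everyone_exposed, half_exposed)
```

Under these assumptions the required population grows with the inverse square of the exposure rate, so halving the share of users who actually see ads roughly quadruples the number of users needed.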

Enter Ghost Ads

In the aforementioned paper, the authors propose a measurement methodology that they have named “Ghost Ads” as an improvement over the PSA approach. With ghost ads, the control group is served whatever ads would normally win, but the ad platform keeps track of instances when the experimental ad (the one whose place a PSA would take in the PSA design) would have won the auction for an impression and stores that information in a log file. These log entries, for impressions when these ghost ads would have been served, function as the counterfactual instances for the control group but also don’t incur any costs (since the experimental ad never actually filled the impression) and are only measured at the exposure level (that is, the comparison can be restricted to users who were, or would have been, exposed, which keeps the treatment and control groups consistent).
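A rough sketch of how such a log might be used: compare treatment users who actually saw the ad with control users for whom a ghost impression was logged. The data structures, user IDs, and function name below are hypothetical illustrations, not the authors' implementation or any platform's API.

```python
def ghost_ad_lift(treatment_exposed, control_ghost_logged, spend_by_user):
    """Lift in average spend: exposed users vs. would-have-been-exposed users."""
    if not treatment_exposed or not control_ghost_logged:
        raise ValueError("both groups need at least one user")

    def avg_spend(users):
        # Users with no recorded purchases count as zero spend.
        return sum(spend_by_user.get(user, 0.0) for user in users) / len(users)

    exposed_avg = avg_spend(treatment_exposed)
    ghost_avg = avg_spend(control_ghost_logged)
    return (exposed_avg - ghost_avg) / ghost_avg

# Hypothetical logs keyed by user ID.
treatment_exposed = {"t1", "t2"}      # saw the real ad
control_ghost_logged = {"c1", "c2"}   # ghost impression was logged
spend_by_user = {"t1": 30.0, "t2": 30.0, "c1": 20.0, "c2": 20.0,
                 "c3": 500.0}         # c3 was never logged, so it is excluded
print(ghost_ad_lift(treatment_exposed, control_ghost_logged, spend_by_user))
```

Restricting the control side to logged ghost impressions is what removes the never-exposed users that add noise to the ITT design.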

In considering the current state of the advertising landscape with respect to event- and value-based bidding strategies, the authors compare the efficacy (ignoring cost) of PSAs and ghost ads under three different optimization schemes:

  1. Delivery optimization. The advertiser only cares about delivery and number of impressions served; any down-funnel information (such as clicks, conversions, etc.) is irrelevant. In this scheme, the performance of the PSA approach and ghost ads is equivalent;
  2. Event optimization. The advertiser wants impressions filled in accordance with some sort of event optimization (e.g. optimize impression serving on the basis of clicks). In this scheme, ghost ads outperform the PSA approach because the difference in ad content between PSAs and commercial ads causes a divergence in targeting;
  3. Value optimization. The advertiser wants impressions filled on the basis of measured downstream value (e.g. optimize impression serving based on purchases). In this scheme, ghost ads outperform the PSA approach because the PSAs by definition can’t produce measurable conversion information.

While the authors note that an advertiser’s ability to serve ghost ads is hindered on many platforms, since none were specifically designed around that functionality, Ghost Ads measurement is possible in some RTB (real-time bidding) environments. Regardless, this seems to be the general direction in which most platforms (especially the duopoly) are moving: Facebook appears very interested in providing for advertising lift measurement (after all, the better the measured lift for an advertiser, the more likely they are to spend more money on Facebook’s platform), and Google specifically has invested resources into developing the Ghost Ads concept. As more advertisers become attuned to the importance of incrementality measurement in the broader scheme of budget optimization, it seems likely that the platforms will start to offer measurement suites that provide for things like Ghost Ads testing natively.
