The automated creatives testing framework

August 17, 2015

A popularly accepted notion within mobile advertising is that creatives testing – that is, evaluating the performance of ad creatives against each other for the purposes of click and click-to-install optimization – helps advertisers avoid wasting ad impressions. This optimization is usually achieved through a continuous process of A/B testing: new creatives are generated, the variants are A/B tested, and the best performing creative is deployed universally.

While the problems inherent in A/B testing are fairly well known (selection bias, lack of sufficient sample size, premature declaration of the winning variant, etc.), a few more fundamental problems with this kind of process are rarely discussed:

– A/B testing is very time intensive. Setting up a test and evaluating its results can easily consume a full work day of one person’s dedicated attention;

– The opportunity cost of showing the “wrong” ad for the duration of a test can be deceptively high. This cost is usually justified because the results of an A/B test are considered to be valid in perpetuity, meaning that the cost, relative to the benefit of ultimately having chosen the best performing creative, diminishes over time;

– But creative saturation can change the performance of one variant relative to another over time. So either new creatives are constantly generated to offset this decline (which invalidates the perceived benefit described in the previous point: new creatives need to be generated anyway, so the opportunity cost of showing the “wrong” creative is actually a constant), or the “right” creative is displayed for too long and ultimately becomes the “wrong” creative (because the people who find the creative most appealing have already seen and possibly responded to it, and because the mobile market is constantly changing and stagnant creatives can’t adapt to that).

In other words, most marketing teams recognize that the winning variant of an A/B test won’t necessarily maintain an acceptable level of performance in perpetuity. But the structure of this process (continuous creative generation and A/B test validation) generally forces marketers to ignore the fact that past variants might outperform the current winning variant at some point in the future, and might well outperform the winners of future A/B tests. This happens both because the mobile ecosystem is dynamic and because it’s easy to fall victim to testing myopia (that is, considering the results of a test independently of past tests).

An automated creative testing framework sidesteps many of these pitfalls. Testing automation is commonly implemented with a Bayesian mechanism, such as a Bayesian Bandits algorithm: variants are pooled together into a bank and selected for display based on past performance, with prior assumptions about performance being updated with each test (display) and a certain percentage of impressions being reserved for selecting variants at random (so that underperforming creatives can be re-evaluated over time).
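The mechanism described above can be sketched in a few lines. What follows is a minimal illustration, not a production implementation: it assumes clicks are the only signal, models each creative's click rate with a Beta posterior (Thompson sampling), and reserves a fixed share of impressions for purely random selection. The names CreativeBank, explore_rate, and simulate, along with the click rates used in the simulation, are hypothetical and chosen only for this example.

```python
import random


class CreativeBank:
    """Pool of ad creatives, each with a Beta posterior over its click rate."""

    def __init__(self, creative_ids, explore_rate=0.1, seed=None):
        # explore_rate: assumed share of impressions reserved for random picks.
        self.explore_rate = explore_rate
        self.rng = random.Random(seed)
        # [alpha, beta] = [1, 1] is a uniform prior: no assumption about performance.
        self.posteriors = {cid: [1.0, 1.0] for cid in creative_ids}

    def add(self, creative_id):
        """New creatives can join the bank at any time with a fresh prior."""
        self.posteriors.setdefault(creative_id, [1.0, 1.0])

    def select(self):
        """Choose the creative to display for the next impression."""
        if self.rng.random() < self.explore_rate:
            # Reserved impressions: pick at random so weak creatives get re-evaluated.
            return self.rng.choice(list(self.posteriors))
        # Otherwise draw a plausible click rate per creative and show the best draw.
        draws = {cid: self.rng.betavariate(a, b)
                 for cid, (a, b) in self.posteriors.items()}
        return max(draws, key=draws.get)

    def record(self, creative_id, clicked):
        """Update the posterior with the outcome of a single display."""
        posterior = self.posteriors[creative_id]
        if clicked:
            posterior[0] += 1  # one more observed click
        else:
            posterior[1] += 1  # one more observed non-click


def simulate(true_rates, impressions, seed=0):
    """Run the bank against made-up click rates; return impressions per creative."""
    bank = CreativeBank(true_rates, explore_rate=0.1, seed=seed)
    world = random.Random(seed + 1)  # separate RNG standing in for user behavior
    shown = {cid: 0 for cid in true_rates}
    for _ in range(impressions):
        cid = bank.select()
        shown[cid] += 1
        bank.record(cid, world.random() < true_rates[cid])
    return shown


if __name__ == "__main__":
    print(simulate({"A": 0.05, "B": 0.10, "C": 0.30}, 3000))
```

In a run like this, most impressions should drift toward the creative with the highest simulated click rate, while the reserved random share keeps the other creatives under observation. Optimizing for installs rather than clicks would only change what gets passed to record.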

An automated framework has three principal advantages over the A/B testing process. The first is that it’s far less manually intensive (once the infrastructure has been developed) to run tests: each ad display is itself a test, and test results are updated in real time by the algorithm, so there is no need to deliver an “analysis” of results (such as with some sort of A/B test result spreadsheet) after an arbitrary amount of time has passed.

The second is that past variants remain in the creative bank and can potentially resurface as creative fatigue causes the performance of erstwhile “winners” to decline. In practice, this means not only that no creatives in this automated process are “wasted” (they all go into and remain in the creative bank), but also that new creatives can be added to the bank as time allows rather than on some subjective schedule (in the short term, the algorithm should ensure that performance doesn’t radically decline, especially given a large enough bank).

The third benefit is that the algorithm can generally be tuned more granularly than an A/B testing process allows, because it’s easier to evaluate performance across dimensions with an automated framework than with a manual process. A/B testing may optimize performance at the portfolio level (e.g. Variant A works better than Variant B across the entire portfolio), but an automated process can more easily allow creatives to be chosen based on, for instance, the source app (does Variant A perform better than Variant B when shown in App 1, as opposed to App 2?) or the country of the user viewing the impression (does Variant A perform better than Variant B when shown to a user from Country 1, as opposed to Country 2?). Of course, the more dimensions intersect, the more likely the framework is to succumb to the curse of dimensionality (i.e. low sample sizes in highly specific dimension sets produce results that are not robustly dependable).
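One way to picture per-dimension selection is sketched below. It is an illustration under stated assumptions rather than a recommended design: statistics are kept per (source app, country) pair, per source app, and for the whole portfolio, and selection falls back to the broader level whenever the specific one has too few impressions. That fallback rule is one possible mitigation for thin segments, not something prescribed above, and the names SegmentedBank and min_impressions and the threshold value are hypothetical.

```python
import random
from collections import defaultdict


class SegmentedBank:
    """Click statistics per creative at several levels of segmentation."""

    def __init__(self, creative_ids, min_impressions=200, seed=None):
        self.creative_ids = list(creative_ids)
        # Assumed threshold: a segment with fewer impressions is not trusted.
        self.min_impressions = min_impressions
        self.rng = random.Random(seed)
        # counts[segment][creative] = [clicks, impressions]
        self.counts = defaultdict(
            lambda: {cid: [0, 0] for cid in self.creative_ids})

    @staticmethod
    def _levels(source_app, country):
        # Most specific segment first; (None, None) is the whole portfolio.
        return [(source_app, country), (source_app, None), (None, None)]

    def record(self, creative_id, clicked, source_app, country):
        """Credit one impression to every level the impression belongs to."""
        for segment in self._levels(source_app, country):
            stats = self.counts[segment][creative_id]
            stats[0] += int(clicked)
            stats[1] += 1

    def segment_for(self, source_app, country):
        """Return the most specific segment that has enough impressions."""
        for segment in self._levels(source_app, country):
            total = sum(n for _, n in self.counts[segment].values())
            if total >= self.min_impressions:
                return segment
        return (None, None)  # nothing has enough data yet: use the portfolio

    def select(self, source_app, country):
        """Thompson-sample a creative using the chosen segment's statistics."""
        stats = self.counts[self.segment_for(source_app, country)]
        draws = {cid: self.rng.betavariate(1 + clicks, 1 + shown - clicks)
                 for cid, (clicks, shown) in stats.items()}
        return max(draws, key=draws.get)


if __name__ == "__main__":
    bank = SegmentedBank(["Variant A", "Variant B"], min_impressions=10, seed=0)
    for _ in range(10):
        bank.record("Variant A", True, "App 1", "Country 1")
    print(bank.segment_for("App 1", "Country 1"))
    print(bank.segment_for("App 2", "Country 2"))
```

The threshold trades granularity against reliability: a lower value lets segment-level differences show up sooner, at the risk of acting on exactly the small samples the curse of dimensionality warns about.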

Most mobile ad networks already employ some version of automated testing and optimization, but as the mobile ecosystem continues to trend toward clusters of portfolios, automated creative testing will become increasingly important to developers for the purposes of cross promotion (with many large developers operating what are effectively internal ad networks). Short of employing an army of marketing analysts who do nothing but aggressively run A/B tests on ad creatives (and maintain oversight over past tests), there is no means of achieving a consistently optimized pipeline of creatives other than investing in marketing automation.
