Integrating a Media Mix Model into a digital marketing workflow

Interest in Media Mix Modeling (MMM) has been invigorated by new and more restrictive privacy norms that limit the availability of data for use in the measurement of advertising campaigns. On mobile, the need to adapt measurement to these new privacy restrictions is particularly acute given that Apple’s App Tracking Transparency (ATT) privacy policy has already been rolled out; for web advertisers, the deprecation of third-party cookies in the Chrome browser, while inevitable, will take place on a much less certain timeline.

Marketing measurement techniques, especially for mobile advertising campaigns, are often classified as either deterministic or probabilistic. I dislike this false dichotomy for a number of reasons. The first is that, as I discuss in this podcast episode, describing attribution as deterministic is almost certainly an overstatement, given the competing claims to attribution across ad networks, such as the attribution priority of so-called Self-Attributed Networks (SANs) in mobile app advertising campaigns. A second reason I dislike this strict binary classification is that the term probabilistic has been misused and co-opted to refer to methods, like device fingerprinting, that don’t actually use probabilistic methods and instead attempt to reconstitute identity through proxies.

My preferred representation for the different models of advertising measurement is a spectrum spanning from “bottoms-up” to “tops-down,” with the distinction between the two extremes being represented by the direction of their estimates:

  1. Tops-down advertising measurement attempts to assess the profitability of media spend via broad, coarse aggregated values related to advertising performance that can be decomposed into more granular recommendations for action;
  2. Bottoms-up advertising measurement attempts to assess the profitability of media spend via granular, specific measures related to advertising performance that can be aggregated into higher-level business outcomes.

Note that these inputs are not exclusive to either type of measurement model: a bottoms-up model might use campaign-level event counts, like SKAdNetwork postbacks, in combination with user-level revenue attributed via available IDFAs; similarly, a tops-down model might use geo-level revenue in combination with SKAdNetwork postbacks. Viewing the inputs to different forms of marketing measurement models across a spectrum of granularity helps prevent the models from being seen as competing.

In this article, I’ll walk through some considerations for integrating one type of model from the tops-down class, a Media Mix Model (MMM), into a digital marketing team’s operational workflow. Note that this article won’t focus on building an actual MMM; for more information on developing an MMM, I’d recommend this very helpful article and its follow-up, as well as this YouTube video from marketing data scientists at HelloFresh. Various open-source MMM tools also exist, such as the Lightweight (Bayesian) Media Mix Model built by Google data scientists (but not formally supported as a Google project) and Robyn, developed by Meta.

What is a Media Mix Model?

A Media Mix Model is an econometric framework for determining the optimal allocation of media spend across an assortment of channels (and, in some cases, geographies). MMMs use coarse inputs such as geography- and channel-specific media spend to estimate the efficiency of that spend with respect to revenue. Because MMMs use broad, aggregated data for estimation, they can incorporate functionally diverse advertising formats, they can account for organic contributions to revenue, and they are fully compliant with current and proposed privacy policy changes and privacy-related legislation. It is this last point that has driven renewed interest in MMMs among digital marketing teams.

This very informative article provides an overview of how MMMs function. A thorough overview of the mechanics of an MMM is beyond the scope of this article, but one way to approach the concept is through a hypothetical. Imagine an advertiser that spends across two channels, with the efficiency of each (in this case, defined as Return on Ad Spend, or ROAS) following a known, static pattern: a decreasing linear relationship between ad spend and ROAS. That is, ROAS declines linearly as spend on a channel increases. A diagram of the ROAS curves for these two channels is presented below.

If the ROAS curves for these two advertising channels are known with certainty, then optimizing the allocation of spend across them can be done through a straightforward mathematical process. In the diagram above, the two channel-level ROAS curves are defined by:

  • Channel 1: y = -2x + 300
  • Channel 2: y = -3x + 400

Optimizing this system (that is, finding the maximum profit possible by allocating ad spend across Channel 1 and Channel 2) can be done by assigning the variable x1 to spend on Channel 1 and x2 to spend on Channel 2, treating each channel’s revenue as its spend multiplied by its ROAS, and solving the maximization problem below:

$$\max_{x_1,\,x_2} \; x_1(300 - 2x_1) + x_2(400 - 3x_2) - x_1 - x_2$$

This can be approximated with a brute-force, guess-and-check search that simply tests different combinations of ad spend across the channels for profit, where profit for each media source is defined as (ad spend × ROAS) − ad spend. An implementation of that brute-force (read: not efficient) code can be found in this GitHub repository; when run, it finds a solution at the following points:

The way to read this is: profit is maximized with $75 in spend on Channel 1 and $66 in spend on Channel 2. Note that this assumes no budget constraint on ad spend (for instance, a different allocation would be found if the total budget available were less than $75 + $66 = $141).
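The repository code isn’t reproduced here, but the same guess-and-check idea can be sketched independently in a few lines of Python. This is an illustrative sketch, not the author’s implementation: it assumes whole-dollar spend levels from $0 to $150 per channel (an assumed search range, chosen because Channel 1’s ROAS reaches zero at $150 and Channel 2’s at about $133), and the optional `budget` parameter is a hypothetical addition that illustrates the budget-constraint caveat.

```python
from itertools import product
from typing import Callable, Optional, Tuple

def roas_channel_1(spend: float) -> float:
    # Channel 1 ROAS curve: y = -2x + 300
    return -2 * spend + 300

def roas_channel_2(spend: float) -> float:
    # Channel 2 ROAS curve: y = -3x + 400
    return -3 * spend + 400

def channel_profit(spend: float, roas: Callable[[float], float]) -> float:
    # Profit per media source: (ad spend * ROAS) - ad spend
    return spend * roas(spend) - spend

def brute_force_allocation(
    max_spend: int = 150, budget: Optional[int] = None
) -> Tuple[int, int, float]:
    """Try every whole-dollar (x1, x2) pair and keep the first one with the highest profit."""
    best = (0, 0, float("-inf"))
    for x1, x2 in product(range(max_spend + 1), repeat=2):
        if budget is not None and x1 + x2 > budget:
            continue  # skip combinations that exceed the (hypothetical) total budget
        profit = channel_profit(x1, roas_channel_1) + channel_profit(x2, roas_channel_2)
        if profit > best[2]:
            best = (x1, x2, profit)
    return best

print(brute_force_allocation())            # (75, 66, 24441)
print(brute_force_allocation(budget=100))  # (50, 50, 22400)
```

Run as-is, the unconstrained search reports $75 and $66, matching the result above; capping the total budget at a hypothetical $100 shifts the allocation to $50 per channel, which is where the two channels’ marginal profits are equal.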

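Another way to resolve this maximization problem is to solve it analytically, which can be done using the approach taught in this video. Setting the partial derivative of profit with respect to each channel’s spend to zero yields x1 = 299/4, or $74.75, for Channel 1 and x2 = 133/2, or $66.50, for Channel 2. The brute-force result of $66 for Channel 2 is consistent with this: because the optimum falls exactly halfway between $66 and $67, both whole-dollar values produce the same profit. These results can be validated on WolframAlpha.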
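Written out, with profit denoted π, the first-order conditions are below. Because the profit function is separable and each term is concave (the second derivatives are −4 and −6), these stationary points are the maximum:

$$
\begin{aligned}
\pi(x_1, x_2) &= x_1(300 - 2x_1) + x_2(400 - 3x_2) - x_1 - x_2 \\
\frac{\partial \pi}{\partial x_1} &= 300 - 4x_1 - 1 = 0 \;\Rightarrow\; x_1 = \tfrac{299}{4} = 74.75 \\
\frac{\partial \pi}{\partial x_2} &= 400 - 6x_2 - 1 = 0 \;\Rightarrow\; x_2 = \tfrac{399}{6} = \tfrac{133}{2} = 66.5
\end{aligned}
$$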

There are a few realities of ad spend and ROAS that make this approach impractical or impossible:

  1. ROAS curves for different media sources are generally not known a priori;
  2. ROAS curves are generally not linear and are impacted by channel saturation and lagged effects;
  3. ROAS curves are not static and they change based on market dynamics, the age of the product being advertised, consumer sentiment, etc.

Modern MMM tools seek to accommodate these factors through adstock (lagged effect) and saturation adjustments (see this paper for more detail). Additionally, the most sophisticated MMM tools use a Bayesian approach, combining prior beliefs about ad efficiency / impact, adstock, and saturation with observed data to produce updated estimates. This Bayesian approach can help moderate wild swings in estimated impact.
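To make adstock and saturation concrete, here is a minimal sketch of two commonly used transformations: geometric adstock, in which a fraction `decay` of each period’s accumulated effect carries into the next period, and a Hill-type saturation curve, in which response flattens as adstocked spend grows past a half-saturation point. The functional forms, function names, and parameter values are illustrative assumptions; tools such as Lightweight MMM and Robyn implement their own variants with their own parameterizations.

```python
from typing import List

def geometric_adstock(spend: List[float], decay: float) -> List[float]:
    """Lagged effect: each period keeps a fraction `decay` of the previous adstocked value."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    adstocked: List[float] = []
    carryover = 0.0
    for x in spend:
        carryover = x + decay * carryover
        adstocked.append(carryover)
    return adstocked

def hill_saturation(x: float, half_saturation: float, shape: float) -> float:
    """Saturation: maps spend to a 0-1 response that equals 0.5 at x == half_saturation."""
    if x <= 0:
        return 0.0
    return x ** shape / (x ** shape + half_saturation ** shape)

# Illustrative (hypothetical) weekly spend for one channel
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
response = [hill_saturation(a, half_saturation=100.0, shape=2.0) for a in adstocked]
print(adstocked)  # [100.0, 50.0, 25.0, 62.5]
print(response)
```

Spend of $100 in week one still contributes in weeks two and three after it stops, and doubling adstocked spend from 50 to 100 raises the saturated response from 0.2 to 0.5 rather than proportionally, which is the diminishing-returns behavior a linear ROAS curve cannot capture.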

Media Mix Models tend to require a long history of time-series data on ad spend and revenue to provide reliable guidance; it’s common to see two years’ worth of historical data recommended for running an MMM.

Implementing a Media Mix Model

Fundamentally, a Media Mix Model is not a media buying tool; it is an advertising measurement tool designed to provide budget guidance over some portfolio of channels, formats, geographies, or a combination thereof. As such, a secondary set of tools and data reporting infrastructure is required for a Media Mix Model to be used by the media buying team. My belief is that a Media Mix Model should be managed and maintained by a marketing data science team, with its output mostly abstracted and transformed before the media buying team interfaces with it.

One of the primary challenges in implementing a Media Mix Model is creating the data pipeline that will feed it. The data used to power an MMM must be aggregated into a time series and dimensionalized in ways that many digital marketing organizations might not naturally use to filter it: for instance, by broad geographic region, or by format (e.g., in-app rewarded inventory vs. social media inventory, rather than at the level of the individual channel), and calculated on a weekly rather than daily basis. Below is an abstracted description of the underlying model in a Media Mix Model from the Lightweight MMM documentation.

This data infrastructure work is actually an iterative process: the team implementing the MMM must determine the right level of granularity for these features to achieve accurate output while avoiding overfitting. That testing could very well occupy a data science team for weeks. Critically, part of this process is determining which independent and dependent variables should be used in the model in the first place. Media Mix Models commonly use revenue or purchases as the dependent variable, while the independent variables can include media spend or purchased impressions, as well as dummy variables for factors like seasonality and promotions or special offers. And again, these must be tested for predictive power in the model.
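As an illustration of the reshaping described above, the sketch below rolls hypothetical daily, channel-level spend rows up into weekly totals by region and inventory format. The row layout, channel names, region codes, and the channel-to-format mapping are all assumptions made for illustration; a real pipeline would pull from the organization’s own reporting sources and would likely also aggregate revenue and impressions the same way.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical mapping from individual channels to broader inventory formats
CHANNEL_TO_FORMAT = {
    "network_a": "rewarded_in_app",
    "network_b": "rewarded_in_app",
    "social_x": "social",
}

def weekly_spend_by_region_and_format(
    rows: List[Tuple[date, str, str, float]],
) -> Dict[Tuple[int, int, str, str], float]:
    """Sum daily (day, region, channel, spend) rows into (ISO year, ISO week, region, format) buckets."""
    totals: Dict[Tuple[int, int, str, str], float] = defaultdict(float)
    for day, region, channel, spend in rows:
        iso_year, iso_week, _ = day.isocalendar()  # weeks start on Monday
        fmt = CHANNEL_TO_FORMAT[channel]
        totals[(iso_year, iso_week, region, fmt)] += spend
    return dict(totals)

# Hypothetical daily spend rows
daily_rows = [
    (date(2023, 1, 2), "NA", "network_a", 120.0),
    (date(2023, 1, 3), "NA", "network_b", 80.0),
    (date(2023, 1, 4), "NA", "social_x", 50.0),
    (date(2023, 1, 9), "EU", "network_a", 40.0),
]
print(weekly_spend_by_region_and_format(daily_rows))
```

Two different rewarded networks in the same region and week collapse into a single rewarded_in_app value of 200.0, which is the kind of format-level, weekly feature the model consumes; whether that level of aggregation is the right one is exactly what the iterative testing above has to establish.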

Integrating a Media Mix Model into an Operational Workflow

Circling back to the initial point about known ROAS curves: if a media buying team could credibly understand how additional ad spend on any given channel would impact marginal ROAS, then optimizing across channels would be an uncomplicated process. A Media Mix Model attempts to fill that knowledge gap by looking at variations in inputs (ad spend, impressions purchased, etc.) and outputs (revenue) to determine the efficiency of ad spend at various levels of budget, across media sources and potentially geographies. Modern Media Mix Model tools also endeavor to control for other factors like saturation, the lagged effects of ad spend on revenue, seasonality, promotions, etc.
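One common way to formalize this is sketched below. It is a general form under simplifying assumptions, not the exact specification used by Lightweight MMM, Robyn, or any other particular tool: weekly revenue is treated as a baseline plus transformed media contributions plus control variables.

$$
y_t = \tau + \sum_{c=1}^{C} \beta_c \, \mathrm{Sat}_c\big(\mathrm{Adstock}_c(x_{c,t}, x_{c,t-1}, \ldots)\big) + \sum_{k=1}^{K} \gamma_k z_{k,t} + \varepsilon_t
$$

Here y_t is revenue in week t, x_{c,t} is spend (or purchased impressions) on media source c, Adstock_c carries part of past spend forward to capture lagged effects, Sat_c is a concave function capturing saturation, β_c scales each channel’s contribution, z_{k,t} are controls such as seasonality and promotion indicators with coefficients γ_k, τ is baseline (organic) revenue, and ε_t is noise. Roughly speaking, a channel’s marginal ROAS at a given spend level corresponds to the derivative of its contribution term with respect to that spend, which is the quantity the two-channel example assumed was known.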

All of this is an exercise in data science that is far removed from the purview of a media buying team. Media buying teams are tasked with tactical responsibilities: managing channel-level optimizations on a real-time basis, providing performance feedback and inputs to the creative production process, and general performance reporting, among others. Media buying teams are generally most productive when given clear and specific prescriptions related to ad spend, like explicit bid and budget guidance at the campaign level. A Media Mix Model is not designed to provide this, and so incorporating a Media Mix Model into the operational workflow requires developing a process that reconciles the model’s output with the media buying team’s work.

In designing this process, it’s instructive to define the problem that Media Mix Models are now being excitedly adopted to solve in the new privacy environment. With the stream of on-site event data between advertisers and ad platforms broken by privacy policies like ATT, rendering user-level attribution impossible in some cases, advertisers lack a deterministic (“bottoms-up”) means of attributing and accounting for the efficiency of advertising spend. A Media Mix Model is a probabilistic (“tops-down”) tool that strives to fill that gap using statistical methods that estimate revenue impact at the level of the media source.

Filling that gap to deliver efficiency estimates for each media source is helpful from a budgeting perspective but not from an optimization perspective. This is because, for optimization, a media buying team needs guidance that is:

  • Granular to the degree supported by the media source (e.g., creative-level);
  • Timely, or even real-time;
  • Specific to the optimization options available to the media buying team (e.g., cost-per-install bid pricing).

None of these characteristics applies to Media Mix Model output, but none of them is really feasible for marketing measurement in the new privacy landscape anyway. The marketing process must therefore adapt to a longer feedback cycle between campaign optimizations being made (e.g., changing a bid, or adding a creative to a campaign) and the impact of those changes being known. Realistically, this means splitting the optimization process into three distinct stages:

  1. Establishing a budget allocation across media channels from model calculations;
  2. Optimizing media spend toward the target budget allocation using all available performance data, such as on-site metrics like cost-per-click and click-through rate;
  3. Updating the model with new performance data, so that channel-level efficiency estimates reflect, over time, the optimization changes made in step two.
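Step one is where model output becomes budget guidance. As a minimal sketch, assuming the model has produced a concave revenue response curve for each channel, a budget can be split greedily: allocate spend in small increments, each time to the channel whose next increment returns the most revenue, and stop when the budget runs out or no increment earns back its cost. The function name, channel names, and step size are hypothetical, and the two curves simply reuse the linear-ROAS example from earlier (revenue = spend × ROAS) in place of real model output.

```python
from typing import Callable, Dict

def greedy_allocation(
    response_curves: Dict[str, Callable[[float], float]],
    budget: float,
    step: float = 1.0,
) -> Dict[str, float]:
    """Allocate `budget` in `step` increments to the channel with the highest marginal revenue.

    Assumes each response curve (spend -> revenue) is concave, so repeatedly taking the
    best next increment approximates the optimal split. Stops early once the best
    increment no longer returns more revenue than it costs.
    """
    allocation = {name: 0.0 for name in response_curves}
    remaining = budget
    while remaining >= step:
        # Marginal revenue of one more `step` of spend on each channel
        gains = {
            name: curve(allocation[name] + step) - curve(allocation[name])
            for name, curve in response_curves.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] <= step:
            break  # next increment would not be profitable
        allocation[best] += step
        remaining -= step
    return allocation

# Hypothetical response curves reusing the two-channel linear-ROAS example
curves = {
    "channel_1": lambda x: x * (300 - 2 * x),
    "channel_2": lambda x: x * (400 - 3 * x),
}
print(greedy_allocation(curves, budget=100))   # {'channel_1': 50.0, 'channel_2': 50.0}
print(greedy_allocation(curves, budget=1000))  # {'channel_1': 75.0, 'channel_2': 66.0}
```

With a binding $100 budget the spend is split evenly, because that is where the two channels’ marginal returns meet; with a budget large enough not to bind, the allocation stops at the same $75 / $66 optimum found earlier. In practice the curves would come from the fitted model and be refreshed in step three, and the resulting targets would be handed to the media buying team as channel-level budgets rather than as the curves themselves.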

This three-part process surfaces an important point about the integration of Media Mix Modeling into a marketing workflow: the marketing organization must include a data science function to own the model, own its update cadence, and own its production of budget guidance (as well as the addition of new media sources over time). Again, the Media Mix Model is an input to a larger process that must be adapted for the new privacy environment. That process requires data science capabilities to manage not just the implementation of a Media Mix Model but its ongoing role in guiding performance.

Photo by Antoine Dautry on Unsplash