Back-testing LTV models


A common approach to measuring the accuracy of an LTV model is to back-test it against historical data: behavioral user data from a cohort that has reached sufficient age (e.g. 180 days, for a 180-day LTV model) is fed into the model, and the model’s output is compared to the cohort’s actual LTV. This approach is conceptually consistent with a component-centric perspective on performance user acquisition: the view that any cohort’s LTV is a function of various characteristics of that cohort, and that an LTV model can take those characteristics as inputs and predict lifetime revenues from them.
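The back-testing procedure above boils down to comparing predictions with realized values across matured cohorts. The following is a minimal Python sketch, not taken from the presentation; the `Cohort` type, the `backtest_error` function, and the choice of mean absolute percentage error plus mean signed bias as the accuracy measures are all illustrative assumptions, and the numbers are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cohort:
    """A cohort old enough to have reached the model's LTV horizon."""
    name: str
    predicted_ltv: float  # model output from the cohort's behavioral data
    actual_ltv: float     # realized revenue per user at the horizon


def backtest_error(cohorts: list[Cohort]) -> tuple[float, float]:
    """Return (mean absolute percentage error, mean signed bias).

    A positive bias means the model overestimated LTV on average.
    """
    if not cohorts:
        raise ValueError("need at least one matured cohort")
    signed = []
    for c in cohorts:
        if c.actual_ltv <= 0:
            raise ValueError(f"cohort {c.name} has non-positive actual LTV")
        signed.append((c.predicted_ltv - c.actual_ltv) / c.actual_ltv)
    mape = sum(abs(e) for e in signed) / len(signed)
    bias = sum(signed) / len(signed)
    return mape, bias


# Example with made-up numbers: one over-prediction, one under-prediction.
mape, bias = backtest_error([
    Cohort("2023-01", predicted_ltv=2.2, actual_ltv=2.0),
    Cohort("2023-02", predicted_ltv=1.8, actual_ltv=2.0),
])
print(f"MAPE={mape:.1%}, bias={bias:+.1%}")
```

Reporting the signed bias alongside the absolute error is a design choice: errors in opposite directions cancel in the bias but not in the MAPE, so the pair distinguishes a noisy model from a systematically optimistic one.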

What this perspective misses is the effect that the environment in which the product exists has on any cohort’s LTV: environmental forces (e.g. market conditions) that exist independently of the marketing campaign can also influence a cohort’s LTV.

One undeniable example of this risk in the “component” methodology (versus the “environment” methodology) is the increase in competition for mobile marketing inventory over the past 12-24 months. A model trained on cohorts acquired before that period would miss the changes that have taken place in the market since. And because LTV projections are made over such long time horizons (e.g. 12, 18, or 24 months), the only cohorts old enough to audit a model against were acquired before those changes, which makes it difficult to audit the model meaningfully on historical data. The mobile landscape simply changes too quickly to use years-old data to inform user acquisition.
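The timing problem can be made concrete with a little date arithmetic: a cohort can only be scored against an N-day LTV model once it is at least N days old, so the newest cohort available for a back-test is always N days in the past. This Python sketch assumes a fixed day-count horizon (treating 24 months as roughly 730 days) and an arbitrary example date; the function names are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def backtest_cutoff(today: date, horizon_days: int) -> date:
    """Latest install date whose cohort has been observed for the full horizon."""
    return today - timedelta(days=horizon_days)


def backtestable(install_dates: list[date], today: date, horizon_days: int) -> list[date]:
    """Keep only cohorts old enough to compare predicted and actual LTV."""
    cutoff = backtest_cutoff(today, horizon_days)
    return [d for d in install_dates if d <= cutoff]


# Monthly cohorts from 2021 through 2023, checked on an example date.
today = date(2024, 1, 1)
cohorts = [date(y, m, 1) for y in (2021, 2022, 2023) for m in range(1, 13)]
for horizon in (180, 365, 730):
    eligible = backtestable(cohorts, today, horizon)
    print(horizon, "days: newest usable cohort", max(eligible))
```

With a 730-day horizon the cutoff lands on 1 January 2022, so every cohort available to the back-test was acquired before the two years of market changes leading up to the example date.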

In this presentation, I outline two methods for modeling LTV in a spreadsheet. In both methods, the prediction is updated as more data becomes available (i.e. as the cohort ages); in this sense, the model is “back-tested” in real time. From a practical perspective, continuously updating UA campaign spend (and the LTV model used to run those campaigns) with feedback from recently acquired cohorts is probably a better approach, in terms of both accuracy and money at risk, than setting bids with a back-tested model, for two reasons.
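The presentation’s two spreadsheet methods are not reproduced here, but the general idea of a prediction that is re-evaluated as a cohort ages can be sketched in Python. This sketch assumes one simple technique, a multiplier on observed revenue: each projection divides the cohort’s cumulative revenue per user by the share of horizon LTV that a reference curve says is typically realized by that age. The `REFERENCE_CURVE` values, the observed revenue figures, and the `project_ltv` name are all hypothetical.

```python
from __future__ import annotations

# Hypothetical reference curve: share of 180-day LTV typically realized by each
# cohort age in days. In practice it would be estimated from older cohorts.
REFERENCE_CURVE = {1: 0.10, 7: 0.30, 30: 0.55, 90: 0.80, 180: 1.00}


def project_ltv(age_days: int, cumulative_arpu: float,
                curve: dict[int, float] = REFERENCE_CURVE) -> float:
    """Project horizon LTV from cumulative revenue per user observed so far.

    Uses the latest checkpoint the cohort has reached, so each new checkpoint
    replaces the previous estimate with one based on more data.
    """
    reached = [day for day in curve if day <= age_days]
    if not reached:
        raise ValueError("cohort is younger than the first checkpoint")
    return cumulative_arpu / curve[max(reached)]


# One cohort observed over time (made-up numbers): age in days -> cumulative ARPU.
observed = {1: 0.12, 7: 0.33, 30: 0.60, 90: 0.86, 180: 1.05}
for age, arpu in observed.items():
    print(f"day {age:>3}: projected 180-day LTV = {project_ltv(age, arpu):.2f}")
```

Re-running `project_ltv` at each checkpoint produces a sequence of estimates that ends at the realized value: at the final checkpoint the share is 1.0, so the projection equals the revenue actually observed, and the gap between earlier projections and that value is the real-time back-test.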

The first is that making incremental, step-wise changes based on real-time feedback establishes a flexible workflow that benefits other parts of the marketing organization and is more amenable to exploration and experimentation than a more rigid, deterministic approach. It also allows LTV decisions to be made sooner, earlier in the product’s lifecycle, when little or no historical data exists.
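One way to express incremental, step-wise changes is a bid update that moves toward the bid implied by the latest LTV projection, but by no more than a fixed fraction per update. This is a hypothetical Python sketch; the target-ROAS framing, the `next_bid` name, and the 10% default step are assumptions for illustration, not a rule from the presentation.

```python
def next_bid(current_bid: float, projected_ltv: float, target_roas: float,
             max_step: float = 0.10) -> float:
    """Move the bid toward the ROAS-implied bid, by at most max_step per update.

    The implied bid is the most that can be paid per user while still hitting
    target_roas (revenue / spend) under the latest LTV projection.
    """
    if target_roas <= 0 or not 0 < max_step < 1:
        raise ValueError("target_roas must be > 0 and max_step in (0, 1)")
    implied_bid = projected_ltv / target_roas
    lower = current_bid * (1 - max_step)
    upper = current_bid * (1 + max_step)
    # Clamp: follow the projection, but never jump more than max_step at once.
    return min(max(implied_bid, lower), upper)


# A higher projection moves the bid up in 10% steps rather than all at once.
bid = 1.00
for projected in (1.60, 1.60, 1.60):
    bid = next_bid(bid, projected, target_roas=1.2)
    print(f"{bid:.3f}")
```

A smaller `max_step` trades responsiveness for stability: early in a product’s life, when projections rest on only a few days of data, a conservative step limits how much spend a single noisy estimate can move.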

The second is that back-testing fails to account for the effect of the cohort itself on its own LTV. Ideally, especially early in a product’s life, user acquisition budgets should grow consistently in absolute terms as the product reaches more users and makes more money. Progressively larger acquired cohorts could actually improve the product’s unit economics as the product reaches critical mass (e.g. a multiplayer game or chat app, where each new user makes the product more valuable to existing ones). On the other hand, a product’s unit economics could deteriorate as it approaches saturation (e.g. once the most relevant users have already been acquired).

It’s hard to use the state of the product relative to the market as an input to a model; it has to be discovered as it happens. Generally speaking, flexibility and the capacity to adapt quickly to change are important on rapidly changing platforms such as mobile, and an LTV model (and the campaign management workflow it supports) should reflect that.