Calculating Lifetime Customer Value (LCV), Part 2 of 2: The data

Part One

The first post in this series discussed Lifetime Customer Value from a conceptual standpoint. This post is more practical: I’ll outline an approach for collecting and aggregating the data needed to calculate LCV.

Collecting the data you’ll need to track LCV as a dashboard-level metric is fairly simple, but some transformations and architectural considerations are required before that data will be useful in analysis. Customer spend isn’t destiny; no one downloads your app or signs up for your service with a mandate to spend money. User characteristics affect LCV and must be accounted for: acquisition channel, gender, age, location, usage habits, and so on are all dimensions over which LCV should be calculable. Remember: LCV drives the marketing budget, not the other way around. Your universal LCV might be 3€, but that doesn’t necessarily justify a 3€ CPA on every network.

At the very least, I would recommend architecting your user data structure in a way that allows LCV to be aggregated around the following:

  • Acquisition channel hierarchy (e.g. Paid Advertising -> Google AdWords -> Some Campaign Name)
  • Geography hierarchy (Continent -> Country Code -> City)
  • Personal characteristics (gender, age, and whatever else is available from the platform you’re working on)

These three dimensions will give you a lot of flexibility in segmenting LCV values, and they’re all actionable — that is, you can launch targeted marketing campaigns to groups defined by these characteristics.
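As a minimal sketch of what "aggregating around" these dimensions can look like, the snippet below stores each hierarchy as a tuple on the user record and averages a per-user value over whatever grouping key you pass in. The `User` class, its field names, and `mean_value_by` are hypothetical names chosen for illustration, and the per-user value is assumed to have been computed elsewhere.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical user record; field names are illustrative assumptions.
    user_id: int
    channel: tuple  # e.g. ("Paid Advertising", "Google AdWords", "Some Campaign Name")
    geo: tuple      # e.g. ("Europe", "DE", "Berlin")
    gender: str
    age: int


def mean_value_by(users, value_by_user, key):
    """Average a per-user value over groups defined by key(user).

    Users missing from value_by_user count as 0.0, so non-spenders
    pull the group average down instead of being silently dropped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of values, user count]
    for user in users:
        group = key(user)
        totals[group][0] += value_by_user.get(user.user_id, 0.0)
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def by_network(user):
    # Roll the channel hierarchy up to its first two levels.
    return user.channel[:2]


def by_country(user):
    # Roll the geography hierarchy up to continent and country code.
    return user.geo[:2]
```

Because each hierarchy is an ordered tuple, rolling up one level is just a shorter slice: `mean_value_by(users, values, by_network)` groups by network, while a key returning `user.channel[:1]` would group by top-level channel.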

The two key components of LCV are obviously lifetime and value. As was discussed in the first post, lifetime (or, as I term it, “duration”) is calculated from retention rates. I track retention by storing a record for each user on each day they engage with the app. Each record contains columns for retention benchmarks; I use D1 – D7, D14, D30, and D365 (I also keep a boolean column to indicate whether this is the user’s first day). Each day, a process goes through every active user’s earlier records and sets a record’s DX column to true when the current day falls X days after that record’s date. For example, if a user logs in seven days after registering, I set the D7 column in their first record to 1.
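The daily update can be sketched roughly as follows. This is an in-memory illustration rather than the actual process: it assumes records are kept in a dict keyed by `(user_id, day)`, that activity is recorded in chronological order, and that a record's `dX` flag means "this user was active again X days after this record's date". The names `new_record` and `record_activity` are hypothetical.

```python
from datetime import date, timedelta

# Retention benchmarks from the text: D1 - D7, D14, D30, D365.
BENCHMARKS = (1, 2, 3, 4, 5, 6, 7, 14, 30, 365)


def new_record(user_id, day, is_first_day):
    """Create one user-day record with every retention flag unset."""
    record = {"user_id": user_id, "day": day,
              "is_first_day": is_first_day, "revenue": 0.0}
    record.update({f"d{x}": 0 for x in BENCHMARKS})
    return record


def record_activity(records, user_id, today):
    """Register that user_id was active on `today` and update older records.

    `records` is a dict keyed by (user_id, day). Assumes calls arrive in
    chronological order, so "no earlier record" means "first day".
    """
    if (user_id, today) not in records:
        is_first_day = not any(uid == user_id for uid, _ in records)
        records[(user_id, today)] = new_record(user_id, today, is_first_day)
    # For each benchmark X, flag the record dated X days ago (if one exists).
    for x in BENCHMARKS:
        earlier = records.get((user_id, today - timedelta(days=x)))
        if earlier is not None:
            earlier[f"d{x}"] = 1
    return records[(user_id, today)]


records = {}
record_activity(records, 42, date(2024, 1, 1))  # registration day
record_activity(records, 42, date(2024, 1, 8))  # back seven days later
```

After the second call, the record for 1 January has `d7` set to 1 and every other flag still at 0. In a real database the same idea would be an update against the rows dated X days before the current date.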

This approach creates extraneous data, but I like it for two reasons. The first is that calculating retention rates with these aggregates is incredibly simple and versatile: it merely requires a sum of the retention columns (for percentages, this sum is divided by the count of records for the day being examined). The second is that it tracks not only new user retention, but retention relative to any date: retention rates can be tracked after the launch of a new feature, a major content push, etc.
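To make the "sum divided by count" calculation concrete, here is a small sketch. It assumes each record is a dict with a `day`, an `is_first_day` flag, and 0/1 benchmark columns such as `d7`; those key names and the `retention_rate` function are assumptions made for illustration.

```python
from datetime import date


def retention_rate(records, day, benchmark, new_users_only=False):
    """Share of users active on `day` who came back at the given benchmark.

    Returns None when no records match, so an empty day is not
    mistaken for 0% retention.
    """
    cohort = [r for r in records
              if r["day"] == day and (r["is_first_day"] or not new_users_only)]
    if not cohort:
        return None
    return sum(r[benchmark] for r in cohort) / len(cohort)


launch_day = date(2024, 3, 1)  # hypothetical feature launch date
sample = [
    {"day": launch_day, "is_first_day": True,  "d7": 1},
    {"day": launch_day, "is_first_day": True,  "d7": 0},
    {"day": launch_day, "is_first_day": False, "d7": 1},
    {"day": launch_day, "is_first_day": False, "d7": 1},
]

print(retention_rate(sample, launch_day, "d7"))                       # 0.75
print(retention_rate(sample, launch_day, "d7", new_users_only=True))  # 0.5
```

The first call measures D7 retention relative to the date itself (everyone active that day), while the second restricts the cohort to users whose first day it was.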

The second component of the LCV metric is value, or revenue. This is simpler: I track the revenue earned per user in their daily records. I explained in part one that I usually take a trailing daily ARPU value over some period of time. The time period you choose will invariably be influenced by externalities, and you’ll need to transform your data to account for these (and week-to-week cyclicality). This is a good primer on transforming time series data; the first answer to this question provides a good guideline for when (and how) to perform a log transformation on time series data. The bottom line: test your data for heteroscedasticity and expect to have to transform it.
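A rough sketch of the trailing daily ARPU calculation follows. It assumes you have already summed revenue and counted active users per day from the daily records, and it uses `log1p` (the log of 1 + x) as one possible transformation, chosen here only because it tolerates zero-revenue days; whether any transformation is appropriate depends on what your own data looks like. All function names are hypothetical.

```python
import math


def daily_arpu(revenue_by_day, active_users_by_day):
    """Revenue per active user for each day; days with no users yield 0.0."""
    return [revenue / users if users else 0.0
            for revenue, users in zip(revenue_by_day, active_users_by_day)]


def trailing_mean(series, window):
    """Mean of the last `window` values of a series."""
    if window <= 0 or len(series) < window:
        raise ValueError("window must be positive and no longer than the series")
    return sum(series[-window:]) / window


def log_transform(series):
    """log(1 + x) transform, often used to stabilize variance that grows with level."""
    return [math.log1p(x) for x in series]
```

One design note: choosing a window that is a multiple of seven days (an assumption on my part, not a rule) means every weekday is represented equally, which helps with the week-to-week cyclicality mentioned above.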

Once you have duration and daily ARPU, you can calculate expected Lifetime Customer Value. And if you have oriented your data around the hierarchies mentioned above, you can calculate LCV with a granularity that should help you optimize your marketing spend.
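For illustration only, here is one way the final step could be sketched. It rests on two assumptions that this post does not spell out: that duration can be approximated as the area under the retention curve (trapezoids between the available benchmarks, with retention on day 0 taken as 100%), and that LCV is duration multiplied by trailing daily ARPU. Check both against the definitions in part one before relying on them.

```python
def expected_duration(retention_by_day):
    """Approximate expected active days as the area under the retention curve.

    `retention_by_day` maps a benchmark day (1, 7, 30, ...) to a retention
    rate between 0 and 1. Day 0 is assumed to be 1.0 unless provided, and the
    curve is cut off at the last benchmark, so this understates long tails.
    """
    points = sorted({0: 1.0, **retention_by_day}.items())
    area = 0.0
    for (day_a, rate_a), (day_b, rate_b) in zip(points, points[1:]):
        area += (day_b - day_a) * (rate_a + rate_b) / 2  # trapezoid rule
    return area


def lifetime_customer_value(duration_days, trailing_daily_arpu):
    """Assumed form: LCV = expected duration in days x daily ARPU."""
    return duration_days * trailing_daily_arpu
```

Running the same two functions on retention rates and ARPU filtered to a single acquisition channel or country gives the segment-level LCV that can be compared against that segment's CPA.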