The problem with analyzing payer data in freemium

Posted on February 15, 2016 by Eric

Given the inherently low percentage of users in a freemium product that convert into payers, as well as the skew of the distribution of lifetime values that freemium developers hope to achieve in their user bases, it might be tempting for a freemium developer to focus most of their product analysis on monetization.

But there are serious problems with putting too much emphasis on optimizing for monetization in freemium products, especially with analytical exercises that attempt to segment paying users. Beyond the practical issues with this type of analysis -- some of which are addressed below -- a development mentality that myopically attunes to monetization could potentially neglect the priority of scale in freemium and the dynamic between monetization and retention. Especially for mobile apps, because of platform charts and the nature of discoverability on mobile, optimizing for monetization may be sub-optimal if it produces bottlenecks in user activity that lead to increases in churn (and thus decreases in virality) that limit the overall size of the user base.

But more concretely, attempting to draw actionable cues from the data generated by paying users is problematic because there simply aren't a lot of them in a freemium product. Assuming a freemium product with 1MM DAUs has a conversion rate (to paying users) of 5% -- which is high -- and that users who have paid at least once are distributed somewhat evenly within the product from day to day, only 50,000 revenue-contributing users are present in the user base on any given day.

50,000 users may seem like a lot, but unless the product is specific to some demographic or platform (e.g. it is available only on iOS in the US), the curse of dimensionality can quickly whittle such a group down into smaller sub-groups that aren't statistically viable for analysis.
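
To make the arithmetic concrete, here is a rough sketch in Python. The 1MM DAU and 5% conversion figures come from the example above; the targeting dimensions and the number of values each one takes are purely hypothetical assumptions chosen for illustration, as is the simplification that payers are spread evenly across every combination (in practice they would be skewed, leaving many groups even smaller).

```python
from math import prod

DAU = 1_000_000          # from the example above
CONVERSION_RATE = 0.05   # from the example above (a high rate)

# Hypothetical targeting dimensions and how many distinct values each takes.
# These counts are illustrative assumptions, not figures from the post.
DIMENSIONS = {
    "platform": 2,
    "country": 20,
    "device_type": 10,
    "acquisition_source": 15,
}


def daily_payers(dau, conversion_rate):
    """Payers present on a given day, assuming they are spread evenly over days."""
    return round(dau * conversion_rate)


def average_cell_size(users, dimensions):
    """Average users per combination of dimension values, assuming an even spread."""
    cells = prod(dimensions.values())
    return users / cells


payers = daily_payers(DAU, CONVERSION_RATE)
cells = prod(DIMENSIONS.values())
print(f"{payers} payers across {cells} segments: "
      f"about {average_cell_size(payers, DIMENSIONS):.1f} payers per segment")
```

Under these assumed dimensions, 50,000 payers spread over 6,000 combinations leaves fewer than ten payers per targetable segment on average, which is the whittling-down effect described above.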

And dimensionality is exactly what makes a data set around player activity valuable for the purposes of marketing: what use is, for instance, an LTV profile if it can't be used to generate more users with that profile (in other words, to set targeting parameters for marketing campaigns)? In-product monetization data is interesting, of course -- for instance, whether payers that pay in their first session ultimately end up spending more than payers that don't -- but unless it can be broken down by the dimensions that ad campaigns can be targeted with, such as device type, location, acquisition source, etc., it's not really anything more than trivia, at least for the purposes of acquiring more users of a certain profile (there's obviously a strong case for using monetization data for product optimizations, but the focus of this article is marketing).

This is what makes retention data such a powerful point of analysis: retention profiles can be built for every DAU a product has. And retention, of course, is a prerequisite for monetization anyway: users can't pay if they're not using the product.

Conventional wisdom generally holds that soft launches for freemium apps should be evaluated on the basis of retention (and not monetization) precisely because of this fact. In a soft launch, data is scarce and the product's economy is considered in flux. Optimizing the product around what makes users stick around (to potentially pay later) makes sense in this context because, by definition, the product is being changed frequently and evaluated. But this shouldn't really change after hard launch; a freemium product ("product-as-a-service") should constantly be evaluated and improved upon, and doing so through the lens of monetization attaches an unnecessary immediacy to a process that should be viewed, at least in the abstract, as interminable.

  • aaron

    i've personally experienced the pain in doing this type of exercise. and at the end coming to realize that it's futile to try to make any sort of sense of the tiny number of conversions. trying to convince the boss that this is not a good way to spend time is another exercise in itself.

  • ESeufert

    I think you may have missed the point of the post. 50k is "small" because it then needs to be broken down by the various dimensions that are targetable for the purposes of marketing. The point of the post isn't the relative size of the paying player base, but rather that the only way to make payer profiles actionable is to be able to target them with marketing. But even then, yes, breaking 50k users down into targetable groups (based on combinations of dimensions) has, in my experience, produced a large degree of variance within the groups in terms of overall LTVs and general demographic trends. Also, I think you might need to revisit your definition of the term "noise" in the freemium context, given that very few (if any) variables are distributed normally.