I often hear of freemium app players being bucketed into three groups: the non-payers (95-97% of the user base), the payers who spend a little (2-4%), and the "whales" (usually less than 1%). But these clusters are hard to actually identify in most apps. For one, they ignore the temporality of the definitions: most whales were at one point payers who spent a little. For another, I simply don't believe these clusters form this way naturally, and if they do, it's more the result of poor freemium monetization strategy than of the organic behavior of the user base. The purpose of freemium is to bring as many users into an app as possible and leverage the scale of the user base into a stratified monetization curve that spans — albeit with a dramatic drop-off from non-spenders to spenders — a very large set of possible lifetime values.
Consider the freemium app model manifested as a market in the center of town. If the stalls in that market only sell items that cost either €1 or €100, aren't the contrived "doesn't buy", "buys a little", and "buys a lot" labels forced onto its patrons predetermined? Surely not everyone who goes to the market will spend €100, and the majority will spend nothing, but by not offering a varied and price-diverse catalogue of items to buy, the market's organizers are engineering these artificial patron tiers. Freemium app developers do the same thing when they ignore one of the basic tenets of the freemium model: users who enjoy buying items must be given the opportunity to do so to whatever extent they — the users, not the developers — deem appropriate.
The vast majority of freemium app users will never buy items in-game; these people usually account for about 95-97% of the total user base. But the LTVs of the users who might pay should fall across a broad spectrum of values, facilitated by an exhaustive product catalogue in which each item communicates a clear value proposition. Designing a monetization loop that achieves this is a topic for another post; here, I'll discuss how the monetization curve should be measured and acted upon.
Before hard data is available, the freemium monetization curve (the curve of potential LTV values) can be approximated by the probability density function of the Pareto distribution: the Y-intercept is 97%, meaning 97% of users have an LTV of 0. The curve progresses from there through the range of potential LTV values until it reaches nearly 0 at the highest possible LTV (the maximum amount a user could theoretically spend in-game). Conceptually, the area under the curve (with LTV values on the X-axis and "probabilities", or fractions of the user base, on the Y-axis) represents the total amount of money that will be spent in-game by whatever users are being tracked.
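As a rough sketch of this idea, the curve can be modeled as a mixture: a point mass at zero for the non-payers plus a Pareto tail for the payers. The 97% non-payer fraction comes from the benchmark above; the Pareto shape and minimum-spend values below are purely illustrative assumptions, not benchmarks.

```python
import random

# Illustrative parameters (assumptions, not benchmarks):
NON_PAYER_FRACTION = 0.97
ALPHA = 1.8        # Pareto shape: lower = fatter tail (more "whales")
MIN_SPEND = 0.99   # cheapest purchasable item, e.g. a €0.99 pack

def simulate_ltv(rng: random.Random) -> float:
    """Draw one user's LTV from the mixed distribution:
    a point mass at 0 for non-payers, a Pareto tail for payers."""
    if rng.random() < NON_PAYER_FRACTION:
        return 0.0
    # paretovariate(ALPHA) has support [1, inf); scale it to MIN_SPEND
    return rng.paretovariate(ALPHA) * MIN_SPEND

def expected_revenue(n_users: int) -> float:
    """Closed-form area under the curve times the user count:
    E[LTV] = (1 - 0.97) * ALPHA * MIN_SPEND / (ALPHA - 1), for ALPHA > 1."""
    payer_mean = ALPHA * MIN_SPEND / (ALPHA - 1)
    return n_users * (1 - NON_PAYER_FRACTION) * payer_mean

rng = random.Random(7)
ltvs = [simulate_ltv(rng) for _ in range(100_000)]
print(f"simulated revenue per 100k users: {sum(ltvs):,.0f}")
print(f"closed-form expectation:          {expected_revenue(100_000):,.0f}")
```

Because ALPHA < 2, the tail's variance is infinite, so simulated totals swing widely from run to run; a few extreme draws (the "whales") dominate the sum, which is exactly the stratification the curve is meant to capture.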
Measuring the fraction values on the Y-axis is simple, but it must be done ex-post (after churn-out) to be accurate. Once benchmark values have been established, though, the curve can be estimated by cohort, using behavioral projections mapped to the final LTV values of previous users (e.g. users who purchase a specific item in their first session may ultimately spend similarly).
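One minimal way to sketch that cohort projection: group historical users by a first-session behavioral marker, average their ex-post final LTVs as the benchmark, then score a new cohort against it. The item names and LTV figures below are hypothetical examples, not real benchmarks.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical records: (first_session_purchase, final_ltv),
# where final_ltv was measured ex-post, after the user churned out.
history = [
    ("none", 0.0), ("none", 0.0), ("none", 0.0), ("none", 0.99),
    ("starter_pack", 4.99), ("starter_pack", 12.49), ("starter_pack", 7.99),
    ("gem_bundle", 49.99), ("gem_bundle", 121.50),
]

# Benchmark: mean final LTV keyed by first-session behavior.
by_behavior = defaultdict(list)
for item, ltv in history:
    by_behavior[item].append(ltv)
benchmark = {item: mean(vals) for item, vals in by_behavior.items()}

def project_cohort_revenue(cohort: list) -> float:
    """Estimate a new cohort's revenue by mapping each user's
    first-session behavior to the benchmark's mean final LTV."""
    return sum(benchmark[item] for item in cohort)

new_cohort = ["none", "none", "starter_pack", "gem_bundle", "none"]
print(f"projected cohort revenue: {project_cohort_revenue(new_cohort):.2f}")
```

In practice the behavioral marker would be richer than a single purchase (session counts, progression speed, etc.), but the mechanism is the same: early signals keyed to the final LTVs of users who have already played out their lifecycle.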
If the freemium monetization curve weren't approximated as a continuous distribution — i.e. if it assumed a discrete split between "users that spend a little" and "whales" — it would not only be useless for predicting total revenue, it would also provide no feedback about the depth of the product catalogue at the highest levels. "Renewables" and premium-access purchasables get stale; to monetize at the highest levels and lengthen the tail of the monetization curve, a freemium app must offer products that satisfy the needs of power users at the extreme end of the spending spectrum.