How to model growth programmatically using product retention curves.
The Python code used in this article can be found on GitHub.
Product growth is often described as something of an inscrutable sorcery given the blend of operative efforts and domain expertise that comprise it: analytics, marketing, creative asset production, etc. Many growth teams approach product growth from a top-down, macro perspective: if certain efforts are indexed against topline growth, then initiatives like A/B tests, channel composition balancing, ad creative experiments, and others can provide the veneer of methodology required to convince the business that growth is actively being managed and is under control.
But growth really isn’t being actively, systematically managed if the team is not focused on its atomic units, which are retention and cohort compounding. Products can grow without a growth team focused on these things (and many of the most exciting consumer products at any given point in time do), but that growth is capricious, unpredictable, and haphazard. Virality, luck, and favorable market conditions aren’t business strategies: in order to grow systematically, a growth team needs to understand the fundamental structure of its product’s user base. In my own work, I often talk about “architecting” growth, because I think what’s more important than seeing a product grow is having planned and carefully managed that growth.
The building block of growth is the retention profile: the rate at which a cohort decays over time. Looking at a single retention profile in isolation, it’s easy to understand how a user base made up of many cohorts can grow over time: users that churn out of a product are replaced with an even larger number of new users. Some teams calibrate growth with the term “replacement DAU”: if the number of users that churn on any given day is replaced with the equivalent number of new users, then the product’s user base stays the same size. In visualizing an arbitrary retention profile, it’s fairly straightforward to understand what percentage of cohort users need to be replaced at certain points along the lifecycle of that single cohort to maintain DAU:
Using absolute numbers instead of percentages, imagine a cohort of 5,482 users that is onboarded to the product and propagates through it with the retention profile above. At any given point in time, the absolute number of users that would need to be onboarded to replace churned users can be calculated by using the retention profile:
If 5,482 is a meaningful DAU level for the product, then at any of those points annotated on the diagram above, some number of users can be acquired to bring the product’s DAU back to 5,482. But of course, that new cohort would also decay over time, and the next day, the product would have fewer than 5,482 users again.
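The replacement calculation for a single cohort can be sketched in a few lines. This is a minimal illustration, assuming an exponential retention profile of the form exp(b * t); the decay rate below and the helper names `retained_users` and `replacement_users` are hypothetical and are not taken from the article's code. Only the cohort size of 5,482 comes from the example above.

```python
import numpy as np

# Assumed exponential retention profile: retention(t) = exp(DECAY_RATE * t).
# DECAY_RATE is an illustrative value, not one taken from the article.
DECAY_RATE = -0.05
COHORT_SIZE = 5482  # cohort size used in the example above


def retained_users(cohort_size, decay_rate, day):
    """Users from a cohort still active `day` days after onboarding."""
    return int(cohort_size * np.exp(decay_rate * day))


def replacement_users(cohort_size, decay_rate, day):
    """Users to acquire on `day` to bring DAU back to the original cohort size."""
    return cohort_size - retained_users(cohort_size, decay_rate, day)


for day in (0, 1, 7, 14, 30):
    print(
        day,
        retained_users(COHORT_SIZE, DECAY_RATE, day),
        replacement_users(COHORT_SIZE, DECAY_RATE, day),
    )
```

Each printed row shows the day, the users remaining from the original cohort, and the number of new users that would be needed that day to return to 5,482.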
Imagine that the 5,482 users acquired were part of a campaign that produced five days’ worth of new cohorts, all of which experienced the same retention profile referenced above:
Each of these cohorts will decay over time, and the product’s DAU on any day would be the sum of the users in those cohorts that remain according to the compressing gravity of the underlying retention profile. This notion is best visualized with a 3D bar chart that shows how each of these cohorts diminishes over time (the numbers in white are the original cohort sizes):
These cohort users compound over time, which can be visualized with a stacked bar chart: cohorts get added to the user base and decline over time at the rate dictated by the retention profile, but the user base grows because new cohorts are being added each day over the course of this five-day campaign:
Planning for growth
The stacked bar chart above could exist in a dashboard for most consumer tech companies and elicit congratulatory triumph each morning: up-and-to-the-right is a nice state to find a product in.
But this product is growing because new users are being added to it each day to replace those that have churned. What happens when that stops?
That can be calculated by using each cohort’s retention profile (which, in this case, happens to be the same) to project forward the users that will remain in each cohort at some day in the future, and then summing those remaining cohort users up into total DAU. In the Python file linked at the top of this article, this “forward DAU” calculation is done in a function called build_forward_DAU:
```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def build_forward_DAU( cohorts, map_length ):
    #map_length should include original cohort days, so add in the length of the cohorts
    map_length += len( cohorts )
    start_date = min( cohorts[ 'date' ] )
    today = start_date + timedelta( days = map_length )
    dates = pd.date_range( start_date, periods = map_length ).tolist()
    dates = [ str( d.date() ) for d in dates ]
    forward_DAU = pd.DataFrame( columns = [ 'cohort_date' ] + dates )
    for index, value in cohorts.iterrows():
        this_date = value[ 'date' ]
        this_cohort = pd.DataFrame( columns = [ 'cohort_date' ] + dates )
        this_cohort.loc[ 0, 'cohort_date' ] = this_date
        #i counts the days elapsed since this cohort was onboarded
        i = 0
        while this_date < today:
            #remaining users = original cohort size * exp( b * i ), where b is
            #the second element of the cohort's retention profile
            this_cohort.loc[ 0, str( this_date ) ] = int( value[ 'cohort_size' ] * np.exp( value[ 'retention_profile' ][ 1 ] * i ) )
            this_date = this_date + timedelta( days = 1 )
            i += 1
        #DataFrame.append was removed in pandas 2.0, so use pd.concat instead
        forward_DAU = pd.concat( [ forward_DAU, this_cohort ] )
    #days before a cohort was onboarded have no users
    forward_DAU = forward_DAU.fillna( 0 )
    return ( forward_DAU, dates )
```
This function takes some map length, or amount of time to project the existing cohorts forward, and multiplies the projected retention percentage at each date by the original size of each cohort, producing the absolute Day X (DX) retention value per cohort. When these original five cohorts are projected forward for 15 days, with no new cohorts being added to the product, DAU evolves in the following way:
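The same projection can also be sketched without DataFrames, which makes the arithmetic easier to inspect. The version below is an illustrative alternative, not the article's implementation: it assumes one cohort is onboarded per day, that every cohort shares one exponential retention profile, and that the decay rate and four of the five cohort sizes are made-up values. `forward_dau` is a hypothetical name.

```python
import numpy as np

# Hypothetical inputs: only the 5,482 cohort size comes from the article; the
# other cohort sizes and the decay rate are invented for illustration.
cohort_sizes = np.array([5482, 5100, 5600, 5300, 5900])
DECAY_RATE = -0.05  # assumed exponent b in retention(t) = exp(b * t)


def forward_dau(cohort_sizes, decay_rate, extra_days):
    """Return (matrix, dau): remaining users per cohort per day, and daily totals.

    Cohort k is assumed to be onboarded on day k (one cohort per day).
    """
    n_cohorts = len(cohort_sizes)
    n_days = n_cohorts + extra_days
    days = np.arange(n_days)
    # age[k, d] = days since cohort k was onboarded (negative before onboarding)
    age = days[None, :] - np.arange(n_cohorts)[:, None]
    # retention is zero before a cohort exists, exp(b * age) afterwards
    retention = np.where(age >= 0, np.exp(decay_rate * np.maximum(age, 0)), 0.0)
    matrix = np.floor(cohort_sizes[:, None] * retention).astype(int)
    # DAU on each day is the column sum across all cohorts
    return matrix, matrix.sum(axis=0)


matrix, dau = forward_dau(cohort_sizes, DECAY_RATE, extra_days=15)
print(dau)
```

Each row of `matrix` is one cohort and each column is one day, so the column sums trace the rise in DAU during the campaign and the decline once acquisition stops.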
All of a sudden, the team isn’t celebrating anymore: the product is shrinking. Of course, that makes sense, since no new users are being added to the product and the corrosive effect of churn is inescapable.
In order for this product to grow again, new users need to be onboarded. And just as with the isolated cohort visualization from above, if the product has a target DAU, then the distance between that DAU target and the actual DAU value on any given day in this forward projection is easily calculated, and a new cohort of that size can be acquired. But then that new cohort will diminish over time, and another DAU gap will have to be calculated, and so on ad infinitum.
These DAU gaps are the holes that growth fills. Products can grow without these gaps being understood or conquered, but systematic growth seeks to understand them and accommodate them in pursuit of commercial ambitions. It’s rare for a growth team to operate without fielding demands from other teams: like the finance team, which needs to understand the P&L impact of product growth (marketing expenses, revenue changes); or the product team, which wants to know how many users will be using its creation; or the executive team, which wants to understand the trajectory that the business is on.
Growing towards a goal
How does the growth team use the above framework to achieve specific business goals? It can be reactive each day and scramble to acquire the number of users necessary to reach some DAU threshold, but that’s inefficient (the shorter the acquisition timeline, the more expensive the unit cost of acquisition).
Instead, the growth team can use the forward DAU concept to project out some timeline and fill the DAU gaps as needed against some future DAU goal. Extending the example from above: at the end of the five-day campaign (i.e., on the fifth day of acquisition), the standing DAU of the product is 24,863. Imagine that the product must reach 50,000 DAU after a period of 15 days (that is, 15 days after the five-day campaign has ended). This gap can be visualized as such:
But this isn’t really helpful. If no new users are added after the five-day campaign ends, as in the projection visualized earlier, the standing DAU will be 22,051 after 15 days: if the growth team needs to fill that gap on Day 15 to hit 50,000, the cost will almost certainly be greater than if they try to incrementally grow the user base over each day of the 15-day period.
The extent to which this has to be done is easy enough to quantify with a simple linear regression from the end of the five-day campaign (at 24,863 DAU) to the 50,000 DAU needed at the end of the 15-day period:
This simple linear regression model dictates the DAU level needed at each day in order to build toward the 50,000 DAU goal after 15 days. But how does the growth team calculate the size of the DAU gap on each day given the projected deterioration of existing cohorts to know exactly how many users must be acquired daily to hit the above targets?
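The daily targets themselves are straightforward to generate. With only two anchor points (the starting DAU and the goal), a fitted regression line is simply the straight line between them, so the sketch below uses linear interpolation via `np.linspace` rather than fitting a model. The function name `daily_dau_targets` is hypothetical; the DAU figures are the ones quoted above.

```python
import numpy as np

START_DAU = 24_863  # standing DAU at the end of the five-day campaign
GOAL_DAU = 50_000   # DAU goal at the end of the projection period
PERIOD_DAYS = 15


def daily_dau_targets(start_dau, goal_dau, period_days):
    """Evenly spaced DAU targets for days 1..period_days (start day excluded)."""
    line = np.linspace(start_dau, goal_dau, period_days + 1)
    # drop the starting point and round to whole users
    return np.rint(line[1:]).astype(int)


targets = daily_dau_targets(START_DAU, GOAL_DAU, PERIOD_DAYS)
print(targets)
```

Passing the same value for the start and the goal yields a flat target line (a maintenance scenario), and passing a lower goal yields a declining one.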
The team can accomplish this through a simple process:
- For a given day in the above projection, using the retention profiles of existing cohorts, calculate how many users from those cohorts will be present on that day;
- Subtract the total number of present existing users from the DAU target to get the DAU gap;
- Acquire a cohort of the size of the DAU gap;
- Progress to the next day and run through the above steps again.
This process can be implemented recursively, calling itself over and over until the last day of the projection is reached. In the code, this function is called build_DAU_projection_map:
```python
def build_DAU_projection_map( cohorts, retention_profiles, forward_DAU, DAU_values ):
    this_DAU_value = DAU_values[ 0 ]
    this_date_value = datetime.strptime( forward_DAU.columns.tolist()[ -1 ], '%Y-%m-%d' ) + timedelta( days = 1 )
    #advance the cohorts forward by one day to see what the natural DAU
    #from existing cohorts would be without any additions
    forward_DAU, forward_DAU_dates = build_forward_DAU( cohorts, 1 )
    natural_DAU = forward_DAU.iloc[ :, -1 ].sum()
    #calculate replacement DAU needed to hit the DAU goal
    replacement_DAU = this_DAU_value - natural_DAU
    #add this new cohort on this day IF the replacement DAU is positive
    cohorts = add_cohort( cohorts, this_date_value, ( 0 if replacement_DAU < 0 else replacement_DAU ), retention_profiles[ 0 ] )
    #advance the cohorts, including the new cohort, forward by one day
    forward_DAU, forward_DAU_dates = build_forward_DAU( cohorts, 0 )
    #if this was the last DAU target to hit, return the values
    if len( DAU_values ) == 1:
        return cohorts
    #if there are more DAU targets left to hit, remove this target and run the process again recursively
    return build_DAU_projection_map( cohorts, retention_profiles, forward_DAU, DAU_values[ 1: ] )
```
This function is called and iterates through the projection period by calling itself until the end of the period is reached. When it has, the function returns a cohort list that includes the necessary number of DAU additions each day to hit the daily and final DAU targets. When this is run, it produces the following graph for the 50,000 DAU target:
The topmost layer of each daily stacked bar represents the users that are acquired that day to meet the daily target; the layers below it are previously-onboarded cohorts that have deteriorated since acquisition.
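The same gap-filling process can also be written as a plain loop, which may be easier to step through than the recursive version. The sketch below is a simplified, self-contained illustration rather than the article's implementation: it assumes a single shared exponential retention profile, and the decay rate, the cohort sizes other than 5,482, and the name `plan_acquisition` are all hypothetical.

```python
import numpy as np

DECAY_RATE = -0.05  # assumed exponent b in retention(t) = exp(b * t)


def plan_acquisition(cohort_sizes, cohort_ages, dau_targets, decay_rate):
    """Return (acquired, achieved) for each projected day.

    cohort_sizes: original sizes of the existing cohorts
    cohort_ages:  age in days of each cohort when the projection starts
    dau_targets:  DAU target for each projected day
    """
    sizes = [float(s) for s in cohort_sizes]
    ages = [int(a) for a in cohort_ages]
    acquired, achieved = [], []
    for target in dau_targets:
        # advance every existing cohort by one day
        ages = [a + 1 for a in ages]
        # natural DAU: what remains of existing cohorts with no additions
        natural_dau = sum(int(s * np.exp(decay_rate * a)) for s, a in zip(sizes, ages))
        # DAU gap for the day; never acquire a negative number of users
        gap = max(0, int(target) - natural_dau)
        acquired.append(gap)
        achieved.append(natural_dau + gap)
        # the new cohort joins at age 0, so it counts in full today
        sizes.append(float(gap))
        ages.append(0)
    return acquired, achieved


# Hypothetical starting state: five cohorts onboarded on consecutive days
existing_sizes = [5482, 5100, 5600, 5300, 5900]
existing_ages = [4, 3, 2, 1, 0]
targets = [int(t) for t in np.rint(np.linspace(24_863, 50_000, 16)[1:])]

acquired, achieved = plan_acquisition(existing_sizes, existing_ages, targets, DECAY_RATE)
print(acquired)
```

When a target sits below the natural DAU, the loop acquires nothing that day and the achieved DAU overshoots the target, which mirrors the zero-size cohort that the recursive function adds when the replacement DAU is negative.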
The process is the same in the case where the product doesn’t need to grow but merely maintain DAU: the projection period is iterated over and the DAU gaps are calculated and filled. If the ending DAU level of 24,863 was intended to be maintained, the DAU targets would look as follows:
And the DAU projection would look like this:
The daily acquisition levels are lower since they are merely replacing churned DAU, not contributing to an increased topline DAU level.
These levels are even lower when the product is intended to shrink, for instance to a level of 22,500 DAU from the ending 24,863:
The DAU projection in this case produces a very thin layer of new acquisition on top of the existing cohorts, producing something visually reminiscent of a controlled building demolition:
Modeling a future
Building frameworks like this is important because consumer tech products are fundamentally valuable as a function of their growth, and so growth strategy is business strategy. Virality isn’t a business strategy; virality is fickle, and it ends.
Managed and controlled growth is cheaper and easier to implement than ad hoc and disorganized growth spurts. A growth team should be able to articulate what its goals are and, more importantly, how it plans to achieve them, which necessitates some model of the world that allows forward projections to be made.
Models are by definition wrong, but they’re essential for making credible predictions. Marketing teams can’t model the future but, using historical behavioral norms, they can model a future. Systematic growth isn’t deterministic — a user base can’t be willed into existence — but it’s the indispensable, humdrum work that allows businesses to plan and evolve.