A/B testing can kill product growth

A/B testing can lead to suboptimal outcomes when marketing inputs are not considered.

In hearing from most consumer product growth teams (that is, teams tasked with growing a product’s user base that don’t have a specific remit to run advertising campaigns), it’s fairly common to learn of a considerable number of A/B tests undertaken in service of the group’s keystone strategy of mass experimentation. In a conference presentation or post-mortem Medium post, this might take the form of an admission that the team ran 10,000 (or some other suitably large number of) A/B tests until they found the one thing that drove engagement or monetization or social sharing.

A/B testing is a powerful tool, but it’s also a blunt one that can do more harm than good to a product when used haphazardly or senselessly, especially on the early-funnel portion of a product. Much has been written about A/B testing on this site over the years, and there’s no point in retreading that particular ground, but it does make sense to consider A/B testing in the context not just of preserving existing users but also of reinforcing marketing campaigns.

Many growth teams could just as easily be renamed retention or monetization teams: their primary goal is to retain users that are already in the product and to surface the experiences to those users that are most conducive to generating monetization. That’s a noble goal and it’s a focus that every consumer product company should take very seriously, but it actually has nothing to do with “growth”: it is the conservation of potential user attention, not the growth of it. User base growth can only be achieved by adding new users to the system — as a supporting thought experiment: imagine an app with perfect retention that is removed from the App Store. Existing users can still interact with the app, but new users can’t find it to download it. Is it growing?

This isn’t a semantic quibble: once the growth team has been tasked exclusively with retention, its canvas becomes the early user funnel and it becomes inclined not to care about the provenance of users. This means that, in practice, growth teams tend to be most successful at companies with products that have already become viral cultural phenomena: the Facebooks and Spotifys and Slacks of the world. Huge numbers of users adopt these products every day, and the growth teams at those companies are tasked with ensuring that the maximum proportion of those users remain engaged. The “growth” dynamic there leans heavily on organic adoption, with the user base accumulating further through increased retention.

I have no doubt that the growth teams at companies like the ones mentioned have tremendous and invaluable impact through retentive experimentation. But for products not in that rarefied sect, where actual free, abundant, organic inbound growth can’t be taken for granted, freewheeling and unfettered A/B testing at the early-funnel stage of the product can be disastrous.

There is a serious, fundamental problem with extensive A/B testing at the earliest stage of the funnel, especially for products where marketing is the function that drives user acquisition: nothing is “known” about the user in their first session. What are their product motivations? What kind of content do they consume, and how do they consume it? For freemium apps: can they pay? What is the purpose of applying broad A/B tests to an entire cohort before that cohort can be segmented and the tests can be dimensionalized beyond just the most superficial features of each user (like geography, type of device, source acquisition channel, etc.)?

This is an obvious case of pushing an entire cohort to a local maximum, and it’s the kiss of death for freemium apps, where monetization is generally driven by users at the extreme tail of the LTV distribution (that is: 95% of the revenue comes from 1% of the users). What’s more, if you can’t segment users into behaviorally relevant profiles and then optimize within those, which you are not doing if you are bucketing an entire Day 1 cohort into A/B tests, then you can’t send relevant signal back to acquisition channels. You don’t know whether Facebook is providing users with a different distribution of engagement than Google is, because all users were pushed through an A/B test in the first session that was designed to optimize engagement at the lowest common denominator. Obviously this isn’t a concern when all traffic is organic, but being able to break source channel cohorts apart by engagement and monetization distribution is tremendously important when marketing is driving user growth, and mass early-funnel A/B testing seriously impairs that.
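To make "breaking source channel cohorts apart by monetization distribution" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the channel names, the revenue figures, and the helper names `revenue_concentration` and `summarize_by_channel` are hypothetical and not taken from any real product or analytics tool.

```python
from collections import defaultdict


def revenue_concentration(revenues, top_fraction=0.01):
    """Share of total revenue contributed by the top `top_fraction` of users."""
    total = sum(revenues)
    if not revenues or total == 0:
        return 0.0
    ranked = sorted(revenues, reverse=True)
    # Always count at least one user so tiny cohorts do not yield an empty tail.
    top_n = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


def summarize_by_channel(users, top_fraction=0.01):
    """Summarize monetization per channel.

    `users` is an iterable of (channel, revenue) pairs, one pair per user.
    """
    by_channel = defaultdict(list)
    for channel, revenue in users:
        by_channel[channel].append(revenue)

    summary = {}
    for channel, revenues in by_channel.items():
        payers = sum(1 for r in revenues if r > 0)
        summary[channel] = {
            "users": len(revenues),
            "payer_rate": payers / len(revenues),
            "arpu": sum(revenues) / len(revenues),
            "top_share": revenue_concentration(revenues, top_fraction),
        }
    return summary


# Made-up cohort: channel_a is dominated by a single high spender,
# channel_b has many small payers and no tail.
example_users = (
    [("channel_a", 95.0)]
    + [("channel_a", 1.0)] * 5
    + [("channel_a", 0.0)] * 94
    + [("channel_b", 1.0)] * 20
    + [("channel_b", 0.0)] * 80
)

for name, stats in sorted(summarize_by_channel(example_users).items()):
    print(name, stats)
```

In this made-up data the two channels have very different shapes: one is carried almost entirely by its tail, the other by a broad base of small payers. A single blended average would hide exactly the difference that an acquisition team would want to act on.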

The core concept here is that source channels differ materially in what’s often called “user quality” but what is actually captured in user intent (what was the user trying to do when they installed the app?), user understanding (how much does the user know about the app already?), and user proclivity (what types of apps does the user enjoy?). A user acquired for a mobile game from another, similar mobile game potentially has a different combination of those than a user acquired from some utility app; likewise for a user acquired into a D2C fashion app from Instagram versus one acquired from a mobile game. Source channels matter, and when products are running marketing, it makes no sense to try to optimize a product through limited A/B testing for the combined, blended mass of users in a cohort that represents a multitude of different acquisition channels.
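A small arithmetic sketch of how a blended result can mislead, using invented numbers. The channel names, the counts, and the `compare_variants` helper are all hypothetical, and the sketch compares raw retention rates only; it does not run any significance test.

```python
def compare_variants(results):
    """Return the retention lift of variant B over variant A.

    `results` maps channel -> {"A": (users, retained), "B": (users, retained)}.
    The returned dict has one entry per channel plus a "blended" entry
    computed over all channels pooled together.
    """
    lifts = {}
    totals = {"A": [0, 0], "B": [0, 0]}  # [users, retained] per variant
    for channel, arms in results.items():
        rates = {}
        for arm, (users, retained) in arms.items():
            rates[arm] = retained / users
            totals[arm][0] += users
            totals[arm][1] += retained
        lifts[channel] = rates["B"] - rates["A"]
    lifts["blended"] = (
        totals["B"][1] / totals["B"][0] - totals["A"][1] / totals["A"][0]
    )
    return lifts


# Invented counts: a small, high-intent channel and a large, low-intent one.
example_results = {
    "similar_game_network": {"A": (1000, 300), "B": (1000, 250)},
    "utility_app_network": {"A": (3000, 300), "B": (3000, 480)},
}

for segment, lift in compare_variants(example_results).items():
    print(f"{segment}: {lift:+.4f}")
```

With these invented counts, variant B looks better on the pooled cohort while retaining fewer users from the smaller, higher-intent channel. Shipping B on the strength of the blended number would degrade the channel that arguably matters most.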

If the goal of a product is to be personalized, then that won’t be accomplished with A/B testing built on p-values and confidence intervals but rather with an online bandit mechanism, and even then the mechanism will work better with acquisition channel (and, better still, behavioral cues) taken into account. Managing the entire user base to small, incremental boosts in early-funnel engagement can hurt product growth, as the best marketing channels are diluted away by broad, unsegmented A/B testing.
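One way to picture a bandit that takes acquisition channel into account is the sketch below. It uses Beta-Bernoulli Thompson sampling with a separate posterior for each (channel, variant) pair, which is one common bandit technique chosen here for illustration, not a method the article prescribes. The class name `ChannelBandit`, the `simulate` helper, the variant names, and the conversion rates are all made up.

```python
import random
from collections import defaultdict


class ChannelBandit:
    """Thompson sampling with an independent posterior per (channel, variant)."""

    def __init__(self, variants, seed=None):
        self.variants = list(variants)
        self.rng = random.Random(seed)
        # [conversions, non-conversions] observed for each (channel, variant).
        self.counts = defaultdict(lambda: [0, 0])

    def choose(self, channel):
        """Sample each variant's posterior for this channel; pick the best draw."""
        def draw(variant):
            wins, losses = self.counts[(channel, variant)]
            # The +1 terms encode a uniform Beta(1, 1) prior.
            return self.rng.betavariate(wins + 1, losses + 1)
        return max(self.variants, key=draw)

    def update(self, channel, variant, converted):
        self.counts[(channel, variant)][0 if converted else 1] += 1

    def pulls(self, channel, variant):
        return sum(self.counts[(channel, variant)])


def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against made-up per-channel conversion probabilities."""
    variants = sorted({v for rates in true_rates.values() for v in rates})
    bandit = ChannelBandit(variants, seed=seed)
    environment = random.Random(seed + 1)  # separate stream for user behavior
    for _ in range(rounds):
        for channel in sorted(true_rates):
            variant = bandit.choose(channel)
            converted = environment.random() < true_rates[channel][variant]
            bandit.update(channel, variant, converted)
    return bandit


# Hypothetical rates: each channel prefers a different onboarding variant.
example_rates = {
    "channel_a": {"variant_x": 0.6, "variant_y": 0.1},
    "channel_b": {"variant_x": 0.1, "variant_y": 0.6},
}

bandit = simulate(example_rates)
for channel in sorted(example_rates):
    print(channel, {v: bandit.pulls(channel, v) for v in bandit.variants})
```

Because the posteriors are keyed by channel, each channel can settle on its own best variant instead of the whole cohort being steered toward one blended winner. Behavioral cues could be folded in by extending the key, at the cost of needing more data per segment.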

Photo by Brendan Church on Unsplash