All of the code used in this answer can be found on GitHub.

Assuming there is a relationship between Paid and Organic installs, and that the relationship is linear (e.g. the number of Organic installs doesn't increase exponentially as Paid installs increase, which happens sometimes), you can measure that relationship fairly easily in Python with a simple linear regression.

As an example, we can start by seeding some sample Organic and Paid install data and plotting it. We'll start by creating a sample Paid install data set that increases over a 100-day period. We'll then use an arbitrary multiplier to create a sample of Organic data from the Paid install data, so that the two series are related (i.e. the Organic installs are not totally unrelated to Paid installs):
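A minimal sketch of seeding and plotting such a sample follows. The specifics are assumptions, not the code from the answer's repository: the fixed seed, the random daily steps, the 0.25 multiplier, and the names `paid`, `organic`, and `plot_installs` are all illustrative. The data is built with the standard library only, and matplotlib is imported lazily inside the plotting helper.

```python
import random

random.seed(42)  # assumption: fixed seed so repeated runs give the same sample

DAYS = 100
MULTIPLIER = 0.25  # assumption: arbitrary Organic-per-Paid multiplier

# Paid installs: each day's level is the previous day's level plus a random
# non-negative step, so the series never decreases over the 100-day period.
paid = []
level = 0.0
for _ in range(DAYS):
    level += random.uniform(0, 2)
    paid.append(level)

# Organic installs: Paid installs scaled by the multiplier, with random noise
# so the relationship is visible but not exact.
organic = [p * MULTIPLIER * random.uniform(0.5, 1.5) for p in paid]


def plot_installs(paid_series, organic_series):
    """Plot both series against the day index (requires matplotlib)."""
    import matplotlib.pyplot as plt

    days = range(1, len(paid_series) + 1)
    plt.plot(days, paid_series, label="Paid")
    plt.plot(days, organic_series, label="Organic")
    plt.xlabel("Day")
    plt.ylabel("Installs")
    plt.legend()
    plt.show()


# Uncomment to draw the chart:
# plot_installs(paid, organic)
```

With matplotlib's default colour cycle, the first series (Paid) is drawn in blue and the second (Organic) in orange.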
The plot should look something like this (we used random data to seed the samples, so your graph won't look exactly the same):

The blue line here is Paid installs and the orange line is Organic installs. We can see that both graphs seem to be increasing linearly and that there is some relationship between Paid and Organic installs. This would be the starting point for undertaking this analysis if you are observing your own data: you see that there seems to be some relationship between Paid and Organic installs.

In order to quantify the relationship between Paid and Organic, we'll calculate the covariance between the two variables. Covariance is a measure of how variables move together: if covariance is positive and "high", it means that when one variable increases, the other tends to increase as well; if it is negative and "high" in magnitude, the other tends to decrease. This thread on StackExchange does a good job of explaining covariance. The purpose of calculating covariance is to validate our assumption that these two variables have some sort of relationship. We'll calculate it with:
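One way to compute it is sketched below with only the standard library. The sample data is illustrative (a fixed seed and an arbitrary 0.25 multiplier are assumptions), and `sample_covariance` is a hypothetical helper name, not a function from the original answer.

```python
import random
from statistics import mean

random.seed(42)  # assumption: fixed seed for a repeatable illustrative sample

# Illustrative data: Paid drifts upward over 100 days, Organic is a noisy
# fraction of Paid.
paid = []
level = 0.0
for _ in range(100):
    level += random.uniform(0, 2)
    paid.append(level)
organic = [p * 0.25 * random.uniform(0.5, 1.5) for p in paid]


def sample_covariance(xs, ys):
    """Sample covariance: summed products of deviations from the mean, over n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


cov_paid_organic = sample_covariance(paid, organic)
print(f"Covariance between Paid and Organic: {cov_paid_organic:.3f}")
```

NumPy's `np.cov(paid, organic)` returns a 2x2 covariance matrix whose off-diagonal entries hold this same quantity.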
The value of the covariance between Paid and Organic when I run this code is
When I run this, I get a Pearson's correlation coefficient of 0.918. Pearson's coefficient is a measure of the linear relationship between two variables and takes a value from -1 (perfect negative linear correlation) through 0 (no linear correlation) to +1 (perfect positive linear correlation). Since our value of 0.92 is very close to 1, our variables are highly positively linearly correlated. Since that's true, we can create a simple linear regression model to predict Organic installs as a function of Paid installs:
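Both the correlation check and the regression fit can be sketched with the standard library. This is an illustration under stated assumptions, not the answer's original code: `pearson_r` and `fit_line` are hypothetical helpers, and the sample data (fixed seed, 0.25 multiplier) is made up here, so the coefficient it prints will not match the 0.918 quoted above.

```python
import random
from statistics import mean, stdev, variance

random.seed(42)  # assumption: fixed seed for a repeatable illustrative sample

paid = []
level = 0.0
for _ in range(100):
    level += random.uniform(0, 2)
    paid.append(level)
organic = [p * 0.25 * random.uniform(0.5, 1.5) for p in paid]


def sample_covariance(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    slope = sample_covariance(xs, ys) / variance(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


r = pearson_r(paid, organic)
slope, intercept = fit_line(paid, organic)

# Predicted Organic installs at each Paid install level from 0 through 100.
organic_paid = [intercept + slope * paid_level for paid_level in range(101)]

print(f"Pearson's r: {r:.3f}")
print(f"Organic ~ {intercept:.3f} + {slope:.3f} * Paid")
```

SciPy users can get the slope, intercept, and r in one call with `scipy.stats.linregress(paid, organic)`.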
This organic_paid variable is a list of Organic install values at various levels of Paid installs, up through 100. We can plot this to get a sense of the relationship (which is represented via the slope of the line):
A rough reading of this suggests about 5 Organic installs at a level of 20 Paid installs and about 10 Organic installs at a level of about 40 Paid installs. We can calculate the exact values like this:
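A sketch of that calculation, assuming an ordinary least-squares line fitted to illustrative sample data; `predict_organic` is a hypothetical helper name. Because the data is randomly generated here (fixed seed, arbitrary 0.25 multiplier), the printed values need not match the reading above exactly.

```python
import random
from statistics import mean, variance

random.seed(42)  # assumption: fixed seed for a repeatable illustrative sample

paid = []
level = 0.0
for _ in range(100):
    level += random.uniform(0, 2)
    paid.append(level)
organic = [p * 0.25 * random.uniform(0.5, 1.5) for p in paid]

# Least-squares slope and intercept for Organic as a function of Paid.
paid_bar, organic_bar = mean(paid), mean(organic)
covariance = sum(
    (p - paid_bar) * (o - organic_bar) for p, o in zip(paid, organic)
) / (len(paid) - 1)
slope = covariance / variance(paid)
intercept = organic_bar - slope * paid_bar


def predict_organic(paid_level):
    """Predicted Organic installs at a given level of Paid installs."""
    return intercept + slope * paid_level


for paid_level in (20, 40):
    print(f"Paid = {paid_level}: predicted Organic = {predict_organic(paid_level):.2f}")
```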
Note that this example was specifically designed with an underlying linear relationship between Paid and Organic installs -- you might find that no such relationship exists in your own data. You might also find that a relationship does exist at some level of Paid installs but doesn't exist at lower or higher levels -- you might need to reach some threshold of Paid installs before people find your product Organically. And the relationship can change at various levels, too: it's not hard to envision an app that sees Organic discovery skyrocket when Paid installs reach the level of tens of thousands per day. The exploration phase of this process is important: you should visualize your data sets and break them out into specific periods of time before trying to model a relationship.