I read an article recently discussing the decision of whether to build an analytics platform or use a 3rd party’s. This question comes up a lot, and it’s an important one given how expensive robust analytics platforms can be to build and implement. But I think the key to answering it is determining whether the organization has enough domain expertise in house to use large amounts of data effectively.
I’m not talking about a Data Science team armed with PhDs in statistics; that comes when an organization reaches a certain size. What I mean by using data effectively is the set of simple things we do with data when it isn’t perfectly curated or packaged for us. In my opinion, an organization should rely on a 3rd-party analytics stack until it has the domain expertise in house (or can recruit someone to fill the gap) to do the following:
- Construct an A/B test that avoids bias, and determine up front how much data needs to be collected before calling a result (a quick sizing sketch follows this list)
- Write simple queries to collect data to answer weird one-off questions (“What percentage of people in Germany play our games on Nokia handsets?”) without having to bug an engineer (sketched below)
- Understand how to segment users and compare user groups, which is much more complicated than it sounds (cohort sketch below)
- Use time series methods to project key metrics forward (sketched below)
- Test and iterate over marketing campaigns to optimize CTR and conversion
- Test a hypothesis with an actual statistical method, not just a gut feeling (see the final sketch below)
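To make the first point concrete, here’s a back-of-the-envelope way to size a test before it runs. The 5% baseline conversion rate and the one-point lift are made-up numbers, and I’m assuming Python with statsmodels:

```python
# Sketch: how many users per variant before an A/B test can call a winner?
# The baseline rate and minimum lift below are assumptions for illustration.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05   # current conversion rate (assumed)
target = 0.06     # smallest lift worth acting on (assumed)

effect = proportion_effectsize(target, baseline)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,             # acceptable false-positive rate
    power=0.8,              # chance of detecting the lift if it's real
    alternative="two-sided",
)
print(f"~{n_per_arm:,.0f} users per variant")
```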
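The Nokia question is the kind of thing an analyst should be able to answer in minutes. Here’s a sketch against a hypothetical sessions table; the schema is invented, and sqlite3 is just standing in for whatever warehouse you actually query:

```python
import sqlite3

conn = sqlite3.connect("analytics.db")  # stand-in for the real warehouse

# Hypothetical schema: one row per session, with country and handset maker.
query = """
    SELECT 100.0 * SUM(CASE WHEN handset_maker = 'Nokia' THEN 1 ELSE 0 END)
                 / COUNT(*) AS pct_nokia
    FROM sessions
    WHERE country = 'DE';
"""
pct_nokia = conn.execute(query).fetchone()[0]
print(f"{pct_nokia:.1f}% of German sessions are on Nokia handsets")
```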
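Segmentation is where blended averages fool people; the same metric can look healthy overall while one cohort is quietly tanking. A toy pandas sketch with invented data:

```python
import pandas as pd

# Invented user-level data: acquisition channel and whether the user
# came back seven days after install.
users = pd.DataFrame({
    "channel":     ["organic", "organic", "paid", "paid", "paid", "organic"],
    "retained_d7": [1, 1, 0, 0, 1, 0],
})

# The blended rate hides that the two segments behave very differently.
print("overall retention:", users["retained_d7"].mean())
print(users.groupby("channel")["retained_d7"].agg(["mean", "count"]))
```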
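For projections, even a simple smoothing model beats eyeballing a trend line. A sketch using the Holt-Winters implementation in statsmodels, again on invented daily-active-user numbers:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Two invented weeks of daily active users.
dau = pd.Series(
    [1200, 1260, 1310, 1280, 1360, 1410, 1450,
     1430, 1500, 1550, 1530, 1600, 1640, 1620],
    index=pd.date_range("2013-01-01", periods=14, freq="D"),
)

# Additive trend, no seasonality; project one week forward.
fit = ExponentialSmoothing(dau, trend="add").fit()
print(fit.forecast(7))
```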
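And replacing a gut feeling with a test can be a few lines of scipy. The counts here are hypothetical: did a new onboarding flow actually move conversion, or is the gap just noise?

```python
from scipy.stats import chi2_contingency

#            converted  did not convert
observed = [[180, 1820],   # old onboarding flow (hypothetical counts)
            [220, 1780]]   # new onboarding flow (hypothetical counts)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"p = {p_value:.3f}")  # a small p means the gap is unlikely to be chance
```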
Collecting an infinite amount of data sounds great, and engineers are always eager to implement in-house systems, because building systems is what they do. But unless that data can be put to good use (and the tasks in the list above barely qualify as good use), the expense of an end-to-end in-house analytics stack can’t really be justified.