Ditch Tableau and use d3.js, Part 1 of 2: MBAs shouldn't do data analysis

Part 2: Access and allies

I’ve written before about the practical advantages a d3.js-based analytics system boasts over Tableau, which I view as the best commercial solution. The most prominent advantage is cost: Tableau is exorbitantly expensive, and its learning curve is so steep (and free instructional materials are so meager) that the cost of training an analyst to use the software through extortionary workshops can easily eclipse the cost of the software itself.

So the practical advantages of using d3.js over Tableau aren’t hard to grasp; budget constraints should resonate with every manager who deals with software procurement. The more abstract advantages have to do with how an analytics system is implemented within an organization and how that system should be used. I believe a hosted, open-source-based analytics system trumps a desktop-software-based system for organizational reasons (and d3.js is the best open-source tool to use in implementing a hosted analytics system). I’ll outline two reasons for this and expound upon them over a series of two posts: 1) analysis consistency and, next, 2) analytics access.

Analysis Consistency

I hate Dilbert-style managerial buzzwords (in fact, I hate most buzzwords), and the buzzword I hate most is “slice-and-dice”. Product Managers and generic management types love using Excel to “slice-and-dice” data and sometimes eschew pre-formatted, pre-calculated reports in favor of receiving “raw data” that they can “slice-and-dice” themselves (usually through PivotTables). This is dangerous.

I’ve seen very intelligent, very capable Product Managers do heinously absurd things to data in the process of “slicing-and-dicing” it. I’ve seen people multiply net revenue numbers from Apple by 1.3 (i.e., 130%) in order to get to the gross value, when undoing a 30% cut means dividing net by 0.7. I’ve seen people take averages of averages. I’ve seen people use linear regression on data that clearly didn’t follow a linear pattern. I’ve seen people mix up European and American date formats and report entire months as daily data points. I’ve seen people report averages when major outliers skewed the data; I’ve seen people over-report the median without the context of the average. I’ve seen people log-normalize data without pointing that out. The misapplications of statistical methods I’ve seen fill my dreams with horrific images of circular reference errors and hard-coded calculations, leaving me to carry the burden of an irrational, insurmountable distrust of Excel.
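To make two of those mistakes concrete, here is a minimal sketch of the net-to-gross conversion and the average-of-averages problem. It assumes a 30% platform cut (so net is 70% of gross), and the segment numbers and function names are made up purely for illustration.

```javascript
// Assumption: the platform keeps 30% of gross, so net = gross * 0.7.
const PLATFORM_CUT = 0.3;

// Wrong: adding 30% on top of net understates gross.
function grossByMultiplying(net) {
  return net * (1 + PLATFORM_CUT);
}

// Right: divide net by the share the developer actually keeps.
function grossByDividing(net) {
  return net / (1 - PLATFORM_CUT);
}

// Made-up segments: a small high-spending group and a large low-spending one.
const segments = [
  { name: "A", revenue: 1000, users: 100 }, // 10 per user
  { name: "B", revenue: 900, users: 900 },  // 1 per user
];

// Wrong: averaging per-segment averages ignores segment size.
function averageOfAverages(rows) {
  const perSegment = rows.map((r) => r.revenue / r.users);
  return perSegment.reduce((a, b) => a + b, 0) / perSegment.length;
}

// Right: pool revenue and users first, then divide once.
function pooledAverage(rows) {
  const revenue = rows.reduce((sum, r) => sum + r.revenue, 0);
  const users = rows.reduce((sum, r) => sum + r.users, 0);
  return revenue / users;
}

console.log(grossByMultiplying(70).toFixed(2)); // 91.00
console.log(grossByDividing(70).toFixed(2));    // 100.00
console.log(averageOfAverages(segments));       // 5.5
console.log(pooledAverage(segments));           // 1.9
```

With these numbers, the 1.3 shortcut misses 9% of gross revenue, and the average of averages overstates revenue per user by almost a factor of three.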

And it has also convinced me that putting the means of “slicing-and-dicing” data into the hands of Product Managers is a mistake, for three reasons. The first is that every manager (and especially if he has an MBA) thinks he is a statistics guru, but, in my experience, few are (see above for anecdotes to support this). The second is that giving Product Managers the power to calculate their own metrics introduces risk into an organization – people do desperate things when their bonuses are at stake. The third is that all calculations and meaningful metrics should be defined uniformly at the organization level for consistency – when individual managers are calculating their own metrics (for instance, ARPU or user conversion), those metrics risk being calculated differently across different teams. Keeping formulas consistent is essential to the integrity of organizational data, and the only way to ensure it is to define formulas once for the whole organization and disseminate them from a centralized business intelligence / analytics hub.
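One way to picture a centralized definition is a single shared module that every report calls. The sketch below is hypothetical: the `metrics` object, the field names, and the particular formulas for ARPU (revenue over active users) and conversion (paying users over active users) are assumptions chosen for illustration, not definitions taken from any particular organization.

```javascript
// Hypothetical shared module: the one place where metric formulas live.
// The formulas below are illustrative assumptions, not canonical definitions.
const metrics = Object.freeze({
  // ARPU: total revenue divided by total active users in the period.
  arpu: ({ revenue, activeUsers }) =>
    activeUsers === 0 ? 0 : revenue / activeUsers,

  // Conversion: share of active users who made at least one purchase.
  conversion: ({ payingUsers, activeUsers }) =>
    activeUsers === 0 ? 0 : payingUsers / activeUsers,
});

// Every team's report passes its own totals to the same functions.
const teamA = { revenue: 5000, activeUsers: 2000, payingUsers: 100 };

console.log(metrics.arpu(teamA));       // 2.5
console.log(metrics.conversion(teamA)); // 0.05
```

Because the object is frozen, a report cannot quietly swap in its own version of a formula at runtime; changing a definition means changing it in one place for everyone.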

Centralizing analytics and reporting is the most responsible, most efficient, and most effective way of using data to make decisions: an analytics group should define metrics calculation, define reporting formats, and define the source data from which metrics are derived. A d3.js-based system is the best way to do this: with a d3.js system, an analyst defines the metrics and programmatically implements them in JavaScript on the front-end; the “raw data” is not accessible except through that filter. Furthermore, the analyst defines how the data is communicated and can avoid using misleading / inscrutable implements like pie charts, stacked percentage line charts, and dual-axis charts reporting fundamentally different things. These aren’t frivolous controls; data can be misused much more easily than it can be adequately vetted and qualified.
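As a rough sketch of what “accessible only through that filter” could look like in front-end JavaScript, the example below keeps the raw rows inside a closure and exposes only an analyst-defined aggregate. The `createReport` factory, the row fields, and the sample data are all hypothetical; the returned array is simply the kind of shape a d3.js chart could bind to.

```javascript
// Hypothetical factory: raw rows are captured in a closure and never returned.
function createReport(rawRows) {
  // Private copy, so later changes to the caller's array have no effect.
  const rows = rawRows.map((r) => ({ ...r }));

  return Object.freeze({
    // Daily revenue totals: the only view of the data this report exposes.
    dailyRevenue() {
      const totals = new Map();
      for (const { date, revenue } of rows) {
        totals.set(date, (totals.get(date) || 0) + revenue);
      }
      return [...totals].map(([date, revenue]) => ({ date, revenue }));
    },
  });
}

// Made-up sample rows, for illustration only.
const report = createReport([
  { date: "2013-01-01", userId: 1, revenue: 2 },
  { date: "2013-01-01", userId: 2, revenue: 3 },
  { date: "2013-01-02", userId: 1, revenue: 5 },
]);

// Two entries: 2013-01-01 with revenue 5, and 2013-01-02 with revenue 5.
console.log(report.dailyRevenue());
```

This only illustrates the interface boundary. In a browser the underlying rows still reach the client unless the aggregation happens before the data is served, so where the filter runs is a design choice in its own right.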

© 2012-2013 Mobile Dev Memo