What do Data Scientists know?

What makes the problem of disinformation abuse and subversion on the largest social media platforms so intractable is the vast chasm of comprehension that separates all three relevant parties to the situation: the government, the social media networks, and the public. The crux of the predicament is that the algorithms used by social media networks to curate content are unconditionally and utterly indifferent: they possess no inherent political bias or moral center. And algorithms can be gamed; the malicious intent that motivated Russian interference in the 2016 US presidential election belonged to callous political mercenaries who had discovered how to exploit the systems of social media, not those systems themselves.

The algorithms that surface content to users’ feeds on Facebook (and other social networks) are tasked with one objective: to ruthlessly optimize for some measured mark of success, whether that be a click, a view, a like, a share, long-term engagement with the product, or a revenue event for an advertiser. Algorithms have no sense of morality, social mores, or geopolitics: advertising algorithms don’t attribute probity to clicks, whether they be for sofa advertisements or Russian propaganda posts.
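To make that indifference concrete, here is a minimal sketch of a feed ranker, assuming a hypothetical upstream model has already attached a predicted click rate to each post. The `Post` type, the `rank_feed` function, and the numbers are all invented for illustration; this is not how any particular platform ranks content.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    text: str                    # carried along, but never read by the ranker
    predicted_click_rate: float  # hypothetical model output in [0, 1]


def rank_feed(posts, limit=10):
    """Return up to `limit` posts ordered by predicted click rate, highest first."""
    return sorted(posts, key=lambda p: p.predicted_click_rate, reverse=True)[:limit]


# Illustrative, made-up values.
posts = [
    Post("sofa-ad", "Sectional sofas, 20% off", 0.031),
    Post("propaganda", "Fabricated election story", 0.058),
    Post("local-news", "City council approves budget", 0.012),
]

feed = rank_feed(posts)
print([p.post_id for p in feed])  # ['propaganda', 'sofa-ad', 'local-news']
```

Nothing in `rank_feed` inspects `text`: the propaganda post leads the feed only because its predicted click rate is the largest number in the list.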

Imbuing an algorithm with the sensibility to qualify content on the basis of morality, propriety, or veracity is a noble yet formidable proposition. The social media companies whose primary revenue machinery is an agglomeration of algorithms (and the number of such companies is growing at a rapid clip) are actually incentivized quite heavily to install dimensions like “integrity” or “veracity” as secondary marks of success for the content that they surface to users, subordinate (or equivalent) to some measure of immediate revenue.

This isn’t because there’s a thread of righteousness that unites the largest platform companies, despite their altruistic mantras and mission statements. The real reason that these companies want to avoid any semblance of abuse or impropriety on their platforms is that corruption is bad for engagement, and the combination of engagement and user base growth is the raison d’être of social networks: public markets don’t take kindly to advertising companies that can’t claim both.

This raises the question: what do data scientists actually know? If it’s in the best interest of these companies to empower their algorithms with a moral core, and those algorithms are created and tuned by the brightest minds in machine learning and statistics, why hasn’t it happened? Why can’t the data scientists who birthed these algorithms simply upgrade them with a sense of right and wrong to prevent another wave of systemic abuse?

This is where the tri-party disconnect becomes apparent. What the data scientists know is how an algorithm functions and how distant actual outputs are from some predicted value. This information is more or less completely detached from the underlying content: what the algorithms operate on is clicks, swipes, views, and similar signals. And for good reason: a feed is meant to be personalized based on the content a user enjoys consuming. From the algorithm’s perspective, and also from the perspective of the algorithm’s authors, the data scientists, there’s no real difference between a vicious cycle and a virtuous cycle with respect to the qualitative nature of the content being consumed by a user. If a user pursues a sinister path, the algorithm merely lit the way.
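As a rough illustration of what "how distant actual outputs are from some predicted value" can look like in practice, the sketch below scores click predictions with log loss, a standard error measure for probability forecasts. The choice of log loss, the event tuples, and their labels are assumptions made for this example; the essay does not say which metric any platform uses.

```python
import math


def log_loss(events):
    """Mean negative log-likelihood over (label, predicted_prob, clicked) events.

    The label is deliberately ignored: only the prediction and the outcome count.
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for _label, predicted, clicked in events:
        p = min(max(predicted, eps), 1 - eps)
        total -= clicked * math.log(p) + (1 - clicked) * math.log(1 - p)
    return total / len(events)


# Two hypothetical feeds with identical predictions and outcomes but different content.
sofa_feed = [("sofa ad", 0.9, 1), ("sofa ad", 0.2, 0), ("sofa ad", 0.7, 1)]
propaganda_feed = [("propaganda", 0.9, 1), ("propaganda", 0.2, 0), ("propaganda", 0.7, 1)]

print(log_loss(sofa_feed) == log_loss(propaganda_feed))  # True
```

The two feeds produce the same error because the measure only ever sees predictions and clicks; whether the cycle is vicious or virtuous never enters the arithmetic.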

But from the government’s perspective (“government” here being an abstraction), the data scientist (and thus the algorithm) does have a qualitative grasp on the content that precipitated the click. And from the user’s perspective, the data scientist knows the intent behind the content that precipitates the click. According to some of the lay criticism of these companies published in recent months, users bear no responsibility for tuning their feeds to partisan conspiracy theory posts because they were duped into doing so by parties operating with pernicious intent. (That is: a conspiracy theory posted by an unaffiliated member of the public is more genuine and healthier for democracy than the same conspiracy theory posted by a state actor.)

What the data scientist knows will likely become a more important question in this debate as it rages on. The largest social media networks would likely benefit from making that clearer.
