On Thursday, March 25th, the CEOs of Google, Twitter, and Facebook testified before Congress in a hearing titled “Disinformation Nation: Social Media’s Role In Promoting Extremism And Misinformation.” The substance of the hearing was focused on content moderation, misinformation, and the algorithms that social platforms tune to keep users engaged. These hearings have become regular fare and this one wasn’t particularly elucidating or interesting; the full video of the hearing, broken out into clips, can be found on C-SPAN’s website.
Much of the hearing was dedicated to assessing the roles that Twitter, Facebook, and Google play in the spread of misinformation, especially as related to the 2020 US election. Many of the questions asked of the technology executives seemed heavily influenced by the themes of engagement optimization and behavioral tracking popularized in the movie The Social Dilemma; in fact, all three executives were asked explicitly whether they had seen The Social Dilemma. At one point, Congresswoman Robin Kelly from Illinois’ 2nd district pointedly addressed the role of engagement in Facebook’s business model:
The business model for your platforms is quite simple: keep users engaged. The more time people spend on social media, the more data harvested and targeted ads sold. To build that engagement, social media platforms amplify content that gets attention. That can be cat videos or vacation pictures, but too often it means content that’s incendiary, contains conspiracy theories or violence. Algorithms on the platforms can actively funnel users from the mainstream to the fringe, subjecting users to more extreme content, all to maintain user engagement.
This reductive oversimplification of the ‘social media business model’ gets one thing wrong, of course: the data that is used for targeting ads isn’t actually harvested on the social media properties themselves (eg. the Facebook app) but rather on third-party properties, where conversion events like purchases are observed. As I detail in this piece, Facebook’s first-party data — its knowledge of the ‘cat videos’ and ‘vacation pictures’ with which users engage — is actually not very helpful for use in targeting ads. Rather, what Facebook uses to great effect is its knowledge of which products users buy on advertisers’ properties: armed with that information, Facebook can create behavioral product purchase and engagement histories of its users that very effectively direct its ads targeting. What’s missing in the above assessment is that advertisers willingly and enthusiastically give that data to Facebook.
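The data flow described above can be made concrete with a short sketch. Everything here is hypothetical and illustrative, not any platform’s actual API: the point is simply that the conversion event originates on the advertiser’s property and is passed back to the ad platform, which accumulates it into a purchase-behavior profile.

```python
# Hypothetical sketch of the conversion data flow described above. The event
# is observed on the advertiser's own property (a purchase), not on the
# social feed itself. All field names are illustrative.
conversion_event = {
    "event": "purchase",
    "source": "advertiser_site",      # observed off-platform, not in the feed
    "product_category": "running_shoes",
    "value": 89.99,
    "user_ref": "anonymized_id_123",  # hypothetical user identifier
}

def update_purchase_history(history: dict, event: dict) -> dict:
    """Accumulate per-user purchase behavior for use in ads targeting."""
    history.setdefault(event["user_ref"], []).append(event["product_category"])
    return history

profiles = update_purchase_history({}, conversion_event)
```

In this framing, the targeting-relevant signal (`running_shoes`) never existed on the social platform until the advertiser handed it over.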
What Congresswoman Kelly and many public commentators get wrong about Facebook’s business model is that on-site interactions have very little value for ads targeting. And in fact, the abuse of Facebook’s platform that was perpetrated by the Russian Internet Research Agency ahead of the 2016 US election didn’t utilize advertising to any material extent but rather relied on Facebook Groups for reach: Facebook estimates that 10MM people were exposed to political ads purchased by the Russian state-sponsored group, but 126MM people were exposed to organic posts that it created.
Some might argue that I’m proposing a false distinction: if engagement optimization algorithms keep people on Facebook’s properties, and if increased user time-on-site results in more impressions being served, then through some transitive property of engagement, Facebook’s algorithms empower ad serving. And that’s true: more engagement at the user level should result in more ad impressions being served at the user level, assuming a constant ad load. But the important point of differentiation here is that the data sets being used are different.
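The transitive relationship is just arithmetic: holding ad load constant, expected impressions scale linearly with time-on-site. A minimal sketch, with hypothetical numbers:

```python
def expected_impressions(minutes_on_site: float, ad_load: float) -> float:
    """ad_load: ad impressions served per minute of engagement (assumed constant)."""
    return minutes_on_site * ad_load

# If engagement optimization lifts a user's daily time-on-site from 30 to 45
# minutes at an assumed load of 2 impressions per minute:
baseline = expected_impressions(30, 2.0)  # 60.0
lifted = expected_impressions(45, 2.0)    # 90.0
```

More engagement means more inventory served, but nothing in this relationship says anything about how well those impressions are targeted.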
Knowledge of cat video consumption is helpful in curating a feed to include more cat videos, but it isn’t useful in targeting ads for shoes or diet supplements or mobile games. The idea that first-party data serves as a treasure trove of valuable raw materials for ad targeting was mostly debunked when Cambridge Analytica was revealed as a fatuous hoax, as I discuss in this podcast episode.
Which is all context for an interesting comment made during the hearing from Anna Eshoo, a Congresswoman representing California’s 18th district, directed at Mark Zuckerberg:
Your model has a cost to society. The most engaging posts are often those that induce fear, anxiety, anger and that includes deadly, deadly misinformation…This is dangerous and it’s why Representative [Jan] Schakowsky and I are doing a bill that is going to ban this business model of surveillance advertising.
But, again, this characterization confuses multiple ideas:
- Platforms use algorithms to optimize on-site engagement. When I interact with cat videos, I see more cat videos. Because the content to which I’m exposed on a given property is tailored to my tastes, I spend more time on the property, and I’m exposed to more ads as a result — again, assuming some constant impressions-per-minute coefficient (ad load);
- Advertisers pass conversion data back to ad platforms, and those ad platform partners use algorithms to optimize ad targeting. When I interact with ads and make purchases on advertisers’ sites, I see more ads like the ones I interacted with. Because the ads to which I’m exposed on a given property are tailored to my tastes, I interact with more ads.
The data sets that drive these two independent dynamics don’t meaningfully interact: to the extent that first-party data is used to target ads, it’s mostly demographic data that is volunteered to the platform by the user (eg. age, gender, location) rather than behavioral data (eg. clicked on cat videos). And since these feedback loops operate mostly autonomously, the deterioration of one won’t impact the other. Severing the flow of conversion data between advertisers and ad platforms, as will happen on iOS when AppTrackingTransparency (ATT) is rolled out, has no impact on an ad platform’s ability to optimize engagement on its owned-and-operated properties, and vice versa.
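To make the separation concrete, the two feedback loops can be sketched as functions that read from disjoint data sets. All names and data here are hypothetical, purely for illustration:

```python
# First-party interaction data: powers feed ranking only.
on_site_engagement = {"user_1": ["cat_video", "cat_video", "vacation_photo"]}
# Conversion data passed back by advertisers: powers ads targeting only.
advertiser_conversions = {"user_1": ["running_shoes_purchase"]}

def rank_feed(user_id: str) -> list:
    # Engagement optimization: most-interacted content types surface first.
    history = on_site_engagement.get(user_id, [])
    return sorted(set(history), key=history.count, reverse=True)

def target_ads(user_id: str) -> list:
    # Ads targeting: driven solely by off-platform conversion events.
    return ["ad_like_" + c for c in advertiser_conversions.get(user_id, [])]
```

Emptying `advertiser_conversions` (as ATT effectively does) degrades `target_ads` but leaves `rank_feed` untouched, and vice versa.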
The irony of the notion of surveillance advertising is that privacy groups and the W3C have decreed that first-party data is privacy compliant and have countenanced it for use in targeting ads. This has led to the content fortress phenomenon I have previously described: platforms and publishers alike are racing to wrap their arms around as much first-party data as they can ahead of ATT so as to facilitate conversions in a first-party environment, the data artifacts of which they can use for ad targeting. The data being used for ads targeting on platforms like Facebook is currently engendered by advertisers themselves, not the ad platforms: to the degree that the term ‘surveillance’ can be applied to the approach taken to ads targeting, it’s advertisers that are responsible for it. Of course, this changes with content fortresses: the platforms themselves service transactions, produce conversion data, and target against it in a fully privacy-compliant, first-party setting.
In What do Data Scientists know?, I wrote:
This is where the tri-party disconnect becomes apparent. What the data scientists know is how an algorithm functions and how distant actual outputs are from some predicted value. This information is more or less completely detached from the underlying content that the algorithms are functioning on, which is clicks, swipes, views, etc. And for good reason: a feed is meant to be personalized based on the content a user enjoys consuming. From the algorithm’s perspective — and also from the perspective of the algorithm’s authors, the data scientists — there’s no real difference between a vicious cycle and a virtuous cycle with respect to the qualitative nature of the content being consumed by a user. If a user pursues a sinister path, the algorithm merely lit the way.
While the underlying algorithms that power targeted ads and engagement optimization are similar, they’re fed by fundamentally different data sets that don’t intersect. If the spread of misinformation is somehow curtailed on Facebook, its advertising algorithms will continue humming unabated. And if the efficiency of Facebook’s advertising platform is crippled, misinformation will nonetheless circulate in Groups and through social sharing. The notion of surveillance advertising being perpetrated by ad platforms via social interactions is a myth: there is no omniscient social media entity spying on its users and hoarding their interaction data to power ads targeting. If legislators want to ban or severely restrict the capability of ads platforms to provide targeted advertising, then that’s mostly an effort that needs to be directed at advertisers, which generate the data that is used to target ads.
Photo by Niv Singer on Unsplash