Federated learning — like differential privacy, which I discussed in a previous article — is a concept that has garnered increased attention within the digital advertising domain as new platform privacy restrictions diminish ad targeting and measurement. Federated learning allows many “edge” devices (such as smartphones) to participate in the iterative revision of machine learning models by using local data to update model weights and coefficients. This allows a machine learning model to be updated while protecting the privacy of the data used: because only updated model parameters leave edge devices, and not the underlying local training data, the risk that sensitive information is revealed is mitigated (although not completely eliminated). The application of federated learning is not exclusive to digital advertising: use cases exist in healthcare, finance, and other fields.
The federated learning workflow can be run in a centralized fashion, with an aggregation server that handles model aggregation, or in a decentralized fashion, where nodes synchronize model parameters directly with each other. In either case, the process comprises four steps:
- Distribute a global model to edge nodes, either from a central aggregation server or via an exchange across edge nodes;
- Train the model on the edge nodes using local data;
- Transmit the locally-trained parameters, either back to a central server or via an exchange across edge nodes;
- Aggregate the parameters produced by the edge nodes back into a global model.
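The four steps above can be sketched as a toy federated-averaging loop. This is an illustrative simplification, not any platform's actual implementation: each "device" holds private samples of y = 2x, a single gradient step on a one-parameter linear model stands in for real on-device training, and all function names are hypothetical.

```python
import random

def local_train(w, data, lr=0.1):
    """Step 2 — simulate one edge device: a single gradient step on the model
    y = w * x, using only this device's local data. Only the updated
    parameter w (not the data) ever leaves the device."""
    grad = sum((w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_round(w, devices):
    """Steps 1, 3, 4 — distribute the global parameter, collect each device's
    locally trained parameter, and aggregate by simple averaging (FedAvg)."""
    updates = [local_train(w, device_data) for device_data in devices]
    return sum(updates) / len(updates)

# Toy setup: three "devices", each holding 20 private (x, y) samples of y = 2x.
random.seed(0)
devices = [[(x := random.gauss(0, 1), 2 * x) for _ in range(20)]
           for _ in range(3)]

w = 0.0
for _ in range(100):
    w = federated_round(w, devices)
print(round(w, 2))  # converges toward the true coefficient, 2.0
```

The aggregation server here sees only three scalar parameters per round; the sixty underlying (x, y) observations never leave their devices — which is the entire privacy argument of the scheme.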
The above explanation is deliberately abstracted: the topic quickly gets very technical, and it’s not necessary to fully grasp the underlying mechanics of federated learning in order to understand how it can be applied to digital advertising in a way that preserves privacy. This video and this video also provide descriptions of federated learning that are comprehensible to non-engineers. This video delivers a very in-depth overview of how federated learning is used in a healthcare context and the value of privacy-protective mechanics in a field that historically has seen innovation restricted by privacy regulation.
It is important for advertising practitioners to understand, conceptually, how federated learning can be deployed by advertising platforms in a way that protects consumer privacy while still allowing for descriptive, commercially revelatory audience segments to be created and targeted. As I’ve written about previously, advertising “tracking,” the practice that is facilitated by third-party browser cookies and device identifiers and that serves as the catalyst for platform-level privacy restrictions, requires the combination of data sets from multiple parties.
In the federated learning-based advertising construct, user data from multiple user-to-product contexts is never combined: a machine learning model is deployed to devices, such as smartphones, and the model is utilized and updated with locally-stored data sets containing, e.g., conversions, browsing behavior, app download behavior, and product engagement. Only model parameters, and not the underlying data used to derive them, leave the device.
It’s worth comparing this approach to the way that Google’s Federated Learning of Cohorts (FLoC) functions. Broadly, FLoC assigns users to interest-based cohorts based on their browsing behavior and allows those cohorts to be targeted by advertisers, with data related to any individual’s browsing history not leaving their device. FLoC isn’t a genuine application of federated learning, as was pointed out to me on Twitter, since the algorithm that assigns users to cohorts (SimHash) is static and not updated in an iterative process (steps 2-4 in the model described above). The assignment of cohorts by SimHash is represented by the below equation:
P(same cohort) = (1 − θ/π)^p

This notation looks scary, but θ simply captures the angle between the two vectors (user browsing histories) and p is the number of bits in the SimHash output, so vectors that are similar (smaller θ) are exponentially more likely to be grouped into the same cohort given some p (Google’s FLoC whitepaper suggests that Google will use 8 bits to create cohorts, meaning 2^8 = 256 potential cohort IDs). Google asserts in the FLoC whitepaper that the benefit of using SimHash in this way is that no connection to a centralized server is needed: no information about other users is required in assigning any user to a cohort, since the SimHash output depends only on that user’s behavioral history, and users will share cohorts with the probability calculation described above. It should be noted that Google will re-calculate each user’s cohort ID on a weekly basis, using their most recent week of browsing history.
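A toy illustration of the SimHash idea — not Google's actual implementation, and with hypothetical browsing-history vectors — shows both the cohort assignment and the collision probability. Each of p random hyperplanes contributes one bit (the sign of a dot product), so similar vectors tend to land in the same p-bit cohort:

```python
import math
import random

def simhash(vector, hyperplanes):
    """Toy SimHash: each random hyperplane contributes one bit (the sign of
    the dot product with the input vector), producing a p-bit cohort ID."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * h for v, h in zip(vector, plane))
        bits = (bits << 1) | (dot >= 0)
    return bits

p, dims = 8, 5                # 8 output bits -> 2^8 = 256 possible cohort IDs
random.seed(1)
planes = [[random.gauss(0, 1) for _ in range(dims)] for _ in range(p)]

# Two similar (hypothetical) browsing-history vectors: small angle between them,
# so they share a cohort with high probability -- but not with certainty.
user_a = [1.0, 0.9, 0.0, 0.2, 0.1]
user_b = [0.9, 1.0, 0.1, 0.2, 0.0]
cohort_a = simhash(user_a, planes)
cohort_b = simhash(user_b, planes)

# Probability that two vectors at angle theta agree on all p bits: (1 - theta/pi)^p
theta = math.radians(10)
same_cohort_prob = (1 - theta / math.pi) ** p
print(cohort_a, cohort_b, round(same_cohort_prob, 3))
```

Note that no step here consults a server or any other user's data: the cohort ID is a pure function of the user's own history and the shared hyperplanes, which is the property Google highlights in the whitepaper.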
FLoC has been the recipient of vociferous criticism, particularly from the Electronic Frontier Foundation, around tracking and fingerprinting concerns and on the basis of anti-competitive positioning (because FLoC “place(s) the browser in a vital gatekeeper position for the adtech ecosystem,” per a UK regulatory body report). FLoC is not a perfect mechanism by which user privacy can be preserved in digital advertising, and it isn’t even, definitionally, a utilization of federated learning. But FLoC showcases the potential for utilizing on-device data in a way that prevents specific identifiers and behavioral histories from being shared with third parties while allowing for ads to be effectively targeted against those behaviors. This, in my opinion, is the most exciting area of digital privacy innovation.
But this area of privacy innovation also presupposes a recognition of the utility / privacy tradeoff, and it serves as an implicit acknowledgement that the tradeoff can be managed with a technological solution. Some of the invective that has surfaced in response to Google’s privacy sandbox (which includes FLoC) captures a resounding rejection of that philosophical position: that there can be no tradeoff where privacy is concerned, and that any explicit targeting used in serving digital ads is an ethical disaster. Approaches like federated learning can only improve the tradeoff calculus, but they can’t eliminate the tradeoff.