On 29 June, Google announced its first “machine unlearning” challenge, inviting anyone to submit code that makes machine learning (ML) models forget a specified subset of their training data while retaining what they learned from the rest.
The competition will run from mid-July to mid-September.
Machine unlearning, a nascent ML subfield, aims to remove the influence of a subset of training data from a trained model.
Unlearning algorithms are useful insofar as they allow for an individual’s data to be erased from ML systems in compliance with data privacy regulation, such as the GDPR’s “right to erasure”.
However, Google says that removing the influence of certain data is “challenging” because it requires not only the erasure of the data from the database where it is stored, but also the erasure of its influence on other artefacts such as trained ML models.
Furthermore, recent research has shown that “membership inference attacks” can predict with a high degree of accuracy whether or not a particular example was contained in the ML model’s training dataset.
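A common form of membership inference attack exploits the fact that models typically fit their training examples more closely than unseen ones, so a low per-example loss suggests membership. The sketch below illustrates this loss-threshold variant; the function name, the threshold, and the loss values are illustrative assumptions, not taken from the research Google cites.

```python
# A minimal sketch of a loss-threshold membership inference attack,
# assuming the attacker can query the model's per-example loss.
# The attacker guesses "member" when the loss falls below a chosen
# threshold. All names and numbers here are illustrative.

def membership_inference(losses, threshold):
    """Predict membership: True means 'was in the training set'."""
    return [loss < threshold for loss in losses]

# Hypothetical per-example losses: training examples tend to have
# lower loss than held-out examples the model never saw.
train_losses = [0.02, 0.05, 0.01]    # examples in the training set
heldout_losses = [0.90, 1.40, 0.75]  # examples outside it

preds = membership_inference(train_losses + heldout_losses, threshold=0.5)
# The attack labels the first three as members and the last three as
# non-members, correctly separating the two groups in this toy case.
```

A successful unlearning algorithm should leave the model looking, to such an attack, as if the forgotten examples had never been members.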
“[An] ideal unlearning algorithm would remove the influence of certain examples while maintaining other beneficial properties, such as the accuracy on the rest of the train set and generalisation to held-out examples,” according to Google’s research scientists, Fabian Pedregosa and Eleni Triantafillou.
“A straightforward way to produce this unlearned model is to retrain the model on an adjusted training set that excludes the samples from the forget set. However, this is not always a viable option, as retraining deep models can be computationally expensive.”
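The retraining baseline the researchers describe can be sketched as follows. This is a toy illustration only: `train` stands in for an arbitrary learning procedure, and none of these names come from the challenge itself.

```python
# Toy illustration of exact unlearning by retraining: drop the forget
# set from the training data and train a fresh model on what remains.
# For deep models this full retrain is exactly the computationally
# expensive step that unlearning algorithms try to avoid.

def train(dataset):
    # Stand-in "model": simply memorises its training data.
    return set(dataset)

def retrain_without(dataset, forget_set):
    retain_set = [x for x in dataset if x not in forget_set]
    return train(retain_set)

data = ["a", "b", "c", "d"]
model = train(data)
unlearned = retrain_without(data, forget_set={"b"})
# The retrained model carries no trace of "b".
```

An unlearning algorithm succeeds to the extent that its cheaper, post-hoc edit of the trained model behaves like this retrained-from-scratch reference.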
Google will evaluate submissions based on the strength of the forgetting algorithm and the model utility, and will exclude any submissions that run slower than a fraction of the time it takes to retrain the model.
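The gating described above might look something like the following sketch. The scoring formula, the 0.5 runtime fraction, and all names are hypothetical guesses for illustration, not Google's actual evaluation code.

```python
# Hypothetical sketch of runtime-gated scoring: a submission is only
# scored if it runs faster than some fraction of the full retraining
# time; the score then combines forgetting quality and model utility.
# The product formula and the 0.5 fraction are illustrative only.

def evaluate(forgetting_quality, utility, runtime, retrain_time,
             max_fraction=0.5):
    if runtime > max_fraction * retrain_time:
        return None  # excluded: too slow relative to retraining
    return forgetting_quality * utility

score = evaluate(forgetting_quality=0.9, utility=0.8,
                 runtime=100.0, retrain_time=1000.0)
# A runtime of 600.0 against the same 1000.0 retrain time would have
# been excluded under this sketch's 0.5 cutoff.
```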
Google hopes that progress in the area of machine unlearning will “open additional ways to boost fairness in models, by correcting unfair biases or disparate treatment of members belonging to different groups.”