Manipulating a machine-learning method by feeding it doctored training data

A correspondent who usually prefers to remain anonymous points us to this article by Ilia Shumailov et al. which states:

The data-driven nature of modern machine learning (ML) training routines puts pressure on data supply pipelines, which become increasingly more complex. . . .

This emergent complexity gives a perfect opportunity for an attacker to disrupt ML training, while remaining covert. In the case of stochastic gradient descent (SGD), it assumes uniform random sampling of items from the training dataset, yet in practice this randomness is rarely tested or enforced. Here, we focus on adversarial data sampling. . . . by simply changing the order in which batches or data points are supplied to a model during training, an attacker can affect model behaviour. . . .
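To see why ordering can matter at all, here is a toy sketch. It is not the attack from the paper: the dataset, learning rate, and function name are made up for illustration. It runs one pass of plain SGD over the same six points in two different orders and ends up with two different fitted weights.

```python
def sgd_one_pass(points, lr=0.1, w=0.0):
    """One epoch of plain SGD on squared error for the 1-D model y = w * x."""
    for x, y in points:
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w

# Hypothetical toy data: three points near y = 1 and three near y = 3.
data = [(1.0, 1.0), (1.0, 1.2), (1.0, 0.8), (1.0, 3.0), (1.0, 3.2), (1.0, 2.8)]

# Same points, two orderings: low targets first versus high targets first.
sorted_order = sorted(data, key=lambda p: p[1])
reversed_order = sorted_order[::-1]

w_up = sgd_one_pass(sorted_order)      # high-y points arrive last
w_down = sgd_one_pass(reversed_order)  # low-y points arrive last

print(f"low-to-high order: w = {w_up:.3f}")
print(f"high-to-low order: w = {w_down:.3f}")
```

Each update pulls the weight toward the most recent point, so whichever points come last have the most say. No data point was added, removed, or altered between the two runs.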

They focus on the idea that “it is possible to perform integrity and availability attacks without adding or modifying any data points,” but to me that doesn’t seem like such a big deal. Once you can get in and modify the order of the data, you should be able to change the data too, no?

Kinda like with Brian “Pizzagate” Wansink: Once you accept that he can pick and choose which data to include in his papers, it’s not such a big step to suppose that he can just make up an entire experiment.

That issue aside, this work sounds super-interesting to me, especially in its connection to generalization and poststratification (differences between the training and test data), as discussed for example here and here.

6 thoughts on “Manipulating a machine-learning method by feeding it doctored training data”

  1. This reminds me of sort algorithms. Quicksort has excellent average behavior but n^2 worst case. If you always choose the pivot with an RNG it’s not easy for an adversary to hand you data that will cause the worst case. Similar thing here. If you’re going to sample from the training data, just call a sampler to do it!

  2. “Once you can get in and modify the order of the data, you should be able to change the data too, no?”

    No, there are tons of attacks where I might be able to affect the ordering but not the data itself. For example, here is an attack using their approach inside a GitHub pull request:

    ~~~
    foo.py: make training more reproducible
    @@ l1234
    + random.seed(56789)
    ~~~

    Totally harmless-looking especially if it’s part of a big batch of patches… but whups. Now training doesn’t work, or it’s backdoored on some specific input or something. Or you might be colocated on a cloud instance and attacking the RNG. Or you might be targeting a public scraper by changing the order of links on pages or DoS attacks: https://arxiv.org/abs/2302.10149

    Plus, the extreme stealthiness is a big advantage: modified data leaves obvious traces – just compare against the original. But affecting the ordering is nigh impossible to find in retrospect even if you have extremely detailed logs (not that anyone would ever think to check this). As they point out:

    This attack is realistic and can be instantiated in several ways. The attack code can be infiltrated into: the operating
    system handing file system requests; the disk handling individual data accesses; the software that determines the way
    random data sampling is performed; the distributed storage manager; or the machine learning pipeline itself handling
    prefetch operations. That is a substantial attack surface, and for large models these components may be controlled by
    different principals. The attack is also very stealthy. The attacker does not add any noise or perturbation to the data. There are no triggers or backdoors introduced into the dataset. All of the data points are natural. In two of four variants
    the attacker uses the whole dataset and does not oversample any given point, i.e. the sampling is without replacement.
    This makes it difficult to deploy simple countermeasures.

    Personally, I find this interesting from the theoretical perspective: this is an example of how data ‘programs’ the neural net ‘weird machine’. The order of updates can program the decision boundaries of the high-dimensional polytopes, pushing them around as desired. The right datapoint is like executing a command which does multiple things, changing multiple surfaces. (This is why you can do things like train a NN to classify the 10 digits of MNIST with less than 10 distilled/synthetic images: each image works overtime to program the NN to classify more than 1 kind of digit.)

    • “But affecting the ordering is nigh impossible to find in retrospect even if you have extremely detailed logs (not that anyone would ever think to check this)”

      Wouldn’t you notice accuracy varied strangely from batch to batch? It is pretty standard to monitor such things.

      You could also calculate some descriptive statistics (eg, mean pixel value) within and between the batches, those should stay relatively constant.

  3. It may be true that if you can change the order that the data are presented then you can also change the data themselves, but I can imagine that that would not always be the case. For instance, maybe you’re going to train your AI by having it read all of Wikipedia and all of the associated links. Anyone can change Wikipedia but if you do that in a large way in order to mess up the AI it will be pretty easy to detect. But if you just change the order in which Wikipedia is crawled, that might be a lot harder to detect. The first-order and second-order things someone might check are: (1) did every page get included that was supposed to be included, and (2) was anything included that shouldn’t have been included? You’ll pass both of those tests.

    I had not previously thought of the fact, obvious once encountered, that order matters. Even AIs are influenced by first impressions.

    • Phil:

      Another way to put it is: Yes, you could indeed build an algorithm that would be invariant to the order of the data. But it would be a real pain in the ass to construct and run such an algorithm.

      • Well, Bayesian learning is order invariant if you’re using the full system and not taking shortcuts. Also any system that samples using an RNG with an unpredictable seed (say, seeded from /dev/urandom on Linux) should be pretty hard to attack. If you actually are worried about real attacks, using a crypto-RNG as your RNG would be even more un-attackable.
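The suggestion in the last reply, drawing the sampling order from an RNG backed by operating-system entropy, might look roughly like the sketch below. The helper names `shuffled_indices` and `epoch_batches` are hypothetical. Python's `random.SystemRandom` reads from `os.urandom`, and its `seed()` method is ignored, so a stray seed call elsewhere in the codebase does not pin the order. This only addresses seed-style tampering; it does nothing if the code that serves the batches is itself compromised.

```python
import random

def shuffled_indices(n, rng=None):
    """Return a random permutation of range(n).

    By default uses random.SystemRandom, which draws from OS entropy
    (os.urandom) and ignores seed(), so the order cannot be fixed by seeding.
    """
    rng = rng or random.SystemRandom()
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

def epoch_batches(dataset, batch_size):
    """Yield one epoch of batches, sampled without replacement in a fresh order."""
    order = shuffled_indices(len(dataset))
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]

# Usage with a stand-in dataset of ten items.
toy_dataset = list(range(10))
batches = list(epoch_batches(toy_dataset, batch_size=3))
print(batches)
```

The trade-off is reproducibility: a run driven by OS entropy cannot be replayed from a seed, so anyone who needs to audit the order would have to log it as it is drawn.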
