Anonymising genetic data – some particular pitfalls

Following our last post, ‘Challenges with anonymising genetic data’, here we explore some of the pitfalls of processing genetic data on the basis that it has been anonymised.

Anonymisation is an ongoing process

Whether a dataset is anonymised can change abruptly. As soon as one dataset is merged with another relating to the same set of data subjects, it becomes more likely that the combined information could be used to re-identify a data subject. For example, it was reported last year that the British National Health Service had sold medical records to pharmaceutical companies that could be used to re-identify “anonymised” genetic information collected for diagnostic purposes.
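
To make the merging risk concrete, here is a minimal Python sketch of how two datasets that each look harmless on their own can be linked. Everything in it is hypothetical: the records, the field names (birth_year, postcode_area, sex) and the helper functions are invented for illustration and are not drawn from any real dataset or from the reports mentioned above.

```python
# Hypothetical data: all records, field names and values are invented.

# Dataset A: "anonymised" genetic results, held without names.
genetic_results = [
    {"birth_year": 1972, "postcode_area": "SW1", "sex": "F", "variant": "variant_A"},
    {"birth_year": 1985, "postcode_area": "M1", "sex": "M", "variant": "variant_B"},
    {"birth_year": 1985, "postcode_area": "M1", "sex": "M", "variant": "variant_C"},
]

# Dataset B: medical records that do carry names.
medical_records = [
    {"name": "Patient One", "birth_year": 1972, "postcode_area": "SW1", "sex": "F"},
    {"name": "Patient Two", "birth_year": 1985, "postcode_area": "M1", "sex": "M"},
    {"name": "Patient Three", "birth_year": 1985, "postcode_area": "M1", "sex": "M"},
]

# Attributes shared by both datasets that are not names but can still narrow
# down who a record belongs to.
QUASI_IDENTIFIERS = ("birth_year", "postcode_area", "sex")


def key(record):
    """Return the record's quasi-identifier values as a comparable tuple."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentified(genetic, named):
    """Map each variant to a name when exactly one named record matches it."""
    matches = {}
    for row in genetic:
        candidates = [r["name"] for r in named if key(r) == key(row)]
        if len(candidates) == 1:  # a unique match means the row is re-identified
            matches[row["variant"]] = candidates[0]
    return matches


print(reidentified(genetic_results, medical_records))
```

In this toy data only the first genetic result is re-identified, because it is the only one whose combination of attributes matches a single named record. The other two remain ambiguous between two patients, but one further shared attribute could be enough to separate them.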

Advances in AI are also making it harder to anonymise data, because it is increasingly easy to match up various pieces of data and link them to one individual. A 2019 study published in Nature Communications suggests that 99.98% of Americans could be correctly re-identified in any “anonymised” dataset using 15 demographic attributes. The authors said that their findings seriously challenged “the technical and legal adequacy of the de-identification release-and-forget model”. Where genetic data is concerned, even fewer attributes may be required, given how inherently personal it can be. Merging of datasets in this way is predicted to become increasingly prevalent – particularly in the context of personalised medicine – and machine learning will likely be adopted as a strategy for improving data quality by “tying” together data from different sources.

Anonymisation can reduce the value of the dataset

Sometimes anonymisation just isn’t desirable – the more identifiable information is collated, the more valuable the dataset is for research. Marrying genetic data with information about clinical outcomes and patient history (such as exposure to previous treatments and response rates) provides invaluable information that could help generate improved methods of diagnosis, which analysing the genetic data in isolation cannot offer.

Anonymisation can fall short

In light of some of these challenges, an attempt to anonymise genetic data might fall short and result only in pseudonymisation. Pseudonymised data is data that can no longer be attributed to a specific data subject without the use of additional information. For example, a data subject’s name might be replaced with a reference number. Unlike anonymised data, pseudonymised data falls within the scope of the GDPR. For this reason, it could be risky for an entity offering a diagnostic test to rely on anonymisation alone for the legitimate processing of genetic data, in case the data is in fact only pseudonymised.
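
The reference-number example can be sketched in a few lines of Python. This is an illustration only, under our own assumptions: the record layout, the "REF-" format and the function names pseudonymise and reidentify are hypothetical. The point it shows is that as long as the lookup table (the "additional information") exists somewhere, the link back to the individual has been hidden rather than removed.

```python
import secrets

# Hypothetical records: names and values are invented for illustration.
records = [
    {"name": "Patient One", "variant": "variant_A"},
    {"name": "Patient Two", "variant": "variant_B"},
]


def pseudonymise(rows):
    """Swap each name for a random reference number.

    Returns the pseudonymised rows and the key table that maps each
    reference number back to the original name.
    """
    key_table = {}
    output = []
    for row in rows:
        ref = "REF-" + secrets.token_hex(4)
        while ref in key_table:  # regenerate in the unlikely event of a clash
            ref = "REF-" + secrets.token_hex(4)
        key_table[ref] = row["name"]
        stripped = {k: v for k, v in row.items() if k != "name"}
        stripped["ref"] = ref
        output.append(stripped)
    return output, key_table


def reidentify(row, key_table):
    """Anyone holding the key table can reverse the pseudonymisation."""
    return key_table[row["ref"]]


pseudonymised, key_table = pseudonymise(records)
print(pseudonymised)                            # no names appear here
print(reidentify(pseudonymised[0], key_table))  # but the name is recoverable
```

Deleting the key table does not by itself make the data anonymous: the remaining fields could still allow re-identification by other means, which is the risk discussed in the sections above.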

In our next post, we’ll explore the option of processing genetic data on the basis of consent.

Find out more about our experience in diagnostics and medical devices at www.herbertsmithfreehills.com/our-expertise/sector/diagnostics-and-medical-devices.

Authors

Kate Macmillan
Consultant
+44 20 7466 3737

Katie Pryor
Senior Associate
+44 20 7466 6313

Challenges with anonymising genetic data

Under the GDPR, anonymous information is information which doesn’t relate to an identifiable natural person. Whether a person is identifiable depends on all of the means reasonably likely to be used to identify someone. This is a question of factors such as the cost and time associated with re-identifying someone, taking into account developments in technology.

A common approach to anonymisation of genetic data is to ensure that it is not held along with any information that could be used to directly or indirectly identify the individual from whom the genetic information was derived. Such information might include an individual’s name, date of birth, address, or social security or other government-issued identification number. Whether this approach is sufficient to render the data anonymised will depend on:

  1. the nature of the genetic data; and
  2. the nature of current technology, including means by which the data could be cross-referred with other information.

Of course, not all genetic data is created equal. A small set of polymorphisms detected in a subject’s genome would probably present only a remote risk of identifying an individual, particularly if these are common genetic alterations possessed by a high proportion of the population. On the other hand, sometimes genetic data will indicate that a patient may have been diagnosed with a very rare disease, which substantially narrows the pool of individuals that the genetic information could have come from.
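
A rough back-of-the-envelope calculation shows why rarity matters so much. The Python sketch below estimates how many people in a population would be expected to share a given set of variants. The population size and frequencies are invented figures, the function name expected_pool_size is our own, and the calculation assumes the variants are inherited independently of one another, which is a simplification that real genetic data often does not satisfy.

```python
from math import prod


def expected_pool_size(population, carrier_frequencies):
    """Estimate how many people in the population carry every listed variant.

    Assumes (as a simplification) that the variants occur independently,
    so the individual carrier frequencies can simply be multiplied.
    """
    return population * prod(carrier_frequencies)


POPULATION = 60_000_000  # hypothetical population size

# Three common variants, each carried by a large share of the population.
common_pool = expected_pool_size(POPULATION, [0.4, 0.3, 0.5])

# One variant associated with a very rare disease (one carrier per million).
rare_pool = expected_pool_size(POPULATION, [1 / 1_000_000])

print(f"Common variants: about {common_pool:,.0f} possible individuals")
print(f"Rare variant:    about {rare_pool:,.0f} possible individuals")
```

On these invented figures, the common variants leave a pool in the millions, while the single rare variant narrows it to a few dozen people, at which point one or two further facts (a region, an age range) may be enough to single someone out.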

And then there’s whole genome sequencing: while a person’s complete set of genes is inherently unique and therefore capable of identifying a particular individual, whether it could reasonably be used to re-identify someone will depend on the means available to whoever holds this information. Through developments in technology, such as AI-mediated merging of datasets, and the increasing volume of information that is collected and stored about individuals, re-identification is likely to become increasingly feasible. In fact, a paper by the former Article 29 Data Protection Working Party (the predecessor to the European Data Protection Board) suggested that, in the context of tissue donors, even publicly available resources such as genealogy registers combined with metadata about DNA donors (time of donation, age, place of residence) can reveal the identity of certain individuals even if that DNA was donated “anonymously”.

Evidently, anonymisation of genetic data can present some challenges – we will explore this further in our next post.
