Anonymising genetic data – some particular pitfalls

Following our last post, ‘Challenges with anonymising genetic data’, here we explore some examples of the pitfalls of processing genetic data on this basis.

Anonymisation is an ongoing process

Firstly, whether a dataset is anonymised can change abruptly. As soon as one dataset is merged with another relating to the same set of data subjects, it becomes more likely that the information could be used to re-identify a data subject. For example, it was reported last year that the British National Health Service had sold medical records to pharmaceutical companies that could be used to re-identify “anonymised” genetic information collected for diagnostic purposes.

Advances in AI are also making it harder to anonymise data, because it is increasingly easy to match up various pieces of data and link them to one individual. A 2019 study published in Nature Communications estimated that 99.98% of Americans could be correctly re-identified in any “anonymised” dataset using 15 demographic attributes. The authors said that their findings seriously challenged “the technical and legal adequacy of the de-identification release-and-forget model”. Where genetic data is concerned, even fewer attributes may be required, given how inherently personal it is. Merging of datasets in this way is predicted to become increasingly prevalent, particularly in the context of personalised medicine, and machine learning will likely be adopted as a strategy for improving data quality by “tying” together data from different sources.
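To illustrate why adding attributes makes re-identification easier, the following is a minimal sketch in Python. It is not the method used in the study: it simply counts how many records in a small, entirely made-up dataset become unique as more attributes are combined. The function name `unique_fraction` and the field names are assumptions chosen for illustration.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Return the share of records whose combination of `attributes` is unique."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


# Entirely made-up toy records; the field names are hypothetical.
records = [
    {"birth_year": 1980, "sex": "F", "postcode_area": "EC2"},
    {"birth_year": 1980, "sex": "F", "postcode_area": "N1"},
    {"birth_year": 1980, "sex": "M", "postcode_area": "EC2"},
    {"birth_year": 1975, "sex": "F", "postcode_area": "EC2"},
    {"birth_year": 1975, "sex": "F", "postcode_area": "EC2"},
]

for attrs in (["birth_year"],
              ["birth_year", "sex"],
              ["birth_year", "sex", "postcode_area"]):
    print(attrs, unique_fraction(records, attrs))
```

In this toy example no record is unique on birth year alone, one in five is unique once sex is added, and three in five are unique once a postcode area is added. A record that is unique on attributes also held in a second dataset can be matched to a named individual in that dataset.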

Anonymisation can reduce the value of the dataset

Sometimes anonymisation just isn’t desirable: the more identifiable information that is collated, the more valuable the dataset is for research. Marrying genetic data with information about clinical outcomes and patient history (such as exposure to previous treatments and response rates) provides invaluable information that could help generate improved methods of diagnosis, in a way that analysing the genetic data in isolation cannot.

Anonymisation can fall short

In light of these challenges, an attempt to anonymise genetic data might fall short and result only in pseudonymisation. Pseudonymised data is data that can no longer be attributed to a specific data subject without the use of additional information. For example, a data subject’s name might be replaced with a reference number. Unlike anonymised data, pseudonymised data does fall within the scope of the GDPR. For this reason, it could be risky for an entity offering a diagnostic test to rely on anonymisation alone for the legitimate processing of genetic data, in case the data is in fact only pseudonymised.
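The reference-number example can be sketched in a few lines of Python. This is an illustrative sketch only, not a compliance recipe: the function `pseudonymise`, the field names and the sample records are all hypothetical. It shows why the result is pseudonymised rather than anonymised: a key table linking each reference back to a name still exists, and anyone holding it can reverse the process.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifying field with a random reference number.

    Returns the cleaned records and a key table mapping each reference
    back to the original identifier (the "additional information").
    """
    key_table = {}  # reference -> original identifier
    assigned = {}   # original identifier -> reference (one per data subject)
    cleaned_records = []
    for record in records:
        identifier = record[id_field]
        if identifier not in assigned:
            reference = secrets.token_hex(8)  # random, not derived from the name
            assigned[identifier] = reference
            key_table[reference] = identifier
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["ref"] = assigned[identifier]
        cleaned_records.append(cleaned)
    return cleaned_records, key_table


# Made-up sample records; names and values are placeholders.
samples = [
    {"name": "A. Example", "test_result": "placeholder-1"},
    {"name": "B. Example", "test_result": "placeholder-2"},
    {"name": "A. Example", "test_result": "placeholder-3"},
]

cleaned, key_table = pseudonymise(samples)
print(cleaned)
```

The cleaned records no longer contain names, but the key table would need to be stored separately and protected, and the remaining fields may themselves still allow re-identification.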

In our next post, we’ll explore the option of processing genetic data on the basis of consent.

Find out more about our experience in diagnostics and medical devices at



Kate Macmillan
+44 20 7466 3737

Katie Pryor
Senior Associate
+44 20 7466 6313