Anonymisation vs Pseudonymisation

The privacy-enhancing techniques of ‘anonymisation’ and ‘pseudonymisation’ are recognised by the GDPR and can lessen the impact of some of the regulation’s more onerous provisions. Preventing, or reducing the likelihood of, personal data being traced back to the individual it relates to can allow companies to use that information freely, or at least under different constraints from those that apply to raw, identifiable data.

But what do the two terms mean in reality? This article will attempt to cover some of the important differences and discuss some of the potential difficulties associated with compliance.

Firstly, let’s look at ‘anonymisation’. Recital 26 of the GDPR describes anonymous information as personal data ‘rendered anonymous in such a manner that the data subject is not or no longer identifiable’. Anonymisation is thus irreversible, and it takes data outside the scope of the GDPR altogether, effectively giving data controllers free rein over it. In practice, however, truly anonymising information is a challenge that may prove extremely difficult to meet.

Currently in the UK, clear guidelines on what it means to anonymise data are lacking. A frequently cited US study found that 87% of the US population could likely be uniquely identified from just three data points (ZIP code, date of birth and gender). None of these pieces of information would allow re-identification on its own, but the study demonstrates that seemingly innocuous details, when combined, can single out an individual. There have been attempts in the US to standardise what ‘anonymised’ data looks like: for example, the Health Insurance Portability and Accountability Act (HIPAA) treats data as de-identified if 18 specified types of identifier are removed. In the EU, by contrast, there is currently no such clarity, and data controllers who believe their data is anonymised may nonetheless be found non-compliant at a later date.
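The effect the study describes can be reproduced on a small scale. The Python sketch below uses six invented records (not data from the study) and a hypothetical helper, `unique_fraction`, that reports what share of records are the only ones holding a given combination of values. Uniqueness within a single data set is only a rough proxy for re-identification risk, not a legal test of anonymisation.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, date_of_birth, gender).
# Invented for illustration only; not taken from any real study or data set.
RECORDS = [
    ("02138", "1965-07-31", "F"),
    ("02138", "1971-03-02", "M"),
    ("02139", "1965-07-31", "M"),
    ("02139", "1980-11-15", "F"),
    ("02138", "1980-11-15", "F"),
    ("02139", "1971-03-02", "M"),
]
FIELDS = ("zip_code", "date_of_birth", "gender")


def unique_fraction(records, columns):
    """Fraction of records whose values on `columns` appear in no other record."""
    keys = [tuple(record[i] for i in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


# Each field on its own is shared by at least two records...
for index, name in enumerate(FIELDS):
    print(f"{name} alone: {unique_fraction(RECORDS, [index]):.0%} unique")

# ...but the combination of all three singles out every record.
print(f"all three combined: {unique_fraction(RECORDS, [0, 1, 2]):.0%} unique")
```

With this toy data, no single field identifies anyone, yet every record becomes unique once all three fields are combined.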

‘Pseudonymisation’ of data is a lower bar to clear since, by definition, it describes data that has been processed such that, while it may appear anonymised to an outsider, it can still be linked back to individuals through the use of a ‘key’ (for example, via an encryption/decryption scheme). The GDPR defines ‘pseudonymisation’ as ‘the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information’, provided that this additional information is kept separately and is subject to technical and organisational measures ensuring that the data is not attributed to an identified or identifiable person. Pseudonymised data remains personal data under the GDPR, but the regulation recognises pseudonymisation as a safeguard that allows such data to be used more flexibly with less risk of infringing data subjects’ rights, making it a way to work towards GDPR compliance while still using collected data.
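One common way to implement the ‘key’ is keyed hashing. The Python sketch below swaps in HMAC-SHA256 from the standard library for the encryption/decryption scheme mentioned above; the records, field names and the helpers `pseudonymise` and `reidentify` are hypothetical. It illustrates, under these assumptions, how the data and the key are kept apart; it is not a statement of what the GDPR requires.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with its HMAC-SHA256 under a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key plays the role of the 'additional information'. In a real system it
# would be generated and stored separately from the data set (an assumption
# here, e.g. in a key vault), never alongside the pseudonymised records.
KEY = secrets.token_bytes(32)

# Hypothetical source records containing a direct identifier (email).
patients = [
    {"email": "alice@example.com", "diagnosis": "asthma"},
    {"email": "bob@example.com", "diagnosis": "diabetes"},
]

# The shared data set keeps only the pseudonym, not the email address.
pseudonymised = [
    {"pid": pseudonymise(p["email"], KEY), "diagnosis": p["diagnosis"]}
    for p in patients
]


def reidentify(candidate: str, key: bytes, records: list) -> list:
    """Whoever holds the key can link a known identifier back to its records
    by recomputing the pseudonym and matching it."""
    target = pseudonymise(candidate, key)
    return [r for r in records if hmac.compare_digest(r["pid"], target)]
```

Because an HMAC is one-way, re-identification here works by recomputing the pseudonym for a known identifier rather than by decryption; without the key, the `pid` values cannot feasibly be matched to email addresses. The remaining fields (such as `diagnosis`) are left untouched, so the overall likelihood of re-identification still has to be assessed.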

Critical to the effectiveness (and thus legality) of both techniques is an assessment of the likelihood that data subjects can be re-identified. The GDPR limits the ability of a data handler to benefit from pseudonymised data if re-identification is ‘reasonably likely’ to be achievable. Since guidance on which pseudonymisation techniques meet this standard has yet to be released, and the meaning of ‘reasonably likely’ remains unclear, data handlers who want to use pseudonymisation as part of their GDPR compliance are in a difficult position and face an uncertain regulatory environment.