Here, Rob Fotheringham, Director of Fotheringham Associates, explains how pseudonymisation can allow organisations to utilise invaluable data received through the emergency response to COVID-19, to benefit future research and analysis
We know that during the COVID-19 crisis everyone has pulled together including organisations that are sharing vital data in order to help the most vulnerable. Whether you are a local council, housing association, or other third sector organisation you will now possess an enhanced view of the citizens you serve which has been critical to understanding your population needs in order to manage emergency response provision. It is highly likely that you will be expected to produce an analysis of your vulnerable service users including outcomes after the pandemic and therefore, we suggest you see this data as the first stage of building an invaluable strategic data asset that will allow you to answer these questions as well as understand your population needs at an individual level over the longer term.
The individuals and their data rights need to be protected, so we would recommend that you pseudonymise the dataset if you plan to keep and maintain it going forward. During the response to the pandemic, you need to be able to easily identify people to deliver critical support, but afterwards, if you choose to use this information for research and analysis use you should protect their identity and pseudonymisation is an excellent way of achieving that. Maintaining the dataset and adding further information over time will increase the risk of something going wrong in terms of information security which could result in a breach, embarrassing notifications to clients, a report to the Information Commissioners Office and possibly a huge fine.
Do you, therefore, discard this valuable data or take the risk? You could decide to anonymise the data to protect yourself from those potential breaches. Whilst this certainly reduces your information risk, it also brings a significant constraint, as you have also rendered it beyond the ability to link it to current live data. The data you anonymise is effectively held in suspended animation on the day that you change it, with no further transactions, behaviour, or additions able to be linked to the records. Any new information added to the dataset will create new anonymous data, possibly skewing the numbers and blurring the information landscape to the point at which you cannot draw any accurate insights. This does not make best use of the valuable data foundation that you have created.
Pseudonymisation is the way forward
Pseudonymisation is a technique that uses a key from your current live dataset and transforms it to be unintelligible allowing you to work on the data for research without fear of accidentally publishing personal data or without people having access to personal information that they shouldn’t. However, unlike anonymisation, which uses a random value as the key to the records, the pseudonymised key is a link to the operational system and data can continue to be added to the dataset.
There are a number of ways of achieving pseudonymisation. At Fotheringham Associates, we have designed an approach for a large programme operated by a children’s charity that is straight forward and secure, based upon algorithms provided by the NSA. For this client, we have inserted pseudonymisation into the build of a data integration platform that enables them to gather data from 20 service providers including three NHS trusts, the local council, a number of charities and a collection of community groups.
The approach produces identical pseudo keys even when processed in different locations and by different systems which is how we have enabled the diverse environment to work in this programme. In addition, the pseudonymised information cannot be reverse engineered, increasing the protection. When combined with a matching routine, the approach is helping to deliver an environment that captures all service users across the programme and will allow research data analysts to trace improvements in outcomes across the in-scope population, seeking to understand the contributing factors. The power of your research data is not simply in its volume, real data power lies with the connections between elements.
Weigh up your options
We urge you to consider the approach to creating research data stores carefully, whether extending the use of data hastily gathered in an emergency or if you start from scratch. Never before have the trade-offs been more important with regulators beginning to flex their new muscle under GDPR.
- You can simply manage your duplicated data with care.
- Anonymisation can be the solution for records if they are beyond the point at which the data will be enriched further, or you accept the constraints.
- Or you can consider pseudonymisation as the path to deliver the best outcome with a good balance of risk and data richness.
It would be remiss not to point out that duplicate data must be included in the application of data subject requests including SARs (subject access requests), withdrawal of consent and retention rules, the latter two events will need data removed from your emergency response or analysis data store as well as the operational one. Whilst this is also true for pseudonymised data, there are a set of processes that can manage such requests.
Therefore, if you are considering whether the invaluable data that you now have at your fingers can be used to help you to help others in the longer term we urge to seriously consider using a pseudonymisation technique which will help protect your organisation, your research data and most importantly your data subjects.
Please note: This is a commercial profile