
10 MISUNDERSTANDINGS RELATED TO ANONYMISATION


Anonymisation is the process of rendering personal data anonymous.

According to the European Union's data protection laws, in particular the General Data Protection Regulation (GDPR)1, anonymous data is "information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable". Datasets2 which include personal data may contain direct and indirect identifiers, which allow an individual to be identified or to become identifiable. A direct identifier is a piece of information that refers directly to an individual, such as a name or an identification number. An indirect identifier (also called a quasi-identifier) is any piece of information (e.g. a geographical position at a certain moment, or an opinion on a certain topic) that could be used, either on its own or in combination with other quasi-identifiers, by someone who has knowledge about an individual in order to re-identify that individual in the dataset3, 4. The re-identification likelihood is the probability of re-identifying an individual in a given dataset by turning anonymised data back into personal data through data matching or similar techniques. The utility of a dataset is a measure of how useful the information is for the intended purpose (e.g. a research study on a specific disease).
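To make the distinction between direct and indirect identifiers concrete, the following sketch (using an entirely hypothetical toy dataset and field names) shows how a combination of quasi-identifiers can single out an individual even after the direct identifier is removed:

```python
from collections import Counter

# Toy records: "name" is a direct identifier; ZIP code, birth year and
# gender are quasi-identifiers. All values are hypothetical.
records = [
    {"name": "Alice", "zip": "1010", "birth_year": 1980, "gender": "F"},
    {"name": "Carol", "zip": "1010", "birth_year": 1980, "gender": "F"},
    {"name": "Bob",   "zip": "1010", "birth_year": 1980, "gender": "M"},
    {"name": "Dave",  "zip": "1010", "birth_year": 1980, "gender": "M"},
    {"name": "Dan",   "zip": "2020", "birth_year": 1975, "gender": "M"},
]

# Removing the direct identifier is not enough: count how many records
# share each combination of quasi-identifier values.
combos = Counter((r["zip"], r["birth_year"], r["gender"]) for r in records)

# A combination shared by only one record singles that individual out.
unique = [c for c, n in combos.items() if n == 1]
print(unique)  # [('2020', 1975, 'M')] -> the last record is re-identifiable
```

Anyone who knows that a given person lives in ZIP 2020, was born in 1975 and is male can match that person to the single matching record, even though the name was never published.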

Over the years, there have been several examples of incomplete or wrongly conducted anonymisation processes that resulted in the re-identification of individuals. In 2006, for instance, a movie-streaming service published a dataset containing 10 million movie ratings made by 500,000 customers, claiming that it was anonymous; it was later shown that only a little knowledge about a subscriber would allow an adversary to identify that subscriber's record in the dataset5. Another example of deficient anonymisation: in 2013, the New York City Taxi and Limousine Commission published a dataset of more than 173 million individual taxi trips containing the pick-up and drop-off locations, the times, and supposedly anonymised licence numbers. The dataset had not been correctly anonymised, and it was possible to recover the original licence numbers and even to identify the individual drivers of those taxis6.

Anonymous data play an important role in research in the fields of medicine, demographics, marketing, economics, statistics and many others. However, this interest has coincided with the spread of related misunderstandings. The objective of this document is to raise public awareness about some misunderstandings about anonymisation, and to motivate its readers to check assertions about the technology rather than accepting them without verification.

This document lists ten of these misunderstandings, explains the facts and provides references for further reading.

1 http://data.europa.eu/eli/reg/2016/679/2016-05-04.
2 A dataset is a structured collection of data. A table where each column represents a particular variable and each row corresponds to a different record is an example of a dataset.
3 Barth-Jones, D. (2012). The 're-identification' of Governor William Weld's medical information: a critical re-examination of health data identification risks and privacy protections, then and now. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2076397.
4 El Emam, K., & Malin, B. (2015). "Appendix B: Concepts and Methods for De-identifying Clinical Trial Data", Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk. Washington D.C.: National Academies Press. http://www.ncbi.nlm.nih.gov/books/NBK285994.
5 Narayanan, A., & Shmatikov, V. (2006). How to break anonymity of the Netflix prize dataset. arXiv preprint cs/0610105. https://arxiv.org/abs/cs/0610105.
6 Pandurangan, V. (2014). On taxis and rainbows: Lessons from NYC's improperly anonymized taxi logs. Medium. https://tech.vijayp.ca/of-taxis-and-rainbows-f6bc289679a1.

MISUNDERSTANDING 1. "Pseudonymisation is the same as anonymisation"

Fact: Pseudonymisation is not the same as anonymisation.

The GDPR defines 'pseudonymisation' as 'the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person'. This means that the use of 'additional information' can lead to the identification of the individuals, which is why pseudonymous personal data is still personal data.

Anonymous data, on the other hand, cannot be associated with specific individuals. Once data is truly anonymous and individuals are no longer identifiable, the data will not fall within the scope of the GDPR.

MISUNDERSTANDING 2. "Encryption is anonymisation"

Fact: Encryption is not an anonymisation technique, but it can be a powerful pseudonymisation tool.

The encryption process uses secret keys to transform information in a way that reduces the risk of misuse, while keeping confidentiality for a given period of time. Because the original information needs to remain accessible, the transformations applied by encryption algorithms are designed to be reversible, in what is known as decryption. The secret keys used for decryption are the aforementioned 'additional information' (see Misunderstanding 1), which can make the personal data readable and, consequently, identification possible.

In theory, it could be considered that deleting the encryption key of encrypted data would render it anonymous, but this is not the case. One cannot assume that encrypted data cannot be decrypted just because the decryption key is said to be "erased" or "unknown". There are many factors affecting the confidentiality of encrypted data, especially in the long term. Among these factors are the strength of the encryption algorithm and of the key, information leaks, implementation issues, the amount of encrypted data, and technological advances (e.g. quantum computing7).

7 TechDispatch #2/2020: Quantum Computing and Cryptography, 7 August 2020, European Data Protection Supervisor. https://edps.europa.eu/data-protection/our-work/publications/techdispatch/techdispatch-22020-quantum-computing-and_en
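A minimal sketch of why key-based transformations are pseudonymisation rather than anonymisation: here a keyed hash (a common pseudonymisation technique, used for illustration; the key, function name and identifiers are all hypothetical) replaces a direct identifier with a token, and whoever holds the key can relink the token to candidate identifiers:

```python
import hmac
import hashlib

# The secret key is the 'additional information' of Misunderstanding 1:
# it must be kept separately and protected. (Toy key for illustration.)
SECRET_KEY = b"keep-me-separate-and-protected"

def pseudonymise(identifier: str) -> str:
    # Deterministic keyed hash: same input + same key -> same token.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

token = pseudonymise("alice@example.com")

# The key holder can link the token back to an individual simply by
# recomputing pseudonyms for candidate identifiers -- so the data is
# pseudonymous, not anonymous.
assert pseudonymise("alice@example.com") == token
assert pseudonymise("bob@example.com") != token
```

As long as the key exists (or cannot be proven unrecoverable), the mapping is reversible in practice, which mirrors the document's point about "erased" decryption keys.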

MISUNDERSTANDING 3. "Anonymisation of data is always possible"

Fact: It is not always possible to lower the re-identification risk below a previously defined threshold whilst retaining a useful dataset for a specific processing.

Anonymisation is a process that tries to find the right balance between reducing the re-identification risk and keeping the utility of a dataset for the envisaged purpose(s). However, depending on the context or the nature of the data, the re-identification risks cannot always be sufficiently mitigated. This could be the situation when the total number of possible individuals (the 'universe of subjects') is too small (e.g. an anonymous dataset containing only the 705 members of the European Parliament), when the categories of data are so different among individuals that it is possible to single these individuals out (e.g. the device fingerprints of the systems that accessed a certain website), or when the datasets include a high number of demographic attributes8 or location data9.

MISUNDERSTANDING 4. "Anonymisation is forever"

Fact: There is a risk that some anonymisation processes could be reverted in the future. Circumstances might change over time, and new technical developments and the availability of additional information might compromise previous anonymisation processes.

The computing resources and new technologies (or new ways to apply existing technologies) available to an attacker who could try to re-identify an anonymous dataset change over time. Nowadays, cloud computing provides affordable computing capability at levels and prices that were unthinkable years ago. In the future, quantum computers might also alter what is nowadays considered "reasonable means"10.

Also, the disclosure of additional data over the years (e.g. in a personal data breach) can make it possible to link previously anonymous data to identified individuals. The release of decades-old records containing highly sensitive data (e.g. criminal records) could still have a severely detrimental effect on an individual or their relatives11.

8 Rocher, L., Hendrickx, J. M., & De Montjoye, Y. A. (2019). Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10(1), 1-9. https://doi.org/10.1038/s41467-019-10933-3
9 Xu, F., Tu, Z., Li, Y., Zhang, P., Fu, X., & Jin, D. (2017, April). Trajectory recovery from ash: User privacy is not preserved in aggregated mobility data. In Proceedings of the 26th International Conference on World Wide Web (pp. 1241-1250). https://dl.acm.org/doi/abs/10.1145/3038912.3052620
10 EDPS TechDispatch: Quantum computing and cryptography. Issue 2, 2020. https://data.europa.eu/doi/10.2804/36404
11 Graham, C. (2012). Anonymisation: managing data protection risk code of practice. Information Commissioner's Office. https://ico.org.uk/media/1061/anonymisation-code.pdf

MISUNDERSTANDING 5. "Anonymisation always reduces the probability of re-identification of a dataset to zero"

Fact: The anonymisation process and the way it is implemented will have a direct influence on the likelihood of re-identification.

A robust anonymisation process aims to reduce the re-identification risk below a certain threshold. Such a threshold will depend on several factors, such as the existing mitigation controls (none in the context of public disclosure), the impact on individuals' privacy in the event of re-identification, and the motives and capacity of an attacker to re-identify the data12.

Although 100% anonymisation is the most desirable goal from a personal data protection perspective, in some cases it is not possible, and a residual risk of re-identification must be considered.

MISUNDERSTANDING 6. "Anonymisation is a binary concept that cannot be measured"

Fact: It is possible to analyse and measure the degree of anonymisation.

The expression "anonymous data" cannot be understood as though datasets could simply be labelled as anonymous or not. The records in any dataset have a probability of being re-identified based on how easy it is to single them out. Any robust anonymisation process will assess the re-identification risk, which should be managed and controlled over time13.

Except for specific cases where data is highly generalised (e.g. a dataset counting the number of visitors to a website per country in a year), the re-identification risk is never zero.

12 External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use (2016). https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data_en-0.pdf
13 Step 4: Measure the data risk. De-identification Guidelines for Structured Data, Information and Privacy Commissioner of Ontario, June 2016. https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf
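One common way to measure the degree of anonymisation, consistent with the idea that risk can be quantified, is k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. This is a sketch of one possible metric, not a method prescribed by this document; the dataset and field names are hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest number of records sharing any one combination of
    quasi-identifier values. k = 1 means at least one individual can be
    singled out; the worst-case re-identification probability is 1/k."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(combos.values())

rows = [
    {"zip": "10**", "age_band": "40-49"},
    {"zip": "10**", "age_band": "40-49"},
    {"zip": "10**", "age_band": "40-49"},
    {"zip": "20**", "age_band": "30-39"},
]
print(k_anonymity(rows, ["zip", "age_band"]))  # 1: the last row is unique
```

A dataset is never simply "anonymous or not": the metric yields a number that can be compared against the threshold an organisation has defined for its context, and re-checked over time.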

MISUNDERSTANDING 7. "Anonymisation can be fully automated"

Fact: Automated tools can be used during the anonymisation process; however, given the importance of the context in the overall assessment of the process, human expert intervention is needed.

Anonymisation cannot be reduced to running a tool. It requires an analysis of the original dataset, its intended purposes, the techniques to apply and the re-identification risk of the resulting data14.

The identification and deletion of direct identifiers (also known as 'masking'), while being an important part of the anonymisation process, must always be followed by a careful analysis of other sources of (indirect) identification15, generally through quasi-identifiers. While direct identifiers are relatively easy to find, indirect identifiers are not always obvious, and the failure to detect them can result in the reversal of the process (i.e. re-identification), with consequences for the privacy of individuals.

Automation could be key for some steps of the anonymisation process, such as the removal of direct identifiers or the consistent application of a generalisation procedure over a variable16. On the contrary, it seems unlikely that a fully automated process could identify quasi-identifiers in different contexts or decide how to maximise data utility by applying specific techniques to specific variables.

MISUNDERSTANDING 8. "Anonymisation makes the data useless"

Fact: A proper anonymisation process keeps the data functional for a given purpose.

The purpose of anonymisation is to prevent the individuals in a dataset from being identified. Anonymisation techniques will always restrict the ways in which the resulting dataset can be used. For example, grouping dates of birth into year intervals will reduce the re-identification risk while, in some cases, also reducing the utility of the dataset. This does not mean that anonymous data becomes useless, but rather that its utility will depend on the purpose and on the acceptable re-identification risk.

On the other hand, personal data cannot be stored indefinitely beyond its original purpose, waiting for a chance that it might become useful for other purposes. The solution for some controllers might be anonymisation, where the personal data can be detached and discarded from the dataset, while the remaining dataset still retains a useful meaning. An example could be the anonymisation of the access logs of a website, keeping only the access date and the accessed page, but not the information on who accessed it.

The "data minimisation" principle requires the controller to determine whether it is necessary to process personal data in order to fulfil a particular purpose, or whether that purpose can also be achieved with anonymous data. In certain cases, this might lead to the conclusion that rendering the data anonymous will not fit the intended purpose. In such cases, the controller will have to choose between processing personal data (using e.g. pseudonymisation) and applying the GDPR, or not processing the data at all.

MISUNDERSTANDING 9. "Following an anonymisation process that others used successfully will lead our organisation to equivalent results"

Fact: Anonymisation processes need to be tailored to the nature, scope, context and purposes of the processing, as well as to the risks of varying likelihood and severity for the rights and freedoms of natural persons.

Anonymisation cannot be applied like following a recipe, because the context (nature, scope, context and purposes of the processing of the data) is likely to be different from one circumstance to another, and from one organisation to another. An anonymisation process might have a re-identification risk below a certain threshold when the data is only made available to a limited number of recipients, whereas the re-identification risk will not meet that threshold when the data is made available to the general public.

Different datasets might be available in different contexts, and these could be cross-referenced with the anonymous data, affecting the re-identification risk. For example, in Sweden details of taxpayers' personal data are publicly available, while in Spain they are not. Therefore, even if datasets including information on Spanish and Swedish citizens were anonymised following the same procedure, the re-identification risks could be different.

MISUNDERSTANDING 10. "There is no risk and no interest in finding out to whom this data refers"

Fact: Personal data has a value in itself, for the individuals themselves and for third parties. Re-identification of an individual could have a serious impact on his or her rights and freedoms.

Attacks against anonymisation can take the form of deliberate attempts at re-identification, unintended attempts at re-identification, data breaches, or the release of data to the public17. The likelihood of someone trying to re-identify an individual only concerns the first type. The possibility of someone re-identifying at least one person in a dataset, be it out of curiosity, by chance or driven by an actual interest (e.g. scientific research, journalism or criminal activity), cannot be disregarded18.

It can be difficult to accurately assess the impact of re-identification on a person's private life, because it will always depend on the context and on the information that is correlated. For example, the re-identification of a data subject in the seemingly harmless context of his or her movie preferences might lead to inferences about that person's political leanings or sexual orientation19. Such particularly sensitive data are accorded special protection under the GDPR.

14 Recommendation section (5.2) of Article 29 Data Protection Working Party (2014). Opinion 05/2014 on Anonymisation Techniques. https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
15 Guess Who? 5 examples why removing names fails as anonymization. https://www.syntho.ai/5-examples-why-removing-names-fails-as-anonymization
16 See e.g. Dias, F., Mamede, N., & Baptista, J. (2016). Automated Anonymization of Text Documents. https://www.hlt.inesc-id.pt/~fdias/mscthesis/automated_text_anonymization.pdf
17 Khaled El Emam and Luk Arbuckle, Anonymizing Health Data (pp. 29-33).
18 Khaled El Emam, Elizabeth Jonker, Luk Arbuckle and Bradley Malin, "A Systematic Review of Re-Identification Attacks on Health Data", 11 December 2011.
19 Narayanan, A., & Shmatikov, V. "Robust De-anonymization of Large Sparse Datasets". Retrieved 2 March 2021. https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
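The two techniques mentioned under Misunderstanding 8, grouping dates of birth into year intervals and stripping access logs down to date and page, can be sketched as follows (function names, field names and values are illustrative, not taken from the document):

```python
from datetime import date

def generalise_birth_date(d: date, band: int = 10) -> str:
    """Replace an exact date of birth with a year interval: coarser data,
    lower re-identification risk, some loss of utility."""
    start = d.year - d.year % band
    return f"{start}-{start + band - 1}"

def anonymise_log_entry(entry: dict) -> dict:
    """Keep only the access date and the accessed page, discarding the
    information on who accessed it."""
    return {"date": entry["date"], "page": entry["page"]}

print(generalise_birth_date(date(1987, 5, 3)))  # '1980-1989'
print(anonymise_log_entry(
    {"date": "2021-04-27", "page": "/news", "user_ip": "203.0.113.7"}
))  # {'date': '2021-04-27', 'page': '/news'}
```

Whether the generalised or reduced dataset is still useful depends, as the document notes, on the intended purpose and the acceptable residual re-identification risk.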
