10 Misunderstandings Related To Anonymisation 1619560700
MISUNDERSTANDINGS RELATED TO
ANONYMISATION
According to the European Union’s data protection laws, in particular the General Data
Protection Regulation (GDPR)¹, anonymous data is “information which does not relate to
an identified or identifiable natural person or to personal data rendered anonymous in such
a manner that the data subject is not or no longer identifiable”. Datasets² which include
personal data may contain direct and indirect identifiers, which allow an individual to be
identified or become identifiable. A direct identifier is specific information that refers
to an individual, such as a name or an identification number. An indirect identifier (also called
a quasi-identifier) is any piece of information (e.g. a geographical position at a certain moment
or an opinion about a certain topic) that someone with knowledge about an individual could use,
either on its own or in combination with other quasi-identifiers, to re-identify that
individual in the dataset³ ⁴. The re-identification likelihood
is the probability that an individual in a given dataset can be re-identified by turning anonymised
data back into personal data through data matching or similar techniques. The
utility of a dataset is a measure of how useful that information is for the intended purpose
(e.g. a research study on a specific disease).
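The notions of quasi-identifier and re-identification likelihood can be made concrete with a small sketch. It assumes one common simplification that this document does not itself prescribe: a record's risk is taken as 1 divided by the number of records sharing the same combination of quasi-identifier values. The records, field names and the choice of quasi-identifiers below are hypothetical.

```python
from collections import Counter

# Hypothetical toy dataset; no direct identifiers are present.
records = [
    {"postcode": "1000", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "1000", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "1000", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
]

# Assumption: these three columns are the quasi-identifiers an adversary may know.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def reidentification_risk(rows, quasi_identifiers):
    """Return a per-record risk of 1 / (number of records with the same quasi-identifier values)."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    class_sizes = Counter(keys)
    return [1 / class_sizes[key] for key in keys]


risks = reidentification_risk(records, QUASI_IDENTIFIERS)
print(risks)       # [0.5, 0.5, 1.0, 1.0]
print(max(risks))  # 1.0: at least one record is unique on its quasi-identifiers
```

In this toy case the last two records are unique on the chosen quasi-identifiers, so anyone who knows those three attributes about the person can single out the record even though no name appears in the data.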
Over the years, there have been several examples of incomplete or incorrectly
conducted anonymisation processes that resulted in the re-identification of individuals.
In 2006, for instance, a movie-streaming service published a dataset containing about 100 million
movie ratings made by 500,000 customers, claiming that it was anonymous. It was later
found that an adversary would need only a little knowledge about a subscriber
to be able to identify that subscriber’s record in the dataset⁵. In another example of deficient
anonymisation, in 2013 the New York City Taxi and Limousine Commission published a
dataset with more than 173 million individual taxi trips, containing the pickup and drop-off
locations, times and supposedly anonymised licence numbers. The dataset had not been
correctly anonymised, and it was possible to recover the original licence numbers and even
to identify the individual drivers of those taxis⁶.
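The taxi case illustrates why hashing an identifier drawn from a small set of possible values does not anonymise it. The write-up cited in footnote 6 reports that the licence numbers had been replaced by unsalted MD5 hashes; the sketch below reproduces that kind of weakness under stated assumptions. The licence format (digit, letter, digit, digit) and the sample value `7B42` are hypothetical and chosen only to keep the example small.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def md5_hex(text):
    """Unsalted MD5 of an ASCII string, as a hex digest."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def build_lookup():
    """Hash every possible licence number once and map each hash back to its input.

    Assumed format: digit, letter, digit, digit, i.e. 10 * 26 * 10 * 10 = 26,000 candidates.
    """
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        licence = f"{d1}{letter}{d2}{d3}"
        table[md5_hex(licence)] = licence
    return table


lookup = build_lookup()

# What a "pseudonymised" release would contain instead of the licence number.
published_value = md5_hex("7B42")

# Reversing it is a single dictionary lookup.
recovered = lookup[published_value]
print(recovered)  # 7B42
```

Because every candidate can be enumerated in well under a second, the hash works as a reversible code rather than as anonymisation. A larger identifier space slows this attack down but does not by itself make the data anonymous.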
Anonymous data play an important role in research in the fields of medicine,
demographics, marketing, economics, statistics and many others. However, the growing interest
in anonymisation has coincided with the spread of related misunderstandings. The objective of
this document is to raise public awareness of some of these misunderstandings, and to
motivate its readers to check assertions about the technology, rather than accepting them
without verification.
This document lists ten of these misunderstandings, explains the facts and provides references
for further reading.
1 http://data.europa.eu/eli/reg/2016/679/2016-05-04.
2 A dataset is a structured collection of data. A table where each column represents a particular variable and each row corresponds
to a different record is an example of a dataset.
3 Barth-Jones, D. (2012). The ‘re-identification’ of Governor William Weld’s medical information: a critical re-examination of
health data identification risks and privacy protections, then and now (July 2012).
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2076397.
4 Khaled El Emam and Bradley Malin, “Appendix B: Concepts and Methods for De-identifying Clinical Trial Data,” Sharing
Clinical Trial Data: Maximizing Benefits, Minimizing Risk (Washington D.C.: National Academies Press, 2015),
http://www.ncbi.nlm.nih.gov/books/NBK285994.
5 Narayanan, A., & Shmatikov, V. (2006). How to break anonymity of the Netflix prize dataset. arXiv preprint cs/0610105.
https://arxiv.org/abs/cs/0610105.
6 Pandurangan, V. (2014). On taxis and rainbows: Lessons from NYC’s improperly anonymized taxi logs. Medium. Accessed
November 30, 2015. https://tech.vijayp.ca/of-taxis-and-rainbows-f6bc289679a1.