Identification, data combination and the risk of disclosure
Tatiana V. Komarova,
Denis Nekipelov and
Evgeny Yakovlev
Additional contact information
Tatiana V. Komarova: Institute for Fiscal Studies and London School of Economics and Political Science
No CWP38/11, CeMMAP working papers from Centre for Microdata Methods and Practice, Institute for Fiscal Studies
Abstract:
Businesses routinely rely on econometric models to analyze and predict consumer behavior. Estimating such models may require combining a firm's internal data with external datasets to account for sample selection, missing observations, omitted variables and measurement errors in the existing data source. In this paper we point out that these data problems can be addressed by estimating econometric models from combined data using data mining techniques, under mild assumptions regarding the data distribution. However, data combination poses serious threats to the security of consumer data: we demonstrate that point identification of an econometric model from combined data is incompatible with restrictions on the risk of individual disclosure. Consequently, if a consumer model is point identified, the firm would (implicitly or explicitly) reveal the identity of at least some of the consumers in its internal data. More importantly, we argue that unless the firm restricts the individual disclosure risk when combining data, an adversary or a competitor can gather confidential information about some individuals from the estimated model, even if the raw combined dataset is not shared with a third party.
Date: 2011-12-20
New Economics Papers: this item is included in nep-ecm
Downloads: (external link)
http://cemmap.ifs.org.uk/wps/cwp3811.pdf (application/pdf)
Related works:
Journal Article: Identification, data combination, and the risk of disclosure (2018) 
Working Paper: Identification, data combination and the risk of disclosure (2018) 
Persistent link: https://EconPapers.repec.org/RePEc:ifs:cemmap:38/11
Ordering information: This working paper can be ordered from
The Institute for Fiscal Studies 7 Ridgmount Street LONDON WC1E 7AE
Bibliographic data for series maintained by Emma Hyman.