Anonymity, Consent, and Other Noble Lies: An Empirical Study of the Data Economy

Joel REARDON - Associate Professor
Digital Security

Date: -
Location: Eurecom

Abstract: While legal scholars have cited decades of computer science research that demonstrates why anonymity is a hard problem (and that datasets should not be labelled as "anonymous" cavalierly), industry and legal practitioners have not heeded those warnings. Many organizations trafficking in consumer data continue to assert, for example, that hashed email addresses are anonymous and cannot reveal the original email address, and that device-based identifiers, such as advertising IDs, identify only devices and not people. We acquired datasets from multiple data brokers to demonstrate empirically why these assertions are false. Using publicly available email addresses found in data breaches posted on the Internet, we show that one can trivially reidentify 88% of the hashed email addresses that we obtained. Reidentifying hashed email addresses need not rely on illicit data: by constructing rainbow tables, we reidentified a majority of the hashed email addresses. In all cases, the hashed email addresses were linked to other device-based identifiers (e.g., mobile data advertising IDs and IP addresses), demonstrating why device-based identifiers have long been considered personally identifiable information. Organizations trafficking in this data make a related assertion: that the data was collected from consumers with their consent. To evaluate this claim, we conducted a survey (n = 369), recruiting participants by emailing the reidentified individuals in our datasets. The survey asked participants whether they recalled having provided consent (99.1% had no recollection of doing so) and whether they would prefer that the data brokers delete their data (94.2% said they would prefer that their email address not be sold, while 76.4% said they planned to submit deletion requests).
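To illustrate why an unsalted hash of an email address offers little protection, the following is a minimal sketch of a dictionary lookup against a list of known addresses. It is not the speaker's method or tooling: the function names, the choice of SHA-256, and the lowercase/trim normalization are all assumptions made for illustration, and a plain lookup table is used here rather than the rainbow tables mentioned in the abstract.

```python
import hashlib


def hash_email(email: str, algorithm: str = "sha256") -> str:
    """Hash an email address after simple normalization.

    Assumption: the data holder lowercases and trims the address and
    applies an unsalted hash. SHA-256 is only an illustrative choice.
    """
    normalized = email.strip().lower()
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()


def build_lookup(candidates, algorithm: str = "sha256") -> dict:
    """Map hash -> plaintext address for every candidate address."""
    return {hash_email(e, algorithm): e.strip().lower() for e in candidates}


def reidentify(hashed_emails, lookup: dict) -> dict:
    """Return the subset of hashes whose plaintext appears in the lookup."""
    return {h: lookup[h] for h in hashed_emails if h in lookup}


if __name__ == "__main__":
    # Hypothetical candidate list, e.g. addresses that are already public.
    known_addresses = ["Alice@example.com", "bob@example.org"]
    lookup = build_lookup(known_addresses)

    # Hypothetical "anonymous" dataset containing only hashes.
    dataset = [hash_email("alice@example.com"), hash_email("carol@example.net")]

    matches = reidentify(dataset, lookup)
    print(f"Reidentified {len(matches)} of {len(dataset)} hashed addresses")
```

Because the hash is deterministic and unsalted, anyone holding a large enough list of candidate addresses can reverse a match with a single table lookup; the hash hides an address only from parties who cannot guess it.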
Bio: Joel Reardon is an associate professor at the University of Calgary who researches mobile security and privacy issues, including the data collection performed through mobile devices. He also co-founded the privacy analytics company AppCensus. He received his Bachelor's and Master's degrees from the University of Waterloo and his Doctor of Sciences from ETH Zurich. His research has been covered by the CBC, the BBC, the Washington Post, and the Wall Street Journal, among other outlets. His research has received the Emilio Aced Research and Personal Data Protection Award, the CNIL-Inria Data Protection Award, and the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies. He likes bicycling and snowboarding and is currently trying to improve his French.