This is the Students and Early Career Researchers (ECRs) homepage of the Privacy and Security in Machine Learning Interest Group.
To get involved, join the #students channel in our Slack.
Past Research Talks:
- 30 March 2022, 18:00 (UK time)
Ana-Maria Cretu (Imperial College London)
Interaction data are identifiable even across long periods of time
Abstract: Fine-grained records of people’s interactions, both offline and online, are collected at large scale. These data contain sensitive information about whom we meet and talk to, and when. We demonstrate here that people’s interaction behavior is stable over long periods of time and can be used to identify individuals in anonymous datasets. Our attack learns the profile of an individual using geometric deep learning and triplet loss optimization. In a mobile phone metadata dataset of more than 40k people, it correctly identifies 52% of individuals based on their 2-hop interaction graph. We further show that the profiles learned by our method are stable over time and that 24% of people are still identifiable after 20 weeks. Our results suggest that people with well-balanced interaction graphs are more identifiable. Applying our attack to Bluetooth close-proximity networks, we show that even 1-hop interaction graphs are enough to identify people more than 26% of the time. Our results provide strong evidence that disconnected and even re-pseudonymized interaction data can be linked together, making them personal data under the European Union’s General Data Protection Regulation.
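The attack's core training signal, triplet loss, pulls embeddings of the same person's interaction profiles together while pushing other people's profiles away by at least a margin. A minimal pure-Python sketch of the loss itself (illustrative only; the function names and the Euclidean metric are assumptions, and the talk's geometric deep learning model is not reproduced here):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the anchor is closer to the positive than to the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Anchor and positive: embeddings of the same person's interaction
# graph at two points in time; negative: a different person.
loss = triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 4.0], margin=1.0)
```

Training on such triplets makes a person's profile embedding stable across time windows, which is what allows the later re-identification.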
- 9 March 2022
Fatemeh Mireshghallah (UC San Diego)
What Does it Mean for a Language Model to Preserve Privacy?
Abstract: Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets. In this talk, we first discuss the potential risks of language models, then focus on what is required for a language model to be considered privacy-preserving and on the challenges in making that happen. We then discuss the mismatch between the narrow assumptions made by popular data protection techniques (data sanitization and differential privacy) and the broadness of natural language and of privacy as a social norm. Finally, we consider alternative approaches and discuss possible paths forward.
- 2 March 2022
Alexandre Sablayrolles, Pierre Stock, and Igor Shilov
Open-Source Libraries for Privacy Attacks and Differentially Private Training: Privacy Linter and Opacus
Abstract: As the field of privacy-preserving ML advances, it’s important that researchers and industry practitioners have access to state-of-the-art tools for both research and application purposes. In this talk, we’ll discuss two open-source libraries for privacy attacks and differentially private training, developed at Meta AI: Privacy Linter and Opacus. We’ll do a deep dive into their capabilities, talk about code architecture, and share practical tips on applying them to real-world problems.
Bios: Alexandre Sablayrolles is a Research Scientist at Facebook AI in Paris, working on the privacy and security of machine learning systems. He received his PhD from Université Grenoble Alpes in 2020, following a joint CIFRE program with Facebook AI. Prior to that, he completed his Master’s degree in Data Science at NYU, and received a B.S. and M.S. in Applied Mathematics and Computer Science from École Polytechnique. Alexandre’s research interests include privacy and security, computer vision, and applications of deep learning. Homepage: https://ai.facebook.com/people/alexandre-sablayrolles/
Pierre Stock joined Facebook AI as a Research Scientist in June 2021. Previously, he was a PhD Resident at Facebook AI Research and ENS de Lyon and defended his PhD around “Efficiency and Redundancy in Neural Networks” in April 2021. His interests include Neural Network Compression and Privacy-Preserving Machine Learning. Homepage: https://ai.facebook.com/people/pierre-stock/
Igor Shilov is a Research Engineer at Facebook AI, working on applied research in privacy preserving machine learning. He is the lead developer of Opacus and has industry experience in building highly scalable ML Systems, including NLP applications, Recommender Systems and Information Retrieval Engines. Homepage: https://github.com/ffuuugor
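The differentially private training that Opacus implements is DP-SGD: clip each per-sample gradient to a norm bound, sum the clipped gradients, and add Gaussian noise calibrated to that bound. A minimal pure-Python sketch of one such step (illustrative only; the function names are assumptions, and this is not the Opacus API, which instead wraps a PyTorch model, optimizer, and data loader):

```python
import math
import random

def dp_sgd_step(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to max_grad_norm, sum them,
    add Gaussian noise, and return the averaged noisy gradient."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
        for i, v in enumerate(g):
            summed[i] += v * scale
    sigma = noise_multiplier * max_grad_norm  # noise scales with the clip bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    n = len(per_sample_grads)
    return [v / n for v in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.1, -0.2]]  # per-sample gradients for a batch of two
# noise_multiplier=0.0 disables the noise to keep this example deterministic;
# real DP training uses a positive value (often around 1.0).
update = dp_sgd_step(grads, max_grad_norm=1.0, noise_multiplier=0.0, rng=rng)
```

Clipping bounds each example's influence on the update, which is what lets the added Gaussian noise translate into a formal differential privacy guarantee.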
- 9 February 2022
Praneeth Vepakomma (MIT)
Private Measurement of Nonlinear Correlations in a Distributed Setting
Abstract: We introduce a differentially private method to measure nonlinear correlations between sensitive data hosted across two entities. We provide utility guarantees of our private estimator. Ours is the first such private estimator of nonlinear correlations, to the best of our knowledge within a multi-party setup. The important measure of nonlinear correlation we consider is distance correlation. This work has direct applications to private feature screening, private independence testing, private k-sample tests, private multi-party causal inference and private data synthesis in addition to exploratory data analysis.