no code implementations • 24 May 2024 • Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye
This introduces a previously unexplored confounding factor into post-hoc studies of LLM memorization and calls into question the effectiveness of (exact) data deduplication as a privacy protection technique.
no code implementations • 24 May 2024 • Florent Guépin, Nataša Krčo, Matthieu Meeus, Yves-Alexandre de Montjoye
Taken together, our results show that current MIA evaluation averages the risk across datasets, leading to inaccurate risk estimates, and that the risk posed by attacks leveraging information about the target dataset is potentially underestimated.
no code implementations • 5 Apr 2024 • Ana-Maria Cretu, Miruna Rusu, Yves-Alexandre de Montjoye
We evaluate six neural network architectures as the embedding model.
no code implementations • 14 Feb 2024 • Matthieu Meeus, Igor Shilov, Manuel Faysse, Yves-Alexandre de Montjoye
We here propose to use copyright traps, i.e. the inclusion of fictitious entries in original content, to detect the use of copyrighted materials in LLMs, with a focus on models where memorization does not naturally occur.
no code implementations • 23 Oct 2023 • Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye
First, we propose a procedure for the development and evaluation of document-level membership inference for LLMs by leveraging commonly used data sources for training and the model release date.
no code implementations • 4 Jul 2023 • Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye
While membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data, they currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset.
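The shadow-modeling setup referred to above can be illustrated with a minimal sketch (this is not the paper's implementation; the data generator, model choice, and confidence feature are all simplifying assumptions): the attacker trains shadow models on an auxiliary dataset, labels records as members or non-members of each shadow model's training set, and fits a meta-classifier on the models' confidence scores.

```python
# Minimal shadow-modeling membership inference sketch (illustrative only).
# Assumes the attacker holds an auxiliary dataset drawn from a distribution
# similar to the target model's training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_aux(n):
    """Hypothetical auxiliary data: 2-D Gaussian features, linear label rule."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

def train_model(X, y):
    return LogisticRegression().fit(X, y)

# Train shadow models on auxiliary data; record each model's confidence in
# the true label for records IN vs OUT of its own training set.
feats, labels = [], []
for _ in range(10):
    X_in, y_in = sample_aux(100)    # shadow training set ("members")
    X_out, y_out = sample_aux(100)  # held-out records ("non-members")
    shadow = train_model(X_in, y_in)
    for X, y, member in [(X_in, y_in, 1), (X_out, y_out, 0)]:
        conf = shadow.predict_proba(X)[np.arange(len(y)), y]
        feats.append(conf.reshape(-1, 1))
        labels.append(np.full(len(y), member))

# Meta-classifier: predicts membership from a model's confidence score.
attack = LogisticRegression().fit(np.vstack(feats), np.concatenate(labels))

# Apply the attack to a (simulated) target model's training records.
X_tr, y_tr = sample_aux(100)
target = train_model(X_tr, y_tr)
conf_members = target.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
member_rate = attack.predict(conf_members.reshape(-1, 1)).mean()
```

The attack's success hinges on the auxiliary data matching the training distribution, which is exactly the assumption the paper questions.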
no code implementations • 17 Jun 2023 • Matthieu Meeus, Florent Guépin, Ana-Maria Cretu, Yves-Alexandre de Montjoye
Choosing the right vulnerable records is as important as developing more accurate MIAs when evaluating the privacy of synthetic data releases, including from a legal perspective.
1 code implementation • 8 Jun 2023 • Ana-Maria Cretu, Daniel Jones, Yves-Alexandre de Montjoye, Shruti Tople
We here present the first systematic analysis of the causes of misalignment in shadow models and show the use of a different weight initialisation to be the main cause.
no code implementations • 25 Nov 2022 • Florimond Houssiau, Vincent Schellekens, Antoine Chatalic, Shreyas Kumar Annamraju, Yves-Alexandre de Montjoye
In this paper, we introduce the generic moment-to-moment (M$^2$M) method to perform a wide range of data exploration tasks from a single private sketch.
1 code implementation • 9 Nov 2022 • Ana-Maria Cretu, Florimond Houssiau, Antoine Cully, Yves-Alexandre de Montjoye
We show that the attacks found by QS consistently match or outperform, sometimes by a large margin, the best attacks from the literature.
no code implementations • 16 Dec 2021 • Ana-Maria Creţu, Florent Guépin, Yves-Alexandre de Montjoye
Second, we propose a model-based attack, showing how an attacker can exploit black-box access to the model to infer the correlations using shadow models trained on synthetic datasets.
1 code implementation • 20 Nov 2015 • Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland, Sune Lehmann, Yves-Alexandre de Montjoye
Mobile phone metadata is increasingly used for humanitarian purposes in developing countries as traditional data is scarce.