Search Results for author: Jeffrey G. Wang

Found 1 paper, 1 paper with code

Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models

1 code implementation • 26 Feb 2024 • Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel

In the fine-tuning setting, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models achieves near-perfect membership inference attack (MIA) performance; we then leverage our MIA to extract a large fraction of the fine-tuning dataset from fine-tuned Pythia and Llama models.
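The loss-ratio idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of dividing the fine-tuned loss by the base loss, and the threshold of 0.8 are all assumptions made for illustration. The per-token log-probabilities are assumed to have been computed beforehand by scoring the same text under each model.

```python
from typing import Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood from per-token log-probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def loss_ratio_score(
    base_logprobs: Sequence[float], ft_logprobs: Sequence[float]
) -> float:
    """Hypothetical score: fine-tuned loss divided by base-model loss.

    Assumption: a sample seen during fine-tuning tends to have a much
    lower loss under the fine-tuned model than under the base model,
    so a small ratio suggests membership.
    """
    base_loss = mean_nll(base_logprobs)
    ft_loss = mean_nll(ft_logprobs)
    if base_loss == 0.0:
        raise ValueError("base loss is zero; ratio is undefined")
    return ft_loss / base_loss


def predict_member(
    base_logprobs: Sequence[float],
    ft_logprobs: Sequence[float],
    threshold: float = 0.8,  # illustrative value, not taken from the paper
) -> bool:
    """Flag a sample as a likely fine-tuning member if the ratio is small."""
    return loss_ratio_score(base_logprobs, ft_logprobs) < threshold


# Toy log-probabilities (made up for illustration, not real model outputs).
seen_base, seen_ft = [-2.0, -3.0, -1.0], [-0.5, -0.5, -0.5]
unseen_base, unseen_ft = [-2.0, -3.0, -1.0], [-2.0, -2.0, -2.0]

seen_score = loss_ratio_score(seen_base, seen_ft)
unseen_score = loss_ratio_score(unseen_base, unseen_ft)
```

In practice the threshold would be calibrated on samples with known membership, and the log-probabilities would come from scoring each candidate text under both models.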

Language Modelling
