Larger Model Causes Lower Classification Accuracy Under Differential Privacy: Reason and Solution

29 Sep 2021 · Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen

Differential privacy (DP) is an essential technique for privacy preservation that works by adding random noise to the data. In deep learning, DP stochastic gradient descent (DP-SGD) is a popular way to train privacy-preserving models. Even with small noise, however, a large model (such as ResNet50) trained by DP-SGD cannot outperform a small model (such as ResNet18). To better understand this phenomenon, we study high-dimensional DP learning from the viewpoint of generalization. Theoretically, we first show that for the Gaussian mixture model, even with small DP noise, classification can be as bad as random guessing when excess features are used, because noise accumulates in the estimation over the high-dimensional feature space. We then propose a robust measure for selecting important features, which trades off model accuracy against privacy preservation, and we establish the conditions under which the proposed measure selects the important features. Simulations on real data (such as CIFAR-10) support our theoretical results and demonstrate the advantage of the proposed classification and privacy-preserving procedure.
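To make the DP-SGD mechanism mentioned above concrete, here is a minimal NumPy sketch of one DP-SGD step on a logistic-regression loss, following the standard per-example clipping plus Gaussian noise recipe. The function name and all hyperparameters (`lr`, `clip`, `sigma`) are illustrative choices, not the paper's implementation.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD step on a logistic-regression loss (illustrative).

    Each example's gradient is clipped to L2 norm `clip`, the clipped
    gradients are averaged, and Gaussian noise of scale
    sigma * clip / batch_size is added before the parameter update.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    # Per-example gradients of the logistic loss: (sigmoid(x_i @ w) - y_i) * x_i
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grads = (p - y)[:, None] * X                       # shape (n, d)
    # Clip each example's gradient to L2 norm `clip`
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    # Average and add calibrated Gaussian noise
    g = grads.mean(axis=0) + rng.normal(0.0, sigma * clip / n, size=w.shape)
    return w - lr * g
```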
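The abstract's central claim, that noise accumulation over excess features can drive accuracy toward random guessing, can be checked with a toy simulation. The sketch below (my own illustration, not the paper's code) classifies a two-component Gaussian mixture with a privatized mean-difference rule: the signal lives in a fixed number of coordinates, while every added coordinate contributes estimation and DP noise, so accuracy drifts toward 0.5 as the dimension grows.

```python
import numpy as np

def gmm_accuracy(d, n=200, signal_dims=10, signal=0.3, dp_sigma=0.1,
                 n_test=5000, rng=None):
    """Test accuracy of a privatized mean-difference classifier.

    Two classes ~ N(+-mu, I_d), where mu equals `signal` in the first
    `signal_dims` coordinates and 0 elsewhere. Class means are estimated
    from n samples each, then Gaussian DP noise of scale `dp_sigma` is
    added to every coordinate of the estimates.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mu = np.zeros(d)
    mu[:signal_dims] = signal
    # Privatized mean estimates for each class
    mu_pos = rng.normal(mu, 1.0, (n, d)).mean(0) + rng.normal(0, dp_sigma, d)
    mu_neg = rng.normal(-mu, 1.0, (n, d)).mean(0) + rng.normal(0, dp_sigma, d)
    w = mu_pos - mu_neg                          # linear discriminant direction
    # Balanced test set with labels in {-1, +1}
    y = rng.integers(0, 2, n_test) * 2 - 1
    X = rng.normal(y[:, None] * mu, 1.0)
    preds = np.sign(X @ w - (mu_pos + mu_neg) @ w / 2.0)
    return (preds == y).mean()

# Accuracy decays toward 0.5 as useless features accumulate noise
for d in [10, 100, 1000, 10000]:
    print(d, gmm_accuracy(d))
```

Keeping only the informative coordinates (a stand-in for the paper's feature-selection measure) restores high accuracy at any `d`, which is the intuition behind trading off accuracy and privacy via feature selection.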
