Machine Unlearning in Gradient Boosting Decision Trees

KDD 2023 · Huawei Lin, Jun Woo Chung, Yingjie Lao, Weijie Zhao ·

Various machine learning applications take users' data to train the models. Recently enforced legislation requires companies to remove users' data upon requests, i.e.,the right to be forgotten. In the context of machine learning, the trained model potentially memorizes the training data. Machine learning algorithms have to be able to unlearn the user data that are requested to delete to meet the requirement. Gradient Boosting Decision Trees (GBDT) is a widely deployed model in many machine learning applications. However, few studies investigate the unlearning on GBDT. This paper proposes a novel unlearning framework for GBDT. To the best of our knowledge, this is the first work that considers machine unlearning on GBDT. It is not straightforward to transfer the unlearning methods of DNN to GBDT settings. We formalized the machine unlearning problem and its relaxed version. We propose an unlearning framework that efficiently and effectively unlearns a given collection of data without retraining the model from scratch. We introduce a collection of techniques, including random split point selection and random partitioning layers training, to the training process of the original tree models to ensure that the trained model requires few subtree retrainings during the unlearning. We investigate the intermediate data and statistics to store as an auxiliary data structure during the training so that we can immediately determine if a subtree is required to be retrained without touching the original training dataset. Furthermore, a lazy update technique is proposed as a trade-off between unlearning time and model functionality. We experimentally evaluate our proposed methods on public datasets. The empirical results confirm the effectiveness of our framework.

PDF Abstract