1 code implementation • 10 Apr 2024 • Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang
In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers.
no code implementations • 23 Mar 2024 • Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin
In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting.
1 code implementation • 1 Feb 2024 • Mingyu Jin, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang
Compared to traditional health management applications, our system has three main advantages: (1) It integrates health reports and medical knowledge into a large model to ask relevant questions to large language model for disease prediction; (2) It leverages a retrieval augmented generation (RAG) mechanism to enhance feature extraction; (3) It incorporates a semi-automated feature updating framework that can merge and delete features to improve accuracy of disease prediction.
1 code implementation • 6 Jul 2023 • Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma
To address this issue, we propose a method for detecting such unauthorized data usage by planting the injected memorization into the text-to-image diffusion models trained on the protected dataset.
no code implementations • 29 May 2023 • Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma
To overcome this problem, we first develop an alteration-free and model-agnostic origin attribution method via input reverse-engineering on image generation models, i. e., inverting the input of a particular model for a specific image.
1 code implementation • 28 May 2023 • Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma
Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks.
1 code implementation • 5 Apr 2023 • Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma
Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models from our analysis.
no code implementations • 29 Nov 2022 • Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang
We leverage 20 different types of injected backdoor attacks in the literature as the guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities.
1 code implementation • 27 Oct 2022 • Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma
On average, the detection accuracy of our method is 93\%.
1 code implementation • CVPR 2022 • Zhenting Wang, Juan Zhai, Shiqing Ma
Existing attacks use visible patterns (e. g., a patch or image transformations) as triggers, which are vulnerable to human inspection.
1 code implementation • 13 Feb 2022 • Zhenting Wang, Hailun Ding, Juan Zhai, Shiqing Ma
By further analyzing the training process and model architectures, we found that piece-wise linear functions cause this hyperplane surface.
1 code implementation • CVPR 2022 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang
Our results on the TrojAI competition rounds 2-4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i. e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase of false negatives by 0-30%, leading to 17-41% overall accuracy improvement.
no code implementations • 16 Mar 2021 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang
A prominent challenge is hence to distinguish natural features and injected backdoors.