Search Results for author: Zhenting Wang

Found 13 papers, 9 papers with code

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

1 code implementation • 10 Apr 2024 • Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers.

Paper
Code

Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection

no code implementations • 23 Mar 2024 • Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting.

Paper
Add Code

Health-LLM: Personalized Retrieval-Augmented Disease Prediction System

1 code implementation • 1 Feb 2024 • Mingyu Jin, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang

Compared to traditional health management applications, our system has three main advantages: (1) It integrates health reports and medical knowledge into a large model to ask relevant questions to large language model for disease prediction; (2) It leverages a retrieval augmented generation (RAG) mechanism to enhance feature extraction; (3) It incorporates a semi-automated feature updating framework that can merge and delete features to improve accuracy of disease prediction.

Disease Prediction Language Modelling +3

Paper
Code

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

1 code implementation • 6 Jul 2023 • Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma

To address this issue, we propose a method for detecting such unauthorized data usage by planting the injected memorization into the text-to-image diffusion models trained on the protected dataset.

Memorization

Paper
Code

Alteration-free and Model-agnostic Origin Attribution of Generated Images

no code implementations • 29 May 2023 • Zhenting Wang, Chen Chen, Yi Zeng, Lingjuan Lyu, Shiqing Ma

To overcome this problem, we first develop an alteration-free and model-agnostic origin attribution method via input reverse-engineering on image generation models, i. e., inverting the input of a particular model for a specific image.

Image Generation

Paper
Add Code

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

1 code implementation • 28 May 2023 • Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks.

Paper
Code

UNICORN: A Unified Backdoor Trigger Inversion Framework

1 code implementation • 5 Apr 2023 • Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma

Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models from our analysis.

Backdoor Attack

Paper
Code

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

no code implementations • 29 Nov 2022 • Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang

We leverage 20 different types of injected backdoor attacks in the literature as the guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities.

Data Poisoning

Paper
Add Code

Rethinking the Reverse-engineering of Trojan Triggers

1 code implementation • 27 Oct 2022 • Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

On average, the detection accuracy of our method is 93\%.

Paper
Code

BppAttack: Stealthy and Efficient Trojan Attacks against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning

1 code implementation • CVPR 2022 • Zhenting Wang, Juan Zhai, Shiqing Ma

Existing attacks use visible patterns (e. g., a patch or image transformations) as triggers, which are vulnerable to human inspection.

Contrastive Learning Quantization

Paper
Code

Training with More Confidence: Mitigating Injected and Natural Backdoors During Training

1 code implementation • 13 Feb 2022 • Zhenting Wang, Hailun Ding, Juan Zhai, Shiqing Ma

By further analyzing the training process and model architectures, we found that piece-wise linear functions cause this hyperplane surface.

Backdoor Attack

Paper
Code

Complex Backdoor Detection by Symmetric Feature Differencing

1 code implementation • CVPR 2022 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

Our results on the TrojAI competition rounds 2-4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i. e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase of false negatives by 0-30%, leading to 17-41% overall accuracy improvement.

Paper
Code

EX-RAY: Distinguishing Injected Backdoor from Natural Features in Neural Networks by Examining Differential Feature Symmetry

no code implementations • 16 Mar 2021 • Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

A prominent challenge is hence to distinguish natural features and injected backdoors.

Backdoor Attack

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.