Search Results for author: Neil Gong

Found 7 papers, 3 papers with code

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

no code implementations • 12 May 2024 • Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images.

Decoder

Paper
Add Code

Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

no code implementations • 10 May 2024 • Yujie Zhang, Neil Gong, Michael K. Reiter

Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data.

Backdoor Attack Data Poisoning +2

Paper
Add Code

PLeak: Prompt Leaking Attacks against Large Language Model Applications

1 code implementation • 10 May 2024 • Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao

As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property.

Language Modelling Large Language Model

Paper
Code

A Transfer Attack to Image Watermarks

no code implementations • 22 Mar 2024 • Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

The robustness of such watermark-based detector against evasion attacks in the white-box and black-box settings is well understood in the literature.

Paper
Add Code

GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis

1 code implementation • 21 Feb 2024 • Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong

In this study, we propose GradSafe, which effectively detects unsafe prompts by scrutinizing the gradients of safety-critical parameters in LLMs.

Paper
Code

Mendata: A Framework to Purify Manipulated Training Data

no code implementations • 3 Dec 2023 • Zonghao Huang, Neil Gong, Michael K. Reiter

Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit.

Data Poisoning

Paper
Add Code

SneakyPrompt: Jailbreaking Text-to-image Generative Models

1 code implementation • 20 May 2023 • Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao

Text-to-image generative models such as Stable Diffusion and DALL$\cdot$E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones.

Reinforcement Learning (RL) Semantic Similarity +1

102

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.