Search Results for author: Neil Gong

Found 7 papers, 3 papers with code

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

no code implementations12 May 2024 Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images.

Decoder

Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

no code implementations10 May 2024 Yujie Zhang, Neil Gong, Michael K. Reiter

Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data.

Backdoor Attack Data Poisoning +2

PLeak: Prompt Leaking Attacks against Large Language Model Applications

1 code implementation10 May 2024 Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao

As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property.

Language Modelling Large Language Model

A Transfer Attack to Image Watermarks

no code implementations22 Mar 2024 Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

The robustness of such watermark-based detector against evasion attacks in the white-box and black-box settings is well understood in the literature.

GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis

1 code implementation21 Feb 2024 Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong

In this study, we propose GradSafe, which effectively detects unsafe prompts by scrutinizing the gradients of safety-critical parameters in LLMs.

Mendata: A Framework to Purify Manipulated Training Data

no code implementations3 Dec 2023 Zonghao Huang, Neil Gong, Michael K. Reiter

Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit.

Data Poisoning

SneakyPrompt: Jailbreaking Text-to-image Generative Models

1 code implementation20 May 2023 Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao

Text-to-image generative models such as Stable Diffusion and DALL$\cdot$E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones.

Reinforcement Learning (RL) Semantic Similarity +1

Cannot find the paper you are looking for? You can Submit a new open access paper.