no code implementations • 12 May 2024 • Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong
Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images.
no code implementations • 10 May 2024 • Yujie Zhang, Neil Gong, Michael K. Reiter
Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data.
1 code implementation • 10 May 2024 • Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao
As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property.
no code implementations • 22 Mar 2024 • Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong
The robustness of such watermark-based detector against evasion attacks in the white-box and black-box settings is well understood in the literature.
1 code implementation • 21 Feb 2024 • Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong
In this study, we propose GradSafe, which effectively detects unsafe prompts by scrutinizing the gradients of safety-critical parameters in LLMs.
no code implementations • 3 Dec 2023 • Zonghao Huang, Neil Gong, Michael K. Reiter
Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit.
1 code implementation • 20 May 2023 • Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao
Text-to-image generative models such as Stable Diffusion and DALL$\cdot$E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones.