no code implementations • 4 Apr 2024 • Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu
Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs), revealing the vulnerability of their safeguards.
no code implementations • 14 Jun 2023 • Wenqian Yu, Jindong Gu, Zhijiang Li, Philip Torr
Adversarial examples (AEs), inputs carrying small adversarial perturbations, can mislead deep neural networks (DNNs) into wrong predictions.