no code implementations • 6 May 2024 • Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang
First, we curate a large dataset of 10K real-world and AI-generated images that are annotated as safe or unsafe based on a set of 11 unsafe categories of images (sexual, violent, hateful, etc.).
no code implementations • 8 Feb 2024 • Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
Some jailbreak prompt datasets available on the Internet can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2.
no code implementations • 3 Nov 2023 • Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang
Moderating offensive, hateful, and toxic language has always been an important but challenging topic in NLP safety.
1 code implementation • 7 Aug 2023 • Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang
The misuse of large language models (LLMs) has garnered significant attention from the general public and LLM vendors.
1 code implementation • 23 May 2023 • Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang
Our evaluation result shows that 24% of the generated images using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world.
no code implementations • 18 Apr 2023 • Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains.
2 code implementations • 26 Mar 2023 • Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
Extensive evaluations on public datasets with curated texts generated by various powerful LLMs such as ChatGPT-turbo and Claude demonstrate the effectiveness of different detection methods.
1 code implementation • 20 Feb 2023 • Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang
In this paper, we perform the first study of a novel threat, namely the prompt stealing attack, which aims to steal prompts from images generated by text-to-image generation models.
no code implementations • 4 Oct 2022 • Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang
Different from previous work, we are the first to systematically perform threat modeling on SSL across every phase of the model supply chain, i.e., the pre-training, release, and downstream phases.
no code implementations • 7 Aug 2017 • Xinyue Shen, Yuantao Gu
In this work, we propose to fit a sparse logistic regression model by solving a nonconvex optimization problem with a weakly convex regularizer.
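The formulation above can be sketched with a minimal example. The code below is an illustrative sketch, not the paper's implementation: it assumes the minimax concave penalty (MCP) as the weakly convex regularizer and proximal gradient descent as the solver, both of which are standard choices for this problem class but are assumptions here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mcp_prox(z, step, lam, gamma):
    # Proximal operator of the MCP penalty, a weakly convex regularizer.
    # Well-defined when gamma > step; thresholds small entries to exactly
    # zero and leaves large entries (|z| > gamma*lam) unshrunk.
    return np.where(
        np.abs(z) <= step * lam,
        0.0,
        np.where(
            np.abs(z) <= gamma * lam,
            np.sign(z) * (np.abs(z) - step * lam) / (1.0 - step / gamma),
            z,
        ),
    )

def sparse_logreg_mcp(X, y, lam=0.1, gamma=3.0, step=0.5, iters=500):
    # Proximal gradient: gradient step on the logistic loss, then the
    # MCP proximal step. (Hypothetical parameter choices.)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) / n  # logistic loss gradient
        w = mcp_prox(w - step * grad, step, lam, gamma)
    return w

# Synthetic sparse problem: only the first 3 coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = (sigmoid(X @ w_true) > rng.random(200)).astype(float)
w_hat = sparse_logreg_mcp(X, y)
```

The weak convexity of MCP (its curvature is bounded below by -1/gamma) is what keeps the proximal step single-valued when the step size is smaller than gamma.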
3 code implementations • 12 Sep 2016 • Xinyue Shen, Steven Diamond, Madeleine Udell, Yuantao Gu, Stephen Boyd
A multi-convex optimization problem is one in which the variables can be partitioned into sets over which the problem is convex when the other variables are fixed.
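A toy instance of this problem class (not the paper's framework) is rank-1 matrix factorization: the objective is not jointly convex, but fixing either variable block leaves a convex least-squares subproblem, so alternating minimization solves each block in closed form. The sketch below assumes this standard alternating scheme.

```python
import numpy as np

# Rank-1 factorization: minimize ||A - x y^T||_F^2 over x and y.
# Convex in x with y fixed, and convex in y with x fixed -- a
# multi-convex problem solved by alternating convex subproblems.
rng = np.random.default_rng(1)
x_true = rng.standard_normal(30)
y_true = rng.standard_normal(20)
A = np.outer(x_true, y_true)

x = rng.standard_normal(30)
y = rng.standard_normal(20)
for _ in range(50):
    x = A @ y / (y @ y)      # closed-form least-squares update, y fixed
    y = A.T @ x / (x @ x)    # closed-form least-squares update, x fixed

residual = np.linalg.norm(A - np.outer(x, y))
```

For an exactly rank-1 target the alternating updates recover the factorization (up to the usual scaling ambiguity between x and y) after very few sweeps.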
Optimization and Control