no code implementations • 6 May 2024 • Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang
To demonstrate and address this underlying vulnerability, we propose a theoretical hypothesis and analytical framework, and introduce IntentObfuscator, a new black-box jailbreak attack methodology that exploits the identified flaw by obfuscating the true intent behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures.
no code implementations • 24 Oct 2023 • Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, ZiHao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, Haixu Tang
In our research, we propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PII (personally identifiable information) from the pre-training data of LLMs.