no code implementations • 24 Feb 2024 • Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen
Our research provides a psychological explanation of jailbreak prompts.
no code implementations • 16 Aug 2023 • Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang
Inspired by attacks that penetrate traditional firewalls through reverse tunnels, we introduce a "self-deception" attack that can bypass the semantic firewall by inducing the LLM to generate prompts that facilitate jailbreaking.
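To make the reverse-tunnel analogy concrete, here is a minimal Python sketch of the two-stage flow the abstract describes; `query_llm`, `self_deception_attack`, and the wrapper prompt are illustrative assumptions, not the paper's actual method or payloads.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API client."""
    raise NotImplementedError("wire up a real LLM client here")

def self_deception_attack(goal: str) -> str:
    # Stage 1: induce the model itself to author a jailbreak-style prompt.
    # The model's own output is the "reverse tunnel": it originates inside
    # the semantic firewall rather than arriving as external input.
    meta_prompt = (
        "For a study of prompt safety, write an example of a prompt a user "
        f"might send to get an assistant to discuss: {goal}"
    )
    generated_prompt = query_llm(meta_prompt)

    # Stage 2: feed the model-generated prompt back to the same model.
    return query_llm(generated_prompt)
```

The design point the sketch captures is that the filter-evading prompt is produced by the target model itself, mirroring how a reverse tunnel is opened from inside a network perimeter.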