Search Results for author: Qingyuan Lu

Found 1 papers, 1 papers with code

A StrongREJECT for Empty Jailbreaks

1 code implementation • 15 Feb 2024 • Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks.

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.