2 code implementations • 9 Apr 2024 • Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik
We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems.
1 code implementation • 3 Oct 2023 • Yufan Chen, Arjun Arunasalam, Z. Berkay Celik
In this paper, we measure their ability to refute popular security and privacy (S&P) misconceptions that the general public holds.