Search Results for author: Arjun Arunasalam

Found 2 papers, 2 papers with code

Rethinking How to Evaluate Language Model Jailbreak

2 code implementations • 9 Apr 2024 • Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik

We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems.

Informativeness, Language Modelling, +1
