Search Results for author: Edoardo Debenedetti

Found 6 papers, 4 papers with code

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.

Paper
Code

Scaling Compute Is Not All You Need for Adversarial Robustness

no code implementations • 20 Dec 2023 • Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura

Finally, we make our benchmarking framework (built on top of \texttt{timm}~\citep{rw2019timm}) publicly available to facilitate future analysis in efficient robust deep learning.

Adversarial Robustness Benchmarking

Paper
Add Code

Privacy Side Channels in Machine Learning Systems

no code implementations • 11 Sep 2023 • Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more.

Paper
Add Code

Evading Black-box Classifiers Without Breaking Eggs

1 code implementation • 5 Jun 2023 • Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr

We then design new attacks that reduce the number of bad queries by $1. 5$-$7. 3\times$, but often at a significant increase in total (non-bad) queries.

Paper
Code

A Light Recipe to Train Robust Vision Transformers

1 code implementation • 15 Sep 2022 • Edoardo Debenedetti, Vikash Sehwag, Prateek Mittal

Additionally, investigating the reasons for the robustness of our models, we show that it is easier to generate strong attacks during training when using our recipe and that this leads to better robustness at test time.

Adversarial Robustness Data Augmentation +1

Paper
Code

RobustBench: a standardized adversarial robustness benchmark

1 code implementation • 19 Oct 2020 • Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness which often makes it hard to identify the most promising ideas in training robust models.

Adversarial Robustness Benchmarking +3

603

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.