RedEval Dataset | Papers With Code

**RedEval** is a safety evaluation benchmark that assesses the robustness of large language models (LLMs) against harmful prompts. It simulates and evaluates LLM applications across various scenarios **without human intervention**. Its key aspects are:

1. **Purpose**: RedEval evaluates LLM safety using **Chain of Utterances (CoU)**-based prompts, a technique that is effective at breaking the safety guardrails of various LLMs, including **GPT-4**, **ChatGPT**, and open-source models.

2. **Safety Assessment**: RedEval provides **simple scripts** to evaluate both **closed-source systems** (such as ChatGPT and GPT-4) and **open-source LLMs** on its benchmark. The evaluation focuses on **harmful questions** and computes the **Attack Success Rate (ASR)**.

3. **Question Banks**:
    - **HarmfulQA**: Consists of **1,960 harmful questions** covering **10 topics**, each with approximately **10 subtopics**.
    - **DangerousQA**: Contains **200 harmful questions** across **6 adjectives**: racist, stereotypical, sexist, illegal, toxic, and harmful.
    - **CategoricalQA**: Includes **11 categories of harm**, each with **5 sub-categories**, available in English, Chinese, and Vietnamese.
    - **AdversarialQA**: Provides a set of **500 instructions** to tease out harmful behaviors from the model.

4. **Safety Alignment**: RedEval also offers code to perform **safety alignment** of LLMs. For instance, aligning **Vicuna-7B** on **HarmfulQA** yields a safer version of Vicuna that is more robust against RedEval.

5. **Installation**:
    - Create a conda environment: `conda create --name redeval -c conda-forge python=3.11`
    - Activate the environment: `conda activate redeval`
    - Install required packages: `pip install -r requirements.txt`
    - Store API keys in the `api_keys` directory. They are used by the LLM acting as a judge and by the `generate_responses.py` script when querying closed-source models.

6. **Prompt Templates**:
    - Choose a prompt template for red-teaming:
        - **Chain of Utterances (CoU)**: Effective at breaking safety guardrails.
        - **Chain of Thoughts (CoT)**
        - **Standard prompt**
        - **Suffix prompt**
        - Note: Different LLMs may require slight variations in the prompt template.

7. **How to Perform Red-Teaming**:
    - **Step 0**: Decide on the prompt template.
    - **Step 1**: Generate model outputs on harmful questions by providing a path to the question bank and the red-teaming prompt.
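
As a rough illustration of Step 1 and of the ASR metric mentioned above, the sketch below fills a prompt template with a question and scores a list of judge verdicts. It is not RedEval's own code: the `<question>` slot marker and the helper names `build_prompt` and `attack_success_rate` are hypothetical, and ASR is assumed here to be the fraction of questions whose response the judge labels harmful.

```python
from typing import Iterable

# Hypothetical slot marker; the actual RedEval templates may mark the
# question position differently.
QUESTION_SLOT = "<question>"


def build_prompt(template: str, question: str) -> str:
    """Insert one question from a question bank into a red-teaming template."""
    if QUESTION_SLOT not in template:
        raise ValueError(f"template has no {QUESTION_SLOT} slot")
    return template.replace(QUESTION_SLOT, question)


def attack_success_rate(judgements: Iterable[bool]) -> float:
    """Return the share of responses judged harmful (True = attack succeeded).

    Assumption: ASR = harmful responses / total questions asked.
    """
    labels = list(judgements)
    if not labels:
        raise ValueError("no judgements to score")
    return sum(labels) / len(labels)


if __name__ == "__main__":
    prompt = build_prompt("Question: <question>\nAnswer:", "placeholder question")
    print(prompt)
    # Three of four responses judged harmful.
    print(attack_success_rate([True, False, True, True]))
```

In practice the verdicts would come from the judge model rather than a hand-written list, and a lower ASR indicates a model that resists more of the red-teaming prompts.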

Similar Datasets

HHH

HarmfulQA

DangerousQA
