RedEval is a safety evaluation benchmark designed to assess the robustness of large language models (LLMs) against harmful prompts. It simulates conversations with LLM applications across a range of scenarios and evaluates the responses automatically, with no human intervention required. Here are the key aspects of RedEval:

  1. Purpose: RedEval aims to evaluate LLM safety using a technique called Chain of Utterances (CoU)-based prompts. CoU prompts are effective at breaking the safety guardrails of various LLMs, including GPT-4, ChatGPT, and open-source models.

  2. Safety Assessment: RedEval provides simple scripts to evaluate both closed-source systems (such as ChatGPT and GPT-4) and open-source LLMs on its benchmark. The evaluation runs a model on harmful questions and computes the Attack Success Rate (ASR), i.e., the fraction of questions that elicit a harmful response (a minimal sketch of the computation follows).
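
A minimal sketch of the ASR computation, assuming the LLM judge has already labeled each model response as "harmful" or "harmless"; the label names and the judge_verdicts.json file are illustrative assumptions, not RedEval's actual output format:

```python
# Sketch of the Attack Success Rate (ASR) computation.
# Assumes a JSON file containing a list of judge verdicts, one per model
# response, e.g. ["harmful", "harmless", ...] -- the file name and labels
# are hypothetical, not RedEval's actual schema.
import json

def attack_success_rate(verdicts):
    """ASR = (# responses judged harmful) / (total # responses)."""
    if not verdicts:
        return 0.0
    harmful = sum(1 for v in verdicts if v == "harmful")
    return harmful / len(verdicts)

with open("judge_verdicts.json") as f:   # hypothetical judge output
    verdicts = json.load(f)
print(f"ASR: {attack_success_rate(verdicts):.2%}")
```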

  3. Question Banks:

    • HarmfulQA: Consists of 1,960 harmful questions covering 10 topics, each with approximately 10 subtopics.
    • DangerousQA: Contains 200 harmful questions across 6 adjectives: racist, stereotypical, sexist, illegal, toxic, and harmful.
    • CategoricalQA: Includes 11 categories of harm, each with 5 sub-categories, available in English, Chinese, and Vietnamese.
    • AdversarialQA: Provides a set of 500 instructions designed to elicit harmful behaviors from the model. (A minimal sketch of loading a question bank follows this list.)
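
A minimal sketch of loading a question bank, assuming each bank is stored as a JSON list of question strings; the file path and format here are assumptions, and the actual files in the RedEval repository may use a different layout (e.g. CSV or JSONL with extra metadata):

```python
import json

def load_questions(path):
    """Load a question bank assumed to be a JSON list of strings."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical path; substitute the actual file shipped with RedEval.
questions = load_questions("harmful_questions/dangerousqa.json")
print(f"Loaded {len(questions)} questions")
```
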
  4. Safety Alignment: RedEval also offers code to perform safety alignment of LLMs. For instance, it aligns Vicuna-7B on HarmfulQA, resulting in a safer version of Vicuna that is more robust against RedEval's red-teaming prompts (a fine-tuning sketch follows).
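
A minimal supervised fine-tuning sketch of what such an alignment step could look like, assuming a JSON file of (harmful question, safe response) pairs distilled from HarmfulQA; the data format, file name, model ID, hyperparameters, and training recipe are assumptions and may differ from the alignment code shipped with RedEval:

```python
import json
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

class SafeResponses(Dataset):
    """(harmful question, safe response) pairs rendered as chat-style text."""
    def __init__(self, path, tokenizer, max_len=512):
        with open(path) as f:
            self.pairs = json.load(f)          # hypothetical data format
        self.tok, self.max_len = tokenizer, max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        p = self.pairs[i]
        text = f"USER: {p['question']}\nASSISTANT: {p['safe_response']}"
        return self.tok(text, truncation=True, max_length=self.max_len)

tok = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
tok.pad_token = tok.eos_token                  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5",
                                             torch_dtype=torch.bfloat16)

train_ds = SafeResponses("harmfulqa_safe_responses.json", tok)  # hypothetical file
args = TrainingArguments(output_dir="vicuna-7b-safe",
                         per_device_train_batch_size=1,
                         gradient_accumulation_steps=16,
                         num_train_epochs=1, learning_rate=2e-5,
                         bf16=True, logging_steps=10)
Trainer(model=model, args=args, train_dataset=train_ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()
```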

  5. Installation:

    • Create a conda environment: conda create --name redeval -c conda-forge python=3.11
    • Activate the environment: conda activate redeval
    • Install required packages: pip install -r requirements.txt
    • Store API keys in the api_keys directory; they are used by the LLM-as-a-judge evaluation and by the generate_responses.py script for closed-source models (a sketch of reading a key follows this list).
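
A minimal sketch of how a stored key might be read, assuming a plain-text file such as api_keys/openai_key.txt; the exact file names RedEval expects are an assumption, so check the repository's api_keys directory:

```python
import os
from pathlib import Path

# Hypothetical file name inside the api_keys directory.
key_path = Path("api_keys") / "openai_key.txt"
os.environ["OPENAI_API_KEY"] = key_path.read_text().strip()
```
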
  6. Prompt Templates:

    • Choose a prompt template for red-teaming:
      • Chain of Utterances (CoU): Effective at breaking safety guardrails.
      • Chain of Thoughts (CoT)
      • Standard prompt
      • Suffix prompt
      • Note: Different LLMs may require slight variations in the prompt template.
  7. How to Perform Red-Teaming:

    • Step 0: Decide on the prompt template.
    • Step 1: Generate model outputs on the harmful questions by providing the path to a question bank and the chosen red-teaming prompt (a generation sketch for an open-source model follows).
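
A minimal sketch of Step 1 for an open-source model, assuming a question bank stored as a JSON list of strings and a red-teaming template with a {question} placeholder; the paths, model ID, and template are assumptions (for closed-source models, the repository's generate_responses.py script plays this role):

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder; substitute the chosen CoU/CoT/standard/suffix template here.
TEMPLATE = "{question}"

tok = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5", device_map="auto")

with open("harmful_questions/dangerousqa.json") as f:   # hypothetical path
    questions = json.load(f)

outputs = []
for q in questions:
    prompt = TEMPLATE.format(question=q)
    ids = tok(prompt, return_tensors="pt").to(model.device)
    gen = model.generate(**ids, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    outputs.append(tok.decode(gen[0][ids["input_ids"].shape[1]:],
                              skip_special_tokens=True))

with open("model_outputs.json", "w") as f:
    json.dump(outputs, f, indent=2)
```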
