1 code implementation • 20 Dec 2023 • Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons
The ALMANACS scenarios span twelve safety-relevant topics, such as ethical reasoning and advanced AI behaviors; they use idiosyncratic premises to invoke model-specific behavior; and they include a train-test distributional shift to encourage faithful explanations.