LREC 2022 • Jörg Frohberg, Frank Binder
We introduce the CRASS (counterfactual reasoning assessment) data set and benchmark, which use questionized counterfactual conditionals as a novel and powerful tool for evaluating large language models.