1 code implementation • 8 Mar 2021 • Patrick Schramowski, Cigdem Turan, Nico Andersen, Constantin A. Rothkopf, Kristian Kersting
That is, we show that these norms can be captured geometrically by a direction, which can be computed, e. g., by a PCA, in the embedding space, reflecting well the agreement of phrases to social norms implicitly expressed in the training texts and providing a path for attenuating or even preventing toxic degeneration in LMs.