ACL 2020 • Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song
Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions?