1 code implementation • 2 Jun 2023 • Benoist Wolleb, Romain Silvestri, Giorgos Vernikos, Ljiljana Dolamic, Andrei Popescu-Belis
Subword tokenization is the de facto standard in neural language models and machine translation systems.