Learning to Disentangle Textual Representations and Attributes via Mutual Information

1 Jan 2021 · Pierre Colombo, Chloé Clavel, Pablo Piantanida

Learning disentangled representations of textual data is essential for many natural language tasks such as fair classification (e.g., building classifiers whose decisions cannot disproportionately hurt or benefit specific groups identified by sensitive attributes), style transfer, and sentence generation, among others. The dominant approaches in the context of text data train an adversary (a discriminator or teacher) that aims to make attribute values difficult to infer from the latent code. Although these approaches are remarkably simple, and although the adversary appears to perform perfectly during training, a fair amount of information for inferring the sensitive attribute remains in the latent code once training is complete. This paper investigates learning to disentangle representations by minimizing a novel variational (upper) bound on the mutual information between an identified attribute and the latent code of a deep neural network encoder. We demonstrate that our surrogate leads to better disentangled representations on both fair classification and sentence generation tasks, while not suffering from the degeneracy of adversarial losses in multi-class settings. Furthermore, by optimizing the trade-off between the level of disentanglement and the quality of the generated sentences for polarity transfer and sentence generation tasks, we shed some light on the well-known debate on whether "disentangled representations may be helpful for polarity transfer and sentence generation purposes".
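To make the idea concrete, below is a minimal sketch of how a variational upper bound on the mutual information I(A; Z) between an attribute A and a latent code Z can be estimated and minimized as a disentanglement penalty. It uses a CLUB-style bound (Cheng et al., 2020) with an auxiliary classifier q(a|z); this is an illustrative stand-in, not necessarily the exact bound proposed in the paper, and all module and variable names here are hypothetical.

```python
# Illustrative sketch only: a CLUB-style variational upper bound on I(A; Z),
# minimized as a disentanglement penalty. The paper's exact bound may differ;
# the encoder, dimensions, and names below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIUpperBound(nn.Module):
    """Estimates an upper bound on I(A; Z) via a variational classifier q(a|z)."""
    def __init__(self, latent_dim: int, num_attributes: int, hidden: int = 128):
        super().__init__()
        self.q_a_given_z = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_attributes),
        )

    def log_q(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # Per-sample log q(a|z): negative cross-entropy of the classifier.
        return -F.cross_entropy(self.q_a_given_z(z), a, reduction="none")

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # Positive term: log q(a|z) on true (z, a) pairs, i.e. samples of p(z, a).
        positive = self.log_q(z, a)
        # Negative term: attributes shuffled across the batch, approximating
        # samples from the product of marginals p(z)p(a).
        a_shuffled = a[torch.randperm(a.size(0))]
        negative = self.log_q(z, a_shuffled)
        # E[log q(a|z)] - E[log q(a'|z)] upper-bounds I(A; Z) when q fits p(a|z).
        return (positive - negative).mean()

# Usage: alternate (i) fitting q(a|z) by maximum likelihood on (z, a) pairs, and
# (ii) minimizing task_loss + lambda * mi_bound(z, a) w.r.t. the encoder.
if __name__ == "__main__":
    mi_bound = MIUpperBound(latent_dim=64, num_attributes=2)
    z = torch.randn(32, 64)          # latent codes from a (frozen) text encoder
    a = torch.randint(0, 2, (32,))   # sensitive attribute labels
    print("I(A; Z) upper-bound estimate:", mi_bound(z, a).item())
```

Unlike an adversarial discriminator, which is trained against the encoder in a min-max game, this surrogate is minimized directly, which is one reason bound-based objectives can avoid the multi-class degeneracy of adversarial losses mentioned in the abstract.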
