Token Level Identification of Multiword Expressions Using Contextual Information

WS 2020 · Reyhaneh Hashempour, Aline Villavicencio

Studies on detecting idiomatic expressions mostly focus on discovering potentially idiomatic expressions while disregarding context. However, many idioms, such as "kick the bucket", can be idiomatic or literal depending on the context. In this work, we use the Context2Vec model to incorporate contextual information. The model learns a generic context embedding function from large corpora using a bidirectional LSTM. We build a simple nearest-neighbor classifier on top of Context2Vec, which outperforms the popular average-of-word-embeddings context representation. Through a lexical substitution task, we further show that the Context2Vec model is able to place multiword expressions (MWEs) into distinct 'sense' (idiomatic/literal) regions of the embedding space, whereas a traditional word embedding model, i.e. Skip-Gram, lacks this ability.
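
The nearest-neighbor classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 3-dimensional word vectors, the names `TOY_VECTORS`, `average_embedding`, `cosine`, and `nearest_neighbor_label`, and the two training contexts are all hypothetical and invented for illustration. The context representation shown is the average-of-word-embeddings baseline; in the setup the abstract describes, a Context2Vec context vector would presumably take the place of `average_embedding`.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "died": [0.9, 0.1, 0.0],
    "old": [0.7, 0.2, 0.1],
    "funeral": [0.8, 0.0, 0.2],
    "foot": [0.0, 0.9, 0.1],
    "water": [0.1, 0.8, 0.3],
    "spilled": [0.1, 0.7, 0.2],
}


def average_embedding(context_words):
    """Baseline context representation: mean of the known context word vectors."""
    known = [TOY_VECTORS[w] for w in context_words if w in TOY_VECTORS]
    if not known:
        raise ValueError("no known context words")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_label(query, labeled_examples):
    """Return the label of the training context most similar to the query."""
    best_vector, best_label = max(
        labeled_examples, key=lambda example: cosine(query, example[0])
    )
    return best_label


# Hypothetical labeled contexts for the expression "kick the bucket".
train = [
    (average_embedding(["old", "died"]), "idiomatic"),
    (average_embedding(["foot", "water"]), "literal"),
]

for context in (["funeral"], ["spilled"]):
    label = nearest_neighbor_label(average_embedding(context), train)
    print(context, label)
```

Swapping the context encoder while keeping the nearest-neighbor step fixed is what makes a comparison between context representations possible: only the vectors change, not the classifier.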
