Token Level Identification of Multiword Expressions Using Contextual Information

WS 2020 · Reyhaneh Hashempour, Aline Villavicencio

Studies on detecting idiomatic expressions mostly focus on discovering potentially idiomatic expressions while disregarding their context. However, many idioms, such as kick the bucket, can be idiomatic or literal depending on the context. In this work, we use the Context2Vec model to include contextual information. The model learns a generic context embedding function from large corpora using a bidirectional LSTM. We build a simple nearest-neighbor classifier on Context2Vec representations, which outperforms the popular average-of-word-embeddings context representation. Through a lexical substitution task, we further show that the Context2Vec model is able to place MWEs into distinct 'sense' (idiomatic/literal) regions of the embedding space, while a traditional word embedding model, i.e. Skip-Gram, lacks this ability.
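The nearest-neighbor classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the word vectors, the training contexts, and the function names (`average_embedding`, `knn_classify`, `classify_context`) are all hypothetical, and the context representation used here is the average-of-word-embeddings baseline rather than Context2Vec. Under these assumptions, swapping in a Context2Vec context vector would only change how the context is embedded; the nearest-neighbor step would stay the same.

```python
import math
from collections import Counter

# Toy 3-dimensional word vectors, invented purely for illustration.
# Real experiments would load pretrained embeddings instead.
TOY_VECTORS = {
    "died": [0.9, 0.1, 0.0],
    "old": [0.8, 0.2, 0.1],
    "hospital": [0.9, 0.0, 0.1],
    "foot": [0.0, 0.9, 0.2],
    "water": [0.1, 0.8, 0.3],
    "spilled": [0.0, 0.9, 0.1],
}


def average_embedding(context_words):
    """Baseline context representation: mean of the known word vectors."""
    vectors = [TOY_VECTORS[w] for w in context_words if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in context")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(query_vec, labeled_vecs, k=1):
    """Return the majority label among the k most similar labeled vectors."""
    ranked = sorted(labeled_vecs,
                    key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled contexts of "kick the bucket" (context words only).
TRAIN = [
    (average_embedding(["old", "died"]), "idiomatic"),
    (average_embedding(["hospital", "died"]), "idiomatic"),
    (average_embedding(["foot", "water"]), "literal"),
    (average_embedding(["spilled", "water"]), "literal"),
]


def classify_context(context_words, k=1):
    """Embed a context and label it by its nearest labeled neighbors."""
    return knn_classify(average_embedding(context_words), TRAIN, k)
```

Calling `classify_context(["hospital", "old"])` embeds the surrounding words and returns the label of the closest training context; with these toy vectors, contexts about dying fall near the idiomatic examples and contexts about spilled water fall near the literal ones.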

Tasks

Word Embeddings

Methods

BiLSTM • context2vec • LSTM • Sigmoid Activation • Tanh Activation
