Search Results for author: John P. McCrae

Found 49 papers, 11 papers with code

Mapping WordNet Instances to Wikipedia

no code implementations • GWC 2018 • John P. McCrae

Lexical resources differ from encyclopaedic resources and represent two distinct types of resource covering general language and named entities respectively.

Towards Classification of Legal Pharmaceutical Text using GAN-BERT

no code implementations • CSRNLP (LREC) 2022 • Tapan Auti, Rajdeep Sarkar, Bernardo Stearns, Atul Kr. Ojha, Arindam Paul, Michaela Comerford, Jay Megaro, John Mariano, Vall Herard, John P. McCrae

Pharmaceutical text classification is an important area of research for commercial and research institutions working in the pharmaceutical domain.

Sentence, Sentence Classification

Towards the Construction of a WordNet for Old English

no code implementations • LREC 2022 • Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Diaz Vera, John P. McCrae, Ciara O’Loughlin, William Michael Short, Sander Stolk

In this paper we discuss our preliminary work towards the construction of a WordNet for Old English, taking our inspiration from other similar wordnet construction projects for ancient languages such as Ancient Greek, Latin and Sanskrit.

MHE: Code-Mixed Corpora for Similar Language Identification

no code implementations • LREC 2022 • Priya Rani, John P. McCrae, Theodorus Fransen

This dataset is the first Magahi-Hindi-English code-mixed dataset for the similar language identification task.

Language Identification, Sentence

NUIG-Panlingua-KMI Hindi-Marathi MT Systems for Similar Language Translation Task @ WMT 2020

no code implementations • WMT (EMNLP) 2020 • Atul Kr. Ojha, Priya Rani, Akanksha Bansal, Bharathi Raja Chakravarthi, Ritesh Kumar, John P. McCrae

The NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state of the art in the Similar Language Translation Task for the Hindi↔Marathi language pair.

NMT, Translation

Towards a Crowd-Sourced WordNet for Colloquial English

no code implementations • GWC 2018 • John P. McCrae, Ian Wood, Amanda Hicks

Princeton WordNet is one of the most widely used resources for natural language processing, but it is updated only infrequently and cannot keep up with the fast-changing usage of the English language on social media platforms such as Twitter.

Improving Wordnets for Under-Resourced Languages Using Machine Translation

no code implementations • GWC 2018 • Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae

In addition to that, we carried out a manual evaluation of the translations for the Tamil language, where we demonstrate that our approach can aid in improving wordnet resources for under-resourced Dravidian languages.

Machine Translation, Translation

Cross-lingual Sentence Embedding using Multi-Task Learning

no code implementations • EMNLP 2021 • Koustava Goswami, Sourav Dutta, Haytham Assem, Theodorus Fransen, John P. McCrae

We demonstrate the efficacy of an unsupervised as well as a weakly supervised variant of our framework on STS, BUCC and Tatoeba benchmark tasks.

Multi-Task Learning, Semantic Similarity

Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector

no code implementations • EMNLP (NLLP) 2021 • Rajdeep Sarkar, Atul Kr. Ojha, Jay Megaro, John Mariano, Vall Herard, John P. McCrae

This method allows predictive coding methods to be rapidly developed for new regulations and markets.

Text Classification

CogALex-VI Shared Task: Bidirectional Transformer based Identification of Semantic Relations

no code implementations • COLING (CogALex) 2020 • Saurav Karmakar, John P. McCrae

This paper presents a bidirectional transformer-based approach for recognising semantic relationships between pairs of words, as posed by the CogALex-VI shared task in 2020.

The GlobalWordNet Formats: Updates for 2020

1 code implementation • EACL (GWC) 2021 • John P. McCrae, Michael Wayne Goodman, Francis Bond, Alexandre Rademaker, Ewa Rudnicka, Luis Morgado Da Costa

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid.

Linghub2: Language Resource Discovery Tool for Language Technologies

no code implementations • LREC 2022 • Cécile Robin, Gautham Vadakkekara Suresh, Víctor Rodriguez-Doncel, John P. McCrae, Paul Buitelaar

Language resources are a key component of natural language processing and related research and applications.

Management

Bengali and Magahi PUD Treebank and Parser

no code implementations • WILDRE (LREC) 2022 • Pritha Majumdar, Deepak Alok, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae

A preliminary set of sentences was annotated manually: 600 for Bengali and 200 for Magahi.

Bilingual Lexicon Induction across Orthographically-distinct Under-Resourced Dravidian Languages

no code implementations • VarDial (COLING) 2020 • Bharathi Raja Chakravarthi, Navaneethan Rajasekaran, Mihael Arcan, Kevin McGuinness, Noel E. O’Connor, John P. McCrae

Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to inducing them leverage pretrained monolingual word embeddings using supervised or semi-supervised methods.

Bilingual Lexicon Induction, Word Embeddings

English WordNet 2019 – An Open-Source WordNet for English

1 code implementation • GWC 2019 • John P. McCrae, Alexandre Rademaker, Francis Bond, Ewa Rudnicka, Christiane Fellbaum

We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model.

ULD-NUIG at Social Media Mining for Health Applications (#SMM4H) Shared Task 2021

no code implementations • NAACL (SMM4H) 2021 • Atul Kr. Ojha, Priya Rani, Koustava Goswami, Bharathi Raja Chakravarthi, John P. McCrae

Social media platforms such as Twitter and Facebook have been used in a range of research studies, from cohort-level discussions to community-driven approaches, to address the challenges of using social media data for health, clinical and biomedical information.

Named Entity Recognition

Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada

no code implementations • EACL (DravidianLangTech) 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Navya Jose, Anand Kumar M, Thomas Mandl, Prasanna Kumar Kumaresan, Rahul Ponnusamy, Hariharan R L, John P. McCrae, Elizabeth Sherly

Detecting offensive language in social media in local languages is critical for moderating user-generated content.

Benchmarking, Language Identification

Findings of the Shared Task on Machine Translation in Dravidian languages

no code implementations • EACL (DravidianLangTech) 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Shubhanker Banerjee, Richard Saldanha, John P. McCrae, Anand Kumar M, Parameswari Krishnamurthy, Melvin Johnson

This paper describes the datasets used, the methodology used for the evaluation of participants, and the experiments’ overall results.

Machine Translation, Translation

Towards a Linking between WordNet and Wikidata

no code implementations • EACL (GWC) 2021 • John P. McCrae, David Cillessen

WordNet is the most widely used lexical resource for English, while Wikidata is one of the largest knowledge graphs of entities and concepts available.

Knowledge Graphs

Monolingual Word Sense Alignment as a Classification Problem

no code implementations • EACL (GWC) 2021 • Sina Ahmadi, John P. McCrae

Different resources define words in various ways based on their meanings.

Classification, Relationship Detection

MaCmS: Magahi Code-mixed Dataset for Sentiment Analysis

no code implementations • 7 Mar 2024 • Priya Rani, Gaurav Negi, Theodorus Fransen, John P. McCrae

The present paper introduces new sentiment data, MaCMS, for Magahi-Hindi-English (MHE) code-mixed language, where Magahi is a less-resourced minority language.

Sentiment Analysis

Text Detoxification as Style Transfer in English and Hindi

no code implementations • 12 Feb 2024 • Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved.

Multi-Task Learning, Sentence

Weakly-supervised Deep Cognate Detection Framework for Low-Resourced Languages Using Morphological Knowledge of Closely-Related Languages

1 code implementation • 9 Nov 2023 • Koustava Goswami, Priya Rani, Theodorus Fransen, John P. McCrae

We train an encoder to gain morphological knowledge of a language and transfer the knowledge to perform unsupervised and weakly-supervised cognate detection tasks with and without the pivot language for the closely-related languages.

Information Retrieval, Named Entity Recognition

Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning

1 code implementation • 11 Jul 2023 • Ghanshyam Verma, Shovon Sengupta, Simon Simanta, Huan Chen, Janos A. Perge, Devishree Pillai, John P. McCrae, Paul Buitelaar

Personalized recommendations are of growing importance in direct marketing, which motivates research into enhancing customer experiences through knowledge graph (KG) applications.

Decision Making, Knowledge Graphs

Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text

no code implementations • 18 Nov 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Sajeetha Thavareesan, Dhivya Chinnappa, Durairaj Thenmozhi, Elizabeth Sherly, John P. McCrae, Adeep Hande, Rahul Ponnusamy, Shubhanker Banerjee, Charangan Vasantharajan

We received 22 systems for Tamil-English, 15 systems for Malayalam-English, and 15 for Kannada-English.

Sentiment Analysis

DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text

1 code implementation • 17 Jun 2021 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Vigneshwaran Muralidaran, Navya Jose, Shardul Suryawanshi, Elizabeth Sherly, John P. McCrae

This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments.

Language Identification, Sentiment Analysis

DravidianMultiModality: A Dataset for Multi-modal Sentiment Analysis in Tamil and Malayalam

no code implementations • 9 Jun 2021 • Bharathi Raja Chakravarthi, Jishnu Parameswaran P. K, Premjith B, K. P Soman, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kingston Pal Thamburaj, John P. McCrae

This is the first multimodal sentiment analysis dataset for Tamil and Malayalam annotated by volunteer annotators.

Multimodal Sentiment Analysis

Unsupervised Deep Language and Dialect Identification for Short Texts

no code implementations • COLING 2020 • Koustava Goswami, Rajdeep Sarkar, Bharathi Raja Chakravarthi, Theodorus Fransen, John P. McCrae

Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects is one of the primary steps in many natural language processing pipelines.

Dialect Identification, Sentence

Suggest me a movie for tonight: Leveraging Knowledge Graphs for Conversational Recommendation

1 code implementation • COLING 2020 • Rajdeep Sarkar, Koustava Goswami, Mihael Arcan, John P. McCrae

Conversational recommender systems focus on the task of suggesting products to users based on the conversation flow.

Knowledge Graphs, Recommendation Systems

Contextual Modulation for Relation-Level Metaphor Identification

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Omnia Zayed, John P. McCrae, Paul Buitelaar

Identifying metaphors in text is very challenging and requires comprehending the underlying comparison.

Relation, Visual Reasoning

A Survey of Orthographic Information in Machine Translation

no code implementations • 4 Aug 2020 • Bharathi Raja Chakravarthi, Priya Rani, Mihael Arcan, John P. McCrae

It introduces under-resourced languages in terms of machine translation and how orthographic information can be utilised to improve machine translation.

Bilingual Lexicon Induction, Translation

ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention Model for Sentiment Analysis in Code-Mixed Text

no code implementations • SEMEVAL 2020 • Koustava Goswami, Priya Rani, Bharathi Raja Chakravarthi, Theodorus Fransen, John P. McCrae

Code mixing is a common phenomenon in multilingual societies where people switch from one language to another for various reasons.

Sentiment Analysis

Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text

1 code implementation • LREC 2020 • Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran, Ruba Priyadharshini, John P. McCrae

One such application is to analyse the popular sentiments of videos on social media based on viewer comments.

Decision Making, Sentiment Analysis

A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

1 code implementation • LREC 2020 • Bharathi Raja Chakravarthi, Navya Jose, Shardul Suryawanshi, Elizabeth Sherly, John P. McCrae

However, very few resources are available for code-mixed data to create models specific to this data.

Sentiment Analysis

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

1 code implementation • LREC 2020 • Georg Rehm, Dimitrios Galanis, Penny Labropoulou, Stelios Piperidis, Martin Welß, Ricardo Usbeck, Joachim Köhler, Miltos Deligiannis, Katerina Gkirtzou, Johannes Fischer, Christian Chiarcos, Nils Feldhus, Julián Moreno-Schneider, Florian Kintzel, Elena Montiel, Víctor Rodríguez Doncel, John P. McCrae, David Laqua, Irina Patricia Theile, Christian Dittmar, Kalina Bontcheva, Ian Roberts, Andrejs Vasiljevs, Andis Lagzdiņš

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows.

Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network

1 code implementation • 11 Apr 2020 • Md. Rezaul Karim, Bharathi Raja Chakravarthi, John P. McCrae, Michael Cochez

Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe, yield F1-scores of up to 92.30%, 82.25%, and 90.45% for document classification, sentiment analysis, and hate speech detection, respectively, during 5-fold cross-validation tests.

Classification, Document Classification

Multilingual Multimodal Machine Translation for Dravidian Languages utilizing Phonetic Transcription

no code implementations • WS 2019 • Bharathi Raja Chakravarthi, Ruba Priyadharshini, Bernardo Stearns, Arun Jayapal, Sridevy S, Mihael Arcan, Manel Zarrouk, John P. McCrae

Multimodal Machine Translation, Translation

A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles

no code implementations • WS 2019 • Adrian Doyle, John P. McCrae, Clodagh Downey

Adapting Term Recognition to an Under-Resourced Language: the Case of Irish

no code implementations • WS 2019 • John P. McCrae, Adrian Doyle

WordNet Gloss Translation for Under-resourced Languages using Multilingual Neural Machine Translation

no code implementations • WS 2019 • Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae

Machine Translation, Translation

Temporal Analysis of Entity Relatedness and its Evolution using Wikipedia and DBpedia

no code implementations • 12 Dec 2018 • Narumol Prangnawarat, John P. McCrae, Conor Hayes

We then show that integrating multiple time frames in our methods can give a better overall similarity demonstrating that temporal evolution can have an important effect on entity relatedness.

Constructing an Annotated Corpus of Verbal MWEs for English

no code implementations • COLING 2018 • Abigail Walsh, Claire Bonial, Kristina Geeraert, John P. McCrae, Nathan Schneider, Clarissa Somers

This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs.

Word Alignment