no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan
Mapping user locations to countries can be useful for many applications, such as dialect identification, author profiling, and recommendation systems.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali
With Twitter being one of the most popular social media platforms in the Arab region, it is not surprising to find accounts that post adult content in Arabic tweets, despite the fact that these platforms dissuade users from such content.
1 code implementation • 29 Nov 2023 • Sabit Hassan, Malihe Alikhani
In this work, we propose a novel framework based on theories of discourse to study the inferential links that connect counter speeches to the hateful comment.
no code implementations • 31 May 2023 • Yan Wang, Heidi Ann Scharf Donovan, Sabit Hassan, Malihe Alikhani
In this paper, we present a novel dataset (MedNgage), which consists of patient-nurse conversations about cancer symptom management.
no code implementations • 26 May 2023 • Sabit Hassan, Malihe Alikhani
While active learning (AL) has shown promise in training models with a small amount of annotated data, AL's reliance on the model's behavior for selective sampling can lead to an accumulation of unwanted bias rather than bias mitigation.
1 code implementation • 19 Feb 2023 • Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani
Content moderation is the process of flagging content based on pre-defined platform rules.
1 code implementation • COLING 2022 • Katherine Atwell, Sabit Hassan, Malihe Alikhani
Then, we introduce the first discourse-aware style-transfer models that can effectively reduce offensiveness in Reddit text while preserving the meaning of the original text.
1 code implementation • Findings (ACL) 2022 • Mert İnan, Yang Zhong, Sabit Hassan, Lorna Quandt, Malihe Alikhani
To employ our strategies, we first annotate a subset of the benchmark PHOENIX-14T, a German Sign Language dataset, with different levels of intensification.
no code implementations • 18 Jan 2022 • Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury
To assess generalization capability, we evaluate our models on two external datasets: a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube and Facebook.
no code implementations • LREC 2022 • Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury, Firoj Alam
We studied the data for individual types of tweets and temporal changes in stance towards vaccine.
no code implementations • LREC 2022 • Sabit Hassan, Shaden Shaar, Kareem Darwish
Next, we show that using cross-lingual approaches with English data alone, we can achieve more than 90% and 80% relative effectiveness of the Arabic and Spanish BERT models respectively.
no code implementations • EACL 2021 • Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.
no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih
The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.
no code implementations • EACL (Louhi) 2021 • Hamdy Mubarak, Sabit Hassan
Over the past few months, there were huge numbers of circulating tweets and discussions about Coronavirus (COVID-19) in the Arab region.
no code implementations • SEMEVAL 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali
This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.
no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.
no code implementations • LREC 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Ammar Rashed, Shammur Absar Chowdhury
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection.
no code implementations • LREC 2020 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali
In this paper, we introduce a generic method for collecting parallel tweets.
no code implementations • WS 2019 • Houda Bouamor, Sabit Hassan, Nizar Habash
In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification.