Search Results for author: Sabit Hassan

Found 20 papers, 4 papers with code

QADI: Arabic Dialect Identification in the Wild

no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

For extrinsic evaluation, we are able to build effective country level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.

Dialect Identification

UL2C: Mapping User Locations to Countries on Arabic Twitter

no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan

Mapping user locations to countries can be useful for many applications such as dialect identification, author profiling, recommendation system, etc.

Dialect Identification

Adult Content Detection on Arabic Twitter: Analysis and Experiments

no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali

With Twitter being one of the most popular social media platforms in the Arab region, it is not surprising to find accounts that post adult content in Arabic tweets, despite the fact that these platforms dissuade users from such content.

DisCGen: A Framework for Discourse-Informed Counterspeech Generation

1 code implementation • 29 Nov 2023 • Sabit Hassan, Malihe Alikhani

In this work, we propose a novel framework based on theories of discourse to study the inferential links that connect counter speeches to the hateful comment.

MedNgage: A Dataset for Understanding Engagement in Patient-Nurse Conversations

no code implementations • 31 May 2023 • Yan Wang, Heidi Ann Scharf Donovan, Sabit Hassan, Malihe Alikhani

In this paper, we present a novel dataset (MedNgage), which consists of patient-nurse conversations about cancer symptom management.

Management

D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias

no code implementations • 26 May 2023 • Sabit Hassan, Malihe Alikhani

While active learning (AL) has shown promise in training models with a small amount of annotated data, AL's reliance on the model's behavior for selective sampling can lead to an accumulation of unwanted bias rather than bias mitigation.

Active Learning, Clustering, +2

Multilingual Content Moderation: A Case Study on Reddit

1 code implementation • 19 Feb 2023 • Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani

Content moderation is the process of flagging content based on pre-defined platform rules.

Cross-Lingual Transfer, Transfer Learning

APPDIA: A Discourse-aware Transformer-based Style Transfer Model for Offensive Social Media Conversations

1 code implementation • COLING 2022 • Katherine Atwell, Sabit Hassan, Malihe Alikhani

Then, we introduce the first discourse-aware style-transfer models that can effectively reduce offensiveness in Reddit text while preserving the meaning of the original text.

Style Transfer

Modeling Intensification for Sign Language Generation: A Computational Approach

1 code implementation • Findings (ACL) 2022 • Mert İnan, Yang Zhong, Sabit Hassan, Lorna Quandt, Malihe Alikhani

To employ our strategies, we first annotate a subset of the benchmark PHOENIX-14T, a German Sign Language dataset, with different levels of intensification.

Text Generation

Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech

no code implementations • 18 Jan 2022 • Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury

We evaluate our models on external datasets - a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube and Facebook, for assessing generalization capability.

Cultural Vocal Bursts Intensity Prediction

ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination

no code implementations • LREC 2022 • Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury, Firoj Alam

We studied the data for individual types of tweets and temporal changes in stance towards vaccine.

Informativeness, Stance Detection

Cross-lingual Emotion Detection

no code implementations • LREC 2022 • Sabit Hassan, Shaden Shaar, Kareem Darwish

Next, we show that using cross-lingual approaches with English data alone, we can achieve more than 90% and 80% relative effectiveness of the Arabic and Spanish BERT models respectively.

ASAD: Arabic Social media Analytics and unDerstanding

no code implementations • EACL 2021 • Sabit Hassan, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish

This system demonstration paper describes ASAD: Arabic Social media Analysis and unDerstanding, a suite of seven individual modules that allows users to determine dialects, sentiment, news category, offensiveness, hate speech, adult content, and spam in Arabic tweets.

Pre-Training BERT on Arabic Tweets: Practical Considerations

no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih

The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.

ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus (COVID-19) Pandemic

no code implementations • EACL (Louhi) 2021 • Hamdy Mubarak, Sabit Hassan

Over the past few months, there were huge numbers of circulating tweets and discussions about Coronavirus (COVID-19) in the Arab region.

Misinformation

ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media

no code implementations • SEMEVAL 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali

This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.

Language Identification

Arabic Dialect Identification in the Wild

no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.

Dialect Identification