no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.
1 code implementation • IWCS (ACL) 2021 • Esther Seyffarth, Younes Samih, Laura Kallmeyer, Hassan Sajjad
This paper addresses the question of the extent to which neural contextual language models such as BERT implicitly represent complex semantic properties.
no code implementations • 26 Apr 2024 • Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash
The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources.
no code implementations • 13 Nov 2023 • David Arps, Laura Kallmeyer, Younes Samih, Hassan Sajjad
We replicate the findings of Müller-Eberstein et al. (2022) on nonce test data and show that the performance declines on both MLMs and ALMs wrt.
1 code implementation • 13 Apr 2022 • David Arps, Younes Samih, Laura Kallmeyer, Hassan Sajjad
We find that 4 pretrained transformer LMs obtain high performance on our probing tasks even on manipulated data, suggesting that semantic and syntactic knowledge in their representations can be separated and that constituency information is in fact learned by the LM.
no code implementations • 18 Nov 2021 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih
Rampant use of offensive language on social media led to recent efforts on automatic identification of such language.
no code implementations • EACL 2021 • Younes Samih, Kareem Darwish
We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.
no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih
The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.
no code implementations • SEMEVAL 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali
This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.
no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish
We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.
no code implementations • LREC 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Ammar Rashed, Shammur Absar Chowdhury
In this paper, we describe our efforts at OSACT Shared Task on Offensive Language Detection.
no code implementations • 7 Apr 2020 • Younes Samih, Kareem Darwish
We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.
no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.
no code implementations • IJCNLP 2019 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad
Short vowels, aka diacritics, are more often omitted when writing different varieties of Arabic including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).
no code implementations • WS 2019 • Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.
no code implementations • WS 2019 • Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish
This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).
no code implementations • NAACL 2019 • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish
Arabic text is typically written without short vowels (or diacritics).
no code implementations • 15 Oct 2018 • Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak
The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted.
no code implementations • COLING 2018 • Rafael Ehren, Timm Lichte, Younes Samih
We submitted results for seven languages in the closed track of the task and for one language in the open track.
no code implementations • WS 2018 • Mohammed Attia, Younes Samih, Wolfgang Maier
This paper describes our system submission to the CALCS 2018 shared task on named entity recognition on code-switched data for the language variant pair of Modern Standard Arabic and Egyptian dialectal Arabic.
no code implementations • ACL 2018 • Tatiana Bladier, Andreas van Cranenburgh, Younes Samih, Laura Kallmeyer
We present ongoing work on data-driven parsing of German and French with Lexicalized Tree Adjoining Grammars.
no code implementations • SEMEVAL 2018 • Mohammed Attia, Younes Samih, Manaal Faruqui, Wolfgang Maier
This paper describes our system submission to the SemEval 2018 Task 10 on Capturing Discriminative Attributes.
2 code implementations • 19 Aug 2017 • Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer
Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.
no code implementations • CONLL 2017 • Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer
Arabic dialects do not just share a common koiné, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.
no code implementations • WS 2017 • Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish
The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.
no code implementations • WS 2016 • Mohammed Attia, Suraj Maharjan, Younes Samih, Laura Kallmeyer, Thamar Solorio
The evaluation result of our system on the test set is an F-measure of 88.1% (79.0% for TRUE only) for Task-1 on detecting semantic similarity, and 76.0% (42.3% when excluding RANDOM) for Task-2 on identifying finer-grained semantic relations.
no code implementations • LREC 2016 • Younes Samih, Wolfgang Maier
In this paper, we describe our effort in the development and annotation of a large scale corpus containing code-switched data.
no code implementations • JEPTALNRECITAL 2015 • Simon Petitjean, Younes Samih, Timm Lichte
In this paper, we present a model of Arabic derivational morphology using the metagrammatical framework provided by XMG.
no code implementations • LREC 2012 • Khaled Shaalan, Mohammed Attia, Pavel Pecina, Younes Samih, Josef van Genabith
Furthermore, from a large list of valid forms and invalid forms we create a character-based tri-gram language model to approximate knowledge about permissible character clusters in Arabic, creating a novel method for detecting spelling errors.