no code implementations • ICON 2020 • Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
We train these models primarily using out-of-domain data and employ simple domain adaptation techniques based on the characteristics of the in-domain dataset.
1 code implementation • 28 Apr 2024 • Saloni Mittal, Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Raviraj Joshi
We conduct a comparative analysis between monolingual and multilingual BERT models, including MahaBERT, IndicBERT, and MuRIL.
no code implementations • 28 Apr 2024 • Sharayu Hiwarkhedkar, Saloni Mittal, Vidula Magdum, Omkar Dhekane, Raviraj Joshi, Geetanjali Kale, Arnav Ladkat
Thus, it is important to select domain-specific data from this vast corpus to achieve optimum results on our domain-specific tasks.
1 code implementation • 20 Apr 2024 • Ruturaj Ghatage, Aditya Kulkarni, Rajlaxmi Patil, Sharvi Endait, Raviraj Joshi
Hence, to address this challenge, we also present a generic approach for translating SQuAD into any low-resource language.
1 code implementation • 4 Jan 2024 • Aishwarya Mirashi, Srushti Sonavane, Purva Lingayat, Tejas Padhiyar, Raviraj Joshi
This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages.
1 code implementation • 30 Dec 2023 • Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi
This work introduces the L3Cube-MahaSocialNER dataset, the first and largest social media dataset specifically designed for Named Entity Recognition (NER) in the Marathi language.
no code implementations • 3 Dec 2023 • Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi, Sachin Pande
In this work, we focus on NER for low-resource languages and present our case study in the context of the Indian language Marathi.
no code implementations • 2 Dec 2023 • Raviraj Joshi, Nikesh Garera
We further present an exhaustive evaluation of single-speaker adaptation and multi-speaker training with a Tacotron2 + WaveGlow setup to show that the former approach works better.
no code implementations • 2 Dec 2023 • Raviraj Joshi, Nikesh Garera
Using transfer learning from a high-resource language and a synthetic corpus, we present a low-cost solution to train a custom TTS model.
no code implementations • 29 Nov 2023 • Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Geetanjali Kale, Raviraj Joshi
The results of the experiments strongly undermine the robustness of sentence encoders.
1 code implementation • 5 Nov 2023 • Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Saloni Mittal, Raviraj Joshi
We present mahaNLP, an open-source natural language processing (NLP) library specifically built for the Marathi language.
no code implementations • 3 Oct 2023 • Ananya Joshi, Raviraj Joshi
In our increasingly interconnected digital world, social media platforms have emerged as powerful channels for the dissemination of hate speech and offensive content.
no code implementations • 1 Oct 2023 • Aabha Pingle, Aditya Vyawahare, Isha Joshi, Rahul Tangsali, Geetanjali Kale, Raviraj Joshi
While sentiment analysis research has been extensively conducted in English and other Western languages, there exists a significant gap in research efforts for sentiment analysis in low-resource languages.
1 code implementation • 24 Jun 2023 • Aabha Pingle, Aditya Vyawahare, Isha Joshi, Rahul Tangsali, Raviraj Joshi
The exploration of sentiment analysis in low-resource languages, such as Marathi, has been limited due to the lack of suitable datasets.
1 code implementation • 24 Jun 2023 • Tanmay Chavan, Omkar Gokhale, Aditya Kane, Shantanu Patankar, Raviraj Joshi
This is the first work that presents artifacts for code-mixed Marathi research.
no code implementations • 10 Jun 2023 • Maithili Sabane, Aparna Ranade, Onkar Litake, Parth Patil, Raviraj Joshi, Dipali Kadam
Named Entity Recognition (NER) is a fundamental task in NLP that is used to locate the key information in text and is primarily applied in conversational and search systems.
no code implementations • 8 Jun 2023 • Gauri Takawane, Abhishek Phaltankar, Varad Patwardhan, Aryan Patil, Raviraj Joshi, Mukta S. Takalikar
We propose a pipeline to improve code-mixed systems that comprises data preprocessing, word-level language identification, language augmentation, and model training on downstream tasks like sentiment analysis.
no code implementations • 25 May 2023 • Aryan Patil, Varad Patwardhan, Abhishek Phaltankar, Gauri Takawane, Raviraj Joshi
We perform a comparative analysis of different Transformer-based language models pre-trained using unsupervised approaches.
no code implementations • 22 Apr 2023 • Samruddhi Deode, Janhavi Gadre, Aditi Kajale, Ananya Joshi, Raviraj Joshi
We propose a simple yet effective approach to convert vanilla multilingual BERT models into multilingual sentence BERT models using a synthetic corpus.
no code implementations • 20 Dec 2022 • Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Raviraj Joshi
MahaTweetBERT, a BERT model pre-trained on Marathi tweets, when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), outperforms all models with an F1 score of 98.43 on the HASOC 2022 test set.
no code implementations • 12 Dec 2022 • Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi
Research on text summarization for low-resource Indian languages has been limited due to the lack of relevant datasets.
1 code implementation • 21 Nov 2022 • Ananya Joshi, Aditi Kajale, Janhavi Gadre, Samruddhi Deode, Raviraj Joshi
We evaluate these models on real text classification datasets to show that embeddings obtained from synthetic data training generalize to real datasets as well and thus represent an effective training strategy for low-resource languages.
no code implementations • 21 Nov 2022 • Raviraj Joshi
Further, since the Indic languages Hindi and Marathi share the Devanagari script, we train a single model for both languages.
1 code implementation • 9 Oct 2022 • Omkar Gokhale, Aditya Kane, Shantanu Patankar, Tanmay Chavan, Raviraj Joshi
Pre-training large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks.
no code implementations • 26 Sep 2022 • Arnav Ladkat, Aamir Miyajiwala, Samiksha Jagadale, Rekha Kulkarni, Raviraj Joshi
This step helps cover the target domain vocabulary and improves the model performance on the downstream task.
no code implementations • 12 Sep 2022 • Abhishek Velankar, Hrushikesh Patil, Raviraj Joshi
In this work, we deeply explore a wide range of challenges in automatic hate speech detection by presenting a hierarchical organization of these problems.
no code implementations • 26 Jun 2022 • Raviraj Joshi, Subodh Kumar
These models are based on Listen-Attend-Spell (LAS) encoder-decoder architecture.
Automatic Speech Recognition (ASR)
no code implementations • ECNLP (ACL) 2022 • Raviraj Joshi, Anupam Singh
The parallel data in the target domain is then used to fine-tune the final dense layer of generic ASR models.
Automatic Speech Recognition (ASR)
1 code implementation • 29 May 2022 • Raviraj Joshi
With L3Cube-MahaNLP, we aim to build resources and a library for Marathi natural language processing.
no code implementations • 19 Apr 2022 • Abhishek Velankar, Hrushikesh Patil, Raviraj Joshi
We focus on the Marathi language and evaluate the models on the datasets for hate speech detection, sentiment analysis and simple text classification in Marathi.
1 code implementation • WILDRE (LREC) 2022 • Ravindra Nayak, Raviraj Joshi
We present L3Cube-HingCorpus, the first large-scale real Hindi-English code-mixed dataset in Roman script.
1 code implementation • WILDRE (LREC) 2022 • Parth Patil, Aparna Ranade, Maithili Sabane, Onkar Litake, Raviraj Joshi
Named Entity Recognition (NER) is a basic NLP task and finds major applications in conversational and search systems.
1 code implementation • TRAC (COLING) 2022 • Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, Raviraj Joshi
In this work, we present L3Cube-MahaHate, the first major Hate Speech Dataset in Marathi.
no code implementations • 24 Mar 2022 • Onkar Litake, Maithili Sabane, Parth Patil, Aparna Ranade, Raviraj Joshi
In this work, we consider NER for low-resource Indian languages like Hindi and Marathi.
1 code implementation • WILDRE (LREC) 2022 • Raviraj Joshi
We present L3Cube-MahaCorpus, a Marathi monolingual dataset scraped from different internet sources.
no code implementations • 18 Jan 2022 • Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi
Along with the hierarchical approaches, this work also provides a comparison of different deep learning algorithms like USE, BERT, HAN, Longformer, and BigBird for long document classification.
no code implementations • 2 Jan 2022 • Aamir Miyajiwala, Arnav Ladkat, Samiksha Jagadale, Raviraj Joshi
In this work, we carry out a data-focused study evaluating the impact of systematic practical perturbations on the performance of deep learning-based text classification models such as CNN, LSTM, and BERT-based algorithms.
no code implementations • 15 Nov 2021 • Raviraj Joshi, Venkateshan Kannan
Overall, we report an improvement of 36.9% over the phoneme-CTC system.
Automatic Speech Recognition (ASR)
no code implementations • 1 Nov 2021 • Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi
We re-iterate that long document classification is a simpler task and even basic algorithms perform competitively with BERT-based approaches on most of the datasets.
no code implementations • 23 Oct 2021 • Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, Raviraj Joshi
The basic models based on CNN and LSTM are augmented with fastText word embeddings.
no code implementations • 18 Oct 2021 • Ravindra Nayak, Raviraj Joshi
We specifically focus on code-mixed English-Hindi text and transformer-based approaches.
no code implementations • 20 Jun 2021 • Prutha Gaherwar, Shraddha Joshi, Raviraj Joshi, Rahul Khengare
While encryption is the best way to ensure image security, full encryption and decryption is a computationally-intensive process.
1 code implementation • EACL (WASSA) 2021 • Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar, Raviraj Joshi
We also present the guidelines using which we annotated the tweets.
no code implementations • 30 Jan 2021 • Rutuja Taware, Shraddha Varat, Gaurav Salunke, Chaitanya Gawande, Geetanjali Kale, Rahul Khengare, Raviraj Joshi
We show that these systems are over-reliant on the important words present in the text that are useful for classification.
no code implementations • 30 Jan 2021 • Pranali Bora, Tulika Awalgaonkar, Himanshu Palve, Raviraj Joshi, Purvi Goel
We also compare our image-based hierarchical neural network model with a simple image-based CNN architecture and with text-based CNN and LSTM models to highlight its novelty and efficiency.
no code implementations • 13 Jan 2021 • Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar, Jayashree Jagdale, Raviraj Joshi
The Marathi language is one of the prominent languages used in India.
no code implementations • 11 Jan 2021 • Apurva Wani, Isha Joshi, Snehal Khandve, Vedangi Wagh, Raviraj Joshi
These platforms have led to an increase in the creation and spread of fake news.
no code implementations • 11 Jan 2021 • Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
The pre-trained Hindi fastText word embeddings by IndicNLP and Facebook are used in conjunction with CNN and LSTM models.
no code implementations • 22 Dec 2020 • Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
The shared task aims to build a translation system for Indian languages in specific domains like Artificial Intelligence (AI) and Chemistry using a small in-domain parallel corpus.
no code implementations • 23 Nov 2020 • Ramchandra Joshi, Raviraj Joshi
We evaluate different deep learning models and input representation combinations for this task.
no code implementations • 19 Jan 2020 • Ramchandra Joshi, Purvi Goel, Raviraj Joshi
The use of deep learning has revolutionized text processing techniques and achieved remarkable results.