Search Results for author: Raviraj Joshi

Found 51 papers, 15 papers with code

Domain Adaptation of NMT models for English-Hindi Machine Translation Task : AdapMT Shared Task ICON 2020

no code implementations • ICON 2020 • Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi

We train these models primarily using the out of domain data and employ simple domain adaptation techniques based on the characteristics of the in-domain dataset.

Domain Adaptation Machine Translation +2

L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi

1 code implementation • 28 Apr 2024 • Saloni Mittal, Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Raviraj Joshi

We conduct a comparative analysis between monolingual and multilingual BERT models, including MahaBERT, IndicBERT, and MuRIL.

Document Classification text-classification +1

TextGram: Towards a better domain-adaptive pretraining

no code implementations • 28 Apr 2024 • Sharayu Hiwarkhedkar, Saloni Mittal, Vidula Magdum, Omkar Dhekane, Raviraj Joshi, Geetanjali Kale, Arnav Ladkat

Thus, it is important that we select the correct data in the form of domain-specific data from this vast corpus to achieve optimum results aligned with our domain-specific tasks.

text-classification Text Classification

MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering

1 code implementation • 20 Apr 2024 • Ruturaj Ghatage, Aditya Kulkarni, Rajlaxmi Patil, Sharvi Endait, Raviraj Joshi

Hence, to address this challenge, we also present a generic approach for translating SQuAD into any low-resource language.

Information Retrieval Question Answering +1

L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

1 code implementation • 4 Jan 2024 • Aishwarya Mirashi, Srushti Sonavane, Purva Lingayat, Tejas Padhiyar, Raviraj Joshi

This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages.

Document Classification Multilingual text classification +3

L3Cube-MahaSocialNER: A Social Media based Marathi NER Dataset and BERT models

1 code implementation • 30 Dec 2023 • Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi

This work introduces the L3Cube-MahaSocialNER dataset, the first and largest social media dataset specifically designed for Named Entity Recognition (NER) in the Marathi language.

Marketing named-entity-recognition +2

On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi

no code implementations • 3 Dec 2023 • Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi, Sachin Pande

In this work, we focus on NER for low-resource language and present our case study in the context of the Indian language Marathi.

Computational Efficiency Machine Translation +4

Code-Mixed Text to Speech Synthesis under Low-Resource Constraints

no code implementations • 2 Dec 2023 • Raviraj Joshi, Nikesh Garera

We further present an exhaustive evaluation of single-speaker adaptation and multi-speaker training with Tacotron2 + Waveglow setup to show that the former approach works better.

Speech Synthesis Text-To-Speech Synthesis +2

Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning

no code implementations • 2 Dec 2023 • Raviraj Joshi, Nikesh Garera

Using transfer learning from high-resource language and synthetic corpus we present a low-cost solution to train a custom TTS model.

Decoder Transfer Learning

SenTest: Evaluating Robustness of Sentence Encoders

no code implementations • 29 Nov 2023 • Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Geetanjali Kale, Raviraj Joshi

The results of the experiments strongly undermine the robustness of sentence encoders.

Contrastive Learning Information Retrieval +2

mahaNLP: A Marathi Natural Language Processing Library

1 code implementation • 5 Nov 2023 • Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Saloni Mittal, Raviraj Joshi

We present mahaNLP, an open-source natural language processing (NLP) library specifically built for the Marathi language.

Hate Speech Detection NER +3

Harnessing Pre-Trained Sentence Transformers for Offensive Language Detection in Indian Languages

no code implementations • 3 Oct 2023 • Ananya Joshi, Raviraj Joshi

In our increasingly interconnected digital world, social media platforms have emerged as powerful channels for the dissemination of hate speech and offensive content.

Hate Speech Detection Sentence +2

Robust Sentiment Analysis for Low Resource languages Using Data Augmentation Approaches: A Case Study in Marathi

no code implementations • 1 Oct 2023 • Aabha Pingle, Aditya Vyawahare, Isha Joshi, Rahul Tangsali, Geetanjali Kale, Raviraj Joshi

While sentiment analysis research has been extensively conducted in English and other Western languages, there exists a significant gap in research efforts for sentiment analysis in low-resource languages.

Data Augmentation Pseudo Label +3

L3Cube-MahaSent-MD: A Multi-domain Marathi Sentiment Analysis Dataset and Transformer Models

1 code implementation • 24 Jun 2023 • Aabha Pingle, Aditya Vyawahare, Isha Joshi, Rahul Tangsali, Raviraj Joshi

The exploration of sentiment analysis in low-resource languages, such as Marathi, has been limited due to the availability of suitable datasets.

Sentiment Analysis

My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks

1 code implementation • 24 Jun 2023 • Tanmay Chavan, Omkar Gokhale, Aditya Kane, Shantanu Patankar, Raviraj Joshi

This is the first work that presents artifacts for code-mixed Marathi research.

Benchmarking Hate Speech Detection +2

Enhancing Low Resource NER Using Assisting Language And Transfer Learning

no code implementations • 10 Jun 2023 • Maithili Sabane, Aparna Ranade, Onkar Litake, Parth Patil, Raviraj Joshi, Dipali Kadam

Named Entity Recognition (NER) is a fundamental task in NLP that is used to locate the key information in text and is primarily applied in conversational and search systems.

named-entity-recognition Named Entity Recognition +4

Leveraging Language Identification to Enhance Code-Mixed Text Classification

no code implementations • 8 Jun 2023 • Gauri Takawane, Abhishek Phaltankar, Varad Patwardhan, Aryan Patil, Raviraj Joshi, Mukta S. Takalikar

We propose a pipeline to improve code-mixed systems that comprise data preprocessing, word-level language identification, language augmentation, and model training on downstream tasks like sentiment analysis.

Hate Speech Detection Language Identification +4

Comparative Study of Pre-Trained BERT Models for Code-Mixed Hindi-English Data

no code implementations • 25 May 2023 • Aryan Patil, Varad Patwardhan, Abhishek Phaltankar, Gauri Takawane, Raviraj Joshi

We perform a comparative analysis of different Transformer-based language Models pre-trained using unsupervised approaches.

Emotion Recognition Sentiment Analysis

L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT

no code implementations • 22 Apr 2023 • Samruddhi Deode, Janhavi Gadre, Aditi Kajale, Ananya Joshi, Raviraj Joshi

We propose a simple yet effective approach to convert vanilla multilingual BERT models into multilingual sentence BERT models using synthetic corpus.

Sentence Sentence Similarity +1

A Twitter BERT Approach for Offensive Language Detection in Marathi

no code implementations • 20 Dec 2022 • Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Raviraj Joshi

The MahaTweetBERT, a BERT model, pre-trained on Marathi tweets when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), outperforms all models with an F1 score of 98.43 on the HASOC 2022 test set.

Data Augmentation Language Identification +2

Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

no code implementations • 12 Dec 2022 • Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi

The research on text summarization for low-resource Indian languages has been limited due to the availability of relevant datasets.

Text Summarization

L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi

1 code implementation • 21 Nov 2022 • Ananya Joshi, Aditi Kajale, Janhavi Gadre, Samruddhi Deode, Raviraj Joshi

We evaluate these models on real text classification datasets to show embeddings obtained from synthetic data training are generalizable to real datasets as well and thus represent an effective training strategy for low-resource languages.

Benchmarking Machine Translation +7

L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages

no code implementations • 21 Nov 2022 • Raviraj Joshi

Further, since Indic languages, Hindi and Marathi share the Devanagari script, we train a single model for both languages.

named-entity-recognition Named Entity Recognition +4

Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection

1 code implementation • 9 Oct 2022 • Omkar Gokhale, Aditya Kane, Shantanu Patankar, Tanmay Chavan, Raviraj Joshi

Pre-training large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks.

Hate Speech Detection

Towards Simple and Efficient Task-Adaptive Pre-training for Text Classification

no code implementations • 26 Sep 2022 • Arnav Ladkat, Aamir Miyajiwala, Samiksha Jagadale, Rekha Kulkarni, Raviraj Joshi

This step helps cover the target domain vocabulary and improves the model performance on the downstream task.

Domain Adaptation text-classification +1

A Review of Challenges in Machine Learning based Automated Hate Speech Detection

no code implementations • 12 Sep 2022 • Abhishek Velankar, Hrushikesh Patil, Raviraj Joshi

In this work, we deeply explore a wide range of challenges in automatic hate speech detection by presenting a hierarchical organization of these problems.

Hate Speech Detection

On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode

no code implementations • 26 Jun 2022 • Raviraj Joshi, Subodh Kumar

These models are based on Listen-Attend-Spell (LAS) encoder-decoder architecture.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data

no code implementations • ECNLP (ACL) 2022 • Raviraj Joshi, Anupam Singh

The parallel data in the target domain is then used to fine-tune the final dense layer of generic ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

L3Cube-MahaNLP: Marathi Natural Language Processing Datasets, Models, and Library

1 code implementation • 29 May 2022 • Raviraj Joshi

With L3Cube-MahaNLP, we aim to build resources and a library for Marathi natural language processing.

Hate Speech Detection Language Modelling +4

Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi

no code implementations • 19 Apr 2022 • Abhishek Velankar, Hrushikesh Patil, Raviraj Joshi

We focus on the Marathi language and evaluate the models on the datasets for hate speech detection, sentiment analysis and simple text classification in Marathi.

Hate Speech Detection Sentence +5

L3Cube-HingCorpus and HingBERT: A Code Mixed Hindi-English Dataset and BERT Language Models

1 code implementation • WILDRE (LREC) 2022 • Ravindra Nayak, Raviraj Joshi

We present L3Cube-HingCorpus, the first large-scale real Hindi-English code mixed data in a Roman script.

Language Identification Language Modelling +5

L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models

1 code implementation • WILDRE (LREC) 2022 • Parth Patil, Aparna Ranade, Maithili Sabane, Onkar Litake, Raviraj Joshi

Named Entity Recognition (NER) is a basic NLP task and finds major applications in conversational and search systems.

named-entity-recognition Named Entity Recognition +4

L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models

1 code implementation • TRAC (COLING) 2022 • Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, Raviraj Joshi

In this work, we present L3Cube-MahaHate, the first major Hate Speech Dataset in Marathi.

Hate Speech Detection

Mono vs Multilingual BERT: A Case Study in Hindi and Marathi Named Entity Recognition

no code implementations • 24 Mar 2022 • Onkar Litake, Maithili Sabane, Parth Patil, Aparna Ranade, Raviraj Joshi

In this work, we consider NER for low-resource Indian languages like Hindi and Marathi.

named-entity-recognition Named Entity Recognition +1

L3Cube-MahaCorpus and MahaBERT: Marathi Monolingual Corpus, Marathi BERT Language Models, and Resources

1 code implementation • WILDRE (LREC) 2022 • Raviraj Joshi

We present L3Cube-MahaCorpus a Marathi monolingual data set scraped from different internet sources.

named-entity-recognition Named Entity Recognition +5

Hierarchical Neural Network Approaches for Long Document Classification

no code implementations • 18 Jan 2022 • Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi

Along with the hierarchical approaches, this work also provides a comparison of different deep learning algorithms like USE, BERT, HAN, Longformer, and BigBird for long document classification.

Document Classification Sentence +2

On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations

no code implementations • 2 Jan 2022 • Aamir Miyajiwala, Arnav Ladkat, Samiksha Jagadale, Raviraj Joshi

In this work, we carry out a data-focused study evaluating the impact of systematic practical perturbations on the performance of the deep learning based text classification models like CNN, LSTM, and BERT-based algorithms.

text-classification Text Classification

Attention based end to end Speech Recognition for Voice Search in Hindi and English

no code implementations • 15 Nov 2021 • Raviraj Joshi, Venkateshan Kannan

Overall, we report an improvement of 36.9% over the phoneme-CTC system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Comparative Study of Long Document Classification

no code implementations • 1 Nov 2021 • Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi

We re-iterate that long document classification is a simpler task and even basic algorithms perform competitively with BERT-based approaches on most of the datasets.

BIG-bench Machine Learning Document Classification +1

Hate and Offensive Speech Detection in Hindi and Marathi

no code implementations • 23 Oct 2021 • Abhishek Velankar, Hrushikesh Patil, Amol Gore, Shubham Salunke, Raviraj Joshi

The basic models based on CNN and LSTM are augmented with fast text word embeddings.

Sentiment Analysis text-classification +2

Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches

no code implementations • 18 Oct 2021 • Ravindra Nayak, Raviraj Joshi

We specifically focus on code mixed English-Hindi text and transformer-based approaches.

Hate Speech Detection

SISA: Securing Images by Selective Alteration

no code implementations • 20 Jun 2021 • Prutha Gaherwar, Shraddha Joshi, Raviraj Joshi, Rahul Khengare

While encryption is the best way to ensure image security, full encryption and decryption is a computationally-intensive process.

Object Recognition

L3CubeMahaSent: A Marathi Tweet-based Sentiment Analysis Dataset

1 code implementation • EACL (WASSA) 2021 • Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar, Raviraj Joshi

We also present the guidelines using which we annotated the tweets.

Sentiment Analysis

ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models

no code implementations • 30 Jan 2021 • Rutuja Taware, Shraddha Varat, Gaurav Salunke, Chaitanya Gawande, Geetanjali Kale, Rahul Khengare, Raviraj Joshi

We show that these systems are over-reliant on the important words present in the text that are useful for classification.

General Classification Language Modelling +5

ICodeNet -- A Hierarchical Neural Network Approach for Source Code Author Identification

no code implementations • 30 Jan 2021 • Pranali Bora, Tulika Awalgaonkar, Himanshu Palve, Raviraj Joshi, Purvi Goel

We have also compared our image-based hierarchical neural network model with simple image-based CNN architecture and text-based CNN and LSTM models to highlight its novelty and efficiency.

Experimental Evaluation of Deep Learning models for Marathi Text Classification

no code implementations • 13 Jan 2021 • Atharva Kulkarni, Meet Mandhane, Manali Likhitkar, Gayatri Kshirsagar, Jayashree Jagdale, Raviraj Joshi

The Marathi language is one of the prominent languages used in India.

General Classification text-classification +2

Evaluating Deep Learning Approaches for Covid19 Fake News Detection

no code implementations • 11 Jan 2021 • Apurva Wani, Isha Joshi, Snehal Khandve, Vedangi Wagh, Raviraj Joshi

These platforms have led to an increase in the creation and spread of fake news.

Fake News Detection Language Modelling +2

Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

no code implementations • 11 Jan 2021 • Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi

The pre-trained Hindi fast text word embeddings by IndicNLP and Facebook are used in conjunction with CNN and LSTM models.

Multi-Label Classification Text Detection +1

Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020

no code implementations • 22 Dec 2020 • Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi

The shared task aims to build a translation system for Indian languages in specific domains like Artificial Intelligence (AI) and Chemistry using a small in-domain parallel corpus.

Domain Adaptation Machine Translation +2

Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text

no code implementations • 23 Nov 2020 • Ramchandra Joshi, Raviraj Joshi

We evaluate different deep learning models and input representation combinations for this task.

Language Identification Sentence +3

Deep Learning for Hindi Text Classification: A Comparison

no code implementations • 19 Jan 2020 • Ramchandra Joshi, Purvi Goel, Raviraj Joshi

Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results.

General Classification Sentence +3
