Search Results for author: Bo-Han Lu

Found 3 papers, 2 papers with code

BRCC and SentiBahasaRojak: The First Bahasa Rojak Corpus for Pretraining and Sentiment Analysis Dataset

no code implementations • COLING 2022 • Nanda Putri Romadhona, Sin-En Lu, Bo-Han Lu, Richard Tzong-Han Tsai

Finally, to test the effectiveness of the Mixed XLM model pre-trained on BRCC for social media scenarios where code-mixing is found frequently, we compile a new Bahasa Rojak sentiment analysis dataset, SentiBahasaRojak, with a Kappa value of 0.77.

Data Augmentation • Sentiment Analysis • +1

Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems

1 code implementation • 18 Mar 2024 • Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai

The study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English.

Machine Translation • Translation

Exploring Methods for Building Dialects-Mandarin Code-Mixing Corpora: A Case Study in Taiwanese Hokkien

1 code implementation • 21 Jan 2023 • Sin-En Lu, Bo-Han Lu, Chao-Yi Lu, Richard Tzong-Han Tsai

In natural language processing (NLP), code-mixing (CM) is a challenging task, especially when the mixed languages include dialects.

Language Modelling • Transfer Learning • +1
