Search Results for author: Bo-Han Lu

Found 3 papers, 2 papers with code

BRCC and SentiBahasaRojak: The First Bahasa Rojak Corpus for Pretraining and Sentiment Analysis Dataset

no code implementations • COLING 2022 • Nanda Putri Romadhona, Sin-En Lu, Bo-Han Lu, Richard Tzong-Han Tsai

Finally, to test the effectiveness of the Mixed XLM model pre-trained on BRCC for social media scenarios where code-mixing is found frequently, we compile a new Bahasa Rojak sentiment analysis dataset, SentiBahasaRojak, with a Kappa value of 0.77.

Data Augmentation, Sentiment Analysis, +1

Enhancing Taiwanese Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems

1 code implementation • 18 Mar 2024 • Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai

The study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English.

Machine Translation, Translation

Exploring Methods for Building Dialects-Mandarin Code-Mixing Corpora: A Case Study in Taiwanese Hokkien

1 code implementation • 21 Jan 2023 • Sin-En Lu, Bo-Han Lu, Chao-Yi Lu, Richard Tzong-Han Tsai

In natural language processing (NLP), code-mixing (CM) is a challenging task, especially when the mixed languages include dialects.

Language Modelling, Transfer Learning, +1
