no code implementations • 20 Feb 2024 • Guan-Ting Lin, Cheng-Han Chiang, Hung-Yi Lee
When text-only LLMs are used to model spoken dialogue, they cannot give different responses based on the speaking style of the current turn.
no code implementations • 24 Jan 2024 • Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Lin-shan Lee
However, the real-world problem of open-domain SQA (openSQA), in which the machine must additionally first retrieve from a spoken archive the passages that possibly contain the answer, had not yet been considered.
no code implementations • 5 Jan 2024 • Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text.
no code implementations • 23 Dec 2023 • Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko
Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.
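As a rough illustration of this serialization order, the Python sketch below shows how a single training target might be laid out so that each stage is generated left-to-right and conditions on the previous ones; the tag strings and function name are hypothetical placeholders, not the paper's actual format.

```python
# Minimal sketch (not the authors' released code) of a serialized training target:
# current-turn paralinguistic attribute, then the response's attribute, then the
# response text, decoded autoregressively in that order.

def build_serialized_target(current_attr: str, response_attr: str, response_text: str) -> str:
    """Concatenate the three prediction targets in the order described above."""
    return (
        f"[CURRENT_ATTR] {current_attr} "
        f"[RESPONSE_ATTR] {response_attr} "
        f"[RESPONSE_TEXT] {response_text}"
    )

if __name__ == "__main__":
    # A decoder trained on such targets first commits to the current turn's
    # attribute, then the response attribute, and only then generates the
    # response wording conditioned on both.
    print(build_serialized_target("angry", "calm", "I understand, let me help."))
```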
no code implementations • 15 Dec 2023 • Min-Han Shih, Ho-Lam Chung, Yu-Chi Pai, Ming-Hao Hsu, Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee
Furthermore, the GSQA model has only been fine-tuned on the spoken extractive QA dataset.
no code implementations • 29 May 2023 • Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee
However, the absence of intermediate targets and training guidance for textless SLU often results in suboptimal performance.
Automatic Speech Recognition (ASR) +3
no code implementations • 15 Nov 2022 • Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-Yi Lee, Yizhou Sun, Wei Wang
Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information.
Automatic Speech Recognition (ASR) +10
1 code implementation • 13 Oct 2022 • Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-Yi Lee, Nigel G. Ward
We find that 13 of the 15 SSL models outperformed the baseline on all the prosody-related tasks.
2 code implementations • 27 Mar 2022 • Guan-Ting Lin, Shang-Wen Li, Hung-Yi Lee
Although deep learning-based end-to-end Automatic Speech Recognition (ASR) has shown remarkable performance in recent years, it suffers severe performance regression on test samples drawn from different data distributions.
Automatic Speech Recognition (ASR) +2
1 code implementation • 9 Mar 2022 • Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui Chen, Shuyan Dong, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Lin-shan Lee
We empirically show that DUAL yields results comparable to those obtained by cascading an ASR and a text QA model, and that it is robust to real-world data.
Automatic Speech Recognition (ASR) +2
no code implementations • 14 Oct 2021 • Guan-Ting Lin, Manuel Giambi
Deep-learning techniques using BERT have achieved very promising results in the field, and different methods have been proposed to integrate structured knowledge to enhance performance.
no code implementations • 7 Oct 2021 • Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao
In this work, we further analyze the training robustness of unsupervised ASR under domain-mismatch scenarios in which the domains of the unpaired speech and text differ.
5 code implementations • 3 May 2021 • Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.
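The PyTorch sketch below illustrates, with assumed class names, the general paradigm SUPERB evaluates: a shared, frozen self-supervised encoder feeding a lightweight trainable head per task, so only the small head is learned with labeled data. It is a stand-in, not the benchmark's actual toolkit code.

```python
# Minimal sketch of the frozen-upstream / lightweight-head setup.
import torch
import torch.nn as nn


class FrozenUpstream(nn.Module):
    """Stand-in for a pretrained SSL speech encoder producing frame-level features."""

    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.feat_dim = feat_dim
        self.proj = nn.Linear(1, feat_dim)  # placeholder for real pretrained layers

    @torch.no_grad()  # upstream stays frozen during downstream training
    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, frames, feat_dim); real models also downsample in time
        return self.proj(wav.unsqueeze(-1))


class LightweightHead(nn.Module):
    """Small trainable head, e.g. for an utterance-level classification task."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.linear(feats.mean(dim=1))  # mean-pool frames, then classify


upstream = FrozenUpstream()
head = LightweightHead(upstream.feat_dim, num_classes=4)
logits = head(upstream(torch.randn(2, 16000)))  # two 1-second dummy waveforms
print(logits.shape)  # torch.Size([2, 4])
```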