Search Results for author: Shun-Po Chuang

Found 13 papers, 6 papers with code

ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection

no code implementations • ICCV 2023 • Tao Tu, Shun-Po Chuang, Yu-Lun Liu, Cheng Sun, Ke Zhang, Donna Roy, Cheng-Hao Kuo, Min Sun

The results demonstrate that ImGeoNet outperforms the current state-of-the-art multi-view image-based method, ImVoxelNet, on all three datasets in terms of detection accuracy.

Ranked #24 on 3D Object Detection on ScanNetV2

3D Object Detection, object-detection

EURO: ESPnet Unsupervised ASR Open-source Toolkit

1 code implementation • 30 Nov 2022 • Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

no code implementations • 29 Jul 2022 • Da-Rong Liu, Po-chun Hsu, Yi-Chen Chen, Sung-Feng Huang, Shun-Po Chuang, Da-Yi Wu, Hung-Yi Lee

GAN training is adopted in the first stage to find the mapping between unpaired speech and phone sequences.

Acoustic Unit Discovery, Generative Adversarial Network

Anticipation-Free Training for Simultaneous Machine Translation

1 code implementation • IWSLT (ACL) 2022 • Chih-Chiang Chang, Shun-Po Chuang, Hung-Yi Lee

Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.

Hallucination, Machine Translation, +2

Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation

1 code implementation • Findings (ACL) 2021 • Shun-Po Chuang, Yung-Sung Chuang, Chih-Chiang Chang, Hung-Yi Lee

We study the possibilities of building a non-autoregressive speech-to-text translation model using connectionist temporal classification (CTC), and use CTC-based automatic speech recognition as an auxiliary task to improve the performance.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Non-autoregressive Mandarin-English Code-switching Speech Recognition

no code implementations • 6 Apr 2021 • Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-Yi Lee

Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people.

Decoder, Sentence, +2

Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

1 code implementation • 29 Oct 2020 • Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee

Speech separation is well developed thanks to the highly successful permutation invariant training (PIT) approach, but frequent label assignment switching during PIT training remains a problem when faster convergence and better achievable performance are desired.

Ranked #6 on Speech Separation on Libri2Mix (using extra training data)

Speaker Separation, Speech Enhancement, +1

Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation

no code implementations • ACL 2020 • Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-Yi Lee

Speech translation (ST) aims to learn transformations from speech in the source language to text in the target language.

Decoder, Translation

Training a code-switching language model with monolingual data

no code implementations • 14 Nov 2019 • Shun-Po Chuang, Tzu-Wei Sung, Hung-Yi Lee

A lack of code-switching data complicates the training of code-switching (CS) language models.

Language Modelling, Translation, +1

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

1 code implementation • 28 Oct 2019 • Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-Yi Lee, Lin-shan Lee

This allows the decoder to account for semantic consistency during decoding by absorbing the information carried by the transformed decoder feature, which is trained to be close to the target word embedding.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2