no code implementations • 23 Oct 2023 • Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur
The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions.
Automatic Speech Recognition (ASR)
no code implementations • 6 Oct 2023 • Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li
Simultaneous speech-to-text translation plays a critical role in real-time cross-lingual communication.
1 code implementation • 14 Sep 2023 • Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges, such as speaker diarization (SD) without accurate word timestamps and the handling of overlapping speech in a streaming fashion.
1 code implementation • 11 Aug 2023 • Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue
This framework is the first work to achieve instance, panoptic, and promptable segmentation on food images.
Ranked #1 on Semantic Segmentation on FoodSeg103 (using extra training data)
no code implementations • 30 Jul 2023 • Eric Sun, Jinyu Li, Jian Xue, Yifan Gong
When mixing 20,000 hours of augmented speech data generated by our method with 12,500 hours of original transcribed speech data for Italian Transformer transducer model pre-training, we achieve an 8.7% relative word error rate reduction.
no code implementations • 7 Jul 2023 • Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur
In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary.
Automatic Speech Recognition (ASR)
no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.
no code implementations • 12 Dec 2022 • Jinbao Wang, Ke Lu, Jian Xue
This paper proposes a novel application system for generating three-dimensional (3D) character animation driven by markerless human body motion capture.
no code implementations • 5 Dec 2022 • Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li
The neural transducer is now the most popular end-to-end model for speech recognition, owing to its natural streaming capability.
no code implementations • 7 Nov 2022 • Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong
Automatic Speech Recognition (ASR) systems typically yield output in lexical form.
Automatic Speech Recognition (ASR)
no code implementations • 5 Nov 2022 • Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li
In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
Automatic Speech Recognition (ASR)
no code implementations • 4 Nov 2022 • Jian Xue, Peidong Wang, Jinyu Li, Eric Sun
In this paper, we describe our work on building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into text in the target language.
1 code implementation • 24 May 2022 • Liping Hou, Ke Lu, Xue Yang, Yuqiu Li, Jian Xue
To go further, in this paper, we propose a unified Gaussian representation called G-Rep to construct Gaussian distributions for OBB, QBB, and PointSet, which achieves a unified solution to various representations and problems.
1 code implementation • 11 Apr 2022 • Jian Xue, Peidong Wang, Jinyu Li, Matt Post, Yashesh Gaur
Neural transducers have been widely used in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 4 Nov 2021 • Wei Gan, Jian Xue, Ke Lu, Yanfu Yan, Pengcheng Gao, Jiayi Lyu
Extended FEAFA (FEAFA+) includes 150 video sequences from FEAFA and DISFA, with a total of 230,184 frames manually annotated with floating-point intensity values for 24 redefined AUs using the Expression Quantitative Tool.
no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
The first challenge is solved with a data-splicing method that concatenates speech segments extracted from the source-domain data.
1 code implementation • 7 Apr 2021 • Xing Lan, Qinghao Hu, Qiang Chen, Jian Xue, Jian Cheng
In particular, our HIH reaches 4.08 NME (Normalized Mean Error) on WFLW and 3.21 on COFW, which exceeds previous methods by a significant margin.
Ranked #4 on Face Alignment on WFW (Extra Data)
no code implementations • 2 Apr 2019 • Yanfu Yan, Ke Lu, Jian Xue, Pengcheng Gao, Jiayi Lyu
To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation.