Search Results for author: Jian Xue

Found 18 papers, 5 papers with code

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

no code implementations • 23 Oct 2023 • Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

no code implementations • 6 Oct 2023 • Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li

Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication.

Simultaneous Speech-to-Text Translation Translation

DiariST: Streaming Speech Translation with Speaker Diarization

1 code implementation • 14 Sep 2023 • Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

speaker-diarization Speaker Diarization +3

FoodSAM: Any Food Segmentation

1 code implementation • 11 Aug 2023 • Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue

Remarkably, this pioneering framework stands as the first-ever work to achieve instance, panoptic, and promptable segmentation on food images.

Ranked #1 on Semantic Segmentation on FoodSeg103 (using extra training data)

Image Segmentation Instance Segmentation +2

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

no code implementations • 30 Jul 2023 • Eric Sun, Jinyu Li, Jian Xue, Yifan Gong

When mixing 20,000 hours augmented speech data generated by our method with 12,500 hours original transcribed speech data for Italian Transformer transducer model pre-training, we achieve 8.7% relative word error rate reduction.

Automatic Speech Recognition Data Augmentation +2

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

no code implementations • 7 Jul 2023 • Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification

Markerless Body Motion Capturing for 3D Character Animation based on Multi-view Cameras

no code implementations • 12 Dec 2022 • Jinbao Wang, Ke Lu, Jian Xue

This paper proposes a novel application system for the generation of three-dimensional (3D) character animation driven by markerless human body motion capturing.

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

no code implementations • 5 Dec 2022 • Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li

Neural transducer is now the most popular end-to-end model for speech recognition, due to its naturally streaming ability.

Language Modelling speech-recognition +1

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

no code implementations • 7 Nov 2022 • Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

Automatic Speech Recognition (ASR) systems typically yield output in lexical form.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

no code implementations • 5 Nov 2022 • Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

no code implementations • 4 Nov 2022 • Jian Xue, Peidong Wang, Jinyu Li, Eric Sun

In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language.

Machine Translation speech-recognition +2

G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection

1 code implementation • 24 May 2022 • Liping Hou, Ke Lu, Xue Yang, Yuqiu Li, Jian Xue

To go further, in this paper, we propose a unified Gaussian representation called G-Rep to construct Gaussian distributions for OBB, QBB, and PointSet, which achieves a unified solution to various representations and problems.

Object object-detection +3

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

1 code implementation • 11 Apr 2022 • Jian Xue, Peidong Wang, Jinyu Li, Matt Post, Yashesh Gaur

Neural transducers have been widely used in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

no code implementations • 4 Nov 2021 • Wei Gan, Jian Xue, Ke Lu, Yanfu Yan, Pengcheng Gao, Jiayi Lyu

Extended FEAFA (FEAFA+) includes 150 video sequences from FEAFA and DISFA, with a total of 230,184 frames being manually annotated on floating-point intensity value of 24 redefined AUs using the Expression Quantitative Tool.

On Addressing Practical Challenges for RNN-Transducer

no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.

speech-recognition Speech Recognition

HIH: Towards More Accurate Face Alignment via Heatmap in Heatmap

1 code implementation • 7 Apr 2021 • Xing Lan, Qinghao Hu, Qiang Chen, Jian Xue, Jian Cheng

In particular, our HIH reaches 4.08 NME (Normalized Mean Error) on WFLW, and 3.21 on COFW, which exceeds previous methods by a significant margin.

Ranked #4 on Face Alignment on WFW (Extra Data)

Face Alignment regression

FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

no code implementations • 2 Apr 2019 • Yanfu Yan, Ke Lu, Jian Xue, Pengcheng Gao, Jiayi Lyu

To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation.

3D Face Reconstruction regression
