TenTrans Large-Scale Multilingual Machine Translation System for WMT21

coreference-resolution Machine Reading Comprehension +4

Paper
Code

End-to-End Chinese Speaker Identification

1 code implementation • NAACL 2022 • Dian Yu, Ben Zhou, Dong Yu

End-to-end SI systems, on the other hand, are not limited by individual modules, but suffer from insufficient training data from the existing small-scale datasets.

Paper
Code

Variational Graph Autoencoding as Cheap Supervision for AMR Coreference Resolution

no code implementations • ACL 2022 • Irene Li, Linfeng Song, Kun Xu, Dong Yu

Coreference resolution over semantic graphs like AMRs aims to group the graph nodes that represent the same entity.

coreference-resolution

Paper
Add Code

字里行间的道德:中文文本道德句识别研究(Morality Between the Lines: Research on Identification of Chinese Moral Sentence)

no code implementations • CCL 2021 • Shiya Peng, Chang Liu, Yayue Deng, Dong Yu

Dialogue Rewriting Text Generation

Paper
Add Code

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging

no code implementations • EMNLP 2021 • Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context.

Paper
Add Code

Instance-adaptive training with noise-robust losses against noisy labels

no code implementations • EMNLP 2021 • Lifeng Jin, Linfeng Song, Kun Xu, Dong Yu

In order to alleviate the huge demand for annotated datasets for different tasks, many recent natural language processing datasets have adopted automated pipelines for fast-tracking usable data.

Paper
Add Code

面向人工智能伦理计算的中文道德词典构建方法研究(Construction of a Chinese Moral Dictionary for Artificial Intelligence Ethical Computing)

no code implementations • CCL 2020 • Hongrui Wang, Chang Liu, Dong Yu

道德词典资源的建设是人工智能伦理计算的一个研究重点。由于道德行为复杂多样, 现有的英文道德词典分类体系并不完善, 而中文方面目前尚未有相关的词典资源, 理论体系和构建方法仍待探究。针对以上问题, 该文提出了面向人工智能伦理计算的中文道德词典构建任务, 设计了四类标签和四种类型, 得到包含25, 012个词的中文道德词典资源。实验结果表明, 该词典资源不仅能够使机器学会道德知识, 判断词的道德标签和类型, 而且能够为句子级别的道德文本分析提供数据支持。

Paper
Add Code

结合深度学习和语言难度特征的句子可读性计算方法(The method of calculating sentence readability combined with deep learning and language difficulty characteristics)

no code implementations • CCL 2020 • Yuling Tang, Dong Yu

本文提出了可读性语料库构建的改进方法, 基于该方法, 构建了规模更大的汉语句子可读性语料库。该语料库在句子绝对难度评估任务上的准确率达到0. 7869, 相对前人工作提升了0. 15以上, 证明了改进方法的有效性。将深度学习方法应用于汉语可读性评估, 探究了不同深度学习方法自动捕获难度特征的能力, 并进仛步探究了向深度学习特征中融入不同层面的语难度特征对模型整体性能的影响。实验结果显示, 不同深度学习模型的难度特征捕获能力不尽相同, 语言难度特征可以不同程度地提高深度学习模型的难度表征能力。

Paper
Add Code

From Polarity to Intensity: Mining Morality from Semantic Space

no code implementations • COLING 2022 • Chunxu Zhao, Pengyuan Liu, Dong Yu

It only needs moral polarity labels, which are more robust and easier to acquire.

Mathematical Reasoning Self-Learning

Paper
Add Code

CoreValue:面向价值观计算的中文核心价值-行为体系及知识库(CoreValue: Chinese Core Value-Behavior Frame and Knowledge Base for Value Computing)

no code implementations • CCL 2022 • Pengyuan Liu, Sanle Zhang, Dong Yu, Lin Bo

Paper
Add Code

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

no code implementations • 18 Apr 2024 • Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involves complex reasoning and planning.

Paper
Add Code

Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

no code implementations • 14 Apr 2024 • Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

Current state-of-the-art approaches refine decoding by contrasting early-exit distributions from a lower layer with the final layer to exploit information related to factuality within the model forward procedure.

Hallucination

Paper
Add Code

Polarity Calibration for Opinion Summarization

1 code implementation • 2 Apr 2024 • Yuanyuan Lei, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Ruihong Huang, Dong Yu

To address this issue and make the summarizer express both sides of opinions, we introduce the concept of polarity calibration, which aims to align the polarity of output summary with that of input text.

Opinion Summarization

Paper
Code

Conceptual and Unbiased Reasoning in Language Models

no code implementations • 30 Mar 2024 • Ben Zhou, Hongming Zhang, Sihao Chen, Dian Yu, Hongwei Wang, Baolin Peng, Dan Roth, Dong Yu

Conceptual reasoning, the ability to reason in abstract and high-level perspectives, is key to generalization in human cognition.

Decision Making

Paper
Add Code

Self-Consistency Boosts Calibration for Math Reasoning

no code implementations • 14 Mar 2024 • Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development.

GSM8K Math

Paper
Add Code

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

1 code implementation • 6 Mar 2024 • Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu

In this paper, we present a high-quality benchmark named multi-source Wizard of Wikipedia (Ms. WoW) for evaluating multi-source dialogue knowledge selection and response generation.

Dialogue Generation Response Generation

Paper
Code

Can Large Language Models do Analytical Reasoning?

no code implementations • 6 Mar 2024 • Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu

Our analytical reasoning embodies the tasks of letting large language models count how many points each team scores in a quarter in the NBA and NFL games.

Language Modelling Large Language Model

Paper
Add Code

Collaborative decoding of critical tokens for boosting factuality of large language models

no code implementations • 28 Feb 2024 • Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

The most common training pipeline for large language models includes pretraining, finetuning and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model.

Hallucination Instruction Following

Paper
Add Code

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

no code implementations • 27 Feb 2024 • Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

For a LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance.

Common Sense Reasoning Question Answering

Paper
Add Code

Fine-Grained Self-Endorsement Improves Factuality and Reasoning

no code implementations • 23 Feb 2024 • Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs.

GSM8K Language Modelling +2

Paper
Add Code

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

1 code implementation • 18 Feb 2024 • Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.

Code Generation Data Visualization

Paper
Code

SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs

no code implementations • 15 Feb 2024 • Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu

In this paper, we introduce four novel tasks centered around sports data analytics to evaluate the numerical reasoning and information fusion capabilities of LLMs.

Paper
Add Code

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

1 code implementation • 15 Feb 2024 • Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems.

Reinforcement Learning (RL)

Paper
Code

SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization

no code implementations • 31 Jan 2024 • Sangwoo Cho, Kaiqiang Song, Chao Zhao, Xiaoyang Wang, Dong Yu

Multi-turn dialogues are characterized by their extended length and the presence of turn-taking conversations.

Language Modelling Large Language Model

Paper
Add Code

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

1 code implementation • 25 Jan 2024 • Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents.

Paper
Code

MM-LLMs: Recent Advances in MultiModal Large Language Models

no code implementations • 24 Jan 2024 • Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies.

Decision Making

Paper
Add Code

Inconsistent dialogue responses and how to recover from them

1 code implementation • 18 Jan 2024 • Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu

One critical issue for chat systems is to stay consistent about preferences, opinions, beliefs and facts of itself, which has been shown a difficult problem.

Paper
Code

InFoBench: Evaluating Instruction Following Ability in Large Language Models

1 code implementation • 7 Jan 2024 • Yiwei Qin, Kaiqiang Song, Yebowen Hu, Wenlin Yao, Sangwoo Cho, Xiaoyang Wang, Xuansheng Wu, Fei Liu, PengFei Liu, Dong Yu

This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions.

Instruction Following

Paper
Code

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention

no code implementations • 14 Dec 2023 • Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu

This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information.

Paper
Add Code

Dense X Retrieval: What Retrieval Granularity Should We Use?

1 code implementation • 11 Dec 2023 • Tong Chen, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao, Hongming Zhang, Dong Yu

We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks.

Retrieval Sentence +1

Paper
Code

CLOMO: Counterfactual Logical Modification with Large Language Models

no code implementations • 29 Nov 2023 • Yinya Huang, Ruixin Hong, Hongming Zhang, Wei Shao, Zhicheng Yang, Dong Yu, ChangShui Zhang, Xiaodan Liang, Linqi Song

In this study, we delve into the realm of counterfactual reasoning capabilities of large language models (LLMs).

counterfactual Counterfactual Reasoning +1

Paper
Add Code

Deep Audio Zooming: Beamwidth-Controllable Neural Beamformer

no code implementations • 22 Nov 2023 • Meng Yu, Dong Yu

Audio zooming, a signal processing technique, enables selective focusing and enhancement of sound signals from a specified region, attenuating others.

Paper
Add Code

MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning

4 code implementations • 15 Nov 2023 • Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu

Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (\textbf{MMC-Benchmark}), a comprehensive human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts.

211

Paper
Code

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

no code implementations • 15 Nov 2023 • Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu

In response to these challenges, we introduces Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios.

Hallucination Retrieval

Paper
Add Code

A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning

1 code implementation • 14 Nov 2023 • Ruixin Hong, Hongming Zhang, Xinyu Pang, Dong Yu, ChangShui Zhang

In this paper, we take a closer look at the self-verification abilities of LLMs in the context of logical reasoning, focusing on their ability to identify logical fallacies accurately.

Logical Fallacies Logical Reasoning

Paper
Code

TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs

1 code implementation • 9 Nov 2023 • Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu

We construct a hierarchical task tree encompassing 7 major areas covering over 200 categories and over 800 tasks, which covers diverse capabilities such as question answering, reasoning, multiturn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner.

Benchmarking Question Answering +1

Paper
Code

Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

1 code implementation • 7 Nov 2023 • Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, Dong Yu

We introduce sub-sentence encoder, a contrastively-learned contextual embedding model for fine-grained semantic representation of text.

Contrastive Learning Semantic Similarity +3

Paper
Code

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

no code implementations • 31 Oct 2023 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Multi-channel multi-talker automatic speech recognition (ASR) presents ongoing challenges within the speech community, particularly when confronted with significant reverberation effects.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

no code implementations • 25 Oct 2023 • Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.

speaker-diarization Speaker Diarization +3

Paper
Add Code

On the Dimensionality of Sentence Embeddings

no code implementations • 23 Oct 2023 • Hongwei Wang, Hongming Zhang, Dong Yu

Therefore, we propose a two-step training method for sentence representation learning models, wherein the encoder and the pooler are optimized separately to mitigate the overall performance loss in low-dimension scenarios.

Sentence Sentence Classification +3

Paper
Add Code

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

1 code implementation • 20 Oct 2023 • Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng

Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation.

Multimodal Machine Translation Sentence +2

Paper
Code

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations • 2 Oct 2023 • Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs.

Denoising Self-Supervised Learning +2

Paper
Add Code

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

1 code implementation • 30 Sep 2023 • Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu

In this work, we investigate how the instruction tuning adjusts pre-trained models with a focus on intrinsic changes.

Instruction Following Language Modelling +1

Paper
Code

The Trickle-down Impact of Reward (In-)consistency on RLHF

1 code implementation • 28 Sep 2023 • Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu

We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM.

Benchmarking

Paper
Code

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

no code implementations • 27 Sep 2023 • Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process.

Acoustic echo cancellation

Paper
Add Code

Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

no code implementations • 27 Sep 2023 • Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

Acoustic howling suppression (AHS) is a critical challenge in audio communication systems.

Paper
Add Code

Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark

no code implementations • 18 Sep 2023 • Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu

Existing propositions often rely on logical constants for classification.

Classification Natural Language Understanding

Paper
Add Code

Stabilizing RLHF through Advantage Model and Selective Rehearsal

no code implementations • 18 Sep 2023 • Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu

Large Language Models (LLMs) have revolutionized natural language processing, yet aligning these models with human values and preferences using RLHF remains a significant challenge.

Paper
Add Code

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

no code implementations • 16 Sep 2023 • Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing.

Speech Enhancement

Paper
Add Code

LASER: LLM Agent with State-Space Exploration for Web Navigation

1 code implementation • 15 Sep 2023 • Kaixin Ma, Hongming Zhang, Hongwei Wang, Xiaoman Pan, Wenhao Yu, Dong Yu

We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on both the WebShop task and amazon. com.

Decision Making

Paper
Code

Unsupervised Multi-document Summarization with Holistic Inference

no code implementations • 8 Sep 2023 • Haopeng Zhang, Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Hongwei Wang, Jiawei Zhang, Dong Yu

SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners.

Document Summarization Extractive Summarization +1

Paper
Add Code

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Mapping two modalities, speech and text, into a shared representation space, is a research topic of using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

1 code implementation • 19 Aug 2023 • Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe

While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

7,880

Paper
Code

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

no code implementations • 1 Aug 2023 • Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen

Compositional generalization empowers the LLMs to solve problems that are harder than the ones they have seen (i. e., easy-to-hard generalization), which is a critical reasoning capability of human-like intelligence.

Ranked #17 on Math Word Problem Solving on MATH

Math Math Word Problem Solving

Paper
Add Code

MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning

no code implementations • 16 Jul 2023 • Zhenwen Liang, Dian Yu, Xiaoman Pan, Wenlin Yao, Qingkai Zeng, Xiangliang Zhang, Dong Yu

Our approach uniquely considers the various annotation formats as different "views" and leverages them in training the model.

Knowledge Distillation Mathematical Reasoning

Paper
Add Code

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

no code implementations • 8 Jul 2023 • Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu

Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57. 6% of the correctly detected hallucinations.

Hallucination

Paper
Add Code

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

no code implementations • 30 May 2023 • Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common.

Singing Voice Synthesis Voice Conversion

Paper
Add Code

PIVOINE: Instruction Tuning for Open-world Information Extraction

1 code implementation • 24 May 2023 • Keming Lu, Xiaoman Pan, Kaiqiang Song, Hongming Zhang, Dong Yu, Jianshu Chen

In particular, we construct INSTRUCTOPENWIKI, a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.

Instruction Following Language Modelling +1

Paper
Code

Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations

1 code implementation • 24 May 2023 • James Y. Huang, Wenlin Yao, Kaiqiang Song, Hongming Zhang, Muhao Chen, Dong Yu

It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.

Paper
Code

Open-Domain Event Graph Induction for Mitigating Framing Bias

no code implementations • 22 May 2023 • Siyi Liu, Hongming Zhang, Hongwei Wang, Kaiqiang Song, Dan Roth, Dong Yu

However, none of the existing methods have explicitly addressed the issue of framing bias that is inherent in news articles.

Paper
Add Code

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

no code implementations • 4 May 2023 • Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via teacher-forced training strategy are used to train the deep neural network (DNN).

Paper
Add Code

Faithful Question Answering with Monte-Carlo Planning

1 code implementation • 4 May 2023 • Ruixin Hong, Hongming Zhang, Hong Zhao, Dong Yu, ChangShui Zhang

In this paper, we propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.

Decision Making Question Answering +1

Paper
Code

Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

no code implementations • 2 May 2023 • Hao Zhang, Meng Yu, Dong Yu

In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes the joint suppression of them difficult.

Paper
Add Code

Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression

no code implementations • 18 Feb 2023 • Hao Zhang, Meng Yu, Dong Yu

In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it.

Chatbot Response Generation

Paper
Add Code

Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production

1 code implementation • 16 Feb 2023 • Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, Dong Yu

We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation.

Paper
Code

Friend-training: Learning from Models of Different but Related Tasks

no code implementations • 31 Jan 2023 • Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu

Current self-training methods such as standard self-training, co-training, tri-training, and others often focus on improving model performance on a single task, utilizing differences in input features, model architectures, and training processes.

Dialogue Rewriting Dialogue Understanding +1

Paper
Add Code

Neural Target Speech Extraction: An Overview

1 code implementation • 31 Jan 2023 • Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu

Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers.

Speech Extraction

Paper
Code

NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

no code implementations • 29 Jan 2023 • Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges.

Acoustic echo cancellation

Paper
Add Code

OASum: Large-Scale Open Domain Aspect-based Summarization

1 code implementation • 19 Dec 2022 • Xianjun Yang, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Xiaoman Pan, Linda Petzold, Dong Yu

Specifically, zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect or query-focused generation ability compared with the backbone model.

Paper
Code

TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

no code implementations • 12 Dec 2022 • Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu

Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse.

Self-Supervised Learning

Paper
Add Code

ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion

1 code implementation • 6 Dec 2022 • Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen

However, there has been limited research on the zero-shot KBC settings, where we need to deal with unseen entities and relations that emerge in a constantly growing knowledge base.

Knowledge Base Completion Knowledge Graphs

Paper
Code

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

no code implementations • 22 Nov 2022 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

While current deep learning (DL)-based beamforming techniques have been proved effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently which results in higher computational costs and inference times, making them unsuitable for real-world use.

Contrastive Learning Sentence +1

Paper
Add Code

Efficient Zero-shot Event Extraction with Context-Definition Alignment

1 code implementation • 9 Nov 2022 • Hongming Zhang, Wenlin Yao, Dong Yu

We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately.

Paper
Code

Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing

no code implementations • 8 Nov 2022 • Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu

Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors.

Paper
Add Code

Toward Unifying Text Segmentation and Long Document Summarization

1 code implementation • 28 Oct 2022 • Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu, Dong Yu

The problem is only exacerbated by a lack of segmentation in transcripts of audio/video recordings.

Ranked #5 on Text Summarization on Pubmed

Document Summarization Extractive Summarization +3

Paper
Code

Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

no code implementations • 28 Oct 2022 • Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen

In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory.

Ranked #5 on Question Answering on StoryCloze

Common Sense Reasoning Coreference Resolution +7

Paper
Add Code

Salience Allocation as Guidance for Abstractive Summarization

1 code implementation • 22 Oct 2022 • Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu

Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.

Ranked #7 on Abstractive Text Summarization on CNN / Daily Mail

Abstractive Text Summarization

Paper
Code

MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure

3 code implementations • 22 Oct 2022 • Yinya Huang, Hongming Zhang, Ruixin Hong, Xiaodan Liang, ChangShui Zhang, Dong Yu

To this end, we propose a comprehensive logical reasoning explanation form.

Logical Reasoning

Paper
Code

Learning a Grammar Inducer from Massive Uncurated Instructional Videos

1 code implementation • 22 Oct 2022 • Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo

While previous work focuses on building systems for inducing grammars on text that are well-aligned with video content, we investigate the scenario, in which text and video are only in loose correspondence.

Language Acquisition Video Alignment

Paper
Code

Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination

1 code implementation • 21 Oct 2022 • Yue Yang, Wenlin Yao, Hongming Zhang, Xiaoyang Wang, Dong Yu, Jianshu Chen

Large-scale pretrained language models have made significant advances in solving downstream language understanding tasks.

Ranked #2 on Visual Commonsense Tests on ViComTe-color

Language Modelling Retrieval +2

Paper
Code

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

no code implementations • 14 Oct 2022 • Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe

Besides predicting the target sequence, a side product of CTC is to predict the alignment, which is the most probable input-long sequence that specifies a hard aligning relationship between the input and target units.

Paper
Add Code

Cross-Lingual Speaker Identification Using Distant Supervision

1 code implementation • 11 Oct 2022 • Ben Zhou, Dian Yu, Dong Yu, Dan Roth

Speaker identification, determining which character said each utterance in literary text, benefits many downstream tasks.

Language Modelling Speaker Identification

Paper
Code

Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks

1 code implementation • 1 Oct 2022 • Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji

Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3. 9x smaller in model size.

Language Modelling Retrieval +2

Paper
Code

C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

no code implementations • 15 Aug 2022 • Chunlei Zhang, Dong Yu

On the basis of the pretrained CSSL model, we further propose to employ a negative sample free SSL objective (i. e., DINO) to fine-tune the speaker embedding network.

Contrastive Learning Self-Supervised Learning +1

Paper
Add Code

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.

Ranked #13 on Audio Generation on AudioCaps

Audio Generation

330

Paper
Code

Hierarchical Context Tagging for Utterance Rewriting

1 code implementation • 22 Jun 2022 • Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea

HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context.

TAG

Paper
Code

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

1 code implementation • 16 Jun 2022 • Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu

Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.

Speech Synthesis Text-To-Speech Synthesis

108

Paper
Code

UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder

no code implementations • 6 Jun 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development.

Representation Learning Speech Synthesis +1

Paper
Add Code

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

1 code implementation • 5 Jun 2022 • Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level and shows superior performance on both monolingual and multilingual ASR tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

160

Paper
Code

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations • 20 May 2022 • Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in the full-duplex speech communication as well as the front-end speech enhancement for recognition in the conditions when the loudspeaker plays back.

Acoustic echo cancellation Speech Enhancement +2

Paper
Add Code

Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

1 code implementation • 11 May 2022 • Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve a much better phoneme classification accuracy, a stabilized vocalization and a better zero-shot VC performance compared with the competitive DSVAE baseline.

Voice Conversion

Paper
Code

Distant finetuning with discourse relations for stance classification

no code implementations • 27 Apr 2022 • Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu

Approaches for the stance classification task, an important task for understanding argumentation in debates and detecting fake news, have been relying on models which deal with individual debate topics.

Classification Stance Classification

Paper
Add Code

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

2 code implementations • 21 Apr 2022 • Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Also, FastDiff enables a sampling speed of 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.

Ranked #7 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Denoising Speech Synthesis +2

423

Paper
Code

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

1 code implementation • 7 Apr 2022 • Zhao You, Shulin Feng, Dan Su, Dong Yu

Recently, Conformer based CTC/AED model has become a mainstream architecture for ASR.

Ranked #2 on Speech Recognition on WenetSpeech

Data Augmentation Disentanglement +2

115

Paper
Code

Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

1 code implementation • 30 Mar 2022 • Jiachen Lian, Chunlei Zhang, Dong Yu

A zero-shot voice conversion is performed by feeding an arbitrary speaker embedding and content embeddings to the VAE decoder.

Paper
Code

Integrating Lattice-Free MMI into End-to-End Speech Recognition

1 code implementation • 29 Mar 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

160

Paper
Code

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

1 code implementation • ICLR 2022 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can train with a novel bilateral modeling objective.

Ranked #1 on Speech Synthesis on LJSpeech

Image Generation Speech Synthesis

213

Paper
Code

Towards Abstractive Grounded Summarization of Podcast Transcripts

1 code implementation • ACL 2022 • Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu

Summarization of podcast transcripts is of practical benefit to both content providers and consumers.

Abstractive Text Summarization

Paper
Code

Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension

1 code implementation • ACL 2022 • Chao Zhao, Wenlin Yao, Dian Yu, Kaiqiang Song, Dong Yu, Jianshu Chen

Comprehending a dialogue requires a model to capture diverse kinds of key information in the utterances, which are either scattered around or implicitly implied in different turns of conversations.

Paper
Code

C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References

1 code implementation • ACL 2022 • Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen

And with our pretrained reader, the entire system improves by up to 4% in exact match.

Open-Domain Question Answering

Paper
Code

Full RGB Just Noticeable Difference (JND) Modelling

no code implementations • 1 Mar 2022 • Jian Jin, Dong Yu, Weisi Lin, Lili Meng, Hao Wang, Huaxiang Zhang

Besides, the JND of the red and blue channels are larger than that of the green one according to the experimental results of the proposed model, which demonstrates that more changes can be tolerated in the red and blue channels, in line with the well-known fact that the human visual system is more sensitive to the green channel in comparison with the red and blue ones.

Image Quality Assessment

Paper
Add Code

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

no code implementations • 18 Feb 2022 • Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system.

Quantization Speech Synthesis +2

Paper
Add Code

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows

no code implementations • 14 Feb 2022 • Jianqiao Zhao, Yanyang Li, Wanyu Du, Yangfeng Ji, Dong Yu, Michael R. Lyu, LiWei Wang

Hence, we propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.

Dialogue Evaluation

Paper
Add Code

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

2 code implementations • 28 Jan 2022 • Songxiang Liu, Dan Su, Dong Yu

Denoising diffusion probabilistic models (DDPMs) are expressive generative models that have been used to solve a variety of speech synthesis problems.

Denoising Speech Synthesis

293

Paper
Code

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

1 code implementation • 6 Jan 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

Then, the LM score of the hypothesis is obtained by intersecting the generated lattice with an external word N-gram LM.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

160

Paper
Code

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

1 code implementation • 5 Dec 2021 • Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Yuexian Zou

Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

160

Paper
Code

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations • 29 Nov 2021 • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

Computational Efficiency speech-recognition +1

Paper
Add Code

SpeechMoE2: Mixture-of-Experts Model with Improved Routing

no code implementations • 23 Nov 2021 • Zhao You, Shulin Feng, Dan Su, Dong Yu

Mixture-of-experts based acoustic models with dynamic routing mechanisms have proved promising results for speech recognition.

Paper
Add Code

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

no code implementations • 22 Nov 2021 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Experimental results show that 1) the proposed ALL-In-One model achieved a comparable error rate to the pipelined system while reducing the inference time by half; 2) the proposed 3D spatial feature significantly outperformed (31\% CERR) all previous works of using the 1D directional information in both paradigms.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

no code implementations • 14 Nov 2021 • Songxiang Liu, Dan Su, Dong Yu

The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims at transferring speaking styles of an arbitrary source speaker to a target speaker's voice using very limited amount of neutral data.

Disentanglement Meta-Learning +2

Paper
Add Code

Joint Neural AEC and Beamforming with Double-Talk Detection

no code implementations • 9 Nov 2021 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

We train the proposed model in an end-to-end approach to eliminate background noise and echoes from far-end audio devices, which include nonlinear distortions.

Acoustic echo cancellation Denoising +2

Paper
Add Code

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories

2 code implementations • EMNLP 2021 • Wenlin Yao, Xiaoman Pan, Lifeng Jin, Jianshu Chen, Dian Yu, Dong Yu

We then train a model to identify semantic equivalence between a target word in context and one of its glosses using these aligned inventories, which exhibits strong transfer capability to many WSD tasks.

Sentence Word Sense Disambiguation

Paper
Code

FAST-RIR: Fast neural diffuse room impulse response generator

2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

137

Paper
Code

SynCLR: A Synthesis Framework for Contrastive Learning of out-of-domain Speech Representations

no code implementations • 29 Sep 2021 • Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Zhou Zhao, Yi Ren

Learning generalizable speech representations for unseen samples in different domains has been a challenge with ever increasing importance to date.

Contrastive Learning Data Augmentation +4

Paper
Add Code

Exophoric Pronoun Resolution in Dialogues with Topic Regularization

1 code implementation • EMNLP 2021 • Xintong Yu, Hongming Zhang, Yangqiu Song, ChangShui Zhang, Kun Xu, Dong Yu

Resolving pronouns to their referents has long been studied as a fundamental natural language understanding problem.

coreference-resolution Natural Language Understanding

Paper
Code

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

no code implementations • 8 Sep 2021 • Songxiang Liu, Shan Yang, Dan Su, Dong Yu

The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.

Expressive Speech Synthesis Sentence +1

Paper
Add Code

Bilateral Denoising Diffusion Models

no code implementations • 26 Aug 2021 • Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu

In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples.

Denoising Scheduling

Paper
Add Code

TexSmart: A System for Enhanced Natural Language Understanding

no code implementations • ACL 2021 • Lemao Liu, Haisong Zhang, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Dick Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi

This paper introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities.

named-entity-recognition Named Entity Recognition +2

Paper
Add Code

Importance-based Neuron Allocation for Multilingual Neural Machine Translation

1 code implementation • ACL 2021 • Wanying Xie, Yang Feng, Shuhao Gu, Dong Yu

Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages.

General Knowledge Machine Translation +1

Paper
Code

Do Boat and Ocean Suggest Beach? Dialogue Summarization with External Knowledge

1 code implementation • AKBC 2021 • Tianqing Fang, Haojie Pan, Hongming Zhang, Yangqiu Song, Kun Xu, Dong Yu

To evaluate the inference capability of different methods, we also propose a new evaluation metric based on CODC.

Text Generation

Paper
Code

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition

no code implementations • 8 Jun 2021 • Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu

End-to-end speech recognition generally uses hand-engineered acoustic features as input and excludes the feature extraction module from its joint optimization.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

no code implementations • 8 May 2021 • Liqiang He, Shulin Feng, Dan Su, Dong Yu

Extensive experiments show that: 1) Based on the proposed neural architecture, the neural networks with a medium latency of 550ms (millisecond) and a low latency of 190ms can be learned in the vanilla and revised operation space respectively.

Paper
Add Code

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

1 code implementation • 7 May 2021 • Zhao You, Shulin Feng, Dan Su, Dong Yu

Recently, Mixture of Experts (MoE) based Transformer has shown promising results in many domains.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

115

Paper
Code

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

no code implementations • 17 Apr 2021 • Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu

The spatial self-attention module is designed to attend on the cross-channel correlation in the covariance matrices.

Paper
Add Code

Conversational Semantic Role Labeling

no code implementations • 11 Apr 2021 • Kun Xu, Han Wu, Linfeng Song, Haisong Zhang, Linqi Song, Dong Yu

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence.

coreference-resolution Dialogue Understanding +3

Paper
Add Code

Video-aided Unsupervised Grammar Induction

1 code implementation • NAACL 2021 • Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo

We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.

Optical Character Recognition (OCR)

Paper
Code

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

no code implementations • 2 Apr 2021 • Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu

The objective speech quality assessment is usually conducted by comparing received speech signal with its clean reference, while human beings are capable of evaluating the speech quality without any reference, such as in the mean opinion score (MOS) tests.

Paper
Add Code

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

In this paper, we exploit the effective way to leverage contextual information to improve the speech dereverberation performance in real-world reverberant environments.

Room Impulse Response (RIR) Speech Dereverberation

Paper
Add Code

Towards Robust Speaker Verification with Target Speaker Enhancement

no code implementations • 16 Mar 2021 • Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu

This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV).

Speaker Verification Speech Enhancement

Paper
Add Code

NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation

no code implementations • 3 Mar 2021 • Xiaoyang Wang, Chen Li, Jianqiao Zhao, Dong Yu

To facilitate the research on this corpus, we provide results of several benchmark models.

Paper
Add Code

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

no code implementations • 2 Mar 2021 • Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

We study the cocktail party problem and propose a novel attention network called Tune-In, abbreviated for training under negative environments with interference.

Speaker Verification Speech Separation

Paper
Add Code

Contrastive Separative Coding for Self-supervised Representation Learning

no code implementations • 1 Mar 2021 • Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC).

Representation Learning Self-Supervised Learning +1

Paper
Add Code

Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

2 code implementations • 1 Mar 2021 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers.

Ranked #8 on Speech Separation on WSJ0-3mix

Computational Efficiency Speech Separation

Paper
Code

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

no code implementations • 16 Feb 2021 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Structural Information Preserving for Graph-to-Text Generation

1 code implementation • ACL 2020 • Linfeng Song, Ante Wang, Jinsong Su, Yue Zhang, Kun Xu, Yubin Ge, Dong Yu

The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs.

Ranked #10 on Data-to-Text Generation on WebNLG

Data-to-Text Generation

Paper
Code

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data

no code implementations • Findings (EMNLP) 2021 • Dian Yu, Kai Sun, Dong Yu, Claire Cardie

In spite of much recent research in the area, it is still unclear whether subject-area question-answering data is useful for machine reading comprehension (MRC) tasks.

Machine Reading Comprehension Multiple-choice +1

Paper
Add Code

Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks

2 code implementations • 13 Jan 2021 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Recent research on the time-domain audio separation networks (TasNets) has brought great success to speech separation.

Ranked #14 on Speech Separation on WSJ0-2mix

named-entity-recognition Named Entity Recognition +1

Paper
Code

TexSmart: A Text Understanding System for Fine-Grained NER and Enhanced Semantic Analysis

no code implementations • 31 Dec 2020 • Haisong Zhang, Lemao Liu, Haiyun Jiang, Yangming Li, Enbo Zhao, Kun Xu, Linfeng Song, Suncong Zheng, Botong Zhou, Jianchen Zhu, Xiao Feng, Tao Chen, Tao Yang, Dong Yu, Feng Zhang, Zhanhui Kang, Shuming Shi

This technique report introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities.

Paper
Add Code

Robust Dialogue Utterance Rewriting as Sequence Tagging

1 code implementation • 29 Dec 2020 • Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context.

Dialogue Rewriting Text Generation

Paper
Code

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

no code implementations • 24 Dec 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu

Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

1 code implementation • 13 Dec 2020 • Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu

First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.

Clustering Contrastive Learning +2

Paper
Code

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1

122

Paper
Code

BLCU-NLP at SemEval-2020 Task 5: Data Augmentation for Efficient Counterfactual Detecting

no code implementations • SEMEVAL 2020 • Chang Liu, Dong Yu

We demonstrate the effectiveness of our approaches, which achieves 0. 95 of subtask 1 in F1 while using only a subset of giving training set to fine-tune the BERT model, and our official submission achieves F1 0. 802, which ranks us 16th in the competition.

Common Sense Reasoning counterfactual +1

Paper
Add Code

SHIKEBLCU at SemEval-2020 Task 2: An External Knowledge-enhanced Matrix for Multilingual and Cross-Lingual Lexical Entailment

no code implementations • SEMEVAL 2020 • Shike Wang, Yuchen Fan, Xiangying Luo, Dong Yu

In our system, we expand the number of external constraints in multiple languages to obtain more specialised multilingual word embeddings.

Lexical Entailment Machine Translation +4

Paper
Add Code

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

no code implementations • 26 Nov 2020 • Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.

Speech Enhancement Speech Extraction +1 Sound Audio and Speech Processing

Paper
Add Code

Audio-visual Multi-channel Integration and Recognition of Overlapped Speech

no code implementations • 16 Nov 2020 • Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu

Automatic speech recognition (ASR) technologies have been significantly advanced in the past few decades.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Automatic Summarization of Open-Domain Podcast Episodes

no code implementations • 9 Nov 2020 • Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu

Instead, we investigate several less-studied aspects of neural abstractive summarization, including (i) the importance of selecting important segments from transcripts to serve as input to the summarizer; (ii) striking a balance between the amount and quality of training instances; (iii) the appropriate summary length and start/end points.

Abstractive Text Summarization

Paper
Add Code

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

no code implementations • 30 Oct 2020 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Replay and Synthetic Speech Detection with Res2net Architecture

2 code implementations • 28 Oct 2020 • Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng

This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks.

Feature Engineering Synthetic Speech Detection

Paper
Code

Multi-Channel Speaker Verification for Single and Multi-talker Speech

no code implementations • 23 Oct 2020 • Saurabh Kataria, Shi-Xiong Zhang, Dong Yu

We find the improvements from speaker-dependent directional features more consistent in multi-talker conditions than clean.

Action Detection Activity Detection +2

Paper
Add Code

Better Highlighting: Creating Sub-Sentence Summary Highlights

1 code implementation • EMNLP 2020 • Sangwoo Cho, Kaiqiang Song, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu

Amongst the best means to summarize is highlighting.

Point Processes Sentence

Paper
Code

High-Fidelity 3D Digital Human Head Creation from RGB-D Selfies

2 code implementations • 12 Oct 2020 • Linchao Bao, Xiangkai Lin, Yajing Chen, Haoxian Zhang, Sheng Wang, Xuefei Zhe, Di Kang, HaoZhi Huang, Xinwei Jiang, Jue Wang, Dong Yu, Zhengyou Zhang

We present a fully automatic system that can produce high-fidelity, photo-realistic 3D digital human heads with a consumer RGB-D selfie camera.

Vocal Bursts Intensity Prediction

731

Paper
Code

Token-level Adaptive Training for Neural Machine Translation

1 code implementation • EMNLP 2020 • Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie zhou, Dong Yu

The vanilla NMT model usually adopts trivial equal-weighted objectives for target tokens with different frequencies and tends to generate more high-frequency tokens and less low-frequency tokens compared with the golden token distribution.

Machine Translation NMT +1

Paper
Code

Semantic Role Labeling Guided Multi-turn Dialogue ReWriter

no code implementations • EMNLP 2020 • Kun Xu, Haochen Tan, Linfeng Song, Han Wu, Haisong Zhang, Linqi Song, Dong Yu

For multi-turn dialogue rewriting, the capacity of effectively modeling the linguistic knowledge in dialog context and getting rid of the noises is essential to improve its performance.

Dialogue Rewriting Semantic Role Labeling

Paper
Add Code

Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge

no code implementations • ACL 2022 • Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Claire Cardie

In this paper, we aim to extract commonsense knowledge to improve machine reading comprehension.

Machine Reading Comprehension

Paper
Add Code

Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition

no code implementations • 25 Aug 2020 • Liqiang He, Dan Su, Dong Yu

Extensive experiments show that: (i) the architecture searched on the small proxy dataset can be transferred to the large dataset for the speech recognition tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

1 code implementation • 21 Aug 2020 • Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources.

Speech Enhancement Speech Separation

195

Paper
Code

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

1 code implementation • 16 Aug 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu

Speech separation algorithms are often used to separate the target speech from other interfering sources.

Image Captioning Sentence

Paper
Code

Peking Opera Synthesis via Duration Informed Attention Network

no code implementations • 7 Aug 2020 • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu

In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework.

Singing Voice Synthesis

Paper
Add Code

Comprehensive Image Captioning via Scene Graph Decomposition

1 code implementation • ECCV 2020 • Yiwu Zhong, Li-Wei Wang, Jianshu Chen, Dong Yu, Yin Li

We address the challenging problem of image captioning by revisiting the representation of image scene graph.

Paper
Code

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

1 code implementation • CVPR 2021 • Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu

Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.

Contrastive Learning Knowledge Distillation +6

Paper
Code

ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT

no code implementations • ACL 2020 • Linfeng Song, Kun Xu, Yue Zhang, Jianshu Chen, Dong Yu

Zero pronoun recovery and resolution aim at recovering the dropped pronoun and pointing out its anaphoric mentions, respectively.

Multi-Task Learning

Paper
Add Code

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Recent approaches mainly have following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.

Talking Head Generation

Paper
Add Code

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

no code implementations • 11 Jun 2020 • Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training.

Binary Classification Data Augmentation +1

Paper
Add Code

Audio-visual Multi-channel Recognition of Overlapped Speech

no code implementations • 18 May 2020 • Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, LianWu Chen, Yong Xu. Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng

Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension

1 code implementation • ACL 2020 • Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu

In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.

Chunking Machine Reading Comprehension +1

Paper
Code

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

1 code implementation • ACL 2020 • Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.

Ranked #5 on Video Captioning on ActivityNet Captions

Data-to-Text Generation Table-to-Text Generation

168

Paper
Code

Neural Spatio-Temporal Beamformer for Target Speech Separation

1 code implementation • 8 May 2020 • Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu

Purely neural network (NN) based speech separation and enhancement methods, although can achieve good objective scores, inevitably cause nonlinear speech distortions that are harmful for the automatic speech recognition (ASR).

Audio and Speech Processing Sound

Paper
Code

Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints

no code implementations • ACL 2020 • Zhenyi Wang, Xiaoyang Wang, Bang An, Dong Yu, Changyou Chen

Text generation from a knowledge base aims to translate knowledge triples to natural language descriptions.

Ranked #1 on Data-to-Text Generation on Wikipedia Person and Animal Dataset

Paper
Add Code

Dialogue-Based Relation Extraction

3 code implementations • ACL 2020 • Dian Yu, Kai Sun, Claire Cardie, Dong Yu

We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue.

Ranked #6 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction Relation +1

132

Paper
Code

Multi-modal Multi-channel Target Speech Separation

no code implementations • 16 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu

Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.

Paper
Add Code

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Hand-crafted spatial features (e. g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.

graph construction Knowledge Graphs +1

Paper
Add Code

On the Role of Conceptualization in Commonsense Knowledge Graph Construction

1 code implementation • 6 Mar 2020 • Mutian He, Yangqiu Song, Kun Xu, Dong Yu

Commonsense knowledge graphs (CKGs) like Atomic and ASER are substantially different from conventional KGs as they consist of much larger number of nodes formed by loosely-structured text, which, though, enables them to handle highly diverse queries in natural language related to commonsense, leads to unique challenges for automatic KG construction methods.

Paper
Code

Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment

no code implementations • 23 Jan 2020 • Kun Xu, Linfeng Song, Yansong Feng, Yan Song, Dong Yu

Existing entity alignment methods mainly vary on the choices of encoding the knowledge graph, but they typically use the same decoding method, which independently chooses the local optimal match for each source entity.

Entity Alignment

Paper
Add Code

Multiplex Word Embeddings for Selectional Preference Acquisition

1 code implementation • IJCNLP 2019 • Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu

Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words.

Word Embeddings Word Similarity

Paper
Code

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29. 98\% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Ranked #4 on Audio-Visual Speech Recognition on LRS2

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Synthesising Expressiveness in Peking Opera via Duration Informed Attention Network

no code implementations • 27 Dec 2019 • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu

This paper presents a method that generates expressive singing voice of Peking opera.

Paper
Add Code

Learning Singing From Speech

no code implementations • 20 Dec 2019 • Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

The proposed algorithm first integrate speech and singing synthesis into a unified framework, and learns universal speaker embeddings that are shareable between speech and singing synthesis tasks.

Speech Synthesis Voice Conversion

Paper
Add Code

A Unified Framework for Speech Separation

no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu

The initial solutions introduced for deep learning based speech separation analyzed the speech signals into time-frequency domain with STFT; and then encoded mixed signals were fed into a deep neural network based separator.

Music Generation Translation +1

Paper
Add Code

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

no code implementations • 4 Dec 2019 • Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

However, the converted singing voice can be easily out of key, showing that the existing approach cannot model the pitch information precisely.

Paper
Add Code

Modeling Fluency and Faithfulness for Diverse Neural Machine Translation

1 code implementation • 30 Nov 2019 • Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu

Neural machine translation models usually adopt the teacher forcing strategy for training which requires the predicted sequence matches ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution.

Machine Translation Translation

Paper
Code

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

no code implementations • 28 Nov 2019 • Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu

In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition.

Language Modelling speech-recognition +1

Paper
Add Code

Joint Parsing and Generation for Abstractive Summarization

2 code implementations • 23 Nov 2019 • Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, xiangyang xue, Chen Li, Dong Yu, Fei Liu

If generating a word can introduce an erroneous relation to the summary, the behavior must be discouraged.

Ranked #27 on Text Summarization on GigaWord

Abstractive Text Summarization Sentence

Paper
Code

Improving Pre-Trained Multilingual Model with Vocabulary Expansion

no code implementations • CONLL 2019 • Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu

However, in multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language.

Language Modelling Machine Reading Comprehension +6

Paper
Add Code

BLCU-NLP at COIN-Shared Task1: Stagewise Fine-tuning BERT for Commonsense Inference in Everyday Narrations

no code implementations • WS 2019 • Chunhua Liu, Dong Yu

This paper describes our system for COIN Shared Task 1: Commonsense Inference in Everyday Narrations.

Machine Reading Comprehension

Paper
Add Code

Mixup-breakdown: a consistency training method for improving generalization of speech separation models

no code implementations • 28 Oct 2019 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Deep-learning based speech separation models confront poor generalization problem that even the state-of-the-art models could abruptly fail when evaluating them in mismatch conditions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

no code implementations • 28 Oct 2019 • Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu

Self-attention networks (SAN) have been introduced into automatic speech recognition (ASR) and achieved state-of-the-art performance owing to its superior ability in capturing long term dependency.

Paper
Add Code

Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations

no code implementations • WS 2019 • Sangwoo Cho, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu

Emerged as one of the best performing techniques for extractive summarization, determinantal point processes select the most probable set of sentences to form a summary according to a probability measure defined by modeling sentence prominence and pairwise repulsion.

Document Summarization Extractive Summarization +3

Paper
Add Code

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks

no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng

Our best systems achieve a relative improvement of 11. 9% and 8. 3% on the TIMIT and WSJ tasks respectively.

Representation Learning

Paper
Add Code

Generating Diverse Story Continuations with Controllable Semantics

no code implementations • WS 2019 • Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel

We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs.