1 code implementation • CoNLL (EMNLP) 2021 • Yang Hou, Houquan Zhou, Zhenghua Li, Yu Zhang, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan
In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword).
no code implementations • Findings (NAACL) 2022 • Le Qi, Yu Zhang, Qingyu Yin, Guidong Zheng, Wen Junjie, Jinlong Li, Ting Liu
In this process, there are two kinds of critical information that are commonly employed: the representation information of original questions and the interactive information between pairs of questions.
no code implementations • COLING 2022 • Meiguo Wang, Benjamin Yao, Bin Guo, Xiaohu Liu, Yu Zhang, Tuan-Hung Pham, Chenlei Guo
To evaluate the performance of a multi-domain goal-oriented Dialogue System (DS), it is important to understand what the users’ goals are for the conversations and whether those goals are successfully achieved.
no code implementations • ECCV 2020 • Song Zhang, Yu Zhang, Zhe Jiang, Dongqing Zou, Jimmy Ren, Bin Zhou
A detail-enhancing branch is proposed to reconstruct daylight-specific features from the domain-invariant representations in a residual manner, regularized by a ranking loss.
1 code implementation • Findings (ACL) 2022 • Le Qi, Shangwen Lv, Hongyu Li, Jing Liu, Yu Zhang, Qiaoqiao She, Hua Wu, Haifeng Wang, Ting Liu
Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, which usually takes clean texts extracted from various formats of documents (e.g., web pages, PDFs, or Word documents) as the information source.
no code implementations • EMNLP 2020 • Zheng Li, Mukul Kumar, William Headden, Bing Yin, Ying Wei, Yu Zhang, Qiang Yang
The recent emergence of multilingual pre-trained language models (mPLMs) has enabled breakthroughs on various downstream cross-lingual transfer (CLT) tasks.
no code implementations • 3 May 2024 • Kaiyuan Chen, Xingzhuo Guo, Yu Zhang, Jianmin Wang, Mingsheng Long
The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains.
no code implementations • 1 May 2024 • Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok
As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge, posing potential risks during deployment.
no code implementations • 1 May 2024 • Ziyi Chen, Xiaolong Wu, Yu Zhang
Specifically, we integrate view-dependent biases in monocular normal priors into the neural implicit representation of the scene.
1 code implementation • 25 Apr 2024 • Toru Lin, Yu Zhang, Qiyang Li, Haozhi Qi, Brent Yi, Sergey Levine, Jitendra Malik
Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing.
1 code implementation • 17 Apr 2024 • Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Fangyuan Kong, Haotian Fan, Yifang Xu, Haoran Xu, Mengduo Yang, Jie Zhou, Jiaze Li, Shijie Wen, Mai Xu, Da Li, Shunyu Yao, Jiazhi Du, WangMeng Zuo, Zhibo Li, Shuai He, Anlong Ming, Huiyuan Fu, Huadong Ma, Yong Wu, Fie Xue, Guozhi Zhao, Lina Du, Jie Guo, Yu Zhang, Huimin Zheng, JunHao Chen, Yue Liu, Dulan Zhou, Kele Xu, Qisheng Xu, Tao Sun, Zhixiang Ding, Yuhang Hu
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), to which various solutions were submitted and evaluated on the KVQ dataset collected from the popular short-form video platform, i.e., the Kuaishou/Kwai Platform.
no code implementations • 15 Apr 2024 • Moyu Zhang, Yongxiang Tang, Jinxin Hu, Yu Zhang
To enhance the model's capacity to capture user interests from historical behavior sequences in each scenario, we develop a ranking framework named the Scenario-Adaptive Fine-Grained Personalization Network (SFPNet), which designs a kind of fine-grained method for multi-scenario personalized recommendations.
1 code implementation • 10 Apr 2024 • Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, Jiawei Han
Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
no code implementations • 4 Apr 2024 • Beibei Wang, Shuang Meng, Lu Zhang, Chenjie Wang, Jingjing Huang, Yao Li, Haojie Ren, Yuxuan Xiao, Yuru Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang
Numerous roadside perception datasets have been introduced to propel advancements in autonomous driving and intelligent transportation systems research and development.
no code implementations • 1 Apr 2024 • Shourya Bose, Yu Zhang, Kibaek Kim
The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting models.
1 code implementation • 25 Mar 2024 • Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science.
1 code implementation • 24 Mar 2024 • Yan Jia, Yuxin Song, Zihou Liu, Qingyin Tan, Fangming Wang, Yu Zhang, Zheli Liu
From the security and privacy perspective, this survey seeks out the new characteristics in CIoT traffic analysis, the state-of-the-art progress in CIoT traffic analysis, and the challenges yet to be solved.
no code implementations • 16 Mar 2024 • Xuehao Wang, Feiyang Ye, Yu Zhang
Furthermore, we introduce a modified SAM (mSAM) for multi-task learning, where we remove the prompt encoder of SAM and use task-specific no-mask embeddings and a mask decoder for each task.
no code implementations • 14 Mar 2024 • Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James T. Kwok, Yu Zhang
Multimodal large language models (MLLMs) have shown impressive reasoning abilities, which, however, are also more vulnerable to jailbreak attacks than their LLM predecessors.
no code implementations • 11 Mar 2024 • Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li
Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression.
no code implementations • 8 Mar 2024 • Jiabao Zhang, Yu Zhang
Mapping plays a crucial role in location and navigation within automatic systems.
1 code implementation • 8 Mar 2024 • Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device.
no code implementations • 29 Feb 2024 • Yu Zhang, Long Wen, Xiangtong Yao, Zhenshan Bing, Linghuan Kong, Wei He, Alois Knoll
Subsequently, the hyperparameters of the Gaussian model are trained with a specially compound kernel, and the Gaussian model's online inferential capability and computational efficiency are strengthened by updating a solitary inducing point derived from new samples, in conjunction with the learned hyperparameters.
no code implementations • 26 Feb 2024 • Yu Zhang, Guangyao Tian, Long Wen, Xiangtong Yao, Liding Zhang, Zhenshan Bing, Wei He, Alois Knoll
This paper proposes a LiDAR-based goal-seeking and exploration framework, addressing the efficiency of online obstacle avoidance in unstructured environments populated with static and moving obstacles.
no code implementations • 26 Feb 2024 • Jinxu Zhang, Yongqi Yu, Yu Zhang
Document Visual Question Answering (DVQA) is a task that involves responding to queries based on the content of images.
1 code implementation • 20 Feb 2024 • Yanzhen Shen, Yu Zhang, Yunyi Zhang, Jiawei Han
Entity Set Expansion, Taxonomy Expansion, and Seed-Guided Taxonomy Construction are three representative tasks that can be used to automatically populate an existing taxonomy with new entities.
no code implementations • 19 Feb 2024 • Yu Zhang, Hui-Ling Zhen, Zehua Pei, Yingzhao Lian, Lihao Yin, Mingxuan Yuan, Bei Yu
In this paper, we propose a novel solver-layer adaptation (SoLA) method, where we introduce a solver as a new layer of the LLM to differentially guide solutions towards satisfiability.
no code implementations • 4 Feb 2024 • Yun Long, Haifeng Luo, Yu Zhang
Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLM to streamline and enhance the analysis process.
1 code implementation • 4 Feb 2024 • Yanbin Wei, Qiushi Huang, James T. Kwok, Yu Zhang
Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications.
no code implementations • 3 Feb 2024 • Yanbin Wei, Shuai Fu, Weisen Jiang, James T. Kwok, Yu Zhang
In this paper, we take the first step in incorporating visual information into graph reasoning tasks and propose a new benchmark GITQA, where each sample is a tuple (graph, image, textual description).
no code implementations • 31 Jan 2024 • Mengxi Liu, Vitor Fortes Rey, Yu Zhang, Lala Shakti Swarup Ray, Bo Zhou, Paul Lukowicz
While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedance can help improve IMU-based fitness tracking through sensor fusion and contrastive learning. To evaluate our methods, we conducted an experiment including six upper body fitness activities performed by ten subjects over five days to collect synchronized data from bio-impedance across two wrists and an IMU on the left wrist. The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is only required at the training phase; the average Macro F1 score with the input of a single IMU was improved by 3.22%, reaching 84.71% compared to the 81.49% of the IMU baseline model.
no code implementations • 29 Jan 2024 • Heyang Gong, Chaochao Lu, Yu Zhang
In the field of causal modeling, potential outcomes (PO) and structural causal models (SCMs) stand as the predominant frameworks.
1 code implementation • 23 Jan 2024 • Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, Jiawei Han
In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities).
no code implementations • 23 Jan 2024 • W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath
In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck.
no code implementations • 22 Jan 2024 • Zelin Gao, Weichen Dai, Yu Zhang
We propose Hierarchical Geometric Guidance (HGG) to incorporate the attachment of Structure from Motion (SfM), namely sparse depth prior, into the scene representations.
no code implementations • 22 Jan 2024 • Yu Zhang, Mei Di, Haozheng Luo, Chenwei Xu, Richard Tzong-Han Tsai
Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data.
no code implementations • 17 Jan 2024 • Feiyang Ye, Baijiong Lin, Xiaofeng Cao, Yu Zhang, Ivor Tsang
In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is for scalar optimization.
no code implementations • 13 Jan 2024 • Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng
Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.
no code implementations • 9 Jan 2024 • Didier Sornette, Yu Zhang
Defining age-dependent transaction flows as the fraction of bitcoins that are traded at a given time and that were born (last traded) at some specific earlier time, we document that the time-averaged transaction flow fraction has a power law dependence as a function of age, with an exponent close to $-1.5$, a value compatible with priority queuing theory.
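The power-law dependence described above can be written compactly (with $f$ the time-averaged transaction flow fraction and $\tau$ the coin age; this notation is assumed for illustration, not taken from the paper):

```latex
f(\tau) \;\propto\; \tau^{-\alpha}, \qquad \alpha \approx 1.5
```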
1 code implementation • 8 Jan 2024 • Pengxin Guo, Pengrong Jin, Ziyue Li, Lei Bai, Yu Zhang
To make the model trained on historical data better adapt to future data in a fully online manner, this paper conducts the first study of the online test-time adaptation techniques for spatial-temporal traffic flow forecasting problems.
1 code implementation • 6 Jan 2024 • Shuhao Chen, Yulong Zhang, Weisen Jiang, Jiangang Lu, Yu Zhang
Recent advances achieved by deep learning models rely on the independent and identically distributed assumption, hindering their applications in real-world scenarios with domain shifts.
no code implementations • 19 Dec 2023 • Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-yan Yeung, James T. Kwok, Yu Zhang
Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks.
no code implementations • 17 Dec 2023 • Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao
Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.
1 code implementation • 13 Dec 2023 • Hong Zhang, Yu Zhang
In this paper, we propose the reversible spiking neural network to reduce the memory cost of intermediate activations and membrane potentials during training.
no code implementations • 8 Dec 2023 • Jinjing Zhu, Feiyang Ye, Qiao Xiao, Pengxin Guo, Yu Zhang, Qiang Yang
Specifically, the proposed LIWUDA method constructs a weight network to assign weights to each instance based on its probability of belonging to common classes, and designs Weighted Optimal Transport (WOT) for domain alignment by leveraging instance weights.
no code implementations • 6 Dec 2023 • Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Yu Zhang, Ke Liu, Liang Hu, Duoqian Miao
The limited data availability and the low signal-to-noise ratio of fMRI signals lead to the challenging task of fMRI-to-image retrieval.
1 code implementation • 2 Dec 2023 • Yu Zhang, Songpengcheng Xia, Lei Chu, Jiarui Yang, Qi Wu, Ling Pei
This paper introduces a novel human pose estimation approach using sparse inertial sensors, addressing the shortcomings of previous methods reliant on synthetic data.
no code implementations • 27 Nov 2023 • Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu
To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs.
1 code implementation • 21 Nov 2023 • Yongliang Lin, Yongzhi Su, Praveen Nathan, Sandeep Inuganti, Yan Di, Martin Sundermeyer, Fabian Manhardt, Didier Stricker, Jason Rambach, Yu Zhang
In this work, we present a novel dense-correspondence method for 6DoF object pose estimation from a single RGB-D image.
no code implementations • 21 Nov 2023 • Shourya Bose, Yu Zhang, Kibaek Kim
The widespread adoption of smart meters provides access to detailed and localized load consumption data, suitable for training building-level load forecasting models.
no code implementations • 6 Nov 2023 • Xulong Wang, Yu Zhang, Menghui Zhou, Tong Liu, Jun Qi, Po Yang
The experimental results show that, compared with direct ROI-based learning, our proposed method is more effective in predicting disease progression.
no code implementations • 6 Nov 2023 • Zhipeng Yao, Yu Zhang, Dazhou Li
To address this contradiction, we propose a novel optimization method that aims to accelerate the convergence rate of SGD without loss of generalization.
no code implementations • 2 Nov 2023 • Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen
Instead, E3 TTS models the temporal structure of the waveform through the diffusion process.
no code implementations • 23 Oct 2023 • Yu Zhang, Yanzhen Shen, Xiusi Chen, Bowen Jin, Jiawei Han
As many academic conferences are overwhelmed by a rapidly increasing number of paper submissions, automatically finding appropriate reviewers for each submission becomes a more urgent need than ever.
no code implementations • 11 Oct 2023 • Carlo da Cunha, Nobuyuki Aoki, David Ferry, Kevin Vora, Yu Zhang
In the realm of quantum-effect devices and materials, two-dimensional electron gases (2DEGs) stand as fundamental structures that promise transformative technologies.
no code implementations • 11 Oct 2023 • Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, Jiawei Han
In this study, we propose OnEFET, where we (1) enrich each node in the ontology structure with two types of extra information: instance information for training sample augmentation and topic information to relate types to contexts, and (2) develop a coarse-to-fine typing algorithm that exploits the enriched information by training an entailment model with contrasting topics and instance-based augmented training samples.
1 code implementation • 11 Oct 2023 • Yu Zhang, Yue Zhang, Leyang Cui, Guohong Fu
In this work, we propose a novel non-autoregressive text editing method to circumvent the above issues, by modeling the edit process with latent CTC alignments.
no code implementations • 10 Oct 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, Jiawei Han
Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting that all types of relations between texts can be captured by these single-view embeddings.
no code implementations • 10 Oct 2023 • Peng Di, Jianguo Li, Hang Yu, Wei Jiang, Wenting Cai, Yang Cao, Chaoyu Chen, Dajun Chen, Hongwei Chen, Liang Chen, Gang Fan, Jie Gong, Zi Gong, Wen Hu, Tingting Guo, Zhichao Lei, Ting Li, Zheng Li, Ming Liang, Cong Liao, Bingchang Liu, Jiachen Liu, Zhiwei Liu, Shaojun Lu, Min Shen, Guangpei Wang, Huan Wang, Zhi Wang, Zhaogui Xu, Jiawei Yang, Qing Ye, Gehao Zhang, Yu Zhang, Zelin Zhao, Xunjin Zheng, Hailian Zhou, Lifu Zhu, Xianying Zhu
It is specifically designed for code-related tasks with both English and Chinese prompts and supports over 40 programming languages.
no code implementations • 3 Oct 2023 • Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok
Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining.
no code implementations • 30 Sep 2023 • Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul Rubenstein, Lukas Zilka, Dian Yu, Zhong Meng, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models.
no code implementations • 26 Sep 2023 • Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen
Most existing learning-based infrared and visible image fusion (IVIF) methods exhibit massive redundant information in the fused images, i.e., yielding edge-blurring effects or making objects unrecognizable to detectors.
no code implementations • 26 Sep 2023 • Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen
Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks.
no code implementations • 25 Sep 2023 • Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang
Particularly, the gradients from the segmentation model are exploited to discover the easily confused region, in which it is difficult to identify the pixel-wise objects from the background in a frame.
no code implementations • 23 Sep 2023 • Yulong Zhang, Shuhao Chen, Weisen Jiang, Yu Zhang, Jiangang Lu, James T. Kwok
However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data.
1 code implementation • 23 Sep 2023 • Xiang Geng, Zhejian Lai, Yu Zhang, Shimin Tao, Hao Yang, Jiajun Chen, ShuJian Huang
We generate pseudo MQM data using parallel data from the WMT translation task.
no code implementations • 21 Sep 2023 • Ping Li, Yu Zhang, Li Yuan, Huaxin Xiao, Binbin Lin, Xianghua Xu
Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.
no code implementations • 21 Sep 2023 • Ping Li, Yu Zhang, Li Yuan, Xianghua Xu
Referring Video Object Segmentation (RVOS) requires segmenting the object in video referred by a natural language query.
1 code implementation • 21 Sep 2023 • Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu
Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.
no code implementations • 21 Sep 2023 • Riko I Made, Jing Lin, Jintao Zhang, Yu Zhang, Lionel C. H. Moh, Zhaolin Liu, Ning Ding, Sing Yang Chiam, Edwin Khoo, Xuesong Yin, Guangyuan Wesley Zheng
Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries.
no code implementations • 19 Sep 2023 • Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
no code implementations • 14 Sep 2023 • Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang
We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.
1 code implementation • ICCV 2023 • Hao Chen, Chenyuan Qu, Yu Zhang, Chen Chen, Jianbo Jiao
It is understandable as the model is designed to learn paired mapping (e.g., from a noisy image to its clean version).
no code implementations • 4 Sep 2023 • Zhexiao Xiong, Feng Qiao, Yu Zhang, Nathan Jacobs
We introduce a novel training strategy for stereo matching and optical flow estimation that utilizes image-to-image translation between synthetic and real image domains.
1 code implementation • 3 Sep 2023 • Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
1 code implementation • NeurIPS 2023 • Xiaolong Wang, Runsen Xu, Zuofan Cui, Zeyu Wan, Yu Zhang
In this paper, we introduce a novel approach to fine-grained cross-view geo-localization.
no code implementations • 30 Aug 2023 • Yukun Su, Ruizhou Sun, Xin Shu, Yu Zhang, Qingyao Wu
Multi-Object Tracking (MOT) is a crucial computer vision task that aims to predict the bounding boxes and identities of objects simultaneously.
1 code implementation • 23 Aug 2023 • Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu, James T. Kwok
Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields.
1 code implementation • 16 Aug 2023 • Ben Chen, Xuechao Zou, Kai Li, Yu Zhang, Junliang Xing, Pin Tao
Lake extraction from remote sensing imagery is a complex challenge due to the varied lake shapes and data noise.
no code implementations • 15 Aug 2023 • Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok
Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification.
1 code implementation • 8 Aug 2023 • Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis.
1 code implementation • 8 Aug 2023 • Ben Chen, Xuechao Zou, Yu Zhang, Jiayu Li, Kai Li, Junliang Xing, Pin Tao
LEFormer contains three main modules: CNN encoder, Transformer encoder, and cross-encoder fusion.
no code implementations • 2 Aug 2023 • Zhenyuan Ning, Yixiao Mao, Qianjin Feng, Shengzhou Zhong, Yu Zhang
The complex scenario of ultrasound images, in which adjacent tissues (i.e., background) share similar intensity with, and even contain richer texture patterns than, the lesion region (i.e., foreground), poses a unique challenge for accurate lesion segmentation.
1 code implementation • 27 Jul 2023 • Jing Xiong, Tianqi Hong, Dongbo Zhao, Yu Zhang
Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house.
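The disaggregation task described above is conventionally formulated as follows (a standard NILM formulation given for illustration; the symbols are assumptions, not taken from the paper): the aggregate signal is modeled as the sum of per-appliance consumptions plus noise,

```latex
P_{\text{total}}(t) \;=\; \sum_{i=1}^{N} p_i(t) \;+\; e(t),
```

where $p_i(t)$ is the power drawn by appliance $i$ at time $t$ and $e(t)$ is measurement noise; NILM recovers the $p_i(t)$ (and appliance on/off states) from $P_{\text{total}}(t)$ alone.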
no code implementations • 25 Jul 2023 • Enqiang Zhu, Yu Zhang, Shengzhi Wang, Darren Strash, Chanjuan Liu
Given a graph, the minimum dominating set (MinDS) problem is to identify a smallest set $D$ of vertices such that every vertex not in $D$ is adjacent to at least one vertex in $D$.
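The definition above is easy to state operationally. As a minimal sketch (a verifier only, not the paper's solver; the graph encoding is an assumption for illustration), the following checks whether a candidate set $D$ dominates a graph given as an adjacency list:

```python
# Verify whether a vertex set D is a dominating set of an undirected graph.
# adj maps each vertex to the set of its neighbors.
def is_dominating_set(adj, D):
    D = set(D)
    # Every vertex must be in D or adjacent to at least one vertex in D.
    return all(v in D or not D.isdisjoint(adj[v]) for v in adj)

# Toy graph: a path 0-1-2-3-4.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_dominating_set(adj, {1, 3}))  # True: 1 and 3 cover all five vertices
print(is_dominating_set(adj, {0}))     # False: vertices 2, 3, 4 are uncovered
```

Finding a *minimum* such $D$ is NP-hard, which is why heuristic and exact solvers like the one proposed in the paper are of interest.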
no code implementations • 20 Jul 2023 • Ian P. Roberts, Yu Zhang, Tawfik Osman, Ahmed Alkhateeb
Noteworthy strides continue to be made in the development of full-duplex millimeter wave (mmWave) communication systems, but most of this progress has been built on theoretical models and validated through simulation.
no code implementations • 20 Jul 2023 • Qian Wan, Siying Hu, Yu Zhang, Piaohong Wang, Bo Wen, Zhicong Lu
This collaborative process champions the human in a dominant role, in addition to mixed and shifting levels of initiative that exist between humans and LLMs.
1 code implementation • 24 Jun 2023 • Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.
no code implementations • 22 Jun 2023 • Yu Zhang, Hao Zeng, Bowen Ma, Wei Zhang, Zhimeng Zhang, Yu Ding, Tangjie Lv, Changjie Fan
The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results.
no code implementations • 20 Jun 2023 • Yu Zhang, Long Cheng, Xiuze Xia, Haoyu Zhang
The proposed approach involves the estimation of full stiffness matrices from human demonstrations, which are then combined with sensed forces and motion information to create a model using the non-parametric method.
no code implementations • 13 Jun 2023 • Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu
However, finetuning all parameters from the self-supervised learned model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks scale.
no code implementations • 13 Jun 2023 • Xu Han, Bin Guo, Yoon Jung, Benjamin Yao, Yu Zhang, Xiaohu Liu, Chenlei Guo
Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency.
no code implementations • 12 Jun 2023 • Yu Zhang, Jia Li, Jie Ding, Xiang Li
Learning and analysis of network robustness, including controllability robustness and connectivity robustness, is critical for various networked systems against attacks.
1 code implementation • 7 Jun 2023 • Xiusi Chen, Yu Zhang, Jinliang Deng, Jyun-Yu Jiang, Wei Wang
Few-shot question answering (QA) aims at precisely discovering answers to a set of questions from context passages while only a few training samples are available.
no code implementations • 6 Jun 2023 • Zhishan Zhao, Jingyue Gao, Yu Zhang, Shuguang Han, Siyuan Lou, Xiang-Rong Sheng, Zhe Wang, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng
In this architecture, the pre-ranking model is expected to be a lightweight approximation of the ranking model, which handles more candidates with strict latency requirements.
1 code implementation • CVPR 2023 • Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang
LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.
1 code implementation • 1 Jun 2023 • Han Cui, Shangzhan Li, Yu Zhang, Qi Shi
The generation of explanation graphs is a significant task that aims to produce explanation graphs in response to user input, revealing the internal reasoning process.
1 code implementation • 1 Jun 2023 • Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath
In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.
1 code implementation • 1 Jun 2023 • Weisen Jiang, Yu Zhang, James T. Kwok
Combining meta-learning the prompt pool and RepVerb, we propose MetaPrompter for effective structured prompting.
no code implementations • 30 May 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.
no code implementations • 25 May 2023 • Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays
We evaluate the proposed model on a set of 12 languages, and achieve an average 11.9% relative improvement in WER over the baseline.
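As a quick aside on the metric quoted above, "relative improvement in WER" is the fraction of baseline errors removed. A minimal sketch; the 20.0% / 17.62% figures below are illustrative assumptions, not numbers from the paper:

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER improvement: share of the baseline error that is removed."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

# e.g. a baseline WER of 20.0% reduced to 17.62% is ~11.9% relative
print(round(relative_improvement(20.0, 17.62), 1))  # → 11.9
```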
1 code implementation • 23 May 2023 • Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts.
no code implementations • 23 May 2023 • Yu Zhang, Hao Cheng, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, Jianfeng Gao
Scientific literature understanding tasks have gained significant attention due to their potential to accelerate scientific discovery.
1 code implementation • 22 May 2023 • Zhangming Chan, Yu Zhang, Shuguang Han, Yong Bai, Xiang-Rong Sheng, Siyuan Lou, Jiacen Hu, Baolin Liu, Yuning Jiang, Jian Xu, Bo Zheng
However, we observe that a well-trained CVR prediction model often performs sub-optimally during sales promotions.
no code implementations • 20 May 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, Jiawei Han
A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships).
1 code implementation • 13 May 2023 • Yu Zhang, Siqi Chen, Mingdao Wang, Xianlin Zhang, Chuang Zhu, Yue Zhang, Xueming Li
Extensive experiments demonstrate that our method outperforms other methods in maintaining temporal consistency both qualitatively and quantitatively.
1 code implementation • 10 May 2023 • Guoqing Yang, Chuang Zhu, Yu Zhang
Weakly supervised semantic segmentation (WSSS) based on image-level labels is challenging since it is hard to obtain complete semantic regions.
1 code implementation • 8 May 2023 • Jing Xiong, Yu Zhang
In this paper, we propose a unifying deep learning framework for load forecasting, which includes time-varying feature weighting, hierarchical temporal attention, and feature-reinforced error correction.
1 code implementation • 4 May 2023 • Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.
Ranked #4 on Question Answering on HotpotQA
1 code implementation • 3 May 2023 • Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Songfang Huang, Fei Huang, Zhangzikang Li, Yu Zhang
In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.
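The paper's TSG module is not reproduced here, but the core idea of using multi-head attention as the neighborhood-aggregation step of a GNN can be sketched in plain NumPy. All names, the identity Q/K/V projections, and the toy graph are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mha_graph_layer(X, adj, num_heads=2):
    """Aggregate each node's neighborhood with multi-head scaled dot-product
    attention. X: (n, d) node features; adj: (n, n) 0/1 adjacency.
    Identity projections keep the sketch minimal."""
    n, d = X.shape
    dh = d // num_heads
    A = adj + np.eye(n)                  # let every node attend to itself
    mask = np.where(A > 0, 0.0, -1e9)    # block attention to non-neighbors
    heads = []
    for h in range(num_heads):
        Q = K = V = X[:, h * dh:(h + 1) * dh]
        scores = Q @ K.T / np.sqrt(dh) + mask
        heads.append(softmax(scores) @ V)
    return np.concatenate(heads, axis=1)

X = np.random.default_rng(0).normal(size=(4, 8))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
out = mha_graph_layer(X, adj)
print(out.shape)  # (4, 8)
```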
no code implementations • 28 Apr 2023 • Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok
Sharpness-aware minimization (SAM), which searches for flat minima by min-max optimization, has been shown to be useful in improving model generalization.
no code implementations • 27 Apr 2023 • Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang
Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and Speech Translation (ST) performance.
Automatic Speech Recognition (ASR) +2
no code implementations • 25 Apr 2023 • Yu Zhang, Lin Zhang
In this study, we investigated nine promising models to evaluate their performance in pavement surface crack detection in terms of model accuracy, computational complexity, and model stability.
no code implementations • 20 Apr 2023 • Chenglu Sun, Yichi Zhang, Yu Zhang, Ziling Lu, Jingbin Liu, Sijia Xu, Weidong Zhang
We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in AMP games.
no code implementations • 16 Apr 2023 • Yu Zhang, Huaming Chen, Wei Bao, Zhongzheng Lai, Zao Zhang, Dong Yuan
Being able to identify and track all the pedestrians in the dense crowd scene with computer vision approaches is a typical challenge in this field, also known as the Multiple Object Tracking (MOT) challenge.
1 code implementation • 13 Apr 2023 • Siqi Chen, Xueming Li, Xianlin Zhang, Mingdao Wang, Yu Zhang, Yue Zhang
Previous methods search for correspondences across the entire reference image, and this type of global matching is prone to mismatches.
1 code implementation • 6 Apr 2023 • Yu Zhang, Xiaoguang Di, Junde Wu, Rao Fu, Yong Li, Yue Wang, Yanwu Xu, Guohui YANG, Chunhui Wang
In this paper, to make the learning easier in low-light image enhancement, we introduce FLW-Net (Fast and LightWeight Network) and two relative loss functions.
no code implementations • 4 Apr 2023 • Akkamahadevi Hanni, Andrew Boateng, Yu Zhang
The goal of SEP is to find behaviors that align with human expectations while adhering to the specified safety criterion.
no code implementations • 2 Apr 2023 • Sicong Liang, Junchao Tian, Shujun Yang, Yu Zhang
The key challenge of FL is the heterogeneity of local data in different clients, such as heterogeneous label distribution and feature shift, which could lead to significant performance degradation of the learned models.
no code implementations • 27 Mar 2023 • Siqi Chen, Xueming Li, Xianlin Zhang, Mingdao Wang, Yu Zhang, Jiatong Han, Yue Zhang
Exemplar-based video colorization is an essential technique for applications like old movie restoration.
no code implementations • 22 Mar 2023 • Guoliang You, Xiaomeng Chu, Yifan Duan, Jie Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang
In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged.
no code implementations • 17 Mar 2023 • Yulong Zhang, Shuhao Chen, Yu Zhang, Jiangang Lu
The generated samples can well simulate the data distribution of the target domain and help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance.
1 code implementation • 3 Mar 2023 • Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani
Experiments show that Miipher (i) is robust against various kinds of audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.
no code implementations • 2 Mar 2023 • Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages.
Automatic Speech Recognition (ASR) +3
1 code implementation • 28 Feb 2023 • Yu Zhang, Junle Yu, Xiaolin Huang, Wenhui Zhou, Ji Hou
Different from previous methods that only use geometry representation, our module is specifically designed to effectively correlate color into geometry for the point cloud registration task.
1 code implementation • 21 Feb 2023 • Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han
Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews).
no code implementations • 17 Feb 2023 • Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman
In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages.
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Feb 2023 • Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran
We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method to inject large-scale unpaired text into ILM during E2E training which improves rare-word speech recognition.
1 code implementation • 7 Feb 2023 • Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, Jiawei Han
Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track their interested fields of study rather than drowning in the whole literature.
no code implementations • 4 Feb 2023 • Haojie Ren, Sha Zhang, Sugang Li, Yao Li, Xinchen Li, Jianmin Ji, Yu Zhang, Yanyong Zhang
In this paper, we propose TrajMatch -- the first system that can automatically calibrate for roadside LiDARs in both time and space.
no code implementations • 3 Feb 2023 • Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays
The FM encoder adapter and decoder are then finetuned to the target domain with a small amount of supervised in-domain data.
no code implementations • 28 Jan 2023 • Kejun Chen, Yu Zhang
Probabilistic power flow (PPF) analysis is critical to power system operation and planning.
no code implementations • 19 Jan 2023 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman
In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize the other languages.
Automatic Speech Recognition (ASR) +1
no code implementations • 17 Jan 2023 • Yu Zhang, Yue Wang, Zhi Tian, Geert Leus, Gong Zhang
This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance present Toeplitz and Hankel structures, respectively.
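The Toeplitz and Hankel structures stated above can be checked numerically. This is an illustrative sketch of the structure tests only (not the paper's retrieval method), using a small made-up matrix:

```python
import numpy as np

def is_toeplitz(M, tol=1e-9):
    """Constant along each diagonal: M[i, j] == M[i+1, j+1]."""
    return np.allclose(M[:-1, :-1], M[1:, 1:], atol=tol)

def is_hankel(M, tol=1e-9):
    """Constant along each anti-diagonal: M[i, j] == M[i+1, j-1]."""
    return np.allclose(M[:-1, 1:], M[1:, :-1], atol=tol)

# A Hankel matrix flipped left-right is Toeplitz, and vice versa.
H = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]], dtype=float)
print(is_hankel(H), is_toeplitz(np.fliplr(H)))  # True True
```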
no code implementations • 13 Jan 2023 • Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang Li, Yanyong Zhang
To this end, we propose OA-BEV, a network that can be plugged into the BEV-based 3D object detection framework to bring out the objects by incorporating object-aware pseudo-3D features and depth features.
no code implementations • 5 Jan 2023 • Zihua Wang, Xu Yang, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Weiwei Sun, Ming Yan, Songfang Huang, Fei Huang, Yu Zhang
We design a novel global-local Transformer named Ada-ClustFormer (ACF) to generate captions.
no code implementations • ICCV 2023 • Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang
To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.
no code implementations • CVPR 2023 • ZHIYANG YU, Yu Zhang, Dongqing Zou, Xijun Chen, Jimmy S. Ren, Shunqing Ren
Continuous-time video frame interpolation is a fundamental technique in computer vision for its flexibility in synthesizing motion trajectories and novel video frames at arbitrary intermediate time steps.
no code implementations • ICCV 2023 • Zelin Gao, Weichen Dai, Yu Zhang
Neural Radiance Fields have shown great potential to synthesize novel views with only a few discrete image observations of the world.
1 code implementation • ICCV 2023 • Yunshan Qi, Lin Zhu, Yu Zhang, Jia Li
To solve this problem, we propose a novel Event-Enhanced NeRF (E2NeRF) by utilizing the combination data of a bio-inspired event camera and a standard RGB camera.
1 code implementation • CVPR 2023 • Junle Yu, Luwei Ren, Yu Zhang, Wenhui Zhou, Lili Lin, Guojun Dai
Recently, incorporating Transformers into point cloud feature representation has achieved huge success; such methods usually adopt a self-attention module to learn intra-point-cloud features first, then utilize a cross-attention module to perform feature exchange between input point clouds.
1 code implementation • 19 Dec 2022 • Qiao Xiao, Boqian Wu, Yu Zhang, Shiwei Liu, Mykola Pechenizkiy, Elena Mocanu, Decebal Constantin Mocanu
The receptive field (RF), which determines the region of time series to be "seen" and used, is critical to improve the performance for time series classification (TSC).
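As background for the RF discussion (a standard recurrence, not the paper's method), the receptive field of a stack of (dilated) convolutions can be computed as:

```python
def receptive_field(layers):
    """Receptive field of a 1D conv stack.
    layers: list of (kernel_size, stride, dilation) tuples."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # each layer widens the RF by its span
        jump *= s                  # stride compounds the step between taps
    return rf

# three dilated conv layers, kernel 3, stride 1, dilations 1, 2, 4
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # → 15
```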
no code implementations • 19 Dec 2022 • Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages.
Automatic Speech Recognition (ASR) +7
1 code implementation • 12 Dec 2022 • Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.
no code implementations • 7 Dec 2022 • Kejun Chen, Shourya Bose, Yu Zhang
Non-convex AC optimal power flow (AC-OPF) is a fundamental optimization problem in power system analysis.
no code implementations • 5 Dec 2022 • Yu Zhang, Yunyi Zhang, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han
Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.
1 code implementation • 2 Dec 2022 • Tao Zhou, Yi Zhou, Chen Gong, Jian Yang, Yu Zhang
In this paper, we propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.
no code implementations • 29 Nov 2022 • Junde Wu, Huihui Fang, Yehui Yang, Yu Zhang, Haoyi Xiong, Huazhu Fu, Yanwu Xu
In this paper, we call them expert-level classification.
no code implementations • 24 Nov 2022 • Yueqing Sun, Yu Zhang, Le Qi, Qi Shi
In this paper, we aim to address the above limitation by leveraging the implicit knowledge stored in PrLMs and propose a two-stage prompt-based unsupervised commonsense question answering framework (TSGP).
no code implementations • CVPR 2023 • Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang
(2) Under-utilization of the unmasked tokens: CMLM primarily focuses on the masked token but it cannot simultaneously leverage other tokens to learn vision-language associations.
no code implementations • 16 Nov 2022 • Juan Zha, Zheng Li, Ying WEI, Yu Zhang
However, most prior works assume that all the tasks are sampled from a single data source, which cannot adapt to real-world scenarios where tasks are heterogeneous and lie in different distributions.
no code implementations • 13 Nov 2022 • Xuetong Wang, Kanhao Zhao, Rong Zhou, Alex Leow, Ricardo Osorio, Yu Zhang, Lifang He
Normative modeling is an emerging and promising approach to effectively study disorder heterogeneity in individual participants.
1 code implementation • 7 Nov 2022 • Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang
Instead of extracting features from the tensor program itself, TLP extracts features from the schedule primitives.
1 code implementation • 6 Nov 2022 • Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han
In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set.
no code implementations • 2 Nov 2022 • Yu Zhang, Mitchell Bucklew
In this paper, we introduce Max Markov Chain (MMC), a novel representation for a useful subset of High-order Markov Chains (HMCs) with sparse correlations among the states.
no code implementations • 2 Nov 2022 • Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee
We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.
2 code implementations • 1 Nov 2022 • Junde Wu, Rao Fu, Huihui Fang, Yu Zhang, Yehui Yang, Haoyi Xiong, Huiying Liu, Yanwu Xu
Inspired by the success of DPM, we propose the first DPM-based model for general medical image segmentation tasks, which we name MedSegDiff.
no code implementations • 31 Oct 2022 • Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno
In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.
no code implementations • 29 Oct 2022 • Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani
We propose a novel method to accelerate training and inference process of recurrent neural network transducer (RNN-T) based on the guidance from a co-trained connectionist temporal classification (CTC) model.
no code implementations • 28 Oct 2022 • Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding
Adapting a neural text-to-speech (TTS) model to a target speaker typically involves fine-tuning most if not all of the parameters of a pretrained multi-speaker backbone model.
1 code implementation • 28 Oct 2022 • Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang
Audio captioning aims to generate text descriptions of audio clips.
no code implementations • 27 Oct 2022 • Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran
This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models.
Automatic Speech Recognition (ASR) +2
1 code implementation • 27 Oct 2022 • Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang
Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona.
no code implementations • 18 Oct 2022 • Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen
First, we show that by combining speech representations with byte-level text representations and using language embeddings, we can dramatically reduce the Character Error Rate (CER) on languages with no supervised speech from 64.8% to 30.8%, a relative reduction of 53%.
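CER, the metric quoted above, is the edit distance between hypothesis and reference, normalized by reference length. A minimal sketch, not the paper's implementation:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via dynamic programming, O(len(ref) * len(hyp))."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edits normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(round(cer("speech", "speach"), 3))  # → 0.167
```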
1 code implementation • 14 Oct 2022 • Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-Yi Lee
Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks.
no code implementations • 13 Oct 2022 • Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman
In addition, we explore JOIST using a streaming E2E model with an order of magnitude more data, which are also novelties compared to previous works.
no code implementations • 11 Oct 2022 • Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman
Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Oct 2022 • Zixiao Wang, Yuluo Guo, Jin Zhao, Yu Zhang, Hui Yu, Xiaofei Liao, Biao Wang, Ting Yu
In this paper, we propose a Graph Inception Diffusion Networks (GIDN) model.
Ranked #1 on Link Property Prediction on ogbl-ddi
no code implementations • 3 Oct 2022 • Yu Zhang, Li Liu, Chen Diao, Ning Cai
Computer models have been extensively adopted to overcome the time limitations of studying language evolution by transforming language theory into physical modeling mechanisms, which helps to explore the general laws of evolution.
2 code implementations • British Machine Vision Conference 2022 • Pengxin Guo, Jinjing Zhu, Yu Zhang
To solve this problem, we propose a Selective Partial Domain Adaptation (SPDA) method, which selects useful data for the adaptation to the target domain.
Ranked #1 on Partial Domain Adaptation on VisDA2017
no code implementations • 26 Sep 2022 • Gabriel Intriago, Andres Intriago, Charalambos Konstantinou, Yu Zhang
This paper proposes a strategy based on observers and residuals for detecting internal faults in grid-forming inverters with power-sharing coordination.
no code implementations • 26 Sep 2022 • Xinnan Ding, Shan Du, Yu Zhang, Kejun Wang
The critical goal of gait recognition is to acquire the inter-frame walking habit representation from the gait sequences.
no code implementations • 25 Sep 2022 • Gabriel Intriago, Yu Zhang
Instance selection is a vital technique for energy big data analytics.
no code implementations • 22 Sep 2022 • Shengcai Liu, Yu Zhang, Ke Tang, Xin Yao
Hopefully, this work would help with a better understanding of the strengths and weaknesses of NCO and provide a comprehensive evaluation protocol for further benchmarking NCO approaches in comparison to other approaches.
no code implementations • 21 Sep 2022 • Yu Zhang, Bing-Zhao Li
In this paper, we propose and design the definition of the discrete linear canonical transform on graphs (GLCT), which is an extension of the discrete linear canonical transform (DLCT), just as the graph Fourier transform (GFT) is an extension of the discrete Fourier transform (DFT).
no code implementations • 9 Sep 2022 • Yu Zhang, Tawfik Osman, Ahmed Alkhateeb
Furthermore, a hardware proof-of-concept prototype based on mmWave phased arrays is built and used to implement and evaluate the developed online beam learning solutions in realistic scenarios.
no code implementations • 26 Aug 2022 • Yu Zhang, Shuaifei Chen, Jiayi Zhang
Cell-free massive multiple-input-multiple-output is promising to meet the stringent quality-of-experience (QoE) requirements of railway wireless communications by coordinating many successional access points (APs) to serve the onboard users coherently.
1 code implementation • journal 2022 • Shujun Yang, Yu Zhang, Yuheng Jia, and Weijia Zhang
By taking advantage of the local manifold structure, a Laplacian graph is constructed from the superpixels to ensure that a typical pixel should be similar to its neighbors within the same superpixel.
no code implementations • 16 Aug 2022 • Enqiang Zhu, Yu Zhang, Chanjuan Liu
The maximum independent set (MIS) problem, a classical NP-hard problem with extensive applications in various areas, aims to find the largest set of vertices with no edge among them.
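To illustrate the problem statement (a textbook greedy heuristic, not the paper's algorithm), a maximal, though not necessarily maximum, independent set can be built by repeatedly taking a minimum-degree vertex:

```python
def greedy_mis(adj):
    """Greedy maximal independent set via the min-degree heuristic.
    adj: dict mapping vertex -> set of neighbor vertices."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    result = set()
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))   # pick a min-degree vertex
        result.add(v)
        removed = adj.pop(v) | {v}                # drop v and its neighbors
        for u in list(adj):
            if u in removed:
                adj.pop(u)
            else:
                adj[u] -= removed
    return result

# 5-cycle: the maximum independent set has size 2
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(len(greedy_mis(cycle)))  # → 2
```

The heuristic is only an approximation; MIS is NP-hard in general, which is what motivates the dedicated algorithms studied in work like the above.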
no code implementations • 7 Aug 2022 • Zesheng Ye, Lina Yao, Yu Zhang, Sylvia Gustin
Recent studies demonstrate the use of a two-stage supervised framework to generate images that depict human perception to visual stimuli from EEG, referring to EEG-visual reconstruction.
1 code implementation • 5 Aug 2022 • Yongxiang Tang, Wentao Bai, Guilin Li, Xialong Liu, Yu Zhang
In this paper, we propose the Customizable Recall@N Optimization Loss (CROLoss), a loss function that can directly optimize the Recall@N metrics and is customizable for different choices of N. The proposed CROLoss formulation defines a more generalized loss function space, covering most of the conventional loss functions as special cases.
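For reference, the Recall@N metric that CROLoss targets can be computed directly; a minimal sketch with made-up data, not the paper's code:

```python
def recall_at_n(ranked_items, relevant, n):
    """Fraction of relevant items that appear in the top-n of the ranking."""
    top = set(ranked_items[:n])
    return len(top & set(relevant)) / len(relevant)

ranked = ["a", "b", "c", "d", "e"]   # model's ranking, best first
relevant = {"b", "e"}                # ground-truth positives
print(recall_at_n(ranked, relevant, 2))  # → 0.5
print(recall_at_n(ranked, relevant, 5))  # → 1.0
```

Because the top-n cutoff is a hard, non-differentiable selection, such a metric cannot be optimized by gradient descent directly, which is the gap a surrogate loss like CROLoss is designed to fill.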
1 code implementation • 5 Aug 2022 • Junde Wu, Yu Zhang, Rao Fu, Yuanpei Liu, Jing Gao
Then, to ensure that the method adapts to the dynamic and unseen person flow, we propose Graph Convolutional Network (GCN) with a simple Nearest Neighbor (NN) strategy to accurately cluster the instances of CSG.
no code implementations • 3 Aug 2022 • Qibing Bai, Tom Ko, Yu Zhang
In human speech, the attitude of a speaker cannot be fully expressed only by the textual content.
1 code implementation • 18 Jul 2022 • Xinyu Shi, Dong Wei, Yu Zhang, Donghuan Lu, Munan Ning, Jiashun Chen, Kai Ma, Yefeng Zheng
A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images.
Ranked #4 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)
no code implementations • 16 Jul 2022 • Jiahao Qi, Zhiqiang Gong, Xingyue Liu, Kangcheng Bin, Chen Chen, YongQian Li, Wei Xue, Yu Zhang, Ping Zhong
Deep learning methodology contributes a lot to the development of hyperspectral image (HSI) analysis community.
1 code implementation • 7 Jul 2022 • Jiashun Chen, Donghuan Lu, Yu Zhang, Dong Wei, Munan Ning, Xinyu Shi, Zhe Xu, Yefeng Zheng
In this study, we propose a novel Deformer module along with a multi-scale framework for the deformable image registration task.
no code implementations • 18 Jun 2022 • Zhanghao Sun, Yu Zhang, Yicheng Wu, Dong Huo, Yiming Qian, Jian Wang
We propose three applications using our redundancy codes: (1) Self error-correction for SL imaging under strong ambient light, (2) Error detection for adaptive reconstruction under global illumination, and (3) Interference filtering with device-specific projection sequence encoding, especially for event camera-based SL and light curtain devices.
no code implementations • 4 Jun 2022 • Xiaochen Li, Xin Song, Pengjia Yuan, Xialong Liu, Yu Zhang
In this paper, we focus on a new type of user interest, i. e., user retargeting interest.
1 code implementation • 25 May 2022 • Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark.
Automatic Speech Recognition (ASR) +6
no code implementations • 24 May 2022 • Shourya Bose, Sifat Chowdhury, Yu Zhang
Mobile energy storage systems (MESS) offer great operational flexibility to enhance the resiliency of distribution systems in an emergency condition.
1 code implementation • 20 May 2022 • Bowen Jin, Yu Zhang, Qi Zhu, Jiawei Han
In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: Some nodes are associated with rich textual information, while others are not; (2) diversity of types: Nodes and edges of multiple types form a heterogeneous network structure.
no code implementations • 19 May 2022 • Yu Zhang, Zhiqiang Gong, Yichuang Zhang, YongQian Li, Kangcheng Bin, Jiahao Qi, Wei Xue, Ping Zhong
Transferable adversarial attack is always in the spotlight since deep learning models have been demonstrated to be vulnerable to adversarial samples.
1 code implementation • 18 May 2022 • Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang
Direct Speech-to-speech translation (S2ST) has drawn more and more attention recently.
1 code implementation • NAACL 2022 • Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han
Discovering latent topics from text corpora has been studied for decades.
no code implementations • 3 May 2022 • Yun Li, Zhe Liu, Lina Yao, Molly Lucas, Jessica J. M. Monaghan, Yu Zhang
With the development of digital technology, machine learning has paved the way for the next generation of tinnitus diagnoses.
no code implementations • 2 May 2022 • Kejun Chen, Yu Zhang
With an increasing high penetration of solar photovoltaic generation in electric power grids, voltage phasors and branch power flows experience more severe fluctuations.
no code implementations • 29 Apr 2022 • Shourya Bose, Yu Zhang
Distributed energy storage systems (ESSs) can be efficiently leveraged for load restoration (LR) for a microgrid (MG) in island mode.
no code implementations • 27 Apr 2022 • Houliang Zhou, Lifang He, Yu Zhang, Li Shen, Brian Chen
Identification of brain regions related to the specific neurological disorders are of great importance for biomarker and diagnostic studies.