Search Results for author: Hao Zhang

Found 406 papers, 149 papers with code

Friendly Topic Assistant for Transformer Based Abstractive Summarization

no code implementations EMNLP 2020 Zhengjue Wang, Zhibin Duan, Hao Zhang, Chaojie Wang, Long Tian, Bo Chen, Mingyuan Zhou

Abstractive document summarization is a comprehensive task including document understanding and summary generation, in which area Transformer-based models have achieved the state-of-the-art performance.

Abstractive Text Summarization Document Summarization +2

Incorporating Instructional Prompts into a Unified Generative Framework for Joint Multiple Intent Detection and Slot Filling

1 code implementation COLING 2022 Yangjun Wu, Han Wang, Dongxiang Zhang, Gang Chen, Hao Zhang

Specifically, we design five types of templates as instructional prompts; each template includes a question that acts as the driver to teach UGEN to grasp the paradigm, options that list the candidate intents or slots to reduce the answer search space, and a context that denotes the original utterance.

Intent Detection Question Answering +3

BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging

1 code implementation ECCV 2020 Ziheng Cheng, Ruiying Lu, Zhengjue Wang, Hao Zhang, Bo Chen, Ziyi Meng, Xin Yuan

This measurement and the modulation masks are fed into our Recurrent Neural Network (RNN) to reconstruct the desired high-speed frames.

WordNet Troponymy and Extraction of “Manner-Result” Relations

no code implementations GWC 2018 Aliaksandr Huminski, Hao Zhang

The procedure of extraction includes three steps and the results are based on the analysis of the whole set of verbs in WordNet.

Multi-view Content-aware Indexing for Long Document Retrieval

no code implementations 23 Apr 2024 Kuicai Dong, Derrick Goh Xin Deik, Yi Quan Lee, Hao Zhang, Xiangyang Li, Cong Zhang, Yong liu

As they do not consider content structures, the resultant chunks can exclude vital information or include irrelevant content.

Chunking Question Answering +1

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation 12 Apr 2024 Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

SSwsrNet: A Semi-Supervised Few-Shot Learning Framework for Wireless Signal Recognition

no code implementations 3 Apr 2024 Hao Zhang, Fuhui Zhou, Qihui Wu, Naofal Al-Dhahir

Moreover, a modular semi-supervised learning method that combines labeled and unlabeled data using MixMatch is exploited to further improve the classification performance under few-sample conditions.

Classification Few-Shot Learning

Toward Inference-optimal Mixture-of-Expert Large Language Models

no code implementations 3 Apr 2024 Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang

Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation on the model size and number of tokens?

DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly

no code implementations 1 Apr 2024 Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object.

Test-time Adaptation

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

1 code implementation 26 Mar 2024 YuQi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li

Furthermore, to control the parameters and computational cost brought by the increase in the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network.

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

1 code implementation 25 Mar 2024 Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Through the text semantic encoder and semantic interaction fusion decoder, Text-IF is accessible to the all-in-one infrared and visible image degradation-aware processing and the interactive flexible fusion outcomes.

Empowering Segmentation Ability to Multi-modal Large Language Models

no code implementations 21 Mar 2024 YuQi Yang, Peng-Tao Jiang, Jing Wang, Hao Zhang, Kai Zhao, Jinwei Chen, Bo Li

Multi-modal large language models (MLLMs) can understand image-language prompts and demonstrate impressive reasoning ability.

Dialogue Generation Segmentation +1

TAPTR: Tracking Any Point with Transformers as Detection

no code implementations 19 Mar 2024 Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang

Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.

object-detection Object Detection +2

Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

no code implementations 19 Mar 2024 Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei

To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL learning framework to learn transferable knowledge from various domains to largely benefit the target downstream task.

Language Modelling Time Series +1

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions

no code implementations 14 Mar 2024 Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang

To bridge this gap, we introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions (AVIs), including four types of image-based AVIs, ten types of text-based AVIs, and nine types of content bias AVIs (such as gender, violence, cultural, and racial biases, among others).

Fairness Language Modelling

Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

no code implementations 13 Mar 2024 Mingyue Cheng, Hao Zhang, Jiqian Yang, Qi Liu, Li Li, Xin Huang, Liwei Song, Zhi Li, Zhenya Huang, Enhong Chen

Through this gateway, users have the opportunity to submit their questions, testing the models on a personalized and potentially broader range of capabilities.

Language Modelling Large Language Model

MeaCap: Memory-Augmented Zero-shot Image Captioning

1 code implementation 6 Mar 2024 Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

The framework of MeaCap achieves the state-of-the-art performance on a series of zero-shot IC settings.

Caption Generation Image Captioning +4

Improving Adversarial Energy-Based Model via Diffusion Process

no code implementations 4 Mar 2024 Cong Geng, Tian Han, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Søren Hauberg, Bo Li

Generative models have shown strong generation ability while efficient likelihood estimation is less explored.

Denoising Density Estimation

CLLMs: Consistency Large Language Models

1 code implementation 28 Feb 2024 Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang

Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation.

AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising

no code implementations 5 Feb 2024 Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang

We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies that is applied to raw, unannotated videos of articulated characters.

Denoising Optical Flow Estimation

CNS-Edit: 3D Shape Editing via Coupled Neural Shape Optimization

no code implementations 4 Feb 2024 Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang, Chi-Wing Fu

First, we design the coupled neural shape (CNS) representation for supporting 3D shape editing.

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

1 code implementation 3 Feb 2024 Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang

Autoregressive decoding of large language models (LLMs) is memory bandwidth bound, resulting in high latency and significant waste of the parallel processing power of modern accelerators.

Code Completion

APIServe: Efficient API Support for Large-Language Model Inferencing

no code implementations 2 Feb 2024 Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang

Large language models are increasingly integrated with external tools and APIs like ChatGPT plugins to extend their capability beyond language-centric tasks.

Language Modelling Large Language Model

Overview of Sensing Attacks on Autonomous Vehicle Technologies and Impact on Traffic Flow

no code implementations 26 Jan 2024 Zihao Li, Sixu Li, Hao Zhang, Yang Zhou, Siyang Xie, Yunlong Zhang

While perception systems in Connected and Autonomous Vehicles (CAVs), which encompass both communication technologies and advanced sensors, promise to significantly reduce human driving errors, they also expose CAVs to various cyberattacks.

Autonomous Vehicles

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation 25 Jan 2024 Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

Parameter-Efficient Conversational Recommender System as a Language Processing Task

1 code implementation 25 Jan 2024 Mathieu Ravaut, Hao Zhang, Lu Xu, Aixin Sun, Yong liu

Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation.

Dialogue Generation Knowledge Graphs +2

Focaler-IoU: More Focused Intersection over Union Loss

1 code implementation 19 Jan 2024 Hao Zhang, Shuaijie Zhang

Existing research improves regression performance by utilizing the geometric relationship between bounding boxes, while ignoring the impact of difficult and easy sample distribution on bounding box regression.

Object object-detection +2

Learning Implicit Representation for Reconstructing Articulated Objects

no code implementations 16 Jan 2024 Hao Zhang, Fang Li, Samyak Rawlekar, Narendra Ahuja

Our method simultaneously estimates the visible (explicit) representation (3D shapes, colors, camera parameters) and the implicit skeletal representation, from motion cues in the object video without 3D supervision.

3D Reconstruction Object

Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs

1 code implementation 15 Jan 2024 Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

AI for drug discovery has been a research hotspot in recent years, and SMILES-based language models have been increasingly applied in drug molecular design.

Drug Discovery

Crafter: Facial Feature Crafting against Inversion-based Identity Theft on Deep Models

no code implementations 14 Jan 2024 Shiming Wang, Zhe Ji, Liyao Xiang, Hao Zhang, Xinbing Wang, Chenghu Zhou, Bo Li

However, such methods cannot defend against adaptive attacks, in which an attacker takes a countermove against a known defence strategy.

SnapCap: Efficient Snapshot Compressive Video Captioning

no code implementations 10 Jan 2024 JianQiao Sun, Yudi Su, Hao Zhang, Ziheng Cheng, Zequn Zeng, Zhengjue Wang, Bo Chen, Xin Yuan

To address these problems, in this paper, we propose a novel VC pipeline to generate captions directly from the compressed measurement, which can be captured by a snapshot compressive sensing camera and we dub our model SnapCap.

Compressive Sensing Video Captioning

MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model Editing

no code implementations 6 Jan 2024 Nianwen Si, Hao Zhang, WeiQiang Zhang

Large language models are known for encoding a vast amount of factual knowledge, but they often become outdated due to the ever-changing nature of external information.

Model Editing

FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

1 code implementation 5 Jan 2024 Hao Zhang, Yu-Wing Tai, Chi-Keung Tang

However, simultaneously achieving multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge.

Video Editing

Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale

1 code implementation 29 Dec 2023 Hao Zhang, Shuaijie Zhang

As an important component of the detector localization branch, bounding box regression loss plays a significant role in object detection tasks.

object-detection Object Detection +1

Deep Unfolding Network with Spatial Alignment for multi-modal MRI reconstruction

no code implementations 28 Dec 2023 Hao Zhang, Qi Wang, Jun Shi, Shihui Ying, Zhijie Wen

In this paper, we construct a novel Deep Unfolding Network with Spatial Alignment, termed DUN-SA, to appropriately embed the spatial alignment task into the reconstruction process.

MRI Reconstruction

Unlocking the Potential of Large Language Models for Explainable Recommendations

1 code implementation 25 Dec 2023 Yucong Luo, Mingyue Cheng, Hao Zhang, Junyu Lu, Qi Liu, Enhong Chen

In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework aimed at further boosting the explanation quality by employing LLMs.

Decision Making Explainable Recommendation +2

LARP: Language-Agent Role Play for Open-World Games

no code implementations 24 Dec 2023 Ming Yan, Ruihao Li, Hao Zhang, Hao Wang, Zhilan Yang, Ji Yan

Language agents have shown impressive problem-solving skills within defined settings and brief timelines.

Decision Making

De novo Drug Design using Reinforcement Learning with Multiple GPT Agents

1 code implementation NeurIPS 2023 Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

A central challenge in this field is to generate molecules with specific properties while also producing a wide range of diverse candidates.

reinforcement-learning

Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction

no code implementations 21 Dec 2023 Peng Gao, Ahmed Jaafar, Brian Reily, Christopher Reardon, Hao Zhang

However, visual observations of an object may not be available when it is referred to, and the number of objects and attributes may also be unbounded in open worlds.

16k Attribute +3

Beyond 1D and oversimplified kinematics: A generic analytical framework for surrogate safety measures

no code implementations 12 Dec 2023 Sixu Li, Mohammad Anis, Dominique Lord, Hao Zhang, Yang Zhou, Xinyue Ye

This paper presents a generic analytical framework tailored for surrogate safety measures (SSMs) that is versatile across various highway geometries, capable of encompassing vehicle dynamics of differing dimensionality and fidelity, and suitable for dynamic, real-world environments.

Combined Invariant Subspace & Frequency-Domain Subspace Method for Identification of Discrete-Time MIMO Linear Systems

1 code implementation 12 Dec 2023 Jingze You, Chao Huang, Hao Zhang

Recently, a novel system identification method based on invariant subspace theory was introduced, aiming to address the identification problem of continuous-time (CT) linear time-invariant (LTI) systems by combining time-domain and frequency-domain methods.

CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen

no code implementations 9 Dec 2023 Hao Zhang, Fang Li, Lu Qi, Ming-Hsuan Yang, Narendra Ahuja

Addressing Out-Of-Distribution (OOD) Segmentation and Zero-Shot Semantic Segmentation (ZS3) is challenging, necessitating segmenting unseen classes.

Domain Adaptation Segmentation +2

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation 5 Dec 2023 Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction

no code implementations 3 Dec 2023 Yizhi Wang, Wallace Lira, Wenqi Wang, Ali Mahdavi-Amiri, Hao Zhang

Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures.

3D Reconstruction Denoising +1

Revisiting Single Image Reflection Removal In the Wild

1 code implementation 29 Nov 2023 Yurui Zhu, Xueyang Fu, Peng-Tao Jiang, Hao Zhang, Qibin Sun, Jinwei Chen, Zheng-Jun Zha, Bo Li

This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions, examining it from two angles: the collection pipeline of real reflection pairs and the perception of real reflection locations.

Reflection Removal

Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges

no code implementations 27 Nov 2023 Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, WeiQiang Zhang

We further present evaluation datasets used in existing methods, and finally conclude this survey by presenting the ongoing challenges and future directions.

In-Context Learning Machine Unlearning +1

Visual In-Context Prompting

3 code implementations 22 Nov 2023 Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation Visual Prompting

DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation

1 code implementation 22 Nov 2023 Zhiqin Chen, Qimin Chen, Hang Zhou, Hao Zhang

We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection.

Interpretable Geoscience Artificial Intelligence (XGeoS-AI): Application to Demystify Image Recognition

no code implementations 8 Nov 2023 Jin-Jian Xu, Hao Zhang, Chao-Sheng Tang, Lin Li, Bin Shi

Experimental results demonstrate that the effectiveness, versatility, and heuristics of the proposed framework have great potential in solving geoscience image recognition problems.

Computed Tomography (CT)

Spatial Process Approximations: Assessing Their Necessity

no code implementations 6 Nov 2023 Hao Zhang

In spatial statistics and machine learning, the kernel matrix plays a pivotal role in prediction, classification, and maximum likelihood estimation.

Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box

1 code implementation 6 Nov 2023 Hao Zhang, Cong Xu, Shuaijie Zhang

Based on the above, we first analyzed the BBR model and concluded that distinguishing different regression samples and using different scales of auxiliary bounding boxes to calculate losses can effectively accelerate the bounding box regression process.

Ranked #1 on Object Detection on AI-TOD (mAP50 metric)

Object Detection regression

Few-shot Learning using Data Augmentation and Time-Frequency Transformation for Time Series Classification

no code implementations 6 Nov 2023 Hao Zhang, Zhendong Pang, Jiangpeng Wang, Teng Li

Deep neural networks (DNNs) that tackle the time series classification (TSC) task have provided a promising framework in signal processing.

Data Augmentation Few-Shot Learning +2

Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems

no code implementations 1 Nov 2023 Hao Zhang, Mingyue Cheng, Qi Liu, Zhiding Liu, Enhong Chen

Sequential recommender systems (SRS) have gained widespread popularity in recommendation due to their ability to effectively capture dynamic user preferences.

Future prediction Sequential Recommendation

Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

1 code implementation 29 Oct 2023 Hao Zhang, Yang Liu, Xiaoyan Liu, Tianming Liang, Gaurav Sharma, Liang Xue, Maozu Guo

We introduce a novel graph-based framework for alleviating key challenges in distantly-supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data.

Relation Relation Extraction +1

TLM: Token-Level Masking for Transformers

1 code implementation 28 Oct 2023 Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen

Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers.

Data-to-Text Generation Grammatical Error Correction +1

Open-NeRF: Towards Open Vocabulary NeRF Decomposition

no code implementations 25 Oct 2023 Hao Zhang, Fang Li, Narendra Ahuja

Current techniques for NeRF decomposition involve a trade-off between the flexibility of processing open-vocabulary queries and the accuracy of 3D segmentation.

3D Reconstruction Segmentation

Interaction-Driven Active 3D Reconstruction with Object Interiors

no code implementations 23 Oct 2023 Zihao Yan, Fubao Su, Mingyang Wang, Ruizhen Hu, Hao Zhang, Hui Huang

We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior, i.e., unexposed, geometries of a target 3D object.

3D Reconstruction Object

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

no code implementations 20 Oct 2023 Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke wu

One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations.

Hallucination Translation

Experimental Results of Underwater Sound Speed Profile Inversion by Few-shot Multi-task Learning

no code implementations 18 Oct 2023 Wei Huang, Fan Gao, Junting Wang, Hao Zhang

Underwater Sound Speed Profile (SSP) distribution has great influence on the propagation mode of acoustic signal, thus the fast and accurate estimation of SSP is of great importance in building underwater observation systems.

Compressive Sensing Few-Shot Learning +1

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

3 code implementations 17 Oct 2023 Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V.

Interactive Segmentation Referring Expression +4

Explaining How a Neural Network Play the Go Game and Let People Learn

no code implementations 15 Oct 2023 Huilin Zhou, Huijie Tang, Mingjie Li, Hao Zhang, Zhenyu Liu, Quanshi Zhang

The AI model has surpassed human players in the game of Go, and it is widely believed that the AI model has encoded new knowledge about the Go game beyond human players.

Game of Go

Underwater Sound Speed Profile Construction: A Review

no code implementations 12 Oct 2023 Wei Huang, Jixuan Zhou, Fan Gao, Jiajun Lu, Sijia Li, Pengfei Wu, Junting Wang, Hao Zhang, Tianhe Xu

The introduction of SSP inversion methods greatly improves convenience and real-time performance, but their accuracy is not as good as that of the direct measurement method.

Compressive Sensing

Fast Ray-Tracing-Based Precise Underwater Acoustic Localization without Prior Acknowledgment of Target Depth

no code implementations 12 Oct 2023 Wei Huang, Hao Zhang, Kaitao Meng, Fan Gao, Wenzhou Sun, Jianxu Shu, Tianhe Xu, Deshi Li

To tackle this issue, we propose an iterative ray tracing 3D underwater localization (IRTUL) method for stratification compensation.

Online Speculative Decoding

no code implementations 11 Oct 2023 Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang

We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.

Knowledge Distillation

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

no code implementations 8 Oct 2023 Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches.

Keypoint Detection

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation 5 Oct 2023 Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.

Tuning Large language model for End-to-end Speech Translation

no code implementations 3 Oct 2023 Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Xiaolin Jiao

The training of LST consists of two stages: (1) Modality adjustment, where the adapter is tuned to align speech representation with text embedding space, and (2) Downstream task fine-tuning, where both the adapter and LLM model are trained to optimize performance on the E2EST task.

Language Modelling Large Language Model +2

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

no code implementations 27 Sep 2023 Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process.

Acoustic echo cancellation

Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

no code implementations 27 Sep 2023 Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

Acoustic howling suppression (AHS) is a critical challenge in audio communication systems.

Boundary-Aware Proposal Generation Method for Temporal Action Localization

no code implementations 25 Sep 2023 Hao Zhang, Chunyan Feng, Jiahui Yang, Zheng Li, Caili Guo

More importantly, few works consider the background frames that are similar to action frames in pixels but dissimilar in semantics, which also leads to inaccurate temporal boundaries.

Action Recognition Contrastive Learning +1

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

1 code implementation 21 Sep 2023 Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications.

Chatbot Instruction Following

LMDX: Language Model-based Document Information Extraction and Localization

no code implementations 19 Sep 2023 Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Jiaqi Mu, Hao Zhang, Nan Hua

Large Language Models (LLM) have revolutionized Natural Language Processing (NLP), improving state-of-the-art on many existing tasks and exhibiting emergent capabilities.

Language Modelling

Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling

1 code implementation 19 Sep 2023 Junzhe Jiang, Shang Qu, Mingyue Cheng, Qi Liu, Zhiding Liu, Hao Zhang, Rujiao Zhang, Kai Zhang, Rui Li, Jiatong Li, Min Gao

Recommender systems are indispensable in the realm of online applications, and sequential recommendation has enjoyed considerable prevalence due to its capacity to encapsulate the dynamic shifts in user interests.

Language Modelling Sequential Recommendation +1

Source-free Active Domain Adaptation for Diabetic Retinopathy Grading Based on Ultra-wide-field Fundus Image

1 code implementation 19 Sep 2023 Jinye Ran, Guanghua Zhang, Ximei Zhang, Juan Xie, Fan Xia, Hao Zhang

Domain adaptation (DA) has been widely applied in the diabetic retinopathy (DR) grading of unannotated ultra-wide-field (UWF) fundus images, which can transfer annotated knowledge from labeled color fundus images.

Computational Efficiency Diabetic Retinopathy Grading +1

Text-Guided Generation and Editing of Compositional 3D Avatars

no code implementations 13 Sep 2023 Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black

We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories.

text-guided-generation Virtual Try-on

When Geoscience Meets Foundation Models: Towards General Geoscience Artificial Intelligence System

no code implementations 13 Sep 2023 Hao Zhang, Jin-Jian Xu, Hong-Wei Cui, Lin Li, Yaowen Yang, Chao-Sheng Tang, Niklas Boers

Critically, the scalability and generalizability of GFMs empower them to address a wide array of prediction, simulation, and decision tasks related to the intricate interactions among Earth system components.

Efficient Memory Management for Large Language Model Serving with PagedAttention

4 code implementations 12 Sep 2023 Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling Large Language Model +1

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

no code implementations 14 Aug 2023 Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

no code implementations ICCV 2023 Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang

Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem.

3D Object Detection object-detection

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation 10 Jul 2023 Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model that enables segmenting and recognizing anything at any desired granularity.

Image Segmentation Segmentation +1

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping

1 code implementation 24 Jun 2023 Daniel Zou, Xinchen Jin, Xueyang Yu, Hao Zhang, James Demmel

In anticipation of workloads that involve serving many such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster.

TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling

no code implementations 16 Jun 2023 Ke Deng, Zhiyuan He, Hao Zhang, Haohan Lin, DeSheng Wang

In future 6G Mobile Edge Computing (MEC), autopilot systems require the capability of processing multimodal data with strong interdependencies.

Edge-computing Scheduling

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks

no code implementations 16 Jun 2023 Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng

Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.

CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration

no code implementations 14 Jun 2023 Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang, Chi-Wing Fu

This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space.

Attribute Language Modelling

detrex: Benchmarking Detection Transformers

1 code implementation 12 Jun 2023 Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

5 code implementations NeurIPS 2023 Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

Chatbot Language Modelling +2

How Can Recommender Systems Benefit from Large Language Models: A Survey

1 code implementation 9 Jun 2023 Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang

In this paper, we conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.

Ethics Feature Engineering +5

ShaDDR: Interactive Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering

1 code implementation 8 Jun 2023 Qimin Chen, Zhiqin Chen, Hang Zhou, Hao Zhang

Furthermore, we showcase the ability of our method to learn geometric details and textures from shapes reconstructed from real-world photos.

Texture Synthesis

DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization

no code implementations 7 Jun 2023 Aditya Vora, Akshay Gadi Patil, Hao Zhang

We demonstrate that our approach is not only able to complete the surface geometry but also reconstructs surface details to a reasonable extent from a few disparate input views.

3D Reconstruction Surface Reconstruction

FaceDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models

2 code implementations NeurIPS 2023 Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.

3D Face Reconstruction Video Editing +1

MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction

1 code implementation 30 May 2023 Jing Wang, Aixin Sun, Hao Zhang, XiaoLi Li

Given a query, the task of Natural Language Video Localization (NLVL) is to localize a temporal moment in an untrimmed video that semantically matches the query.

BRICS: Bi-level feature Representation of Image CollectionS

no code implementations 29 May 2023 Dingdong Yang, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

Our key codes and feature grids are jointly trained continuously with well-defined gradient flows, leading to high usage rates of the feature grids and improved generative modeling compared to discrete Vector Quantization (VQ).

Image Generation Quantization

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

no code implementations 28 May 2023 W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath

We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text.

Language Modelling Semantic Segmentation +1

Mobile Safety Application for Pedestrians

no code implementations 27 May 2023 Sukru Yaren Gelbal, Mustafa Ridvan Cantas, Bilin Aksun Guvenc, Levent Guvenc, Gopichandra Surnilla, Hao Zhang

The work we discuss in this paper is related to a mobile application that utilizes the mobile phone sensors and Bluetooth communication to implement Personal Safety Message (PSM) broadcast using the SAE J2735 standard to create a Pedestrian to Vehicle (P2V) based safety warning structure.

Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model

no code implementations 26 May 2023 Zhijie Deng, Hongcheng Gao, Yibo Miao, Hao Zhang

The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse.

NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing

1 code implementation 18 May 2023 Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance.

Learning with noisy labels

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

no code implementations 4 May 2023 Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via teacher-forced training strategy are used to train the deep neural network (DNN).

Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

no code implementations 2 May 2023 Hao Zhang, Meng Yu, Dong Yu

In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes the joint suppression of them difficult.

Speech Separation

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations 25 Apr 2023 Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning

no code implementations 20 Apr 2023 Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang

However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited.

Contrastive Learning Machine Translation +3

Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

no code implementations 20 Apr 2023 Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li

Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model using elaborate methods, which often require transcription as extra input during training.

Knowledge Distillation Machine Translation +3

DropDim: A Regularization Method for Transformer Networks

no code implementations 20 Apr 2023 Hao Zhang, Dan Qu, Keji Shao, Xukui Yang

In contrast to the general dropout method, which randomly drops neurons, DropDim drops part of the embedding dimensions.
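
The contrast in this snippet can be illustrated with a minimal sketch: instead of zeroing individual activations, one keep/drop decision is made per embedding dimension and shared by every token in the sequence. The function name `drop_dim`, the `drop_rate` parameter, and the 1/(1 - p) rescaling (borrowed from standard inverted dropout) are assumptions made for illustration; the snippet does not confirm how the paper selects or rescales dimensions.

```python
import random


def drop_dim(embeddings, drop_rate, rng=None):
    """Zero a random subset of embedding dimensions for every token.

    embeddings: list of token vectors (lists of floats) of equal length.
    drop_rate:  probability of dropping each dimension, 0 <= drop_rate < 1.
    rng:        optional random.Random instance for reproducibility.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = rng or random.Random()
    dim = len(embeddings[0])
    # One decision per dimension, shared across all tokens in the sequence.
    keep = [rng.random() >= drop_rate for _ in range(dim)]
    # Assumption: rescale kept dimensions as in inverted dropout.
    scale = 1.0 / (1.0 - drop_rate)
    return [
        [x * scale if k else 0.0 for x, k in zip(vec, keep)]
        for vec in embeddings
    ]
```

Because the mask is per dimension, every column of the output is either entirely zero or entirely kept, which is what separates this sketch from element-wise dropout.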

MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain

no code implementations 16 Apr 2023 Zhifeng Ma, Hao Zhang, Jie Liu

The drastic variation of motion in spatial and temporal dimensions makes the video prediction task extremely challenging.

Video Prediction

Segment Everything Everywhere All at Once

2 code implementations NeurIPS 2023 Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).

Image Segmentation Interactive Segmentation +4

RoSI: Recovering 3D Shape Interiors from Few Articulation Images

no code implementations 13 Apr 2023 Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang

The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures.

Object

Detection Transformer with Stable Matching

1 code implementation ICCV 2023 Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Position

SpanRE: Entities and Overlapping Relations Extraction Based on Spans and Entity Attention

no code implementations 6 Apr 2023 Hao Zhang

Then we present a labeled span mechanism to extract the objects and relations simultaneously: it generates labeled spans whose start and end positions indicate the objects and whose labels correspond to the relations between the subject and objects.

Sentence

UKP-SQuARE v3: A Platform for Multi-Agent QA Research

1 code implementation 31 Mar 2023 Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Hao Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych

To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents.

Question Answering

Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images

no code implementations 21 Mar 2023 Ruiqi Wang, Akshay Gadi Patil, Fenggen Yu, Hao Zhang

We introduce the first active learning (AL) framework for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes.

Active Learning Instance Segmentation +2

DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

1 code implementation ICCV 2023 Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable.

Denoising

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations ICCV 2023 Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations 9 Mar 2023 Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Referring Expression Referring Expression Comprehension +2

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

1 code implementation CVPR 2023 Zequn Zeng, Hao Zhang, Zhengjue Wang, Ruiying Lu, Dongsheng Wang, Bo Chen

Zero-shot capability has been considered as a new revolution of deep learning, letting machines work on tasks without curated training data.

Image Captioning Language Modelling

TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders

1 code implementation 1 Mar 2023 Mingyue Cheng, Qi Liu, Zhiding Liu, Hao Zhang, Rujiao Zhang, Enhong Chen

In this work, we propose TimeMAE, a novel self-supervised paradigm for learning transferrable time series representations based on transformer networks.

Time Series Time Series Analysis +1

Concept-Level Explanation for the Generalization of a DNN

no code implementations 25 Feb 2023 Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, Quanshi Zhang

Therefore, in this paper, we investigate the generalization power of each interactive concept, and we use the generalization power of different interactive concepts to explain the generalization power of the entire DNN.

Introducing Depth into Transformer-based 3D Object Detection

no code implementations 25 Feb 2023 Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang

To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.

3D Object Detection Auxiliary Learning +3

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

2 code implementations 22 Feb 2023 Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.

Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression

no code implementations 18 Feb 2023 Hao Zhang, Meng Yu, Dong Yu

In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it.

Speech Separation

NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

no code implementations 29 Jan 2023 Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges.

Acoustic echo cancellation

HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling

no code implementations ICCV 2023 Fenggen Yu, Yiming Qian, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts.

Active Learning

A Method For Eliminating Contour Errors In Self-Encoder Reconstructed Images

no code implementations 25 Jan 2023 Yonggang Li, Hao Zhang

In this paper, we propose a self-supervised twin network approach based on this prior.

CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image

no code implementations 5 Jan 2023 Jasmine Collins, Anqi Liang, Jitendra Malik, Hao Zhang, Frédéric Devernay

We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.

Object

CC-FedAvg: Computationally Customized Federated Averaging

no code implementations 28 Dec 2022 Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

Federated learning (FL) is an emerging paradigm for training models with distributed data from numerous Internet of Things (IoT) devices.

Federated Learning

Improved Long-Form Spoken Language Translation with Large Language Models

no code implementations 19 Dec 2022 Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng

A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations.

Language Modelling Large Language Model +1

ARO-Net: Learning Implicit Fields from Anchored Radial Observations

1 code implementation CVPR 2023 Yizhi Wang, Zeyu Huang, Ariel Shamir, Hui Huang, Hao Zhang, Ruizhen Hu

We introduce anchored radial observations (ARO), a novel shape encoding for learning implicit field representation of 3D shapes that is category-agnostic and generalizable amid significant shape variations.

Surface Reconstruction

Coordinating Cross-modal Distillation for Molecular Property Prediction

no code implementations 30 Nov 2022 Hao Zhang, Nan Zhang, Ruixin Zhang, Lei Shen, Yingyi Zhang, Meng Liu

The existing graph methods have demonstrated that 3D geometric information is significant for better performance in MPP.

Graph Regression Graph Representation Learning +4

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation 28 Nov 2022 Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a $1$D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

object-detection Object Detection +4

FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

1 code implementation 21 Nov 2022 Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).

QueryForm: A Simple Zero-shot Form Entity Query Framework

no code implementations 14 Nov 2022 Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities.

document understanding Transfer Learning

On Optimizing the Communication of Model Parallelism

no code implementations 10 Nov 2022 Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters.

FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration

no code implementations 9 Nov 2022 Yangjun Wu, Kebin Fang, Yao Zhao, Hao Zhang, Lifeng Shi, Mengqi Zhang

To accomplish punctuation restoration, most existing methods focus on introducing extra information (e.g., part-of-speech) or addressing the class imbalance problem.

Language Modelling Punctuation Restoration +1

MPCFormer: fast, performant and private Transformer inference with MPC

1 code implementation 2 Nov 2022 Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

Through extensive evaluations, we show that MPCFormer significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model.

Knowledge Distillation

Neural Eigenfunctions Are Structured Representation Learners

1 code implementation 23 Oct 2022 Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu

Unlike prior spectral methods such as Laplacian Eigenmap that operate in a nonparametric manner, Neural Eigenmap leverages NeuralEF to parametrically model eigenfunctions using a neural network.

Contrastive Learning Data Augmentation +7

NIFT: Neural Interaction Field and Template for Object Manipulation

no code implementations 20 Oct 2022 Zeyu Huang, Juzhan Xu, Sisi Dai, Kai Xu, Hao Zhang, Hui Huang, Ruizhen Hu

Given a few object manipulation demos, NIFT guides the generation of the interaction imitation for a new object instance by matching the Neural Interaction Template (NIT) extracted from the demos in the target Neural Interaction Field (NIF) defined for the new object.

Descriptive Imitation Learning +1

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

1 code implementation 19 Oct 2022 Hao Zhang

A goodness-of-fit metric for LMD similar to the coefficient of determination is defined and used to measure the linear dependency of a set of LMs.

Language Modelling

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

1 code implementation 13 Oct 2022 Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks.

valid

Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review

no code implementations 11 Oct 2022 Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang

Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously.

Physical Interaction: Reconstructing Hand-object Interactions with Physics

1 code implementation 22 Sep 2022 Haoyu Hu, Xinyu Yi, Hao Zhang, Jun-Hai Yong, Feng Xu

Single view-based reconstruction of hand-object interaction is challenging because occlusions leave much of the interaction unobserved.

Object

Learning Reconstructability for Drone Aerial Path Planning

no code implementations 21 Sep 2022 Yilin Liu, Liqiang Lin, Yue Hu, Ke Xie, Chi-Wing Fu, Hao Zhang, Hui Huang

To reconstruct a new urban scene, we first build the 3D scene proxy, then rely on the predicted reconstruction quality and uncertainty measures by our network, based off of the proxy geometry, to guide the drone path planning.

3D Scene Reconstruction

DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination

no code implementations 21 Aug 2022 Tingting Wu, Xiao Ding, Hao Zhang, Jinglong Gao, Li Du, Bing Qin, Ting Liu

To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy to hard) sequence.

Image Classification regression

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

1 code implementation 19 Aug 2022 Rachneet Sachdeva, Haritz Puerto, Tim Baumgärtner, Sewin Tariverdian, Hao Zhang, Kexin Wang, Hossain Shaikh Saadi, Leonardo F. R. Ribeiro, Iryna Gurevych

In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations.

Adversarial Attack Explainable Models +2

PhyGNNet: Solving spatiotemporal PDEs with Physics-informed Graph Neural Network

no code implementations 7 Aug 2022 Longxiang Jiang, Liyuan Wang, Xinkun Chu, Yonghao Xiao, Hao Zhang

Solving partial differential equations (PDEs) is an important research means in the fields of physics, biology, and chemistry.

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation 15 Jul 2022 Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

1 code implementation 12 Jul 2022 Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo

By replacing a vanilla 2D attention with the LAPS, we could adapt a static transformer into a video one, with zero extra parameters and negligible computation overhead ($\sim$2.6\%).

Video Classification

Data-and-Knowledge Dual-Driven Automatic Modulation Recognition for Wireless Communication Networks

no code implementations 30 Jun 2022 Rui Ding, Hao Zhang, Fuhui Zhou, Qihui Wu, Zhu Han

In order to tackle these problems, a novel data-and-knowledge dual-driven automatic modulation classification scheme based on radio frequency machine learning is proposed by exploiting the attribute features of different modulations.

Attribute Automatic Modulation Recognition +1

Wavelet Regularization Benefits Adversarial Training

1 code implementation 8 Jun 2022 Jun Yan, Huilin Yin, Xiaoyang Deng, Ziming Zhao, Wancheng Ge, Hao Zhang, Gerhard Rigoll

Since adversarial vulnerability can be regarded as a high-frequency phenomenon, it is essential to regulate the adversarially-trained neural network models in the frequency domain.

Adversarial Robustness

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

1 code implementation 7 Jun 2022 Zhifeng Ma, Hao Zhang, Jie Liu

Spatiotemporal predictive learning, which predicts future frames through historical prior knowledge with the aid of deep learning, is widely used in many fields.

Video Prediction

DETR++: Taming Your Multi-Scale Detection Transformer

no code implementations 7 Jun 2022 Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen

Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12].

object-detection Small Object Detection

Why Adversarial Training of ReLU Networks Is Difficult?

no code implementations 30 May 2022 Xu Cheng, Hao Zhang, Yue Xin, Wen Shen, Jie Ren, Quanshi Zhang

We also prove that adversarial training tends to strengthen the influence of unconfident input samples with large gradient norms in an exponential manner.

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

no code implementations 27 May 2022 Yushi Cao, Zhiming Li, Tianpei Yang, Hao Zhang, Yan Zheng, Yi Li, Jianye Hao, Yang Liu

In this paper, we combine the above two paradigms together and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs.

Decision Making Program Synthesis +2

Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation

no code implementations 23 May 2022 Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing

A simple pixel selection strategy followed with the construction of multi-level contrastive units is introduced to optimize the model for both domain adaptation and active supervised learning.

Active Learning Domain Adaptation +3

Downstream Transformer Generation of Question-Answer Pairs with Preprocessing and Postprocessing Pipelines

1 code implementation 15 May 2022 Cheng Zhang, Hao Zhang, Jie Wang

We present a system called TP3 to perform a downstream task of transformers on generating question-answer pairs (QAPs) from a given article.

New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography

no code implementations 5 May 2022 Neil Jethani, Aahlad Puli, Hao Zhang, Leonid Garber, Lior Jankelson, Yindalon Aphinyanaphongs, Rajesh Ranganath

We found ECG-based assessment outperforms the ADA Risk test, achieving a higher area under the curve (0.80 vs. 0.68) and positive predictive value (13% vs. 9%) -- 2.6 times the prevalence of diabetes in the cohort.

Adaptive Split-Fusion Transformer

1 code implementation 26 Apr 2022 Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.

Image Classification

FedCos: A Scene-adaptive Federated Optimization Enhancement for Performance Improvement

1 code implementation 7 Apr 2022 Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

On the other hand, it enlarges the distances between local models, resulting in an aggregated global model with poor performance.

Federated Learning

Quadratic Neuron-empowered Heterogeneous Autoencoder for Unsupervised Anomaly Detection

1 code implementation 2 Apr 2022 Jing-Xiao Liao, Bo-Jian Hou, Hang-Cheng Dong, Hao Zhang, Xiaoge Zhang, Jinwei Sun, Shiping Zhang, Feng-Lei Fan

Encouraged by this inspiring theoretical result on heterogeneous networks, we directly integrate conventional and quadratic neurons in an autoencoder to make a new type of heterogeneous autoencoders.

Anomaly Detection

Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

no code implementations 18 Mar 2022 Yang Zhao, Hao Zhang, Xiuyuan Hu

Optimizers in RST would perform a Bernoulli trial at each iteration to choose randomly from base algorithms (SGD) and sharpness-aware algorithms (SAM) with a probability arranged by a predefined scheduling function.

Computational Efficiency Scheduling
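
The per-iteration choice described in this snippet can be sketched as a Bernoulli trial whose success probability comes from a scheduling function. Everything named here is hypothetical: `linear_schedule`, `choose_algorithms`, the linear shape of the schedule, and the convention that the scheduled probability is the chance of taking a SAM step are assumptions for illustration, and the actual SGD and SAM parameter updates are omitted.

```python
import random


def linear_schedule(step, total_steps):
    """Hypothetical schedule: probability of a SAM step rises linearly from 0 toward 1."""
    return min(1.0, max(0.0, step / total_steps))


def choose_algorithms(total_steps, schedule=linear_schedule, rng=None):
    """Return the algorithm picked at each iteration ("SGD" or "SAM").

    At every step a Bernoulli trial with success probability
    schedule(step, total_steps) decides whether the sharpness-aware
    update (SAM) or the base update (SGD) would be applied.
    """
    rng = rng or random.Random()
    choices = []
    for step in range(total_steps):
        p_sam = schedule(step, total_steps)
        # rng.random() lies in [0, 1), so p_sam = 0 never picks SAM
        # and p_sam = 1 always picks SAM.
        choices.append("SAM" if rng.random() < p_sam else "SGD")
    return choices
```

A constant schedule recovers the two extremes: `lambda s, t: 0.0` yields plain SGD at every step, and `lambda s, t: 1.0` yields SAM at every step.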

Group Contextualization for Video Recognition

1 code implementation CVPR 2022 Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

By utilizing calibrators to embed features with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities.

Action Recognition Egocentric Activity Recognition +1

Boilerplate Detection via Semantic Classification of TextBlocks

no code implementations 9 Mar 2022 Hao Zhang, Jie Wang

We present a hierarchical neural network model called SemText to detect HTML boilerplate based on a novel semantic representation of HTML tags, class names, and text blocks.

Classification

Contextual Networks and Unsupervised Ranking of Sentences

no code implementations 9 Mar 2022 Hao Zhang, You Zhou, Jie Wang

We construct a contextual network to represent a document with syntactic and semantic relations between word-sentence pairs, based on which we devise an unsupervised algorithm called CNATAR (Contextual Network And Text Analysis Rank) to score sentences, and rank them through a bi-objective 0-1 knapsack maximization problem over topic analysis and sentence scores.

Sentence

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

15 code implementations 7 Mar 2022 Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Real-Time Object Detection

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

no code implementations 3 Mar 2022 Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Pengchuan Zhang, Lei Zhang

This survey is inspired by the remarkable progress in both computer vision and natural language processing, and recent trends shifting from single modality processing to multiple modality comprehension.

Few-Shot Learning Representation Learning

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

16 code implementations CVPR 2022 Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang

Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.

Object Detection

Hierarchical Point Cloud Encoding and Decoding with Lightweight Self-Attention based Model

no code implementations 13 Feb 2022 En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing

In this paper we present SA-CNN, a hierarchical and lightweight self-attention based encoding and decoding architecture for representation learning of point cloud data.

Representation Learning Retrieval

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

1 code implementation 8 Feb 2022 Yang Zhao, Hao Zhang, Xiuyuan Hu

In this paper, we propose an effective method to improve model generalization by additionally penalizing the gradient norm of the loss function during optimization.

A Variational Edge Partition Model for Supervised Graph Representation Learning

1 code implementation 7 Feb 2022 Yilin He, Chaojie Wang, Hao Zhang, Bo Chen, Mingyuan Zhou

This paper introduces a graph generative process to model how the observed edges are generated by aggregating the node interactions over a set of overlapping node communities, each of which contributes to the edges via a logical OR mechanism.

Classification Graph Representation Learning +1

Neural Dual Contouring

2 code implementations 4 Feb 2022 Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, Hao Zhang

We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC).

Surface Reconstruction

RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures

1 code implementation CVPR 2022 Chengjie Niu, Manyi Li, Kai Xu, Hao Zhang

Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape.

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation 28 Jan 2022 Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations ICLR 2022 Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Object Detection

Temporal Sentence Grounding in Videos: A Survey and Future Directions

no code implementations 20 Jan 2022 Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

Temporal sentence grounding in videos (TSGV), also known as natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video.

Moment Retrieval Retrieval +2

A Privacy-Preserving Unsupervised Domain Adaptation Framework for Clinical Text Analysis

no code implementations 18 Jan 2022 Qiyuan An, Ruijiang Li, Lin Gu, Hao Zhang, Qingyu Chen, Zhiyong Lu, Fei Wang, Yingying Zhu

To evaluate our proposed method's utility and privacy loss, we apply our model to a medical report disease label classification task using two noisy, challenging clinical text datasets.

Inference Attack Membership Inference Attack +4
