Search Results for author: Hao Zhang

Found 406 papers, 149 papers with code

GUTS at SemEval-2022 Task 4: Adversarial Training and Balancing Methods for Patronizing and Condescending Language Detection

no code implementations • SemEval (NAACL) 2022 • Junyu Lu, Hao Zhang, Tongyue Zhang, Hongbo Wang, Haohao Zhu, Bo Xu, Hongfei Lin

For Subtask B, framed as a multi-label classification problem, we utilize various improved multi-label cross-entropy loss functions and analyze the performance of our method.
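
The snippet does not spell out the improved multi-label cross-entropy variants, so the following is only a reference point. It sketches the standard baseline such losses build on: one independent sigmoid per label with binary cross-entropy. It is written in plain Python and is not the authors' loss.

```python
import math

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent labels (one sigmoid per label)."""
    assert len(logits) == len(targets)
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Example: 3 candidate categories; the text belongs to the first and third.
loss = multilabel_bce([2.0, -1.0, 0.5], [1, 0, 1])
```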

Binary Classification, Multi-Label Classification

Friendly Topic Assistant for Transformer Based Abstractive Summarization

no code implementations • EMNLP 2020 • Zhengjue Wang, Zhibin Duan, Hao Zhang, Chaojie Wang, Long Tian, Bo Chen, Mingyuan Zhou

Abstractive document summarization is a comprehensive task including document understanding and summary generation, an area in which Transformer-based models have achieved state-of-the-art performance.

Abstractive Text Summarization, Document Summarization

Incorporating Instructional Prompts into a Unified Generative Framework for Joint Multiple Intent Detection and Slot Filling

1 code implementation • COLING 2022 • Yangjun Wu, Han Wang, Dongxiang Zhang, Gang Chen, Hao Zhang

Specifically, we design five types of templates as instructional prompts. Each template includes a question that acts as the driver to teach UGEN to grasp the paradigm, options that list the candidate intents or slots to reduce the answer search space, and a context that denotes the original utterance.

Intent Detection, Question Answering

BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging

1 code implementation • ECCV 2020 • Ziheng Cheng, Ruiying Lu, Zhengjue Wang, Hao Zhang, Bo Chen, Ziyi Meng, Xin Yuan

This measurement and the modulation masks are fed into our Recurrent Neural Network (RNN) to reconstruct the desired high-speed frames.
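
The snippet presupposes the snapshot compressive imaging (SCI) forward model. The sketch below assumes the usual formulation, which the snippet itself does not state: one 2D measurement is the sum of T high-speed frames, each modulated elementwise by its own mask, Y = sum_t C_t * X_t. It is a minimal pure-Python version of that forward model only. The reconstruction RNN is not shown.

```python
def sci_measurement(frames, masks):
    """Collapse T frames into one 2D snapshot: Y = sum_t masks[t] * frames[t] (elementwise)."""
    T = len(frames)
    H, W = len(frames[0]), len(frames[0][0])
    y = [[0.0] * W for _ in range(H)]
    for t in range(T):
        for i in range(H):
            for j in range(W):
                y[i][j] += masks[t][i][j] * frames[t][i][j]
    return y

frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # T=2 frames of 2x2 pixels
masks = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]    # binary modulation masks
print(sci_measurement(frames, masks))  # [[1.0, 6.0], [7.0, 4.0]]
```

Reconstruction inverts this many-to-one mapping, which is why the masks are fed to the network together with the measurement.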

WordNet Troponymy and Extraction of “Manner-Result” Relations

no code implementations • GWC 2018 • Aliaksandr Huminski, Hao Zhang

The procedure of extraction includes three steps and the results are based on the analysis of the whole set of verbs in WordNet.

Translate-Train Embracing Translationese Artifacts

no code implementations • ACL 2022 • Sicheng Yu, Qianru Sun, Hao Zhang, Jing Jiang

Translate-train is a general training approach to multilingual tasks.

Multi-view Content-aware Indexing for Long Document Retrieval

no code implementations • 23 Apr 2024 • Kuicai Dong, Derrick Goh Xin Deik, Yi Quan Lee, Hao Zhang, Xiangyang Li, Cong Zhang, Yong Liu

As they do not consider content structures, the resultant chunks can exclude vital information or include irrelevant content.

Chunking, Question Answering

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation • 12 Apr 2024 • Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

SSwsrNet: A Semi-Supervised Few-Shot Learning Framework for Wireless Signal Recognition

no code implementations • 3 Apr 2024 • Hao Zhang, Fuhui Zhou, Qihui Wu, Naofal Al-Dhahir

Moreover, a modular semi-supervised learning method that combines labeled and unlabeled data using MixMatch is exploited to further improve the classification performance under few-sample conditions.
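
MixMatch combines labeled and unlabeled data in three steps:

1. Average the model's predictions over several augmentations of an unlabeled sample.
2. Sharpen that guess with a temperature T.
3. Mix examples with MixUp using lambda' = max(lambda, 1 - lambda).

The sketch below implements those steps. T = 0.5 and alpha = 0.75 are commonly quoted MixMatch defaults and are treated here as assumptions. How SSwsrNet wires these steps into its modules is not shown.

```python
import random

def guess_label(predictions):
    """Average class probabilities predicted for K augmentations of one unlabeled sample."""
    k = len(predictions)
    return [sum(col) / k for col in zip(*predictions)]

def sharpen(p, T=0.5):
    """Lower the entropy of a guessed distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [pi ** (1.0 / T) for pi in p]
    s = sum(powered)
    return [v / s for v in powered]

def mixup(x1, y1, x2, y2, alpha=0.75, lam=None):
    """MixMatch-style MixUp: lam' = max(lam, 1 - lam) keeps the mix closer to (x1, y1)."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# Two augmented views of one unlabeled signal -> averaged, sharpened pseudo-label.
pseudo = sharpen(guess_label([[0.7, 0.3], [0.5, 0.5]]))
```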

Classification, Few-Shot Learning

Toward Inference-optimal Mixture-of-Expert Large Language Models

no code implementations • 3 Apr 2024 • Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang

Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation between model size and number of tokens?
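
For dense models, this question is often answered with the approximation C ≈ 6·N·D training FLOPs for N parameters and D tokens. The sketch below splits a budget under that approximation, assuming a fixed tokens-per-parameter ratio r (about 20 is a commonly quoted dense-model value). It is a dense-model baseline for illustration, not the MoE allocation this paper derives.

```python
import math

def allocate(compute_flops, tokens_per_param=20.0):
    """Split a training budget C ~= 6 * N * D, assuming D = r * N (r = tokens_per_param)."""
    # C = 6 * N * (r * N)  =>  N = sqrt(C / (6 r)),  D = r * N
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = allocate(1e23)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")  # params ~ 2.89e+10, tokens ~ 5.77e+11
```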

DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly

no code implementations • 1 Apr 2024 • Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object.

Test-time Adaptation

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

1 code implementation • 26 Mar 2024 • YuQi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li

Furthermore, to control the parameters and computational cost brought by the increase in the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network.
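
The parameter saving from a low-rank convolution can be checked with simple arithmetic. The sketch assumes one natural LoRA-style layout: a k×k convolution from C_in down to r channels, followed by a 1×1 convolution from r up to C_out. The paper's exact factorization may differ, and the channel and rank values are illustrative.

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def low_rank_conv_params(k, c_in, c_out, rank):
    """k x k conv down to `rank` channels, then a 1 x 1 conv up to c_out (bias ignored)."""
    return k * k * c_in * rank + rank * c_out

full = conv_params(3, 256, 256)
lowr = low_rank_conv_params(3, 256, 256, rank=16)
print(full, lowr, round(full / lowr, 1))  # 589824 40960 14.4
```

Because the saving grows with C_out / r, adding more experts costs far fewer parameters than stacking full convolutions.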

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

1 code implementation • 25 Mar 2024 • Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Through the text semantic encoder and semantic interaction fusion decoder, Text-IF supports all-in-one degradation-aware processing of infrared and visible images as well as interactive, flexible fusion outcomes.

Empowering Segmentation Ability to Multi-modal Large Language Models

no code implementations • 21 Mar 2024 • YuQi Yang, Peng-Tao Jiang, Jing Wang, Hao Zhang, Kai Zhao, Jinwei Chen, Bo Li

Multi-modal large language models (MLLMs) can understand image-language prompts and demonstrate impressive reasoning ability.

Dialogue Generation, Segmentation

TAPTR: Tracking Any Point with Transformers as Detection

no code implementations • 19 Mar 2024 • Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang

Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP.

Object Detection

Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

no code implementations • 19 Mar 2024 • Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei

To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL learning framework to learn transferable knowledge from various domains to largely benefit the target downstream task.

Language Modelling, Time Series

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions

no code implementations • 14 Mar 2024 • Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang

To bridge this gap, we introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions (AVIs), including four types of image-based AVIs, ten types of text-based AVIs, and nine types of content bias AVIs (such as gender, violence, cultural, and racial biases, among others).

Fairness, Language Modelling

Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

no code implementations • 13 Mar 2024 • Mingyue Cheng, Hao Zhang, Jiqian Yang, Qi Liu, Li Li, Xin Huang, Liwei Song, Zhi Li, Zhenya Huang, Enhong Chen

Through this gateway, users have the opportunity to submit their questions, testing the models on a personalized and potentially broader range of capabilities.

Language Modelling, Large Language Model

Empowering Sequential Recommendation from Collaborative Signals and Semantic Relatedness

no code implementations • 12 Mar 2024 • Mingyue Cheng, Hao Zhang, Qi Liu, Fajie Yuan, Zhi Li, Zhenya Huang, Enhong Chen, Jun Zhou, Longfei Li

It is also important to model the semantic relatedness reflected in content features, e.g., images and text.

Sequential Recommendation

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

1 code implementation • 7 Mar 2024 • Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica

To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences.
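
Chatbot Arena ranks models from pairwise human votes. One standard way to turn such votes into ratings is an Elo-style online update. The sketch below assumes that scheme with conventional chess constants (K = 32, base 10, scale 400), and the model names are hypothetical. The paper's actual rating procedure may differ.

```python
def expected_score(r_a, r_b, scale=400.0, base=10.0):
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + base ** ((r_b - r_a) / scale))

def elo_update(ratings, a, b, outcome, k=32.0):
    """Update ratings in place for one vote; outcome is 1 (A wins), 0 (B wins), or 0.5 (tie)."""
    e_a = expected_score(ratings[a], ratings[b])
    ratings[a] += k * (outcome - e_a)
    ratings[b] += k * ((1.0 - outcome) - (1.0 - e_a))

ratings = {"model_x": 1000.0, "model_y": 1000.0}  # hypothetical models
elo_update(ratings, "model_x", "model_y", outcome=1)
print(ratings)  # {'model_x': 1016.0, 'model_y': 984.0}
```

The update is zero-sum, so the total rating mass stays constant as votes arrive.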

Chatbot

MeaCap: Memory-Augmented Zero-shot Image Captioning

1 code implementation • 6 Mar 2024 • Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

The MeaCap framework achieves state-of-the-art performance on a series of zero-shot IC settings.

Caption Generation, Image Captioning

Improving Adversarial Energy-Based Model via Diffusion Process

no code implementations • 4 Mar 2024 • Cong Geng, Tian Han, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Søren Hauberg, Bo Li

Generative models have shown strong generation ability while efficient likelihood estimation is less explored.

Denoising, Density Estimation

CLLMs: Consistency Large Language Models

1 code implementation • 28 Feb 2024 • Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang

Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation.
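
Jacobi decoding treats the next n tokens as unknowns. It updates all of them in parallel from the previous guess until nothing changes. Under greedy decoding, that fixed point equals ordinary autoregressive output and is reached in at most n iterations. The sketch uses a toy deterministic `next_token` function (hypothetical) as a stand-in for an LLM's greedy step.

```python
def next_token(context):
    """Toy stand-in for an LLM's greedy next-token choice (hypothetical)."""
    return (sum(context) * 31 + len(context)) % 50

def greedy_decode(prefix, n):
    """Ordinary sequential decoding: one model call per token."""
    out = list(prefix)
    for _ in range(n):
        out.append(next_token(out))
    return out[len(prefix):]

def jacobi_decode(prefix, n, init_token=0):
    """Parallel fixed-point decoding; returns (tokens, number of parallel steps)."""
    guess = [init_token] * n
    for step in range(1, n + 1):
        # Every position is recomputed from the *previous* guess, so these n calls
        # are independent and could run as one batched forward pass.
        new = [next_token(list(prefix) + guess[:i]) for i in range(n)]
        if new == guess:
            return guess, step
        guess = new
    return guess, n  # after n steps every position is guaranteed correct

tokens, steps = jacobi_decode([1, 2, 3], 8)
```

Each step fixes at least one more leading token, so the worst case matches sequential decoding. The speedup comes when several tokens settle in the same step.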

AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising

no code implementations • 5 Feb 2024 • Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang

We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies that is applied to raw, unannotated videos of articulated characters.

Denoising, Optical Flow Estimation

CNS-Edit: 3D Shape Editing via Coupled Neural Shape Optimization

no code implementations • 4 Feb 2024 • Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang, Chi-Wing Fu

First, we design the coupled neural shape (CNS) representation for supporting 3D shape editing.

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

1 code implementation • 3 Feb 2024 • Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang

Autoregressive decoding of large language models (LLMs) is memory-bandwidth bound, resulting in high latency and significant waste of the parallel processing power of modern accelerators.
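
A back-of-the-envelope calculation shows why decoding is bandwidth bound. At batch size 1, each generated token requires streaming roughly all weights from memory once. Weight bytes divided by memory bandwidth is therefore a lower bound on per-token latency. The model size and bandwidth below are illustrative assumptions, and KV-cache traffic is ignored.

```python
def min_ms_per_token(n_params, bytes_per_param, bandwidth_gb_s):
    """Lower bound on per-token latency when every weight is read once per decoding step."""
    bytes_per_step = n_params * bytes_per_param
    return bytes_per_step / (bandwidth_gb_s * 1e9) * 1e3

# Illustrative: a 7B-parameter model in fp16 (2 bytes) on an accelerator with 2,000 GB/s.
print(round(min_ms_per_token(7e9, 2, 2000), 2))  # 7.0
```

During those milliseconds the arithmetic units sit mostly idle. Decoding several tokens per weight pass reclaims that idle compute.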

Code Completion

APIServe: Efficient API Support for Large-Language Model Inferencing

no code implementations • 2 Feb 2024 • Reyna Abhyankar, Zijian He, Vikranth Srivatsa, Hao Zhang, Yiying Zhang

Large language models are increasingly integrated with external tools and APIs like ChatGPT plugins to extend their capability beyond language-centric tasks.

Language Modelling, Large Language Model

Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach

no code implementations • 30 Jan 2024 • Hao Zhang, Qingfeng Lin, Yang Li, Lei Cheng, Yik-Chung Wu

This problem is even more severe in cell-free networks as there are many of these parameters to be acquired.

Action Detection, Activity Detection

Overview of Sensing Attacks on Autonomous Vehicle Technologies and Impact on Traffic Flow

no code implementations • 26 Jan 2024 • Zihao Li, Sixu Li, Hao Zhang, Yang Zhou, Siyang Xie, Yunlong Zhang

While perception systems in Connected and Autonomous Vehicles (CAVs), which encompass both communication technologies and advanced sensors, promise to significantly reduce human driving errors, they also expose CAVs to various cyberattacks.

Autonomous Vehicles

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector and combines it with the Segment Anything Model (SAM).

Segmentation

Parameter-Efficient Conversational Recommender System as a Language Processing Task

1 code implementation • 25 Jan 2024 • Mathieu Ravaut, Hao Zhang, Lu Xu, Aixin Sun, Yong Liu

Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation.

Dialogue Generation, Knowledge Graphs

Focaler-IoU: More Focused Intersection over Union Loss

1 code implementation • 19 Jan 2024 • Hao Zhang, Shuaijie Zhang

Existing research improves regression performance by exploiting the geometric relationship between bounding boxes, while ignoring the impact of the distribution of difficult and easy samples on bounding box regression.
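
One way to shift a box-regression loss's attention between easy and difficult samples is to remap IoU before taking 1 − IoU. The sketch assumes a piecewise-linear remapping with thresholds `d` and `u` (labels chosen here). IoU values inside [d, u] are stretched to [0, 1], values below d are clamped to 0, and values above u are clamped to 1. The defaults are illustrative, and the sketch should not be read as the paper's exact loss.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def focused_iou_loss(pred, target, d=0.0, u=0.95):
    """1 - remapped IoU, where IoU in [d, u] is stretched linearly to [0, 1] (assumed form)."""
    v = iou(pred, target)
    if v < d:
        remapped = 0.0
    elif v > u:
        remapped = 1.0
    else:
        remapped = (v - d) / (u - d)
    return 1.0 - remapped

loss = focused_iou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Raising `d` concentrates the loss on samples that already overlap well, and lowering `u` does the opposite.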

Object, Object Detection

Learning Implicit Representation for Reconstructing Articulated Objects

no code implementations • 16 Jan 2024 • Hao Zhang, Fang Li, Samyak Rawlekar, Narendra Ahuja

Our method simultaneously estimates the visible (explicit) representation (3D shapes, colors, camera parameters) and the implicit skeletal representation, from motion cues in the object video without 3D supervision.

3D Reconstruction, Object

Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs

1 code implementation • 15 Jan 2024 • Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

AI for drug discovery has been a research hotspot in recent years, and SMILES-based language models have been increasingly applied in drug molecular design.

Drug Discovery

Crafter: Facial Feature Crafting against Inversion-based Identity Theft on Deep Models

no code implementations • 14 Jan 2024 • Shiming Wang, Zhe Ji, Liyao Xiang, Hao Zhang, Xinbing Wang, Chenghu Zhou, Bo Li

However, such methods cannot defend against adaptive attacks, in which an attacker takes a countermove against a known defence strategy.

SnapCap: Efficient Snapshot Compressive Video Captioning

no code implementations • 10 Jan 2024 • JianQiao Sun, Yudi Su, Hao Zhang, Ziheng Cheng, Zequn Zeng, Zhengjue Wang, Bo Chen, Xin Yuan

To address these problems, in this paper, we propose a novel VC pipeline that generates captions directly from the compressed measurement, which can be captured by a snapshot compressive sensing camera; we dub our model SnapCap.

Compressive Sensing, Video Captioning

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

no code implementations • 9 Jan 2024 • Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.

Ranked #3 on Table-based Fact Verification on TabFact

Fact Verification, In-Context Learning

MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model Editing

no code implementations • 6 Jan 2024 • Nianwen Si, Hao Zhang, WeiQiang Zhang

Large language models are known for encoding a vast amount of factual knowledge, but this knowledge often becomes outdated due to the ever-changing nature of external information.

Model Editing

FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

1 code implementation • 5 Jan 2024 • Hao Zhang, Yu-Wing Tai, Chi-Keung Tang

However, achieving multi-view consistency and temporal coherence simultaneously while editing video sequences remains a formidable challenge.

Video Editing

Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale

1 code implementation • 29 Dec 2023 • Hao Zhang, Shuaijie Zhang

As an important component of the detector localization branch, bounding box regression loss plays a significant role in object detection tasks.

Object Detection

Deep Unfolding Network with Spatial Alignment for multi-modal MRI reconstruction

no code implementations • 28 Dec 2023 • Hao Zhang, Qi Wang, Jun Shi, Shihui Ying, Zhijie Wen

In this paper, we construct a novel Deep Unfolding Network with Spatial Alignment, termed DUN-SA, to appropriately embed the spatial alignment task into the reconstruction process.

MRI Reconstruction

Unlocking the Potential of Large Language Models for Explainable Recommendations

1 code implementation • 25 Dec 2023 • Yucong Luo, Mingyue Cheng, Hao Zhang, Junyu Lu, Qi Liu, Enhong Chen

In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework aimed at further boosting the explanation quality by employing LLMs.

Decision Making, Explainable Recommendation

LARP: Language-Agent Role Play for Open-World Games

no code implementations • 24 Dec 2023 • Ming Yan, Ruihao Li, Hao Zhang, Hao Wang, Zhilan Yang, Ji Yan

Language agents have shown impressive problem-solving skills within defined settings and brief timelines.

Decision Making

De novo Drug Design using Reinforcement Learning with Multiple GPT Agents

1 code implementation • NeurIPS 2023 • Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

A central challenge in this field is to generate molecules with specific properties while also producing a wide range of diverse candidates.

Reinforcement Learning

Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction

no code implementations • 21 Dec 2023 • Peng Gao, Ahmed Jaafar, Brian Reily, Christopher Reardon, Hao Zhang

However, visual observations of an object may not be available when it is referred to, and the number of objects and attributes may also be unbounded in open worlds.

16k, Attribute

Beyond 1D and oversimplified kinematics: A generic analytical framework for surrogate safety measures

no code implementations • 12 Dec 2023 • Sixu Li, Mohammad Anis, Dominique Lord, Hao Zhang, Yang Zhou, Xinyue Ye

This paper presents a generic analytical framework tailored for surrogate safety measures (SSMs) that is versatile across various highway geometries, capable of encompassing vehicle dynamics of differing dimensionality and fidelity, and suitable for dynamic, real-world environments.

Combined Invariant Subspace & Frequency-Domain Subspace Method for Identification of Discrete-Time MIMO Linear Systems

1 code implementation • 12 Dec 2023 • Jingze You, Chao Huang, Hao Zhang

Recently, a novel system identification method based on invariant subspace theory was introduced, aiming to address the identification problem of continuous-time (CT) linear time-invariant (LTI) systems by combining time-domain and frequency-domain methods.

Interfacing Foundation Models' Embeddings

1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

The proposed interface is adaptive to new tasks and new models.

Image Segmentation, Retrieval

CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen

no code implementations • 9 Dec 2023 • Hao Zhang, Fang Li, Lu Qi, Ming-Hsuan Yang, Narendra Ahuja

Addressing Out-Of-Distribution (OOD) Segmentation and Zero-Shot Semantic Segmentation (ZS3) is challenging, necessitating segmenting unseen classes.

Domain Adaptation, Segmentation

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction

no code implementations • 3 Dec 2023 • Yizhi Wang, Wallace Lira, Wenqi Wang, Ali Mahdavi-Amiri, Hao Zhang

Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures.

3D Reconstruction, Denoising

Revisiting Single Image Reflection Removal In the Wild

1 code implementation • 29 Nov 2023 • Yurui Zhu, Xueyang Fu, Peng-Tao Jiang, Hao Zhang, Qibin Sun, Jinwei Chen, Zheng-Jun Zha, Bo Li

This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions, examining it from two angles: the collection pipeline of real reflection pairs and the perception of real reflection locations.

Reflection Removal

Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges

no code implementations • 27 Nov 2023 • Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, WeiQiang Zhang

We further present evaluation datasets used in existing methods, and finally conclude this survey by presenting the ongoing challenges and future directions.

In-Context Learning, Machine Unlearning

Visual In-Context Prompting

3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation, Visual Prompting

DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation

1 code implementation • 22 Nov 2023 • Zhiqin Chen, Qimin Chen, Hang Zhou, Hao Zhang

We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection.

Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens

no code implementations • 19 Nov 2023 • Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li

Large Vision-Language Models (LVLMs) have seen burgeoning development and increasing attention recently.

Counterfactual, Hallucination

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.

Ranked #1 on LMM real-life tasks on Leaderboard

Instruction Following, LLM real-life tasks

Interpretable Geoscience Artificial Intelligence (XGeoS-AI): Application to Demystify Image Recognition

no code implementations • 8 Nov 2023 • Jin-Jian Xu, Hao Zhang, Chao-Sheng Tang, Lin Li, Bin Shi

Experimental results demonstrate the effectiveness, versatility, and heuristics of the proposed framework, which shows great potential for solving geoscience image recognition problems.

Computed Tomography (CT)

Spatial Process Approximations: Assessing Their Necessity

no code implementations • 6 Nov 2023 • Hao Zhang

In spatial statistics and machine learning, the kernel matrix plays a pivotal role in prediction, classification, and maximum likelihood estimation.

Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box

1 code implementation • 6 Nov 2023 • Hao Zhang, Cong Xu, Shuaijie Zhang

Based on the above, we first analyzed the BBR model and concluded that distinguishing different regression samples and using different scales of auxiliary bounding boxes to calculate losses can effectively accelerate the bounding box regression process.
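
Auxiliary bounding boxes can be built by scaling both the predicted and ground-truth boxes about their own centers by a ratio, then computing IoU on those auxiliary boxes. The sketch assumes that construction. The `ratio` values are illustrative, and the paper's rule for choosing a ratio per sample is not reproduced.

```python
def scale_box(box, ratio):
    """Auxiliary box with the same center; width and height multiplied by ratio."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    hw, hh = (box[2] - box[0]) * ratio / 2.0, (box[3] - box[1]) * ratio / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def inner_iou(pred, target, ratio=0.8):
    """IoU computed on auxiliary boxes scaled by the same ratio."""
    return iou(scale_box(pred, ratio), scale_box(target, ratio))

pred, gt = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
for r in (0.5, 1.0, 1.5):
    print(r, round(inner_iou(pred, gt, r), 4))
```

For these two offset boxes, shrinking the auxiliary boxes lowers the IoU (to 0 at ratio 0.5) and enlarging them raises it. Different ratios therefore change how strongly a given sample is penalized.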

Ranked #1 on Object Detection on AI-TOD (mAP50 metric)

Object Detection, Regression

Few-shot Learning using Data Augmentation and Time-Frequency Transformation for Time Series Classification

no code implementations • 6 Nov 2023 • Hao Zhang, Zhendong Pang, Jiangpeng Wang, Teng Li

Deep neural networks (DNNs) that tackle the time series classification (TSC) task have provided a promising framework in signal processing.

Data Augmentation, Few-Shot Learning

Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems

no code implementations • 1 Nov 2023 • Hao Zhang, Mingyue Cheng, Qi Liu, Zhiding Liu, Enhong Chen

Sequential recommender systems (SRS) have gained widespread popularity in recommendation due to their ability to effectively capture dynamic user preferences.

Future Prediction, Sequential Recommendation

Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

1 code implementation • 29 Oct 2023 • Hao Zhang, Yang Liu, Xiaoyan Liu, Tianming Liang, Gaurav Sharma, Liang Xue, Maozu Guo

We introduce a novel graph-based framework for alleviating key challenges in distantly-supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data.

Relation, Relation Extraction

TLM: Token-Level Masking for Transformers

1 code implementation • 28 Oct 2023 • Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen

Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers.

Data-to-Text Generation, Grammatical Error Correction

Open-NeRF: Towards Open Vocabulary NeRF Decomposition

no code implementations • 25 Oct 2023 • Hao Zhang, Fang Li, Narendra Ahuja

Current techniques for NeRF decomposition involve a trade-off between the flexibility of processing open-vocabulary queries and the accuracy of 3D segmentation.

3D Reconstruction, Segmentation

Interaction-Driven Active 3D Reconstruction with Object Interiors

no code implementations • 23 Oct 2023 • Zihao Yan, Fubao Su, Mingyang Wang, Ruizhen Hu, Hao Zhang, Hui Huang

We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior, i.e., unexposed, geometries of a target 3D object.

3D Reconstruction, Object

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

no code implementations • 20 Oct 2023 • Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu

One challenge in speech translation is that much spoken content is long-form, but short units are necessary for obtaining high-quality translations.

Hallucination, Translation

Experimental Results of Underwater Sound Speed Profile Inversion by Few-shot Multi-task Learning

no code implementations • 18 Oct 2023 • Wei Huang, Fan Gao, Junting Wang, Hao Zhang

Underwater Sound Speed Profile (SSP) distribution has a great influence on the propagation mode of acoustic signals, so fast and accurate estimation of the SSP is of great importance in building underwater observation systems.

Compressive Sensing, Few-Shot Learning

Neural Packing: from Visual Sensing to Reinforcement Learning

no code implementations • 17 Oct 2023 • Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, Ruizhen Hu

We present a novel learning framework to solve the transport-and-packing (TAP) problem in 3D.

Combinatorial Optimization, Motion Planning

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

3 code implementations • 17 Oct 2023 • Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V.

Interactive Segmentation, Referring Expression

Explaining How a Neural Network Play the Go Game and Let People Learn

no code implementations • 15 Oct 2023 • Huilin Zhou, Huijie Tang, Mingjie Li, Hao Zhang, Zhenyu Liu, Quanshi Zhang

The AI model has surpassed human players in the game of Go, and it is widely believed that the AI model has encoded new knowledge about the Go game beyond human players.

Game of Go

From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

1 code implementation • 13 Oct 2023 • Dongsheng Jiang, Yuchen Liu, Songlin Liu, Jin'e Zhao, Hao Zhang, Zhen Gao, Xiaopeng Zhang, Jin Li, Hongkai Xiong

By simply equipping it with an MLP layer for alignment, DINO surpasses CLIP in fine-grained related perception tasks.

Hallucination, Image Captioning

Underwater Sound Speed Profile Construction: A Review

no code implementations • 12 Oct 2023 • Wei Huang, Jixuan Zhou, Fan Gao, Jiajun Lu, Sijia Li, Pengfei Wu, Junting Wang, Hao Zhang, Tianhe Xu

The proposal of the SSP inversion method greatly improves convenience and real-time performance, but its accuracy is not as good as that of the direct measurement method.

Compressive Sensing

Fast Ray-Tracing-Based Precise Underwater Acoustic Localization without Prior Acknowledgment of Target Depth

no code implementations • 12 Oct 2023 • Wei Huang, Hao Zhang, Kaitao Meng, Fan Gao, Wenzhou Sun, Jianxu Shu, Tianhe Xu, Deshi Li

To tackle this issue, we propose an iterative ray tracing 3D underwater localization (IRTUL) method for stratification compensation.
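
Ray tracing in a horizontally stratified ocean rests on Snell's law. Along a ray, sin(theta)/c stays constant, where theta is measured from vertical and c is the layer's sound speed. The sketch traces a ray layer by layer and returns its horizontal range and travel time. It assumes flat layers with constant speed each, the layer values are illustrative, and the iterative depth search of IRTUL is not shown.

```python
import math

def trace_ray(theta0_deg, speeds, thicknesses):
    """Trace a downgoing ray through flat constant-speed layers.

    theta0_deg: launch angle from vertical in the first layer.
    Returns (horizontal_range_m, travel_time_s); stops early if the ray turns back.
    """
    p = math.sin(math.radians(theta0_deg)) / speeds[0]  # Snell invariant (ray parameter)
    x = t = 0.0
    for c, h in zip(speeds, thicknesses):
        s = p * c                      # sin(theta) in this layer
        if s >= 1.0:                   # total reflection: ray cannot cross this layer
            break
        cos_t = math.sqrt(1.0 - s * s)
        x += h * s / cos_t             # h * tan(theta)
        t += h / (c * cos_t)           # slant path length / speed
    return x, t

x, t = trace_ray(30.0, [1500.0, 1490.0, 1480.0], [50.0, 100.0, 150.0])
```

With a uniform profile the ray is a straight line. Stratification bends it, which is why straight-line ranging needs the compensation the abstract describes.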

Online Speculative Decoding

no code implementations • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang

We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.
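
Speculative decoding checks each token proposed by a draft model against the target model. The draft token x is accepted with probability min(1, p(x)/q(x)). On rejection, a replacement is sampled from the normalized residual max(0, p − q), which preserves the target distribution. The sketch covers this standard verification step over explicit probability lists. The online distillation that updates the draft model is the paper's addition and is not shown.

```python
import random

def verify_draft_token(x, p, q, rng=random):
    """Accept or replace one draft token.

    x: token proposed by the draft model (sampled from q, so q[x] > 0).
    p, q: target and draft probability lists over the same vocabulary.
    Returns (token, accepted).
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # Rejection only happens when p != q, so the residual has positive mass.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False

rng = random.Random(0)
token, accepted = verify_draft_token(2, [0.1, 0.3, 0.6], [0.2, 0.2, 0.6], rng)
```

The closer the draft distribution q tracks p, the more draft tokens are accepted. That acceptance rate is exactly what updating the draft model on live queries aims to raise.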

Knowledge Distillation

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

no code implementations • 8 Oct 2023 • Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into fully supervised and few-shot class-agnostic approaches.

Keypoint Detection

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation • 5 Oct 2023 • Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.
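
FlashAttention's linear memory rests on an online softmax. Keys are visited block by block while a running maximum and running normalizer are maintained, so the full row of attention scores is never stored. The sketch is single-query, pure Python, and omits the 1/sqrt(d) scaling for brevity. The block size and data are illustrative, and the distributed scheme of DISTFLASHATTN is not shown.

```python
import math

def attention_naive(q, keys, values):
    """Reference: materialize all scores, softmax, weighted sum of values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

def attention_tiled(q, keys, values, block=2):
    """Same result, visiting keys in blocks with a running max m and normalizer z."""
    m, z = -math.inf, 0.0
    acc = [0.0] * len(values[0])          # unnormalized weighted sum of values
    for start in range(0, len(keys), block):
        ks, vs = keys[start:start + block], values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) for k in ks]
        m_new = max(m, max(scores))
        scale = math.exp(m - m_new)       # rescale old statistics to the new max
        z *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, vs):
            w = math.exp(s - m_new)
            z += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / z for a in acc]
```

Only `m`, `z`, and `acc` persist between blocks, so memory per query no longer grows with sequence length.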

Tuning Large language model for End-to-end Speech Translation

no code implementations • 3 Oct 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Xiaolin Jiao

The training of LST consists of two stages: (1) Modality adjustment, where the adapter is tuned to align speech representation with text embedding space, and (2) Downstream task fine-tuning, where both the adapter and LLM model are trained to optimize performance on the E2EST task.

Language Modelling, Large Language Model

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

no code implementations • 27 Sep 2023 • Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process.

Acoustic echo cancellation

Paper

Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

no code implementations • 27 Sep 2023 • Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

Acoustic howling suppression (AHS) is a critical challenge in audio communication systems.

Paper

Boundary-Aware Proposal Generation Method for Temporal Action Localization

no code implementations • 25 Sep 2023 • Hao Zhang, Chunyan Feng, Jiahui Yang, Zheng Li, Caili Guo

More importantly, few works consider the background frames that are similar to action frames in pixels but dissimilar in semantics, which also leads to inaccurate temporal boundaries.

Action Recognition Contrastive Learning +1

Paper

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

1 code implementation • 21 Sep 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications.

Chatbot Instruction Following

33,882

Paper
Code

LMDX: Language Model-based Document Information Extraction and Localization

no code implementations • 19 Sep 2023 • Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Jiaqi Mu, Hao Zhang, Nan Hua

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state of the art on many existing tasks and exhibiting emergent capabilities.

Language Modelling

Paper

Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling

1 code implementation • 19 Sep 2023 • Junzhe Jiang, Shang Qu, Mingyue Cheng, Qi Liu, Zhiding Liu, Hao Zhang, Rujiao Zhang, Kai Zhang, Rui Li, Jiatong Li, Min Gao

Recommender systems are indispensable in online applications, and sequential recommendation has become widely used because it captures dynamic shifts in user interests.

Language Modelling Sequential Recommendation +1

Paper
Code

Source-free Active Domain Adaptation for Diabetic Retinopathy Grading Based on Ultra-wide-field Fundus Image

1 code implementation • 19 Sep 2023 • Jinye Ran, Guanghua Zhang, Ximei Zhang, Juan Xie, Fan Xia, Hao Zhang

Domain adaptation (DA) has been widely applied in the diabetic retinopathy (DR) grading of unannotated ultra-wide-field (UWF) fundus images, which can transfer annotated knowledge from labeled color fundus images.

Computational Efficiency Diabetic Retinopathy Grading +1

Paper
Code

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

no code implementations • 16 Sep 2023 • Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing.

Speech Enhancement

Paper

Text-Guided Generation and Editing of Compositional 3D Avatars

no code implementations • 13 Sep 2023 • Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black

We argue that existing methods are limited because they employ a monolithic modeling approach, using a single representation for the head, face, hair, and accessories.

text-guided-generation Virtual Try-on

Paper

When Geoscience Meets Foundation Models: Towards General Geoscience Artificial Intelligence System

no code implementations • 13 Sep 2023 • Hao Zhang, Jin-Jian Xu, Hong-Wei Cui, Lin Li, Yaowen Yang, Chao-Sheng Tang, Niklas Boers

Critically, the scalability and generalizability of geoscience foundation models (GFMs) enable them to address a wide array of prediction, simulation, and decision tasks related to the intricate interactions among Earth system components.

Paper

Efficient Memory Management for Large Language Model Serving with PagedAttention

4 code implementations • 12 Sep 2023 • Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling Large Language Model +1

18,492

Paper
Code
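
vLLM's near-zero KV-cache waste and cross-request sharing rest on block-based memory management. That idea can be sketched as a per-sequence block table plus a reference-counted block allocator. This is a hypothetical Python sketch, not vLLM's actual API. `BlockAllocator`, `Sequence`, and the copy-on-write rule shown are assumptions for illustration, and no KV tensors are actually stored.

```python
class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks and reference-counts them."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.ref_count = [0] * num_blocks

    def allocate(self):
        if not self.free_blocks:
            raise MemoryError("out of KV-cache blocks")
        block = self.free_blocks.pop()
        self.ref_count[block] = 1
        return block

    def share(self, block):
        self.ref_count[block] += 1

    def release(self, block):
        self.ref_count[block] -= 1
        if self.ref_count[block] == 0:
            self.free_blocks.append(block)


class Sequence:
    """Keeps a block table mapping logical block i to a physical block id."""

    def __init__(self, allocator, block_size):
        self.allocator = allocator
        self.block_size = block_size
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        if self.num_tokens % self.block_size == 0:
            # Last block is full (or none exists yet): allocate on demand, so at
            # most block_size - 1 slots per sequence are ever left unused.
            self.block_table.append(self.allocator.allocate())
        else:
            last = self.block_table[-1]
            if self.allocator.ref_count[last] > 1:
                # Copy-on-write: the partially filled last block is shared, so
                # switch to a private block (copying the KV data is omitted here).
                new_block = self.allocator.allocate()
                self.allocator.release(last)
                self.block_table[-1] = new_block
        self.num_tokens += 1

    def fork(self):
        """Child sequence sharing all existing blocks, e.g. for parallel sampling."""
        child = Sequence(self.allocator, self.block_size)
        child.block_table = list(self.block_table)
        child.num_tokens = self.num_tokens
        for block in self.block_table:
            self.allocator.share(block)
        return child

    def release_all(self):
        for block in self.block_table:
            self.allocator.release(block)
        self.block_table = []
        self.num_tokens = 0


# Usage example
alloc = BlockAllocator(num_blocks=4)
parent = Sequence(alloc, block_size=2)
for _ in range(3):
    parent.append_token()   # 3 tokens -> 2 blocks, the last one half full
child = parent.fork()       # shares both blocks without copying
child.append_token()        # writes into the shared half-full block -> copy-on-write
```

After this example runs, the full first block stays shared between parent and child, and only the half-full block is duplicated. Calling `parent.release_all()` and then `child.release_all()` returns all four blocks to the free list.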

Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model

no code implementations • ICCV 2023 • Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Therefore, Diff-Retinex formulates the low-light image enhancement problem as two subproblems: Retinex decomposition and conditional image generation.

Conditional Image Generation Low-Light Image Enhancement

Paper

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

no code implementations • 14 Aug 2023 • Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

no code implementations • ICCV 2023 • Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang

Existing feature lifting approaches fall short in one of two ways. Lift-Splat-based methods use estimated depth to obtain pseudo-LiDAR features and then splat them into 3D space, a one-pass operation without feature refinement. 2D attention-based methods ignore depth and lift features through 2D attention mechanisms, which achieves finer semantics but suffers from a depth ambiguity problem.

3D Object Detection object-detection

Paper

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

1,920

Paper
Code

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping

1 code implementation • 24 Jun 2023 • Daniel Zou, Xinchen Jin, Xueyang Yu, Hao Zhang, James Demmel

In anticipation of workloads that involve serving many such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster.

Paper
Code

TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling

no code implementations • 16 Jun 2023 • Ke Deng, Zhiyuan He, Hao Zhang, Haohan Lin, DeSheng Wang

In future 6G Mobile Edge Computing (MEC), autopilot systems require the capability of processing multimodal data with strong interdependencies.

Edge-computing Scheduling

Paper

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks

no code implementations • 16 Jun 2023 • Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng

Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.

Paper

CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration

no code implementations • 14 Jun 2023 • Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang, Chi-Wing Fu

This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space.

Attribute Language Modelling

Paper

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

1,821

Paper
Code

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

5 code implementations • NeurIPS 2023 • Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

Ranked #3 on Long-Context Understanding on Ada-LEval (TSort)

Chatbot Language Modelling +2

33,882

Paper
Code

How Can Recommender Systems Benefit from Large Language Models: A Survey

1 code implementation • 9 Jun 2023 • Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang

In this paper, we conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.

Ethics Feature Engineering +5

750

Paper
Code

ShaDDR: Interactive Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering

1 code implementation • 8 Jun 2023 • Qimin Chen, Zhiqin Chen, Hang Zhou, Hao Zhang

Furthermore, we showcase the ability of our method to learn geometric details and textures from shapes reconstructed from real-world photos.

Texture Synthesis

Paper
Code

DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization

no code implementations • 7 Jun 2023 • Aditya Vora, Akshay Gadi Patil, Hao Zhang

We demonstrate that our approach not only completes the surface geometry but also reconstructs surface details to a reasonable extent from a few disparate input views.

3D Reconstruction Surface Reconstruction

Paper

FaceDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models

2 code implementations • NeurIPS 2023 • Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.

3D Face Reconstruction Video Editing +1

Paper
Code

MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction

1 code implementation • 30 May 2023 • Jing Wang, Aixin Sun, Hao Zhang, XiaoLi Li

Given a query, the task of Natural Language Video Localization (NLVL) is to localize a temporal moment in an untrimmed video that semantically matches the query.

Paper
Code

BRICS: Bi-level feature Representation of Image CollectionS

no code implementations • 29 May 2023 • Dingdong Yang, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

Our key codes and feature grids are jointly trained continuously with well-defined gradient flows, leading to high usage rates of the feature grids and improved generative modeling compared to discrete Vector Quantization (VQ).

Image Generation Quantization

Paper

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

no code implementations • 28 May 2023 • W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath

We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text.

Language Modelling Semantic Segmentation +1

Paper

Mobile Safety Application for Pedestrians

no code implementations • 27 May 2023 • Sukru Yaren Gelbal, Mustafa Ridvan Cantas, Bilin Aksun Guvenc, Levent Guvenc, Gopichandra Surnilla, Hao Zhang

This paper discusses a mobile application that uses mobile phone sensors and Bluetooth communication to broadcast Personal Safety Messages (PSM) under the SAE J2735 standard, creating a Pedestrian-to-Vehicle (P2V) safety warning structure.

Paper

Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model

no code implementations • 26 May 2023 • Zhijie Deng, Hongcheng Gao, Yibo Miao, Hao Zhang

The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse.

Paper

NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing

1 code implementation • 18 May 2023 • Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance.

Learning with noisy labels

Paper
Code

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

no code implementations • 4 May 2023 • Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via a teacher-forced training strategy are used to train the deep neural network (DNN).

Paper

FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

no code implementations • 4 May 2023 • Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister

In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss.

Contrastive Learning document understanding +1

Paper

Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

no code implementations • 2 May 2023 • Hao Zhang, Meng Yu, Dong Yu

In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes suppressing both jointly difficult.

Speech Separation

Paper

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations • 25 Apr 2023 • Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

650

Paper
Code

Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning

no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Wei-Qiang Zhang

However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited.

Contrastive Learning Machine Translation +3

Paper

Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

no code implementations • 20 Apr 2023 • Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li

Existing methods often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model through elaborate techniques, which often require transcriptions as extra input during training.

Knowledge Distillation Machine Translation +3

Paper

DropDim: A Regularization Method for Transformer Networks

no code implementations • 20 Apr 2023 • Hao Zhang, Dan Qu, Keji Shao, Xukui Yang

In contrast to the general dropout method, which randomly drops neurons, DropDim drops part of the embedding dimensions.

Paper
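
The sketch below contrasts DropDim with element-wise dropout by zeroing whole embedding dimensions instead of individual activations. It is a minimal Python illustration under stated assumptions. The same dimensions are dropped for every token in the sequence, and surviving values are rescaled by 1/(1-p) as in inverted dropout. The paper's exact sampling scheme may differ, and `drop_dim` is a hypothetical name.

```python
import random

def drop_dim(embeddings, p, rng=random, training=True):
    """Zero out a random subset of embedding dimensions (hypothetical DropDim sketch).

    embeddings: list of token vectors, shape [seq_len][dim].
    The same dimensions are dropped for every token (an assumption), and the
    surviving values are rescaled by 1 / (1 - p), as in inverted dropout.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return [list(vec) for vec in embeddings]
    dim = len(embeddings[0])
    keep = [rng.random() >= p for _ in range(dim)]  # one keep/drop decision per dimension
    scale = 1.0 / (1.0 - p)
    return [[x * scale if k else 0.0 for x, k in zip(vec, keep)] for vec in embeddings]


# Usage example with made-up embeddings for a 2-token sequence
emb = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
out = drop_dim(emb, 0.5, rng=random.Random(1))
```

Element-wise dropout would draw one mask value per entry. Here each column of `out` is either entirely zero or entirely rescaled.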

MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain

no code implementations • 16 Apr 2023 • Zhifeng Ma, Hao Zhang, Jie Liu

The drastic variation of motion in spatial and temporal dimensions makes the video prediction task extremely challenging.

Video Prediction

Paper

Segment Everything Everywhere All at Once

2 code implementations • NeurIPS 2023 • Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).

Image Segmentation Interactive Segmentation +4

13,481

Paper
Code

RoSI: Recovering 3D Shape Interiors from Few Articulation Images

no code implementations • 13 Apr 2023 • Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang

The vast majority of 3D models that appear in gaming and VR/AR, including those used to train geometric deep learning algorithms, are incomplete: they are modeled as surface meshes and lack interior structures.

Object

Paper

Detection Transformer with Stable Matching

1 code implementation • ICCV 2023 • Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Position

177

Paper
Code

SpanRE: Entities and Overlapping Relations Extraction Based on Spans and Entity Attention

no code implementations • 6 Apr 2023 • Hao Zhang

Then we present a labeled span mechanism to extract objects and relations simultaneously: it generates labeled spans whose start and end positions indicate the objects and whose labels correspond to the relations between subjects and objects.

Sentence

Paper

UKP-SQuARE v3: A Platform for Multi-Agent QA Research

1 code implementation • 31 Mar 2023 • Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Hao Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych

To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents.

Question Answering

Paper
Code

Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images

no code implementations • 21 Mar 2023 • Ruiqi Wang, Akshay Gadi Patil, Fenggen Yu, Hao Zhang

We introduce the first active learning (AL) framework for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes.

Active Learning Instance Segmentation +2

Paper

A Region-Prompted Adapter Tuning for Visual Abductive Reasoning

no code implementations • 18 Mar 2023 • Hao Zhang, Yeo Keat Ee, Basura Fernando

Existing works highlight cues using a specific prompt (e.g., a colorful prompt).

Ranked #1 on Visual Abductive Reasoning on SHERLOCK

Visual Abductive Reasoning

Paper

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments

no code implementations • 16 Mar 2023 • Tsun-Hsuan Wang, Pingchuan Ma, Andrew Everett Spielberg, Zhou Xian, Hao Zhang, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan

Existing work has typically been tailored for particular environments or representations.

Paper

DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

1 code implementation • ICCV 2023 • Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

We introduce a novel method to automatically generate artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable.

Denoising

136

Paper
Code

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

1,247

Paper
Code

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

1 code implementation • 13 Mar 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

176

Paper
Code

MP-Former: Mask-Piloted Transformer for Image Segmentation

1 code implementation • CVPR 2023 • Hao Zhang, Feng Li, Huaizhe Xu, Shijia Huang, Shilong Liu, Lionel M. Ni, Lei Zhang

We present a mask-piloted Transformer which improves masked-attention in Mask2Former for image segmentation.

Image Segmentation Segmentation +1

106

Paper
Code

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Referring Expression Referring Expression Comprehension +2

125,059

Paper
Code

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

1 code implementation • CVPR 2023 • Zequn Zeng, Hao Zhang, Zhengjue Wang, Ruiying Lu, Dongsheng Wang, Bo Chen

Zero-shot capability has been considered a new revolution in deep learning, letting machines work on tasks without curated training data.

Image Captioning Language Modelling

Paper
Code

TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders

1 code implementation • 1 Mar 2023 • Mingyue Cheng, Qi Liu, Zhiding Liu, Hao Zhang, Rujiao Zhang, Enhong Chen

In this work, we propose TimeMAE, a novel self-supervised paradigm for learning transferable time series representations based on transformer networks.

Time Series Time Series Analysis +1

Paper
Code

Concept-Level Explanation for the Generalization of a DNN

no code implementations • 25 Feb 2023 • Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, Quanshi Zhang

Therefore, in this paper, we investigate the generalization power of each interactive concept, and we use the generalization power of different interactive concepts to explain the generalization power of the entire DNN.

Paper

Introducing Depth into Transformer-based 3D Object Detection

no code implementations • 25 Feb 2023 • Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang

To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.

3D Object Detection Auxiliary Learning +3

Paper

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

2 code implementations • 22 Feb 2023 • Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.

2,984

Paper
Code

Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression

no code implementations • 18 Feb 2023 • Hao Zhang, Meng Yu, Dong Yu

In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it.

Speech Separation

Paper

Futuristic Variations and Analysis in Fundus Images Corresponding to Biological Traits

no code implementations • 8 Feb 2023 • Muhammad Hassan, Hao Zhang, Ahmed Fateh Ameen, Home Wu Zeng, Shuye Ma, Wen Liang, Dingqi Shang, Jiaming Ding, Ziheng Zhan, Tsz Kwan Lam, Ming Xu, Qiming Huang, Dongmei Wu, Can Yang Zhang, Zhou You, Awiwu Ain, Pei Wu Qin

Our proposed DL models, FAG-Net and FGC-Net, respectively estimate biological traits (age and gender) and generate fundus images.

Gender Classification

Paper

NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

no code implementations • 29 Jan 2023 • Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges.

Acoustic echo cancellation

Paper

HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling

no code implementations • ICCV 2023 • Fenggen Yu, Yiming Qian, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the small and intricate parts.

Active Learning

Paper

A Method For Eliminating Contour Errors In Self-Encoder Reconstructed Images

no code implementations • 25 Jan 2023 • Yonggang Li, Hao Zhang

In this paper, we propose a self-supervised twin network approach based on this prior.

Paper

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning

1 code implementation • 12 Jan 2023 • Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Hao Zhang, Chuang Gan

The see stage scans the image and grounds the visual concept candidates with a visual perception model.

Few-Shot Learning Image Captioning +4

Paper
Code

CA$^2$T-Net: Category-Agnostic 3D Articulation Transfer from Single Image

no code implementations • 5 Jan 2023 • Jasmine Collins, Anqi Liang, Jitendra Malik, Hao Zhang, Frédéric Devernay

We present a neural network approach to transfer the motion from a single image of an articulated object to a rest-state (i.e., unarticulated) 3D model.

Object

Paper

Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

no code implementations • CVPR 2023 • Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance.

object-detection Object Detection

Paper

CC-FedAvg: Computationally Customized Federated Averaging

no code implementations • 28 Dec 2022 • Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

Federated learning (FL) is an emerging paradigm for training models on distributed data from numerous Internet of Things (IoT) devices.

Federated Learning

Paper
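
For reference, the server-side step of standard federated averaging (FedAvg) takes a sample-size-weighted mean of client parameters. The sketch below shows only that vanilla aggregation rule in Python, with model parameters flattened into plain lists for brevity (an assumption). CC-FedAvg's computational customization is not modeled, and `federated_average` is a hypothetical name.

```python
def federated_average(client_weights, client_sizes):
    """Vanilla FedAvg server step: mean of client parameter vectors,
    weighted by each client's number of local training samples."""
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Usage: two clients holding 1 and 3 local samples
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

With weights of 1/4 and 3/4, the client with more data pulls the global model toward its parameters. This toy case yields `[2.5, 3.5]`.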

Improved Long-Form Spoken Language Translation with Large Language Models

no code implementations • 19 Dec 2022 • Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng

A challenge in spoken language translation is that much spoken content is long-form, whereas short units are necessary for obtaining high-quality translations.

Language Modelling Large Language Model +1

Paper

ARO-Net: Learning Implicit Fields from Anchored Radial Observations

1 code implementation • CVPR 2023 • Yizhi Wang, Zeyu Huang, Ariel Shamir, Hui Huang, Hao Zhang, Ruizhen Hu

We introduce anchored radial observations (ARO), a novel shape encoding for learning implicit field representation of 3D shapes that is category-agnostic and generalizable amid significant shape variations.

Surface Reconstruction

Paper
Code

Coordinating Cross-modal Distillation for Molecular Property Prediction

no code implementations • 30 Nov 2022 • Hao Zhang, Nan Zhang, Ruixin Zhang, Lei Shen, Yingyi Zhang, Meng Liu

Existing graph methods have demonstrated that 3D geometric information is important for achieving better performance in molecular property prediction (MPP).

Graph Regression Graph Representation Learning +4

Paper

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation • 28 Nov 2022 • Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a 1D text segmentation problem, we formulate phrase extraction and grounding (PEG) as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

Ranked #7 on Referring Expression Comprehension on RefCOCO

object-detection Object Detection +4

Paper
Code

FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

1 code implementation • 21 Nov 2022 • Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).

Paper
Code

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation

no code implementations • 15 Nov 2022 • Shijia Huang, Feng Li, Hao Zhang, Shilong Liu, Lei Zhang, LiWei Wang

Our mutual supervision operates in two directions.

Reference Expression Generation Referring Expression +2

Paper

QueryForm: A Simple Zero-shot Form Entity Query Framework

no code implementations • 14 Nov 2022 • Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities.

document understanding Transfer Learning

Paper

On Optimizing the Communication of Model Parallelism

no code implementations • 10 Nov 2022 • Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

This pattern emerges when the two paradigms of model parallelism, intra-operator and inter-operator parallelism, are combined to support large models on large clusters.

Paper

FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration

no code implementations • 9 Nov 2022 • Yangjun Wu, Kebin Fang, Yao Zhao, Hao Zhang, Lifeng Shi, Mengqi Zhang

To accomplish punctuation restoration, most existing methods focus on introducing extra information (e.g., part-of-speech tags) or addressing the class imbalance problem.

Language Modelling Punctuation Restoration +1

Paper

MPCFormer: fast, performant and private Transformer inference with MPC

1 code implementation • 2 Nov 2022 • Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

Through extensive evaluations, we show that MPCFormer significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model.

Knowledge Distillation

Paper
Code

Neural Eigenfunctions Are Structured Representation Learners

1 code implementation • 23 Oct 2022 • Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu

Unlike prior spectral methods such as Laplacian Eigenmap that operate in a nonparametric manner, Neural Eigenmap leverages NeuralEF to parametrically model eigenfunctions using a neural network.

Contrastive Learning Data Augmentation +7

Paper
Code

NIFT: Neural Interaction Field and Template for Object Manipulation

no code implementations • 20 Oct 2022 • Zeyu Huang, Juzhan Xu, Sisi Dai, Kai Xu, Hao Zhang, Hui Huang, Ruizhen Hu

Given a few object manipulation demos, NIFT guides the generation of the interaction imitation for a new object instance by matching the Neural Interaction Template (NIT) extracted from the demos in the target Neural Interaction Field (NIF) defined for the new object.

Descriptive Imitation Learning +1

Paper

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

1 code implementation • 19 Oct 2022 • Hao Zhang

A goodness-of-fit metric for LMD similar to the coefficient of determination is defined and used to measure the linear dependency of a set of LMs.
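For readers unfamiliar with the coefficient of determination, here is a minimal, generic sketch of it. It is not the paper's LMD procedure. The sketch makes one hypothetical assumption: each language model has already been reduced to a vector of scores over a shared set of inputs, such as per-token log-probabilities. It then measures how well one model's vector is explained by a least-squares linear combination of the others. The names `r_squared`, `solve_linear_system`, `lm_a`, `lm_b` and `lm_c` are illustrative only.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(target: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """R^2 of the least-squares fit target ~ w0 + sum_k w_k * predictors[k]."""
    n = len(target)
    design = [[1.0] + [p[i] for p in predictors] for i in range(n)]  # intercept column
    d = len(design[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(design, target)) for i in range(d)]
    w = solve_linear_system(xtx, xty)
    fitted = [sum(wj * xj for wj, xj in zip(w, row)) for row in design]
    mean = sum(target) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-input scores from three language models.
lm_a = [-1.2, -0.4, -2.0, -0.7, -1.5, -0.9]
lm_b = [-0.8, -1.1, -1.9, -0.3, -1.0, -1.4]
lm_c = [0.1 + 0.5 * a + 0.3 * b for a, b in zip(lm_a, lm_b)]  # exactly linear in a, b
```

Because `lm_c` is built as an exact linear combination, `r_squared(lm_c, [lm_a, lm_b])` equals 1 up to floating-point rounding. Values closer to 0 would indicate weaker linear dependency on the other models.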

Language Modelling

Paper
Code

Regularized Data Programming with Automated Bayesian Prior Selection

no code implementations • 17 Oct 2022 • Jacqueline R. M. A. Maasch, Hao Zhang, Qian Yang, Fei Wang, Volodymyr Kuleshov

The cost of manual data labeling can be a significant obstacle in supervised learning.

Paper

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

1 code implementation • 13 Oct 2022 • Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks.

valid

Paper
Code

Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review

no code implementations • 11 Oct 2022 • Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang

Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously.

Paper

Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

1 code implementation • 27 Sep 2022 • Hao Zhang, Hao Wang, Zhen Kan

Automaton-based approaches have enabled robots to perform various complex tasks.

Motion Planning reinforcement-learning +1

Paper
Code

Physical Interaction: Reconstructing Hand-object Interactions with Physics

1 code implementation • 22 Sep 2022 • Haoyu Hu, Xinyu Yi, Hao Zhang, Jun-Hai Yong, Feng Xu

Single-view-based reconstruction of hand-object interaction is challenging due to severe missing observations caused by occlusions.

Object

Paper
Code

Learning Reconstructability for Drone Aerial Path Planning

no code implementations • 21 Sep 2022 • Yilin Liu, Liqiang Lin, Yue Hu, Ke Xie, Chi-Wing Fu, Hao Zhang, Hui Huang

To reconstruct a new urban scene, we first build the 3D scene proxy. We then rely on the reconstruction quality and uncertainty measures that our network predicts from the proxy geometry to guide the drone path planning.

3D Scene Reconstruction

Paper

DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination

no code implementations • 21 Aug 2022 • Tingting Wu, Xiao Ding, Hao Zhang, Jinglong Gao, Li Du, Bing Qin, Ting Liu

To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful (e.g., easy-to-hard) sequence.

Image Classification regression

Paper

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

1 code implementation • 19 Aug 2022 • Rachneet Sachdeva, Haritz Puerto, Tim Baumgärtner, Sewin Tariverdian, Hao Zhang, Kexin Wang, Hossain Shaikh Saadi, Leonardo F. R. Ribeiro, Iryna Gurevych

In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations.

Adversarial Attack Explainable Models +2

Paper
Code

PhyGNNet: Solving spatiotemporal PDEs with Physics-informed Graph Neural Network

no code implementations • 7 Aug 2022 • Longxiang Jiang, Liyuan Wang, Xinkun Chu, Yonghao Xiao, Hao Zhang

Solving partial differential equations (PDEs) is an important research tool in physics, biology, and chemistry.

Paper

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation • 15 Jul 2022 • Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

Vision MLPs use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.

Paper
Code

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

1 code implementation • 12 Jul 2022 • Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo

By replacing vanilla 2D attention with LAPS, we could adapt a static transformer into a video one with zero extra parameters and negligible computation overhead (about 2.6%).

Video Classification

Paper
Code

Data-and-Knowledge Dual-Driven Automatic Modulation Recognition for Wireless Communication Networks

no code implementations • 30 Jun 2022 • Rui Ding, Hao Zhang, Fuhui Zhou, Qihui Wu, Zhu Han

In order to tackle these problems, a novel data-and-knowledge dual-driven automatic modulation classification scheme based on radio frequency machine learning is proposed by exploiting the attribute features of different modulations.

Attribute Automatic Modulation Recognition +1

Paper

Wavelet Regularization Benefits Adversarial Training

1 code implementation • 8 Jun 2022 • Jun Yan, Huilin Yin, Xiaoyang Deng, Ziming Zhao, Wancheng Ge, Hao Zhang, Gerhard Rigoll

Since adversarial vulnerability can be regarded as a high-frequency phenomenon, it is essential to regularize adversarially trained neural network models in the frequency domain.

Adversarial Robustness

Paper
Code

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

1 code implementation • 7 Jun 2022 • Zhifeng Ma, Hao Zhang, Jie Liu

Spatiotemporal predictive learning, which predicts future frames through historical prior knowledge with the aid of deep learning, is widely used in many fields.

Video Prediction

Paper
Code

DETR++: Taming Your Multi-Scale Detection Transformer

no code implementations • 7 Jun 2022 • Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen

Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12].

object-detection Small Object Detection

Paper

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

9 code implementations • CVPR 2023 • Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M. Ni, Heung-Yeung Shum

In this paper we present Mask DINO, a unified object detection and segmentation framework.

Ranked #1 on Panoptic Segmentation on COCO test-dev

Image Segmentation Instance Segmentation +3


Paper
Code

Why Adversarial Training of ReLU Networks Is Difficult?

no code implementations • 30 May 2022 • Xu Cheng, Hao Zhang, Yue Xin, Wen Shen, Jie Ren, Quanshi Zhang

We also prove that adversarial training tends to strengthen the influence of unconfident input samples with large gradient norms in an exponential manner.

Paper

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

no code implementations • 27 May 2022 • Yushi Cao, Zhiming Li, Tianpei Yang, Hao Zhang, Yan Zheng, Yi Li, Jianye Hao, Yang Liu

In this paper, we combine the two paradigms above and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs.

Decision Making Program Synthesis +2

Paper

Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation

no code implementations • 23 May 2022 • Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing

A simple pixel selection strategy, followed by the construction of multi-level contrastive units, is introduced to optimize the model for both domain adaptation and active supervised learning.

Active Learning Domain Adaptation +3

Paper

Downstream Transformer Generation of Question-Answer Pairs with Preprocessing and Postprocessing Pipelines

1 code implementation • 15 May 2022 • Cheng Zhang, Hao Zhang, Jie Wang

We present a system called TP3 to perform a downstream task of transformers on generating question-answer pairs (QAPs) from a given article.

Paper
Code

Block Modulating Video Compression: An Ultra Low Complexity Image Compression Encoder for Resource Limited Platforms

no code implementations • 7 May 2022 • Yujia Xue, Siming Zheng, Waleed Tahir, Zhengjue Wang, Hao Zhang, Ziyi Meng, Lei Tian, Xin Yuan

We consider image and video compression on resource-limited platforms.

Image Compression Quantization +1

Paper

New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography

no code implementations • 5 May 2022 • Neil Jethani, Aahlad Puli, Hao Zhang, Leonid Garber, Lior Jankelson, Yindalon Aphinyanaphongs, Rajesh Ranganath

We found that ECG-based assessment outperforms the ADA Risk test, achieving a higher area under the curve (0.80 vs. 0.68) and a higher positive predictive value (13% vs. 9%). The 13% positive predictive value is 2.6 times the prevalence of diabetes in the cohort.

Paper

Adaptive Split-Fusion Transformer

1 code implementation • 26 Apr 2022 • Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.

Ranked #1 on Image Classification on CIFAR-10 Image Classification

Image Classification

Paper
Code

FedCos: A Scene-adaptive Federated Optimization Enhancement for Performance Improvement

1 code implementation • 7 Apr 2022 • Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

On the other hand, it enlarges the distances between local models, resulting in an aggregated global model with poor performance.

Federated Learning

Paper
Code

Quadratic Neuron-empowered Heterogeneous Autoencoder for Unsupervised Anomaly Detection

1 code implementation • 2 Apr 2022 • Jing-Xiao Liao, Bo-Jian Hou, Hang-Cheng Dong, Hao Zhang, Xiaoge Zhang, Jinwei Sun, Shiping Zhang, Feng-Lei Fan

Encouraged by this inspiring theoretical result on heterogeneous networks, we directly integrate conventional and quadratic neurons in an autoencoder to make a new type of heterogeneous autoencoders.

Anomaly Detection

Paper
Code

Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

no code implementations • 18 Mar 2022 • Yang Zhao, Hao Zhang, Xiuyuan Hu

At each iteration, optimizers in RST perform a Bernoulli trial to choose randomly between a base algorithm (SGD) and a sharpness-aware algorithm (SAM). The probability of the trial is set by a predefined scheduling function.
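The per-iteration Bernoulli selection described in this snippet can be sketched as a toy loop. This is an illustration under stated assumptions, not the authors' code:

* The linear-ramp `schedule` is a hypothetical choice of scheduling function.
* The SAM step uses the standard first-order perturbation of radius `rho` along the normalized gradient.
* `quadratic_loss` and `quadratic_grad` define a hypothetical two-parameter test problem.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def rst_train(
    grad: Callable[[Vector], Vector],
    w0: Vector,
    steps: int,
    lr: float = 0.04,
    rho: float = 0.05,
    schedule: Callable[[int, int], float] = lambda t, total: t / total,  # hypothetical ramp
    seed: int = 0,
) -> Tuple[Vector, int]:
    """Toy randomized sharpness-aware training loop.

    At iteration t, a Bernoulli trial with success probability
    schedule(t, steps) selects a SAM step (two gradient calls);
    otherwise a plain SGD step (one gradient call) is taken.
    Returns the final weights and the number of SAM steps taken.
    """
    rng = random.Random(seed)
    w = list(w0)
    sam_steps = 0
    for t in range(steps):
        g = grad(w)
        if rng.random() < schedule(t, steps):  # Bernoulli trial
            # SAM: move to the first-order worst case within radius rho,
            # then descend using the gradient measured there.
            # "or 1.0" avoids dividing by zero when g == 0 (perturbation is then 0).
            norm = sum(gi * gi for gi in g) ** 0.5 or 1.0
            g = grad([wi + rho * gi / norm for wi, gi in zip(w, g)])
            sam_steps += 1
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, sam_steps


def quadratic_loss(w: Vector) -> float:
    """Hypothetical ill-conditioned toy loss."""
    return w[0] ** 2 + 10.0 * w[1] ** 2


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of quadratic_loss."""
    return [2.0 * w[0], 20.0 * w[1]]
```

Each SGD step costs one gradient evaluation and each SAM step costs two, so the schedule directly controls the expected compute. A schedule that always returns 0 reduces to SGD, and one that always returns 1 reduces to SAM.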

Computational Efficiency Scheduling

Paper

Group Contextualization for Video Recognition

1 code implementation • CVPR 2022 • Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

By utilizing calibrators to embed features with four different kinds of context in parallel, the learnt representation is expected to be more resilient to diverse types of activities.

Ranked #3 on Egocentric Activity Recognition on EGTEA

Action Recognition Egocentric Activity Recognition +1

Paper
Code

Boilerplate Detection via Semantic Classification of TextBlocks

no code implementations • 9 Mar 2022 • Hao Zhang, Jie Wang

We present a hierarchical neural network model called SemText to detect HTML boilerplate based on a novel semantic representation of HTML tags, class names, and text blocks.

Classification

Paper

Contextual Networks and Unsupervised Ranking of Sentences

no code implementations • 9 Mar 2022 • Hao Zhang, You Zhou, Jie Wang

We construct a contextual network to represent a document with syntactic and semantic relations between word-sentence pairs, based on which we devise an unsupervised algorithm called CNATAR (Contextual Network And Text Analysis Rank) to score sentences, and rank them through a bi-objective 0-1 knapsack maximization problem over topic analysis and sentence scores.
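The final selection step can be illustrated with a standard dynamic-programming solution to the 0-1 knapsack problem. This sketch makes two assumptions that the snippet does not state:

* Sentence length in words serves as the knapsack weight, with a hypothetical word `budget`.
* The two objectives are collapsed into one through a hypothetical weight `alpha`, which is a weighted-sum scalarization.

CNATAR itself may treat the bi-objective problem differently.

```python
from typing import List, Sequence, Tuple


def select_sentences(
    lengths: Sequence[int],
    sentence_scores: Sequence[float],
    topic_scores: Sequence[float],
    budget: int,
    alpha: float = 0.5,
) -> Tuple[List[int], float]:
    """0-1 knapsack: pick sentence indices whose total length fits `budget`
    and whose summed value alpha*sentence + (1-alpha)*topic is maximal."""
    values = [alpha * s + (1 - alpha) * t for s, t in zip(sentence_scores, topic_scores)]
    n = len(values)
    # best[i][c]: best total value using the first i sentences within capacity c.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], values[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # skip sentence i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # take sentence i-1
    # Backtrack through the table to recover which sentences were taken.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen), best[n][budget]
```

Consider three sentences of 10, 20 and 30 words with combined values 0.6, 1.0 and 1.2, and a 50-word budget. For these inputs, the sketch selects the second and third sentences, for a total value of 2.2.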

Sentence

Paper

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

15 code implementations • 7 Mar 2022 • Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Ranked #1 on Real-Time Object Detection on COCO 2017 val

Real-Time Object Detection


Paper
Code

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

no code implementations • 3 Mar 2022 • Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Pengchuan Zhang, Lei Zhang

This survey is inspired by the remarkable progress in both computer vision and natural language processing, and recent trends shifting from single modality processing to multiple modality comprehension.

Few-Shot Learning Representation Learning

Paper

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

16 code implementations • CVPR 2022 • Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang

Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.

Object Detection


Paper
Code

Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model

no code implementations • 16 Feb 2022 • Hao Zhang, You-Chi Cheng, Shankar Kumar, W. Ronny Huang, Mingqing Chen, Rajiv Mathews

Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase or lowercase) of noisy text.

Federated Learning Language Modelling

Paper

Hierarchical Point Cloud Encoding and Decoding with Lightweight Self-Attention based Model

no code implementations • 13 Feb 2022 • En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing

In this paper we present SA-CNN, a hierarchical and lightweight self-attention based encoding and decoding architecture for representation learning of point cloud data.

Representation Learning Retrieval

Paper

Multi-relation Message Passing for Multi-label Text Classification

1 code implementation • 10 Feb 2022 • Muberra Ozmen, Hao Zhang, Pengyun Wang, Mark Coates

These examples motivate the modelling of multiple types of bi-directional relationships between labels.

Multi-Label Classification Multi-Label Image Classification +4

Paper
Code

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

1 code implementation • 8 Feb 2022 • Yang Zhao, Hao Zhang, Xiuyuan Hu

In this paper, we propose an effective method to improve the model generalization by additionally penalizing the gradient norm of loss function during optimization.
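The penalized objective is L(w) + λ·||∇L(w)||. Its gradient is ∇L + λ·H∇L/||∇L||, where H is the Hessian of L.

The minimal sketch below makes the following assumptions:

* The Hessian-vector product is approximated by a forward finite difference along the normalized gradient. This is a common trick that avoids forming H.
* `lam` and `r` take hypothetical values, not ones taken from the paper.
* `quadratic_grad` is a toy test problem.

```python
from typing import Callable, List

Vector = List[float]


def penalized_gradient(
    grad: Callable[[Vector], Vector],
    w: Vector,
    lam: float = 0.1,
    r: float = 0.01,
) -> Vector:
    """Approximate gradient of L(w) + lam * ||grad L(w)||.

    The penalty's exact gradient is H g / ||g|| (H: Hessian, g: grad L).
    Here H @ (g / ||g||) is approximated by a forward finite difference,
    so each call costs two gradient evaluations and never forms H.
    """
    g = grad(w)
    norm = sum(gi * gi for gi in g) ** 0.5
    if norm == 0.0:
        # ||g|| is not differentiable at g = 0; fall back to the plain gradient.
        return g
    v = [gi / norm for gi in g]
    g_shift = grad([wi + r * vi for wi, vi in zip(w, v)])
    hvp = [(gs - gi) / r for gs, gi in zip(g_shift, g)]  # approximately H v
    return [gi + lam * hi for gi, hi in zip(g, hvp)]


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of the toy loss 0.5 * w^T A w with A = [[3, 1], [1, 2]]."""
    return [3.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 2.0 * w[1]]
```

On a quadratic loss the gradient is linear in w, so the finite difference is exact up to rounding. That makes the quadratic a convenient sanity check against the closed form ∇L + λ·A∇L/||∇L||.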

Paper
Code

A Variational Edge Partition Model for Supervised Graph Representation Learning

1 code implementation • 7 Feb 2022 • Yilin He, Chaojie Wang, Hao Zhang, Bo Chen, Mingyuan Zhou

This paper introduces a graph generative process to model how the observed edges are generated by aggregating the node interactions over a set of overlapping node communities, each of which contributes to the edges via a logical OR mechanism.
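The logical OR aggregation can be illustrated with a small sketch. It rests on one hypothetical assumption: each community independently emits a given edge with its own probability p_k. Under that assumption, the edge is observed if at least one community emits it, so P(edge) = 1 - Π_k (1 - p_k). The probabilities used below are made up, and this is not the paper's full generative model.

```python
import math
import random
from typing import Sequence


def edge_probability(community_probs: Sequence[float]) -> float:
    """P(edge) when the edge is present iff at least one community emits it,
    each community k emitting independently with probability p_k."""
    return 1.0 - math.prod(1.0 - p for p in community_probs)


def sample_edge(community_probs: Sequence[float], rng: random.Random) -> bool:
    """Draw one Bernoulli per community and combine them with logical OR.

    any() stops at the first success; the skipped draws cannot change
    the OR, so the sampled distribution is unaffected.
    """
    return any(rng.random() < p for p in community_probs)
```

Take two communities with p = 0.2 and p = 0.5. Here `edge_probability` gives 0.6 up to floating-point rounding. The empirical frequency from repeated `sample_edge` draws approaches the same value as the number of draws grows.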

Classification Graph Representation Learning +1

Paper
Code

Neural Dual Contouring

2 code implementations • 4 Feb 2022 • Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, Hao Zhang

We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC).

Surface Reconstruction


Paper
Code

RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures

1 code implementation • CVPR 2022 • Chengjie Niu, Manyi Li, Kai Xu, Hao Zhang

Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape.

Paper
Code

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation • 28 Jan 2022 • Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.


Paper
Code

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations • ICLR 2022 • Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Ranked #11 on 2D Object Detection on SARDet-100K

Object Detection


Paper
Code

Temporal Sentence Grounding in Videos: A Survey and Future Directions

no code implementations • 20 Jan 2022 • Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization (NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video.

Moment Retrieval Retrieval +2

Paper

A Privacy-Preserving Unsupervised Domain Adaptation Framework for Clinical Text Analysis

no code implementations • 18 Jan 2022 • Qiyuan An, Ruijiang Li, Lin Gu, Hao Zhang, Qingyu Chen, Zhiyong Lu, Fei Wang, Yingying Zhu

To evaluate our proposed method's utility and privacy loss, we apply our model on a medical report disease label classification task using two noisy challenging clinical text datasets.

Inference Attack Membership Inference Attack +4

Paper
