Search Results for author: Daquan Zhou

Found 37 papers, 24 papers with code

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

1 code implementation • 2 May 2024 • Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation.

motion prediction Story Generation +1

4,406

Paper
Code

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

1 code implementation • arXiv 2024 • Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.

Ranked #1 on Video-based Generative Performance Benchmarking (Correctness of Information) on VideoInstruct

Dense Captioning Video-based Generative Performance Benchmarking +1

330

Paper
Code

Chain of Thought Explanation for Dialogue State Tracking

no code implementations • 7 Mar 2024 • Lin Xu, Ningxin Peng, Daquan Zhou, See-Kiong Ng, Jinlan Fu

Dialogue state tracking (DST) aims to record user queries and goals during a conversational interaction achieved by maintaining a predefined set of slots and their corresponding values.

Dialogue State Tracking

Paper
Add Code

Sora Generates Videos with Stunning Geometrical Consistency

no code implementations • 27 Feb 2024 • XuanYi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng

We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality.

3D Reconstruction Video Generation

Paper
Add Code

Magic-Me: Identity-Specific Video Customized Diffusion

1 code implementation • 14 Feb 2024 • Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng

To achieve this, we propose three novel components that are essential for high-quality identity preservation and stable video generation: 1) a noise initialization method with 3D Gaussian Noise Prior for better inter-frame stability; 2) an ID module based on extended Textual Inversion trained with the cropped identity to disentangle the ID information from the background 3) Face VCD and Tiled VCD modules to reinforce faces and upscale the video to higher resolution while preserving the identity's features.

Text-to-Image Generation Video Generation

430

Paper
Code

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

no code implementations • 9 Jan 2024 • Weimin WANG, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.

MORPH Video Generation

Paper
Add Code

A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models

no code implementations • 4 Jan 2024 • Rui Ma, Qiang Zhou, Bangjun Xiao, Yizhu Jin, Daquan Zhou, Xiuyu Li, Aishani Singh, Yi Qu, Kurt Keutzer, Xiaodong Xie, Jingtong Hu, Zhen Dong, Shanghang Zhang

Copyright is a legal right that grants creators the exclusive authority to reproduce, distribute, and profit from their creative works.

Text-to-Image Generation

Paper
Add Code

Factorization Vision Transformer: Modeling Long Range Dependency with Local Window Cost

1 code implementation • 14 Dec 2023 • Haolin Qin, Daquan Zhou, Tingfa Xu, Ziyang Bian, Jianan Li

Accordingly, we propose a novel factorization self-attention mechanism (FaSA) that enjoys both the advantages of local window cost and long-range dependency modeling capability.

Paper
Code

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

1 code implementation • 14 Nov 2023 • Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng

Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing, demonstrating exceptional capabilities in reasoning, tool usage, and memory.

Benchmarking Language Modelling +1

Paper
Code

ChatAnything: Facetime Chat with LLM-Enhanced Personas

no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou

For MoV, we utilize the text-to-speech (TTS) algorithms with a variety of pre-defined tones and select the most matching one based on the user-provided text description automatically.

In-Context Learning Novel Concepts +2

Paper
Add Code

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome

no code implementations • 12 Nov 2023 • Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer

On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost.

Model Compression Neural Architecture Search +1

Paper
Add Code

Low-Resolution Self-Attention for Semantic Segmentation

no code implementations • 8 Oct 2023 • Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen

Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction.

Decoder Segmentation +1

Paper
Add Code

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

no code implementations • 8 Sep 2023 • Yupeng Zhou, Daquan Zhou, Zuo-Liang Zhu, Yaxing Wang, Qibin Hou, Jiashi Feng

In this work, we identify that a crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning between the prompt and the output image.

Paper
Add Code

Dataset Quantization

1 code implementation • ICCV 2023 • Daquan Zhou, Kai Wang, Jianyang Gu, Xiangyu Peng, Dongze Lian, Yifan Zhang, Yang You, Jiashi Feng

Extensive experiments demonstrate that DQ is able to generate condensed small datasets for training unseen network architectures with state-of-the-art compression ratios for lossless model training.

object-detection Object Detection +2

235

Paper
Code

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during the interaction with human.

Instruction Following Sentence +1

477

Paper
Code

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li

This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains.

Efficient Diffusion Personalization

Paper
Code

DiM: Distilling Dataset into Generative Model

2 code implementations • 8 Mar 2023 • Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang, Yang You

To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than simple ones, such as 75. 1\% with ResNet-18 and 72. 6\% with ConvNet-3 on ten images per class of CIFAR-10.

1,192

Paper
Code

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

1 code implementation • 8 Mar 2023 • Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

To solve this problem, we propose \textbf{InfoBatch}, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning.

Semantic Segmentation

279

Paper
Code

Diffusion Probabilistic Model Made Slim

no code implementations • CVPR 2023 • Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang

Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms.

Image Generation Unconditional Image Generation

Paper
Add Code

Expanding Small-Scale Datasets with Guided Imagination

1 code implementation • NeurIPS 2023 • Yifan Zhang, Daquan Zhou, Bryan Hooi, Kai Wang, Jiashi Feng

Specifically, GIF conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model, resulting in the creation of photo-realistic images with new content.

Paper
Code

MagicVideo: Efficient Video Generation With Latent Diffusion Models

no code implementations • 20 Nov 2022 • Daquan Zhou, Weimin WANG, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng

In specific, unlike existing works that directly train video models in the RGB space, we use a pre-trained VAE to map video clips into a low-dimensional latent space and learn the distribution of videos' latent codes via a diffusion model.

Ranked #10 on Text-to-Video Generation on MSR-VTT

Text-to-Video Generation Video Generation

Paper
Add Code

MagicMix: Semantic Mixing with Diffusion Models

2 code implementations • 28 Oct 2022 • Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

Unlike style transfer, where an image is stylized according to the reference style without changing the image content, semantic blending mixes two different concepts in a semantic manner to synthesize a novel concept while preserving the spatial layout and geometry.

Denoising Style Transfer

Paper
Code

Deep Model Reassembly

1 code implementation • 24 Oct 2022 • Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, Xinchao Wang

Given a collection of heterogeneous models pre-trained from distinct sources and with diverse architectures, the goal of DeRy, as its name implies, is to first dissect each model into distinctive building blocks, and then selectively reassemble the derived blocks to produce customized networks under both the hardware resource and performance constraints.

Transfer Learning

255

Paper
Code

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

1 code implementation • 17 Oct 2022 • Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang

With the proposed SSF, our model obtains 2. 46% (90. 72% vs. 88. 54%) and 11. 48% (73. 10% vs. 65. 57%) performance improvement on FGVC and VTAB-1k in terms of Top-1 accuracy compared to the full fine-tuning but only fine-tuning about 0. 3M parameters.

Image Classification

151

Paper
Code

Sharpness-Aware Training for Free

1 code implementation • 27 May 2022 • Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Joey Tianyi Zhou

Intuitively, SAF achieves this by avoiding sudden drops in the loss in the sharp local minima throughout the trajectory of the updates of the weights.

Paper
Code

Understanding The Robustness in Vision Transformers

2 code implementations • 26 Apr 2022 • Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations.

Ranked #4 on Domain Generalization on ImageNet-R (using extra training data)

Domain Generalization Image Classification +3

460

Paper
Code

M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

no code implementations • 11 Apr 2022 • Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Birds Eye View~(BEV) space with multi-camera image inputs.

3D Object Detection object-detection +1

Paper
Add Code

Shunted Self-Attention via Multi-Scale Token Aggregation

1 code implementation • CVPR 2022 • Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang

This novel merging scheme enables the self-attention to learn relationships between objects with different sizes and simultaneously reduces the token numbers and the computational cost.

204

Paper
Code

Refiner: Refining Self-attention for Vision Transformers

1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng

Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.

Ranked #174 on Image Classification on ImageNet

Image Classification

106

Paper
Code

All Tokens Matter: Token Labeling for Training Better Vision Transformers

6 code implementations • NeurIPS 2021 • Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng

In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs).

Ranked #3 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)

Efficient ViTs General Classification +1

420

Paper
Code

AutoSpace: Neural Architecture Search with Less Human Interference

1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng

Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.

Neural Architecture Search

Paper
Code

DeepViT: Towards Deeper Vision Transformer

5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

In this paper, we show that, unlike convolution neural networks (CNNs)that can be improved by stacking more convolutional layers, the performance of ViTs saturate fast when scaled to be deeper.

Ranked #426 on Image Classification on ImageNet

Image Classification Representation Learning

134

Paper
Code

Coordinate Attention for Efficient Mobile Network Design

2 code implementations • CVPR 2021 • Qibin Hou, Daquan Zhou, Jiashi Feng

Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e. g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps.

object-detection Object Detection +1

959

Paper
Code

ConvBERT: Improving BERT with Span-based Dynamic Convolution

7 code implementations • NeurIPS 2020 • Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.

Natural Language Understanding

126,108

Paper
Code

Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks

no code implementations • 2 Jul 2020 • Jibin Wu, Cheng-Lin Xu, Daquan Zhou, Haizhou Li, Kay Chen Tan

In this paper, we propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition, which is referred to as progressive tandem learning of deep SNNs.

Computational Efficiency Image Reconstruction +2

Paper
Add Code

PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment

5 code implementations • ICCV 2019 • Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, Jiashi Feng

In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set.

Ranked #70 on Few-Shot Semantic Segmentation on COCO-20i (5-shot)

Few-Shot Semantic Segmentation Metric Learning +2

304

Paper
Code

Neural Epitome Search for Architecture-Agnostic Network Compression

no code implementations • ICLR 2020 • Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng

The recent WSNet [1] is a new model compression method through sampling filterweights from a compact set and has demonstrated to be effective for 1D convolutionneural networks (CNNs).

Model Compression Neural Architecture Search

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.