Search Results for author: Weixuan Sun

Found 26 papers, 19 papers with code

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

1 code implementation • 27 May 2024 • Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

This eliminates the need for cumsum in the linear attention calculation.

217

LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image

no code implementations • 24 May 2024 • Ruikai Cui, Xibin Song, Weixuan Sun, Senbo Wang, Weizhe Liu, Shenzhou Chen, Taizhang Shang, Yang Li, Nick Barnes, Hongdong Li, Pan Ji

Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images.

3D Reconstruction

HGRN2: Gated Linear RNNs with State Expansion

3 code implementations • 11 Apr 2024 • Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference.

Image Classification • Language Modelling

555

NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation

no code implementations • 27 Mar 2024 • Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji

3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.

3D Shape Generation • 3D Shape Modeling

Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

no code implementations • 24 Mar 2024 • Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass.

Denoising

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

1 code implementation • 30 Jan 2024 • Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji

A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed.

Denoising • Scene Generation

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

1 code implementation • 29 Jan 2024 • Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 attains high scalability even on extensive multi-node clusters constrained by very limited communication bandwidth.

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

2 code implementations • 9 Jan 2024 • Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

With its ability to process tokens in linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with fixed memory consumption.

209

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

1 code implementation • 8 Aug 2023 • Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

Given a pair of augmented views, our approach regularizes the activation intensities between them, while also ensuring that the affinity across regions within each view remains consistent.

Ranked #15 on Weakly-Supervised Semantic Segmentation on COCO 2014 val

Object Localization • Weakly supervised Semantic Segmentation • +1

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations • 27 Jul 2023 • Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling • Large Language Model

217

Linearized Relative Positional Encoding

no code implementations • 18 Jul 2023 • Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it emphasizes a general paradigm for designing a broader range of relative positional encoding methods that are applicable to linear transformers.

Image Classification • Language Modelling • +2

Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

2 code implementations • 25 May 2023 • Zheyuan Liu, Weixuan Sun, Damien Teney, Stephen Gould

An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set.

Ranked #2 on Image Retrieval on CIRR

Composed Image Retrieval (CoIR) • Re-Ranking • +1

Toeplitz Neural Network for Sequence Modeling

2 code implementations • 8 May 2023 • Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Sequence modeling has important applications in natural language processing and computer vision.

Language Modelling • Position

An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems

1 code implementation • 2 May 2023 • Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes

The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks.

Ranked #3 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Pseudo Label • Weakly supervised Semantic Segmentation • +1

Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

1 code implementation • 29 Mar 2023 • Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould

Composed image retrieval searches for a target image based on a multi-modal user query composed of a reference image and modification text describing the desired changes.

Ranked #6 on Image Retrieval on Fashion IQ

Composed Image Retrieval (CoIR) • Retrieval

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

1 code implementation • CVPR 2023 • Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of such false negative samples misleading the training.

Contrastive Learning

Audio-Visual Segmentation with Semantics

1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation • Semantic Segmentation • +1

432

The Devil in Linear Transformer

1 code implementation • 19 Oct 2022 • Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong

In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation, which adversely impact the convergence of linear transformer models; 2) attention dilution, which trivially distributes attention scores over long sequences while neglecting neighbouring structures.

Language Modelling • Text Classification

Linear Video Transformer with Feature Fixation

no code implementations • 15 Oct 2022 • Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.

Feature Importance • Video Classification

Neural Architecture Search on Efficient Transformers and Beyond

no code implementations • 28 Jul 2022 • Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong

To address this issue, we propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.

Computational Efficiency • Image Classification • +2

Audio-Visual Segmentation

1 code implementation • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

432

Vicinity Vision Transformer

1 code implementation • 21 Jun 2022 • Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.

Image Classification

cosFormer: Rethinking Softmax in Attention

3 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity with respect to the sequence length.

Ranked #4 on Offline RL on D4RL

D4RL • Language Modelling • +1

174

GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation

1 code implementation • 6 Dec 2021 • Weixuan Sun, Jing Zhang, Zheyuan Liu, Yiran Zhong, Nick Barnes

To bridge their gap, a Class Activation Map (CAM) is usually generated to provide pixel-level pseudo labels.

Ranked #19 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 test

Weakly supervised Semantic Segmentation • Weakly-Supervised Semantic Segmentation

Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

1 code implementation • 27 Oct 2021 • Weixuan Sun, Jing Zhang, Nick Barnes

To solve this, most existing approaches follow a multi-training pipeline to refine CAMs for better pseudo-labels, which includes: 1) re-training the classification model to generate CAMs; 2) post-processing CAMs to obtain pseudo labels; and 3) training a semantic segmentation model with the obtained pseudo labels.

Ranked #22 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Segmentation • Weakly supervised Semantic Segmentation • +1

3D Guided Weakly Supervised Semantic Segmentation

no code implementations • 1 Dec 2020 • Weixuan Sun, Jing Zhang, Nick Barnes

In this paper, we propose a weakly supervised 2D semantic segmentation model by incorporating sparse bounding box labels with available 3D information, which is much easier to obtain with advanced sensors.

2D Semantic Segmentation • Segmentation • +2
