no code implementations • 7 May 2024 • Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi
Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal, as they focus on only a few basic features of a desirable Earth and climate foundation model.
2 code implementations • 22 Mar 2024 • Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, Xiao Xiang Zhu
The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data.
no code implementations • 13 Mar 2024 • Shan Zhao, Ioannis Prapas, Ilektra Karasante, Zhitong Xiong, Ioannis Papoutsis, Gustau Camps-Valls, Xiao Xiang Zhu
In that direction, we propose integrating causality with Graph Neural Networks (GNNs) that explicitly model the causal mechanism among complex variables via graph learning.
no code implementations • 17 Feb 2024 • Zhenghang Yuan, Zhitong Xiong, Lichao Mou, Xiao Xiang Zhu
In this context, we introduce a global-scale, high-quality image-text dataset for remote sensing, providing natural language descriptions for Sentinel-2 data to facilitate the understanding of satellite imagery for common users.
no code implementations • 31 Jan 2024 • Shan Zhao, Zhitong Xiong, Xiao Xiang Zhu
Subseasonal forecasting, which is pivotal for agriculture, water resource management, and early warning of disasters, faces challenges due to the chaotic nature of the atmosphere.
1 code implementation • 18 Jan 2024 • Yang Zhan, Zhitong Xiong, Yuan Yuan
Specifically, after projecting RS visual features to the language domain via an alignment layer, they are fed jointly with task-specific instructions into an LLM-based RS decoder to predict answers for RS open-ended tasks.
no code implementations • 15 Jan 2024 • Zhitong Xiong, Yi Wang, Fahong Zhang, Xiao Xiang Zhu
Current remote sensing foundation models typically specialize in a single modality or a specific spatial resolution range, limiting their versatility for downstream datasets.
1 code implementation • 13 Dec 2023 • Yang Zhan, Yuan Yuan, Zhitong Xiong
To foster this task, we propose Mono3DVG-TR, an end-to-end transformer-based network, which takes advantage of both the appearance and geometry information in text embeddings for multi-modal learning and 3D object localization.
1 code implementation • 28 Sep 2023 • Sining Chen, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu
To tackle this problem, we propose a method for monocular height estimation from optical imagery, which is currently one of the richest sources of remote sensing data.
1 code implementation • 19 Sep 2023 • Fahong Zhang, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu
In this context, few-shot object detection (FSOD) has emerged as a promising direction, which aims at enabling the model to detect novel objects with only a few annotated examples.
no code implementations • 11 Sep 2023 • Shan Zhao, Sudipan Saha, Zhitong Xiong, Niklas Boers, Xiao Xiang Zhu
Motivated by this, we explore a geometric deep learning-based temporal Graph Convolutional Network (GCN) for precipitation nowcasting.
2 code implementations • 11 Sep 2023 • Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu
We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning.
1 code implementation • 24 Aug 2023 • Yuan Yuan, Yang Zhan, Zhitong Xiong
To address this issue, in this work, we investigate the parameter-efficient transfer learning (PETL) method to effectively and efficiently transfer visual-language knowledge from the natural domain to the RS domain on the image-text retrieval task.
Ranked #3 on Cross-Modal Retrieval on RSICD
1 code implementation • 17 Jul 2023 • Zhaiyu Chen, Yilei Shi, Liangliang Nan, Zhitong Xiong, Xiao Xiang Zhu
We present PolyGNN, a polyhedron-based graph neural network for 3D building reconstruction from point clouds.
no code implementations • 4 Jun 2023 • Zhitong Xiong, Yanfeng Liu, Qi Wang, Xiao Xiang Zhu
We present the RSSOD-Bench dataset for salient object detection (SOD) in optical remote sensing imagery.
1 code implementation • 24 May 2023 • Zhitong Xiong, Sining Chen, Yi Wang, Lichao Mou, Xiao Xiang Zhu
Towards a fair and comprehensive analysis of existing methods, the proposed benchmark consists of 1) a large-scale dataset including co-registered RGB and nDSM pairs and pixel-wise semantic labels; 2) a comprehensive evaluation and analysis of existing multi-modal fusion strategies for both convolutional and Transformer-based networks on remote sensing data.
Ranked #1 on Semantic Segmentation on GAMUS
3 code implementations • 13 Nov 2022 • Yi Wang, Nassim Ait Ali Braham, Zhitong Xiong, Chenying Liu, Conrad M Albrecht, Xiao Xiang Zhu
Self-supervised pre-training has the potential to generate expressive representations without human annotation.
Ranked #1 on Multi-Label Image Classification on BigEarthNet (official test set) (using extra training data)
1 code implementation • 23 Oct 2022 • Yang Zhan, Zhitong Xiong, Yuan Yuan
However, object-level visual grounding on RS images is still under-explored.
no code implementations • 10 Oct 2022 • Zhitong Xiong, Fahong Zhang, Yi Wang, Yilei Shi, Xiao Xiang Zhu
Furthermore, a new platform for EO, termed EarthNets, is released to achieve a fair and consistent evaluation of deep learning methods on remote sensing data.
1 code implementation • 30 Jul 2022 • Zhitong Xiong, Haopeng Li, Xiao Xiang Zhu
To address this problem, we propose to aggregate the learnable covariance matrices with a deformable 4D Transformer to effectively predict the segmentation map.
Ranked #1 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)
1 code implementation • 17 Jan 2022 • Zhitong Xiong, Sining Chen, Yilei Shi, Xiao Xiang Zhu
Furthermore, a novel unsupervised semantic segmentation task based on height estimation is introduced in this work.
no code implementations • 30 Dec 2021 • Zhitong Xiong, Wei Huang, Jingtao Hu, Xiao Xiang Zhu
Therefore, we propose a new benchmark dataset to study the transferability of height estimation models in a cross-dataset setting.
1 code implementation • 12 Dec 2021 • Zhenghang Yuan, Lichao Mou, Zhitong Xiong, Xiaoxiang Zhu
In order to provide every user with flexible access to change information and help them better understand land-cover changes, we introduce a novel task: change detection-based visual question answering (CDVQA) on multi-temporal aerial images.
no code implementations • 14 Oct 2021 • Zhitong Xiong, Yuan Yuan, Qi Wang
Discriminative local theme-level and object-level representations can be selected with the DLFS module from the spatially-correlated multi-modal RGB-D features.
no code implementations • 30 Nov 2020 • Chuang Yang, Mulin Chen, Zhitong Xiong, Yuan Yuan, Qi Wang
Extensive experiments demonstrate that the proposed CM is efficient and robust in fitting arbitrary-shaped text instances, and also validate the effectiveness of MPF and the constraints loss for discriminative text feature recognition.
no code implementations • CVPR 2020 • Zhitong Xiong, Yuan Yuan, Nianhui Guo, Qi Wang
The main contributions of this work are as follows: 1) a novel VCD module is proposed, which exploits learnable Gaussian kernels to enable feature learning with structured adaptive context; 2) variational Bayesian probabilistic modeling is introduced for training the VCD module, which makes the training continuous and more stable; 3) a perspective-aware guidance module is designed to take advantage of multi-modal information for RGB-D segmentation.
Ranked #1 on Scene Parsing on Cityscapes test
no code implementations • 5 May 2019 • Yuan Yuan, Zhitong Xiong, Qi Wang
Our contributions are as follows: 1) We propose a multi-resolution feature fusion network architecture that exploits densely connected deconvolution layers with skip connections and can learn more effective features for small objects; 2) We frame traffic sign detection as a spatial sequence classification and regression task, and propose a vertical spatial sequence attention (VSSA) module to gain more context information for better detection performance.