NLCA-Net v2 for Stereo Matching in ECCV'20 Robust Vision Challenge

1 Nov 2020 · Zhibo Rao, Mingyi He, Bo Li, Renjie He

Our goal in this ECCV'20 Robust Vision Challenge (RVC) is to develop a novel stereo matching system that is robust across multiple datasets rather than tuned to one specific dataset. NLCA-Net v2, the model we developed for this RVC, is an extension of our previous NLCA-Net [1]. It uses NLCA-Net as the backbone and adds an improved cost volume construction and refinement module, inspired by GwcNet and CSPN. Moreover, we updated the model's normalization and activation function to further improve performance across the multiple datasets. To bring the results close to those of individual training on each dataset, we train the model with several data augmentation strategies, including random cropping, translation transformations, and dataset balancing. The contest results show that (1) our model reaches top performance in the Robust Vision Challenge, and (2) our training strategy alleviates the detrimental effect of mixed training, leading to state-of-the-art results.

1. Our Method

Given this goal, we employ our previous work NLCA-Net [1], which has demonstrated very good performance on individual datasets, as the backbone. The network architecture used in this RVC, called NLCA-Net v2, consists of four parts: feature extraction, cost volume construction, feature matching, and refinement, as shown in Fig. 1. Since NLCA-Net is described in our recent publication [1], this report briefly presents only the three modifications essential to RVC: the cost volume construction, the refinement module, and the model normalization and activation function.

Fig. 1: Non-local context attention network v2 (NLCA-Net v2).

(1) Cost Volume Construction: Inspired by GwcNet [2], we adopt its idea of a combined volume to improve our previous cost volume construction. However, we use our variance volume [1] in place of the group-wise correlation volume [2]. The combined volume is therefore built from a concatenation volume (concat volume) and a variance volume.
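As a rough illustration of how a concat volume and a variance volume can be stacked into one combined volume, here is a minimal NumPy sketch. It is not the NLCA-Net v2 implementation: the function name `build_combined_volume`, the (C, H, W) feature layout, the zero-filling of columns without a valid match, and the definition of the variance volume as the per-channel variance across the two views are all assumptions made for this example.

```python
import numpy as np


def build_combined_volume(left, right, max_disp):
    """Stack a concat volume and a variance volume (hypothetical helper).

    left, right: feature maps of shape (C, H, W) from the two views.
    Returns an array of shape (3 * C, max_disp, H, W):
      channels [0, C)    left features (concat volume, part 1)
      channels [C, 2C)   right features shifted by d (concat volume, part 2)
      channels [2C, 3C)  per-channel variance across the two views (assumed)
    """
    c, h, w = left.shape
    volume = np.zeros((3 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Pixel x in the left view is compared with pixel x - d in the right
        # view; columns x < d have no match and stay zero.
        l = left[:, :, d:]
        r = right[:, :, : w - d]
        mean = 0.5 * (l + r)
        var = 0.5 * ((l - mean) ** 2 + (r - mean) ** 2)
        volume[:c, d, :, d:] = l
        volume[c : 2 * c, d, :, d:] = r
        volume[2 * c :, d, :, d:] = var
    return volume


# Toy usage: the left view is the right view shifted by 2 pixels, so the
# variance channels vanish at disparity 2.
rng = np.random.default_rng(0)
right_feat = rng.standard_normal((4, 3, 10))
left_feat = np.zeros_like(right_feat)
left_feat[:, :, 2:] = right_feat[:, :, :-2]
combined = build_combined_volume(left_feat, right_feat, max_disp=4)
```

Under these assumptions the variance channels act as a matching cost (low where the two views agree), while the concat channels keep the raw features available to the later matching stage.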
This combined volume is then fed to our non-local attention matching module.

(2) Refinement Module: Inspired by the convolutional spatial propagation network (CSPN) [3], we design a new refinement process in which a simple 2D U-Net generates an affinity matrix as guidance, and we choose a probability parameter τ larger than 0.65 to construct a confidence map that serves as a sparse disparity sample. Finally, the geometry refinement module from NLCA-Net is applied at the end of the network.

(3) Model Normalization and Activation Function: Normalization and activation functions are critical components of neural networks and strongly affect their performance [4-6], so we investigated and experimentally compared several normalization strategies and activation functions. We finally selected group normalization (GN) [7] and the Mish activation function [8] to relieve the small-batch issue and improve performance in our stereo matching for the ECCV'20 RVC.
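To make the choice in (3) concrete, the sketch below writes out group normalization and Mish in plain NumPy. It only illustrates the two standard definitions, not the authors' code: the helper names `group_norm` and `mish`, the (N, C, H, W) layout, the group count, and the omission of GN's learned scale and shift are assumptions of this example.

```python
import numpy as np


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # np.logaddexp(0, x) is a numerically stable softplus, log(1 + exp(x)).
    return x * np.tanh(np.logaddexp(0.0, x))


def group_norm(x, num_groups, eps=1e-5):
    """Group normalization over an (N, C, H, W) array, without learned
    scale and shift (omitted here for brevity)."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics are computed per sample and per group, never across the batch.
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)


# Toy usage: a batch of 2 samples with 8 channels, split into 4 groups.
rng = np.random.default_rng(0)
features = rng.standard_normal((2, 8, 4, 4))
activated = mish(group_norm(features, num_groups=4))
```

Because the GN statistics are computed within each sample, normalizing one sample alone gives the same result as normalizing it inside a larger batch, which is why GN is a common choice when memory limits force small batches.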
