no code implementations • 26 Mar 2023 • Huiru Wang, Xiuhong Li, Zenyu Ren, Dan Yang, Chunming Ma
To remove redundant information and focus the network on the correlation between image and text features, a CNN and CBAM attention are applied after the text and image features are concatenated, which improves the feature representation.
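The fusion step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature shapes, the channel-axis concatenation, the single 3x3 convolution standing in for the CNN, the reduction ratio, and every name (`fuse`, `params`, `w_sp`, and so on) are assumptions made for illustration. The attention part follows the usual CBAM layout, channel attention followed by spatial attention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, w):
    """Naive zero-padded 'same' convolution (cross-correlation).
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel offset (i, j).
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def channel_attention(x, w1, w2):
    """CBAM channel attention: a shared two-layer MLP over the
    average-pooled and max-pooled channel descriptors."""
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    scale = sigmoid(mlp(avg) + mlp(mx))           # shape (C,)
    return x * scale[:, None, None]

def spatial_attention(x, w_sp):
    """CBAM spatial attention: convolve the channel-wise mean and max
    maps into a single (1, H, W) mask."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    mask = sigmoid(conv2d_same(pooled, w_sp))           # (1, H, W)
    return x * mask

def fuse(text_feat, image_feat, params):
    """Concatenate text and image features, then apply conv + CBAM.
    Assumes both inputs are already shaped (C, H, W) with equal H, W."""
    fused = np.concatenate([text_feat, image_feat], axis=0)
    fused = np.maximum(conv2d_same(fused, params["conv"]), 0.0)  # conv + ReLU
    fused = channel_attention(fused, params["w1"], params["w2"])
    return spatial_attention(fused, params["w_sp"])

# Hypothetical shapes and random weights, for demonstration only.
rng = np.random.default_rng(0)
text_feat = rng.standard_normal((8, 4, 4))
image_feat = rng.standard_normal((8, 4, 4))
params = {
    "conv": rng.standard_normal((16, 16, 3, 3)) * 0.1,
    "w1": rng.standard_normal((4, 16)) * 0.1,    # reduction ratio 4 (assumed)
    "w2": rng.standard_normal((16, 4)) * 0.1,
    "w_sp": rng.standard_normal((1, 2, 7, 7)) * 0.1,
}
out = fuse(text_feat, image_feat, params)
print(out.shape)  # (16, 4, 4)
```

In this sketch both attention stages only rescale the fused feature map by factors between 0 and 1, which is how CBAM suppresses less informative channels and positions without changing the tensor shape.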