Exploring Multimodal Features and Fusion Strategies for Analyzing Disaster Tweets

6 Nov 2020  ·  Raj Ratn Pranesh, Ambesh Shekhar, Anish Kumar

Social media platforms such as Twitter often provide firsthand news during the outbreak of a crisis. Processing this information quickly is essential for planning response efforts and minimizing losses, but it poses several challenges, such as parsing noisy messages that contain both text and images. Furthermore, these messages are diverse, ranging from personal opinions and achievements to situational reports of a crisis. Therefore, in this paper, we present an analysis of various multimodal feature fusion techniques for classifying disaster tweets into multiple crisis event categories via transfer learning. In our study, we utilized three image models, VGG19 (Simonyan and Zisserman 2014), ResNet-50 (He et al. 2016), and AlexNet, pre-trained on the ImageNet dataset (Deng et al. 2009), and three fine-tuned language models, BERT (Devlin et al. 2018), ALBERT (Lan et al. 2019), and RoBERTa (Liu et al. 2019), to learn the visual and textual features of the data and combine them to make predictions. We present a systematic analysis of multiple intramodal and cross-modal fusion strategies and their effect on the performance of the multimodal disaster classification system. In our experiments, we used 8,242 disaster tweets, each consisting of image and text data, labeled with five disaster event classes. The results show that the multimodal model using a transformer-attention mechanism and factorized bilinear pooling (FBP) (Zhang, Wang, and Du 2019) for intramodal and cross-modal feature fusion, respectively, achieved the best performance.
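
To make the cross-modal fusion step concrete, the sketch below shows how a factorized bilinear pooling (FBP) layer can combine a ResNet-50 image embedding with a BERT text embedding before classification into the five disaster event classes. It is a minimal illustration assuming PyTorch, torchvision, and Hugging Face transformers; the class name `FBPFusionClassifier`, the projection dimensions, and the use of the BERT pooler output are illustrative choices, not the authors' exact implementation (which also covers intramodal fusion and transformer-attention variants).

```python
# Minimal sketch of cross-modal factorized bilinear pooling (FBP) for
# disaster tweet classification. Assumes PyTorch, torchvision, and
# Hugging Face transformers; hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import BertModel

class FBPFusionClassifier(nn.Module):
    def __init__(self, num_classes=5, factor_k=8, fused_dim=512):
        super().__init__()
        # Pre-trained visual encoder; drop the ImageNet classification head.
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.visual = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)
        # Pre-trained textual encoder.
        self.textual = BertModel.from_pretrained("bert-base-uncased")
        # Low-rank factorized projections for bilinear pooling.
        self.k, self.o = factor_k, fused_dim
        self.proj_img = nn.Linear(2048, factor_k * fused_dim)
        self.proj_txt = nn.Linear(768, factor_k * fused_dim)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, images, input_ids, attention_mask):
        v = self.visual(images).flatten(1)                             # (B, 2048)
        t = self.textual(input_ids=input_ids,
                         attention_mask=attention_mask).pooler_output  # (B, 768)
        # FBP: element-wise product of the projected features, sum-pooled over k.
        joint = self.proj_img(v) * self.proj_txt(t)                    # (B, k*o)
        joint = joint.view(-1, self.o, self.k).sum(dim=2)              # (B, o)
        # Signed square-root and L2 normalization, as is standard for FBP.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-8)
        joint = nn.functional.normalize(joint, dim=1)
        return self.classifier(joint)                                  # logits over disaster classes
```

FBP approximates a full bilinear interaction between the two modalities with two low-rank projections followed by sum pooling, which captures pairwise feature interactions while keeping the fused representation compact.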
