no code implementations • 18 Dec 2020 • Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed
We share our results on the TVQA baseline model and the recently proposed heterogeneous-memory-enhanced multimodal attention (HME) model.
1 code implementation • 18 Dec 2020 • Thomas Winterbottom, Sarah Xiao, Alistair McLean, Noura Al Moubayed
Our results demonstrate that models trained on only the visual information can answer ~45% of the questions, while models using only the subtitles answer ~68%.