no code implementations • 19 Jan 2020 • Kaiyu Shan, Yongtao Wang, Zhuoying Wang, TingTing Liang, Zhi Tang, Ying Chen, Yangyan Li
To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-art methods integrate 1D temporal convolution into a conventional 2D CNN backbone.
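As a rough illustration of what "integrating 1D temporal convolution" can mean, the sketch below mixes per-frame features along the time axis. It is not taken from the paper: the function name `temporal_conv1d`, the single kernel shared across all channels, and the assumption that each frame has already been reduced to a feature vector by a 2D CNN are all simplifications made for this example.

```python
from typing import List, Sequence


def temporal_conv1d(
    features: Sequence[Sequence[float]],
    kernel: Sequence[float],
) -> List[List[float]]:
    """Apply a 1D convolution along the time axis of per-frame features.

    features: T frames, each a list of C channel values (assumed here to be
              the output of a 2D CNN applied to each frame independently).
    kernel:   odd-length temporal weights, shared across channels
              (a simplification chosen for this sketch).

    Zero padding keeps the output at T frames. As in most deep learning
    frameworks, the kernel is applied without flipping (cross-correlation).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")

    num_frames = len(features)
    half = len(kernel) // 2
    output: List[List[float]] = []

    for t in range(num_frames):
        num_channels = len(features[t])
        mixed = [0.0] * num_channels
        for k, weight in enumerate(kernel):
            src = t + k - half  # index of the neighbouring frame
            if 0 <= src < num_frames:  # frames outside the clip count as zeros
                for c in range(num_channels):
                    mixed[c] += weight * features[src][c]
        output.append(mixed)

    return output


# Three frames with two channels each, smoothed with a size-3 temporal kernel.
frame_features = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
smoothed = temporal_conv1d(frame_features, [0.25, 0.5, 0.25])
```

In a real network the temporal weights would be learned and would typically differ per channel; the 2D backbone handles spatial structure within each frame, while this step lets information flow between neighbouring frames.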