no code implementations • 1 Nov 2023 • Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan
As a result, corresponding regions across adjacent frames can share closely related query tokens and attention outputs, which further improves latent-level consistency and thereby enhances the visual temporal coherence of the generated videos.
no code implementations • 21 Jul 2016 • Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang
To speed up cross-modal retrieval, a number of binary representation learning methods have been proposed that map data from different modalities into a common Hamming space.
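The following is a minimal sketch, not the method from this paper, of why a common Hamming space makes retrieval fast. It assumes that items from each modality have already been mapped by some learned hash function (hypothetical, not shown) to fixed-length binary codes stored as Python integers. Under that assumption, ranking reduces to an XOR followed by a bit count. The helper names and the 4-bit codes are illustrative only.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two binary codes stored as ints."""
    # XOR leaves a 1 wherever the codes disagree; counting those 1s gives the distance.
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: List[int]) -> List[int]:
    """Return database indices ordered by Hamming distance to the query.

    Hypothetical helper: the query might be an image code and the database
    text codes, both assumed to live in the same Hamming space.
    sorted() is stable, so ties keep their original database order.
    """
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


if __name__ == "__main__":
    # Made-up 4-bit codes: one image query against three text items.
    image_query = 0b1011
    text_codes = [0b0011, 0b1011, 0b0100]
    print(rank_by_hamming(image_query, text_codes))  # [1, 0, 2]
```

Here the distances are 1, 0 and 4, so the text item with an identical code ranks first. Because XOR and bit counting are cheap integer operations, this kind of comparison scales to large databases far more easily than distance computations on real-valued features.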