no code implementations • 27 Apr 2024 • Jingxue Huang, Xilai Li, Tianshu Tan, Xiaosong Li, Tao Ye
We trained a specialized feature encoder separately for each modality and implemented a cross-scale fusion strategy that keeps the features from the different modalities within the same representation space, ensuring a balanced information fusion process.
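As a rough illustration of the idea in this abstract, the toy sketch below gives each modality its own encoder, extracts features at several scales, and fuses them scale by scale in a shared feature dimension. Everything in it is an assumption made for illustration and not the authors' implementation: the names `ModalityEncoder`, `avg_pool`, and `fuse`, the 1D input signals, the random untrained weights, and the equal-weight averaging that stands in for the unspecified fusion rule.

```python
import random


def avg_pool(signal, factor):
    """Downsample a 1D signal by averaging non-overlapping windows."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


class ModalityEncoder:
    """Hypothetical per-modality encoder (untrained, random weights)."""

    def __init__(self, dim, scales=(1, 2, 4), seed=0):
        rng = random.Random(seed)
        self.scales = scales
        # One weight per output channel; each modality gets its own set.
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def encode(self, signal):
        """Return {scale: list of dim-length feature vectors}."""
        feats = {}
        for s in self.scales:
            pooled = avg_pool(signal, s)
            feats[s] = [[w * x for w in self.weights] for x in pooled]
        return feats


def fuse(feats_a, feats_b):
    """Assumed fusion rule: equal-weight average at every scale, so that
    neither modality dominates the fused representation."""
    return {
        s: [[0.5 * (a + b) for a, b in zip(va, vb)]
            for va, vb in zip(feats_a[s], feats_b[s])]
        for s in feats_a
    }


# Two made-up signals standing in for two modalities of the same scene.
modality_a = [0.1, 0.4, 0.3, 0.8, 0.5, 0.2, 0.9, 0.6]
modality_b = [0.7, 0.2, 0.6, 0.1, 0.3, 0.9, 0.4, 0.5]

enc_a = ModalityEncoder(dim=4, seed=1)
enc_b = ModalityEncoder(dim=4, seed=2)

fused = fuse(enc_a.encode(modality_a), enc_b.encode(modality_b))
for scale, vectors in fused.items():
    print(scale, len(vectors), len(vectors[0]))
```

Because both encoders emit vectors of the same length at each scale, the fused features stay in one common space, which is the property the abstract emphasizes. A real system would replace the random weights with separately trained networks and the averaging with a learned fusion module.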