Search Results for author: Ying Cheng

Found 9 papers, 3 papers with code

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

1 code implementation • 12 Jul 2022 • Jiashuo Yu, Jinyu Liu, Ying Cheng, Rui Feng, Yuejie Zhang

In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate its negative impact on weakly-supervised audio-visual learning.

Ranked #6 on Anomaly Detection In Surveillance Videos on XD-Violence

Anomaly Detection In Surveillance Videos, audio-visual learning +1

IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training

1 code implementation • 12 Jul 2022 • Xinyu Huang, Youcai Zhang, Ying Cheng, Weiwei Tian, RuiWei Zhao, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Xiaobo Zhang

However, the image-text pairs co-occurrent on the Internet typically lack explicit alignment information, which is suboptimal for VLP.

Multi-Label Learning, Object +1

Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization

no code implementations • 7 Jul 2022 • Jiashuo Yu, Junfu Pu, Ying Cheng, Rui Feng, Ying Shan

Although audio-visual representation has been proved to be applicable in many downstream tasks, the representation of dancing videos, which is more specific and always accompanied by music with complex auditory contents, remains challenging and uninvestigated.

Contrastive Learning, Representation Learning +2

Self-Supervised Video Representation Learning with Motion-Contrastive Perception

no code implementations • 10 Apr 2022 • Jinyu Liu, Ying Cheng, Yuejie Zhang, Rui-Wei Zhao, Rui Feng

Visual-only self-supervised learning has achieved significant improvement in video representation learning.

Contrastive Learning, Representation Learning +1

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

1 code implementation • 24 Nov 2021 • Jiashuo Yu, Ying Cheng, Rui-Wei Zhao, Rui Feng, Yuejie Zhang

Recognizing and localizing events in videos is a fundamental task for video understanding.

audio-visual event localization, Video Understanding

Domain Adaptive Cascade R-CNN for MItosis DOmain Generalization (MIDOG) Challenge

no code implementations • 1 Sep 2021 • Xi Long, Ying Cheng, Xiao Mu, Lian Liu, Jingxin Liu

We present a summary of the domain adaptive cascade R-CNN method for mitosis detection of digital histopathology images.

Data Augmentation, Domain Generalization +1

MPN: Multimodal Parallel Network for Audio-Visual Event Localization

no code implementations • 7 Apr 2021 • Jiashuo Yu, Ying Cheng, Rui Feng

The localization subnetwork consists of Multimodal Bottleneck Attention Module (MBAM), which is designed to extract fine-grained segment-level contents.

audio-visual event localization, General Classification

Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

no code implementations • 13 Aug 2020 • Ying Cheng, Ruize Wang, Zhihao Pan, Rui Feng, Yuejie Zhang

When watching videos, the occurrence of a visual event is often accompanied by an audio event, e.g., the voice of lip motion, the music of playing instruments.

Action Recognition, Audio-Visual Synchronization +1

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

no code implementations • COLING 2020 • Ruize Wang, Zhongyu Wei, Ying Cheng, Piji Li, Haijun Shan, Ji Zhang, Qi Zhang, Xuanjing Huang

Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically.

Ranked #9 on Visual Storytelling on VIST

Image Captioning, Question Generation +1
