Search Results for author: Jianyuan Sun

Found 9 papers, 2 papers with code

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

no code implementations • 30 May 2023 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kılıç, Mark D. Plumbley, Wenwu Wang

Experimental results show that LHDFF (Low- and High-Dimensional Feature Fusion) outperforms existing audio captioning models.

Towards Generating Diverse Audio Captions via Adversarial Training

no code implementations • 5 Dec 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Captions generated by existing models are generally faithful to the content of audio clips; however, these machine-generated captions are often deterministic (e.g., generating a fixed caption for a given audio clip), simple (e.g., using common words and simple grammar), and generic (e.g., generating the same caption for similar audio clips).

Audio captioning • Generative Adversarial Network

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

1 code implementation • 28 Oct 2022 • Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Audio captioning aims to generate text descriptions of audio clips.

AudioCaps • Audio captioning • +1

Automated Audio Captioning via Fusion of Low- and High-Dimensional Features

no code implementations • 10 Oct 2022 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Moreover, LHDFF introduces a new PANNs encoder, called Residual PANNs (RPANNs), which fuses the low-dimensional feature from the output of an intermediate convolution layer with the high-dimensional feature from the output of the final layer of PANNs.

AudioCaps • Audio captioning • +2

On Metric Learning for Audio-Text Cross-Modal Retrieval

1 code implementation • 29 Mar 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.

AudioCaps • Cross-Modal Retrieval • +4

Deep Neural Decision Forest for Acoustic Scene Classification

no code implementations • 7 Mar 2022 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Jinzheng Zhao, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

In this paper, we propose a novel approach for ASC using deep neural decision forest (DNDF).

Acoustic Scene Classification • Classification • +1

Leveraging Pre-trained BERT for Audio Captioning

no code implementations • 6 Mar 2022 • Xubo Liu, Xinhao Mei, Qiushi Huang, Jianyuan Sun, Jinzheng Zhao, Haohe Liu, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

BERT is a pre-trained language model that has been extensively used in Natural Language Processing (NLP) tasks.

AudioCaps • Audio captioning • +2

Diverse Audio Captioning via Adversarial Training

no code implementations • 13 Oct 2021 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

As different people may describe an audio clip from different aspects using distinct words and grammar, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips.

Audio captioning • Generative Adversarial Network • +1

Banzhaf Random Forests

no code implementations • 22 Jul 2015 • Jianyuan Sun, Guoqiang Zhong, Junyu Dong, Yajuan Cai

Random forests are a type of ensemble method which makes predictions by combining the results of several independent trees.
