no code implementations • 28 Oct 2020 • Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow
Building on these recent developments, and aiming to improve the quality of generated captions, the contribution of this work is two-fold: First, we propose a generic multimodal model fusion framework for caption generation as well as emendation, in which different fusion strategies are used to integrate a pretrained Auxiliary Language Model (AuxLM) into a traditional encoder-decoder visual captioning framework.
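The snippet above does not specify which fusion strategies are used, so as a minimal sketch of one plausible strategy, the following shows probability-level late fusion, where the caption decoder's and the AuxLM's next-token distributions are blended at each decoding step. All names (`fuse_logits`, the mixing weight `alpha`) are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_logits(decoder_logits, auxlm_logits, alpha=0.7):
    """Weighted late fusion: blend the caption decoder's and the
    AuxLM's next-token distributions into one distribution."""
    p_dec = softmax(decoder_logits)
    p_lm = softmax(auxlm_logits)
    return alpha * p_dec + (1.0 - alpha) * p_lm

# Toy example: a vocabulary of 5 tokens at one decoding step.
rng = np.random.default_rng(0)
dec = rng.normal(size=5)   # stand-in for captioning-decoder logits
lm = rng.normal(size=5)    # stand-in for AuxLM logits
fused = fuse_logits(dec, lm, alpha=0.7)
```

Because both inputs are turned into proper distributions before mixing, the fused output is itself a valid distribution over the vocabulary, regardless of the two models' logit scales.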
Automatic Speech Recognition (ASR)
no code implementations • 22 Jul 2020 • Aditya Mogadala, Xiaoyu Shen, Dietrich Klakow
In particular, these image features are subdivided into global and local features, where global features are extracted from the global representation of the image, while local features are extracted from the objects detected locally in the image.
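The global/local split described above can be sketched as follows, assuming (hypothetically) that the global descriptor is a single vector for the whole image and the local descriptors are one vector per detected object, pooled into a fixed-size summary. The function name and dimensions are illustrative.

```python
import numpy as np

def build_image_features(global_feat, object_feats):
    """Concatenate a global image descriptor with a pooled
    summary of the per-object (local) descriptors."""
    # Mean-pool over detected objects so the summary has a fixed
    # size regardless of how many objects were found.
    local_summary = object_feats.mean(axis=0)
    return np.concatenate([global_feat, local_summary])

rng = np.random.default_rng(1)
g = rng.normal(size=512)          # global feature for the whole image
objs = rng.normal(size=(5, 512))  # local features for 5 detected objects
feat = build_image_features(g, objs)
```

Mean pooling is just one choice; attention-weighted pooling over the object features is a common alternative when some objects matter more than others for the text being generated.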
no code implementations • 12 Jul 2020 • Aditya Mogadala, Marius Mosbach, Dietrich Klakow
Generating longer textual sequences when conditioned on the visual information is an interesting problem to explore.
no code implementations • 16 Dec 2019 • Dawei Zhu, Aditya Mogadala, Dietrich Klakow
We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images while keeping other content, such as the background, intact.
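The snippet does not describe TEA-cGAN's architecture, but the general idea of editing a region while leaving the background intact is often realized by compositing the generated image onto the original through a soft spatial mask (e.g. one derived from attention). The sketch below shows only that generic mask-based blending step; it is not the paper's model, and all names are hypothetical.

```python
import numpy as np

def blend_with_mask(original, generated, mask):
    """Composite a manipulated region onto the original image.

    mask is ~1 where the edit should apply and ~0 where the
    original content (e.g. background) should be preserved.
    """
    mask = mask[..., None]  # broadcast the H x W mask over channels
    return mask * generated + (1.0 - mask) * original

rng = np.random.default_rng(2)
orig = rng.random((4, 4, 3))   # original image (H x W x C)
gen = rng.random((4, 4, 3))    # generator output
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0           # edit only the centre region
out = blend_with_mask(orig, gen, mask)
```

With a hard 0/1 mask as above, pixels outside the edited region are bit-identical to the original; a soft mask instead gives a smooth transition at the region boundary.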
no code implementations • 22 Jul 2019 • Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years.
no code implementations • 25 Oct 2017 • Aditya Mogadala, Dominik Jung, Achim Rettinger
But the gap in word usage between informal social media content, such as tweets, and diligently written content (e.g. news articles) makes such assembly difficult.
no code implementations • 17 Oct 2017 • Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger
Images in the wild encapsulate rich knowledge about varied abstract concepts and cannot be sufficiently described with models built only using image-caption pairs containing selected objects.