MTTN is a large scale derived and synthesized dataset built with on real prompts and indexed with popular image-text datasets like MS-COCO, Flickr, etc. MTTN consists of over 2.4M sentences that are divided over 5 stages creating a combination amounting to over 12M pairs, along with a vocab size of consisting more than 300 thousands unique words that creates an abundance of variations.
Source: MTTN: Multi-Pair Text to Text Narratives for Prompt GenerationPaper | Code | Results | Date | Stars |
---|