VIST-E is a modified version of the VIST dataset and consists of 49,913 training samples, 4,963 validation samples, and 5,030 test samples. Because every sample in VIST contains a five-sentence story, each sample in VIST-E contains the story ending, the ending-related image, and the first four sentences of the story as the story context. Additionally, each sentence is truncated to a maximum of 40 words.
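The construction described above can be illustrated with a minimal Python sketch. It is not the dataset authors' preprocessing code: the names `VistESample`, `truncate`, and `make_sample` are hypothetical, whitespace splitting is assumed as the word tokenizer, and the ending-related image is assumed to be the image paired with the fifth sentence.

```python
from dataclasses import dataclass
from typing import List

MAX_WORDS = 40  # per-sentence word cap stated in the description


@dataclass
class VistESample:
    context: List[str]   # first four sentences of the story
    ending: str          # fifth sentence, the story ending
    ending_image: str    # identifier of the ending-related image


def truncate(sentence: str, max_words: int = MAX_WORDS) -> str:
    """Keep at most max_words whitespace-separated words (assumed tokenization)."""
    return " ".join(sentence.split()[:max_words])


def make_sample(sentences: List[str], image_ids: List[str]) -> VistESample:
    """Split one five-sentence story into context, ending, and ending image.

    Assumption: image_ids[i] is the image paired with sentences[i], so the
    ending-related image is the last one.
    """
    if len(sentences) != 5 or len(image_ids) != 5:
        raise ValueError("expected a story of five sentences and five images")
    trimmed = [truncate(s) for s in sentences]
    return VistESample(
        context=trimmed[:4],
        ending=trimmed[4],
        ending_image=image_ids[4],
    )
```

Calling `make_sample` on a story's five sentences and five image identifiers would yield one sample with a four-sentence context, one ending, and one image identifier.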