SHERLOCK is a corpus of 363K commonsense inferences grounded in 103K images. Annotators highlight localized clues (color bubbles) and draw plausible abductive inferences about them (speech bubbles). The corpus can be used to test machine capacity for abductive reasoning that goes beyond the literal contents of an image.