no code implementations • 8 Dec 2023 • Mohammad Reza Taesiri, Tianjun Feng, Anh Nguyen, Cor-Paul Bezemer
To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to evaluate the reasoning capabilities of large multimodal models (LMMs).