19 Mar 2024 • Dojun Park, Jiwoo Lee, Hyeyun Jeong, Seohyun Park, Sungeun Lee
Current evaluation of Large Language Models (LLMs) relies predominantly on benchmarks that probe their embedded knowledge through multiple-choice questions (MCQs), a format inherently suited to automated evaluation.