KAgentBench is a benchmark of over 3,000 human-edited data instances for the automated evaluation of agent capabilities. Its evaluation dimensions include planning, tool use, reflection, concluding, and profiling.

License


  • Unknown
