KAgentBench is a benchmark of over 3,000 human-edited data instances for the automated evaluation of agent capabilities. Its evaluation dimensions include planning, tool use, reflection, concluding, and profiling.