BenchPress: Analyzing Android App Vulnerability Benchmark Suites
In recent years, various benchmark suites have been developed to evaluate the efficacy of Android security analysis tools. The choice of such benchmark suites used in tool evaluations is often based on the availability and popularity of suites and not on their characteristics and relevance. One of the reasons for such choices is the lack of information about the characteristics and relevance of benchmarks suites. In this context, we empirically evaluated four Android specific benchmark suites: DroidBench, Ghera, IccBench, and UBCBench. For each benchmark suite, we identified the APIs used by the suite that were discussed on Stack Overflow in the context of Android app development and measured the usage of these APIs in a sample of 227K real world apps (coverage). We also compared each pair of benchmark suites to identify the differences between them in terms of API usage. Finally, we identified security-related APIs used in real-world apps but not in any of the above benchmark suites to assess the opportunities to extend benchmark suites (gaps). The findings in this paper can help 1) Android security analysis tool developers choose benchmark suites that are best suited to evaluate their tools (informed by coverage and pairwise comparison) and 2) Android app vulnerability benchmark creators develop and extend benchmark suites (informed by gaps).
PDF Abstract