Active Learning of Input Grammars

29 Aug 2017 · Höschele Matthias, Kampmann Alexander, Zeller Andreas ·

Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data flow through program execution into lexical and syntactic entities; (2) assign these entities names that are based on the associated variable and function identifiers; and (3) systematically generalize production rules by means of membership queries. As a result, we need only a minimal set of sample inputs to obtain human-readable context-free grammars that reflect valid input structure. In our evaluation on inputs like URLs, spreadsheets, or configuration files, our AUTOGRAM prototype obtains input grammars that are both accurate and very readable - and that can be directly fed into test generators for comprehensive automated testing.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Datasets

Add Datasets introduced or used in this paper

Edit Social Preview

Active Learning of Input Grammars

Code Edit Add Remove Mark official

Categories

Datasets Edit

Code

Add Remove Mark official

Datasets