Active Learning of Input Grammars

29 Aug 2017  ·  Matthias Höschele, Alexander Kampmann, Andreas Zeller

Knowing the precise format of a program's input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data flow through program execution into lexical and syntactic entities; (2) assign these entities names that are based on the associated variable and function identifiers; and (3) systematically generalize production rules by means of membership queries. As a result, we need only a minimal set of sample inputs to obtain human-readable context-free grammars that reflect valid input structure. In our evaluation on inputs like URLs, spreadsheets, or configuration files, our AUTOGRAM prototype obtains input grammars that are both accurate and very readable, and that can be fed directly into test generators for comprehensive automated testing.
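
Step (3), generalizing a rule by membership queries, can be illustrated with a minimal sketch. It is not the AUTOGRAM implementation: the oracle `accepts`, the probe table `CANDIDATES`, and the helper `generalize` are hypothetical names introduced here, and the data-flow tracking of steps (1) and (2) is replaced by hand-picked fragment boundaries. The idea shown is only that a fragment of a valid sample is swapped for probe strings from a candidate class, and the class is kept if the program still accepts every resulting input.

```python
def accepts(text: str) -> bool:
    """Toy stand-in for the program under test (an assumption, not a real parser):
    accepts 'key=value' lines where key is alphabetic and value is numeric."""
    key, sep, value = text.partition("=")
    return sep == "=" and key.isalpha() and value.isdigit()


# Hypothetical candidate classes, each represented by a few probe strings.
CANDIDATES = {
    "DIGITS": ["0", "7", "42", "1234567890"],
    "LETTERS": ["a", "Z", "port", "HostName"],
    "ALNUM": ["a1", "9z", "x86", "B2b"],
}


def generalize(sample: str, start: int, end: int) -> list[str]:
    """Return the candidate classes whose probes are all accepted when they
    replace sample[start:end]. Each call to accepts() is one membership query."""
    prefix, suffix = sample[:start], sample[end:]
    return [
        name
        for name, probes in CANDIDATES.items()
        if all(accepts(prefix + probe + suffix) for probe in probes)
    ]


sample = "port=8080"
key_classes = generalize(sample, 0, 4)    # fragment "port"
value_classes = generalize(sample, 5, 9)  # fragment "8080"
print(key_classes, value_classes)
```

With this toy oracle, the single sample `port=8080` is enough to propose a rule of the shape LETTERS "=" DIGITS. A finite set of probes can only suggest a generalization, not prove it, so the choice of probes determines how far the rule is widened.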


Categories

Programming Languages · Formal Languages and Automata Theory
