Exploring General Intelligence of Program Analysis for Multiple Tasks

29 Sep 2021  ·  Yixin Guo, Pengcheng Li, Yingwei Luo, Xiaolin Wang, Zhenlin Wang ·

Artificial intelligence are gaining more attractions for program analysis and semantic understanding. Nowadays, the prevalent program embedding techniques usually target at one single task, for example detection of binary similarity, program classification, program comment auto-complement, etc, due to the ever-growing program complexities and scale. To this end, we explore a generic program embedding approach that aim at solving multiple program analysis tasks. We design models to extract features of a program, represent the program as an embedding, and use this embedding to solve various analysis tasks. Since different tasks require not only access to the features of the source code, but also are highly relevant to its compilation process, traditional source code or AST-based embedding approaches are no longer applicable. Therefore, we propose a new program embedding approach that constructs a program representation based on the assembly code and simultaneously exploits the rich graph structure information present in the program. We tested our model on two tasks, program classification and binary similarity detection, and obtained accuracy of 80.35% and 45.16%, respectively.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here