Characterizing the Accuracy/Complexity Landscape of Explanations of Deep Networks through Knowledge Extraction

ICLR 2019 · Simon Odense, Artur d'Avila Garcez

Knowledge extraction techniques are used to convert neural networks into symbolic descriptions with the objective of producing more comprehensible learning models. The central challenge is to find an explanation which is more comprehensible than the original model while still representing that model faithfully. The distributed nature of deep networks has led many to believe that the hidden features of a neural network cannot be explained by logical descriptions simple enough to be understood by humans, and that decompositional knowledge extraction should be abandoned in favour of other methods. In this paper we examine this question systematically by proposing a knowledge extraction method using M-of-N rules, which allows us to map the complexity/accuracy landscape of rules describing hidden features in a Convolutional Neural Network (CNN). Experiments reported in this paper show that the shape of this landscape reveals an optimal trade-off between comprehensibility and accuracy, showing that each latent variable has an optimal M-of-N rule to describe its behaviour. We find that the rules with the optimal trade-off in the first and final layers have a high degree of explainability, whereas the rules with the optimal trade-off in the second and third layers are less explainable. The results shed light on the feasibility of rule extraction from deep networks, and point to the value of decompositional knowledge extraction as a method of explainability.
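
For readers unfamiliar with M-of-N rules, the sketch below illustrates the general idea: a rule over N binarised input features that fires when at least M of them are active, scored by how often it agrees with the hidden unit it is meant to describe. Sweeping over rule sizes then traces out an accuracy-versus-complexity curve of the kind the abstract refers to. This is an illustrative sketch only, not the authors' implementation; the binarisation of inputs and activations, the feature ranking, and the function names are assumptions.

```python
import numpy as np

def m_of_n_rule_accuracy(inputs, unit_on, antecedents, m):
    """Fraction of samples on which an M-of-N rule agrees with a hidden unit.

    inputs      : (num_samples, num_features) boolean array of thresholded
                  input activations (binarisation assumed done upstream).
    unit_on     : (num_samples,) boolean array, True where the hidden unit's
                  activation exceeds its threshold.
    antecedents : indices of the N input features used by the rule.
    m           : the rule fires when at least m of the N antecedents are on.
    """
    fires = inputs[:, antecedents].sum(axis=1) >= m
    return float(np.mean(fires == unit_on))

def accuracy_complexity_landscape(inputs, unit_on, ranked_features, max_n):
    """For each rule size N (a proxy for complexity), return the best
    accuracy over all choices of M, giving one point per complexity level.

    ranked_features : feature indices ordered by assumed relevance to the
                      unit, e.g. by absolute weight magnitude.
    """
    landscape = {}
    for n in range(1, max_n + 1):
        antecedents = ranked_features[:n]
        landscape[n] = max(
            m_of_n_rule_accuracy(inputs, unit_on, antecedents, m)
            for m in range(1, n + 1)
        )
    return landscape
```

Plotting the returned accuracies against N (under whatever complexity measure one adopts) gives a simple picture of the trade-off: small rules are easy to read but may track the unit poorly, while large rules fit better at the cost of comprehensibility.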
