Formal Reasoning on Neural Networks


The project addresses the problem of explaining neural networks, with a particular focus on classification. In this context, Space Explanations are introduced as a novel type of explanation: they are defined via logic formulae that provide sufficient conditions for classification. The purpose of the research is to extend the state of the art so as to obtain a flexible and expressive framework for producing explanations. Traditional explanation techniques are typically not logic-based and usually provide sample-based explanations; we believe that putting logic in the loop is a sound approach to overcoming some of the limitations of classical approaches.
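To make the idea concrete, a Space Explanation can be pictured (in a simplified, illustrative form; the notation below is not taken from the paper) as a conjunction of feature constraints that entails a fixed classification by the network N:

```latex
% Illustrative shape of a Space Explanation for a network N and class c:
% a region of feature space, here a box over a subset I of the features,
% whose membership formula is a sufficient condition for classifying as c.
\varphi(x) \;\equiv\; \bigwedge_{i \in I} \ell_i \le x_i \le u_i ,
\qquad
\models \; \varphi(x) \;\rightarrow\; N(x) = c
```

Any sample satisfying the constraints is guaranteed to receive class c, which is exactly the "constant classification over a region" reading used throughout this summary.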


We introduce the SpEXplAIn tool for the computation of Space Explanations. The tool encodes details of the classification procedure into logic formulae and reasons over them to provide logic guarantees over a subset of the feature space: the subset is identified as a region where the neural network's classification is constant, so that membership of a sample in that region is a sufficient condition for its classification. As an additional option, the tool supports a choice of the strategy employed to compute explanations; different strategies lead to explanations of different logical strength, i.e. to regions of the feature space of different size.
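As an illustration of the kind of guarantee involved, the sketch below certifies that a box in feature space yields a constant class for a tiny ReLU network. This is a minimal example using interval bound propagation, not SpEXplAIn's actual logic-based algorithm; the network, bounds, and function names are all invented for the example.

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through the affine map x -> W @ x + b."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def constant_class_on_box(lo, hi, layers):
    """Return the class index if it is provably constant on the box, else None.

    `layers` is a list of (W, b) pairs; ReLU is applied between layers.
    """
    for W, b in layers[:-1]:
        lo, hi = interval_affine(lo, hi, W, b)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    lo, hi = interval_affine(lo, hi, *layers[-1])
    c = int(np.argmax(lo))
    # Class c is guaranteed on the whole box iff its output lower bound
    # beats every other class's upper bound.
    if all(lo[c] > hi[j] for j in range(len(hi)) if j != c):
        return c
    return None

# Toy 2-2-2 network and a small box around a sample (all values made up).
layers = [
    (np.eye(2), np.zeros(2)),
    (np.array([[2.0, -1.0], [-1.0, 2.0]]), np.zeros(2)),
]
lo, hi = np.array([0.9, 0.0]), np.array([1.1, 0.1])
cls = constant_class_on_box(lo, hi, layers)  # class 0 is constant on this box
```

If the check succeeds, membership in the box is a sufficient condition for the classification, i.e. the box plays the role of a (very simple) Space Explanation region.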


Extensive experimentation showed that the approach is sound and flexible. As an emergent behavior, Space Explanations also proved helpful for locally estimating the decision boundary of a classifier neural network.


Publications:

  1. Space Explanation of Neural Network Classification, Faezeh Labbaf, Tomáš Kolárik, Martin Blicha, Grigory Fedyukovich, Michael Wand, and Natasha Sharygina, CAV 2025.