The project addresses the problem of explaining neural networks, with a particular focus on classification. With the rise of AI-based reasoning, explainability of NN classification has mostly relied on statistical methods; more recently, a more reliable trend of formal logic-based methods has been gaining popularity. The advantage of the formal approach is that it provides strict, provable guarantees about the classification. We compute logical explanations of NN behavior that are general and not restricted to a specific shape, allowing us to capture relationships between input features.
We implemented a prototype tool, SpEXplAIn, which automatically computes Space Explanations: the most general abduction-based explanations for classifying NNs, with provable guarantees of the network's behavior over continuous regions of the input feature space. The tool leverages an SMT solver compatible with a range of flexible Craig interpolation algorithms and unsatisfiable-core generation.
We experimented on a range of real-life case studies from different domains, illustrating the usefulness and flexibility of the new proof-based explanation framework. Refer to the papers below for more details.
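To give a flavor of what an abduction-based explanation is, the sketch below computes one for a toy linear classifier. This is an illustrative assumption only: the model, the box-shaped input domain, and the deletion-based shrinking loop are hypothetical stand-ins, not SpEXplAIn's actual algorithm, which relies on an SMT solver, Craig interpolation, and unsatisfiable cores rather than interval reasoning.

```python
# Abduction-style explanation for a toy linear classifier (illustrative
# sketch; NOT SpEXplAIn's implementation). An explanation is a subset of
# input features that, when fixed to the instance's values, guarantees
# the classification for ALL inputs in the domain box.

def entails_class(w, b, lo, hi, fixed, x):
    """True iff w.z + b > 0 for every z in the box, where features in
    `fixed` are pinned to x[i] and the rest range over [lo[i], hi[i]].
    For a linear model the worst case is the corner minimizing w.z."""
    worst = b
    for i, wi in enumerate(w):
        if i in fixed:
            worst += wi * x[i]
        else:
            worst += wi * (lo[i] if wi > 0 else hi[i])
    return worst > 0

def abductive_explanation(w, b, lo, hi, x):
    """Deletion-based shrinking: start with all features fixed, then
    release any feature whose removal still entails the classification."""
    fixed = set(range(len(x)))
    for i in range(len(x)):
        if entails_class(w, b, lo, hi, fixed - {i}, x):
            fixed.discard(i)
    return sorted(fixed)

# Toy model: class A iff 2*x1 + x2 - 3 > 0, inputs in [0, 5]^2,
# instance (2, 1) classified as A.
expl = abductive_explanation([2, 1], -3, [0, 0], [5, 5], [2, 1])
print(expl)  # feature indices that jointly guarantee class A
```

Here the loop discovers that fixing `x1 = 2` alone already entails class A over the whole box, so `x2` can range freely: a small example of the "continuous areas of the input feature space" the prose refers to. In SpEXplAIn, the entailment check is discharged by an SMT solver rather than this hand-rolled interval bound.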
Publications:
- Space Explanation of Neural Network Classification, Faezeh Labbaf, Tomáš Kolárik, Martin Blicha, Grigory Fedyukovich, Michael Wand, and Natasha Sharygina, CAV 2025.
- Interpreting Logical Explanations of Classifying Neural Networks, Fabrizio Leopardi, Faezeh Labbaf, Tomáš Kolárik, Michael Wand, and Natasha Sharygina, CAV 2025.
