Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability

Year

2022

Type(s)

Conference proceedings

Author(s)

Roman Levin, Manli Shu, Eitan Borgnia, Furong Huang, Micah Goldblum, Tom Goldstein

Source

Neural Information Processing Systems (NeurIPS), 2022.

Url

https://arxiv.org/abs/2108.01335?source=techstories.org

Abstract

Conventional saliency maps highlight input features to which neural network predictions are highly sensitive. We take a different approach to saliency, in which we identify and analyze the network parameters, rather than inputs, which are responsible for erroneous decisions. We find that samples which cause similar parameters to malfunction are semantically similar. We also show that pruning the most salient parameters for a wrongly classified sample often improves model behavior. Furthermore, fine-tuning a small number of the most salient parameters on a single sample results in error correction on other samples that are misclassified for similar reasons. Based on our parameter saliency method, we also introduce an input-space saliency technique that reveals how image features cause specific network components to malfunction. Further, we rigorously validate the meaningfulness of our saliency maps on both the dataset and case-study levels.
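To make the idea of ranking parameters, rather than inputs, more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how saliency is computed, so the sketch assumes that a parameter's saliency is the magnitude of the loss gradient with respect to that parameter, and it uses a toy linear softmax classifier in place of a deep network. All function names (`parameter_saliency`, `top_k_salient`, `prune`) and the example weights are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights, x):
    # weights[c][j] is the weight from input feature j to class c.
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    return softmax(logits)


def predict(weights, x):
    probs = predict_probs(weights, x)
    return max(range(len(probs)), key=lambda c: probs[c])


def loss(weights, x, label):
    # Cross-entropy loss on a single sample.
    return -math.log(predict_probs(weights, x)[label])


def parameter_saliency(weights, x, label):
    # ASSUMPTION: saliency = |dL/dW[c][j]|. For a linear softmax model with
    # cross-entropy loss, dL/dW[c][j] = (p_c - 1[c == label]) * x_j.
    probs = predict_probs(weights, x)
    return [
        [abs((p - (1.0 if c == label else 0.0)) * xj) for xj in x]
        for c, p in enumerate(probs)
    ]


def top_k_salient(saliency, k):
    # Return the (class, feature) indices of the k largest saliency values.
    flat = [(s, c, j) for c, row in enumerate(saliency) for j, s in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    return [(c, j) for _, c, j in flat[:k]]


def prune(weights, indices):
    # Copy the weights and zero out the selected parameters.
    pruned = [row[:] for row in weights]
    for c, j in indices:
        pruned[c][j] = 0.0
    return pruned


if __name__ == "__main__":
    # Toy example: a 3-class model with one badly chosen weight, W[1][1].
    W = [[0.5, 0.0, 0.0], [1.0, -4.0, 0.2], [0.0, 0.1, 0.2]]
    x, y = [1.0, 2.0, 0.5], 1
    top = top_k_salient(parameter_saliency(W, x, y), k=1)
    print("prediction before pruning:", predict(W, x))
    print("most salient parameter:", top)
    print("prediction after pruning:", predict(prune(W, top), x))
```

In this contrived example the misclassified sample points at the single faulty weight, and zeroing it changes the prediction to the true class. Raw gradient magnitude on one sample is only an illustration; ranking and comparing parameters across a real network would likely need some form of aggregation and normalization, which this sketch omits.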

Furong Huang

Associate Professor @ University of Maryland
