A Unified, Resilient, and Explainable Adversarial Patch Detector
Vishesh Kumar
Deep Neural Networks (DNNs), the backbone of almost every computer vision task, are vulnerable to adversarial attacks, particularly physical out-of-distribution (OOD) adversarial patches. Existing defenses often fail to interpret these attacks in ways that align with human visual perception. Our proposed AdvPatchXAI approach is a generalized, robust, and explainable defense algorithm designed to protect DNNs against physical adversarial threats. AdvPatchXAI employs a novel patch decorrelation loss that reduces feature redundancy and enhances the distinctiveness of patch representations, enabling better generalization to unseen adversarial scenarios. It learns prototypical parts in a self-supervised manner, enhancing interpretability and correlation with human vision. The model uses a sparse linear layer for classification, making the decision process globally interpretable through a set of learned prototypes and locally explainable by pinpointing the relevant prototypes within an image. Our comprehensive evaluation shows that AdvPatchXAI closes the "semantic" gap between latent space and pixel space and effectively handles unseen adversarial patches, even when perturbed with unseen corruptions, thereby significantly advancing DNN robustness in practical settings.
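
To make the decorrelation idea concrete, the sketch below shows one plausible form such a loss could take: penalizing the off-diagonal entries of the empirical feature correlation matrix so that feature dimensions become mutually decorrelated. This is an assumption for illustration only, not the paper's exact formulation; the function name `patch_decorrelation_loss` and the off-diagonal penalty are hypothetical.

```python
import torch

def patch_decorrelation_loss(features: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a patch decorrelation loss.

    Penalizes off-diagonal entries of the feature correlation matrix,
    pushing feature dimensions toward mutual decorrelation and thereby
    reducing redundancy across patch representations.

    Args:
        features: (batch, dim) tensor of patch feature vectors.
    """
    # Standardize each feature dimension to zero mean, unit variance.
    z = (features - features.mean(dim=0)) / (features.std(dim=0) + 1e-6)
    n = z.shape[0]
    # Empirical correlation matrix across feature dimensions: (dim, dim).
    corr = (z.T @ z) / n
    # Zero out the diagonal; penalize the remaining cross-correlations.
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).sum()

# Example usage:
# feats = torch.randn(128, 256)          # 128 patches, 256-dim features
# loss = patch_decorrelation_loss(feats)  # add to the training objective
```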
