https://arxiv.org/pdf/1610.02391.pdf
Grad-CAM: another technique for CNN-based vision (and vision-and-language) models that uses the gradients flowing into the final CNN layer to produce a coarse localization map highlighting the regions important for a prediction, introducing explainability and interpretability.
<aside>
💡 In the context of Grad-CAM, a "concept" is a target class or label for which you want to understand the model's reasoning. It could be a class like 'dog' in image classification or a sequence of words in captioning tasks.
A gradient measures how much each feature-map activation contributes to the concept's score.
A 2D localization map is overlaid on the image to identify which regions are significant to the concept.
Each gradient relates only to a small piece of the image (one neuron's receptive field), but Grad-CAM pools over the whole grid of gradients to determine which feature maps are relevant to the concept.
</aside>
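To make the aside concrete, here is a minimal sketch of capturing both the feature-map activations and the gradient of a concept's score with respect to them, using PyTorch hooks on a pretrained ResNet-18 (the model, layer name, and random input are illustrative assumptions, not from the paper):

```python
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

activations, gradients = {}, {}

def save_activation(module, inp, out):
    # Stash the feature maps A^k and register a hook to catch dy^c/dA^k
    activations["A"] = out.detach()
    out.register_hook(lambda g: gradients.update(grad=g.detach()))

# Hook the final convolutional stage (layer4 in ResNet-18)
model.layer4.register_forward_hook(save_activation)

x = torch.randn(1, 3, 224, 224)       # stand-in for a preprocessed image
scores = model(x)                      # forward pass
c = scores.argmax(dim=1).item()        # the "concept": the predicted class here
scores[0, c].backward()                # backprop the concept's score

print(activations["A"].shape)          # torch.Size([1, 512, 7, 7]): A^k
print(gradients["grad"].shape)         # torch.Size([1, 512, 7, 7]): dy^c/dA^k
```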
How does the final CNN layer produce a localization map based on the gradients?
- These gradients are global-average-pooled and used to weight the neuron activations (feature maps) in the final CNN layer. The weighted feature maps are summed and passed through a ReLU, keeping only the positive evidence; the result is then resized to the dimensions of the input image to generate the localization map (see the sketch below).
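Continuing the sketch above, the weighting, summation, ReLU, and resizing steps might look like this (an illustrative sketch, not the paper's reference code):

```python
import torch.nn.functional as F

A = activations["A"]               # (1, K, 7, 7) feature maps from above
dY = gradients["grad"]             # (1, K, 7, 7) gradients from above

alpha = dY.mean(dim=(2, 3), keepdim=True)           # one weight per feature map
cam = F.relu((alpha * A).sum(dim=1, keepdim=True))  # weighted sum + ReLU

# Resize the coarse 7x7 map to the 224x224 input so it can overlay the image
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```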
Where is the concept-specific information stored in the gradients before the final layer?
- The gradients are calculated with respect to a specific target class or concept. They represent the sensitivity of the model's output to changes in each neuron's activation for that specific concept.
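A quick way to see that the gradients carry concept-specific information: backpropagating two different class scores through the same forward pass yields different gradient tensors. Continuing the setup above (the ImageNet class indices are arbitrary picks for illustration):

```python
def grads_for_class(c):
    # Fresh forward pass, then backprop the chosen class's score
    gradients.clear()
    scores = model(x)
    scores[0, c].backward()
    return gradients["grad"]

g_cat = grads_for_class(281)   # 281 = "tabby cat" in ImageNet
g_dog = grads_for_class(207)   # 207 = "golden retriever" in ImageNet
print(torch.allclose(g_cat, g_dog))  # False: the gradients encode the concept
```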
The paper shows that:
- Grad-CAM lends insight into failure modes of these models (seemingly unreasonable predictions turn out to have reasonable explanations); in that sense these failures have merit
- Grad-CAM outperforms previous methods on the ILSVRC-15 weakly-supervised localization task
    - What is a localization task? Drawing a bounding box around an object in the image; "weakly-supervised" means the model only sees image-level class labels during training, never box annotations
- The visualizations show that even non-attention-based models learn to localize discriminative regions of the input image; no attention mechanism is needed for this pattern to emerge in the gradients
Grad-CAM identifies important neurons, i.e., the ones that relate to specific concepts.
Partial Linearization & Importance
- Definition: The weight α^c_k "partially linearizes" the network, meaning it approximates the complex functions in the network with something simpler (linear).
- Intuitive Explanation: It's like summarizing a complicated story (neural network operations) into a simple sentence (linear equation) that captures the essence of the story for a specific topic (class).
- Barebones Example: The weight α^c_k is determined by how much the activation A^k affects the class score y^c, averaged over all spatial positions: α^c_k = (1/Z) Σ_i Σ_j ∂y^c/∂A^k_ij, where Z is the number of positions in the feature map.
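A barebones numeric version of that weight computation, with a toy 2x2 gradient map standing in for ∂y^c/∂A^k (the values are made up for illustration):

```python
import torch

dY_k = torch.tensor([[0.2, 0.0],
                     [0.4, 0.2]])   # toy 2x2 gradient map dy^c/dA^k
Z = dY_k.numel()                    # Z = 4 spatial positions
alpha_ck = dY_k.sum() / Z           # alpha^c_k = (1/Z) * sum_ij dy^c/dA^k_ij
print(alpha_ck)                     # tensor(0.2000)
```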