Grad-CAM lets you see where a convolutional neural network is looking to make its decisions.
Let’s see it in action. Imagine we have a model trained to classify images of cats and dogs. We feed it an image of a cat and it correctly predicts "cat." But why did it think it was a cat? Grad-CAM will highlight the regions of the image that most strongly influenced this decision.
Here’s a simplified Python snippet using TensorFlow and Keras. This isn’t runnable code on its own, but it shows the core idea:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import cv2 # OpenCV for image manipulation
# 1. Load a pre-trained model (e.g., VGG16)
base_model = VGG16(weights='imagenet')
# 2. Choose the layer to visualize (a convolutional layer)
# VGG16 has several conv layers. The last one, 'block5_conv3', is the most
# common choice; earlier layers like 'block4_conv3' highlight lower-level features.
layer_name = 'block5_conv3'
layer_output = base_model.get_layer(layer_name).output
# 3. Create helper models
# conv_model exposes the feature maps of the chosen layer;
# prediction_model exposes the final class scores.
# Note: VGG16's 'predictions' layer applies a softmax, so its output is a
# probability distribution, not raw logits. Grad-CAM is usually described in
# terms of the pre-softmax logit, but gradients of the softmax score work in practice.
conv_model = Model(inputs=base_model.input, outputs=layer_output)
prediction_model = Model(inputs=base_model.input, outputs=base_model.output)
# 4. Load and preprocess an image
img_path = 'path/to/your/cat_image.jpg' # Replace with an actual image path
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
# 5. Get model predictions and the predicted class index
preds = prediction_model.predict(x)
predicted_class_index = np.argmax(preds[0])
predicted_class_label = decode_predictions(preds, top=1)[0][0][1]
# 6. Compute Gradients
# We need the gradients of the predicted class's score with respect to the
# feature maps of the chosen convolutional layer. For tf.GradientTape to trace
# that connection, both outputs must come from a single model.
grad_model = Model(
    inputs=base_model.input,
    outputs=[base_model.get_layer(layer_name).output, base_model.output],
)

with tf.GradientTape() as tape:
    conv_outputs, preds_tensor = grad_model(x)
    # Select the score for the predicted class
    class_score = preds_tensor[:, predicted_class_index]

# d(class_score) / d(conv_outputs), shape: (1, H, W, C)
gradients = tape.gradient(class_score, conv_outputs)
# 7. Compute the Grad-CAM weights
# Global Average Pooling of the gradients. This gives a weight for each filter.
# The weights represent the importance of each filter for the predicted class.
weights = tf.reduce_mean(gradients, axis=(1, 2)) # Shape: (1, C)
# 8. Compute the weighted sum of feature maps
# Multiply each feature map by its corresponding weight and sum them up.
cam = np.sum(weights.numpy()[0, :] * conv_outputs.numpy()[0, :, :, :], axis=-1) # Shape: (H, W)
# 9. Post-process the Grad-CAM map
# Apply ReLU to keep only positive contributions, then normalize to [0, 1].
cam = np.maximum(cam, 0)  # ReLU
cam_max = np.max(cam)
if cam_max != 0:
    cam = cam / cam_max
# 10. Visualize the result
# Overlay the heatmap on the original image.
img_original = image.img_to_array(image.load_img(img_path))
# Resize the heatmap to the original image's size, not the 224x224 model input.
cam = cv2.resize(cam, (img_original.shape[1], img_original.shape[0]),
                 interpolation=cv2.INTER_LINEAR)
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
# applyColorMap returns BGR; convert so the blend matches the RGB original.
heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)
superimposed_img = heatmap * 0.4 + img_original  # Blend with the original image
superimposed_img = np.clip(superimposed_img, 0, 255).astype(np.uint8)
# Display or save the superimposed image (note: cv2.imwrite expects BGR)
# cv2.imwrite('gradcam_cat.jpg', cv2.cvtColor(superimposed_img, cv2.COLOR_RGB2BGR))
The core problem Grad-CAM solves is the "black box" nature of deep neural networks, especially convolutional neural networks (CNNs). You train a model and it performs well, but you have no idea why it made a particular classification. Grad-CAM addresses this by producing a coarse localization map that highlights the regions of the input image most influential for a specific prediction. It doesn’t tell you what features the network learned (like "edges" or "textures") but rather where in the image the network focused its attention.
Internally, Grad-CAM works by taking the gradients of the target class score with respect to the feature maps of a chosen convolutional layer. These gradients tell us how much each feature map influences the final prediction. We then perform a global average pooling on these gradients to obtain a weight for each filter in that layer. These weights represent the "importance" of each filter for the specific class. Finally, we compute a weighted sum of the feature maps from that layer, using the computed weights. This weighted sum, after some post-processing (ReLU and normalization), forms the Grad-CAM heatmap. The heatmap is then overlaid on the original image to visualize the regions of interest.
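The pooling-and-weighting step described above can be sketched in a few lines of NumPy. The random arrays here are hypothetical stand-ins for a real model's feature maps and gradients; only the shapes matter:

```python
import numpy as np

# Hypothetical stand-ins for real tensors (7x7 feature maps, 4 filters).
# A:     feature maps of the chosen conv layer, shape (H, W, C)
# dy_dA: gradient of the class score w.r.t. A, same shape
rng = np.random.default_rng(0)
H, W, C = 7, 7, 4
A = rng.standard_normal((H, W, C))
dy_dA = rng.standard_normal((H, W, C))

# Global average pooling of the gradients: one importance weight per filter.
alpha = dy_dA.mean(axis=(0, 1))  # shape (C,)

# Weighted sum of the feature maps across the filter axis, then ReLU.
cam = np.maximum((A * alpha).sum(axis=-1), 0)  # shape (H, W)

# Normalize to [0, 1] for display.
if cam.max() > 0:
    cam = cam / cam.max()
```

The same three operations (pool, weight, ReLU) are what the TensorFlow code above performs on real tensors.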
The exact convolutional layer you choose to extract feature maps from significantly impacts the visualization. Earlier layers in the network capture more low-level features (edges, corners), while later layers capture more high-level semantic features. Visualizing with a layer from earlier in the network might highlight simpler shapes or textures, whereas a later layer might highlight more object-part specific regions. The choice depends on what level of interpretability you’re aiming for.
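To see what your options are, you can list VGG16's convolutional layers by name. Passing `weights=None` builds the architecture without downloading the ImageNet weights, which is enough for inspection:

```python
from tensorflow.keras.applications import VGG16

# Build the architecture only; weights=None skips the ImageNet download.
model = VGG16(weights=None)
conv_names = [layer.name for layer in model.layers if 'conv' in layer.name]
print(conv_names)  # 'block1_conv1' through 'block5_conv3'
```

Any of these names can be passed as `layer_name` in the snippet above; the deeper the block, the more semantic the resulting heatmap.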
The one thing most people don’t realize is that Grad-CAM is a family of methods. While the original Grad-CAM uses global average pooling of gradients, variations like Grad-CAM++ or Score-CAM use different strategies to derive the weights for the feature maps. These variations aim to produce heatmaps that are more faithful to the network’s decision-making process or that highlight finer details. The core principle of using gradients and feature maps remains, but the aggregation and weighting mechanisms differ.
The next step is exploring how to use these visualizations for debugging and improving model performance.