Published: 2023-10-07 14:00
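Interpretability matters. To build trust in intelligent systems and integrate them meaningfully into our everyday lives, we clearly need "transparent" models that can explain why they predict what they predict. Broadly speaking, this transparency is useful at three different stages of the evolution of artificial intelligence (AI). First, when AI is significantly weaker than humans and not yet reliably "deployable" (e.g. visual question answering [3]), the goal of transparency and explanations is to identify failure modes [1,17], which helps researchers focus their efforts on the most fruitful research directions. Second, when AI is on par with humans and reliably "deployable" (e.g. image classification [22] over a set of categories trained on sufficient data), the goal is to establish appropriate trust and confidence in users. Third, when AI is significantly stronger than humans (e.g. chess or Go [39]), the goal of explanations is machine teaching [20], that is, a machine teaching a human how to make better decisions.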
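A number of previous works have asserted that deeper representations in a CNN capture higher-level visual constructs [5,31]. Furthermore, convolutional features retain the spatial information that is lost in fully connected layers, so we can expect the last convolutional layers to offer the best compromise between high-level semantics and detailed spatial information.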
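The gradients are set to zero for all classes except the desired class (tiger cat), whose gradient is set to 1. This signal is then backpropagated to the rectified convolutional feature maps of interest, which are combined to compute the coarse Grad-CAM localization (blue heatmap) showing where the model has to look to make that particular decision. Finally, the heatmap is multiplied pointwise with guided backpropagation to obtain Guided Grad-CAM visualizations, which are both high-resolution and concept-specific.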
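The procedure described above can be written compactly. The notation below follows the Grad-CAM paper and is not defined in this text, so treat the symbols as assumptions: $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the score for class $c$ before the softmax, and $Z$ is the number of spatial positions in a feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c \, A^k \right),
\qquad
\text{Guided Grad-CAM} = \mathrm{upsample}\!\left( L^c_{\text{Grad-CAM}} \right) \odot \text{GuidedBackprop}^c
```

The weights $\alpha_k^c$ are the spatially averaged gradients, the ReLU keeps only the regions with a positive influence on class $c$, and $\odot$ denotes the pointwise product with the guided backpropagation map.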
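1) Take the feature map produced by the last convolution after feature extraction on the image (for VGG16 this is conv5_3; with a 224x224 input its feature map is 14x14x512, and the 7x7x512 shape is only reached after the final max-pooling layer)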
4) Resize the processed heatmap to the image size so that it can be blended (weighted) with the image
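As a rough sketch of the computation these steps lead to, the snippet below assumes the feature map and its gradients are already available as NumPy arrays of shape (K, h, w). The function name `grad_cam_heatmap` and its parameters are hypothetical, and nearest-neighbour upscaling with `np.kron` is only a stand-in for a proper interpolating resize.

```python
import numpy as np

def grad_cam_heatmap(feature_maps, gradients, out_size):
    """Toy Grad-CAM on arrays of shape (K, h, w); out_size is (H, W).

    Assumes H and W are integer multiples of h and w (hypothetical helper).
    """
    # Channel weights: average the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))                      # shape (K,)
    # Weighted sum of the feature maps over channels, followed by ReLU.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    # Normalize to [0, 1]; the epsilon guards against an all-zero map.
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    # Nearest-neighbour upscaling to the requested size.
    fy, fx = out_size[0] // cam.shape[0], out_size[1] // cam.shape[1]
    return np.kron(cam, np.ones((fy, fx), dtype=cam.dtype))

rng = np.random.default_rng(0)
A = rng.random((4, 3, 3)).astype(np.float32)            # stand-in feature maps
dY = rng.standard_normal((4, 3, 3)).astype(np.float32)  # stand-in gradients
heatmap = grad_cam_heatmap(A, dY, out_size=(12, 12))
print(heatmap.shape)
```

In a real pipeline the two input arrays would come from a forward and a backward pass through the network, which is what the classes below take care of.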
import numpy as np
import torch
from PIL import Image


class CamExtractor():
    """
        Extracts cam features from the model
    """
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None

    def save_gradient(self, grad):
        # Hook callback: stores the gradient that reaches the target layer output
        self.gradients = grad

    def forward_pass_on_convolutions(self, x):
        """
            Does a forward pass on convolutions, hooks the function at given layer
        """
        conv_output = None
        for module_pos, module in self.model.features._modules.items():
            x = module(x)  # Forward
            if int(module_pos) == self.target_layer:
                # Without this hook self.gradients stays None after backward()
                x.register_hook(self.save_gradient)
                conv_output = x  # Save the convolution output on that layer
        return conv_output, x

    def forward_pass(self, x):
        """
            Does a full forward pass on the model
        """
        # Forward pass on the convolutions
        conv_output, x = self.forward_pass_on_convolutions(x)
        x = x.view(x.size(0), -1)  # Flatten
        # Forward pass on the classifier
        x = self.model.classifier(x)
        return conv_output, x


class GradCam():
    """
        Produces class activation map
    """
    def __init__(self, model, target_layer):
        self.model = model
        self.model.eval()  # Inference mode, so dropout does not perturb the result
        # Define extractor
        self.extractor = CamExtractor(self.model, target_layer)

    def generate_cam(self, input_image, target_class=None):
        # Full forward pass
        # conv_output is the output of convolutions at specified layer
        # model_output is the final output of the model (1, 1000)
        conv_output, model_output = self.extractor.forward_pass(input_image)
        if target_class is None:
            target_class = np.argmax(model_output.data.numpy())
        # Target for backprop
        one_hot_output = torch.FloatTensor(1, model_output.size()[-1]).zero_()
        one_hot_output[0][target_class] = 1
        # Zero grads
        self.model.features.zero_grad()
        self.model.classifier.zero_grad()
        # Backward pass with specified target
        model_output.backward(gradient=one_hot_output, retain_graph=True)
        # Get hooked gradients
        guided_gradients = self.extractor.gradients.data.numpy()[0]
        # Get convolution outputs
        target = conv_output.data.numpy()[0]
        # Get weights from gradients
        weights = np.mean(guided_gradients, axis=(1, 2))  # Take averages for each gradient
        # Create empty (all-zero) numpy array for cam
        cam = np.zeros(target.shape[1:], dtype=np.float32)
        # Multiply each weight with its conv output and then, sum
        for i, w in enumerate(weights):
            cam += w * target[i, :, :]
        cam = np.maximum(cam, 0)  # ReLU
        # Normalize between 0-1 (epsilon avoids division by zero on an all-zero map)
        cam = (cam - np.min(cam)) / (np.max(cam) - np.min(cam) + 1e-8)
        cam = np.uint8(cam * 255)  # Scale between 0-255 to visualize
        # PIL expects (width, height); input_image has shape (N, C, H, W).
        # Image.LANCZOS is the same filter as Image.ANTIALIAS, which was removed in Pillow 10.
        cam = np.uint8(Image.fromarray(cam).resize((input_image.shape[3],
                       input_image.shape[2]), Image.LANCZOS))/255
        # ^ I am extremely unhappy with this line. Originally resizing was done in cv2 which
        # supports resizing numpy matrices with antialiasing, however,
        # when I moved the repository to PIL, this option was out of the window.
        # So, in order to use resizing with the antialiasing filter of PIL,
        # I briefly convert matrix to PIL image and then back.
        # If there is a more beautiful way, do not hesitate to send a PR.
        # You can also use the code below instead of the code line above, suggested by @ ptschandl
        # from scipy.ndimage import zoom
        # cam = zoom(cam, np.array(input_image[0].shape[1:])/np.array(cam.shape))
        return cam
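The map returned by `generate_cam` holds values in [0, 1] at the input image size, so it can be blended with the image for display. The snippet below is a minimal sketch of one way to do that, assuming the image is an (H, W, 3) float array in [0, 1]. The helper `overlay_heatmap` and its `alpha` parameter are hypothetical, and painting the heatmap into the red channel is a simplification of a real colour map.

```python
import numpy as np

def overlay_heatmap(image, cam, alpha=0.4):
    """Blend a heatmap into an image (hypothetical helper).

    image: (H, W, 3) floats in [0, 1]; cam: (H, W) floats in [0, 1].
    """
    heat = np.zeros_like(image)
    heat[..., 0] = cam  # Hot regions show up as red
    # Convex combination: alpha controls how strongly the heatmap shows through.
    return (1.0 - alpha) * image + alpha * heat

image = np.full((2, 2, 3), 0.5)            # stand-in grey image
cam = np.array([[1.0, 0.0], [0.0, 0.0]])   # stand-in heatmap
blended = overlay_heatmap(image, cam)
print(blended[0, 0], blended[1, 1])
```

Because the result is a convex combination of two arrays in [0, 1], it stays in [0, 1] and can be displayed or saved without further clipping.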