CAM only applies when the network's structure is: last convolutional layer + GAP (global average pooling) + fc. In that case, the feature map produced by the last convolutional layer (named hook_a in the code below) is reduced by GAP from C×H×W to C×1×1, so the final fully connected layer fc has C input nodes and its weight matrix weights has shape num_classes × C. If we multiply the weight vector of a given class channel-wise with the last convolutional feature map and then sum over the channels, we get an H×W map. This map is called the heatmap, and it reflects how important each position of the feature map is to the final classification. Because a convolutional network only rescales the input spatially and never rotates it, positions in hook_a correspond to positions in the input image, so the heatmap can be resized and overlaid on the original image to show how much each region contributes to the target class.

Alright, explaining this in human language is hard work, and you probably still didn't quite follow, so let me just show you the code:
import numpy as np
import torch.nn.functional as F
import torchvision.models as models
from torchvision.transforms.functional import normalize, resize, to_tensor, to_pil_image
import matplotlib.pyplot as plt
from matplotlib import cm
from PIL import Image

# Load a pretrained resnet18 and hook the last conv stage ('layer4')
net = models.resnet18(pretrained=True).cuda()
hook_a = None

def _hook_a(module, inp, out):
    global hook_a
    hook_a = out

submodule_dict = dict(net.named_modules())
target_layer = submodule_dict['layer4']
hook1 = target_layer.register_forward_hook(_hook_a)

# Standard ImageNet preprocessing; the forward pass fills hook_a
img_path = 'images/border_collie2.jpg'
img = Image.open(img_path, mode='r').convert('RGB')
img_tensor = normalize(to_tensor(resize(img, (224, 224))),
                       [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).cuda()
scores = net(img_tensor.unsqueeze(0))
hook1.remove()

class_idx = 232  # ImageNet class 232 is the border collie
weights = net.fc.weight.data[class_idx, :]

# Channel-weighted sum of the feature map, then ReLU and min-max normalization
cam = (weights.view(*weights.shape, 1, 1) * hook_a.squeeze(0)).sum(0)
cam = F.relu(cam)
cam.sub_(cam.flatten(start_dim=-2).min(-1).values.unsqueeze(-1).unsqueeze(-1))
cam.div_(cam.flatten(start_dim=-2).max(-1).values.unsqueeze(-1).unsqueeze(-1))
cam = cam.data.cpu().numpy()

# Upsample to the input size, colorize with 'jet', blend with the original
heatmap = to_pil_image(cam, mode='F')
overlay = heatmap.resize(img.size, resample=Image.BICUBIC)
cmap = cm.get_cmap('jet')
overlay = (255 * cmap(np.asarray(overlay) ** 2)[:, :, :3]).astype(np.uint8)
alpha = .7
result = (alpha * np.asarray(img) + (1 - alpha) * overlay).astype(np.uint8)
plt.imshow(result)
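If the shape bookkeeping in the listing is hard to follow, here is the core computation in isolation. A minimal sketch with random tensors standing in for real data, assuming the resnet18 layer4 shapes (C=512, 7×7 for a 224×224 input):

```python
import torch
import torch.nn.functional as F

C, H, W = 512, 7, 7
feature_map = torch.randn(C, H, W)  # plays the role of hook_a.squeeze(0)
weights = torch.randn(C)            # plays the role of net.fc.weight.data[class_idx, :]

# CAM = per-channel weighted sum of the feature map, collapsing C x H x W to H x W
cam = (weights.view(C, 1, 1) * feature_map).sum(0)
assert cam.shape == (H, W)

# ReLU, then min-max normalization to [0, 1], as in the full listing
cam = F.relu(cam)
cam = (cam - cam.min()) / (cam.max() - cam.min())
```

Once normalized to [0, 1], the H×W map can be resized and colorized exactly as in the full listing.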
Figure 1. The original image
Figure 2. Output of the CAM method
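A sanity check on why this weighted sum is the right thing to visualize: since GAP and fc are both linear, the class score is (up to the bias) exactly the spatial average of the un-ReLU'd CAM. A quick verification with random tensors (shapes chosen to match resnet18):

```python
import torch

C, H, W, num_classes = 512, 7, 7, 1000
feature_map = torch.randn(C, H, W)
fc = torch.nn.Linear(C, num_classes)

# Class score via the actual GAP + fc pipeline
gap = feature_map.mean(dim=(1, 2))  # global average pooling: C x H x W -> C
scores = fc(gap)

# The same score reconstructed as the spatial mean of the CAM plus the bias
c = 232
cam = (fc.weight[c].view(C, 1, 1) * feature_map).sum(0)  # H x W
reconstructed = cam.mean() + fc.bias[c]

assert torch.allclose(scores[c], reconstructed, atol=1e-4)
```

So regions where the CAM is large are precisely the regions that push the class score up, which is what the overlay visualizes.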