Spatial Transformer Networks (STN) Code Analysis
This is a fairly early paper on attention. It came early, has had a large influence, and its results are good. There are already plenty of write-ups explaining the paper, so I won't repeat them here.
The suggested workflow: first read an explanation to understand the principle, then find the code and study it against the explanation. Once you understand it, you can modify it yourself and apply it wherever you need.
For example, an explanation of the paper and the code can be found at:
A write-up explaining the paper (link)
A code repository (link)
In short: before classification, the original image is warped by a transformation matrix to produce a new image, and that new image is what gets classified.
So the core steps are:
1. Obtain the transformation matrix, a 2*3 matrix that can express translation, scaling, rotation, cropping and similar operations (see the small sketch after this list).
2. Use the transformation matrix to compute the coordinate mapping between the image before and after the affine transformation, i.e. the grid.
3. Apply the grid to the original image to get the new image, which is then passed through the convolutions and classified.
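As a quick illustration, here is a minimal sketch of what a few 2*3 matrices (theta) look like in the normalized [-1, 1] coordinate convention used by PyTorch's F.affine_grid. These example values are my own illustration and are not taken from the referenced code:

    import torch

    # Hypothetical example thetas (not from the referenced repo). In F.affine_grid,
    # theta maps output-grid coordinates to input coordinates, both in [-1, 1].
    identity = torch.tensor([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]])   # leaves the image unchanged
    shift_x  = torch.tensor([[1.0, 0.0, 0.5],
                             [0.0, 1.0, 0.0]])   # shifts the sampling locations along x by half the width
    zoom_2x  = torch.tensor([[0.5, 0.0, 0.0],
                             [0.0, 0.5, 0.0]])   # samples only the central half -> roughly a 2x zoom-in

    # For a batch, theta must have shape [N, 2, 3]
    theta = identity.unsqueeze(0)                # [1, 2, 3]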
An example of code that uses it:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import STNModule   # the module that defines the SpatialTransformer shown below

    class STNSVHNet(nn.Module):
        def __init__(self, spatial_dim, in_channels, stn_kernel_size, kernel_size, num_classes=10, use_dropout=False):
            super(STNSVHNet, self).__init__()
            self._in_ch = in_channels
            self._ksize = kernel_size
            self._sksize = stn_kernel_size
            self.ncls = num_classes
            self.dropout = use_dropout
            self.drop_prob = 0.5
            self.stride = 1
            self.spatial_dim = spatial_dim
            # spatial transformer: warps the input image before classification
            self.stnmod = STNModule.SpatialTransformer(self._in_ch, self.spatial_dim, self._sksize)
            # classification network
            self.conv1 = nn.Conv2d(self._in_ch, 32, kernel_size=self._ksize, stride=self.stride, padding=1, bias=False)
            self.conv2 = nn.Conv2d(32, 64, kernel_size=self._ksize, stride=1, padding=1, bias=False)
            self.conv3 = nn.Conv2d(64, 128, kernel_size=self._ksize, stride=1, padding=1, bias=False)
            self.fc1 = nn.Linear(128*4*4, 3092)
            self.fc2 = nn.Linear(3092, self.ncls)

        def forward(self, x):
            # warp the input with the STN, then classify the warped image
            rois, affine_grid = self.stnmod(x)
            out = F.relu(self.conv1(rois))
            out = F.max_pool2d(out, 2)
            out = F.relu(self.conv2(out))
            out = F.max_pool2d(out, 2)
            out = F.relu(self.conv3(out))
            out = out.view(-1, 128*4*4)
            if self.dropout:
                out = F.dropout(self.fc1(out), p=0.5)
            else:
                out = self.fc1(out)
            out = self.fc2(out)
            return out

The STN module that is being called is as follows:
    class SpatialTransformer(nn.Module):
        """
        Implements a spatial transformer as proposed in the Jaderberg paper.
        Comprises of 3 parts:
        1. Localization Net
        2. A grid generator
        3. A roi pooled module.
        The current implementation uses a very small convolutional net with
        2 convolutional layers and 2 fully connected layers. Backends can be
        swapped in favor of VGG, ResNets etc. TTMV
        Returns:
        A roi feature map with the same input spatial dimension as the input feature map.
        """
        def __init__(self, in_channels, spatial_dims, kernel_size, use_dropout=False):
            super(SpatialTransformer, self).__init__()
            self._h, self._w = spatial_dims
            self._in_ch = in_channels
            self._ksize = kernel_size
            self.dropout = use_dropout

            # localization net
            self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)  # size : [1x3x32x32]
            self.conv2 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
            self.conv3 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
            self.conv4 = nn.Conv2d(32, 32, kernel_size=self._ksize, stride=1, padding=1, bias=False)
            self.fc1 = nn.Linear(32*4*4, 1024)
            self.fc2 = nn.Linear(1024, 6)   # 6 numbers = the 2x3 affine matrix

        def forward(self, x):
            """
            Forward pass of the STN module.
            x -> input feature map
            """
            batch_images = x
            x = F.relu(self.conv1(x.detach()))
            x = F.relu(self.conv2(x))
            x = F.max_pool2d(x, 2)
            x = F.relu(self.conv3(x))
            x = F.max_pool2d(x, 2)
            x = F.relu(self.conv3(x))
            x = F.max_pool2d(x, 2)
            print("Pre view size:{}".format(x.size()))
            x = x.view(-1, 32*4*4)
            if self.dropout:
                x = F.dropout(self.fc1(x), p=0.5)
                x = F.dropout(self.fc2(x), p=0.5)
            else:
                x = self.fc1(x)
                x = self.fc2(x)   # params [Nx6]
            x = x.view(-1, 2, 3)  # change it to the 2x3 matrix
            print(x.size())
            affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
            assert(affine_grid_points.size(0) == batch_images.size(0)), "The batch sizes of the input images must be same as the generated grid."
            rois = F.grid_sample(batch_images, affine_grid_points)
            print("rois found to be of size:{}".format(rois.size()))
            return rois, affine_grid_points

The core of it is just these two lines:
    affine_grid_points = F.affine_grid(x, torch.Size((x.size(0), self._in_ch, self._h, self._w)))
    rois = F.grid_sample(batch_images, affine_grid_points)

To understand them, you can refer to:
Pytorch中的仿射變換(affine_grid) (a write-up on affine transformations / affine_grid in PyTorch)
- batch_images: the original input image batch.
- x: the 2*3 transformation matrix, produced from the original image by the localization network (the stack of convolutions and fully-connected layers above).
- The second argument to F.affine_grid: the desired output shape of the affine transformation, in [N, C, H, W] format; here it is chosen so that the output has the same size as the original image.
- F.affine_grid: affine_grid_points is the coordinate mapping between the image before and after the affine transformation. It returns a 4-D tensor of shape [N, H, W, 2], where N, H and W are the batch size, height and width of the affine transformation's output feature map (see the standalone example below).
- grid_sample: applies this mapping to the original image to obtain the new (warped) image, which is then passed through the convolutions and the classifier to produce the output.
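To make the two core calls concrete, here is a small self-contained sketch (my own illustration, not from the referenced repo) that builds a grid from a fixed theta and warps a random image. In the real network, theta is predicted by the localization net instead of being hard-coded:

    import torch
    import torch.nn.functional as F

    N, C, H, W = 1, 3, 32, 32
    image = torch.randn(N, C, H, W)              # stand-in for the original image

    # A fixed 2x3 matrix (identity plus a shift); in the STN this comes from the network.
    theta = torch.tensor([[[1.0, 0.0, 0.25],
                           [0.0, 1.0, 0.0]]])    # shape [N, 2, 3]

    grid = F.affine_grid(theta, torch.Size((N, C, H, W)))   # [N, H, W, 2] coordinate mapping
    warped = F.grid_sample(image, grid)                     # new image, same shape as the input
    print(grid.shape, warped.shape)   # torch.Size([1, 32, 32, 2]) torch.Size([1, 3, 32, 32])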
Because the whole thing is trained with supervision, x (the affine parameters) is learned automatically; once you have it, everything downstream follows.
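As a quick sanity check, the SpatialTransformer module from the listing above can also be exercised on its own. This sketch assumes 32*32 RGB inputs (which is what the 32*4*4 input size of fc1 implies after three 2x2 poolings) and is only illustrative:

    import torch

    # Illustrative only: run the SpatialTransformer defined above on a dummy batch.
    stn = SpatialTransformer(in_channels=3, spatial_dims=(32, 32), kernel_size=3)
    dummy = torch.randn(4, 3, 32, 32)
    rois, grid = stn(dummy)
    print(rois.shape)   # torch.Size([4, 3, 32, 32]) -- warped images, same size as the input
    print(grid.shape)   # torch.Size([4, 32, 32, 2]) -- the sampling grid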