Object Detection: A Detailed Walkthrough of the Faster-RCNN PyTorch Code (Data Preprocessing)
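First, here is the GitHub repository of the original code author: https://github.com/chenyuntc/simple-faster-rcnn-pytorch (I am not the author of the code; this post only explains it).
Today I finished reading train.py, the last file of simple-faster-rcnn-pytorch-master, so it is time for a careful summary. I plan to write four posts that analyze the PyTorch implementation of Faster-RCNN in detail. Their contents and structure are as follows: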
1 Data loading and preprocessing for Faster-RCNN (the /simple-faster-rcnn-pytorch-master/data folder): https://www.cnblogs.com/kerwins-AC/p/9734381.html
2 Model preparation for Faster-RCNN (the /simple-faster-rcnn-pytorch-master/model/utils/ folder): https://www.cnblogs.com/kerwins-AC/p/9752679.html
3 The Faster-RCNN model itself (the /simple-faster-rcnn-pytorch-master/model/ folder): not yet finished
4 Training code for Faster-RCNN (train.py and trainer.py under /simple-faster-rcnn-pytorch-master/): https://www.cnblogs.com/kerwins-AC/p/9728731.html
This post covers the data preprocessing part of the code, which corresponds to the following files:
First comes dataset.py. A function flow chart shows its structure:
Then, as usual, we go through the content and purpose of each function one by one.
1 The code of def inverse_normalize(img) is as follows:
def inverse_normalize(img):
    if opt.caffe_pretrain:
        img = img + (np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1))
        return img[::-1, :, :]
    # approximate un-normalize for visualize
    return (img * 0.225 + 0.45).clip(min=0, max=1) * 255
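The function undoes the normalization so that an image can be visualized. It first checks opt.caffe_pretrain. If the Caffe-pretrained model is in use, the image was normalized by caffe_normalize, so the function adds the per-channel mean back and reverses the channel order with img[::-1, :, :] to turn BGR back into RGB. Otherwise the image was normalized the PyTorch way, and the function approximately inverts that step with img * 0.225 + 0.45, clips the result to [0, 1] and multiplies by 255.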
2 The code of def pytorch_normalze(img) is as follows (the function name is spelled this way in the source):
def pytorch_normalze(img):
    """
    https://github.com/pytorch/vision/issues/223
    return appr -1~1 RGB
    """
    normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])
    img = normalize(t.from_numpy(img))
    return img.numpy()
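The function first builds the normalization transform normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), which uses the ImageNet per-channel mean and standard deviation. It then normalizes the image with img = normalize(t.from_numpy(img)) and converts the result back to a NumPy array.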
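To make the arithmetic concrete, here is a minimal NumPy sketch of the same per-channel operation, (x - mean) / std, on a CHW image in [0, 1]. It is not the repository's code: normalize_chw and approx_inverse are hypothetical helper names, and the sketch replaces torchvision with plain broadcasting. It also measures how far the approximate inverse used for visualization (img * 0.225 + 0.45) lands from the original pixel values.

```python
import numpy as np

# ImageNet channel statistics, the same numbers passed to tvtsf.Normalize
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)


def normalize_chw(img):
    """Hypothetical NumPy stand-in: per-channel (x - mean) / std on a CHW image."""
    return (img - MEAN) / STD


def approx_inverse(img):
    """Same approximation as the non-Caffe branch of inverse_normalize."""
    return (img * 0.225 + 0.45).clip(min=0, max=1) * 255


rng = np.random.default_rng(0)
img = rng.random((3, 4, 5), dtype=np.float32)  # fake RGB image in [0, 1)

normed = normalize_chw(img)
restored = approx_inverse(normed)

# Largest deviation from the true 0-255 pixel values
max_err = float(np.abs(restored - img * 255).max())
print(normed.shape, max_err)
```

Because a single mean and standard deviation stand in for three per-channel values, the restored image is only approximate: the deviation stays within roughly 11 levels on the 0-255 scale, which is acceptable for visualization but not for anything that needs exact pixel values.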
3 The code of def caffe_normalize(img) is as follows:
def caffe_normalize(img):
    """
    return appr -125-125 BGR
    """
    img = img[[2, 1, 0], :, :]  # RGB-BGR
    img = img * 255
    mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)
    img = (img - mean).astype(np.float32, copy=True)
    return img
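Caffe uses BGR channel order, so img[[2, 1, 0], :, :] first converts the image from RGB to BGR. The image is then scaled back to the 0-255 range with img = img * 255, and mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1) sets the per-channel mean.
Subtracting the mean from the image completes the Caffe-style normalization.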
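A quick way to check that the Caffe branch of inverse_normalize really undoes caffe_normalize is a round trip on random data. The sketch below copies the two code paths into standalone functions; caffe_inverse is a hypothetical name for the Caffe branch, and the opt flag is left out so that the example runs on its own.

```python
import numpy as np

CAFFE_MEAN = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)


def caffe_normalize(img):
    """RGB image in [0, 1] -> mean-subtracted BGR image on the 0-255 scale."""
    img = img[[2, 1, 0], :, :]  # RGB -> BGR
    img = img * 255
    return (img - CAFFE_MEAN).astype(np.float32, copy=True)


def caffe_inverse(img):
    """Caffe branch of inverse_normalize: add the mean back, then BGR -> RGB."""
    img = img + CAFFE_MEAN
    return img[::-1, :, :]


rng = np.random.default_rng(0)
rgb = rng.random((3, 4, 5))  # fake RGB image in [0, 1)

normed = caffe_normalize(rgb)
restored = caffe_inverse(normed)

# The round trip should give back the RGB image on the 0-255 scale
print(np.allclose(restored, rgb * 255, atol=1e-3))
```

Unlike the PyTorch branch, this inverse is exact up to float32 rounding, because the forward step is only a channel swap, a scaling and a subtraction.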
4 The code of def preprocess(img, min_size=600, max_size=1000) is as follows:
def preprocess(img, min_size=600, max_size=1000):
    """Preprocess an image for feature extraction.

    The length of the shorter edge is scaled to :obj:`self.min_size`.
    After the scaling, if the length of the longer edge is longer than
    :param min_size:
    :obj:`self.max_size`, the image is scaled to fit the longer edge
    to :obj:`self.max_size`.

    After resizing the image, the image is subtracted by a mean image value
    :obj:`self.mean`.

    Args:
        img (~numpy.ndarray): An image. This is in CHW and RGB format.
            The range of its value is :math:`[0, 255]`.

    Returns:
        ~numpy.ndarray: A preprocessed image.

    """
    C, H, W = img.shape
    scale1 = min_size / min(H, W)
    scale2 = max_size / max(H, W)
    scale = min(scale1, scale2)
    img = img / 255.
    img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect',anti_aliasing=False)
    # both the longer and shorter should be less than
    # max_size and min_size
    if opt.caffe_pretrain:
        normalize = caffe_normalize
    else:
        normalize = pytorch_normalze
    return normalize(img)
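This is the image preprocessing function. C, H, W = img.shape reads the number of channels, the height and the width.
scale1 = min_size / min(H, W)
scale2 = max_size / max(H, W)
scale = min(scale1, scale2) sets the scaling factor. Taking the smaller of the two guarantees that after resizing the shorter side is at most min_size and the longer side is at most max_size, with one of them reaching its limit.
img = img / 255. rescales the pixel values to [0, 1].
img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect', anti_aliasing=False) resizes the image to that size.
Finally, depending on whether opt.caffe_pretrain is set, the function calls either caffe_normalize or pytorch_normalze, the two normalization functions described above.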
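The scale selection is easier to follow with numbers. The sketch below isolates only the scale arithmetic from preprocess (no resizing, no normalization); compute_scale and resized_shape are hypothetical helper names, and the image sizes are made-up examples.

```python
def compute_scale(H, W, min_size=600, max_size=1000):
    """Scale factor as in preprocess: the smaller of the two candidate ratios."""
    scale1 = min_size / min(H, W)  # brings the shorter side to min_size
    scale2 = max_size / max(H, W)  # brings the longer side to max_size
    return min(scale1, scale2)


def resized_shape(H, W, min_size=600, max_size=1000):
    """Height and width after applying the scale (still floats)."""
    scale = compute_scale(H, W, min_size, max_size)
    return H * scale, W * scale


# Moderate aspect ratio: the shorter side hits 600, the longer side stays under 1000
print(resized_shape(375, 500))   # about (600, 800)

# Elongated image: the longer side hits 1000 first, the shorter side ends below 600
print(resized_shape(200, 800))   # about (250, 1000)
```

In the first case scale1 = 1.6 is the smaller ratio; in the second, scale2 = 1.25 wins, which is why the shorter side does not always end up at min_size.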
5 The code of class Transform(object) is as follows:
class Transform(object):

    def __init__(self, min_size=600, max_size=1000):
        self.min_size = min_size
        self.max_size = max_size

    def __call__(self, in_data):
        img, bbox, label = in_data
        _, H, W = img.shape
        img = preprocess(img, self.min_size, self.max_size)
        _, o_H, o_W = img.shape
        scale = o_H / H
        bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W))

        # horizontally flip
        img, params = util.random_flip(
            img, x_random=True, return_param=True)
        bbox = util.flip_bbox(
            bbox, (o_H, o_W), x_flip=params['x_flip'])

        return img, bbox, label, scale
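__init__ stores the minimum and maximum image sizes; in this PyTorch code the defaults are min_size=600 and max_size=1000.
__call__ unpacks img, bbox and label from in_data: the image, its bounding boxes and their labels.
_, H, W = img.shape then reads the height and width of the image.
img = preprocess(img, self.min_size, self.max_size) resizes the image according to the min/max size limits and then normalizes it.
_, o_H, o_W = img.shape reads the shape of the resized image.
scale = o_H / H divides the height after resizing by the height before, which gives the scaling factor.
bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W)) rescales the bounding boxes to match the resized image.
img, params = util.random_flip(img, x_random=True, return_param=True) randomly flips the image horizontally. This is data augmentation: it makes the network more robust to mirrored inputs.
bbox = util.flip_bbox(bbox, (o_H, o_W), x_flip=params['x_flip']) applies the same flip to the bounding boxes, and finally img, bbox, label and scale are returned.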
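The two bounding-box steps can be sketched in a few lines of NumPy. This is an illustration, not the repository's util module: resize_boxes and flip_boxes_horizontally are hypothetical names, and the sketch assumes each box is stored as (y_min, x_min, y_max, x_max), which the post does not state.

```python
import numpy as np


def resize_boxes(bbox, in_size, out_size):
    """Scale boxes from an image of in_size=(H, W) to out_size=(o_H, o_W)."""
    bbox = bbox.copy()
    y_scale = out_size[0] / in_size[0]
    x_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= y_scale  # y_min, y_max (assumed column layout)
    bbox[:, [1, 3]] *= x_scale  # x_min, x_max (assumed column layout)
    return bbox


def flip_boxes_horizontally(bbox, size):
    """Mirror boxes left-right inside an image of size=(H, W)."""
    _, W = size
    bbox = bbox.copy()
    x_min = W - bbox[:, 3]  # the old right edge becomes the new left edge
    x_max = W - bbox[:, 1]  # the old left edge becomes the new right edge
    bbox[:, 1] = x_min
    bbox[:, 3] = x_max
    return bbox


bbox = np.array([[10.0, 20.0, 110.0, 220.0]], dtype=np.float32)
resized = resize_boxes(bbox, (375, 500), (600, 800))
flipped = flip_boxes_horizontally(resized, (600, 800))
print(resized)
print(flipped)
```

With a 375x500 image resized to 600x800, every coordinate is multiplied by 1.6, and the flip then swaps and mirrors the two x coordinates so that x_min stays smaller than x_max. Flipping twice gives back the resized boxes, which is a convenient sanity check.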
6 The code of class Dataset is as follows:
class Dataset:
    def __init__(self, opt):
        self.opt = opt
        self.db = VOCBboxDataset(opt.voc_data_dir)
        self.tsf = Transform(opt.min_size, opt.max_size)

    def __getitem__(self, idx):
        ori_img, bbox, label, difficult = self.db.get_example(idx)

        img, bbox, label, scale = self.tsf((ori_img, bbox, label))
        # TODO: check whose stride is negative to fix this instead copy all
        # some of the strides of a given numpy array are negative.
        return img.copy(), bbox.copy(), label.copy(), scale

    def __len__(self):
        return len(self.db)
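__init__ sets self.opt = opt, self.db = VOCBboxDataset(opt.voc_data_dir) and self.tsf = Transform(opt.min_size, opt.max_size).
__getitem__ can be understood simply as fetching the examples one at a time from the dataset directory and passing each through the Transform object above, which resizes and normalizes the image, rescales the bounding boxes accordingly and applies a random horizontal flip. It then returns copies of img, bbox and label together with scale.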
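The TODO comment in __getitem__ mentions negative strides, which explains the .copy() calls. A likely source, and this is an assumption since the post does not show util.random_flip, is a horizontal flip implemented as a reversed slice: that produces a view whose last stride is negative, and torch.from_numpy rejects such arrays. The NumPy-only sketch below shows the effect and how copy() removes it.

```python
import numpy as np

img = np.arange(24, dtype=np.float32).reshape(3, 2, 4)  # tiny CHW "image"

# Assumed flip implementation: reversing the width axis returns a view
flipped = img[:, :, ::-1]
has_negative_stride = any(s < 0 for s in flipped.strides)

# copy() materializes the data in a fresh, C-contiguous buffer
copied = flipped.copy()

print(flipped.strides, has_negative_stride)
print(copied.strides, copied.flags["C_CONTIGUOUS"])
```

Copying all three arrays on every call is the blunt fix that the TODO itself points out; copying only the array that was actually flipped would do the same job with less work.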
7 The code of class TestDataset is as follows:
class TestDataset:
    def __init__(self, opt, split='test', use_difficult=True):
        self.opt = opt
        self.db = VOCBboxDataset(opt.voc_data_dir, split=split, use_difficult=use_difficult)

    def __getitem__(self, idx):
        ori_img, bbox, label, difficult = self.db.get_example(idx)
        img = preprocess(ori_img)
        return img, ori_img.shape[1:], bbox, label, difficult

    def __len__(self):
        return len(self.db)
TestDataset does much the same as the class above, but it loads a different part of the data. Its constructor, def __init__(self, opt, split='test', use_difficult=True), passes split='test' and use_difficult=True to VOCBboxDataset, so self.db holds the test split of the data under opt.voc_data_dir. __getitem__ does not call Transform: no random flip is applied, because augmentation of this kind would only interfere with measuring test accuracy, and the bounding boxes are returned unchanged in the coordinates of the original image. It calls preprocess() directly to resize and normalize the image, and returns img, the original size ori_img.shape[1:], bbox, label and difficult. That concludes the walkthrough of dataset.py.
Reposted from: https://www.cnblogs.com/kerwins-AC/p/9734381.html