Dive into CV: Object Detection Tutorial 6 - Training and Testing

3.6 Training and Testing
This article is part of the introductory object detection tutorial created by the CV group of the open-source organization DataWhale 🐳.

It corresponds to Chapter 3 of the open-source project 《动手学CV-Pytorch》, and the code covered in this tutorial can also be found in that project. More quality content will be added over time; stars are welcome.

If you use content or images from our tutorial, please credit our GitHub page prominently in your article: https://github.com/datawhalechina/dive-into-cv-pytorch
3.6.1 Model Training
In the previous sections we covered each of the key pieces of knowledge needed to train an object detector. Now we need to string the whole pipeline together and train the model.

Training an object detection network roughly follows this flow:
- Set the hyperparameters
- Define the data loading module (dataloader)
- Define the network (model)
- Define the loss function (loss)
- Define the optimizer (optimizer)
- Iterate over the training data: predict, compute the loss, backpropagate
First, we import the necessary libraries and set the various hyperparameters:
```python
import time

import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data

from model import tiny_detector, MultiBoxLoss
from datasets import PascalVOCDataset
from utils import *

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cudnn.benchmark = True

# Data parameters
data_folder = '../../../dataset/VOCdevkit'  # data files root path
keep_difficult = True       # use objects considered difficult to detect?
n_classes = len(label_map)  # number of different types of objects

# Learning parameters
total_epochs = 230        # number of epochs to train
batch_size = 32           # batch size
workers = 4               # number of workers for loading data in the DataLoader
print_freq = 100          # print training status every __ batches
lr = 1e-3                 # learning rate
decay_lr_at = [150, 190]  # decay learning rate at these epochs
decay_lr_to = 0.1         # decay learning rate to this fraction of the existing learning rate
momentum = 0.9            # momentum
weight_decay = 5e-4       # weight decay
```

Following the flow outlined above, we write the training code as follows:
```python
def main():
    """Training."""
    # Initialize model and optimizer
    model = tiny_detector(n_classes=n_classes)
    criterion = MultiBoxLoss(priors_cxcy=model.priors_cxcy)
    optimizer = torch.optim.SGD(params=model.parameters(),
                                lr=lr,
                                momentum=momentum,
                                weight_decay=weight_decay)

    # Move to default device
    model = model.to(device)
    criterion = criterion.to(device)

    # Custom dataloaders
    train_dataset = PascalVOCDataset(data_folder,
                                     split='train',
                                     keep_difficult=keep_difficult)
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True,
                                               collate_fn=train_dataset.collate_fn,
                                               num_workers=workers,
                                               pin_memory=True)

    # Epochs
    for epoch in range(total_epochs):
        # Decay learning rate at particular epochs
        if epoch in decay_lr_at:
            adjust_learning_rate(optimizer, decay_lr_to)

        # One epoch's training
        train(train_loader=train_loader,
              model=model,
              criterion=criterion,
              optimizer=optimizer,
              epoch=epoch)

        # Save checkpoint
        save_checkpoint(epoch, model, optimizer)
```

Here the training logic for a single epoch is wrapped in its own function, implemented as follows:
```python
def train(train_loader, model, criterion, optimizer, epoch):
    """
    One epoch's training.

    :param train_loader: DataLoader for training data
    :param model: model
    :param criterion: MultiBox loss
    :param optimizer: optimizer
    :param epoch: epoch number
    """
    model.train()  # training mode enables dropout

    batch_time = AverageMeter()  # forward prop. + back prop. time
    data_time = AverageMeter()   # data loading time
    losses = AverageMeter()      # loss

    start = time.time()

    # Batches
    for i, (images, boxes, labels, _) in enumerate(train_loader):
        data_time.update(time.time() - start)

        # Move to default device
        images = images.to(device)  # (batch_size (N), 3, 224, 224)
        boxes = [b.to(device) for b in boxes]
        labels = [l.to(device) for l in labels]

        # Forward prop.
        predicted_locs, predicted_scores = model(images)  # (N, 441, 4), (N, 441, n_classes)

        # Loss
        loss = criterion(predicted_locs, predicted_scores, boxes, labels)  # scalar

        # Backward prop.
        optimizer.zero_grad()
        loss.backward()

        # Update model
        optimizer.step()

        losses.update(loss.item(), images.size(0))
        batch_time.update(time.time() - start)

        start = time.time()

        # Print status
        if i % print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Batch Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data Time {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'.format(epoch,
                                                                  i,
                                                                  len(train_loader),
                                                                  batch_time=batch_time,
                                                                  data_time=data_time,
                                                                  loss=losses))
    del predicted_locs, predicted_scores, images, boxes, labels  # free some memory since their histories may be stored
```

With the code written, we can start training the model. The training process looks something like this:
```
$ python train.py
Loaded base model.

Epoch: [0][0/518]    Batch Time 6.556 (6.556)  Data Time 3.879 (3.879)  Loss 27.7129 (27.7129)
Epoch: [0][100/518]  Batch Time 0.185 (0.516)  Data Time 0.000 (0.306)  Loss 6.1569 (8.4569)
Epoch: [0][200/518]  Batch Time 1.251 (0.487)  Data Time 1.065 (0.289)  Loss 6.3175 (7.3364)
Epoch: [0][300/518]  Batch Time 1.207 (0.476)  Data Time 1.019 (0.282)  Loss 5.6598 (6.9211)
Epoch: [0][400/518]  Batch Time 1.174 (0.470)  Data Time 0.988 (0.278)  Loss 6.2519 (6.6751)
Epoch: [0][500/518]  Batch Time 1.303 (0.468)  Data Time 1.117 (0.276)  Loss 5.4864 (6.4894)
Epoch: [1][0/518]    Batch Time 1.061 (1.061)  Data Time 0.871 (0.871)  Loss 5.7480 (5.7480)
Epoch: [1][100/518]  Batch Time 0.189 (0.227)  Data Time 0.000 (0.037)  Loss 5.8557 (5.6431)
Epoch: [1][200/518]  Batch Time 0.188 (0.225)  Data Time 0.000 (0.036)  Loss 5.2024 (5.5586)
Epoch: [1][300/518]  Batch Time 0.190 (0.225)  Data Time 0.000 (0.036)  Loss 5.5348 (5.4957)
Epoch: [1][400/518]  Batch Time 0.188 (0.226)  Data Time 0.000 (0.036)  Loss 5.2623 (5.4442)
Epoch: [1][500/518]  Batch Time 0.190 (0.225)  Data Time 0.000 (0.035)  Loss 5.3105 (5.3835)
Epoch: [2][0/518]    Batch Time 1.156 (1.156)  Data Time 0.967 (0.967)  Loss 5.3755 (5.3755)
Epoch: [2][100/518]  Batch Time 0.206 (0.232)  Data Time 0.016 (0.042)  Loss 5.6532 (5.1418)
Epoch: [2][200/518]  Batch Time 0.197 (0.226)  Data Time 0.007 (0.036)  Loss 4.6704 (5.0717)
```

Then it is just a matter of waiting.
3.6.2 Post-processing

3.6.2.1 Decoding the Box Predictions
As mentioned earlier, the model does not predict box coordinates directly; it predicts encoded offsets relative to the anchors. The first step of post-processing is therefore to decode the output of the model's regression head into genuine bounding box predictions.
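To make the decoding concrete, here is a sketch of what the two helper functions used later in the code (gcxgcy_to_cxcy and cxcy_to_xy) typically look like. The variance scaling factors 10 and 5 follow the common SSD convention and are an assumption here; check the project's utils.py for the exact values it uses:

```python
import torch

def gcxgcy_to_cxcy(gcxgcy, priors_cxcy):
    """Decode predicted offsets (g_cx, g_cy, g_w, g_h) w.r.t. the priors
    back into center-size coordinates (c_x, c_y, w, h)."""
    return torch.cat([
        gcxgcy[:, :2] * priors_cxcy[:, 2:] / 10 + priors_cxcy[:, :2],  # c_x, c_y
        torch.exp(gcxgcy[:, 2:] / 5) * priors_cxcy[:, 2:]], dim=1)     # w, h

def cxcy_to_xy(cxcy):
    """Convert center-size coordinates (c_x, c_y, w, h) into boundary
    coordinates (x_min, y_min, x_max, y_max)."""
    return torch.cat([cxcy[:, :2] - cxcy[:, 2:] / 2,          # x_min, y_min
                      cxcy[:, :2] + cxcy[:, 2:] / 2], dim=1)  # x_max, y_max
```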
What else does post-processing need to do? Because we laid down a large number of prior boxes, prediction produces many heavily overlapping detection boxes around each object, while we only want to keep a single sufficiently accurate box per object. We therefore need an algorithm to de-duplicate the detections. That algorithm is NMS, which we discuss in detail below.
3.6.2.2 NMS (Non-Maximum Suppression)

The rough steps of the NMS algorithm are:

1) Group detections by class and process each class in turn.

2) Within the current class, sort the boxes by classification confidence and set a minimum confidence threshold, e.g. 0.05; boxes below this threshold are discarded outright.

3) Take the box with the highest confidence as the candidate; every remaining box whose IoU with the candidate exceeds a threshold of your choosing (e.g. 0.5) is considered suppressed and removed from the pool.

4) Among the boxes still in the pool, find the one with the second-highest confidence; all boxes whose IoU with it exceeds the threshold are suppressed in the same way.

5) Repeat this process until every remaining box has been visited; the boxes that were never suppressed are the final detections.
3.6.2.3 Code Implementation

The whole post-processing pipeline is implemented in the detect_objects function of the tiny_detector class in model.py:
```python
def detect_objects(self, predicted_locs, predicted_scores, min_score, max_overlap, top_k):
    """
    Decipher the 441 locations and class scores (output of the tiny_detector) to detect objects.

    For each class, perform Non-Maximum Suppression (NMS) on boxes that are above a minimum threshold.

    :param predicted_locs: predicted locations/boxes w.r.t the 441 prior boxes, a tensor of dimensions (N, 441, 4)
    :param predicted_scores: class scores for each of the encoded locations/boxes, a tensor of dimensions (N, 441, n_classes)
    :param min_score: minimum threshold for a box to be considered a match for a certain class
    :param max_overlap: maximum overlap two boxes can have so that the one with the lower score is not suppressed via NMS
    :param top_k: if there are a lot of resulting detections across all classes, keep only the top 'k'
    :return: detections (boxes, labels, and scores), lists of length batch_size
    """
    batch_size = predicted_locs.size(0)
    n_priors = self.priors_cxcy.size(0)
    predicted_scores = F.softmax(predicted_scores, dim=2)  # (N, 441, n_classes)

    # Lists to store final predicted boxes, labels, and scores for all images in batch
    all_images_boxes = list()
    all_images_labels = list()
    all_images_scores = list()

    assert n_priors == predicted_locs.size(1) == predicted_scores.size(1)

    for i in range(batch_size):
        # Decode object coordinates from the form we regressed predicted boxes to
        decoded_locs = cxcy_to_xy(
            gcxgcy_to_cxcy(predicted_locs[i], self.priors_cxcy))  # (441, 4), these are fractional pt. coordinates

        # Lists to store boxes and scores for this image
        image_boxes = list()
        image_labels = list()
        image_scores = list()

        max_scores, best_label = predicted_scores[i].max(dim=1)  # (441)

        # Check for each class
        for c in range(1, self.n_classes):
            # Keep only predicted boxes and scores where scores for this class are above the minimum score
            class_scores = predicted_scores[i][:, c]  # (441)
            score_above_min_score = class_scores > min_score  # torch.uint8 (byte) tensor, for indexing
            n_above_min_score = score_above_min_score.sum().item()
            if n_above_min_score == 0:
                continue
            class_scores = class_scores[score_above_min_score]  # (n_qualified), n_qualified <= 441
            class_decoded_locs = decoded_locs[score_above_min_score]  # (n_qualified, 4)

            # Sort predicted boxes and scores by scores
            class_scores, sort_ind = class_scores.sort(dim=0, descending=True)  # (n_qualified)
            class_decoded_locs = class_decoded_locs[sort_ind]  # (n_qualified, 4)

            # Find the overlap between predicted boxes
            overlap = find_jaccard_overlap(class_decoded_locs, class_decoded_locs)  # (n_qualified, n_qualified)

            # Non-Maximum Suppression (NMS)

            # A torch.uint8 (byte) tensor to keep track of which predicted boxes to suppress
            # 1 implies suppress, 0 implies don't suppress
            suppress = torch.zeros((n_above_min_score), dtype=torch.uint8).to(device)  # (n_qualified)

            # Consider each box in order of decreasing scores
            for box in range(class_decoded_locs.size(0)):
                # If this box is already marked for suppression
                if suppress[box] == 1:
                    continue

                # Suppress boxes whose overlaps (with current box) are greater than maximum overlap
                # Find such boxes and update suppress indices
                suppress = torch.max(suppress, (overlap[box] > max_overlap).to(torch.uint8))
                # The max operation retains previously suppressed boxes, like an 'OR' operation

                # Don't suppress this box, even though it has an overlap of 1 with itself
                suppress[box] = 0

            # Store only unsuppressed boxes for this class
            image_boxes.append(class_decoded_locs[1 - suppress])
            image_labels.append(torch.LongTensor((1 - suppress).sum().item() * [c]).to(device))
            image_scores.append(class_scores[1 - suppress])

        # If no object in any class is found, store a placeholder for 'background'
        if len(image_boxes) == 0:
            image_boxes.append(torch.FloatTensor([[0., 0., 1., 1.]]).to(device))
            image_labels.append(torch.LongTensor([0]).to(device))
            image_scores.append(torch.FloatTensor([0.]).to(device))

        # Concatenate into single tensors
        image_boxes = torch.cat(image_boxes, dim=0)    # (n_objects, 4)
        image_labels = torch.cat(image_labels, dim=0)  # (n_objects)
        image_scores = torch.cat(image_scores, dim=0)  # (n_objects)
        n_objects = image_scores.size(0)

        # Keep only the top k objects
        if n_objects > top_k:
            image_scores, sort_ind = image_scores.sort(dim=0, descending=True)
            image_scores = image_scores[:top_k]  # (top_k)
            image_boxes = image_boxes[sort_ind][:top_k]    # (top_k, 4)
            image_labels = image_labels[sort_ind][:top_k]  # (top_k)

        # Append to lists that store predicted boxes and scores for all images
        all_images_boxes.append(image_boxes)
        all_images_labels.append(image_labels)
        all_images_scores.append(image_scores)

    return all_images_boxes, all_images_labels, all_images_scores  # lists of length batch_size
```

The NMS part of our post-processing code is admittedly somewhat convoluted. For a cleaner reference, have a look at the NMS implementation from Fast R-CNN:
```python
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import numpy as np

# dets: detected boxes and their corresponding scores
# thresh: the IoU threshold
def nms(dets, thresh):
    # box coordinates
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    # box scores
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)  # area of each box
    order = scores.argsort()[::-1]         # sort indices by classification confidence

    keep = []  # indices of the boxes we keep
    while order.size > 0:
        i = order[0]    # index of the highest-scoring box this round
        keep.append(i)  # keep it

        # compute the IoU between the current box and all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # keep only the boxes whose IoU is below the threshold
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return keep
```
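Incidentally, if you just want a fast, well-tested NMS instead of writing your own, torchvision ships one as torchvision.ops.nms. A minimal usage sketch (the boxes and scores below are made up for illustration):

```python
import torch
from torchvision.ops import nms

# Five candidate boxes in (x_min, y_min, x_max, y_max) format, with confidence scores
boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 12., 52., 52.],      # overlaps the first box heavily
                      [100., 100., 150., 150.],
                      [11., 11., 51., 51.],      # also overlaps the first box
                      [102., 98., 148., 152.]])  # overlaps the third box
scores = torch.tensor([0.90, 0.80, 0.70, 0.85, 0.60])

keep = nms(boxes, scores, iou_threshold=0.5)  # indices of surviving boxes, sorted by score
print(keep)  # tensor([0, 2]): the lower-scoring overlapping boxes are suppressed
```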
3.6.3 Single-Image Inference

Once the model has finished training, let's look at how to run inference on a single image and obtain detection results.
First, we import the necessary Python packages and load the trained model weights.

Next, we define the preprocessing function. To get the best predictions, the preprocessing at test time should stay consistent with training; we simply drop the data-augmentation transforms.

The preprocessing we need here is therefore (a sketch follows the list):

- Resize the image to 224 x 224
- Convert it to a Tensor and divide by 255
- Normalize by subtracting the mean and dividing by the standard deviation
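A minimal sketch of these three steps using torchvision's functional transforms is below. The mean/std values are the standard ImageNet statistics, which is an assumption here; they must match whatever normalization was used at training time:

```python
import torch
from PIL import Image
import torchvision.transforms.functional as FT

# Assumed ImageNet statistics; replace with the values used during training
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def preprocess(original_image: Image.Image) -> torch.Tensor:
    image = FT.resize(original_image, (224, 224))    # resize to 224 x 224
    image = FT.to_tensor(image)                      # to Tensor, scales pixels to [0, 1]
    image = FT.normalize(image, mean=mean, std=std)  # subtract mean, divide by std
    return image
```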
Now we run inference. The process is simple; the core steps are:

- Read an image
- Preprocess it
- Run the model's forward pass
- Post-process the model's predictions

The core code is as follows:
```python
# Transform the image
image = normalize(to_tensor(resize(original_image)))

# Move to default device
image = image.to(device)

# Forward prop.
predicted_locs, predicted_scores = model(image.unsqueeze(0))

# Post process, get the final detected objects from our tiny detector output
det_boxes, det_labels, det_scores = model.detect_objects(predicted_locs, predicted_scores,
                                                         min_score=min_score,
                                                         max_overlap=max_overlap,
                                                         top_k=top_k)
```

The detect_objects function here performs the post-processing of the model's predictions. It does two main things: first it decodes the model output into prediction boxes carrying real position information, and then it applies per-class NMS to all prediction boxes to filter out redundant detections, exactly as described in the previous subsection.
Finally, we draw the resulting detection boxes on the image, producing results like the figure below:

The complete code can be found in the detect.py script. Here are prediction results for more images from the VOC test set:

As you can see, our tiny_detector model performs reasonably well on some of the easier test images. On harder images, the predictions look like this:

Faced with even slightly challenging images, our detector starts to expose all sorts of problems, including but not limited to:

- Missed detections (many bottles in the right image are not detected)
- False positives (a bottle is falsely detected in the right image)
- Duplicate detections (the car in the left image and the front-most person in the right image)
- Inaccurate localization, especially for small objects

Go ahead and run detect.py to see how your own trained model performs. What problems do you observe, and what ideas do you have for improving them?
3.6.4 Evaluation on the VOC Test Set

3.6.4.1 The mAP Metric
Take binary classification, the simplest case of classification, as an example. The model must decide whether each sample is 0 or 1, in other words positive or negative. From how the samples were collected, we know the ground truth: which samples are actually positive and which are negative. By running the samples through the classifier, we also know which ones the model judges positive and which negative. This yields four basic quantities, which we call the primary (lowest-level) metrics:

1) True Positive (TP): the ground truth is positive and the model predicts positive

2) False Negative (FN): the ground truth is positive but the model predicts negative; this is the statistical Type II error

3) False Positive (FP): the ground truth is negative but the model predicts positive; this is the statistical Type I error

4) True Negative (TN): the ground truth is negative and the model predicts negative
In machine learning, the confusion matrix (also known as a contingency table or error matrix) is a specific matrix used to visualize the performance of an algorithm, typically in supervised learning (its unsupervised counterpart is usually called a matching matrix). Each column represents a predicted class and each row an actual class. The name comes from how easily it reveals whether classes are being confused, i.e. one class being predicted as another.
Example: suppose we have a system that classifies cats, dogs, and rabbits, and we summarize its test results in a confusion matrix (Table 3-30) for further analysis. Suppose there are 27 animals in total: 8 cats, 6 dogs, and 13 rabbits.
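For illustration, a hypothetical confusion matrix consistent with these totals (rows are actual classes, columns are predictions) might look like this:

| Actual \ Predicted | Cat | Dog | Rabbit |
| --- | --- | --- | --- |
| Cat (8) | 5 | 3 | 0 |
| Dog (6) | 2 | 3 | 1 |
| Rabbit (13) | 0 | 2 | 11 |

In such a table, the confusions are easy to read off: 3 of the 8 actual cats are predicted as dogs, so this system mixes up cats and dogs fairly often, while the rabbits are mostly classified correctly.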
Secondary metrics: the confusion matrix records raw counts, and with large amounts of data, counts alone make it hard to judge how good a model is. The confusion matrix therefore gives rise to four derived metrics, which we call secondary metrics (obtained from the primary ones by basic arithmetic):

1) Accuracy: measured over the whole model

2) Precision

3) Sensitivity, also known as Recall

4) Specificity
The table below (Table 3-31) summarizes the definition, calculation, and interpretation of these four metrics:

| Metric | Definition | Calculation | Interpretation |
| --- | --- | --- | --- |
| Accuracy | Fraction of all samples classified correctly | (TP + TN) / (TP + TN + FP + FN) | How often the model is right overall |
| Precision | Fraction of predicted positives that are actually positive | TP / (TP + FP) | Of everything the model flags as positive, how much is correct |
| Sensitivity (Recall) | Fraction of actual positives that are predicted positive | TP / (TP + FN) | How many of the real positives the model finds |
| Specificity | Fraction of actual negatives that are predicted negative | TN / (TN + FP) | How many of the real negatives the model correctly rules out |

Through these four secondary metrics, the raw counts in the confusion matrix are converted into ratios between 0 and 1, making standardized comparison possible.
Tertiary metric: the F1 Score, computed as

F1 Score = 2PR / (P + R)

where P is Precision and R is Recall. The F1 Score combines Precision and Recall into a single number. It ranges from 0 to 1, with 1 indicating the best possible model output and 0 the worst.
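As a quick worked example (with hypothetical counts), all of these metrics fall directly out of the four primary quantities:

```python
# Hypothetical counts, for illustration only
tp, fp, fn, tn = 80, 20, 10, 90

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.850
precision = tp / (tp + fp)                          # 0.800
recall = tp / (tp + fn)                             # 0.889 (sensitivity)
specificity = tn / (tn + fp)                        # 0.818
f1 = 2 * precision * recall / (precision + recall)  # 0.842

print(f"precision={precision:.3f}, recall={recall:.3f}, F1={f1:.3f}")
```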
AP stands for Average Precision.

mAP, or mean Average Precision, is the mean of the per-class AP values, and is the standard metric for measuring detection accuracy in object detection.
How is AP computed in object detection? This is where the P-R curve comes in: a two-dimensional curve with precision on the vertical axis and recall on the horizontal axis, traced out by the precision/recall pairs obtained at different confidence thresholds, as shown below:

Figure 3-32 The P-R curve

The general trend of a P-R curve is that higher precision comes with lower recall. When recall reaches 1, we have descended to the positive sample with the lowest confidence score, and the precision there (the number of positives divided by the number of all samples at or above that threshold) is at its lowest. The area enclosed under the P-R curve is the AP value; broadly speaking, the better the classifier, the higher its AP.
To summarize: in object detection, a P-R curve can be drawn for each class from its recall and precision values; AP is the area under that curve, and mAP is the average of the APs over all classes. (This is how mAP is computed for the VOC dataset; the COCO computation differs slightly.)
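For reference, here is a sketch of the per-class AP computation using the "all points" interpolation that VOC adopted from 2010 onwards; the project's eval.py may use a slightly different variant:

```python
import numpy as np

def voc_ap(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the P-R curve. recall/precision are the cumulative values
    obtained by sweeping the confidence threshold from high to low over the
    sorted detections of one class."""
    # Pad the curve so it starts at recall 0 and ends at recall 1
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # Take the upper envelope: make precision monotonically non-increasing
    for i in range(mpre.size - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum the rectangle areas wherever recall increases
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Toy example: cumulative recall/precision for five detections of one class
rec = np.array([0.2, 0.4, 0.4, 0.6, 0.8])
prec = np.array([1.0, 1.0, 0.67, 0.75, 0.6])
print(voc_ap(rec, prec))  # ~0.67
```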
3.6.4.2 Evaluating tiny_detector on the VOC Test Set

Run the eval.py script to evaluate the model on the VOC2007 test set. The results are as follows:
```
$ python eval.py
...
Evaluating: 100%|███████████████████████████████| 78/78 [00:57<00:00, 1.35it/s]
{'aeroplane': 0.6086561679840088,
 'bicycle': 0.7144593596458435,
 'bird': 0.5847545862197876,
 'boat': 0.44902321696281433,
 'bottle': 0.2160634696483612,
 'bus': 0.7212041616439819,
 'car': 0.629608154296875,
 'cat': 0.8124480843544006,
 'chair': 0.3599272668361664,
 'cow': 0.5980824828147888,
 'diningtable': 0.6459739804267883,
 'dog': 0.7577021718025208,
 'horse': 0.7861635088920593,
 'motorbike': 0.702280580997467,
 'person': 0.5821948051452637,
 'pottedplant': 0.2793791592121124,
 'sheep': 0.5655995607376099,
 'sofa': 0.708049476146698,
 'train': 0.7575671672821045,
 'tvmonitor': 0.5641061663627625}

Mean Average Precision (mAP): 0.602
```

The model reaches an mAP of 60.2, slightly below the 63.4 of the classic YOLO network, which is a perfectly respectable result.
We can also see that certain classes, such as bottle and pottedplant, are detected very poorly, indicating that our model has clear problems with small and densely packed objects.
Summary

In this section we strung together the full training pipeline of our tiny_detector, walked through the post-processing steps of box decoding and NMS, ran single-image inference with detect.py, and evaluated the model on the VOC2007 test set using the mAP metric.