Deep Learning Object Detection (YOLOv5) Project: From Scratch to Production Deployment

Published: 2025/3/21 | Object Detection | 豆豆

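Preface

Training and development were done on Windows 10 with an RTX 3080 GPU; CUDA 10.2, cuDNN 7.1; OpenCV 4.5. The YOLOv5 model is yolov5s from the v3.0 release of August 13, 2020. The ncnn version is 20210525, the C++ IDE is VS2019, and Anaconda is 3.5.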

I. Environment Setup

1. Anaconda environment

Create the environment:

conda create --name yolov5 python=3.7
conda activate yolov5

Exit the environment:

conda deactivate

List the installed environments:

conda info --env

Remove the environment:

conda env remove -n yolov5

2. Install dependencies

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt

Or:

git clone https://github.com/ultralytics/yolov5.git
cd yolov5
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install cython matplotlib tqdm opencv-python tensorboard scipy pillow onnx pyyaml pandas seaborn

On Windows, try to avoid CUDA 11: in several attempts it either could not find the GPU or crashed partway through a run.

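II. Data Processing

1. Annotation was done with labelme. For the ID-card data, I found some public template data online and then used adversarial generation to produce a batch of samples for annotation, about 300 samples in total. The annotations are stored as XML files (the Pascal VOC-style layout parsed by the script below).
2. Under the data directory of the yolov5 repository, create a directory for the dataset with two subdirectories: JPEGImages for the original images and Annotations for the label files.
3. The annotations are .xml files, but YOLO expects .txt labels, so the data has to be converted.

The script splits the data into a training set and a validation set, writes train.txt and val.txt under data/XXXXX, prints all annotated class names, and generates a .txt label file next to each image in JPEGImages.
Run:

python generate_txt.py --img_path data/XXXXX/JPEGImages --xml_path data/XXXXX/Annotations --out_path data/XXXXX

Example of the printed class names: ['ida', 'idb'].
Each line of a generated .txt file contains the class index followed by the normalized box center x, center y, width, and height:

0 0.518 0.7724887556221889 0.296 0.15367316341829085
3 0.4475 0.7694902548725637 0.089 0.08620689655172414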
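To make the label format concrete, here is a minimal sketch that converts a pixel box to a label line and back. The function names are hypothetical and not part of YOLOv5; the arithmetic mirrors the normalization used in generate_txt.py.

```python
def box_to_yolo_line(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Pixel corner box -> 'cls cx cy w h' with values normalized to [0, 1]."""
    cx = (xmax + xmin) / 2.0 / img_w
    cy = (ymax + ymin) / 2.0 / img_h
    bw = (xmax - xmin) / float(img_w)
    bh = (ymax - ymin) / float(img_h)
    return '{} {} {} {} {}'.format(cls_id, cx, cy, bw, bh)


def yolo_line_to_box(line, img_w, img_h):
    """'cls cx cy w h' -> (cls, xmin, ymin, xmax, ymax) in pixels."""
    parts = line.split()
    cls_id = int(parts[0])
    cx, cy, bw, bh = (float(v) for v in parts[1:5])
    xmin = (cx - bw / 2.0) * img_w
    ymin = (cy - bh / 2.0) * img_h
    xmax = (cx + bw / 2.0) * img_w
    ymax = (cy + bh / 2.0) * img_h
    return cls_id, xmin, ymin, xmax, ymax


if __name__ == '__main__':
    line = box_to_yolo_line(1, 100, 50, 300, 150, 400, 200)
    print(line)
    print(yolo_line_to_box(line, 400, 200))
```

Decoding a few label lines this way and drawing the boxes on the image is a quick check that the conversion did not swap width and height.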

Data processing code
generate_txt.py

import os
import glob
import argparse
import random
import xml.etree.ElementTree as ET

from PIL import Image
from tqdm import tqdm


def get_all_classes(xml_path):
    # Collect the sorted, de-duplicated class names from all XML files.
    xml_fns = glob.glob(os.path.join(xml_path, '*.xml'))
    class_names = []
    for xml_fn in xml_fns:
        tree = ET.parse(xml_fn)
        root = tree.getroot()
        for obj in root.iter('object'):
            cls = obj.find('name').text
            class_names.append(cls)
    return sorted(list(set(class_names)))


def convert_annotation(img_path, xml_path, class_names, out_path):
    output = []
    im_fns = glob.glob(os.path.join(img_path, '*.jpg'))
    for im_fn in tqdm(im_fns):
        # Skip empty images and images without an annotation file.
        if os.path.getsize(im_fn) == 0:
            continue
        xml_fn = os.path.join(xml_path, os.path.splitext(os.path.basename(im_fn))[0] + '.xml')
        if not os.path.exists(xml_fn):
            continue
        img = Image.open(im_fn)
        height, width = img.height, img.width
        tree = ET.parse(xml_fn)
        root = tree.getroot()
        anno = []
        # Skip samples whose XML size does not match the real image size.
        xml_height = int(root.find('size').find('height').text)
        xml_width = int(root.find('size').find('width').text)
        if height != xml_height or width != xml_width:
            print((height, width), (xml_height, xml_width), im_fn)
            continue
        for obj in root.iter('object'):
            cls = obj.find('name').text
            cls_id = class_names.index(cls)
            xmlbox = obj.find('bndbox')
            xmin = int(xmlbox.find('xmin').text)
            ymin = int(xmlbox.find('ymin').text)
            xmax = int(xmlbox.find('xmax').text)
            ymax = int(xmlbox.find('ymax').text)
            # Normalized box center and size.
            cx = (xmax + xmin) / 2.0 / width
            cy = (ymax + ymin) / 2.0 / height
            bw = (xmax - xmin) * 1.0 / width
            bh = (ymax - ymin) * 1.0 / height
            anno.append('{} {} {} {} {}'.format(cls_id, cx, cy, bw, bh))
        if len(anno) > 0:
            output.append(im_fn)
            # The label file is written next to the image.
            with open(im_fn.replace('.jpg', '.txt'), 'w') as f:
                f.write('\n'.join(anno))
    # 90% of the samples go to training, the rest to validation.
    random.shuffle(output)
    train_num = int(len(output) * 0.9)
    with open(os.path.join(out_path, 'train.txt'), 'w') as f:
        f.write('\n'.join(output[:train_num]))
    with open(os.path.join(out_path, 'val.txt'), 'w') as f:
        f.write('\n'.join(output[train_num:]))


def parse_args():
    parser = argparse.ArgumentParser('generate annotation')
    parser.add_argument('--img_path', type=str, help='input image directory')
    parser.add_argument('--xml_path', type=str, help='input xml directory')
    parser.add_argument('--out_path', type=str, help='output directory')
    args = parser.parse_args()
    return args


if __name__ == '__main__':
    args = parse_args()
    class_names = get_all_classes(args.xml_path)
    print(class_names)
    convert_annotation(args.img_path, args.xml_path, class_names, args.out_path)

III. Model Training

1. In models/yolov5s.yaml, change nc to the number of classes.

# parameters
nc: 2  # total number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors: can be changed to match the size of your own targets; more anchors can also be added
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone (feature extraction)
backbone:
  # [from, number, module, args]
  # from   - input of the layer; -1 means the output of the previous layer
  # number - number of repeats, multiplied by depth_multiple and never less than 1;
  #          source: n = max(round(n * gd), 1) if n > 1 else n
  # module - name of the layer
  # args   - number of output channels (multiplied by width_multiple) and other arguments
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2; 64 * 0.5 = 32 feature maps
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5): layers 17, 20 and 23
  ]
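The effect of depth_multiple and width_multiple can be worked through with a short sketch. The depth rule is the one quoted in the comments above. The channel rule (round up to a multiple of 8) follows YOLOv5's make_divisible helper as I understand it, so treat it as an assumption; the function names scaled_depth and scaled_channels are hypothetical.

```python
import math


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor.
    return math.ceil(x / divisor) * divisor


def scaled_depth(n, depth_multiple):
    # Same rule as the quoted source line: n = max(round(n * gd), 1) if n > 1 else n
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_channels(c, width_multiple):
    # Output channels after applying the width multiple.
    return make_divisible(c * width_multiple, 8)


if __name__ == '__main__':
    for n in (1, 3, 9):
        print('repeats', n, '->', scaled_depth(n, 0.33))
    for c in (64, 128, 256, 512, 1024):
        print('channels', c, '->', scaled_channels(c, 0.50))
```

With the yolov5s values, a BottleneckCSP listed with 9 repeats is built with 3, one listed with 3 is built with 1, and the Focus layer outputs 32 channels.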

2. Add a training data configuration file xxx.yaml under the data directory.

# download command/URL (optional)
download: bash data/scripts/get_voc.sh

# paths to the training and validation list files
train: data/xxx/train.txt
val: data/xxx/val.txt

# number of classes
nc: 2

# class names
names: ['ida', 'idb']

3. Training arguments

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='yolov5s.pt', help='initial weights path')  # weights file; whether to start from pretrained weights
parser.add_argument('--cfg', type=str, default='', help='model.yaml path')  # network configuration file
parser.add_argument('--data', type=str, default='data/coco128.yaml', help='data.yaml path')  # training dataset configuration
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')  # hyperparameter configuration file
parser.add_argument('--epochs', type=int, default=300)  # number of training epochs
parser.add_argument('--batch-size', type=int, default=32, help='total batch size for all GPUs')  # batch size
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='[train, test] image sizes')  # training image size
parser.add_argument('--rect', action='store_true', help='rectangular training')  # rectangular training
parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')  # whether to resume from the most recent run
parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')  # do not save intermediate checkpoints
parser.add_argument('--notest', action='store_true', help='only test final epoch')  # do not test intermediate epochs
parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')  # hyperparameter evolution
parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')  # whether to cache images
parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')  # train on GPU or CPU
parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')  # whether to use multi-scale training
parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')  # whether to train as a single class
parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')  # optimizer choice
parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')  # cannot be changed on Windows; on Windows it tends to crash whether changed or not
parser.add_argument('--project', default='runs/train', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
opt = parser.parse_args()

4. Training commands

Single GPU:

python train.py --cfg models/yolov5s.yaml --data data/ODID.yaml --hyp data/hyps/hyp.scratch.yaml --epochs 100 --multi-scale --device 0

Multiple GPUs:

python train.py --cfg models/yolov5s.yaml --data data/ODID.yaml --hyp data/hyps/hyp.scratch.yaml --epochs 100 --multi-scale --device 0,1

5. Testing the model

python test.py --weights runs/train/exp/weights/best.pt --data data/ODID.yaml --device 0 --verbose

--weights: the trained model
--data: the data configuration file (.yaml)
--device: the GPU used for evaluation
--verbose: whether to print the evaluation metrics for each class

OpenCV DNN C++ Inference

1. The slice layer in OpenCV DNN does not support a step of 2, so the code must be modified before converting the model. The change is in the Focus class in models/common.py. Since the slicing is removed from the network, the same slice-and-concat step is applied to the input before inference (see sliceAndConcat in the C++ code below).

Before:

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

After:

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        # return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
        return self.conv(x)

2. Convert the model

python models/export.py --weights runs/exp/weights/best.pt  # --weights: the trained model

After running, the ONNX model is saved as runs/exp/weights/best.onnx. This model can be used for inference with OpenCV DNN.

3. DNN C++ inference

#include <algorithm>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

// Show an image in a resizable window whose longer side is at most 800 pixels.
void imshow(std::string name, const cv::Mat& cv_src) {
    cv::namedWindow(name, 0);
    int max_rows = 800;
    int max_cols = 800;
    if (cv_src.rows >= cv_src.cols && cv_src.rows > max_rows) {
        cv::resizeWindow(name, cv::Size(cv_src.cols * max_rows / cv_src.rows, max_rows));
    } else if (cv_src.cols >= cv_src.rows && cv_src.cols > max_cols) {
        cv::resizeWindow(name, cv::Size(max_cols, cv_src.rows * max_cols / cv_src.cols));
    }
    cv::imshow(name, cv_src);
}

inline float sigmoid(float x) {
    return 1.f / (1.f + exp(-x));
}

// Focus slicing done outside the network: (1, 3, H, W) -> (1, 12, H/2, W/2).
void sliceAndConcat(cv::Mat& img, cv::Mat* input) {
    const float* srcData = img.ptr<float>();
    float* dstData = input->ptr<float>();
    for (int i = 0; i < input->size[2]; i++) {
        for (int j = 0; j < input->size[3]; j++) {
            // even rows, even columns
            for (int k = 0; k < 3; ++k) {
                dstData[k * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + 2 * i * img.size[3] + 2 * j];
            }
            // odd rows, even columns
            for (int k = 0; k < 3; ++k) {
                dstData[(3 + k) * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + (2 * i + 1) * img.size[3] + 2 * j];
            }
            // even rows, odd columns
            for (int k = 0; k < 3; ++k) {
                dstData[(6 + k) * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + 2 * i * img.size[3] + 2 * j + 1];
            }
            // odd rows, odd columns
            for (int k = 0; k < 3; ++k) {
                dstData[(9 + k) * input->size[2] * input->size[3] + i * input->size[3] + j] =
                    srcData[k * img.size[2] * img.size[3] + (2 * i + 1) * img.size[3] + 2 * j + 1];
            }
        }
    }
}

std::vector<cv::String> getOutputNames(const cv::dnn::Net& net) {
    static std::vector<cv::String> names;
    if (names.empty()) {
        std::vector<int> outLayers = net.getUnconnectedOutLayers();
        std::vector<cv::String> layersNames = net.getLayerNames();
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); i++) {
            names[i] = layersNames[outLayers[i] - 1];
        }
    }
    return names;
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, cv::Mat& frame,
              const std::vector<std::string>& classes) {
    cv::rectangle(frame, cv::Point(left, top), cv::Point(right, bottom), cv::Scalar(0, 255, 0), 3);
    std::string label = cv::format("%.2f", conf);
    if (!classes.empty()) {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ": " + label;
    }
    int baseLine;
    cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = std::max(top, labelSize.height);
    cv::rectangle(frame, cv::Point(left, top - round(1.5 * labelSize.height)),
                  cv::Point(left + round(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), cv::FILLED);
    cv::putText(frame, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 0.75, cv::Scalar(), 2);
}

void postprocess(cv::Mat& cv_src, std::vector<cv::Mat>& outs, const std::vector<std::string>& classes, int net_size) {
    float confThreshold = 0.4f;
    float nmsThreshold = 0.5f;
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    int strides[] = { 8, 16, 32 };
    std::vector<std::vector<int> > anchors = {
        { 10,13, 16,30, 33,23 },
        { 30,61, 62,45, 59,119 },
        { 116,90, 156,198, 373,326 }
    };
    for (size_t k = 0; k < outs.size(); k++) {
        float* data = outs[k].ptr<float>();
        int stride = strides[k];
        int num_classes = outs[k].size[4] - 5;
        for (int i = 0; i < outs[k].size[2]; i++) {
            for (int j = 0; j < outs[k].size[3]; j++) {
                for (int a = 0; a < outs[k].size[1]; ++a) {
                    float* record = data + a * outs[k].size[2] * outs[k].size[3] * outs[k].size[4] +
                        i * outs[k].size[3] * outs[k].size[4] + j * outs[k].size[4];
                    float* cls_ptr = record + 5;
                    for (int cls = 0; cls < num_classes; cls++) {
                        float score = sigmoid(cls_ptr[cls]) * sigmoid(record[4]);
                        if (score > confThreshold) {
                            // decode the box in network-input pixels
                            float cx = (sigmoid(record[0]) * 2.f - 0.5f + (float)j) * (float)stride;
                            float cy = (sigmoid(record[1]) * 2.f - 0.5f + (float)i) * (float)stride;
                            float w = pow(sigmoid(record[2]) * 2.f, 2) * anchors[k][2 * a];
                            float h = pow(sigmoid(record[3]) * 2.f, 2) * anchors[k][2 * a + 1];
                            // map back to the source image and clip
                            float x1 = std::max(0, std::min(cv_src.cols, int((cx - w / 2.f) * (float)cv_src.cols / (float)net_size)));
                            float y1 = std::max(0, std::min(cv_src.rows, int((cy - h / 2.f) * (float)cv_src.rows / (float)net_size)));
                            float x2 = std::max(0, std::min(cv_src.cols, int((cx + w / 2.f) * (float)cv_src.cols / (float)net_size)));
                            float y2 = std::max(0, std::min(cv_src.rows, int((cy + h / 2.f) * (float)cv_src.rows / (float)net_size)));
                            classIds.push_back(cls);
                            confidences.push_back(score);
                            boxes.push_back(cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)));
                        }
                    }
                }
            }
        }
    }
    std::vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        cv::Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, cv_src, classes);
    }
}

int main(int argc, char* argv[]) {
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    std::vector<std::string> class_names{ "ida", "idb" };
    int net_size = 640;
    // Load the network once instead of once per image.
    cv::dnn::Net net = cv::dnn::readNet("model/ODID_DNN.onnx");
    for (auto name : filenames) {
        cv::Mat cv_src = cv::imread(name);
        if (cv_src.empty()) {
            continue;
        }
        cv::Mat blob = cv::dnn::blobFromImage(cv_src, 1.0 / 255, cv::Size(net_size, net_size),
                                              cv::Scalar(0, 0, 0), true, false);
        const int sz[] = { 1, 12, net_size / 2, net_size / 2 };
        cv::Mat input = cv::Mat(4, sz, blob.type());
        sliceAndConcat(blob, &input);
        net.setInput(input);
        auto t0 = cv::getTickCount();
        std::vector<cv::Mat> outs;
        net.forward(outs, getOutputNames(net));
        postprocess(cv_src, outs, class_names, net_size);
        auto t1 = cv::getTickCount();
        std::cout << "elapsed time: " << (t1 - t0) * 1000.0 / cv::getTickFrequency() << "ms" << std::endl;
        imshow("img", cv_src);
        cv::waitKey();
    }
    return 0;
}
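The box decoding inside postprocess is easier to check in isolation. The following is a minimal Python sketch of the same formulas for one output record; decode_box and score are hypothetical names, and the values fed in are illustrative rather than real network outputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(record, col, row, stride, anchor_w, anchor_h):
    """record holds the raw outputs [tx, ty, tw, th, ...] of one anchor at grid cell (col, row).

    Returns (cx, cy, w, h) in network-input pixels, using the same formulas as postprocess.
    """
    cx = (sigmoid(record[0]) * 2.0 - 0.5 + col) * stride
    cy = (sigmoid(record[1]) * 2.0 - 0.5 + row) * stride
    w = (sigmoid(record[2]) * 2.0) ** 2 * anchor_w
    h = (sigmoid(record[3]) * 2.0) ** 2 * anchor_h
    return cx, cy, w, h


def score(record, cls):
    # Final confidence = objectness * class probability.
    return sigmoid(record[4]) * sigmoid(record[5 + cls])


if __name__ == '__main__':
    raw = [0.0, 0.0, 0.0, 0.0, 2.0, 1.0, -1.0]
    print(decode_box(raw, col=3, row=2, stride=8, anchor_w=10, anchor_h=13))
    print(score(raw, 0))
```

With all-zero offsets the box sits at the center of its grid cell and has exactly the anchor's size, which is a convenient sanity check when porting the decoding to another language.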

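IV. NCNN Inference

NCNN is the most convenient inference acceleration library I have used so far, and also the easiest to use for free. For deployment on mobile in particular it could hardly be better. Many thanks to nihui for the selfless contribution. The prebuilt ncnn-20210525-windows-vs2019 release is used here.
For YOLOv5 inference with ncnn, see nihui's article on Zhihu.

1. Simplify the model
https://github.com/daquexian/onnx-simplifier
2. Convert the ONNX model to an ncnn model

onnx2ncnn yolov5s-sim.onnx yolov5s.param yolov5s.bin

The conversion prints many "Unsupported slice step!" messages. They are caused by the Focus module.
In YOLOv5, the Focus module slices the image before it enters the backbone. It takes every other pixel of the image, similar to nearest-neighbor downsampling, which produces four images. The four images are complementary and look alike, and no information is lost. The W and H information is thereby moved into the channel dimension: the number of input channels grows by a factor of 4, so the concatenated image has 12 channels instead of the original 3 RGB channels. The result then goes through a convolution, which yields a 2x downsampled feature map without information loss. Taking yolov5s as an example, the original 640 × 640 × 3 image enters the Focus structure, is sliced into a 320 × 320 × 12 feature map, and after one convolution becomes a 320 × 320 × 32 feature map (64 × the width multiple of 0.5).
Focus module implementation (a TensorFlow version):

class Focus(Layer):
    def __init__(self, filters, kernel_size, strides=1, padding='SAME'):
        super(Focus, self).__init__()
        self.conv = Conv(filters, kernel_size, strides, padding)

    def call(self, x):
        return self.conv(tf.concat([x[..., ::2, ::2, :],
                                    x[..., 1::2, ::2, :],
                                    x[..., ::2, 1::2, :],
                                    x[..., 1::2, 1::2, :]],
                                   axis=-1))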
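The slicing itself can be sketched without any framework. This is a minimal pure-Python version operating on nested lists in channel-first layout; focus_slice is a hypothetical name, and the output channel order follows the PyTorch Focus.forward shown earlier (even rows and even columns first, then odd rows, then odd columns, then both odd).

```python
def focus_slice(image):
    """image: list of C channels, each an H x W nested list with even H and W.

    Returns 4*C channels of size H/2 x W/2. No pixel is dropped: each input
    pixel ends up in exactly one of the four slices.
    """
    out = []
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in image:
            out.append([row[col_off::2] for row in channel[row_off::2]])
    return out


if __name__ == '__main__':
    # One 4 x 4 channel with distinct values.
    img = [[[r * 4 + c for c in range(4)] for r in range(4)]]
    for sliced in focus_slice(img):
        print(sliced)
```

For a 3-channel 640 × 640 input this yields 12 channels of 320 × 320, which is the 320 × 320 × 12 tensor described above.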

The corresponding model structure in the .param file:

Split splitncnn_input0 1 4 images images_splitncnn_0 images_splitncnn_1 images_splitncnn_2 images_splitncnn_3
Crop Slice_4 1 1 images_splitncnn_3 171 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_9 1 1 171 176 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_14 1 1 images_splitncnn_2 181 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_19 1 1 181 186 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_24 1 1 images_splitncnn_1 191 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_29 1 1 191 196 -23309=1,1 -23310=1,2147483647 -23311=1,2
Crop Slice_34 1 1 images_splitncnn_0 201 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_39 1 1 201 206 -23309=1,1 -23310=1,2147483647 -23311=1,2
Concat Concat_40 4 1 176 186 196 206 207 0=0

[Figure: visualization of the Focus structure]

Advantage of the Focus module:
Focus downsamples the image without losing information by moving the W and H information into the channels, and then applies a 3 × 3 convolution to it, so that features are extracted more thoroughly.

3. Replace the Focus module

Edit the .param file: the ten layers from Split through Concat are replaced by a single custom YoloV5Focus layer that takes the blob images as input and produces the blob 207.
Before:

Input images 0 1 images
Split splitncnn_input0 1 4 images images_splitncnn_0 images_splitncnn_1 images_splitncnn_2 images_splitncnn_3
Crop Slice_4 1 1 images_splitncnn_3 171 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_9 1 1 171 176 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_14 1 1 images_splitncnn_2 181 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_19 1 1 181 186 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop Slice_24 1 1 images_splitncnn_1 191 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop Slice_29 1 1 191 196 -23309=1,1 -23310=1,2147483647 -23311=1,2
Crop Slice_34 1 1 images_splitncnn_0 201 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop Slice_39 1 1 201 206 -23309=1,1 -23310=1,2147483647 -23311=1,2
Concat Concat_40 4 1 176 186 196 206 207 0=0
Convolution Conv_41 1 1 207 208 0=32 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=3456

After:

Input images 0 1 images
YoloV5Focus focus 1 1 images 207
Convolution Conv_41 1 1 207 208 0=32 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=3456

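4. Changes for dynamic-size inference

Static-size inference: scale the image by its long side to 640×H or W×640, pad to 640×640, then detect. If H or W is small, a lot of computation is wasted on the padding.
Dynamic-size inference: scale the image by its long side to 640×H or W×640, pad to 640×H2 or W2×640, then detect, where H2/W2 is H/W rounded up to a multiple of 32. This needs less computation and is faster.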
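The size arithmetic for the dynamic case can be sketched as follows. The function name dynamic_input_size is hypothetical; the rounding mirrors the (w + 31) / 32 * 32 expression used in the NCNN inference code later in this article.

```python
def dynamic_input_size(img_w, img_h, target_size=640, multiple=32):
    """Scale the long side to target_size, then pad the short side up to a multiple of 32.

    Returns (scaled_w, scaled_h, wpad, hpad).
    """
    if img_w > img_h:
        scale = target_size / float(img_w)
        w = target_size
        h = int(img_h * scale)
    else:
        scale = target_size / float(img_h)
        h = target_size
        w = int(img_w * scale)
    wpad = (w + multiple - 1) // multiple * multiple - w
    hpad = (h + multiple - 1) // multiple * multiple - h
    return w, h, wpad, hpad


if __name__ == '__main__':
    w, h, wpad, hpad = dynamic_input_size(1280, 720)
    print('network input:', w + wpad, 'x', h + hpad)
    print('pixels vs. static 640 x 640:', (w + wpad) * (h + hpad) / float(640 * 640))
```

For a 1280 × 720 frame the network input becomes 640 × 384, which is 60% of the pixels of a 640 × 640 input.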
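YOLOv5 supports dynamic-size inference, but the Reshape layers in the converted model hard-code the number of output grid cells. Unless these three parameters are changed to -1, detection either finds no targets or fills the whole image with boxes.
Before: [Figure: the Reshape layers in the .param file with fixed grid counts]

After: [Figure: the same Reshape layers with the grid count set to -1]

5. Update the layer count in the .param file so that it matches the current number of layers.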
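Counting layers by hand is error-prone, so here is a minimal sketch that recomputes the header. It assumes the usual ncnn text layout (a magic number on the first line, "layer_count blob_count" on the second, then one layer per line as "type name input_count output_count inputs... outputs... params..."). The helper recount_param_header is hypothetical and not an ncnn tool.

```python
def recount_param_header(param_text):
    """Return the .param text with layer_count and blob_count recomputed."""
    lines = [ln.strip() for ln in param_text.splitlines() if ln.strip()]
    magic, layer_lines = lines[0], lines[2:]
    blobs = set()
    for ln in layer_lines:
        fields = ln.split()
        n_in, n_out = int(fields[2]), int(fields[3])
        # Every blob is produced as an output of exactly one layer.
        blobs.update(fields[4 + n_in:4 + n_in + n_out])
    header = '{} {}'.format(len(layer_lines), len(blobs))
    return '\n'.join([magic, header] + layer_lines) + '\n'


if __name__ == '__main__':
    text = """7767517
12 20
Input images 0 1 images
YoloV5Focus focus 1 1 images 207
Convolution Conv_41 1 1 207 208 0=32 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=3456
"""
    print(recount_param_header(text))
```

In the three-layer excerpt above the stale header "12 20" becomes "3 3". On a real model, compare the result with the original header before overwriting the file.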

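6. Convert to an FP16 model

ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 65536

7. yolov5s model outputs
The anchor (prior box) information is in yolov5/models/yolov5s.yaml. The PyTorch post-processing is in the forward function of the Detect class in yolov5/models/yolo.py and has to be ported to C++ accordingly.
The model has 3 output blobs, corresponding to strides 8, 16 and 32.
The shape of each output is in WHC format:

w = n + 5: the bbox dx, dy, dw, dh, the bbox confidence, and the confidences of the n classes.
h = 6400: the xy positions of all anchors in the whole image. 6400 is the value for stride 8: a 640 input is divided into 640/8 = 80 blocks along both width and height, and 80 × 80 = 6400.
c = 3: the three anchors.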
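The same arithmetic applies to the other two strides. A minimal sketch (the function name output_shapes is hypothetical) that lists the expected (w, h, c) of each output blob for a given input size:

```python
def output_shapes(input_w, input_h, num_classes, strides=(8, 16, 32), anchors_per_scale=3):
    """Expected (w, h, c) of each output blob in the WHC convention described above."""
    shapes = []
    for stride in strides:
        grid_cells = (input_w // stride) * (input_h // stride)
        shapes.append((num_classes + 5, grid_cells, anchors_per_scale))
    return shapes


if __name__ == '__main__':
    for stride, shape in zip((8, 16, 32), output_shapes(640, 640, num_classes=2)):
        print('stride', stride, '->', shape)
```

For a 640 × 640 input with 2 classes this gives h = 6400, 1600 and 400 for strides 8, 16 and 32. With dynamic-size inference h changes with the padded input size, which is why the grid count in the Reshape layers must not be fixed.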

8. NCNN inference code. The YoloV5Focus layer is registered dynamically.

#include <cfloat>
#include <cmath>
#include "YoloV5Detect.h"

class YoloV5Focus : public ncnn::Layer {
public:
    YoloV5Focus() {
        one_blob_only = true;
    }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        int outw = w / 2;
        int outh = h / 2;
        int outc = channels * 4;
        top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
        if (top_blob.empty())
            return -100;
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int p = 0; p < outc; p++) {
            const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
            float* outptr = top_blob.channel(p);
            for (int i = 0; i < outh; i++) {
                for (int j = 0; j < outw; j++) {
                    *outptr = *ptr;
                    outptr += 1;
                    ptr += 2;
                }
                ptr += w;
            }
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(YoloV5Focus)

int initYolov5Net(std::string& param_path, std::string& bin_path, ncnn::Net& yolov5_net, bool use_gpu) {
    bool has_gpu = false;
    yolov5_net.clear();
    // CPU settings (only implemented on Android)
    // 0 = all cores enabled (default)
    // 1 = only little clusters enabled
    // 2 = only big clusters enabled
    // ncnn::set_cpu_powersave(2);
    // ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());
#if NCNN_VULKAN
    ncnn::create_gpu_instance();
    has_gpu = ncnn::get_gpu_count() > 0;
#endif
    yolov5_net.opt.use_vulkan_compute = (use_gpu && has_gpu);
    yolov5_net.opt.use_bf16_storage = true;
    // register the custom layer dynamically
    yolov5_net.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
    // load the model
    int rp = yolov5_net.load_param(param_path.c_str());
    int rb = yolov5_net.load_model(bin_path.c_str());
    if (rp < 0 || rb < 0) {
        return -1;
    }
    return 0;
}

static inline float sigmoid(float x) {
    return static_cast<float>(1.f / (1.f + exp(-x)));
}

static void generateProposals(const ncnn::Mat& anchors, int stride, const ncnn::Mat& in_pad,
                              const ncnn::Mat& feat_blob, float prob_threshold, std::vector<Object>& objects) {
    const int num_grid = feat_blob.h;
    int num_grid_x;
    int num_grid_y;
    if (in_pad.w > in_pad.h) {
        num_grid_x = in_pad.w / stride;
        num_grid_y = num_grid / num_grid_x;
    } else {
        num_grid_y = in_pad.h / stride;
        num_grid_x = num_grid / num_grid_y;
    }
    const int num_class = feat_blob.w - 5;
    const int num_anchors = anchors.w / 2;
    for (int q = 0; q < num_anchors; q++) {
        const float anchor_w = anchors[q * 2];
        const float anchor_h = anchors[q * 2 + 1];
        const ncnn::Mat feat = feat_blob.channel(q);
        for (int i = 0; i < num_grid_y; i++) {
            for (int j = 0; j < num_grid_x; j++) {
                const float* featptr = feat.row(i * num_grid_x + j);
                // find class index with max class score
                int class_index = 0;
                float class_score = -FLT_MAX;
                for (int k = 0; k < num_class; k++) {
                    float score = featptr[5 + k];
                    if (score > class_score) {
                        class_index = k;
                        class_score = score;
                    }
                }
                float box_score = featptr[4];
                float confidence = sigmoid(box_score) * sigmoid(class_score);
                if (confidence >= prob_threshold) {
                    // yolov5/models/yolo.py Detect forward
                    // y = x[i].sigmoid()
                    // y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
                    // y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                    float dx = sigmoid(featptr[0]);
                    float dy = sigmoid(featptr[1]);
                    float dw = sigmoid(featptr[2]);
                    float dh = sigmoid(featptr[3]);
                    float pb_cx = (dx * 2.f - 0.5f + j) * stride;
                    float pb_cy = (dy * 2.f - 0.5f + i) * stride;
                    float pb_w = pow(dw * 2.f, 2) * anchor_w;
                    float pb_h = pow(dh * 2.f, 2) * anchor_h;
                    float x0 = pb_cx - pb_w * 0.5f;
                    float y0 = pb_cy - pb_h * 0.5f;
                    float x1 = pb_cx + pb_w * 0.5f;
                    float y1 = pb_cy + pb_h * 0.5f;
                    Object obj;
                    obj.rect.x = x0;
                    obj.rect.y = y0;
                    obj.rect.width = x1 - x0;
                    obj.rect.height = y1 - y0;
                    obj.label = class_index;
                    obj.prob = confidence;
                    objects.push_back(obj);
                }
            }
        }
    }
}

static inline float intersectionArea(const Object& a, const Object& b) {
    cv::Rect_<float> inter = a.rect & b.rect;
    return inter.area();
}

static void qsortDescentInplace(std::vector<Object>& faceobjects, int left, int right) {
    int i = left;
    int j = right;
    float p = faceobjects[(left + right) / 2].prob;
    while (i <= j) {
        while (faceobjects[i].prob > p)
            i++;
        while (faceobjects[j].prob < p)
            j--;
        if (i <= j) {
            // swap
            std::swap(faceobjects[i], faceobjects[j]);
            i++;
            j--;
        }
    }
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            if (left < j) qsortDescentInplace(faceobjects, left, j);
        }
        #pragma omp section
        {
            if (i < right) qsortDescentInplace(faceobjects, i, right);
        }
    }
}

static void qsortDescentInplace(std::vector<Object>& faceobjects) {
    if (faceobjects.empty())
        return;
    qsortDescentInplace(faceobjects, 0, faceobjects.size() - 1);
}

static void nmsSortedBboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold) {
    picked.clear();
    const int n = faceobjects.size();
    std::vector<float> areas(n);
    for (int i = 0; i < n; i++) {
        areas[i] = faceobjects[i].rect.area();
    }
    for (int i = 0; i < n; i++) {
        const Object& a = faceobjects[i];
        int keep = 1;
        for (int j = 0; j < (int)picked.size(); j++) {
            const Object& b = faceobjects[picked[j]];
            // intersection over union
            float inter_area = intersectionArea(a, b);
            float union_area = areas[i] + areas[picked[j]] - inter_area;
            // float IoU = inter_area / union_area
            if (inter_area / union_area > nms_threshold)
                keep = 0;
        }
        if (keep) {
            picked.push_back(i);
        }
    }
}

int targetDetection(cv::Mat& cv_src, ncnn::Net& yolov5_net, std::vector<Object>& objects, int target_size,
                    float prob_threshold, float nms_threshold) {
    int w = cv_src.cols, h = cv_src.rows;
    float scale = 1.0f;
    if (w > h) {
        scale = (float)target_size / (float)w;
        w = target_size;
        h = h * scale;
    } else {
        scale = (float)target_size / (float)h;
        h = target_size;
        w = w * scale;
    }
    ncnn::Mat ncnn_in = ncnn::Mat::from_pixels_resize(cv_src.data, ncnn::Mat::PIXEL_BGR2RGB, cv_src.cols, cv_src.rows, w, h);
    // pad the borders up to the detection size
    // see the letterbox function in yolov5/utils/datasets.py
    int wpad = (w + 31) / 32 * 32 - w;
    int hpad = (h + 31) / 32 * 32 - h;
    ncnn::Mat in_pad;
    ncnn::copy_make_border(ncnn_in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
    const float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
    in_pad.substract_mean_normalize(0, norm_vals);
    // create an extractor
    ncnn::Extractor ex = yolov5_net.create_extractor();
    ex.input("images", in_pad);
    std::vector<Object> proposals;
    // stride 8
    {
        ncnn::Mat out;
        ex.extract("750", out);
        ncnn::Mat anchors(6);
        anchors[0] = 10.f;
        anchors[1] = 13.f;
        anchors[2] = 16.f;
        anchors[3] = 30.f;
        anchors[4] = 33.f;
        anchors[5] = 23.f;
        std::vector<Object> objects8;
        generateProposals(anchors, 8, in_pad, out, prob_threshold, objects8);
        proposals.insert(proposals.end(), objects8.begin(), objects8.end());
    }
    // stride 16
    {
        ncnn::Mat out;
        ex.extract("771", out);
        ncnn::Mat anchors(6);
        anchors[0] = 30.f;
        anchors[1] = 61.f;
        anchors[2] = 62.f;
        anchors[3] = 45.f;
        anchors[4] = 59.f;
        anchors[5] = 119.f;
        std::vector<Object> objects16;
        generateProposals(anchors, 16, in_pad, out, prob_threshold, objects16);
        proposals.insert(proposals.end(), objects16.begin(), objects16.end());
    }
    // stride 32
    {
        ncnn::Mat out;
        ex.extract("791", out);
        ncnn::Mat anchors(6);
        anchors[0] = 116.f;
        anchors[1] = 90.f;
        anchors[2] = 156.f;
        anchors[3] = 198.f;
        anchors[4] = 373.f;
        anchors[5] = 326.f;
        std::vector<Object> objects32;
        generateProposals(anchors, 32, in_pad, out, prob_threshold, objects32);
        proposals.insert(proposals.end(), objects32.begin(), objects32.end());
    }
    // sort all proposals by score from highest to lowest
    qsortDescentInplace(proposals);
    // apply nms with nms_threshold
    std::vector<int> picked;
    nmsSortedBboxes(proposals, picked, nms_threshold);
    int count = picked.size();
    objects.resize(count);
    for (int i = 0; i < count; i++) {
        objects[i] = proposals[picked[i]];
        // adjust offset to original unpadded
        float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
        float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
        float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
        float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;
        // clip
        x0 = std::max(std::min(x0, (float)(cv_src.cols - 1)), 0.f);
        y0 = std::max(std::min(y0, (float)(cv_src.rows - 1)), 0.f);
        x1 = std::max(std::min(x1, (float)(cv_src.cols - 1)), 0.f);
        y1 = std::max(std::min(y1, (float)(cv_src.rows - 1)), 0.f);
        objects[i].rect.x = x0;
        objects[i].rect.y = y0;
        objects[i].rect.width = x1 - x0;
        objects[i].rect.height = y1 - y0;
    }
    return 0;
}

void drawObjects(const cv::Mat& cv_src, const std::vector<Object>& objects, std::vector<std::string>& class_names) {
    cv::Mat cv_detect = cv_src.clone();
    for (size_t i = 0; i < objects.size(); i++) {
        const Object& obj = objects[i];
        std::cout << "Object label:" << obj.label << " Object prob:" << obj.prob
                  << " Object rect" << obj.rect << std::endl;
        cv::rectangle(cv_detect, obj.rect, cv::Scalar(255, 0, 0));
        std::string text = class_names[obj.label] + " " + std::to_string(int(obj.prob * 100)) + "%";
        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
        int x = obj.rect.x;
        int y = obj.rect.y - label_size.height - baseLine;
        if (y < 0)
            y = 0;
        if (x + label_size.width > cv_detect.cols)
            x = cv_detect.cols - label_size.width;
        cv::rectangle(cv_detect, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(255, 255, 255), -1);
        cv::putText(cv_detect, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
    }
    cv::imshow("image", cv_detect);
}

int main(void) {
    std::string param_path = "models/ODIDF16.param";
    std::string bin_path = "models/ODIDF16.bin";
    ncnn::Net yolov5_net;
    initYolov5Net(param_path, bin_path, yolov5_net, true);
    std::vector<std::string> class_names{ "ida", "idb", "idback", "idhead" };
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    for (auto name : filenames) {
        cv::Mat cv_src = cv::imread(name);
        if (cv_src.empty()) {
            continue;
        }
        std::vector<Object> objects;
        double start = static_cast<double>(cv::getTickCount());
        // target_size, prob_threshold and nms_threshold are not passed here, so they must
        // have default values in the declaration in YoloV5Detect.h (not shown in this article).
        targetDetection(cv_src, yolov5_net, objects);
        double time = ((double)cv::getTickCount() - start) / cv::getTickFrequency();
        std::cout << name << " Detection time:" << time << "(second) " << std::endl;
        drawObjects(cv_src, objects, class_names);
        cv::waitKey();
    }
    return 0;
}
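The sort-then-suppress logic of qsortDescentInplace and nmsSortedBboxes can be expressed compactly. This is a minimal Python sketch of the same greedy, class-agnostic NMS on (x, y, w, h) boxes; the names iou and nms are hypothetical and the boxes in the example are made up.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    picked = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in picked):
            picked.append(i)
    return picked


if __name__ == '__main__':
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores, 0.45))
```

Like the C++ version, this suppresses across classes. If two different classes can legitimately overlap, running it once per class is the usual alternative.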

V. Building NCNN

1. Dependencies:

protobuf-3.4.0
Download: https://github.com/google/protobuf/archive/v3.4.0.zip
Open the VS2017 or VS2019 native tools command prompt and change to the source directory.

cd protobuf
mkdir build
cd build
cmake -G"NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%cd%/install -Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_MSVC_STATIC_RUNTIME=OFF ../cmake
nmake
nmake install

Vulkan
https://vulkan.lunarg.com/sdk/home
Version: VulkanSDK-1.2.141.2
Run the installer, then verify the installation by running C:\VulkanSDK\1.2.141.2\Bin\vkcube.exe. If the demo window with the rotating cube appears, the installation succeeded.
glfw
https://www.glfw.org/
Copy glfw-3.3.2.bin.WIN64 to VulkanSDK\1.2.141.2\Third-Party
GLM
https://github.com/g-truc/glm/
Copy GLM to VulkanSDK\1.2.141.2\Third-Party
Add them to the system path.

2. Adding a custom layer to NCNN
When the custom layer is registered in application code, a model converted with ncnn2mem fails to load during inference on mobile. A model produced by ncnn2mem is read entirely into memory from a .h file, and registering a custom layer for in-memory loading requires the TYPEINDEX enum. See the ncnn documentation on adding a custom layer. The ncnn libraries used so far were prebuilt downloads. To add a custom layer, the source has to be cloned and rebuilt.
2.1 Add the custom layer.
Clone the source:

git clone https://github.com/Tencent/ncnn.git
cd ncnn
git submodule update --init

Add the header file to the ncnn source: src/layer/YoloV5Focus.h
YoloV5Focus.h

#ifndef LAYER_YOLOV5FOCUS_H
#define LAYER_YOLOV5FOCUS_H

#include "layer.h"

namespace ncnn {

class YoloV5Focus : public Layer
{
public:
    YoloV5Focus();
    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const;
};

} // namespace ncnn

#endif

Add the source file to the ncnn source: src/layer/YoloV5Focus.cpp
YoloV5Focus.cpp

#include "YoloV5Focus.h"

namespace ncnn {

YoloV5Focus::YoloV5Focus()
{
    one_blob_only = true;
    //support_inplace = true;
}

int YoloV5Focus::forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
{
    int w = bottom_blob.w;
    int h = bottom_blob.h;
    int channels = bottom_blob.c;
    int outw = w / 2;
    int outh = h / 2;
    int outc = channels * 4;
    top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
    if (top_blob.empty())
        return -100;
    #pragma omp parallel for num_threads(opt.num_threads)
    for (int p = 0; p < outc; p++)
    {
        const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
        float* outptr = top_blob.channel(p);
        for (int i = 0; i < outh; i++)
        {
            for (int j = 0; j < outw; j++)
            {
                *outptr = *ptr;
                outptr += 1;
                ptr += 2;
            }
            ptr += w;
        }
    }
    return 0;
}

} // namespace ncnn

Edit src/CMakeLists.txt to register layer/YoloV5Focus:

ncnn_add_layer(GroupNorm)
ncnn_add_layer(LayerNorm)
ncnn_add_layer(YoloV5Focus)

On Windows the op name is case-insensitive, but on other systems or on mobile pay attention to the case of the layer name.
Build ncnn
Open the VS2017 or VS2019 native tools command prompt and change to the source directory.

mkdir build
cd build
cmake -G"NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%cd%/install -DProtobuf_INCLUDE_DIR=D:/LIB/protobuf/build/install/include -DProtobuf_LIBRARIES=D:/LIB/protobuf/build/install/lib/libprotobuf.lib -DProtobuf_PROTOC_EXECUTABLE=D:/LIB/protobuf/build/install/bin/protoc.exe -DNCNN_VULKAN=ON ..
nmake
nmake install

2.2 When using the NCNN library built with the custom layer, the inference code above no longer needs the part that defines and dynamically registers the layer, shown here:

class YoloV5Focus : public ncnn::Layer {
public:
    YoloV5Focus() {
        one_blob_only = true;
    }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        int outw = w / 2;
        int outh = h / 2;
        int outc = channels * 4;
        top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
        if (top_blob.empty())
            return -100;
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int p = 0; p < outc; p++) {
            const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
            float* outptr = top_blob.channel(p);
            for (int i = 0; i < outh; i++) {
                for (int j = 0; j < outw; j++) {
                    *outptr = *ptr;
                    outptr += 1;
                    ptr += 2;
                }
                ptr += w;
            }
        }
        return 0;
    }
};

DEFINE_LAYER_CREATOR(YoloV5Focus)

// register the custom layer dynamically
yolov5_net.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);

VI. NCNN Int8 Quantization

1. Optimize the model

./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 0

2. Generate the calibration table

./ncnn2table yolov5s-opt.param yolov5s-opt.bin imagelist.txt yolov5s-opt.table mean=[0,0,0] norm=[0.0039215,0.0039215,0.0039215] shape=[416,416,3] pixel=BGR thread=8 method=kl
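The imagelist.txt passed to ncnn2table is, as far as I know, a plain text file with one calibration image path per line. A minimal sketch for generating it from a directory follows; the helper write_image_list and the directory name in the usage lines are hypothetical.

```python
import os


def write_image_list(image_dir, list_path, extensions=('.jpg', '.jpeg', '.png', '.bmp')):
    """Write one image path per line to list_path and return the number of images."""
    names = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(extensions)
    )
    with open(list_path, 'w') as fh:
        for name in names:
            fh.write(os.path.join(image_dir, name) + '\n')
    return len(names)


if __name__ == '__main__':
    image_dir = 'calib_images'  # hypothetical directory of calibration images
    if os.path.isdir(image_dir):
        print(write_image_list(image_dir, 'imagelist.txt'), 'images listed')
```

The norm value 0.0039215 in the command is approximately 1/255, the same scaling applied in the inference code.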

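3. Quantize the model to int8

./ncnn2int8 yolov5s-opt.param yolov5s-opt.bin yolov5s-int8.param yolov5s-int8.bin yolov5s-opt.table

4. The int8-quantized model is noticeably faster on mobile and on some edge devices, at the cost of a slight drop in accuracy.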
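Where the small accuracy loss comes from can be illustrated with a toy example. This sketch uses simple symmetric quantization with an absolute-maximum scale, which is an assumption made for illustration: the method=kl option above selects thresholds differently, and the function names here are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization: map [-absmax, absmax] onto the integers [-127, 127]."""
    absmax = max(abs(v) for v in values)
    scale = 127.0 / absmax if absmax > 0 else 1.0
    quantized = [max(-127, min(127, int(round(v * scale)))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    return [q / scale for q in quantized]


if __name__ == '__main__':
    weights = [0.4, -1.0, 0.25]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    print(q)
    print([abs(a - b) for a, b in zip(weights, restored)])
```

Each value is stored in 8 bits instead of 32, and the rounding error per value is at most half a quantization step (0.5 / scale). Errors of this kind across all weights and activations account for the slight accuracy drop.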
