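Deep Learning Object Detection (YOLOv5) Project: From Scratch to Deployment
Preface
The training and development environment is Windows 10 with an RTX 3080 GPU; CUDA 10.2 and cuDNN 7.1; OpenCV 4.5. YOLOv5 uses the 5s model from release v3.0, published on August 13, 2020. The ncnn version is 20210525, the C++ IDE is VS2019, and Anaconda is 3.5.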
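I. Environment Setup
1. Anaconda environment
- Create an environment
- Exit the environment
- List the installed environments
conda info --envs
- Delete an environment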
2. Install dependencies
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
Or:
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install cython matplotlib tqdm opencv-python tensorboard scipy pillow onnx pyyaml pandas seaborn
On Windows, avoid CUDA 11 if possible. In several attempts, either the GPU could not be found or training crashed partway through.
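II. Data Processing
1. The data is annotated with labelme. For the ID card data I found some public template data online, then generated a batch of samples with a generative adversarial approach and annotated them, about 300 samples in total. The annotations produced by labelme are in XML format.
2. Under yolo/data, create a directory for the dataset with two subdirectories: JPEGImages holds the original images and Annotations holds the label files.
3. The labelme annotations are .xml files, but YOLO expects .txt labels, so the data has to be converted.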
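- Generate the training and validation sets. This creates train.txt and val.txt under data/xxxx, prints all annotated class names, and writes a .txt file matching each image file name under JPEGImages.
Run the command:
- Sample of the printed class names: ['ida', 'idb'].
- Generated .txt files:
class name, followed by the normalized coordinates of the target
- Data processing code
- generate_txt.py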
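The conversion step described above can be sketched in Python. This is a minimal illustration, not the original generate_txt.py: it assumes Pascal VOC style XML (a `size` element plus `object/bndbox` entries), and the names `voc_box_to_yolo`, `xml_to_yolo_lines` and `SAMPLE_XML` are hypothetical. It writes the class index rather than the class name, since YOLOv5 label files normally store the index.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation used only for illustration.
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>ida</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>420</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""


def voc_box_to_yolo(img_w, img_h, box):
    """Convert (xmin, ymin, xmax, ymax) in pixels to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def xml_to_yolo_lines(xml_text, class_names):
    """Return one 'class_index cx cy w h' line per annotated object."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # skip labels that are not part of the training classes
        bnd = obj.find("bndbox")
        box = [float(bnd.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        cx, cy, w, h = voc_box_to_yolo(img_w, img_h, box)
        lines.append(f"{class_names.index(name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


print(xml_to_yolo_lines(SAMPLE_XML, ["ida", "idb"]))
```

Each returned line would be written to the .txt file that shares its base name with the image.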
III. Model Training
1. In models/yolov5s.yaml, change nc to the number of classes.
# parameters
nc: 2  # total number of detection classes
depth_multiple: 0.33  # model depth multiple (network depth coefficient)
width_multiple: 0.50  # layer channel multiple (coefficient for the number of convolution kernels)

# anchors: candidate boxes; they can be changed to the sizes of your own targets, and more can be added
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:  # feature extraction module
  # [from, number, module, args]
  # from   - the input of the layer; -1 means the output of the previous layer
  # number - how many times the layer is repeated; multiplied by the depth coefficient, and a result below 1 becomes 1
  #          source: n = max(round(n * gd), 1) if n > 1 else n
  # module - name of the layer
  # args   - number of convolution kernels
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2; 64 is multiplied by the width coefficient: 64 * 0.5 = 32 feature maps
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5): layers 17, 20 and 23
  ]
2. Add a training data config file xxx.yaml under the data directory.
# download command/URL (optional)
download: bash data/scripts/get_voc.sh
# paths to the training-set and validation-set txt files
train: data/xxx/train.txt
val: data/xxx/val.txt
# total number of classes
nc: 2
# class names
names: ['ida', 'idb']
3. Training parameters
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='yolov5s.pt', help='initial weights path')  # weights file; whether to start from pretrained weights
parser.add_argument('--cfg', type=str, default='', help='model.yaml path')  # network config file
parser.add_argument('--data', type=str, default='data/coco128.yaml', help='data.yaml path')  # training dataset config
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')  # hyperparameter config file
parser.add_argument('--epochs', type=int, default=300)  # number of training epochs
parser.add_argument('--batch-size', type=int, default=32, help='total batch size for all GPUs')  # batch size
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='[train, test] image sizes')  # training image size
parser.add_argument('--rect', action='store_true', help='rectangular training')  # rectangular training
parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')  # whether to continue from the weights of the previous run
parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')  # do not save intermediate checkpoints
parser.add_argument('--notest', action='store_true', help='only test final epoch')  # do not test during training
parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')
parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')  # hyperparameter evolution
parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')  # whether to cache images
parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')  # train on GPU or CPU
parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')  # whether to use multi-scale training
parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')  # whether there is only one class
parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')  # optimizer choice
parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')  # cannot be changed on Windows; on Windows it tends to crash whether changed or not
parser.add_argument('--project', default='runs/train', help='save to project/name')
parser.add_argument('--name', default='exp', help='save to project/name')
parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
opt = parser.parse_args()
4. Training commands
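- Single GPU:
- Multiple GPUs:
5. Test the model
python test.py --weights runs/train/exp/weights/best.pt --data data/ODID.yaml --device 0 --verbose
--weights: the trained model
--data: the data config file
--device: the GPU used for evaluation
--verbose: whether to print the evaluation metrics of each class
OpenCV DNN C++ Inference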
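1. The slice layer in OpenCV DNN does not support a step of 2, so the code has to be modified before the model is converted. The change is in the Focus class in models/common.py.
- Before:
- After:
2. Convert the model
python models/export.py --weights runs/exp/weights/best.pt  # --weights: the trained model
After this runs, the ONNX model is saved as runs/exp/weights/best.onnx, and this model can be used for inference with OpenCV DNN.
3. DNN C++ inference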
#include <algorithm>
#include <cmath>
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <sstream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

// Show an image in a resizable window, limited to 800 px on the longer side.
void imshow(std::string name, const cv::Mat& cv_src)
{
    cv::namedWindow(name, 0);
    int max_rows = 800;
    int max_cols = 800;
    if (cv_src.rows >= cv_src.cols && cv_src.rows > max_rows) {
        cv::resizeWindow(name, cv::Size(cv_src.cols * max_rows / cv_src.rows, max_rows));
    } else if (cv_src.cols >= cv_src.rows && cv_src.cols > max_cols) {
        cv::resizeWindow(name, cv::Size(max_cols, cv_src.rows * max_cols / cv_src.cols));
    }
    cv::imshow(name, cv_src);
}

inline float sigmoid(float x)
{
    return 1.f / (1.f + exp(-x));
}

// Focus slicing done on the CPU: 1x3xHxW blob -> 1x12x(H/2)x(W/2) input.
void sliceAndConcat(cv::Mat& img, cv::Mat* input)
{
    const float* srcData = img.ptr<float>();
    float* dstData = input->ptr<float>();
    const int outArea = input->size[2] * input->size[3];
    const int srcArea = img.size[2] * img.size[3];
    for (int i = 0; i < input->size[2]; i++) {
        for (int j = 0; j < input->size[3]; j++) {
            const int dst = i * input->size[3] + j;
            for (int k = 0; k < 3; ++k) {
                // even row, even column
                dstData[k * outArea + dst] = srcData[k * srcArea + 2 * i * img.size[3] + 2 * j];
                // odd row, even column
                dstData[(3 + k) * outArea + dst] = srcData[k * srcArea + (2 * i + 1) * img.size[3] + 2 * j];
                // even row, odd column
                dstData[(6 + k) * outArea + dst] = srcData[k * srcArea + 2 * i * img.size[3] + 2 * j + 1];
                // odd row, odd column
                dstData[(9 + k) * outArea + dst] = srcData[k * srcArea + (2 * i + 1) * img.size[3] + 2 * j + 1];
            }
        }
    }
}

// Names of the unconnected output layers (cached after the first call).
std::vector<cv::String> getOutputNames(const cv::dnn::Net& net)
{
    static std::vector<cv::String> names;
    if (names.empty()) {
        std::vector<int> outLayers = net.getUnconnectedOutLayers();
        std::vector<cv::String> layersNames = net.getLayerNames();
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); i++) {
            names[i] = layersNames[outLayers[i] - 1];
        }
    }
    return names;
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, cv::Mat& frame,
              const std::vector<std::string>& classes)
{
    cv::rectangle(frame, cv::Point(left, top), cv::Point(right, bottom), cv::Scalar(0, 255, 0), 3);
    std::string label = cv::format("%.2f", conf);
    if (!classes.empty()) {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ": " + label;
    }
    int baseLine;
    cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = std::max(top, labelSize.height);
    cv::rectangle(frame, cv::Point(left, top - round(1.5 * labelSize.height)),
                  cv::Point(left + round(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), cv::FILLED);
    cv::putText(frame, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 0.75, cv::Scalar(), 2);
}

void postprocess(cv::Mat& cv_src, std::vector<cv::Mat>& outs, const std::vector<std::string>& classes, int net_size)
{
    float confThreshold = 0.4f;
    float nmsThreshold = 0.5f;
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    int strides[] = { 8, 16, 32 };
    std::vector<std::vector<int> > anchors = {
        { 10,13, 16,30, 33,23 },
        { 30,61, 62,45, 59,119 },
        { 116,90, 156,198, 373,326 }
    };
    for (size_t k = 0; k < outs.size(); k++) {
        float* data = outs[k].ptr<float>();
        int stride = strides[k];
        int num_classes = outs[k].size[4] - 5;
        for (int i = 0; i < outs[k].size[2]; i++) {
            for (int j = 0; j < outs[k].size[3]; j++) {
                for (int a = 0; a < outs[k].size[1]; ++a) {
                    float* record = data + a * outs[k].size[2] * outs[k].size[3] * outs[k].size[4] +
                                    i * outs[k].size[3] * outs[k].size[4] + j * outs[k].size[4];
                    float* cls_ptr = record + 5;
                    for (int cls = 0; cls < num_classes; cls++) {
                        float score = sigmoid(cls_ptr[cls]) * sigmoid(record[4]);
                        if (score > confThreshold) {
                            float cx = (sigmoid(record[0]) * 2.f - 0.5f + (float)j) * (float)stride;
                            float cy = (sigmoid(record[1]) * 2.f - 0.5f + (float)i) * (float)stride;
                            float w = pow(sigmoid(record[2]) * 2.f, 2) * anchors[k][2 * a];
                            float h = pow(sigmoid(record[3]) * 2.f, 2) * anchors[k][2 * a + 1];
                            // map back to the source image and clamp to its borders
                            float x1 = std::max(0, std::min(cv_src.cols, int((cx - w / 2.f) * (float)cv_src.cols / (float)net_size)));
                            float y1 = std::max(0, std::min(cv_src.rows, int((cy - h / 2.f) * (float)cv_src.rows / (float)net_size)));
                            float x2 = std::max(0, std::min(cv_src.cols, int((cx + w / 2.f) * (float)cv_src.cols / (float)net_size)));
                            float y2 = std::max(0, std::min(cv_src.rows, int((cy + h / 2.f) * (float)cv_src.rows / (float)net_size)));
                            classIds.push_back(cls);
                            confidences.push_back(score);
                            boxes.push_back(cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)));
                        }
                    }
                }
            }
        }
    }
    std::vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        cv::Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, cv_src, classes);
    }
}

int main(int argc, char* argv[])
{
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);

    // Load the model once instead of once per image.
    std::vector<std::string> class_names{ "ida", "idb" };
    int net_size = 640;
    cv::dnn::Net net = cv::dnn::readNet("model/ODID_DNN.onnx");

    for (auto name : filenames) {
        cv::Mat cv_src = cv::imread(name);
        if (cv_src.empty()) {
            continue;
        }
        cv::Mat blob = cv::dnn::blobFromImage(cv_src, 1.0 / 255, cv::Size(net_size, net_size),
                                              cv::Scalar(0, 0, 0), true, false);
        const int sz[] = { 1, 12, net_size / 2, net_size / 2 };
        cv::Mat input = cv::Mat(4, sz, blob.type());
        sliceAndConcat(blob, &input);
        net.setInput(input);

        auto t0 = cv::getTickCount();
        std::vector<cv::Mat> outs;
        net.forward(outs, getOutputNames(net));
        postprocess(cv_src, outs, class_names, net_size);
        auto t1 = cv::getTickCount();
        std::cout << "elapsed time: " << (t1 - t0) * 1000.0 / cv::getTickFrequency() << "ms" << std::endl;

        imshow("img", cv_src);
        cv::waitKey();
    }
    return 0;
}
IV. NCNN Inference
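NCNN is the easiest inference acceleration library I have used so far, and also the easiest to pick up for free. It is especially good for mobile deployment. Many thanks to nihui for the selfless contribution. Here I use the prebuilt ncnn release ncnn-20210525-windows-vs2019.
For YOLOv5 inference with ncnn, see nihui's article on Zhihu.
1. Simplify the model
https://github.com/daquexian/onnx-simplifier
2. Convert ONNX to an ncnn model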
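- Converting ONNX to an ncnn model prints many "Unsupported slice step!" messages. These are errors from converting the Focus module.
- In v5, the Focus module slices the image before it enters the backbone. It takes one value every other pixel, much like nearest-neighbor downsampling, which yields four images. The four images are complementary and look almost the same, but no information is lost. The W and H information is thereby concentrated into the channel dimension: the input channels grow 4x, so the concatenated image has 12 channels instead of the original 3 RGB channels. A convolution is then applied to the new image, which gives a 2x downsampled feature map with no information loss. Taking yolov5s as an example, the original 640 × 640 × 3 image enters the Focus structure, the slicing first turns it into a 320 × 320 × 12 feature map, and one convolution then turns it into a 320 × 320 × 32 feature map (64 × width_multiple 0.5).
- YOLOv5 Focus module implementation
Corresponding model structure: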
Split            splitncnn_input0 1 4 images images_splitncnn_0 images_splitncnn_1 images_splitncnn_2 images_splitncnn_3
Crop             Slice_4          1 1 images_splitncnn_3 171 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop             Slice_9          1 1 171 176 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop             Slice_14         1 1 images_splitncnn_2 181 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop             Slice_19         1 1 181 186 -23309=1,0 -23310=1,2147483647 -23311=1,2
Crop             Slice_24         1 1 images_splitncnn_1 191 -23309=1,0 -23310=1,2147483647 -23311=1,1
Crop             Slice_29         1 1 191 196 -23309=1,1 -23310=1,2147483647 -23311=1,2
Crop             Slice_34         1 1 images_splitncnn_0 201 -23309=1,1 -23310=1,2147483647 -23311=1,1
Crop             Slice_39         1 1 201 206 -23309=1,1 -23310=1,2147483647 -23311=1,2
Concat           Concat_40        4 1 176 186 196 206 207 0=0
Visualization:
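- Advantages of the Focus module:
Focus simply concentrates the W and H information into the channels during downsampling without losing any information, and then applies a 3 × 3 convolution to extract features from it, so that feature extraction is more thorough.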
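The slicing that Focus performs can be sketched in plain Python. This is an illustrative sketch rather than the YOLOv5 source: `focus_slice` is a hypothetical name and the image is a nested list instead of a tensor. The four slices are taken in the same order as in the C++ and ncnn code in this article: even rows/even columns, odd rows/even columns, even rows/odd columns, odd rows/odd columns.

```python
def focus_slice(image):
    """Space-to-depth slicing.

    image: list of C channels, each a list of H rows with W values.
    Returns 4*C channels of size (H/2) x (W/2); no value is dropped.
    """
    out = []
    for row_off, col_off in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in image:
            # keep every second row and every second column, starting at the offsets
            out.append([row[col_off::2] for row in channel[row_off::2]])
    return out


# A tiny 3-channel 4x6 "image"; each value encodes channel*100 + row*10 + column.
image = [[[c * 100 + r * 10 + col for col in range(6)] for r in range(4)] for c in range(3)]
sliced = focus_slice(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # channels, rows, columns
```

With a 3 × 640 × 640 input the same function would return 12 channels of 320 × 320, which is the tensor the first convolution consumes.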
3. Replace the Focus module
- Edit the .param file
Before:
After:
Input            images           0 1 images
YoloV5Focus      focus            1 1 images 207
Convolution      Conv_41          1 1 207 208 0=32 1=3 11=3 2=1 12=1 3=1 13=1 4=1 14=1 15=1 16=1 5=1 6=3456
4. Changes for dynamic-size inference
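- Static-size inference: scale by the long side to 640xH or Wx640, pad to 640x640, then detect. If H/W is small, a lot of computation is wasted on the padding.
- Dynamic-size inference: scale by the long side to 640xH or Wx640, pad to 640xH2 or W2x640, then detect, where H2/W2 is H/W rounded up to a multiple of 32. This needs less computation and runs faster.
- YOLOv5 supports dynamic-size inference, but the Reshape layers here hard-code the number of output grid cells. Unless these three parameters are changed to -1, detection either finds no targets or fills the whole image with boxes.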
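The size arithmetic behind dynamic-size inference can be sketched as follows. This is a hedged illustration: `dynamic_input_size` is a hypothetical helper, and it uses integer arithmetic, whereas the C++ code in this article computes the scale as a float.

```python
def dynamic_input_size(w, h, target=640, multiple=32):
    """Scale the long side to `target`, then round both sides up to `multiple`."""
    if w > h:
        new_w, new_h = target, h * target // w
    else:
        new_w, new_h = w * target // h, target
    pad_w = (new_w + multiple - 1) // multiple * multiple
    pad_h = (new_h + multiple - 1) // multiple * multiple
    return pad_w, pad_h


# A 1920x1080 frame is scaled to 640x360 and padded to 640x384.
pad_w, pad_h = dynamic_input_size(1920, 1080)
print(pad_w, pad_h)
# Share of the pixels a static 640x640 input would have to process.
print(pad_w * pad_h / (640 * 640))
```

For this 16:9 example the dynamic input covers 60% of the pixels of the static 640x640 input, which is where the speed-up comes from.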
Before:
After:
5. Update the layer count so that it matches the current number of layers.
6. Convert to an FP16 model
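6. yolov5s model outputs
The anchor (prior box) information is in yolov5/models/yolov5s.yaml. The PyTorch post-processing is in the forward function of the Detect class in yolov5/models/yolo.py and has to be ported to C++ accordingly.
The model has 3 output blobs, one each for stride 8, 16 and 32.
The shape of each output is in WHC order:
- w = n + 5, corresponding to the bbox dx, dy, dw, dh, the bbox confidence, and the confidences of the n classes.
- h = 6400, corresponding to the xy positions of all anchors over the whole image. The value 6400 is for stride = 8: with a 640 input, width and height are each divided into 640/8 = 80 cells, and 80x80 is 6400.
- c = 3, corresponding to the three anchors.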
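The output layout and the box decoding can be sketched in Python before porting them to C++. The sketch mirrors the formulas of the Detect forward pass quoted in the C++ code in this article; `output_shape` and `decode_box` are hypothetical helper names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output_shape(input_w, input_h, stride, num_classes, num_anchors=3):
    """Return (w, h, c) of one output blob in WHC order."""
    return (num_classes + 5, (input_w // stride) * (input_h // stride), num_anchors)


def decode_box(record, grid_x, grid_y, stride, anchor_w, anchor_h):
    """Decode raw dx, dy, dw, dh into a pixel-space (cx, cy, w, h) box."""
    dx, dy, dw, dh = (sigmoid(v) for v in record[:4])
    cx = (dx * 2.0 - 0.5 + grid_x) * stride
    cy = (dy * 2.0 - 0.5 + grid_y) * stride
    w = (dw * 2.0) ** 2 * anchor_w
    h = (dh * 2.0) ** 2 * anchor_h
    return cx, cy, w, h


# Two classes (ida, idb) and a 640x640 input.
for stride in (8, 16, 32):
    print(stride, output_shape(640, 640, stride, num_classes=2))

# Raw zeros map to the cell center and to the unscaled anchor size.
print(decode_box([0.0, 0.0, 0.0, 0.0], grid_x=10, grid_y=20, stride=8, anchor_w=10, anchor_h=13))
```

With dynamic-size inference the h dimension changes with the padded input, which is why the C++ code derives the grid counts from the padded blob instead of hard-coding them.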
7. NCNN inference code, with the YoloV5Focus layer registered dynamically.
#include <cfloat>
#include <cmath>
#include "YoloV5Detect.h"

class YoloV5Focus : public ncnn::Layer
{
public:
    YoloV5Focus()
    {
        one_blob_only = true;
    }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
    {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        int outw = w / 2;
        int outh = h / 2;
        int outc = channels * 4;
        top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
        if (top_blob.empty())
            return -100;
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int p = 0; p < outc; p++) {
            const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
            float* outptr = top_blob.channel(p);
            for (int i = 0; i < outh; i++) {
                for (int j = 0; j < outw; j++) {
                    *outptr = *ptr;
                    outptr += 1;
                    ptr += 2;
                }
                ptr += w;
            }
        }
        return 0;
    }
};
DEFINE_LAYER_CREATOR(YoloV5Focus)

int initYolov5Net(std::string& param_path, std::string& bin_path, ncnn::Net& yolov5_net, bool use_gpu)
{
    bool has_gpu = false;
    yolov5_net.clear();
    // CPU settings (only implemented for Android)
    // 0 = all cores enabled (default)
    // 1 = only little clusters enabled
    // 2 = only big clusters enabled
    // ncnn::set_cpu_powersave(2);
    // ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());
#if NCNN_VULKAN
    ncnn::create_gpu_instance();
    has_gpu = ncnn::get_gpu_count() > 0;
#endif
    yolov5_net.opt.use_vulkan_compute = (use_gpu && has_gpu);
    yolov5_net.opt.use_bf16_storage = true;
    // register the custom layer dynamically
    yolov5_net.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
    // load the model
    int rp = yolov5_net.load_param(param_path.c_str());
    int rb = yolov5_net.load_model(bin_path.c_str());
    if (rp < 0 || rb < 0) {
        return -1;
    }
    return 0;
}

static inline float sigmoid(float x)
{
    return static_cast<float>(1.f / (1.f + exp(-x)));
}

static void generateProposals(const ncnn::Mat& anchors, int stride, const ncnn::Mat& in_pad,
                              const ncnn::Mat& feat_blob, float prob_threshold, std::vector<Object>& objects)
{
    const int num_grid = feat_blob.h;
    int num_grid_x;
    int num_grid_y;
    if (in_pad.w > in_pad.h) {
        num_grid_x = in_pad.w / stride;
        num_grid_y = num_grid / num_grid_x;
    } else {
        num_grid_y = in_pad.h / stride;
        num_grid_x = num_grid / num_grid_y;
    }
    const int num_class = feat_blob.w - 5;
    const int num_anchors = anchors.w / 2;
    for (int q = 0; q < num_anchors; q++) {
        const float anchor_w = anchors[q * 2];
        const float anchor_h = anchors[q * 2 + 1];
        const ncnn::Mat feat = feat_blob.channel(q);
        for (int i = 0; i < num_grid_y; i++) {
            for (int j = 0; j < num_grid_x; j++) {
                const float* featptr = feat.row(i * num_grid_x + j);
                // find class index with max class score
                int class_index = 0;
                float class_score = -FLT_MAX;
                for (int k = 0; k < num_class; k++) {
                    float score = featptr[5 + k];
                    if (score > class_score) {
                        class_index = k;
                        class_score = score;
                    }
                }
                float box_score = featptr[4];
                float confidence = sigmoid(box_score) * sigmoid(class_score);
                if (confidence >= prob_threshold) {
                    // yolov5/models/yolo.py Detect forward
                    // y = x[i].sigmoid()
                    // y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
                    // y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                    float dx = sigmoid(featptr[0]);
                    float dy = sigmoid(featptr[1]);
                    float dw = sigmoid(featptr[2]);
                    float dh = sigmoid(featptr[3]);
                    float pb_cx = (dx * 2.f - 0.5f + j) * stride;
                    float pb_cy = (dy * 2.f - 0.5f + i) * stride;
                    float pb_w = pow(dw * 2.f, 2) * anchor_w;
                    float pb_h = pow(dh * 2.f, 2) * anchor_h;
                    float x0 = pb_cx - pb_w * 0.5f;
                    float y0 = pb_cy - pb_h * 0.5f;
                    float x1 = pb_cx + pb_w * 0.5f;
                    float y1 = pb_cy + pb_h * 0.5f;
                    Object obj;
                    obj.rect.x = x0;
                    obj.rect.y = y0;
                    obj.rect.width = x1 - x0;
                    obj.rect.height = y1 - y0;
                    obj.label = class_index;
                    obj.prob = confidence;
                    objects.push_back(obj);
                }
            }
        }
    }
}

static inline float intersectionArea(const Object& a, const Object& b)
{
    cv::Rect_<float> inter = a.rect & b.rect;
    return inter.area();
}

static void qsortDescentInplace(std::vector<Object>& faceobjects, int left, int right)
{
    int i = left;
    int j = right;
    float p = faceobjects[(left + right) / 2].prob;
    while (i <= j) {
        while (faceobjects[i].prob > p)
            i++;
        while (faceobjects[j].prob < p)
            j--;
        if (i <= j) {
            // swap
            std::swap(faceobjects[i], faceobjects[j]);
            i++;
            j--;
        }
    }
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            if (left < j) qsortDescentInplace(faceobjects, left, j);
        }
        #pragma omp section
        {
            if (i < right) qsortDescentInplace(faceobjects, i, right);
        }
    }
}

static void qsortDescentInplace(std::vector<Object>& faceobjects)
{
    if (faceobjects.empty())
        return;
    qsortDescentInplace(faceobjects, 0, faceobjects.size() - 1);
}

static void nmsSortedBboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
{
    picked.clear();
    const int n = faceobjects.size();
    std::vector<float> areas(n);
    for (int i = 0; i < n; i++) {
        areas[i] = faceobjects[i].rect.area();
    }
    for (int i = 0; i < n; i++) {
        const Object& a = faceobjects[i];
        int keep = 1;
        for (int j = 0; j < (int)picked.size(); j++) {
            const Object& b = faceobjects[picked[j]];
            // intersection over union
            float inter_area = intersectionArea(a, b);
            float union_area = areas[i] + areas[picked[j]] - inter_area;
            // float IoU = inter_area / union_area
            if (inter_area / union_area > nms_threshold)
                keep = 0;
        }
        if (keep) {
            picked.push_back(i);
        }
    }
}

int targetDetection(cv::Mat& cv_src, ncnn::Net& yolov5_net, std::vector<Object>& objects, int target_size,
                    float prob_threshold, float nms_threshold)
{
    int w = cv_src.cols, h = cv_src.rows;
    float scale = 1.0f;
    if (w > h) {
        scale = (float)target_size / (float)w;
        w = target_size;
        h = h * scale;
    } else {
        scale = (float)target_size / (float)h;
        h = target_size;
        w = w * scale;
    }
    ncnn::Mat ncnn_in = ncnn::Mat::from_pixels_resize(cv_src.data, ncnn::Mat::PIXEL_BGR2RGB, cv_src.cols, cv_src.rows, w, h);
    // pad the borders up to the detection size
    // source: the letterbox method in yolov5/utils/datasets.py
    int wpad = (w + 31) / 32 * 32 - w;
    int hpad = (h + 31) / 32 * 32 - h;
    ncnn::Mat in_pad;
    ncnn::copy_make_border(ncnn_in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
    const float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
    in_pad.substract_mean_normalize(0, norm_vals);
    // create an extractor
    ncnn::Extractor ex = yolov5_net.create_extractor();
    ex.input("images", in_pad);
    std::vector<Object> proposals;
    // stride 8
    {
        ncnn::Mat out;
        ex.extract("750", out);
        ncnn::Mat anchors(6);
        anchors[0] = 10.f;
        anchors[1] = 13.f;
        anchors[2] = 16.f;
        anchors[3] = 30.f;
        anchors[4] = 33.f;
        anchors[5] = 23.f;
        std::vector<Object> objects8;
        generateProposals(anchors, 8, in_pad, out, prob_threshold, objects8);
        proposals.insert(proposals.end(), objects8.begin(), objects8.end());
    }
    // stride 16
    {
        ncnn::Mat out;
        ex.extract("771", out);
        ncnn::Mat anchors(6);
        anchors[0] = 30.f;
        anchors[1] = 61.f;
        anchors[2] = 62.f;
        anchors[3] = 45.f;
        anchors[4] = 59.f;
        anchors[5] = 119.f;
        std::vector<Object> objects16;
        generateProposals(anchors, 16, in_pad, out, prob_threshold, objects16);
        proposals.insert(proposals.end(), objects16.begin(), objects16.end());
    }
    // stride 32
    {
        ncnn::Mat out;
        ex.extract("791", out);
        ncnn::Mat anchors(6);
        anchors[0] = 116.f;
        anchors[1] = 90.f;
        anchors[2] = 156.f;
        anchors[3] = 198.f;
        anchors[4] = 373.f;
        anchors[5] = 326.f;
        std::vector<Object> objects32;
        generateProposals(anchors, 32, in_pad, out, prob_threshold, objects32);
        proposals.insert(proposals.end(), objects32.begin(), objects32.end());
    }
    // sort all proposals by score from highest to lowest
    qsortDescentInplace(proposals);
    // apply nms with nms_threshold
    std::vector<int> picked;
    nmsSortedBboxes(proposals, picked, nms_threshold);
    int count = picked.size();
    objects.resize(count);
    for (int i = 0; i < count; i++) {
        objects[i] = proposals[picked[i]];
        // adjust offset to original unpadded
        float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
        float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
        float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
        float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;
        // clip
        x0 = std::max(std::min(x0, (float)(cv_src.cols - 1)), 0.f);
        y0 = std::max(std::min(y0, (float)(cv_src.rows - 1)), 0.f);
        x1 = std::max(std::min(x1, (float)(cv_src.cols - 1)), 0.f);
        y1 = std::max(std::min(y1, (float)(cv_src.rows - 1)), 0.f);
        objects[i].rect.x = x0;
        objects[i].rect.y = y0;
        objects[i].rect.width = x1 - x0;
        objects[i].rect.height = y1 - y0;
    }
    return 0;
}

void drawObjects(const cv::Mat& cv_src, const std::vector<Object>& objects, std::vector<std::string>& class_names)
{
    cv::Mat cv_detect = cv_src.clone();
    for (size_t i = 0; i < objects.size(); i++) {
        const Object& obj = objects[i];
        std::cout << "Object label:" << obj.label << " Object prob:" << obj.prob
                  << " Object rect" << obj.rect << std::endl;
        cv::rectangle(cv_detect, obj.rect, cv::Scalar(255, 0, 0));
        std::string text = class_names[obj.label] + " " + std::to_string(int(obj.prob * 100)) + "%";
        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
        int x = obj.rect.x;
        int y = obj.rect.y - label_size.height - baseLine;
        if (y < 0)
            y = 0;
        if (x + label_size.width > cv_detect.cols)
            x = cv_detect.cols - label_size.width;
        cv::rectangle(cv_detect, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(255, 255, 255), -1);
        cv::putText(cv_detect, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
    }
    cv::imshow("image", cv_detect);
}

int main(void)
{
    std::string param_path = "models/ODIDF16.param";
    std::string bin_path = "models/ODIDF16.bin";
    ncnn::Net yolov5_net;
    initYolov5Net(param_path, bin_path, yolov5_net, true);
    std::vector<std::string> class_names{ "ida", "idb", "idback", "idhead" };
    std::string path = "images";
    std::vector<std::string> filenames;
    cv::glob(path, filenames, false);
    for (auto name : filenames) {
        cv::Mat cv_src = cv::imread(name);
        if (cv_src.empty()) {
            continue;
        }
        std::vector<Object> objects;
        double start = static_cast<double>(cv::getTickCount());
        // target_size and the two thresholds are assumed to have default values declared in YoloV5Detect.h
        targetDetection(cv_src, yolov5_net, objects);
        double time = ((double)cv::getTickCount() - start) / cv::getTickFrequency();
        std::cout << name << " Detection time: " << time << " (second)" << std::endl;
        drawObjects(cv_src, objects, class_names);
        cv::waitKey();
    }
    return 0;
}
V. Building NCNN
1. Dependencies:
- protobuf-3.4.0
Download: https://github.com/google/protobuf/archive/v3.4.0.zip
Open the VS2017 or VS2019 native tools command prompt and change to the source directory.
- Vulkan
https://vulkan.lunarg.com/sdk/home
Version: VulkanSDK-1.2.141.2
Run the installer, then verify the installation by running C:\VulkanSDK\1.2.141.2\Bin\vkcube.exe. If the demo image appears, the installation succeeded.
- glfw
https://www.glfw.org/
Copy glfw-3.3.2.bin.WIN64 to VulkanSDK\1.2.141.2\Third-Party
- GLM
https://github.com/g-truc/glm/
Copy GLM to VulkanSDK\1.2.141.2\Third-Party
- Add the system path entries
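2. Adding a custom layer to NCNN
When the custom layer is registered in code, a model converted with ncnn2mem fails to load during inference on mobile. After ncnn2mem the whole model is read into memory through a .h file, and registering a custom layer for in-memory loading has to use the TYPEINDEX enum; see the ncnn guide on adding custom layers. The ncnn libraries used so far were downloaded as prebuilt binaries, so adding a custom layer means cloning the source with git and rebuilding it.
2.1 Add the custom layer.
- Clone the source with git
- Add a .h file to the ncnn source: src/layer/YoloV5Focus.h
YoloV5Focus.h
- Add a .cpp file to the ncnn source: src/layer/YoloV5Focus.cpp
YoloV5Focus.cpp
- Edit src/CMakeLists.txt to register layer/YoloV5Focus
- On Windows the op name is case-insensitive, but on other systems and on mobile the case of the layer name matters.
- Build ncnn
Open the VS2017 or VS2019 native tools command prompt and change to the source directory.
2.2 When using an NCNN library built with the custom layer, the dynamic layer registration part of the inference code above is no longer needed: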
class YoloV5Focus : public ncnn::Layer
{
public:
    YoloV5Focus()
    {
        one_blob_only = true;
    }

    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
    {
        int w = bottom_blob.w;
        int h = bottom_blob.h;
        int channels = bottom_blob.c;
        int outw = w / 2;
        int outh = h / 2;
        int outc = channels * 4;
        top_blob.create(outw, outh, outc, 4u, 1, opt.blob_allocator);
        if (top_blob.empty())
            return -100;
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int p = 0; p < outc; p++) {
            const float* ptr = bottom_blob.channel(p % channels).row((p / channels) % 2) + ((p / channels) / 2);
            float* outptr = top_blob.channel(p);
            for (int i = 0; i < outh; i++) {
                for (int j = 0; j < outw; j++) {
                    *outptr = *ptr;
                    outptr += 1;
                    ptr += 2;
                }
                ptr += w;
            }
        }
        return 0;
    }
};
DEFINE_LAYER_CREATOR(YoloV5Focus)

// register the custom layer dynamically
yolov5_net.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
VI. NCNN Int8 Model Quantization
1. Optimize the model
./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 0
2. Generate the calibration table
./ncnn2table yolov5s-opt.param yolov5s-opt.bin imagelist.txt yolov5s-opt.table mean=[0,0,0] norm=[0.0039215,0.0039215,0.0039215] shape=[416,416,3] pixel=BGR thread=8 method=kl
3. Quantize the model to int8
./ncnn2int8 yolov5s-opt.param yolov5s-opt.bin yolov5s-int8.param yolov5s-int8.bin yolov5s-opt.table
4. Int8-quantized models are noticeably faster on mobile and on some edge devices, at the cost of a slight drop in accuracy.
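The calibration step reads its images from imagelist.txt. A minimal Python sketch for producing such a file follows, assuming the tool expects one image path per line; `write_image_list` and the folder name in the usage note are hypothetical and not part of ncnn.

```python
from pathlib import Path


def write_image_list(image_dir, list_path, exts=(".jpg", ".jpeg", ".png", ".bmp")):
    """Write the paths of all images under image_dir to list_path, one per line.

    Returns the number of image paths written.
    """
    paths = sorted(
        p for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in exts
    )
    text = "".join(f"{p.as_posix()}\n" for p in paths)
    Path(list_path).write_text(text, encoding="utf-8")
    return len(paths)
```

Calling `write_image_list("calib_images", "imagelist.txt")` on a folder of representative ID card images would produce the list passed to ncnn2table in step 2.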