Accelerating RetinaFace Inference with TensorRT (Part 1)

Published: 2024/3/26


1. References

tensorrtx/retinaface
Accelerating YOLOv5 inference with TensorRT (Part 1)
Accelerating YOLOv5 inference with TensorRT (Part 2)

2. Experimental Environment

System environment

    Operating System + Version: Ubuntu 16.04
    TensorRT Version: 7.1.3.4
    GPU Type: GeForce GTX1650, 4GB
    Nvidia Driver Version: 470.63.01
    CUDA Version: 10.2.300
    CUDNN Version: 7.6.5
    Python Version (if applicable): 3.7.3
    Anaconda Version: 4.10.3
    gcc: 7.5.0
    g++: 7.5.0

tensorRT-yolov5.yaml

    name: tensorRT-yolov5
    channels:
      - <unknown>
      - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
      - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
      - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
    dependencies:
      - _libgcc_mutex=0.1=main
      - _openmp_mutex=4.5=1_gnu
      - blas=1.0=mkl
      - bzip2=1.0.8=h7b6447c_0
      - ca-certificates=2021.7.5=h06a4308_1
      - certifi=2021.5.30=py37h06a4308_0
      - cudatoolkit=10.2.89=hfd86e86_1
      - ffmpeg=4.2.2=h20bf706_0
      - freetype=2.10.4=h5ab3b9f_0
      - gmp=6.2.1=h2531618_2
      - gnutls=3.6.15=he1e5248_0
      - jpeg=9b=h024ee3a_2
      - lame=3.100=h7b6447c_0
      - lcms2=2.12=h3be6417_0
      - libedit=3.1.20210714=h7f8727e_0
      - libffi=3.2.1=hf484d3e_1007
      - libgcc-ng=9.3.0=h5101ec6_17
      - libgomp=9.3.0=h5101ec6_17
      - libidn2=2.3.2=h7f8727e_0
      - libopus=1.3.1=h7b6447c_0
      - libpng=1.6.37=hbc83047_0
      - libstdcxx-ng=9.3.0=hd4cf53a_17
      - libtasn1=4.16.0=h27cfd23_0
      - libtiff=4.2.0=h85742a9_0
      - libunistring=0.9.10=h27cfd23_0
      - libuv=1.40.0=h7b6447c_0
      - libvpx=1.7.0=h439df22_0
      - libwebp-base=1.2.0=h27cfd23_0
      - lz4-c=1.9.3=h295c915_1
      - mkl_fft=1.3.0=py37h42c9631_2
      - mkl_random=1.2.2=py37h51133e4_0
      - ncurses=6.2=he6710b0_1
      - nettle=3.7.3=hbbd107a_1
      - ninja=1.10.2=hff7bd54_1
      - numpy-base=1.20.3=py37h74d4b33_0
      - openh264=2.1.0=hd408876_0
      - openjpeg=2.4.0=h3ad879b_0
      - openssl=1.1.1l=h7f8727e_0
      - pip=21.2.2=py37h06a4308_0
      - python=3.7.3=h0371630_0
      - pytorch=1.8.0=py3.7_cuda10.2_cudnn7.6.5_0
      - readline=7.0=h7b6447c_5
      - setuptools=52.0.0=py37h06a4308_0
      - six=1.16.0=pyhd3eb1b0_0
      - sqlite=3.33.0=h62c20be_0
      - tk=8.6.10=hbc83047_0
      - torchvision=0.9.0=py37_cu102
      - typing_extensions=3.10.0.0=pyh06a4308_0
      - wheel=0.37.0=pyhd3eb1b0_0
      - x264=1!157.20191217=h7b6447c_0
      - xz=5.2.5=h7b6447c_0
      - zlib=1.2.11=h7b6447c_3
      - zstd=1.4.9=haebb681_0
      - pip:
        - appdirs==1.4.4
        - charset-normalizer==2.0.4
        - cycler==0.10.0
        - dpcpp-cpp-rt==2021.3.0
        - flatbuffers==2.0
        - graphsurgeon==0.4.5
        - idna==3.2
        - intel-cmplr-lib-rt==2021.3.0
        - intel-cmplr-lic-rt==2021.3.0
        - intel-opencl-rt==2021.3.0
        - intel-openmp==2021.3.0
        - kiwisolver==1.3.1
        - mako==1.1.5
        - markupsafe==2.0.1
        - matplotlib==3.4.3
        - mkl==2021.3.0
        - mkl-fft==1.3.0
        - mkl-service==2.4.0
        - netron==5.1.6
        - numpy==1.21.2
        - olefile==0.46
        - onnx==1.10.1
        - onnx-simplifier==0.3.6
        - onnxoptimizer==0.2.6
        - onnxruntime==1.8.1
        - opencv-python==4.5.3.56
        - pandas==1.3.2
        - pillow==8.3.2
        - protobuf==3.17.3
        - pycuda==2021.1
        - pyparsing==2.4.7
        - python-dateutil==2.8.2
        - pytools==2021.2.8
        - pytz==2021.1
        - pyyaml==5.4.1
        - requests==2.26.0
        - scipy==1.7.1
        - seaborn==0.11.2
        - tbb==2021.3.0
        - tensorrt==7.1.3.4
        - torchsummary==1.5.1
        - tqdm==4.62.2
        - typing-extensions==3.10.0.2
        - uff==0.6.9
        - urllib3==1.26.6
    prefix: /home/yichao/miniconda3/envs/tensorRT-yolov5

requirements-gpu.txt

    appdirs==1.4.4
    certifi==2021.5.30
    charset-normalizer==2.0.4
    cycler==0.10.0
    dpcpp-cpp-rt==2021.3.0
    flatbuffers==2.0
    graphsurgeon @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
    idna==3.2
    intel-cmplr-lib-rt==2021.3.0
    intel-cmplr-lic-rt==2021.3.0
    intel-opencl-rt==2021.3.0
    intel-openmp==2021.3.0
    kiwisolver==1.3.1
    Mako==1.1.5
    MarkupSafe==2.0.1
    matplotlib==3.4.3
    mkl==2021.3.0
    mkl-fft==1.3.0
    mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
    mkl-service==2.4.0
    netron==5.1.6
    numpy==1.21.2
    olefile==0.46
    onnx==1.10.1
    onnx-simplifier==0.3.6
    onnxoptimizer==0.2.6
    onnxruntime==1.8.1
    opencv-python==4.5.3.56
    pandas==1.3.2
    Pillow==8.3.2
    protobuf==3.17.3
    pycuda==2021.1
    pyparsing==2.4.7
    python-dateutil==2.8.2
    pytools==2021.2.8
    pytz==2021.1
    PyYAML==5.4.1
    requests==2.26.0
    scipy==1.7.1
    seaborn==0.11.2
    six @ file:///tmp/build/80754af9/six_1623709665295/work
    tbb==2021.3.0
    tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl
    torch==1.8.0
    torchsummary==1.5.1
    torchvision==0.9.0
    tqdm==4.62.2
    typing-extensions==3.10.0.2
    uff @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/uff/uff-0.6.9-py2.py3-none-any.whl
    urllib3==1.26.6
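Several entries in requirements-gpu.txt point at local files (the `name @ file:///...` form), so the file will not install unchanged on another machine. The sketch below separates those entries from the ordinary `name==version` pins. It assumes the file contains only these two forms, and the function name `split_requirements` is made up for illustration.

```python
def split_requirements(lines):
    """Split requirement lines into pinned (name==version) and
    local (name @ location) entries."""
    pinned, local = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if " @ " in line:
            name, location = line.split(" @ ", 1)
            local[name.strip()] = location.strip()
        elif "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
    return pinned, local


sample = [
    "numpy==1.21.2",
    "pycuda==2021.1",
    "tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl",
]
pinned, local = split_requirements(sample)
print("pinned:", sorted(pinned))
print("needs a local wheel:", sorted(local))
```

On a new machine, the packages reported as local (tensorrt, uff and graphsurgeon in the full list) would have to be installed from that machine's own TensorRT tarball.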

3. Important Notes

3.1 Configuration

Input shape INPUT_H, INPUT_W defined in decode.h
INT8/FP16/FP32 can be selected by the macro USE_FP16 or USE_INT8 or USE_FP32 in retina_r50.cpp
GPU id can be selected by the macro DEVICE in retina_r50.cpp
Batchsize can be selected by the macro BATCHSIZE in retina_r50.cpp
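Because the precision is chosen at compile time, it is easy to rebuild with the wrong macro still enabled. The sketch below scans source text for uncommented `#define` lines naming one of the three precision macros. The excerpt it checks is hypothetical rather than copied from retina_r50.cpp, and the helper name `active_precision` is an assumption.

```python
import re

PRECISION_MACROS = ("USE_FP32", "USE_FP16", "USE_INT8")


def active_precision(source_text):
    """Return the precision macros that are #define'd on uncommented lines."""
    found = []
    for line in source_text.splitlines():
        # Lines starting with // do not match, so commented-out defines are ignored.
        match = re.match(r"\s*#\s*define\s+(\w+)", line)
        if match and match.group(1) in PRECISION_MACROS:
            found.append(match.group(1))
    return found


# Hypothetical excerpt for illustration only.
sample_source = """
#define USE_FP16
// #define USE_INT8
#define DEVICE 0
"""
print(active_precision(sample_source))
```

A result with more than one entry, or none, suggests the file needs another look before running make.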

3.2 Downloading pretrained models

face-recognition-models

face-detection-models

face-alignment-models

face-attribute-models

4. Key Steps

The steps below use FP16 as the example.

4.1 Generating the .wts file from the PyTorch pretrained model

4.1.1 Clone the GitHub repository

    git clone https://github.com/wang-xinyu/Pytorch_Retinaface.git
    # download its weights 'Resnet50_Final.pth', put it in Pytorch_Retinaface/weights

4.1.2 Download the pretrained model

    cd Pytorch_Retinaface
    python detect.py --save_model

4.1.3 Generate the .wts file

    python genwts.py
    # a file 'retinaface.wts' will be generated
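For orientation, a .wts file is plain text as far as I understand the tensorrtx convention: the first line holds the number of tensors, and each following line holds a tensor name, its element count, and the values as big-endian float32 hex words. The sketch below writes and reads that assumed layout with the standard library only. It is not the repository's genwts.py, and the helper names `dump_wts` and `load_wts` are hypothetical.

```python
import struct


def dump_wts(tensors):
    """Serialize {name: [float, ...]} into the assumed .wts text layout."""
    lines = [str(len(tensors))]
    for name, values in tensors.items():
        words = [struct.pack(">f", float(v)).hex() for v in values]
        lines.append(" ".join([name, str(len(values))] + words))
    return "\n".join(lines) + "\n"


def load_wts(text):
    """Parse the assumed .wts text layout back into {name: [float, ...]}."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    tensors = {}
    for line in lines[1:1 + count]:
        name, size, *words = line.split()
        if int(size) != len(words):
            raise ValueError(f"{name}: expected {size} values, got {len(words)}")
        tensors[name] = [struct.unpack(">f", bytes.fromhex(w))[0] for w in words]
    return tensors


text = dump_wts({"conv1.weight": [1.0, -0.5, 0.25], "conv1.bias": [0.0]})
print(text)
print(load_wts(text))
```

A text format like this is easy to inspect by eye, at the cost of a file that is larger than the binary checkpoint.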

4.2 Preparing tensorrtx

    git clone https://github.com/wang-xinyu/tensorrtx.git
    cd tensorrtx/retinaface
    # put retinaface.wts here
    mkdir build
    cd build

4.3 Configuring with cmake

    yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ cmake ..
    CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
      Compatibility with CMake < 2.8.12 will be removed from a future version of
      CMake.

      Update the VERSION argument <min> value or use a ...<max> suffix to tell
      CMake that the project does not need compatibility with older versions.

    -- The C compiler identification is GNU 7.5.0
    -- The CXX compiler identification is GNU 7.5.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /usr/bin/cc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/c++ - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found CUDA: /usr/local/cuda (found version "10.2")
    embed_platform off
    -- Found OpenCV: /usr/local/opencv3.3.0 (found version "3.3.0")
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/yichao/MyDocuments/tensorrtx/retinaface/build

4.4 Building with make -j8

    # print the full build log
    make VERBOSE=1

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
    [ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
    /home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const __nv_bool *, const __nv_bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?

    /home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const bool *, const bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?

    ... ... ...
    [ 87%] Linking CXX executable retina_mnet
    [100%] Linking CXX executable retina_r50
    [100%] Built target retina_r50
    [100%] Built target retina_mnet

4.5 Building the engine

    ./retina_r50 -s

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
    Loading weights: ../retinaface.wts
    Building engine, please wait for a while...
    Build engine successfully!

    real 1m3.483s
    user 0m33.287s
    sys 0m5.715s

The generated engine file is 78.2 MB.

4.5.1 GPU memory usage

4.6 Inference

4.6.1 Download the test image

    wget https://github.com/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg

If the download is too slow, use the mirror instead:

    wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
    --2022-01-13 15:02:13-- https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
    Resolving github.com.cnpmjs.org (github.com.cnpmjs.org)... 47.241.4.205
    Connecting to github.com.cnpmjs.org (github.com.cnpmjs.org)|47.241.4.205|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg [following]
    --2022-01-13 15:02:14-- https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg
    Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
    Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 471393 (460K) [image/jpeg]
    Saving to: 'worlds-largest-selfie.jpg'

    worlds-largest-selfi 100%[===================>] 460.34K 13.0KB/s in 28s

    2022-01-13 15:02:44 (16.5 KB/s) - 'worlds-largest-selfie.jpg' saved [471393/471393]

4.6.2 Measure inference speed

    ./retina_r50 -d

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
    445571us
    19030us
    ... ... ...
    15157us
    15870us
    number of detections -> 1433
     -> 515.064
    after nms -> 256
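The binary prints one latency per run in microseconds, and the first run is much slower than the rest. The sketch below pulls the `NNNus` values out of captured output, drops the first measurement as warm-up, and converts the remainder to milliseconds. The sample text contains only the values quoted above (the elided runs are left out), and the helper name `parse_latencies_ms` is an assumption.

```python
import re
import statistics


def parse_latencies_ms(output, warmup=1):
    """Extract 'NNNus' timings from program output, drop the first
    `warmup` measurements, and return the rest in milliseconds."""
    micros = [int(m) for m in re.findall(r"\b(\d+)us\b", output)]
    return [us / 1000.0 for us in micros[warmup:]]


sample_output = """445571us
19030us
15157us
15870us
number of detections -> 1433
"""
latencies = parse_latencies_ms(sample_output)
print(latencies)
print(round(statistics.mean(latencies), 2), "ms on average")
```

To use it on a real run, redirect the program output to a file and pass the file contents to the function.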

4.7 Python inference

Edit the image path in retinaface_trt.py:

    input_image_paths = ["worlds-largest-selfie.jpg"]

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
    3.9774467945098877
    0.017582416534423828
    0.01763463020324707
    0.021233797073364258
    0.017621517181396484
    0.017649412155151367
    0.017993688583374023
    0.017635107040405273
    0.01763153076171875
    0.017618894577026367
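The first value printed (about 3.98 s) is far larger than the later ones (about 0.018 s), which is the usual pattern when the first call also pays for one-time setup. The sketch below is a small timing harness that keeps warm-up and steady-state measurements apart. `fake_infer` is a stand-in callable, not the function used in retinaface_trt.py, and `benchmark` is a hypothetical helper.

```python
import statistics
import time


def benchmark(fn, warmup=1, runs=9):
    """Call fn() warmup + runs times and return
    (warmup_times, steady_times) in seconds."""
    timings = []
    for _ in range(warmup + runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings[:warmup], timings[warmup:]


def fake_infer():
    # Stand-in for a real inference call; sleeps briefly so there is something to time.
    time.sleep(0.001)


warm, steady = benchmark(fake_infer)
print("warm-up:", [round(t, 4) for t in warm])
print("steady-state mean:", round(statistics.mean(steady), 4), "s")
```

Reporting the steady-state mean separately keeps the one-off start-up cost from skewing the per-image figure.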

5. TensorRT FP32 Inference

Accelerating YOLOv5 inference with TensorRT (Part 1)

Enable the USE_FP32 macro in retina_r50.cpp; the remaining steps follow the key steps above.

5.1 Building the engine

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
    Loading weights: ../retinaface.wts
    Building engine, please wait for a while...
    Build engine successfully!

    real 0m27.783s
    user 0m18.162s
    sys 0m2.295s

The generated engine file is 154.2 MB.

5.1.1 GPU memory usage

5.2 Inference

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
    436509us
    30747us
    30568us
    ... ... ...
    29127us
    28726us
    28716us
    number of detections -> 1433
     -> 515.075
    after nms -> 257

5.3 Python inference

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
    3.919330358505249
    0.03155779838562012
    0.031530141830444336
    0.03136157989501953
    0.03149151802062988
    0.0314486026763916
    0.03205513954162598
    0.03142070770263672
    0.03142905235290527
    0.03143477439880371

6. TensorRT FP16 Inference

Accelerating YOLOv5 inference with TensorRT (Part 1)

Enable the USE_FP16 macro in retina_r50.cpp.

7. TensorRT INT8 Inference

7.1 Calibration dataset

7.1.1 Download the calibration dataset

Download the calibration images widerface_calib from GoogleDrive or BaiduPan (pwd: a9wh).

7.1.2 Extract it into the retinaface/build directory
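Before starting the long INT8 build, it can help to confirm that the archive was unpacked where the program expects it. The sketch below counts the image files directly inside a directory; the calibration log in 7.4 numbers the images from 0 to 999, so about 1000 files would be expected. The helper name `count_calibration_images` and the demo file names are assumptions.

```python
import tempfile
from pathlib import Path


def count_calibration_images(directory, suffixes=(".jpg", ".jpeg", ".png")):
    """Return the number of image files directly inside `directory`."""
    path = Path(directory)
    if not path.is_dir():
        raise FileNotFoundError(f"calibration directory not found: {path}")
    return sum(1 for p in path.iterdir()
               if p.is_file() and p.suffix.lower() in suffixes)


# Demonstration on a throwaway directory; point it at build/widerface_calib in practice.
with tempfile.TemporaryDirectory() as tmp:
    calib = Path(tmp) / "widerface_calib"
    calib.mkdir()
    for i in range(3):
        (calib / f"img_{i}.jpg").write_bytes(b"")
    (calib / "notes.txt").write_text("not an image")
    demo_count = count_calibration_images(calib)
    print(demo_count)
```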

7.2 Edit retina_r50.cpp

Enable the USE_INT8 macro.

7.3 Rebuild with make -j8

    make -j8

7.4 Building the engine

    ./retina_r50 -s

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
    Loading weights: ../retinaface.wts
    Your platform support int8: 1
    Building engine, please wait for a while...
    reading calib cache: r50_int8calib.table
    2--Demonstration_2_Demonstration_Political_Rally_2_488.jpg 0
    29--Students_Schoolkids_29_Students_Schoolkids_Students_Schoolkids_29_517.jpg 1
    39--Ice_Skating_39_Ice_Skating_Ice_Skating_39_344.jpg 2
    ... ... ...
    61--Street_Battle_61_Street_Battle_streetfight_61_566.jpg 998
    2--Demonstration_2_Demonstration_Demonstration_Or_Protest_2_260.jpg 999
    reading calib cache: r50_int8calib.table
    writing calib cache: r50_int8calib.table size: 12200
    Build engine successfully!

    real 7m25.594s
    user 5m58.694s
    sys 1m34.686s

The generated engine file is 30.1 MB.

7.4.1 GPU memory usage

7.5 Inference

    ./retina_r50 -d

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -d
    424574us
    13240us
    14247us
    ... ... ...
    11711us
    11662us
    11103us
    number of detections -> 1382
     -> 11.1058
    after nms -> 246

7.6 Python inference

    (tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
    3.9951412677764893
    0.014085054397583008
    0.014075279235839844
    0.013991594314575195
    0.014072656631469727
    0.014059305191040039
    0.014052867889404297
    0.014079093933105469
    0.01405954360961914
    0.014012575149536133

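8. RetinaFace Performance Analysis

Performance analysis of the RetinaFace face detector

Precision	Infer Time
FP32	29 ms
FP16	15 ms
INT8	11 ms

Summary: FP16 is roughly twice as fast as FP32 (15 ms versus 29 ms), while INT8 brings only a modest further gain over FP16 (11 ms versus 15 ms).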
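The ratios behind that summary follow directly from the three table values. A short sketch of the arithmetic (the dictionary and function names are made up for illustration):

```python
infer_time_ms = {"FP32": 29, "FP16": 15, "INT8": 11}


def speedup(baseline, candidate, table=infer_time_ms):
    """How many times faster `candidate` is than `baseline`."""
    return table[baseline] / table[candidate]


for base, cand in [("FP32", "FP16"), ("FP16", "INT8"), ("FP32", "INT8")]:
    print(f"{cand} vs {base}: {speedup(base, cand):.2f}x")
# FP16 vs FP32: 1.93x
# INT8 vs FP16: 1.36x
# INT8 vs FP32: 2.64x
```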

9. Possible Problems

Q1: OpenCV and CUDA versions do not match, so cmake fails

    CMake Error at /usr/local/opencv3.3.0/share/OpenCV/OpenCVConfig.cmake:108 (message):
      OpenCV static library was compiled with CUDA 10.2 support. Please, use the
      same version or rebuild OpenCV with CUDA 11.0
    Call Stack (most recent call first):
      CMakeLists.txt:28 (find_package)

Cause: the OpenCV build and the active CUDA toolkit do not match. OpenCV 3.3.0 was compiled against CUDA 10.2 (as the error message reports), so it has to be paired with CUDA 10.2, but the CUDA version currently in use is 11.0.

Solution: rebuilding OpenCV is tedious, so simply switch the active toolkit to CUDA 10.2; see the blog post [Switching between coexisting CUDA versions on Ubuntu](https://blog.csdn.net/m0_37605642/article/details/120098215).

Note: after switching the CUDA version, empty the build directory and run cmake again.
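When several toolkits are installed side by side, /usr/local/cuda is commonly a symbolic link to one versioned directory such as /usr/local/cuda-10.2. That layout is an assumption about the installation, not something the error message states. Under that assumption, the sketch below reports which version the link currently selects; the helper name `active_cuda_version` is hypothetical.

```python
import os
import re
import tempfile


def active_cuda_version(cuda_link="/usr/local/cuda"):
    """Resolve the symlink and return the version in its target name, or None."""
    target = os.path.realpath(cuda_link)
    match = re.search(r"cuda-(\d+\.\d+)$", target)
    return match.group(1) if match else None


# Demonstration with a throwaway layout instead of the real /usr/local.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "cuda-10.2"))
    os.mkdir(os.path.join(root, "cuda-11.0"))
    link = os.path.join(root, "cuda")
    os.symlink(os.path.join(root, "cuda-10.2"), link)
    demo_version = active_cuda_version(link)
    print(demo_version)
```

Running the check before cmake shows whether the switch to 10.2 has taken effect.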

Q2: NvInfer.h cannot be found

fatal error: NvInfer.h: No such file or directory | TensorRT error handling | [Solved]

    yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
    [ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
    In file included from /home/yichao/MyDocuments/tensorrtx/retinaface/decode.cu:1:0:
    /home/yichao/MyDocuments/tensorrtx/retinaface/decode.h:6:10: fatal error: NvInfer.h: No such file or directory
     #include "NvInfer.h"
              ^~~~~~~~~~~
    compilation terminated.
    CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:220 (message):
      Error generating
      /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o

Cause: NvInfer.h is a header that ships with TensorRT, and the compiler must be able to find it when building the C++ code.

Solution: add the TensorRT include and library directories to /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt:

    # tensorRT
    include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
    link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)

Q3: TensorRT 8 is not supported

    32 errors detected in the compilation of "/tmp/tmpxft_00003bbc_00000000-6_decode.cpp1.ii".
    -- Removing /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
    /home/yichao/360Downloads/cmake-3.21.1-linux-x86_64/bin/cmake -E rm -f /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
    CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:280 (message):
      Error generating file
      /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o

    CMakeFiles/decodeplugin.dir/build.make:75: recipe for target 'CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o' failed
    make[2]: *** [CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o] Error 1
    make[2]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
    CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
    make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
    make[1]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
    Makefile:90: recipe for target 'all' failed
    make: *** [all] Error 2

Cause: the TensorRT paths in CMakeLists.txt are wrong. The TensorRT version used by the build must be the same as the TensorRT version installed in the environment.

Solution: in /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, change the TensorRT configuration from

    # tensorRT
    include_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/include)
    link_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/lib/)

to

    # tensorRT
    include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
    link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
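Since the installation paths used here carry the TensorRT version in the directory name, a quick text scan of CMakeLists.txt can reveal which version the build will pick up. The sketch below assumes that `TensorRT-<version>` naming, which matches the paths shown in this article but is not guaranteed elsewhere; the helper name is hypothetical.

```python
import re


def tensorrt_versions_in_cmake(cmake_text):
    """Return the set of TensorRT versions named in include/link directory paths."""
    versions = set()
    for line in cmake_text.splitlines():
        if re.match(r"\s*(include_directories|link_directories)\s*\(", line):
            versions.update(re.findall(r"TensorRT-(\d+(?:\.\d+)*)", line))
    return versions


cmake_excerpt = """
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
"""
found = tensorrt_versions_in_cmake(cmake_excerpt)
print(found)
if len(found) != 1:
    print("warning: include and link paths do not name exactly one TensorRT version")
```

The reported version can then be compared with the one in the Python environment (7.1.3.4 in this setup).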

Q4: cannot find -lnvinfer

Four ways to fix "/usr/bin/ld: cannot find -lXXX" during make

    /usr/bin/ld: cannot find -lnvinfer
    collect2: error: ld returned 1 exit status
    CMakeFiles/decodeplugin.dir/build.make:90: recipe for target 'libdecodeplugin.so' failed
    make[2]: *** [libdecodeplugin.so] Error 1
    CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
    make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
    Makefile:90: recipe for target 'all' failed
    make: *** [all] Error 2

Cause: the linker cannot find the nvinfer library. The file should be named libnvinfer.so, following the rule lib + library name (the xxx in -lxxx) + .so.

Solution:

1. Locate libnvinfer.so, either with find

    find / -name libnvinfer.so

or with locate

    locate libnvinfer.so
    # output
    /home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so

2. Create a symbolic link

    sudo ln -s /home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so /usr/lib/libnvinfer.so
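The search step can also be done from Python when find or locate is inconvenient, for example inside a setup script. The sketch below walks a list of root directories and returns every path whose file name matches. The helper name `find_file` is an assumption, and the demo builds a throwaway tree rather than touching a real TensorRT install.

```python
import os
import tempfile


def find_file(name, roots):
    """Walk each root directory and return every path whose file name equals `name`."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                hits.append(os.path.join(dirpath, name))
    return hits


# Demonstration on a throwaway tree; in practice pass roots such as the TensorRT install directory.
with tempfile.TemporaryDirectory() as tmp:
    lib_dir = os.path.join(tmp, "TensorRT-7.1.3.4", "lib")
    os.makedirs(lib_dir)
    open(os.path.join(lib_dir, "libnvinfer.so"), "wb").close()
    demo_hits = find_file("libnvinfer.so", [tmp])
    print(demo_hits)
```

Restricting the roots to likely install locations is much faster than scanning from `/`.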

Q5: Source code error

Accelerating YOLOv5 inference with TensorRT (Part 2)

    make[2]: *** [CMakeFiles/retina_r50.dir/calibrator.cpp.o] Error 1
    make[2]: *** Waiting for unfinished jobs....
    /home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp: In member function ‘virtual bool Int8EntropyCalibrator2::getBatch(void**, const char**, int)’:
    /home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52:131: error: too many arguments to function ‘cv::Mat cv::dnn::experimental_dnn_v1::blobFromImages(const std::vector<cv::Mat>&, double, cv::Size, const Scalar&, bool)’
     cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
     ^
    compilation terminated due to -Wfatal-errors.
    CMakeFiles/retina_mnet.dir/build.make:75: recipe for target 'CMakeFiles/retina_mnet.dir/calibrator.cpp.o' failed
    make[2]: *** [CMakeFiles/retina_mnet.dir/calibrator.cpp.o] Error 1
    make[2]: *** Waiting for unfinished jobs....
    CMakeFiles/Makefile2:138: recipe for target 'CMakeFiles/retina_mnet.dir/all' failed
    make[1]: *** [CMakeFiles/retina_mnet.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....
    CMakeFiles/Makefile2:112: recipe for target 'CMakeFiles/retina_r50.dir/all' failed
    make[1]: *** [CMakeFiles/retina_r50.dir/all] Error 2
    Makefile:90: recipe for target 'all' failed
    make: *** [all] Error 2

Cause: a source code error at /home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52. As the compiler message shows, blobFromImages in the installed OpenCV (3.3.0) accepts five arguments, but the call passes six.

Solution: edit the source and change

    cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);

to

    cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false);

Q6: Out of GPU memory

Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak? #851

The GPU ran out of memory and the engine build failed.

    [TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
    [TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
    Traceback (most recent call last):
      File "/media/yichao/蟻巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 146, in <module>
        main(args)
      File "/media/yichao/蟻巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 126, in main
        builder.create_engine(args.engine, args.precision)
      File "/media/yichao/蟻巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 118, in create_engine
        with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
    AttributeError: __enter__

Cause: building the engine through the Python API failed on a machine with a GeForce GTX 1650 (4GB). The same test also failed on a Jetson TX2 (8GB) development board.

Explanation 1: Same problem. But this problem only happens when my system is 1080ti+tensorRT7.0+cuda10.0+centos7.6. When I change to 2080ti+tensorRT7.0, everything works fine.

Explanation 2: I face the problem with 1080 and no problem on 2080. And I don't found any debug means.
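The closing `AttributeError: __enter__` looks like a secondary symptom: the `with` statement received an object that is not a context manager, which is what would happen if the build call returned `None` after the out-of-memory errors. That is my reading of the traceback, not something the log states outright. The sketch below shows a guard that reports the real failure instead. `FakeEngine` is a stand-in class, not the TensorRT API, and `save_engine` is a hypothetical helper.

```python
import os
import tempfile


class FakeEngine:
    """Stand-in for a built engine; only mimics a serialize() method returning bytes."""

    def serialize(self):
        return b"engine-bytes"


def save_engine(engine, engine_path):
    """Write a serialized engine to disk, failing loudly if the build produced nothing."""
    if engine is None:
        raise RuntimeError(
            "engine build failed (returned None); "
            "check the log for CUDA out-of-memory errors"
        )
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return os.path.getsize(engine_path)


with tempfile.TemporaryDirectory() as tmp:
    written = save_engine(FakeEngine(), os.path.join(tmp, "demo.engine"))
    print(written, "bytes written")

try:
    save_engine(None, "unused.engine")
except RuntimeError as err:
    failure_message = str(err)
    print(failure_message)
```

Checking the result before entering the `with` block does not fix the memory shortage, but it makes the error message point at the actual cause.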
