Accelerating RetinaFace Inference with TensorRT (Part 1)
1. References
tensorrtx/retinaface
Accelerating yolov5 Inference with TensorRT (Part 1)
Accelerating yolov5 Inference with TensorRT (Part 2)
2. Experimental Environment
## System environment
Environment
Operating System + Version: Ubuntu + 16.04
TensorRT Version: 7.1.3.4
GPU Type: GeForce GTX1650, 4GB
Nvidia Driver Version: 470.63.01
CUDA Version: 10.2.300
CUDNN Version: 7.6.5
Python Version (if applicable): 3.7.3
Anaconda Version: 4.10.3
gcc: 7.5.0
g++: 7.5.0
tensorRT-yolov5.yaml
name: tensorRT-yolov5
channels:
  - <unknown>
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - blas=1.0=mkl
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2021.7.5=h06a4308_1
  - certifi=2021.5.30=py37h06a4308_0
  - cudatoolkit=10.2.89=hfd86e86_1
  - ffmpeg=4.2.2=h20bf706_0
  - freetype=2.10.4=h5ab3b9f_0
  - gmp=6.2.1=h2531618_2
  - gnutls=3.6.15=he1e5248_0
  - jpeg=9b=h024ee3a_2
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - libedit=3.1.20210714=h7f8727e_0
  - libffi=3.2.1=hf484d3e_1007
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgomp=9.3.0=h5101ec6_17
  - libidn2=2.3.2=h7f8727e_0
  - libopus=1.3.1=h7b6447c_0
  - libpng=1.6.37=hbc83047_0
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - libtasn1=4.16.0=h27cfd23_0
  - libtiff=4.2.0=h85742a9_0
  - libunistring=0.9.10=h27cfd23_0
  - libuv=1.40.0=h7b6447c_0
  - libvpx=1.7.0=h439df22_0
  - libwebp-base=1.2.0=h27cfd23_0
  - lz4-c=1.9.3=h295c915_1
  - mkl_fft=1.3.0=py37h42c9631_2
  - mkl_random=1.2.2=py37h51133e4_0
  - ncurses=6.2=he6710b0_1
  - nettle=3.7.3=hbbd107a_1
  - ninja=1.10.2=hff7bd54_1
  - numpy-base=1.20.3=py37h74d4b33_0
  - openh264=2.1.0=hd408876_0
  - openjpeg=2.4.0=h3ad879b_0
  - openssl=1.1.1l=h7f8727e_0
  - pip=21.2.2=py37h06a4308_0
  - python=3.7.3=h0371630_0
  - pytorch=1.8.0=py3.7_cuda10.2_cudnn7.6.5_0
  - readline=7.0=h7b6447c_5
  - setuptools=52.0.0=py37h06a4308_0
  - six=1.16.0=pyhd3eb1b0_0
  - sqlite=3.33.0=h62c20be_0
  - tk=8.6.10=hbc83047_0
  - torchvision=0.9.0=py37_cu102
  - typing_extensions=3.10.0.0=pyh06a4308_0
  - wheel=0.37.0=pyhd3eb1b0_0
  - x264=1!157.20191217=h7b6447c_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.4.9=haebb681_0
  - pip:
    - appdirs==1.4.4
    - charset-normalizer==2.0.4
    - cycler==0.10.0
    - dpcpp-cpp-rt==2021.3.0
    - flatbuffers==2.0
    - graphsurgeon==0.4.5
    - idna==3.2
    - intel-cmplr-lib-rt==2021.3.0
    - intel-cmplr-lic-rt==2021.3.0
    - intel-opencl-rt==2021.3.0
    - intel-openmp==2021.3.0
    - kiwisolver==1.3.1
    - mako==1.1.5
    - markupsafe==2.0.1
    - matplotlib==3.4.3
    - mkl==2021.3.0
    - mkl-fft==1.3.0
    - mkl-service==2.4.0
    - netron==5.1.6
    - numpy==1.21.2
    - olefile==0.46
    - onnx==1.10.1
    - onnx-simplifier==0.3.6
    - onnxoptimizer==0.2.6
    - onnxruntime==1.8.1
    - opencv-python==4.5.3.56
    - pandas==1.3.2
    - pillow==8.3.2
    - protobuf==3.17.3
    - pycuda==2021.1
    - pyparsing==2.4.7
    - python-dateutil==2.8.2
    - pytools==2021.2.8
    - pytz==2021.1
    - pyyaml==5.4.1
    - requests==2.26.0
    - scipy==1.7.1
    - seaborn==0.11.2
    - tbb==2021.3.0
    - tensorrt==7.1.3.4
    - torchsummary==1.5.1
    - tqdm==4.62.2
    - typing-extensions==3.10.0.2
    - uff==0.6.9
    - urllib3==1.26.6
prefix: /home/yichao/miniconda3/envs/tensorRT-yolov5
requirements-gpu.txt
appdirs==1.4.4
certifi==2021.5.30
charset-normalizer==2.0.4
cycler==0.10.0
dpcpp-cpp-rt==2021.3.0
flatbuffers==2.0
graphsurgeon @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
idna==3.2
intel-cmplr-lib-rt==2021.3.0
intel-cmplr-lic-rt==2021.3.0
intel-opencl-rt==2021.3.0
intel-openmp==2021.3.0
kiwisolver==1.3.1
Mako==1.1.5
MarkupSafe==2.0.1
matplotlib==3.4.3
mkl==2021.3.0
mkl-fft==1.3.0
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
netron==5.1.6
numpy==1.21.2
olefile==0.46
onnx==1.10.1
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.8.1
opencv-python==4.5.3.56
pandas==1.3.2
Pillow==8.3.2
protobuf==3.17.3
pycuda==2021.1
pyparsing==2.4.7
python-dateutil==2.8.2
pytools==2021.2.8
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
scipy==1.7.1
seaborn==0.11.2
six @ file:///tmp/build/80754af9/six_1623709665295/work
tbb==2021.3.0
tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl
torch==1.8.0
torchsummary==1.5.1
torchvision==0.9.0
tqdm==4.62.2
typing-extensions==3.10.0.2
uff @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/uff/uff-0.6.9-py2.py3-none-any.whl
urllib3==1.26.6
3. Important Notes
3.1 Configuration
- Input shape INPUT_H, INPUT_W defined in decode.h
- INT8/FP16/FP32 can be selected by the macro USE_FP16 or USE_INT8 or USE_FP32 in retina_r50.cpp
- GPU id can be selected by the macro DEVICE in retina_r50.cpp
- Batchsize can be selected by the macro BATCHSIZE in retina_r50.cpp
3.2 Pretrained model downloads
face-recognition-models
face-detection-models
face-alignment-models
face-attribute-models
4. Key Steps
FP16 is used as the example.
4.1 Generate the .wts file from the PyTorch pretrained model
4.1.1 Clone the GitHub repository
git clone https://github.com/wang-xinyu/Pytorch_Retinaface.git
// download its weights 'Resnet50_Final.pth', put it in Pytorch_Retinaface/weights
4.1.2 Download the pretrained model
cd Pytorch_Retinaface
python detect.py --save_model
4.1.3 Generate the .wts file
python genwts.py
// a file 'retinaface.wts' will be generated.
4.2 Prepare tensorrtx
git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/retinaface
// put retinaface.wts here
mkdir build
cd build
4.3 Configure with cmake
yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ cmake ..
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr/local/cuda (found version "10.2")
embed_platform off
-- Found OpenCV: /usr/local/opencv3.3.0 (found version "3.3.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yichao/MyDocuments/tensorrtx/retinaface/build
4.4 Build with make -j8
# print the full build log
make VERBOSE=1
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const __nv_bool *, const __nv_bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?

/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h(73): warning: function "nvinfer1::IPluginV2Ext::configurePlugin(const nvinfer1::Dims *, int, const nvinfer1::Dims *, int, const nvinfer1::DataType *, const nvinfer1::DataType *, const bool *, const bool *, nvinfer1::PluginFormat, int)" is hidden by "nvinfer1::DecodePlugin::configurePlugin" -- virtual function override intended?
... ... ...
[ 87%] Linking CXX executable retina_mnet
[100%] Linking CXX executable retina_r50
[100%] Built target retina_r50
[100%] Built target retina_mnet
4.5 Generate the engine
./retina_r50 -s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!
real 1m3.483s
user 0m33.287s
sys 0m5.715s
The generated engine file is 78.2MB.
4.5.1 GPU memory usage
Thu Jan 13 16:00:02 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P0    28W /  75W |    828MiB /  3903MiB |     63%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     23027      C   ./retina_r50                      615MiB |
+-----------------------------------------------------------------------------+
4.6 Inference
4.6.1 Download the test image
wget https://github.com/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
If the download is too slow, use the mirror instead:
wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ wget https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
--2022-01-13 15:02:13-- https://github.com.cnpmjs.org/Tencent/FaceDetection-DSFD/raw/master/data/worlds-largest-selfie.jpg
Resolving github.com.cnpmjs.org (github.com.cnpmjs.org)... 47.241.4.205
Connecting to github.com.cnpmjs.org (github.com.cnpmjs.org)|47.241.4.205|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg [following]
--2022-01-13 15:02:14-- https://raw.githubusercontent.com/Tencent/FaceDetection-DSFD/master/data/worlds-largest-selfie.jpg
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 471393 (460K) [image/jpeg]
Saving to: "worlds-largest-selfie.jpg"
worlds-largest-selfi 100%[===================>] 460.34K 13.0KB/s in 28s
2022-01-13 15:02:44 (16.5 KB/s) - "worlds-largest-selfie.jpg" saved [471393/471393]
4.6.2 Measure inference speed
./retina_r50 -d
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
445571us
19030us
... ... ...
15157us
15870us
number of detections -> 1433
 -> 515.064
after nms -> 256
4.7 Python inference
Modify the image path in retinaface_trt.py.
input_image_paths = ["worlds-largest-selfie.jpg"]
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
3.9774467945098877
0.017582416534423828
0.01763463020324707
0.021233797073364258
0.017621517181396484
0.017649412155151367
0.017993688583374023
0.017635107040405273
0.01763153076171875
0.017618894577026367
5. TensorRT FP32 Inference
Accelerating yolov5 Inference with TensorRT (Part 1)
In retina_r50.cpp, set the precision macro to USE_FP32; all other steps are the same as the key steps above.
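Switching precision by hand means editing one macro and rebuilding. The helper below is a minimal sketch of how that edit could be scripted. It assumes retina_r50.cpp selects precision with a single line of the form `#define USE_FP16`; the exact layout of that line is not shown in this post, and `set_precision` is a hypothetical name introduced only for illustration.

```python
import re

# Assumption: the source file contains exactly one line like "#define USE_FP16".
PRECISION_MACROS = ("USE_FP32", "USE_FP16", "USE_INT8")
_DEFINE_RE = re.compile(r"^([ \t]*#define[ \t]+)USE_(?:FP32|FP16|INT8)\b", re.MULTILINE)


def set_precision(source: str, macro: str) -> str:
    """Return `source` with its precision #define switched to `macro`."""
    if macro not in PRECISION_MACROS:
        raise ValueError(f"unknown precision macro: {macro}")
    # Only the macro name after "#define" is replaced; trailing comments are kept.
    new_source, count = _DEFINE_RE.subn(lambda m: m.group(1) + macro, source)
    if count != 1:
        raise ValueError(f"expected exactly one precision #define, found {count}")
    return new_source


if __name__ == "__main__":
    demo = "#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32\n#define DEVICE 0\n"
    print(set_precision(demo, "USE_FP32"))
```

After the macro changes, the project still has to be rebuilt with make and the engine regenerated with ./retina_r50 -s, as in the key steps.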
5.1 Generate the engine
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Building engine, please wait for a while...
Build engine successfully!
real 0m27.783s
user 0m18.162s
sys 0m2.295s
The generated engine file is 154.2MB.
5.1.1 GPU memory usage
Thu Jan 13 16:10:38 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   36C    P0    42W /  75W |    834MiB /  3903MiB |     56%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     23509      C   ./retina_r50                      621MiB |
+-----------------------------------------------------------------------------+
5.2 Inference
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ ./retina_r50 -d
436509us
30747us
30568us
... ... ...
29127us
28726us
28716us
number of detections -> 1433
 -> 515.075
after nms -> 257
5.3 Python inference
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
3.919330358505249
0.03155779838562012
0.031530141830444336
0.03136157989501953
0.03149151802062988
0.0314486026763916
0.03205513954162598
0.03142070770263672
0.03142905235290527
0.03143477439880371
6. TensorRT FP16 Inference
Accelerating yolov5 Inference with TensorRT (Part 1)
In retina_r50.cpp, set the precision macro to USE_FP16.
7. TensorRT INT8 Inference
7.1 Calibration dataset
7.1.1 Download the calibration dataset
download my calibration images widerface_calib from GoogleDrive or BaiduPan pwd: a9wh
7.1.2 Unzip it into the retinaface/build directory
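Before starting the long INT8 build, it can help to confirm that the calibration images really ended up where the build expects them. The sketch below assumes the archive unpacks to a folder named widerface_calib inside the build directory and that the images are ordinary image files; `list_calibration_images` is a hypothetical helper, not part of tensorrtx.

```python
from pathlib import Path


def list_calibration_images(build_dir, folder="widerface_calib",
                            suffixes=(".jpg", ".jpeg", ".png")):
    """Return the sorted image files found in <build_dir>/<folder>."""
    calib_dir = Path(build_dir) / folder
    if not calib_dir.is_dir():
        raise FileNotFoundError(f"calibration folder not found: {calib_dir}")
    # Keep regular files with an image suffix; ignore anything else in the folder.
    return sorted(
        p for p in calib_dir.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    try:
        images = list_calibration_images(".")
        print(f"found {len(images)} calibration images")
    except FileNotFoundError as err:
        print(err)
```

Run it from retinaface/build; an empty result or a missing-folder message suggests the archive was extracted to the wrong place.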
7.2 Modify retina_r50.cpp
USE_INT8
7.3 Build with make -j8
make -j8
7.4 Generate the engine
./retina_r50 -s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -s
Loading weights: ../retinaface.wts
Your platform support int8: 1
Building engine, please wait for a while...
reading calib cache: r50_int8calib.table
2--Demonstration_2_Demonstration_Political_Rally_2_488.jpg 0
29--Students_Schoolkids_29_Students_Schoolkids_Students_Schoolkids_29_517.jpg 1
39--Ice_Skating_39_Ice_Skating_Ice_Skating_39_344.jpg 2
... ... ...
61--Street_Battle_61_Street_Battle_streetfight_61_566.jpg 998
2--Demonstration_2_Demonstration_Demonstration_Or_Protest_2_260.jpg 999
reading calib cache: r50_int8calib.table
writing calib cache: r50_int8calib.table size: 12200
Build engine successfully!
real 7m25.594s
user 5m58.694s
sys 1m34.686s
The generated engine file is 30.1MB.
7.4.1 GPU memory usage
Thu Jan 13 15:42:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   39C    P0    45W /  75W |   1073MiB /  3903MiB |     86%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1623      G   /usr/lib/xorg/Xorg                209MiB |
|    0   N/A  N/A     22413      C   ./retina_r50                      860MiB |
+-----------------------------------------------------------------------------+
7.5 Inference
./retina_r50 -d
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ time ./retina_r50 -d
424574us
13240us
14247us
... ... ...
11711us
11662us
11103us
number of detections -> 1382
 -> 11.1058
after nms -> 246
7.6 Python inference
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/retinaface$ python retinaface_trt.py
3.9951412677764893
0.014085054397583008
0.014075279235839844
0.013991594314575195
0.014072656631469727
0.014059305191040039
0.014052867889404297
0.014079093933105469
0.01405954360961914
0.014012575149536133
8. RetinaFace Performance Analysis
Performance analysis of the RetinaFace face detector
| Precision | Inference time |
| --- | --- |
| FP32 | 29ms |
| FP16 | 15ms |
| INT8 | 11ms |
Summary: FP16 is roughly twice as fast as FP32 (15 ms vs. 29 ms), while INT8 brings only a modest further gain over FP16 (11 ms vs. 15 ms).
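The ratios behind this summary can be worked out directly from the three latencies in the table. The snippet below is a small sketch of that arithmetic; the dictionary just restates the table values, and `speedup` is a helper name introduced here for illustration.

```python
# Latencies in milliseconds, copied from the table in this section.
latency_ms = {"FP32": 29.0, "FP16": 15.0, "INT8": 11.0}


def speedup(baseline, candidate, table=latency_ms):
    """How many times faster `candidate` is than `baseline`."""
    return table[baseline] / table[candidate]


if __name__ == "__main__":
    for base, cand in [("FP32", "FP16"), ("FP16", "INT8"), ("FP32", "INT8")]:
        print(f"{cand} vs {base}: {speedup(base, cand):.2f}x")
    # FP16 vs FP32: 1.93x
    # INT8 vs FP16: 1.36x
    # INT8 vs FP32: 2.64x
```

On this GTX 1650, the step from FP32 to FP16 accounts for most of the gain; going on to INT8 adds about another third.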
9. Problems You May Encounter
Q1: OpenCV and CUDA versions do not match, causing cmake to fail
CMake Error at /usr/local/opencv3.3.0/share/OpenCV/OpenCVConfig.cmake:108 (message):
  OpenCV static library was compiled with CUDA 10.2 support. Please, use the
  same version or rebuild OpenCV with CUDA 11.0
Call Stack (most recent call first):
  CMakeLists.txt:28 (find_package)
Cause: The OpenCV build and the active CUDA version do not match. The author compiled opencv3.3.0 against CUDA 10.2, so it must be used with CUDA 10.2, but the active CUDA version at the time was 11.0.
Solution: Rebuilding OpenCV is tedious, so simply switch back to CUDA 10.2. See the blog post [Switching between coexisting CUDA versions on Ubuntu](https://blog.csdn.net/m0_37605642/article/details/120098215).
Note: After switching the CUDA version, empty the build directory and run cmake again.
Q2: NvInfer.h not found
fatal error: NvInfer.h: No such file or directory | TensorRT error handling | [Solved]
yichao@yichao:~/MyDocuments/tensorrtx/retinaface/build$ make -j8
[ 12%] Building NVCC (Device) object CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o
In file included from /home/yichao/MyDocuments/tensorrtx/retinaface/decode.cu:1:0:
/home/yichao/MyDocuments/tensorrtx/retinaface/decode.h:6:10: fatal error: NvInfer.h: No such file or directory
 #include "NvInfer.h"
          ^~~~~~~~~~~
compilation terminated.
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:220 (message):
  Error generating
  /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
Cause: NvInfer.h is a header that ships with TensorRT, and the compiler must be able to find it when building the C++ code.
Solution: In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, add the TensorRT include and library directories:
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
Q3: TensorRT 8 is not supported
32 errors detected in the compilation of "/tmp/tmpxft_00003bbc_00000000-6_decode.cpp1.ii".
-- Removing /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
/home/yichao/360Downloads/cmake-3.21.1-linux-x86_64/bin/cmake -E rm -f /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
CMake Error at decodeplugin_generated_decode.cu.o.Debug.cmake:280 (message):
  Error generating file
  /home/yichao/MyDocuments/tensorrtx/retinaface/build/CMakeFiles/decodeplugin.dir//./decodeplugin_generated_decode.cu.o
CMakeFiles/decodeplugin.dir/build.make:75: recipe for target 'CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o' failed
make[2]: *** [CMakeFiles/decodeplugin.dir/decodeplugin_generated_decode.cu.o] Error 1
make[2]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
make[1]: Leaving directory '/home/yichao/MyDocuments/tensorrtx/retinaface/build'
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
Cause: The TensorRT paths in CMakeLists.txt are wrong. The TensorRT version used by make must match the TensorRT version installed on the system.
Solution: In /home/yichao/MyDocuments/tensorrtx/retinaface/CMakeLists.txt, change the TensorRT configuration from
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/include)
link_directories(/home/yichao/360Downloads/TensorRT-8.0.1.6/lib/)
to
# tensorRT
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
Q4: cannot find -lnvinfer
Four ways to fix "/usr/bin/ld: cannot find -lXXX" when running make
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/decodeplugin.dir/build.make:90: recipe for target 'libdecodeplugin.so' failed
make[2]: *** [libdecodeplugin.so] Error 1
CMakeFiles/Makefile2:86: recipe for target 'CMakeFiles/decodeplugin.dir/all' failed
make[1]: *** [CMakeFiles/decodeplugin.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
Cause: The linker cannot find the nvinfer library. The file it looks for is named "libnvinfer.so"; the naming rule is lib + library name (the xxx in -lxxx) + .so.
Solution:
1. Locate libnvinfer.so
(with find)
find / -name libnvinfer.so
or (with locate)
locate libnvinfer.so
# output
/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so
2. Create a symbolic link
sudo ln -s /home/yichao/360Downloads/TensorRT-7.1.3.4/lib/libnvinfer.so /usr/lib/libnvinfer.so
Q5: Source code error
Accelerating yolov5 Inference with TensorRT (Part 2)
make[2]: *** [CMakeFiles/retina_r50.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp: In member function ‘virtual bool Int8EntropyCalibrator2::getBatch(void**, const char**, int)’:
/home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52:131: error: too many arguments to function ‘cv::Mat cv::dnn::experimental_dnn_v1::blobFromImages(const std::vector<cv::Mat>&, double, cv::Size, const Scalar&, bool)’
 cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
 ^
compilation terminated due to -Wfatal-errors.
CMakeFiles/retina_mnet.dir/build.make:75: recipe for target 'CMakeFiles/retina_mnet.dir/calibrator.cpp.o' failed
make[2]: *** [CMakeFiles/retina_mnet.dir/calibrator.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:138: recipe for target 'CMakeFiles/retina_mnet.dir/all' failed
make[1]: *** [CMakeFiles/retina_mnet.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:112: recipe for target 'CMakeFiles/retina_r50.dir/all' failed
make[1]: *** [CMakeFiles/retina_r50.dir/all] Error 2
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
Cause: A source-level incompatibility at /home/yichao/MyDocuments/tensorrtx/retinaface/calibrator.cpp:52. The call passes six arguments, but the blobFromImages signature reported in the error message (from the installed OpenCV 3.3.0) accepts only five.
Solution: Change the source line
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false, false);
to
cv::Mat blob = cv::dnn::blobFromImages(input_imgs_, 1.0, cv::Size(input_w_, input_h_), cv::Scalar(104, 117, 123), false);
Q6: Insufficient GPU memory
Cuda Error in allocate: 2 (out of memory) - GPU Memory Leak? #851
Engine generation fails because there is not enough GPU memory.
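One detail of the log in this section is easy to miss: the Python traceback ends with AttributeError: __enter__ rather than a memory error. A likely reading is that build_engine returned None after the out-of-memory errors, and the with statement then failed on that None. The sketch below shows one way to surface the real failure earlier. `build_and_save` and `EngineBuildError` are hypothetical names, and `build_fn` stands in for a call such as `lambda: builder.build_engine(network, config)`.

```python
class EngineBuildError(RuntimeError):
    """Raised when the builder hands back no engine."""


def build_and_save(build_fn, engine_path):
    """Call build_fn(), check the result, then write the serialized engine.

    Assumption: build_fn returns None on failure and otherwise an object
    with a serialize() method whose result can be written to a binary file.
    """
    engine = build_fn()
    if engine is None:
        # Fail with a clear message instead of letting `with None` blow up later.
        raise EngineBuildError(
            "engine build returned None; check the TensorRT log "
            "(for example an out-of-memory error during allocation)"
        )
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine_path
```

This does not fix the memory shortage itself; it only makes the failure point explicit so the TensorRT error lines are not hidden behind an unrelated exception.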
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Traceback (most recent call last):
  File "/media/yichao/蟻巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 146, in <module>
    main(args)
  File "/media/yichao/蟻巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 126, in main
    builder.create_engine(args.engine, args.precision)
  File "/media/yichao/蟻巢文件/YOYOFile/YOYOFile/demo/build_engine.py", line 118, in create_engine
    with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
AttributeError: __enter__
Cause: Building the engine with the Python API failed on my GeForce GTX 1650 (4GB) machine. The same test also failed on a Jetson TX2 (8GB) development board.
Explanation 1: Same problem. But this problem only happens when my system is 1080ti+tensorRT7.0+cuda10.0+centos7.6. When I change to 2080ti+tensorRT7.0, everything works fine.
Explanation 2: I face the problem with 1080 and no problem on 2080. And I don't found any debug means.