# YOLOV5-notes

**Repository Path**: MichaelCong/YOLOV5-notes

## Basic Information

- **Project Name**: YOLOV5-notes
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-03-24
- **Last Updated**: 2021-03-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# **yolov5 with Chinese annotations**

I have recently been working on object detection tasks that must end up as a shipped product deployed on an NVIDIA Jetson TX2 edge device. I first tried yolov4, but training was expensive and the final model was about 240 MB, which is rather large. I then tried yolov5 and found it excellent: it trains quickly, is accurate, and produces a small model.

- I. Standard yolov5 training
- II. TensorRT quantized deployment of the YOLOV5s 4.0 model

# I. Standard yolov5 training

## 0. Introduction

Reference: [ultralytics/yolov5](https://github.com/ultralytics/yolov5)

### **GPU test results:**

GeForce GTX 1660 SUPER, 5941.5MB

    python test.py
    Namespace(augment=False, batch_size=4, conf_thres=0.001, data='data/custom.yaml', device='0', exist_ok=False, img_size=416, iou_thres=0.6, name='exp', project='runs/test', save_conf=False, save_hybrid=False, save_json=False, save_txt=False, single_cls=False, task='val', verbose=False, weights='yolov5s.pt')
    YOLOv5 v1.0-2-g7b5b23a torch 1.7.1 CUDA:0 (GeForce GTX 1660 SUPER, 5941.5MB)
    Fusing layers...
    Model Summary: 232 layers, 7262700 parameters, 0 gradients, 16.8 GFLOPS
    val: Scanning '/home/rencong/Bullet/VOC2007/labels.cache' for images and labels.

| Class | Images | Targets | P | R | mAP@.5 | mAP@.5:.95 |
| :-: | :- | -: | -: | -: | -: | -: |
| all | 3.51e+03 | 1.17e+04 | 0.664 | 0.925 | 0.918 | 0.593 |
| Railway Left | 3.51e+03 | 482 | 0.673 | 0.91 | 0.91 | 0.516 |
| Railway Straight | 3.51e+03 | 2.07e+03 | 0.777 | 0.962 | 0.971 | 0.784 |
| Railway Right | 3.51e+03 | 966 | 0.635 | 0.959 | 0.955 | 0.669 |
| Pedestrian | 3.51e+03 | 4.74e+03 | 0.644 | 0.871 | 0.871 | 0.545 |
| Bullet Train | 3.51e+03 | 1.73e+03 | 0.772 | 0.961 | 0.957 | 0.724 |
| Helmet | 3.51e+03 | 1.31e+03 | 0.701 | 0.917 | 0.909 | 0.563 |
| Spanner | 3.51e+03 | 458 | 0.446 | 0.897 | 0.856 | 0.351 |

Speed: 10.9/0.9/11.7 ms inference/NMS/total per 416x416 image at batch-size 4

Results saved to runs/test/exp

### **CPU test results:**

Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz

    D:\anaconda3\envs\YOLOV5-notes\python.exe E:/MichaelCong/PROGRAMA/YOLOV5-notes/test.py
    Namespace(augment=False, batch_size=4, conf_thres=0.001, data='data/custom.yaml', device='cpu', exist_ok=False, img_size=416, iou_thres=0.6, name='exp', project='runs/test', save_conf=False, save_hybrid=False, save_json=False, save_txt=False, single_cls=False, task='val', verbose=False, weights='yolov5s.pt')
    YOLOv5 v1.0-5-gfa4fb25 torch 1.7.1 CPU
    Fusing layers...
    Model Summary: 232 layers, 7262700 parameters, 0 gradients
    val: Scanning 'E:\Bullet\VOCdevkit\VOC2007\labels.cache' for images and labels... 3511 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 3511/3511 [00:00

## 1. Model performance

| Model | AP^val | AP^test | AP₅₀ | Speed_GPU | FPS_GPU | params | FLOPS |
|---------- |------ |------ |------ | -------- | ------ |------ | :------: |
| [YOLOv5s](https://github.com/ultralytics/yolov5/releases/tag/v3.0) | 37.0 | 37.0 | 56.2 | **2.4ms** | **416** | 7.5M | 13.2B |
| [YOLOv5m](https://github.com/ultralytics/yolov5/releases/tag/v3.0) | 44.3 | 44.3 | 63.2 | 3.4ms | 294 | 21.8M | 39.4B |
| [YOLOv5l](https://github.com/ultralytics/yolov5/releases/tag/v3.0) | 47.7 | 47.7 | 66.5 | 4.4ms | 227 | 47.8M | 88.1B |
| [YOLOv5x](https://github.com/ultralytics/yolov5/releases/tag/v3.0) | **49.2** | **49.2** | **67.7** | 6.9ms | 145 | 89.0M | 166.4B |
| [YOLOv5x](https://github.com/ultralytics/yolov5/releases/tag/v3.0) + TTA | **50.8** | **50.8** | **68.9** | 25.5ms | 39 | 89.0M | 354.3B |
| [YOLOv3-SPP](https://github.com/ultralytics/yolov5/releases/tag/v3.0) | 45.6 | 45.5 | 65.2 | 4.5ms | 222 | 63.0M | 118.0B |

The comparison figure (not included here) plots the average end-to-end time per image when running inference on 5000 COCO val2017 images with batch size = 32 on a Tesla V100 GPU. This time includes image preprocessing, FP16 inference, post-processing, and NMS (non-maximum suppression). The EfficientDet data comes from the [google/automl](https://github.com/google/automl) repository (batch size = 8).

\*\* AP^test denotes COCO [test-dev2017](http://cocodataset.org/#upload) server results, all other AP results in the table denote val2017 accuracy.

\*\* All AP numbers are for single-model single-scale without ensemble or test-time augmentation. **Reproduce** by `python test.py --data coco.yaml --img 640 --conf 0.001`

\*\* Speed_GPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP [n1-standard-16](https://cloud.google.com/compute/docs/machine-types#n1_standard_machine_types) instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img.
**Reproduce** by `python test.py --data coco.yaml --img 640 --conf 0.1`

\*\* All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).

\*\* Test Time Augmentation ([TTA](https://github.com/ultralytics/yolov5/issues/303)) runs at 3 image sizes. **Reproduce** by `python test.py --data coco.yaml --img 832 --augment`

## 2. Training environment and dependencies

The official yolov5 documentation requires Python ≥ 3.8. Python 3.7 also worked for me, but ≥ 3.8 is still recommended. All other dependencies are listed in [requirements.txt](https://github.com/wudashuo/yolov5/blob/master/requirements.txt). To install them in one step, open a terminal, `cd` into the yolov5 folder, and run:

```bash
cd yolov5
pip install -r requirements.txt
```

If pip is slow, configure a mirror. The commands below use the Tsinghua mirror.

```bash
pip install pip -U
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

To use a different mirror, simply replace the URL. Commonly used mirrors in mainland China:

```yaml
Douban: https://pypi.doubanio.com/simple/
NetEase: https://mirrors.163.com/pypi/simple/
Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
Tencent Cloud: https://mirrors.cloud.tencent.com/pypi/simple
Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple/
```

## 3. Model training

### 3.1. Quick training / reproducing the official training

First, download the COCO dataset. The download script is at the following link:

https://github.com/wudashuo/yolov5/blob/master/data/scripts/get_coco.sh

Then run the command below. Use the largest `--batch-size` your GPU allows (the batch sizes shown are the recommended values for a GPU with 16 GB of memory; adjust them to your own hardware).

```bash
$ python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64
# Recommended batch size per model config:
#   yolov5s.yaml  64
#   yolov5m.yaml  40
#   yolov5l.yaml  24
#   yolov5x.yaml  16
```

Training the four models yolov5s/m/l/x on the COCO dataset with a single V100 GPU takes 2/4/6/8 days respectively.

### 3.2. Training on custom data

#### 3.2.1 Preparing the labels

The yolo code expects labels as txt files. Each label file has the same name as its image, except that the extension is `.txt`.

```
| class | x        | y        | w         | h         |
| 1     | 0.62002  | 0.778645 | 0.1832386 | 0.418402  |
| 1     | 0.61647  | 0.842013 | 0.19318   | 0.31597   |
| 2     | 0.6534   | 0.601    | 0.0767045 | 0.0711805 |
```

The format in detail:

- One object per line. If an image contains no objects, no txt file is needed.
- Each line has the form `class_num x_center y_center width height`.
- `class_num` ranges from `0` to `total_class - 1`. The four box values `x_center` `y_center` `width` `height` are normalized by the image resolution to the range `0-1`, with the top-left corner at `(0,0)` and the bottom-right corner at `(1,1)`.

#### 3.2.2 Label layout conventions

Unlike the DarkNet version of yolo, images and labels must be stored separately. The yolov5 code finds the label for each image by replacing `/images/*.jpg` in the image path with `/labels/*.txt`. You therefore need two folders: one named `images` for the images and one named `labels` for the label txt files. If you split the data into training, validation, and test sets, create a sub-folder for each split as well.

### 3.3 Preparing the yaml files

Custom training requires editing two .yaml files: a model file and a data file.

- Model file: depending on the model you want to train, edit `yolov5s.yaml` / `yolov5m.yaml` / `yolov5l.yaml` / `yolov5x.yaml` in `./models` directly. Only the 80 in `nc: 80` needs to change, to the number of classes in your dataset. The rest describes the model structure and does not need to be modified.
- Data file: create your own data file based on the coco data file in `./data`. It defines the training, validation, and test set paths, the total number of classes, and the class names.

```yaml
# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ../coco128/images/train2017/
val: ../coco128/images/val2017/
test: ../coco128/images/test2017/

# number of classes
nc: 80

# class names
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
        'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
        'teddy bear', 'hair drier', 'toothbrush']
```

### 3.4 Running the training

To train, run `train.py` directly and add arguments as needed: `--weights` sets the weights, `--cfg` the model file, `--data` the data file, `--batch-size` the batch size, `--epochs` the number of epochs, and `--device` the device. A simple training command:

```bash
# Train the yolov5s model on the coco128 dataset for 5 epochs with batch size 16
$ python train.py --batch 16 --epochs 5 --data ./data/coco128.yaml --cfg ./models/yolov5s.yaml --weights ''
```

#### 3.4.1 Training arguments

Arguments that take a value:

- `--weights` (**☆**) weights to start from. If omitted, the COCO-pretrained `yolov5s.pt` is used; `--weights ''` initializes the weights randomly
- `--cfg` (**☆**) model file
- `--data` (**☆**) data file
- `--hyp` hyperparameter file
- `--epochs` (**☆**) number of epochs, default 300
- `--batch-size` (**☆**) batch size, default `16`. The official recommendation is the larger the better, so use the largest `batch size` your GPU can handle. Short form: `--batch`
- `--img-size` training image size, default `640`. Short form: `--img`
- `--name` results file name, default `result.txt`
- `--device` (**☆**) training device(s), e.g. `--device 0,1,2,3`
- `--local_rank` distributed training parameter; do not change it yourself!
- `--logdir` where the training run is stored, default `./runs`
- `--workers` number of dataloader workers, default `8`

Flags without a value (enabled when present):

- `--rect` rectangular training
- `--resume` resume training, by default from the most recent run
- `--nosave` do not save models during training; only the final checkpoint is saved
- `--notest` do not evaluate on the validation set during training; evaluate only after training finishes
- `--noautoanchor` disable the automatic anchor check
- `--evolve` hyperparameter evolution
- `--bucket` use a gsutil bucket
- `--cache-images` train with cached images, which is faster
- `--image-weights` weight images during training
- `--multi-scale` vary the training image size by +/-50%
- `--single-cls` single-class training
- `--adam` use the torch.optim.Adam() optimizer
- `--sync-bn` use SyncBatchNorm, only available in distributed training

## 4. Model detection

Inference supports several kinds of input: images, videos, folders, RTSP video streams, and HTTP media streams.

### 4.1. Quick detection command

Run `detect.py` directly and point it at the source to run inference on. If no weights are specified, the default COCO-pretrained weights are downloaded automatically. Manual download: [Google Drive](https://drive.google.com/open?id=1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J), [Baidu Netdisk (upload pending)](https://pan.baidu.com/s/1Fo_5jqQfxVFBM2RUwVv4Xg) (extraction code: cong)

Inference results are saved to `./inference/output` by default. Note: every inference run clears the output folder, so keep a copy of any results you need.

```bash
# Quick inference; --source sets the detection source, and any of the following types is supported:
$ python detect.py --source 0  # default local webcam
                            data/file.jpg  # image
                            data/file.mp4  # video
                            path/  # all media in a folder
                            path/*.jpg  # media of one type in a folder
                            rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa  # RTSP video stream
                            http://112.50.243.8/PLTV/88888888/224/3221225900/1.m3u8  # HTTP video stream
```

### 4.2. Custom detection

Use the weights `./weights/yolov5s.pt` to run inference on all media in the `./inference/images` folder, with the confidence threshold set to 0.5:

```bash
$ python detect.py --source ./inference/images/ --weights ./weights/yolov5s.pt --conf 0.5
```

#### 4.2.1. Detection arguments

Add whichever arguments you need.

**Arguments that take a value**:

- `--source` (**required**) detection source
- `--weights` weights to use; if omitted, the yolov5s COCO-pretrained weights are used
- `--save-dir` output folder, default ./inference/output
- `--img-size` inference image resolution, default 640; `--img` also works
- `--conf-thres` confidence threshold, default 0.4; `--conf` also works
- `--iou-thres` IOU threshold for NMS (non-maximum suppression), default 0.5
- `--device` device, e.g. `--device 0` `--device 0,1,2,3` `--device cpu`
- `--classes` detect only specific classes, e.g. `--classes 0 2 4 6 8`

**Flags without a value**:

- `--view-img` display the results as images
- `--save-txt` write the predicted labels (yolo format) to txt files
- `--save-conf` also write each object's confidence to the output label txt files
- `--agnostic-nms` use class-agnostic NMS
- `--augment` augmented inference, see [details](https://github.com/ultralytics/yolov5/issues/303)
- `--update` update all models

## 5. Model testing

### 5.1. Test command

To be clear about the difference: inference runs detection directly on images, whereas testing requires the images to have ground-truth labels. Testing amounts to running detection and then computing mAP between the predicted labels and the ground-truth labels.

The command below uses the weights `./weights/yolov5x.pt` to evaluate the test set defined in `./data/coco.yaml`, with the test images resized to a resolution of 672.

```bash
python test.py --weights ./weights/yolov5x.pt --data ./data/coco.yaml --img 672
```

### 5.2. Test arguments

**Arguments that take a value**:

- `--weights` weights used for testing, default is the yolov5s COCO-pretrained model
- `--data` the .yaml file used for testing, default `./data/coco128.yaml`
- `--batch-size` batch size used for testing, default 32; this value does not affect the results
- `--img-size` test set resolution, default 640; a higher resolution is recommended for testing
- `--conf-thres` object confidence threshold, default 0.001
- `--iou-thres` IOU threshold for NMS, default 0.65
- `--task` task mode: train, val, or test; for testing use `--task test`
- `--device` device, e.g. `--device 0` `--device 0,1,2,3` `--device cpu`

**Flags without a value**:

- `--save-json` save the results as json
- `--single-cls` treat the dataset as single-class
- `--augment` augmented inference
- ~~`--merge` use Merge NMS~~
- `--verbose` print the mAP of each class
- `--save-txt` write the predicted labels (yolo format) to txt files

## 6. Examples and tutorials

* [Train Custom Data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data)
* [Multi-GPU Training](https://github.com/ultralytics/yolov5/issues/475)
* [PyTorch Hub](https://github.com/ultralytics/yolov5/issues/36)
* [ONNX and TorchScript Export](https://github.com/ultralytics/yolov5/issues/251)
* [Test-Time Augmentation (TTA)](https://github.com/ultralytics/yolov5/issues/303)
* [Model Ensembling](https://github.com/ultralytics/yolov5/issues/318)
* [Model Pruning/Sparsity](https://github.com/ultralytics/yolov5/issues/304)
* [Hyperparameter Evolution](https://github.com/ultralytics/yolov5/issues/607)
* [TensorRT Deployment](https://github.com/wang-xinyu/tensorrtx)

# II. TensorRT quantized deployment of the YOLOV5s 4.0 model

To deploy the trained model on the TX2 embedded platform, this part walks through a TensorRT int8 quantized deployment of a yolov5 model: setting up the TensorRT environment, converting the PyTorch model to ONNX, quantizing the ONNX model to int8 with TensorRT, and running inference with the quantized model.

References:

1. https://mp.weixin.qq.com/s/debK5wVdNQTUAAse7W95JA
2. https://github.com/Wulingtian/yolov5_tensorrt_int8_tools
3. https://github.com/Wulingtian/yolov5_tensorrt_int8

## 1. Exporting yolov5 to ONNX

```bash
pip install onnx
pip install onnx-simplifier
cd models
gedit common.py
```

In `common.py`, replace the activation function in the `BottleneckCSP` class with ReLU, because TensorRT int8 quantization is unstable with LeakyReLU (this is a deep pitfall, so remember to avoid it). That is, change the line to `self.act = nn.ReLU(inplace=True)`.

Train the model to obtain the trained weights, then export them:

```bash
cd yolov5
python models/export.py --weights <path to the trained weights> --img-size <training input image size>
python3 -m onnxsim <onnx model name> yolov5s-simple.onnx
```

This produces the final simplified ONNX model.

## 2. Converting the ONNX model to an int8 TensorRT engine

```bash
git clone https://github.com/Wulingtian/yolov5_tensorrt_int8_tools.git
cd yolov5_tensorrt_int8_tools
gedit convert_trt_quant.py
```

Edit the following parameters:

- `BATCH_SIZE` number of images fed to the quantizer per batch
- `BATCH` number of quantization batches
- `height` `width` input image height and width
- `CALIB_IMG_DIR` path to the training images, used for calibration
- `onnx_model_path` path to the ONNX model

```bash
python convert_trt_quant.py
```

The quantized model is saved in the `models_save` directory.

## 3. TensorRT model inference

```bash
cd yolov5_tensorrt_int8
gedit CMakeLists.txt
```

Set the `USER_DIR` parameter to your own home directory.

```bash
gedit yolov5s_infer.cc
```

Edit the following parameters:

- `output_name1` `output_name2` `output_name3` the names of the model outputs (a yolov5 model has 3 outputs)

The output names can be looked up with netron:

```bash
pip install netron  # install netron
gedit netron_yolov5s.py
```

Paste in the following content:

```python
import netron

netron.start('<path to the simplified onnx model>', port=3344)
```

```bash
python netron_yolov5s.py
```

This displays the model graph, including the output names.

- `trt_model_path` the quantized TensorRT inference engine (the file with the `trt` suffix in the `models_save` directory)
- `test_img` path to the test image
- `INPUT_W` `INPUT_H` input image width and height
- `NUM_CLASS` number of classes the model was trained on
- `NMS_THRESH` NMS threshold
- `CONF_THRESH` confidence threshold

Once the parameters are configured, build and run:

```bash
mkdir build
cd build
cmake ..
make
./YoloV5sEngine
```

The program prints the average inference time and saves the prediction image to the current directory. Deployment is now complete!

## 4. Core code of TensorRT int8 quantization

```python
import os

import cv2
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

# `height`, `width`, `logger` and `TRT_LOGGER` are module-level names defined elsewhere in the script.


# Calibration preprocessing must match the training preprocessing so the data is aligned
def preprocess_v1(image_raw):
    h, w, c = image_raw.shape
    image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)
    # Calculate width and height and paddings
    r_w = width / w
    r_h = height / h
    if r_h > r_w:
        tw = width
        th = int(r_w * h)
        tx1 = tx2 = 0
        ty1 = int((height - th) / 2)
        ty2 = height - th - ty1
    else:
        tw = int(r_h * w)
        th = height
        tx1 = int((width - tw) / 2)
        tx2 = width - tw - tx1
        ty1 = ty2 = 0
    # Resize the image with long side while maintaining ratio
    image = cv2.resize(image, (tw, th))
    # Pad the short side with (128,128,128); pass the color as `value`,
    # because the 7th positional argument of copyMakeBorder is `dst`
    image = cv2.copyMakeBorder(
        image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, value=(128, 128, 128)
    )
    image = image.astype(np.float32)
    # Normalize to [0,1]
    image /= 255.0
    # HWC to CHW format:
    image = np.transpose(image, [2, 0, 1])
    # CHW to NCHW format
    # image = np.expand_dims(image, axis=0)
    # Convert the image to row-major order, also known as "C order":
    # image = np.ascontiguousarray(image)
    return image


# Build the IInt8EntropyCalibrator calibrator
class Calibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, stream, cache_file=""):
        trt.IInt8EntropyCalibrator.__init__(self)
        self.stream = stream
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        self.cache_file = cache_file
        stream.reset()

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()
        if not batch.size:
            return None
        cuda.memcpy_htod(self.d_input, batch)
        return [int(self.d_input)]

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                logger.info("Using calibration cache to save time: {:}".format(self.cache_file))
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            logger.info("Caching calibration data for future use: {:}".format(self.cache_file))
            f.write(cache)


# Load the ONNX model and build the TensorRT engine
def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",
               fp16_mode=False, int8_mode=False, calibration_stream=None,
               calibration_table_path="", save_engine=False):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""

    def build_engine(max_batch_size, save_engine):
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(1) as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:
            # parse onnx model file
            if not os.path.exists(onnx_file_path):
                quit('ONNX file {} not found'.format(onnx_file_path))
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                parser.parse(model.read())
                assert network.num_layers > 0, \
                    'Failed to parse ONNX model. Please check if the ONNX model is compatible'
            print('Completed parsing of ONNX file')
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))

            # build trt engine
            builder.max_batch_size = max_batch_size
            builder.max_workspace_size = 1 << 30  # 1GB
            builder.fp16_mode = fp16_mode
            if int8_mode:
                builder.int8_mode = int8_mode
                assert calibration_stream, 'Error: a calibration_stream should be provided for int8 mode'
                builder.int8_calibrator = Calibrator(calibration_stream, calibration_table_path)
                print('Int8 mode enabled')
            engine = builder.build_cuda_engine(network)
            if engine is None:
                print('Failed to create the engine')
                return None
            print("Completed creating the engine")
            if save_engine:
                with open(engine_file_path, "wb") as f:
                    f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, load it instead of building a new one.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine(max_batch_size, save_engine)
```

## 5. Core code of TensorRT inference

```cpp
// Data preprocessing matches the quantization preprocessing, so it is not shown here.
// Parse one of the model's three outputs and append the predicted bboxes.
// Template arguments were stripped from the source text; <int>, <Anchor>, <Bbox> and <float>
// below are reconstructed, and the element type name `Anchor` is an assumption.
void postProcessParall(const int height, const int width, int scale_idx, float postThres,
                       tensor_t *origin_output, vector<int> Strides, vector<Anchor> Anchors,
                       vector<Bbox> *bboxes)
{
    Bbox bbox;
    float cx, cy, w_b, h_b, score;
    int cid;
    const float *ptr = (float *)origin_output->pValue;
    for (unsigned long a = 0; a < 3; ++a) {
        for (unsigned long h = 0; h < height; ++h) {
            for (unsigned long w = 0; w < width; ++w) {
                // The lines that compute cid and score were lost from the source text.
                // Reconstruction, assuming score = objectness * best class probability.
                const float *cls_ptr = ptr + 5;
                cid = 0;
                for (int c = 1; c < NUM_CLASS; ++c) {
                    if (cls_ptr[c] > cls_ptr[cid]) cid = c;
                }
                score = sigmoid(ptr[4]) * sigmoid(cls_ptr[cid]);
                if (score >= postThres) {
                    // Box center: grid offset in [-0.5, 1.5) plus cell index, scaled by the stride
                    cx = (sigmoid(ptr[0]) * 2.f - 0.5f + static_cast<float>(w)) * static_cast<float>(Strides[scale_idx]);
                    cy = (sigmoid(ptr[1]) * 2.f - 0.5f + static_cast<float>(h)) * static_cast<float>(Strides[scale_idx]);
                    // Box size: (2 * sigmoid)^2 times the anchor size
                    w_b = powf(sigmoid(ptr[2]) * 2.f, 2) * Anchors[scale_idx * 3 + a].width;
                    h_b = powf(sigmoid(ptr[3]) * 2.f, 2) * Anchors[scale_idx * 3 + a].height;
                    // Clip the corners to the network input area
                    bbox.xmin = clip(cx - w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymin = clip(cy - h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.xmax = clip(cx + w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymax = clip(cy + h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.score = score;
                    bbox.cid = cid;
                    //std::cout<< "bbox.cid : " << bbox.cid << std::endl;
                    bboxes->push_back(bbox);
                }
                ptr += 5 + NUM_CLASS;
            }
        }
    }
}
```

# **Contact**

If you find a bug in the code, please raise it in an issue. Everyone is welcome to pool ideas. Personal contact:

# **LICENSE**

Thanks to the yolov5 authors for providing the open-source code. This project follows the official [LICENSE](https://github.com/ultralytics/yolov5/blob/master/LICENSE)
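
# **Supplementary sketch: box decoding**

The box arithmetic in the `postProcessParall` function from the TensorRT inference code can be hard to follow in C++, so here is a minimal Python sketch of the same formulas for a single grid cell and anchor. The function name `decode_cell`, its argument layout, and the example stride and anchor values are assumptions made for illustration; they are not part of the referenced repositories.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, as applied to the raw network outputs."""
    return 1.0 / (1.0 + math.exp(-x))


def clip(value: float, low: float, high: float) -> float:
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(value, high))


def decode_cell(raw, col, row, stride, anchor_wh, input_w, input_h):
    """Decode one raw prediction (tx, ty, tw, th) into a clipped corner box.

    Hypothetical helper that mirrors the formulas in postProcessParall:
      center = (2 * sigmoid(t) - 0.5 + cell index) * stride
      size   = (2 * sigmoid(t)) ** 2 * anchor size
    """
    tx, ty, tw, th = raw
    cx = (sigmoid(tx) * 2.0 - 0.5 + col) * stride
    cy = (sigmoid(ty) * 2.0 - 0.5 + row) * stride
    box_w = (sigmoid(tw) * 2.0) ** 2 * anchor_wh[0]
    box_h = (sigmoid(th) * 2.0) ** 2 * anchor_wh[1]
    xmin = clip(cx - box_w / 2, 0.0, input_w - 1)
    ymin = clip(cy - box_h / 2, 0.0, input_h - 1)
    xmax = clip(cx + box_w / 2, 0.0, input_w - 1)
    ymax = clip(cy + box_h / 2, 0.0, input_h - 1)
    return xmin, ymin, xmax, ymax


# Example values (assumed): cell (col=2, row=3) on a stride-8 feature map,
# anchor 10x13 pixels, 416x416 network input, all raw outputs equal to 0.
print(decode_cell((0.0, 0.0, 0.0, 0.0), 2, 3, 8, (10.0, 13.0), 416, 416))
# prints (15.0, 21.5, 25.0, 34.5)
```

With all raw outputs at zero, each sigmoid is 0.5, so the box is centered in the middle of the cell and has exactly the anchor size, which makes the formulas easy to check by hand.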