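YOLOv4 Inference with libtorch [Code Included]

Published: 2023-02-26 12:00

Unlike image classification or segmentation, object detection inference with libtorch needs an extra step: the raw network output has to be decoded into bounding boxes.

This project supports both CPU and GPU inference, and pruned models can also be used for inference.

For environment setup, see my other article:

Model Inference with TorchScript and libtorch [with C++ Code] (爱吃肉的鹏's blog, CSDN)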

Environment

Windows 10

PyTorch 1.7.0 (earlier versions should also work)

libtorch 1.7, Debug build

CUDA 10.2

Visual Studio 2017

NVIDIA GTX 1650, 4 GB

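The overall inference pipeline is:

1. Define the model.

2. Resize the input image to the network input shape and convert it to a tensor, tensor_image.

3. Feed the image to the network to obtain the output: output = model(tensor_image).

4. Take the three YOLO heads and decode the bounding boxes in output.

5. Filter the decoded output from step 4 by confidence and NMS.

6. Draw the remaining boxes and related information on the original image.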

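1. Model definition

Define a YOLOV4 class whose constructor takes the model path and loads the model.

In the code, is_cuda_available() checks whether CUDA is available.

The weights at model_path are a .pt file converted from the PyTorch .pth weights.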

YOLOV4::YOLOV4(std::string& model_path)
{
	// Load the converted .pt model
	model = torch::jit::load(model_path);
	// Run on the GPU when CUDA is available, otherwise on the CPU
	if (is_cuda_available())
	{
		model.to(torch::kCUDA);
	}
	else
	{
		model.to(torch::kCPU);
	}
}

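2. image2tensor

First resize the image to the network input size. The function below resizes the image without distortion (the aspect ratio is kept) and pads the resized image with gray bars so that the result matches the network input size.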

cv::Mat letterbox_image(cv::Mat image, float size[])
{
	// Original image size
	float iw = image.cols, ih = image.rows;
	// Network input size
	float w = size[0], h = size[1];
	float scale = std::min(w / iw, h / ih);
	// Size after scaling
	int nw = int(iw * scale), nh = int(ih * scale);
	// Resize while keeping the aspect ratio
	cv::resize(image, image, { nw, nh });
	// Create the gray canvas; cv::Mat takes (rows, cols), i.e. (height, width)
	cv::Mat new_image(int(h), int(w), CV_8UC3, cv::Scalar(128, 128, 128));
	// Select the centered region of the canvas and copy the resized image into it
	cv::Rect roi_rect = cv::Rect((w - nw) / 2, (h - nh) / 2, nw, nh);
	image.copyTo(new_image(roi_rect));
	return new_image;
}
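To make the letterbox arithmetic concrete, here is a minimal sketch that computes only the geometry (resized size and gray-bar offsets) without touching OpenCV. The struct and function names are hypothetical and not part of the project. It also swaps the float scale for integer arithmetic, which is my own choice to avoid truncation surprises, so treat it as an illustration of the idea rather than the project's code.

```cpp
#include <cassert>

// Placement of the resized image inside the padded network input.
struct LetterboxGeometry {
    int new_w;  // width of the resized image
    int new_h;  // height of the resized image
    int pad_x;  // width of the gray bar on the left
    int pad_y;  // height of the gray bar on the top
};

// Hypothetical helper: iw x ih is the original image, w x h the network input.
LetterboxGeometry letterbox_geometry(int iw, int ih, int w, int h)
{
    assert(iw > 0 && ih > 0 && w > 0 && h > 0);
    LetterboxGeometry g{};
    // w/iw <= h/ih is equivalent to w*ih <= h*iw, so no division is needed
    if (static_cast<long long>(w) * ih <= static_cast<long long>(h) * iw) {
        // Width is the limiting side: fill it and shrink the height
        g.new_w = w;
        g.new_h = static_cast<int>(static_cast<long long>(ih) * w / iw);
    } else {
        // Height is the limiting side: fill it and shrink the width
        g.new_h = h;
        g.new_w = static_cast<int>(static_cast<long long>(iw) * h / ih);
    }
    // Center the resized image; the remainder is filled with gray
    g.pad_x = (w - g.new_w) / 2;
    g.pad_y = (h - g.new_h) / 2;
    return g;
}
```

For a 1280x720 frame and a 416x416 input, this gives a 416x234 resized image with 91-pixel gray bars at the top and bottom.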

Convert the image from BGR to RGB, convert it to a Tensor, and then permute the dimensions so that the layout becomes (batch_size, channels, H, W).

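// Convert the channel order and scale the pixel values to [0, 1]
cv::cvtColor(crop_img, crop_img, cv::COLOR_BGR2RGB);
crop_img.convertTo(crop_img, CV_32FC3, 1.f / 255.f);
// Wrap the pixel buffer as a (1, H, W, 3) tensor and move it to the GPU
// (use torch::kCPU here for CPU inference)
auto tensor_image = at::from_blob(crop_img.data, { 1, crop_img.rows, crop_img.cols, 3 }).to(torch::kCUDA);
// (1, H, W, 3) --> (1, 3, H, W)
tensor_image = tensor_image.permute({ 0,3,1,2 }).contiguous();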

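3. Get the output tensor

model.forward() returns the output. Note that the YOLO output is a tuple (out1, out2, out3), so unlike classification and segmentation you cannot call model.forward().toTensor() directly; you have to call toTuple() instead. This is an easy thing to trip over in libtorch.

Each of the three heads is then converted with toTensor().

At this point each of the three outputs has shape [batch_size, 255, feature_map[0], feature_map[1]].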

// Build the input list
std::vector<torch::jit::IValue> input;
input.emplace_back(tensor_image);

auto outputs = model.forward(input).toTuple();
// Extract the three heads
std::vector<at::Tensor> output(3);
output[0] = outputs->elements()[0].toTensor().to(at::kCPU);
output[1] = outputs->elements()[1].toTensor().to(at::kCPU);
output[2] = outputs->elements()[2].toTensor().to(at::kCPU);
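Where the 255 channels and the feature-map sizes come from can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper, not project code. It assumes three anchors per cell and head strides of 32, 16 and 8 in that order, which is consistent with the 13, 26 and 52 grids mentioned later for a 416x416 input but is not stated explicitly in this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Expected shape of one head, without the batch dimension.
struct HeadShape {
    int channels;  // num_anchors * (5 + num_classes)
    int grid_h;    // feature-map height
    int grid_w;    // feature-map width
};

// Hypothetical helper; the strides 32, 16, 8 are an assumption.
std::array<HeadShape, 3> head_shapes(int input_h, int input_w, int num_classes)
{
    const int num_anchors = 3;
    const std::array<int, 3> strides = { 32, 16, 8 };
    // The input must be divisible by the largest stride
    assert(input_h % 32 == 0 && input_w % 32 == 0);
    std::array<HeadShape, 3> shapes{};
    for (std::size_t i = 0; i < strides.size(); ++i) {
        // 4 box values + 1 objectness score + one score per class, for each anchor
        shapes[i].channels = num_anchors * (5 + num_classes);
        shapes[i].grid_h = input_h / strides[i];
        shapes[i].grid_w = input_w / strides[i];
    }
    return shapes;
}

// Total number of candidate boxes over all heads (3 anchors per cell).
int total_boxes(const std::array<HeadShape, 3>& shapes)
{
    int total = 0;
    for (const HeadShape& s : shapes) {
        total += 3 * s.grid_h * s.grid_w;
    }
    return total;
}
```

With a 416x416 input and 80 classes, each head has 3 * (5 + 80) = 255 channels, the grids are 13x13, 26x26 and 52x52, and the heads together produce 10647 candidate boxes.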

4. Decode the bounding boxes in output

This part is important. Here is the code first:

	std::vector<at::Tensor> feature_out(3);
	for (size_t i = 0; i < 3; i++)
	{
		if (i == 0) feature_out[0] = yolo_decodes1.decode_box(output[0]);
		if (i == 1) feature_out[1] = yolo_decodes2.decode_box(output[1]);
		if (i == 2) feature_out[2] = yolo_decodes3.decode_box(output[2]);
	}

	// Concatenate along dimension 1; shape: (bs, 3*(13*13+26*26+52*52), 5+num_classes)
	at::Tensor out = at::cat({ feature_out[0], feature_out[1], feature_out[2] }, 1);

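Here yolo_decodes1, yolo_decodes2 and yolo_decodes3 are three DecodeBox objects; each constructor has already been given the anchors of its feature layer and the input size.

The C++ DecodeBox class is defined as follows. anchors holds the anchor boxes as 3 rows by 2 columns: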

class DecodeBox
{
public:
	DecodeBox(float t_anchors[][2], float t_image_size[]);
	at::Tensor decode_box(at::Tensor input);
private:
	/**************************************************************************
	 * Anchor boxes (priors) of this feature layer
	 **************************************************************************/
	float anchors[3][2];

	float image_size[2];

	int num_anchors = 3;

	int num_classes = 80;

	int bbox_attrs = num_classes + 5;
};

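In the constructor, t_anchors[][2] holds the anchor boxes of one prediction feature map (each feature-map scale has its own set), and t_image_size[] is the size of the image fed to the network.

t_anchors and t_image_size[] are taken from the arrays below; my input size here is 416*416.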

float all_anchors[3][3][2] = { {{142, 110}, {192, 243}, {459, 401}},
								  {{36,   75}, {76,   55}, {72,  146}},
								  {{12,   16}, {19,   36}, {40,   28}} };
float model_image_size[2] = { 416, 416 };
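The anchors above are given in input-image pixels, while decoding works in feature-map cells, so each head divides its anchors by its stride. The sketch below shows that conversion on its own. The function name is hypothetical, and pairing the first anchor row with stride 32 is an assumption based on the largest anchors belonging to the coarsest grid.

```cpp
#include <cassert>

// Hypothetical helper: convert the pixel-space anchors of one head into
// feature-map units by dividing by that head's stride.
void scale_anchors(const float anchors[3][2], int stride, float scaled[3][2])
{
    assert(stride > 0);
    for (int i = 0; i < 3; ++i) {
        scaled[i][0] = anchors[i][0] / stride;  // anchor width in cells
        scaled[i][1] = anchors[i][1] / stride;  // anchor height in cells
    }
}
```

For example, with stride 32 the anchor (142, 110) becomes (4.4375, 3.4375) cells on the 13x13 grid.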

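The constructor is implemented as follows. It takes the anchor parameters of one layer and the input shape, and copies them into the private members anchors and image_size of DecodeBox.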

DecodeBox::DecodeBox(float t_layer_anchors[][2], float t_model_image_size[])
{
	// Copy the anchors of the current feature map
	for (size_t i = 0; i < 3; i++)
		for (size_t j = 0; j < 2; j++)
			anchors[i][j] = t_layer_anchors[i][j];

	// Copy the model input image size
	for (size_t i = 0; i < 2; i++)
		image_size[i] = t_model_image_size[i];
}

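DecodeBox defines the decode_box function, which takes a tensor as input (one head of the model(image) output). The overall procedure is:

1. Read the dimensions of input; input.shape = [batch_size, 255, feature_map[0], feature_map[1]].

2. Compute the scale factor (stride).

3. Rescale the anchors to the size of the current feature layer.

4. Reshape [batch_size, 255, feature_map[0], feature_map[1]] to [batch_size, 3, feature_map[0], feature_map[1], 5+num_classes].

5. The first 4 values of prediction are the box parameters and need to be adjusted, the 5th value is the confidence, and the remaining values are the class probabilities.

In libtorch, a slice like prediction[..., 5:] is written as

prediction.index({ "...",torch::indexing::Slice{5, torch::indexing::None} })

6. Divide the feature layer into a grid and generate the anchor boxes, converting the anchor array to a tensor of shape (bs, 3, h, w).

7. Adjust the anchor boxes with the predictions to get the final output of shape (batch_size, 3*h*w, 5+num_classes), where the last dimension holds (x, y, w, h, conf, pred_cls).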

at::Tensor DecodeBox::decode_box(at::Tensor input)
{
	// Read the tensor dimensions
	int batch_size = input.size(0);
	int input_height = input.size(2);
	int input_width = input.size(3);
	// Stride: input-image pixels per feature-map cell
	int stride_w = image_size[0] / input_width;
	int stride_h = image_size[1] / input_height;

	// Anchors rescaled to the units of this feature layer
	float scaled_anchors[3][2];
	for (int i = 0; i < 3; i++)
	{
		scaled_anchors[i][0] = anchors[i][0] / stride_w;
		scaled_anchors[i][1] = anchors[i][1] / stride_h;
	}

	// (bs, 3*(5+num_classes), h, w)  -->  (bs, 3, h, w, (5+num_classes))
	at::Tensor prediction = input.view({ batch_size, num_anchors, bbox_attrs, input_height, input_width }).permute({ 0, 1, 3, 4, 2 }).contiguous();
	// Offsets of the box center inside its grid cell
	at::Tensor x = at::sigmoid(prediction.index({ "...", 0 }));
	at::Tensor y = at::sigmoid(prediction.index({ "...", 1 }));
	// Width and height adjustment parameters
	at::Tensor w = prediction.index({ "...", 2 });
	at::Tensor h = prediction.index({ "...", 3 });
	// Objectness confidence
	at::Tensor conf = at::sigmoid(prediction.index({ "...", 4 }));
	// Per-class confidence
	at::Tensor pred_cls = at::sigmoid(prediction.index({ "...",torch::indexing::Slice{5, torch::indexing::None} }));

	// Build the grid of cell top-left corners, shape (bs, 3, h, w):
	// grid_x repeats the column indices over h rows, grid_y repeats the row indices over w columns
	at::Tensor grid_x = at::linspace(0, input_width - 1, input_width).repeat({ input_height, 1 }).repeat({ batch_size*num_anchors, 1, 1 }).view({ x.sizes() }).toType(torch::kFloat);
	at::Tensor grid_y = at::linspace(0, input_height - 1, input_height).repeat({ input_width, 1 }).t().repeat({ batch_size*num_anchors, 1, 1 }).view({ y.sizes() }).toType(torch::kFloat);

	// Lay the anchor widths and heights out on the grid (array converted to tensor), final shape (bs, 3, h, w)
	at::Tensor anchor_w = at::from_blob(scaled_anchors, { 3, 2 }, at::kFloat).index_select(1, at::tensor(0).toType(at::kLong))
		.repeat({ batch_size, input_height*input_width }).view(w.sizes());
	at::Tensor anchor_h = at::from_blob(scaled_anchors, { 3, 2 }, at::kFloat).index_select(1, at::tensor(1).toType(at::kLong))
		.repeat({ batch_size, input_height*input_width }).view(h.sizes());
	/*
	Adjust the anchors with the predictions:
	first shift the center from the cell's top-left corner toward the bottom right,
	then rescale the anchor width and height.
	*/
	at::Tensor pred_boxes = at::zeros({ prediction.index({"...", torch::indexing::Slice({torch::indexing::None, 4})}).sizes() }).toType(at::kFloat);
	// Fill in the adjusted values, still in feature-map units
	pred_boxes.index_put_({ "...", 0 }, (x.data() + grid_x));
	pred_boxes.index_put_({ "...", 1 }, (y.data() + grid_y));
	pred_boxes.index_put_({ "...", 2 }, (at::exp(w.data()) * anchor_w));
	pred_boxes.index_put_({ "...", 3 }, (at::exp(h.data()) * anchor_h));

	// Scale factors that map (x, y, w, h) from feature-map units back to input-image pixels
	at::Tensor _scale = at::tensor({ stride_w, stride_h, stride_w, stride_h }).toType(at::kFloat);
	// Concatenate the result: (bs, 3*h*w, 5+num_classes) with (x, y, w, h, conf, pred_cls) in the last dimension
	at::Tensor output = at::cat({ pred_boxes.view({batch_size, -1, 4}) * _scale,
								conf.view({batch_size, -1, 1}),
								pred_cls.view({batch_size, -1, num_classes}) }, - 1);

	return output.data();
}
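The tensor code above applies the same formula to every cell and anchor at once. As a minimal scalar sketch of what happens for a single prediction, the hypothetical function below decodes one (tx, ty, tw, th) tuple for one anchor in one grid cell, using the same steps: sigmoid on the center offsets, exp on the size terms, anchors in feature-map units, and a final multiplication by the stride. The names are mine, and a single stride is assumed for both axes.

```cpp
#include <cassert>
#include <cmath>

// Decoded box in input-image pixels, center format.
struct Box {
    float cx, cy, w, h;
};

inline float sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// Hypothetical scalar version of the decode step for one anchor in cell (col, row).
// anchor_w and anchor_h are in input-image pixels.
Box decode_cell(float tx, float ty, float tw, float th,
                int col, int row, float anchor_w, float anchor_h, int stride)
{
    assert(stride > 0);
    // Anchor size in feature-map units, as in scaled_anchors
    const float aw = anchor_w / stride;
    const float ah = anchor_h / stride;
    Box b;
    // Center: cell top-left corner plus an offset in (0, 1), scaled back to pixels
    b.cx = (sigmoid(tx) + col) * stride;
    b.cy = (sigmoid(ty) + row) * stride;
    // Size: anchor scaled by exp of the raw prediction, scaled back to pixels
    b.w = std::exp(tw) * aw * stride;
    b.h = std::exp(th) * ah * stride;
    return b;
}
```

With all raw values at zero, the box sits at the center of its cell and has exactly the anchor size: cell (6, 6) on the stride-32 grid with anchor (142, 110) decodes to center (208, 208) and size 142x110.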

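5. Filter by confidence and NMS

First convert the boxes to corner coordinates, which the IoU computation in NMS needs. Before NMS, the detections are filtered by a confidence threshold. The nms used in the code (nms_cpu) is the officially provided implementation.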

std::vector<at::Tensor> yolo_nms(at::Tensor prediction, int num_classes, float conf_thres, float nms_thres)
{
	// prediction: (bs, 3*(13*13+26*26+52*52), 5+num_classes)
	at::Tensor box_corner = at::zeros(prediction.sizes());
	// Convert (cx, cy, w, h) to the top-left and bottom-right corners
	box_corner.index_put_({ "...", 0 }, prediction.index({ "...", 0 }) - prediction.index({ "...", 2 }) / 2);
	box_corner.index_put_({ "...", 1 }, prediction.index({ "...", 1 }) - prediction.index({ "...", 3 }) / 2);
	box_corner.index_put_({ "...", 2 }, prediction.index({ "...", 0 }) + prediction.index({ "...", 2 }) / 2);
	box_corner.index_put_({ "...", 3 }, prediction.index({ "...", 1 }) + prediction.index({ "...", 3 }) / 2);
	// Write x1 y1 x2 y2 back into prediction
	prediction.index_put_({ "...", torch::indexing::Slice(torch::indexing::None,4) }, box_corner.index({ "...", torch::indexing::Slice(torch::indexing::None,4) }));

	std::vector<at::Tensor> nms_output;
	// Only the first image of the batch is processed
	at::Tensor output = prediction[0];
	// Best class score and its index for every box
	std::tuple<at::Tensor, at::Tensor> temp = at::max(output.index({ "...", torch::indexing::Slice(5, 5 + num_classes) }), 1, true);
	at::Tensor class_conf = std::get<0>(temp);
	at::Tensor class_pred = std::get<1>(temp);

	// Filter by confidence: objectness * best class score
	at::Tensor conf_mask = (output.index({ "...", 4 }) * class_conf.index({ "...", 0 }) >= conf_thres).squeeze();

	// Keep only the boxes that contain an object
	output = output.index({ conf_mask });

	// No object: return the empty result
	if (output.size(0) == 0)
	{
		return nms_output;
	}
	class_conf = class_conf.index({ conf_mask });
	class_pred = class_pred.index({ conf_mask });
	// Each row is now (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
	at::Tensor detections = at::cat({ output.index({"...", torch::indexing::Slice(torch::indexing::None, 5)}), class_conf.toType(at::kFloat), class_pred.toType(at::kFloat) }, -1);
	// unique_consecutive only merges adjacent duplicates, so sort the labels first
	// to get every class exactly once
	at::Tensor sorted_labels = std::get<0>(at::sort(detections.index({ "...", -1 })));
	auto unique_labels_tuple = at::unique_consecutive(sorted_labels);
	at::Tensor unique_labels = std::get<0>(unique_labels_tuple);
	// Loop over all detected classes
	for (int i = 0; i < unique_labels.size(0); i++)
	{
		// Detections of this class that passed the confidence filter
		at::Tensor detections_class = detections.index({ detections.index({"...", -1}) == unique_labels[i] });
		at::Tensor keep = nms_cpu(detections_class.index({ "...", torch::indexing::Slice(torch::indexing::None,4) }), detections_class.index({ "...", 4 })*detections_class.index({ "...", 5 }), nms_thres);
		at::Tensor max_detection = detections_class.index({ keep });
		if (i == 0)
		{
			nms_output.push_back(max_detection);
		}
		else
		{
			nms_output[0] = at::cat({ nms_output[0], max_detection });
		}

	}
	return nms_output;
}
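The per-class call to nms_cpu is the only step whose internals are not shown here. The following is a minimal sketch of plain greedy NMS on corner-format boxes, written with the standard library only. It is not the nms_cpu implementation used by the project, and the struct and function names are hypothetical; it only illustrates what the step does: keep the highest-scoring box, drop every remaining box whose IoU with a kept box exceeds the threshold, and repeat.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// One detection of a single class, in corner format.
struct Detection {
    float x1, y1, x2, y2;  // top-left and bottom-right corners
    float score;           // obj_conf * class_conf
};

// Intersection over union of two corner-format boxes.
float iou(const Detection& a, const Detection& b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1)
                    + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS; returns the indices of the kept detections, best score first.
std::vector<int> greedy_nms(const std::vector<Detection>& dets, float nms_thres)
{
    assert(nms_thres >= 0.0f && nms_thres <= 1.0f);
    // Visit the detections in order of decreasing score
    std::vector<int> order(dets.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&dets](int a, int b) { return dets[a].score > dets[b].score; });

    std::vector<int> keep;
    for (int idx : order) {
        bool suppressed = false;
        // Drop the box if it overlaps too much with an already kept box
        for (int k : keep) {
            if (iou(dets[idx], dets[k]) > nms_thres) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) {
            keep.push_back(idx);
        }
    }
    return keep;
}
```

Two boxes (0, 0, 10, 10) and (1, 1, 11, 11) overlap with an IoU of 81/119, about 0.68, so with a threshold of 0.5 only the higher-scoring one survives, while a distant box (20, 20, 30, 30) is kept.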

6. Draw the detection boxes and other information

void YOLOV4::Show_Detection_Restults(cv::Mat image, std::vector<std::vector<float>> boxes, std::vector<std::string> class_names, std::string mode)
{
	// Each box is read as (ymin, xmin, ymax, xmax, confidence, class id)
	for (size_t i = 0; i < boxes.size(); i++)
	{
		// Print the class and position
		std::cout << i + 1 << ". " << class_names[int(boxes[i][5])] << ": (xmin:"
			<< boxes[i][1] << ", ymin:" << boxes[i][0] << ", xmax:" << boxes[i][3] << ", ymax:" << boxes[i][2] << ") --"
			<< "confidence: " << boxes[i][4] << std::endl;
		// Compute the rectangle (x, y, width, height)
		cv::Rect rect(int(boxes[i][1]), int(boxes[i][0]), int(boxes[i][3] - boxes[i][1]), int(boxes[i][2] - boxes[i][0]));
		cv::rectangle(image, rect, cv::Scalar(0, 0, 255), 1, cv::LINE_8, 0);
		// Size of the label text
		cv::Size text_size = cv::getTextSize(class_names[int(boxes[i][5])], fontFace, fontScale, thickness, &baseline);
		// Origin of the label text
		cv::Point origin;
		origin.x = int(boxes[i][1]);
		origin.y = int(boxes[i][0]) + text_size.height;
		// cv::putText(InputOutputArray img, const String &text, Point org, int fontFace, double fontScale, Scalar color)
		cv::putText(image, class_names[int(boxes[i][5])], origin, fontFace, fontScale, cv::Scalar(0, 0, 255), thickness);

		// Show the confidence, truncated to 5 characters, right after the class name
		std::string text = std::to_string(boxes[i][4]);
		text = text.substr(0, 5);
		cv::Size text_size2 = cv::getTextSize(text, fontFace, fontScale, thickness, &baseline);
		origin.x = origin.x + text_size.width + 3;
		origin.y = int(boxes[i][0]) + text_size2.height;
		cv::putText(image, text, origin, fontFace, fontScale, cv::Scalar(0, 0, 255), thickness);
	}
	// Nothing was detected
	if (boxes.size() == 0)
	{
		const std::string text = "NO object";
		float fontScale = 2.0;
		// Size of the message text
		cv::Size text_size = cv::getTextSize(text, fontFace, fontScale, thickness, &baseline);
		std::cout << "no target detected!" << std::endl;
		cv::Point origin; // origin of the message text
		origin.x = 0;
		origin.y = 0 + text_size.height;
		// cv::putText(InputOutputArray img, const String &text, Point org, int fontFace, double fontScale, Scalar color)
		cv::putText(image, text, origin, fontFace, fontScale, cv::Scalar(255, 0, 0), thickness);
	}

}
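The boxes are drawn on the original image, while the decoded boxes live in the coordinates of the padded network input. The step that converts between the two is not shown in this post, so the sketch below is only my assumption of how it could look: it removes the gray-bar offset and divides by the letterbox scale, mirroring the arithmetic of letterbox_image. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Corner-format box.
struct BoxXYXY {
    float x1, y1, x2, y2;
};

// Hypothetical helper: map a box from network-input coordinates (w x h,
// letterboxed) back to the original image (iw x ih).
BoxXYXY unletterbox(const BoxXYXY& b, int iw, int ih, int w, int h)
{
    assert(iw > 0 && ih > 0 && w > 0 && h > 0);
    // Same scale and padding as in letterbox_image
    const float scale = std::min(static_cast<float>(w) / iw, static_cast<float>(h) / ih);
    const int nw = static_cast<int>(iw * scale);
    const int nh = static_cast<int>(ih * scale);
    const float pad_x = static_cast<float>((w - nw) / 2);
    const float pad_y = static_cast<float>((h - nh) / 2);

    // Remove the padding, undo the scaling, and clamp to the image borders
    auto clamp = [](float v, float hi) { return std::max(0.0f, std::min(v, hi)); };
    BoxXYXY r;
    r.x1 = clamp((b.x1 - pad_x) / scale, static_cast<float>(iw));
    r.y1 = clamp((b.y1 - pad_y) / scale, static_cast<float>(ih));
    r.x2 = clamp((b.x2 - pad_x) / scale, static_cast<float>(iw));
    r.y2 = clamp((b.y2 - pad_y) / scale, static_cast<float>(ih));
    return r;
}
```

For an 832x416 image letterboxed into 416x416 (scale 0.5, 104-pixel bars at the top and bottom), the box (104, 156, 312, 260) maps back to (208, 104, 624, 312). Note that this sketch uses (x1, y1, x2, y2) order, whereas the drawing function above reads its boxes as (ymin, xmin, ymax, xmax).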

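Complete code

GitHub:

YINYIPENG-EN/YOLOv4_libtorch: https://github.com/YINYIPENG-EN/YOLOv4_libtorch.git

Convert the .pth weights to .pt and put the file in the same directory as main.cpp (any other location also works as long as the path is filled in correctly; an absolute path without Chinese characters is recommended). Also fill in the path of the class-name txt file, for example the coco_classes.txt I use here. mode is the prediction mode: "image" runs prediction on an image whose path is given in image_path, and "video" runs prediction on a video, in which case image_path holds the video path.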

std::string model_path = "./yolov4.pt";
std::string image_path = "street.jpg";
std::string classes_path = "./coco_classes.txt";
std::string mode = "image";
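The class file given in classes_path has to be turned into the class_names vector used when drawing. The project's own loader is not shown here; the sketch below is a hypothetical version that assumes one class name per line, and adds a bounds-checked lookup for the class id stored in each box.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one class name per line, skipping empty lines.
std::vector<std::string> read_class_names(std::istream& in)
{
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing '\r' from files saved with Windows line endings
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (!line.empty()) {
            names.push_back(line);
        }
    }
    return names;
}

// Look up a class name by id, checking the range first.
const std::string& class_name(const std::vector<std::string>& names, int id)
{
    assert(id >= 0 && id < static_cast<int>(names.size()));
    return names[static_cast<std::size_t>(id)];
}
```

In real use the stream would be a std::ifstream opened on classes_path; a std::istringstream works the same way for a quick check.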

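If you use your own dataset, change NUM_CLASSES in utils.h; conf_thres, nms_thres and INPUT_SHAPE can also be changed there.

In video inference, however, I found that the results were not very good. Hardware acceleration is probably still needed, and I did not notice a clear advantage from libtorch either. I will look into TensorRT (trt) later.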
