Published: 2024-05-03 13:01
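Since the outbreak of COVID-19, masks have become a daily necessity and a constant topic of discussion. Recurring outbreaks and new variants have made wearing a mask part of everyday life. Recently, outbreaks across the country have been getting worse again: the Delta variant is spreading aggressively and the Lambda variant has begun to circulate. Wearing a mask therefore protects not only yourself but also the people around you. During the pandemic, going out without a mask is irresponsible toward yourself and a potential threat to others.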
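Checking by hand is time-consuming and labor-intensive, so using computers to detect automatically whether a face is wearing a mask has become a trend. Many models can be used for mask detection, such as SSD, MobileNet and the YOLO series. This post reproduces YOLOv4 with open-source code and a public dataset found online, to see how well YOLOv4 detects face masks. YOLOv4 paper: "YOLOv4: Optimal Speed and Accuracy of Object Detection" (Bochkovskiy et al., 2020). To reduce the number of parameters and speed the model up, a lightweight MobileNet-based version of YOLOv4 is used for mask detection.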
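In essence, YOLOv4 takes YOLOv3 as its base and improves it with a selection of tricks. These were drawn from the many detectors published since YOLOv3 and were shown to raise detection accuracy. For this reason YOLOv4 is sometimes called a "kaleidoscope of tricks". With these tricks, YOLOv4 greatly improves detection accuracy while keeping its speed.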
First, look at YOLOv4's results on the COCO dataset. YOLOv4 is not as accurate as EfficientDet, but it is much faster while keeping competitive accuracy.
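The YOLOv4 network consists of three main parts:
1. Backbone feature extraction network: CSPDarknet53
2. Enhanced feature extraction network: SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network)
3. Prediction network YoloHead, which makes predictions from the extracted features.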
YOLOv4 network structure:
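The first part, the backbone, performs preliminary feature extraction. It yields three preliminary effective feature layers, which serve as inputs to SPP and PANet. For a 416×416 input, these layers are:
- Resblock_body(52, 52, 256)x8
- Resblock_body(26, 26, 512)x8
- Resblock_body(13, 13, 1024)x4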
The second part, the enhanced feature extraction network, fuses the three preliminary feature layers to extract better features. The result is three more effective feature layers.
The third part uses the effective features from the enhanced feature extraction network to produce the prediction results.
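As a quick check on these shapes, the sketch below computes the three output grids and the YoloHead channel count. It assumes a 416×416 input and output strides of 8, 16 and 32, which give the 52/26/13 grids listed above. It also assumes 3 anchors per scale and the 2 classes (mask, face) used later in this post. Each anchor predicts 4 box values, 1 objectness score and one score per class. That is where num_anchors*(num_classes+5) in the model code comes from. The function name is hypothetical.

```python
def yolo_head_shapes(input_size=416, anchors_per_scale=3, num_classes=2):
    """(grid, grid, channels) of each output layer, finest grid first."""
    # The network downsamples by 32 in total, so the input side must be a multiple of 32
    if input_size % 32:
        raise ValueError("input_size must be a multiple of 32")
    # Each anchor predicts x, y, w, h, an objectness score and one score per class
    channels = anchors_per_scale * (num_classes + 5)
    return [(input_size // s, input_size // s, channels) for s in (8, 16, 32)]


print(yolo_head_shapes())  # [(52, 52, 21), (26, 26, 21), (13, 13, 21)]
```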
Compared with the DarkNet53 used by YOLOv3, YOLOv4 uses CSPDarkNet53, which gives the model stronger feature extraction. The code is as follows:
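```python
from functools import reduce
import tensorflow as tf
from tensorflow.keras.layers import (Activation, Add, BatchNormalization,
                                     Concatenate, Conv2D, ZeroPadding2D)

def compose(*funcs):
    # compose(f, g)(x) == g(f(x)): apply the layers from left to right
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)

def mish(x):
    return x * tf.math.tanh(tf.math.softplus(x))

def DarknetConv2D_BN_Mish(filters, kernel_size, strides=(1, 1)):
    # Conv2D (no bias) + BN + Mish; stride-2 convs use 'valid' (input is pre-padded)
    padding = 'valid' if strides == (2, 2) else 'same'
    return compose(Conv2D(filters, kernel_size, strides=strides, padding=padding, use_bias=False),
                   BatchNormalization(), Activation(mish))

def darknet_body(x):
    '''Darknet body having 52 Convolution2D layers'''
    x = DarknetConv2D_BN_Mish(32, (3, 3))(x)
    x = resblock_body(x, 64, 1, False)
    x = resblock_body(x, 128, 2)
    feat1 = x = resblock_body(x, 256, 8)   # 52,52,256 for a 416x416 input
    feat2 = x = resblock_body(x, 512, 8)   # 26,26,512
    feat3 = resblock_body(x, 1024, 4)      # 13,13,1024
    # The three effective feature layers fed to SPP and PANet
    return feat1, feat2, feat3

# resblock_body structure (CSP residual block)
def resblock_body(x, num_filters, num_blocks, all_narrow=True):
    '''A series of resblocks starting with a downsampling Convolution2D'''
    # Darknet uses left and top padding instead of 'same' mode
    preconv1 = ZeroPadding2D(((1, 0), (1, 0)))(x)
    preconv1 = DarknetConv2D_BN_Mish(num_filters, (3, 3), strides=(2, 2))(preconv1)
    # CSP: split into a shortcut branch and a main (residual) branch
    shortconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1, 1))(preconv1)
    mainconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1, 1))(preconv1)
    for i in range(num_blocks):
        y = compose(
            DarknetConv2D_BN_Mish(num_filters//2, (1, 1)),
            DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (3, 3)))(mainconv)
        mainconv = Add()([mainconv, y])
    postconv = DarknetConv2D_BN_Mish(num_filters//2 if all_narrow else num_filters, (1, 1))(mainconv)
    # Concatenate the two branches, then fuse them with a 1x1 conv
    route = Concatenate()([postconv, shortconv])
    return DarknetConv2D_BN_Mish(num_filters, (1, 1))(route)
```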
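Compared with YOLOv3, CSPDarkNet53 improves the resblock_body residual block. After the downsampling convolution, the features are split into two branches:
- a shortcut branch (shortconv);
- a main branch (mainconv) that passes through the stacked residual units.

The two branches are concatenated at the end. This is the Cross Stage Partial (CSP) structure, and it uses the Mish activation.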
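Instead of the FPN used in YOLOv3 (also used in Mask R-CNN), YOLOv4 uses SPP + PAN. This part mainly fuses feature information from feature maps of different scales.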
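The SPP block in yolo_body below works on the deepest feature map. It applies max pooling with kernels 13, 9 and 5, all with stride 1 and 'same' padding, and concatenates the three results with the input. Because the stride is 1, the spatial size is unchanged and only the channel count grows fourfold.

The pure-Python sketch below shows this on a single-channel grid. It treats padded positions as absent, so only real values inside the window are compared. That matches how max pooling handles 'same' padding in Keras. The function names are hypothetical, and the code is meant for illustration, not speed.

```python
def max_pool_same(grid, k):
    """k x k max pooling with stride 1 and 'same' padding on a 2-D list."""
    h, w, r = len(grid), len(grid[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Clip the window at the borders, so padded cells never win the max
            window = [grid[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(grid, kernels=(13, 9, 5)):
    """Channel list [pool13, pool9, pool5, input], mirroring the Concatenate in yolo_body."""
    return [max_pool_same(grid, k) for k in kernels] + [grid]


feature = [[1, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 7]]
pooled = spp(feature)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 4 4 4
print(max_pool_same(feature, 3))  # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 7, 7], [0, 0, 7, 7]]
```

Each pooled map keeps the 4×4 size. The large kernels let every cell see context from far away: with k = 13 on this small grid, every cell sees the maximum value 7.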
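In YOLOv4, the three effective feature layers from the backbone are used to build the enhanced feature pyramid:
- Resblock_body(52, 52, 256)x8
- Resblock_body(26, 26, 512)x8
- Resblock_body(13, 13, 1024)x4

MobileNet can produce feature layers with the same shapes, so it can take the place of the CSPDarkNet53 backbone. In addition, MobileNet's depthwise separable convolutions can replace the ordinary convolutions used in YOLOv4, which reduces the number of parameters to train.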
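To see where the savings come from, compare the weight counts of the two layer types:
- a standard k×k convolution has k·k·C_in·C_out weights;
- a depthwise separable convolution has a depthwise k×k convolution (k·k·C_in weights) followed by a 1×1 pointwise convolution (C_in·C_out weights).

The sketch below counts only convolution weights. Biases are disabled in this code, and BatchNorm parameters are ignored. The 512 to 1024 example matches the channel sizes in the P5 branch of yolo_body. The function names are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution without bias."""
    return k * k * c_in * c_out


def separable_params(k, c_in, c_out, depth_multiplier=1):
    """Weights of a depthwise k x k conv followed by a 1x1 pointwise conv, no bias."""
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise


standard = conv_params(3, 512, 1024)        # 4,718,592 weights
separable = separable_params(3, 512, 1024)  # 528,896 weights
print(standard, separable)
```

The ratio separable / standard equals 1/C_out + 1/k². For a 3×3 kernel this approaches 1/9, so here the separable version has roughly 8.9× fewer weights.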
MobileNetV1 network structure:
```python
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     DepthwiseConv2D)


def relu6(x):
    return K.relu(x, max_value=6)


def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
                          depth_multiplier=1, strides=(1, 1), block_id=1):
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)
    # Depthwise 3x3 convolution: one filter per input channel
    x = DepthwiseConv2D((3, 3),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False,
                        name='conv_dw_%d' % block_id)(inputs)
    x = BatchNormalization(name='conv_dw_%d_bn' % block_id)(x)
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
    # Pointwise 1x1 convolution: mixes channels and sets the output width
    x = Conv2D(pointwise_conv_filters, (1, 1),
               padding='same',
               use_bias=False,
               strides=(1, 1),
               name='conv_pw_%d' % block_id)(x)
    x = BatchNormalization(name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)


def _conv_block(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1)):
    # Standard convolution, used only as the first layer
    filters = int(filters * alpha)
    x = Conv2D(filters, kernel,
               padding='same',
               use_bias=False,
               strides=strides,
               name='conv1')(inputs)
    x = BatchNormalization(name='conv1_bn')(x)
    return Activation(relu6, name='conv1_relu')(x)


def MobileNetV1(inputs, alpha=1, depth_multiplier=1):
    if alpha not in [0.25, 0.5, 0.75, 1.0]:
        raise ValueError('Unsupported alpha - `{}` in MobilenetV1, Use 0.25, 0.5, 0.75, 1.0'.format(alpha))
    # Shape comments assume a 416x416 input and alpha=1
    # 416,416,3 -> 208,208,32
    x = _conv_block(inputs, 32, alpha, strides=(2, 2))
    # 208,208,32 -> 208,208,64
    x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)
    # 208,208,64 -> 104,104,128
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                              strides=(2, 2), block_id=2)
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)
    # 104,104,128 -> 52,52,256
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                              strides=(2, 2), block_id=4)
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)
    feat1 = x
    # 52,52,256 -> 26,26,512
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              strides=(2, 2), block_id=6)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)
    feat2 = x
    # 26,26,512 -> 13,13,1024
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier,
                              strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)
    feat3 = x
    return feat1, feat2, feat3
```
Putting these together gives the overall structure of the MobileNet-based YOLOv4:
```python
from functools import reduce, wraps

from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Activation, BatchNormalization, Concatenate,
                                     Conv2D, DepthwiseConv2D, MaxPooling2D,
                                     UpSampling2D)
from tensorflow.keras.models import Model

# Assumption: the MobileNetV1 code above lives in its own module (the module name
# mobilenet_v1 is hypothetical). Keeping it separate stops the unnamed
# _depthwise_conv_block below from replacing the named version MobileNetV1 uses.
# The neck needs the unnamed version because it calls the block many times.
from mobilenet_v1 import MobileNetV1


def compose(*funcs):
    # compose(f, g)(x) == g(f(x)): apply the layers from left to right
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def relu6(x):
    return K.relu(x, max_value=6)


@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)


def DarknetConv2D_BN_Leaky(*args, **kwargs):
    # Conv2D (no bias) + BN + activation; despite the name, this lite version uses ReLU6
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        Activation(relu6))


def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha=1,
                          depth_multiplier=1, strides=(1, 1), block_id=1):
    # Same as in MobileNetV1, but without layer names so it can be reused freely
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)
    x = DepthwiseConv2D((3, 3),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False)(inputs)
    x = BatchNormalization()(x)
    x = Activation(relu6)(x)
    x = Conv2D(pointwise_conv_filters, (1, 1),
               padding='same',
               use_bias=False,
               strides=(1, 1))(x)
    x = BatchNormalization()(x)
    return Activation(relu6)(x)


def make_five_convs(x, num_filters):
    # Five convolutions: 1x1 convs alternating with depthwise separable 3x3 blocks
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    x = _depthwise_conv_block(x, num_filters * 2, alpha=1)
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    x = _depthwise_conv_block(x, num_filters * 2, alpha=1)
    x = DarknetConv2D_BN_Leaky(num_filters, (1, 1))(x)
    return x


def yolo_body(inputs, num_anchors, num_classes, backbone="mobilenetv1", alpha=1):
    # Only the MobileNetV1 backbone is shown in this post, so `backbone` is unused here.
    # For a 416x416 input and alpha=1: feat1 52,52,256; feat2 26,26,512; feat3 13,13,1024
    feat1, feat2, feat3 = MobileNetV1(inputs, alpha=alpha)

    # SPP: max-pool at three scales (stride 1, 'same' keeps 13x13) and concatenate
    P5 = DarknetConv2D_BN_Leaky(int(512 * alpha), (1, 1))(feat3)
    P5 = _depthwise_conv_block(P5, int(1024 * alpha))
    P5 = DarknetConv2D_BN_Leaky(int(512 * alpha), (1, 1))(P5)
    maxpool1 = MaxPooling2D(pool_size=(13, 13), strides=(1, 1), padding='same')(P5)
    maxpool2 = MaxPooling2D(pool_size=(9, 9), strides=(1, 1), padding='same')(P5)
    maxpool3 = MaxPooling2D(pool_size=(5, 5), strides=(1, 1), padding='same')(P5)
    P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
    P5 = DarknetConv2D_BN_Leaky(int(512 * alpha), (1, 1))(P5)
    P5 = _depthwise_conv_block(P5, int(1024 * alpha))
    P5 = DarknetConv2D_BN_Leaky(int(512 * alpha), (1, 1))(P5)

    # PANet top-down path: upsample and fuse with the shallower feature maps
    P5_upsample = compose(DarknetConv2D_BN_Leaky(int(256 * alpha), (1, 1)), UpSampling2D(2))(P5)
    P4 = DarknetConv2D_BN_Leaky(int(256 * alpha), (1, 1))(feat2)
    P4 = Concatenate()([P4, P5_upsample])
    P4 = make_five_convs(P4, int(256 * alpha))
    P4_upsample = compose(DarknetConv2D_BN_Leaky(int(128 * alpha), (1, 1)), UpSampling2D(2))(P4)
    P3 = DarknetConv2D_BN_Leaky(int(128 * alpha), (1, 1))(feat1)
    P3 = Concatenate()([P3, P4_upsample])
    P3 = make_five_convs(P3, int(128 * alpha))

    # 52x52 head
    P3_output = _depthwise_conv_block(P3, int(256 * alpha))
    P3_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P3_output)

    # PANet bottom-up path: downsample with stride-2 depthwise blocks and fuse again
    P3_downsample = _depthwise_conv_block(P3, int(256 * alpha), strides=(2, 2))
    P4 = Concatenate()([P3_downsample, P4])
    P4 = make_five_convs(P4, int(256 * alpha))
    # 26x26 head
    P4_output = _depthwise_conv_block(P4, int(512 * alpha))
    P4_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P4_output)
    P4_downsample = _depthwise_conv_block(P4, int(512 * alpha), strides=(2, 2))
    P5 = Concatenate()([P4_downsample, P5])
    P5 = make_five_convs(P5, int(512 * alpha))
    # 13x13 head
    P5_output = _depthwise_conv_block(P5, int(1024 * alpha))
    P5_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P5_output)
    return Model(inputs, [P5_output, P4_output, P3_output])
```
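The training data is a public dataset from the internet. The training and validation sets contain 6,000 and 2,000 images respectively of faces with and without masks, with labels in .xml format. Data samples are shown below: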
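The .xml labels are not fed to the network directly: data_generator takes a list of text annotation_lines. The sketch below converts one label file into such a line. It rests on three assumptions, none of them confirmed by the original post:
- the labels follow the Pascal VOC layout, with an object/name element and a bndbox element containing xmin, ymin, xmax and ymax;
- each annotation line has the form image_path x1,y1,x2,y2,class_id ...;
- the class order is ["mask", "face"].

The function name voc_to_annotation_line is hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_to_annotation_line(xml_text, image_path, classes):
    """Turn one VOC-style XML label into 'path x1,y1,x2,y2,cls ...'."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in classes:
            continue  # skip labels outside the class list
        bb = obj.find("bndbox")
        coords = [int(float(bb.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(",".join(map(str, coords + [classes.index(name)])))
    return " ".join([image_path] + boxes)


sample = """<annotation>
  <object><name>face</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
  <object><name>mask</name>
    <bndbox><xmin>200</xmin><ymin>30</ymin><xmax>260</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_annotation_line(sample, "images/0001.jpg", ["mask", "face"]))
# images/0001.jpg 10,20,110,140,1 200,30,260,100,0
```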
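Before training, obtain the class and anchor data. The two classes are mask and face. The anchors are [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401], which are nine (width, height) pairs in pixels and the default YOLOv4 anchors.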
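The sketch below reshapes the 18 numbers into pairs and shows which output layer each group of three uses. It follows anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] from preprocess_true_boxes below and assumes a 416×416 input. The largest anchors go to the coarsest 13×13 grid (stride 32), and the smallest go to the 52×52 grid (stride 8). The function name is hypothetical.

```python
ANCHORS = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146,
           142, 110, 192, 243, 459, 401]
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # same order as preprocess_true_boxes
STRIDES = [32, 16, 8]                             # grid = 416 // stride -> 13, 26, 52


def anchors_per_layer(flat, mask=ANCHOR_MASK, strides=STRIDES, input_size=416):
    """Group a flat anchor list into (w, h) pairs and assign them to output grids."""
    pairs = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    layers = {}
    for idx, stride in zip(mask, strides):
        layers[input_size // stride] = [pairs[i] for i in idx]
    return layers


for grid, pairs in anchors_per_layer(ANCHORS).items():
    print(grid, pairs)
```

Large anchors suit the coarse grid because each 13×13 cell covers a 32×32-pixel region of the input, which fits big faces. The fine 52×52 grid handles small, distant faces.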
During training, a data generator produces the images and labels on the fly:
```python
import numpy as np

# get_random_data(annotation_line, input_shape, random) is the image loading and
# augmentation helper from the original project (not shown in this post). Its box
# array is assumed to be padded to a fixed length so that np.array can stack a batch.


def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes, random=True,
                   eager=True):
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            # Reshuffle at the start of every pass over the data
            if i == 0:
                np.random.shuffle(annotation_lines)
            image, box = get_random_data(annotation_lines[i], input_shape, random=random)
            i = (i + 1) % n
            image_data.append(image)
            box_data.append(box)
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        if eager:
            yield image_data, y_true[0], y_true[1], y_true[2]
        else:
            # For Keras fit(): the loss is computed inside the model, so the target is a dummy
            yield [image_data, *y_true], np.zeros(batch_size)


# Convert ground-truth boxes (x1, y1, x2, y2, class) into y_true targets for the three output layers
def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'
    # Three output layers in total
    num_layers = len(anchors) // 3
    # The largest anchors (6, 7, 8) go to the 13x13 layer, the smallest to the 52x52 layer
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')
    # Corners -> center x, y and width, height, normalized by the input size
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
    true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]
    m = true_boxes.shape[0]
    grid_shapes = [input_shape // {0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
    y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + num_classes),
                       dtype='float32') for l in range(num_layers)]
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    # Padded (empty) boxes have zero width
    valid_mask = boxes_wh[..., 0] > 0
    for b in range(m):
        # Process each image
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh) == 0:
            continue
        # IoU between each box and each anchor, both centered at the origin
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes
        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)
        best_anchor = np.argmax(iou, axis=-1)
        # Write each box into the grid cell and anchor slot of its best-matching anchor
        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    i = np.floor(true_boxes[b, t, 0] * grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b, t, 1] * grid_shapes[l][0]).astype('int32')
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b, t, 4].astype('int32')
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5 + c] = 1
    return y_true
```
(Code source: bubbliiiing, CSDN)
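The core of preprocess_true_boxes is the anchor assignment. Each box is compared with every anchor by IoU, with both centered at the same point, so only width and height matter. The box is then written to the layer, grid cell and anchor slot of its best anchor.

Below is a pure-Python restatement of that step for a single box. It uses the anchors and anchor_mask given above and assumes a 416×416 input. The function names are hypothetical. Like the original, it floors the box center to whole pixels.

```python
ANCHORS = [(12, 16), (19, 36), (40, 28), (36, 75), (76, 55), (72, 146),
           (142, 110), (192, 243), (459, 401)]
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
GRIDS = [13, 26, 52]  # 416 // 32, 416 // 16, 416 // 8


def centered_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center (only w, h matter)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def assign_box(x1, y1, x2, y2, input_size=416):
    """Return (layer, row j, col i, anchor k) that this box is written to in y_true."""
    w, h = x2 - x1, y2 - y1
    ious = [centered_iou((w, h), a) for a in ANCHORS]
    best = ious.index(max(ious))
    # Box center as a fraction of the image, as in true_boxes[..., 0:2]
    cx = ((x1 + x2) // 2) / input_size
    cy = ((y1 + y2) // 2) / input_size
    for layer, mask in enumerate(ANCHOR_MASK):
        if best in mask:
            grid = GRIDS[layer]
            return layer, int(cy * grid), int(cx * grid), mask.index(best)


print(assign_box(100, 120, 180, 176))  # (1, 9, 8, 1)
```

Worked example for the 80×56 box in the usage line:
- The best anchor is (76, 55), with IoU 4180 / 4480 ≈ 0.93.
- That anchor is index 4, so the box goes to the 26×26 layer (layer 1) as its second anchor (k = 1).
- The box center (140, 148) falls in cell column 140 / 16 = 8.75 → 8 and row 148 / 16 = 9.25 → 9.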
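For training, first create the model and load pretrained weights to initialize its parameters, then split the data into a training set and a validation set. Training has two stages: frozen training and unfrozen training. The backbone's features are general-purpose, so freezing it speeds up training and keeps the pretrained weights from being destroyed early in training. After both stages, the trained weights are saved and used for prediction.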
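The bookkeeping in the script below can be summarized in a few lines. With val_split = 0.1, int(N × 0.1) annotation lines go to validation and the rest to training. Each epoch then runs num_train // batch_size steps.

The sketch assumes an annotation file of N lines; 8,000 is used only as an example. The stage settings mirror the script:
- epochs 0 to 50, backbone frozen, learning rate 1e-3;
- epochs 50 to 100, backbone unfrozen, learning rate 1e-4;
- batch size 8 in both stages.

The function name is hypothetical.

```python
def plan_training(num_lines, val_split=0.1, batch_size=8):
    """Reproduce the split and per-epoch step counts used by the training script."""
    num_val = int(num_lines * val_split)
    num_train = num_lines - num_val
    steps, val_steps = num_train // batch_size, num_val // batch_size
    if steps == 0 or val_steps == 0:
        raise ValueError("Dataset too small to train; please enlarge the dataset.")
    stages = [
        # (name, initial_epoch, final_epoch, learning rate, backbone frozen?)
        ("freeze", 0, 50, 1e-3, True),
        ("unfreeze", 50, 100, 1e-4, False),
    ]
    return num_train, num_val, steps, val_steps, stages


num_train, num_val, steps, val_steps, stages = plan_training(8000)
print(num_train, num_val, steps, val_steps)  # 7200 800 900 100
```

The second stage uses a learning rate ten times smaller because the backbone weights are now being updated too. Smaller steps avoid undoing the pretrained features.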
```python
import numpy as np
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau, TensorBoard)
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Assumed to be defined earlier in the script (not shown in this post): input_shape,
# anchors (a (9, 2) array), num_anchors, num_classes, backbone, alpha, weights_path,
# label_smoothing, log_dir and annotation_path, plus yolo_loss and LossHistory from
# the original project.
# ------------------------------------------------------#
image_input = Input(shape=(None, None, 3))
h, w = input_shape
print('Create YOLOv4 model with {} anchors and {} classes.'.format(num_anchors, num_classes))
model_body = yolo_body(image_input, num_anchors // 3, num_classes, backbone, alpha)
# Load pretrained weights
print('Load weights {}.'.format(weights_path))
model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
# ------------------------------------------------------#
# Set up the loss here: the network outputs are passed into the loss function,
# and the output of the whole training model is the loss itself
# ------------------------------------------------------#
y_true = [Input(shape=(h // {0: 32, 1: 16, 2: 8}[l], w // {0: 32, 1: 16, 2: 8}[l],
                       num_anchors // 3, num_classes + 5)) for l in range(3)]
loss_input = [*model_body.output, *y_true]
model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
                    arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5,
                               'label_smoothing': label_smoothing})(loss_input)
model = Model([model_body.input, *y_true], model_loss)
# Training callbacks
logging = TensorBoard(log_dir=log_dir)
checkpoint = ModelCheckpoint(log_dir + "/ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5",
                             save_weights_only=True, save_best_only=False, save_freq='epoch')
early_stopping = EarlyStopping(min_delta=0, patience=10, verbose=1)
loss_history = LossHistory(log_dir)
# With this split, validation : training = 1 : 9
val_split = 0.1
with open(annotation_path) as f:
    lines = f.readlines()
np.random.seed(10101)
np.random.shuffle(lines)
np.random.seed(None)
num_val = int(len(lines) * val_split)
num_train = len(lines) - num_val
# Freeze the first freeze_layers layers (intended to cover the backbone) for stage one
freeze_layers = 81
for i in range(freeze_layers):
    model_body.layers[i].trainable = False
# Frozen training
if True:
    Init_epoch = 0
    Freeze_epoch = 50
    batch_size = 8
    learning_rate_base = 1e-3
    epoch_size = num_train // batch_size
    epoch_size_val = num_val // batch_size
    if epoch_size == 0 or epoch_size_val == 0:
        raise ValueError("Dataset too small to train; please enlarge the dataset.")
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1)
    model.compile(optimizer=Adam(learning_rate_base), loss={'yolo_loss': lambda y_true, y_pred: y_pred})
    print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
    model.fit(data_generator(lines[:num_train], batch_size, input_shape, anchors, num_classes,
                             random=True, eager=False),
              steps_per_epoch=epoch_size,
              validation_data=data_generator(lines[num_train:], batch_size, input_shape, anchors, num_classes,
                                             random=False, eager=False),
              validation_steps=epoch_size_val,
              epochs=Freeze_epoch,
              initial_epoch=Init_epoch,
              callbacks=[logging, checkpoint, reduce_lr, early_stopping, loss_history])
# Unfreeze the backbone before stage two (the model is recompiled below)
for i in range(freeze_layers):
    model_body.layers[i].trainable = True
# Unfrozen training
if True:
    Freeze_epoch = 50
    Epoch = 100
    batch_size = 8
    learning_rate_base = 1e-4
    epoch_size = num_train // batch_size
    epoch_size_val = num_val // batch_size
    if epoch_size == 0 or epoch_size_val == 0:
        raise ValueError("Dataset too small to train; please enlarge the dataset.")
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1)
    model.compile(optimizer=Adam(learning_rate_base), loss={'yolo_loss': lambda y_true, y_pred: y_pred})
    print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
    model.fit(data_generator(lines[:num_train], batch_size, input_shape, anchors, num_classes,
                             random=True, eager=False),
              steps_per_epoch=epoch_size,
              validation_data=data_generator(lines[num_train:], batch_size, input_shape, anchors, num_classes,
                                             random=False, eager=False),
              validation_steps=epoch_size_val,
              epochs=Epoch,
              initial_epoch=Freeze_epoch,
              callbacks=[logging, checkpoint, reduce_lr, early_stopping, loss_history])
```
The saved model can be used to run predictions on videos and images.
After training, the MobileNet-based YOLOv4 model predicts well on simple single-person images and also performs fairly well on some complex images.