I have recently been studying the FCOS object detection algorithm, published at ICCV 2019. FCOS performs quite well and its code is well engineered, so I plan to follow up on it.
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
Paper: https://arxiv.org/pdf/1904.01355.pdf
Code: https://github.com/tianzhi0549/FCOS
------------------------------------ Let's start ------------------------------------
This post focuses on the multi-scale testing used in FCOS. With multi-scale testing, the FCOS model based on ResNeXt-64x4d-101 and deformable convolutions reaches 49.0% AP on COCO test-dev. I verified this on COCO as well: compared with single-scale testing, multi-scale testing raises AP by about 2 points, so it is quite effective. The biggest drawback is that inference time grows substantially, which remains a difficulty to be solved.
Before going into the FCOS multi-scale testing source code, let's briefly review multi-scale training and testing in object detection; anyone working on detection knows how important multi-scale is to final performance.
The input image size has a considerable impact on detection performance; in fact, multi-scale is one of the most effective tricks for improving accuracy. The backbone usually produces feature maps tens of times smaller than the original image, which makes the features of small objects hard for the detection head to capture. Training on larger images and on multiple sizes improves the model's robustness to object scale to some extent, and introducing multiple scales only at test time still brings the gains of larger and more varied inputs. [1]
------------------------------------ Let's continue ------------------------------------
FCOS multi-scale testing simply runs the detector separately on the horizontally flipped image and on copies resized to different scales, merges all the predicted bboxes, and then applies NMS and the other post-processing steps to obtain the final detections. The idea is straightforward, but the effect is significant. The FCOS source code exposes several settings for multi-scale testing:
TEST:
  BBOX_AUG:
    ENABLED: False
    H_FLIP: True
    SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
    MAX_SIZE: 2000
    SCALE_H_FLIP: True
Here, ENABLED is the flag for multi-scale testing: False means single-scale testing, which is faster; setting it to True enables multi-scale testing. The remaining options are the multi-scale testing parameters:
H_FLIP: whether to also test on the horizontally flipped image;
SCALES: the target scales (shorter-side sizes) the test image is resized to;
MAX_SIZE: the maximum allowed longer-side size when resizing the test image;
SCALE_H_FLIP: whether to also test on the horizontally flipped version of each resized image.
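To make SCALES and MAX_SIZE concrete: the Resize transform this codebase inherits from maskrcnn_benchmark scales the image so that its shorter side hits the target scale, shrinking further if the longer side would exceed the cap. Below is a minimal sketch of that rule; resized_wh is a hypothetical helper for illustration, not the actual T.Resize source.

def resized_wh(w, h, target_scale, max_size):
    # Scale so that the shorter side becomes target_scale ...
    ratio = target_scale / min(w, h)
    # ... but shrink further if the longer side would exceed max_size.
    if max(w, h) * ratio > max_size:
        ratio = max_size / max(w, h)
    return int(round(w * ratio)), int(round(h * ratio))

For example, a 640x480 COCO image tested at scale 1200 with MAX_SIZE 2000 gives a ratio of 1200/480 = 2.5, and the longer side (1600) stays under the cap, so the image is resized to 1600x1200.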
Next, a brief walkthrough of the multi-scale testing source code [2]. The main function is im_detect_bbox_aug, which completes multi-scale detection in several steps:
1. Predict on the original image: boxlists_i = im_detect_bbox
2. Predict on the horizontally flipped image: boxlists_hf = im_detect_bbox_hflip
3. Predict at the different resized scales: boxlists_scl = im_detect_bbox_scale
4. Predict on the flipped image at each scale: boxlists_scl_hf = im_detect_bbox_scale
Steps 2 and 4 must map boxes detected on the flipped image back into the original coordinates; a sketch of that mapping follows this list.
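The flip-back mapping is handled by boxlist.transpose(0) in the source below. Here is a minimal standalone sketch of what it computes, ignoring the 1-pixel TO_REMOVE convention the real BoxList additionally applies:

import torch

def hflip_boxes_back(boxes, image_width):
    # boxes: (N, 4) tensor in (x1, y1, x2, y2) format, detected on the
    # horizontally flipped image; mirror the x-coordinates back and swap
    # the roles of x1 and x2 so that x1 <= x2 still holds.
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    return torch.stack([image_width - x2, y1, image_width - x1, y2], dim=1)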
For the implementation details, refer to the source below; the comments are clear, so it is easy to follow.
import torch
import torchvision.transforms as TT
from fcos_core.config import cfg
from fcos_core.data import transforms as T
from fcos_core.structures.image_list import to_image_list
from fcos_core.structures.bounding_box import BoxList
from fcos_core.modeling.rpn.fcos.inference import make_fcos_postprocessor
def im_detect_bbox_aug(model, images, device):
    # Collect detections computed under different transformations
    boxlists_ts = []
    for _ in range(len(images)):
        boxlists_ts.append([])

    def add_preds_t(boxlists_t):
        for i, boxlist_t in enumerate(boxlists_t):
            if len(boxlists_ts[i]) == 0:
                # The first one is identity transform, no need to resize the boxlist
                boxlists_ts[i].append(boxlist_t)
            else:
                # Resize the boxlist as the first one
                boxlists_ts[i].append(boxlist_t.resize(boxlists_ts[i][0].size))

    # Compute detections for the original image (identity transform)
    boxlists_i = im_detect_bbox(
        model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
    )
    add_preds_t(boxlists_i)

    # Perform detection on the horizontally flipped image
    if cfg.TEST.BBOX_AUG.H_FLIP:
        boxlists_hf = im_detect_bbox_hflip(
            model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
        )
        add_preds_t(boxlists_hf)

    # Compute detections at different scales
    for scale in cfg.TEST.BBOX_AUG.SCALES:
        max_size = cfg.TEST.BBOX_AUG.MAX_SIZE
        boxlists_scl = im_detect_bbox_scale(
            model, images, scale, max_size, device
        )
        add_preds_t(boxlists_scl)

        if cfg.TEST.BBOX_AUG.SCALE_H_FLIP:
            boxlists_scl_hf = im_detect_bbox_scale(
                model, images, scale, max_size, device, hflip=True
            )
            add_preds_t(boxlists_scl_hf)

    assert cfg.MODEL.FCOS_ON, "The multi-scale testing only supports FCOS detector"

    # Merge boxlists detected by different bbox aug params
    boxlists = []
    for i, boxlist_ts in enumerate(boxlists_ts):
        bbox = torch.cat([boxlist_t.bbox for boxlist_t in boxlist_ts])
        scores = torch.cat([boxlist_t.get_field('scores') for boxlist_t in boxlist_ts])
        labels = torch.cat([boxlist_t.get_field('labels') for boxlist_t in boxlist_ts])
        boxlist = BoxList(bbox, boxlist_ts[0].size, boxlist_ts[0].mode)
        boxlist.add_field('scores', scores)
        boxlist.add_field('labels', labels)
        boxlists.append(boxlist)

    # Apply NMS and limit the final detections
    post_processor = make_fcos_postprocessor(cfg)
    results = post_processor.select_over_all_levels(boxlists)
    return results
def im_detect_bbox(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the original image.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    return model(images.to(device))
def im_detect_bbox_hflip(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the horizontally flipped image.
    Function signature is the same as for im_detect_bbox.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.RandomHorizontalFlip(1.0),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    boxlists = model(images.to(device))

    # Invert the detections computed on the flipped image
    boxlists_inv = [boxlist.transpose(0) for boxlist in boxlists]
    return boxlists_inv
def im_detect_bbox_scale(model, images, target_scale, target_max_size, device, hflip=False):
    """
    Computes bbox detections at the given scale.
    Returns predictions in the scaled image space.
    """
    if hflip:
        boxlists_scl = im_detect_bbox_hflip(model, images, target_scale, target_max_size, device)
    else:
        boxlists_scl = im_detect_bbox(model, images, target_scale, target_max_size, device)
    return boxlists_scl
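For completeness, here is roughly how im_detect_bbox_aug is gated by the ENABLED flag in the test loop; this is a sketch of the logic, not the verbatim loop from fcos_core/engine/inference.py:

with torch.no_grad():
    for images, targets, image_ids in data_loader:
        if cfg.TEST.BBOX_AUG.ENABLED:
            # Multi-scale + flip testing path described in this post
            output = im_detect_bbox_aug(model, images, device)
        else:
            # Plain single-scale forward pass
            output = model(images.to(device))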
References
[1] https://www.cnblogs.com/Terrypython/p/10642091.html
[2] https://github.com/tianzhi0549/FCOS/blob/master/fcos_core/engine/bbox_aug.py