I have recently been studying the FCOS object detection algorithm, published at ICCV 2019. FCOS performs quite well and its code is well engineered, so I plan to follow up on it.
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
Paper: https://arxiv.org/pdf/1904.01355.pdf
Code: https://github.com/tianzhi0549/FCOS
------------------------------------ Let's start ------------------------------------
This post focuses on the multi-scale testing used in FCOS. With multi-scale testing, the FCOS model based on ResNeXt-64x4d-101 and deformable convolutions reaches 49.0% AP on COCO test-dev. I verified this on COCO as well: compared with single-scale testing, multi-scale testing raises AP by about 2 points, so it is quite effective. The biggest drawback is that inference time grows substantially, which remains a difficulty to be solved.
Before going into the FCOS multi-scale testing source code, let's briefly review multi-scale training and testing in object detection; anyone working on detection knows how important multi-scale is to final performance.
The input image size has a considerable impact on detection performance; in fact, multi-scale is one of the most effective tricks for improving accuracy. The backbone usually produces feature maps tens of times smaller than the original image, which makes the features of small objects hard for the detection head to capture. Training on larger images and on multiple sizes improves the model's robustness to object scale to some extent, and introducing multiple scales only at test time still brings the gains of larger and more varied inputs. [1]
------------------------------------ Let's continue ------------------------------------
FCOS multi-scale testing simply runs the detector separately on the horizontally flipped image and on copies resized to different scales, merges all the predicted bboxes, and then applies NMS and the other post-processing steps to obtain the final detections. The idea is straightforward, but the effect is significant. The FCOS source code exposes several settings for multi-scale testing:
TEST:
  BBOX_AUG:
    ENABLED: False
    H_FLIP: True
    SCALES: (400, 500, 600, 700, 900, 1000, 1100, 1200)
    MAX_SIZE: 2000
    SCALE_H_FLIP: True
Here, ENABLED is the flag for multi-scale testing: False means single-scale testing, which is faster; setting it to True enables multi-scale testing. The remaining options are the multi-scale testing parameters:
H_FLIP: whether to also test on the horizontally flipped image;
SCALES: the target scales (shorter-side sizes) the test image is resized to;
MAX_SIZE: the maximum allowed longer-side size when resizing the test image;
SCALE_H_FLIP: whether to also test on the horizontally flipped version of each resized image.
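To make SCALES and MAX_SIZE concrete: the Resize transform this codebase inherits from maskrcnn_benchmark scales the image so that its shorter side hits the target scale, shrinking further if the longer side would exceed the cap. Below is a minimal sketch of that rule; resized_wh is a hypothetical helper for illustration, not the actual T.Resize source.

def resized_wh(w, h, target_scale, max_size):
    # Scale so that the shorter side becomes target_scale ...
    ratio = target_scale / min(w, h)
    # ... but shrink further if the longer side would exceed max_size.
    if max(w, h) * ratio > max_size:
        ratio = max_size / max(w, h)
    return int(round(w * ratio)), int(round(h * ratio))

For example, a 640x480 COCO image tested at scale 1200 with MAX_SIZE 2000 gives a ratio of 1200/480 = 2.5, and the longer side (1600) stays under the cap, so the image is resized to 1600x1200.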
Next, a brief walkthrough of the multi-scale testing source code [2]. The main function is im_detect_bbox_aug, which completes multi-scale detection in several steps:
1. Predict on the original image: boxlists_i = im_detect_bbox
2. Predict on the horizontally flipped image: boxlists_hf = im_detect_bbox_hflip
3. Predict at the different resized scales: boxlists_scl = im_detect_bbox_scale
4. Predict on the flipped image at each scale: boxlists_scl_hf = im_detect_bbox_scale
Steps 2 and 4 must map boxes detected on the flipped image back into the original coordinates; a sketch of that mapping follows this list.
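The flip-back mapping is handled by boxlist.transpose(0) in the source below. Here is a minimal standalone sketch of what it computes, ignoring the 1-pixel TO_REMOVE convention the real BoxList additionally applies:

import torch

def hflip_boxes_back(boxes, image_width):
    # boxes: (N, 4) tensor in (x1, y1, x2, y2) format, detected on the
    # horizontally flipped image; mirror the x-coordinates back and swap
    # the roles of x1 and x2 so that x1 <= x2 still holds.
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    return torch.stack([image_width - x2, y1, image_width - x1, y2], dim=1)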
For the implementation details, refer to the source below; the comments are clear, so it is easy to follow.
import torch
import torchvision.transforms as TT
from fcos_core.config import cfg
from fcos_core.data import transforms as T
from fcos_core.structures.image_list import to_image_list
from fcos_core.structures.bounding_box import BoxList
from fcos_core.modeling.rpn.fcos.inference import make_fcos_postprocessor
def im_detect_bbox_aug(model, images, device):
    # Collect detections computed under different transformations
    boxlists_ts = []
    for _ in range(len(images)):
        boxlists_ts.append([])

    def add_preds_t(boxlists_t):
        for i, boxlist_t in enumerate(boxlists_t):
            if len(boxlists_ts[i]) == 0:
                # The first one is identity transform, no need to resize the boxlist
                boxlists_ts[i].append(boxlist_t)
            else:
                # Resize the boxlist as the first one
                boxlists_ts[i].append(boxlist_t.resize(boxlists_ts[i][0].size))

    # Compute detections for the original image (identity transform)
    boxlists_i = im_detect_bbox(
        model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
    )
    add_preds_t(boxlists_i)

    # Perform detection on the horizontally flipped image
    if cfg.TEST.BBOX_AUG.H_FLIP:
        boxlists_hf = im_detect_bbox_hflip(
            model, images, cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MAX_SIZE_TEST, device
        )
        add_preds_t(boxlists_hf)

    # Compute detections at different scales
    for scale in cfg.TEST.BBOX_AUG.SCALES:
        max_size = cfg.TEST.BBOX_AUG.MAX_SIZE
        boxlists_scl = im_detect_bbox_scale(
            model, images, scale, max_size, device
        )
        add_preds_t(boxlists_scl)

        if cfg.TEST.BBOX_AUG.SCALE_H_FLIP:
            boxlists_scl_hf = im_detect_bbox_scale(
                model, images, scale, max_size, device, hflip=True
            )
            add_preds_t(boxlists_scl_hf)

    assert cfg.MODEL.FCOS_ON, "The multi-scale testing only supports FCOS detector"

    # Merge boxlists detected by different bbox aug params
    boxlists = []
    for i, boxlist_ts in enumerate(boxlists_ts):
        bbox = torch.cat([boxlist_t.bbox for boxlist_t in boxlist_ts])
        scores = torch.cat([boxlist_t.get_field('scores') for boxlist_t in boxlist_ts])
        labels = torch.cat([boxlist_t.get_field('labels') for boxlist_t in boxlist_ts])
        boxlist = BoxList(bbox, boxlist_ts[0].size, boxlist_ts[0].mode)
        boxlist.add_field('scores', scores)
        boxlist.add_field('labels', labels)
        boxlists.append(boxlist)

    # Apply NMS and limit the final detections
    post_processor = make_fcos_postprocessor(cfg)
    results = post_processor.select_over_all_levels(boxlists)
    return results
def im_detect_bbox(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the original image.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    return model(images.to(device))
def im_detect_bbox_hflip(model, images, target_scale, target_max_size, device):
    """
    Performs bbox detection on the horizontally flipped image.
    Function signature is the same as for im_detect_bbox.
    """
    transform = TT.Compose([
        T.Resize(target_scale, target_max_size),
        TT.RandomHorizontalFlip(1.0),
        TT.ToTensor(),
        T.Normalize(
            mean=cfg.INPUT.PIXEL_MEAN, std=cfg.INPUT.PIXEL_STD, to_bgr255=cfg.INPUT.TO_BGR255
        )
    ])
    images = [transform(image) for image in images]
    images = to_image_list(images, cfg.DATALOADER.SIZE_DIVISIBILITY)
    boxlists = model(images.to(device))

    # Invert the detections computed on the flipped image
    boxlists_inv = [boxlist.transpose(0) for boxlist in boxlists]
    return boxlists_inv
def im_detect_bbox_scale(model, images, target_scale, target_max_size, device, hflip=False):
    """
    Computes bbox detections at the given scale.
    Returns predictions in the scaled image space.
    """
    if hflip:
        boxlists_scl = im_detect_bbox_hflip(model, images, target_scale, target_max_size, device)
    else:
        boxlists_scl = im_detect_bbox(model, images, target_scale, target_max_size, device)
    return boxlists_scl
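For completeness, here is roughly how im_detect_bbox_aug is gated by the ENABLED flag in the test loop; this is a sketch of the logic, not the verbatim loop from fcos_core/engine/inference.py:

with torch.no_grad():
    for images, targets, image_ids in data_loader:
        if cfg.TEST.BBOX_AUG.ENABLED:
            # Multi-scale + flip testing path described in this post
            output = im_detect_bbox_aug(model, images, device)
        else:
            # Plain single-scale forward pass
            output = model(images.to(device))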
References
[1] https://www.cnblogs.com/Terrypython/p/10642091.html
[2] https://github.com/tianzhi0549/FCOS/blob/master/fcos_core/engine/bbox_aug.py