Instance-Level Object Detection in Unstructured Environments
2019-05-16 17:02 覃润楠 

The primary goal of instance-level object detection in unstructured environments is to design a detection system that can recognize the category, shape, and position of specific instance objects accurately and quickly in complex scenes, and that can handle complications such as pose changes and mutual occlusion of instance objects in real three-dimensional scenes. With the rapid development of Industry 4.0 and artificial intelligence, instance-level object detection has a wide range of applications, such as tracking specific criminals and vehicles in criminal investigation, locking onto enemy aircraft and other designated targets in frontier defense, enabling intelligent service robots to fetch specific objects in housekeeping, and supporting intelligent surveillance in security.

   

To train a supervised deep network for instance-level object detection, a large number of labeled training samples are needed for each instance object, i.e., images covering rich variations in viewpoint, pose, illumination, and other conditions in unstructured environments. However, capturing images of each instance object from every viewpoint against complex backgrounds and annotating the object's location in every image is usually tedious and time-consuming. Our laboratory therefore introduced generative modeling to automatically synthesize, for each instance object, a large number of multi-view images with different poses in three-dimensional space. The synthesized object images are then pasted onto complex background images, and annotation files recording position coordinates and sizes are generated automatically, yielding a labeled training database for instance-level object detection. A minimal sketch of this paste-and-annotate step is given below.
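The following sketch only illustrates the paste-and-annotate idea; it is not the laboratory's actual pipeline. The use of Pillow, the JSON label format, and the random placement are assumptions made for illustration.

# Illustrative sketch of pasting a generated object image onto a background
# and writing a bounding-box annotation. Assumes Pillow is installed, the
# object image is RGBA (its alpha channel masks out non-object pixels), and
# the object image is smaller than the background.
import json
import random
from PIL import Image

def paste_and_annotate(background_path, object_path, out_image, out_label, label):
    bg = Image.open(background_path).convert("RGB")
    obj = Image.open(object_path).convert("RGBA")

    # Pick a random location that keeps the object fully inside the background.
    x = random.randint(0, bg.width - obj.width)
    y = random.randint(0, bg.height - obj.height)
    bg.paste(obj, (x, y), mask=obj)  # alpha channel controls which pixels are pasted
    bg.save(out_image)

    # The annotation records the object's position coordinates and size.
    with open(out_label, "w") as f:
        json.dump({"label": label, "x": x, "y": y, "w": obj.width, "h": obj.height}, f)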

 

Our laboratory has built two large labeled training databases for instance-level object detection: the Beihang University Generational Indoor Dataset (BHGI) and the BigBIRD Generational Indoor Dataset (BigBIRD-GI).
 

 

1. BHGI: Beihang University Generational Indoor Dataset


The BHGI training set contains 10 instance objects: a car model, a CD case, a detergent bottle, a fire extinguisher, a glasses case, an oscilloscope, a pill box, a storage box, a tea caddy, and a vacuum flask. Fifteen thousand images were selected from three standard open-source indoor scene datasets (the Indoor Scene Recognition Dataset, the SUN Dataset, and the RGB-D Scenes Dataset) to serve as complex-scene background images.

   

Depending on how richly the multi-view poses in three-dimensional space and the data-augmentation transformations (shift, scale, rotation, stretch, brightness, saturation) of the instance objects are expanded, BHGI is provided at six size levels, composed as follows:

   

BHGI1-4: The generative deconvolutional network GDDNE automatically interpolates 1,560, 13,320, 52,560, and 145,200 different viewpoint images for each instance object. The viewpoint distributions of the four settings are: 1) in-plane rotations between 0 and 360 degrees, with one keyframe every 3, 1, 0.5, and 0.3 degrees, giving 120, 360, 720, and 1,200 rotation angles; and 2) camera depression angles between 0 and 90 degrees, with one keyframe every 7.5, 2.5, 1.25, and 0.75 degrees, giving 13, 37, 73, and 121 depression angles. The complex background images are then divided in a 2:1:1 ratio, onto which 3, 5, and 7 instance-object viewpoint images are pasted per image, respectively, with occluded and unoccluded samples in equal proportion. Finally, 400 real samples are added, giving totals of 1,400, 2,400, 4,400, and 8,400 training samples. A short sketch of how the per-object viewpoint counts arise is given after this item.
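As a sanity check on the figures above, the following sketch enumerates the viewpoint grid and reproduces the four per-object image counts. The sampling convention assumed here (rotation range 0-360 degrees with the endpoint excluded, depression range 0-90 degrees with both endpoints included) is inferred from the counts, not taken from the lab's code.

# Viewpoint count per instance object = (number of in-plane rotations)
#                                     x (number of camera depression angles).
def viewpoint_count(rot_step_deg, depr_step_deg):
    rotations = round(360 / rot_step_deg)        # 0-360 deg, endpoint excluded
    depressions = round(90 / depr_step_deg) + 1  # 0-90 deg, both endpoints included
    return rotations * depressions

levels = [(3.0, 7.5), (1.0, 2.5), (0.5, 1.25), (0.3, 0.75)]
print([viewpoint_count(r, d) for r, d in levels])  # -> [1560, 13320, 52560, 145200]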

     

BHGI5-6: The generative deconvolutional network GDDNE automatically interpolates 145,200 different viewpoint images for each instance object, with in-plane rotations between 0 and 360 degrees sampled every 0.3 degrees (1,200 rotation angles) and camera depression angles between 0 and 90 degrees sampled every 0.75 degrees (121 depression angles). GDDNE also automatically generates augmented images for each instance object: in one setting, the 6 single data-augmentation transformations; in the other, the 21 transformation settings obtained by superimposing 1 or 2 of the 6 transformations. With 1,000 images generated per setting, this yields 6,000 and 21,000 generated images, respectively. The complex background images are then divided in a 2:1:1 ratio, onto which 3, 5, and 7 instance-object images are pasted per image, respectively, with occluded and unoccluded samples in equal proportion. Finally, 400 real samples are added, giving totals of 14,400 and 44,400 training samples. A short sketch of how the 21 settings can be enumerated follows this item.
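The count of 21 transformation settings follows from taking every single transformation plus every unordered pair of distinct transformations (6 + 15 = 21). A minimal sketch, with the transformation names used only as labels:

from itertools import combinations

transforms = ["shift", "scale", "rotation", "stretch", "brightness", "saturation"]
# All settings using 1 or 2 of the 6 transformations: C(6,1) + C(6,2) = 6 + 15 = 21.
settings = [c for k in (1, 2) for c in combinations(transforms, k)]
print(len(settings))  # -> 21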

     

All data in BHGI are provided as the compressed files in the attachment; sample images from the database are shown in Figure 1.


                   


 

Fig. 1 Example images from the BHGI training dataset

     

To evaluate instance-level object detection algorithms, we also built an annotated test set. Its images are produced by pasting real instance-object images captured by a camera onto complex background images, and it covers four occlusion levels (occluded images account for 25%, 50%, 75%, and 100% of the samples, respectively), for a total of 1,080 test samples.
 
