How robust are deep object detectors to variability in ground truth bounding boxes? Experiments for target recognition in infrared imagery

Abstract

In this work we consider the problem of developing deep learning models, such as convolutional neural networks (CNNs), for automatic target detection (ATD) in infrared (IR) imagery. CNN-based ATD systems must be trained to recognize objects using bounding box (BB) annotations generated by human annotators. We hypothesize that individual annotators may exhibit different biases and/or variability in the characteristics of their BB annotations. Similarly, computer-aided annotation methods may introduce different types of variability into the BBs. We investigate the impact of this BB variability on the behavior and detection performance of CNNs trained on such annotations. We consider two specific BB characteristics: the center-point and the overall scale of the BBs (relative to the visual extent of the targets they label). We systematically vary the bias or variance of these characteristics within a large training dataset of IR imagery, and then evaluate the performance of the resulting trained CNN models. Our results indicate that biases in these BB characteristics do not impact performance, but they cause the CNN to mirror those biases in its BB predictions. In contrast, variance in these BB characteristics substantially degrades performance, suggesting that care should be taken to reduce variance in BB annotations.
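The distinction between bias and variance in BB center-point and scale can be illustrated with a small perturbation routine. The following is a minimal sketch under stated assumptions; it is not the authors' procedure. The `Box` type, the `perturb_box` and `perturb_dataset` functions, and all parameter names are hypothetical. Boxes are assumed to use center/width/height pixel coordinates. Center shifts are expressed as fractions of box width and height, and scale is a multiplicative factor. Variance is modeled as zero-mean Gaussian noise, which is also an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    cx: float  # center x (pixels)
    cy: float  # center y (pixels)
    w: float   # width (pixels)
    h: float   # height (pixels)


def perturb_box(box, center_bias=(0.0, 0.0), center_std=0.0,
                scale_bias=1.0, scale_std=0.0, rng=None):
    """Return a perturbed copy of `box` (hypothetical helper).

    center_bias: constant (x, y) shift, as fractions of box width/height.
    center_std:  std. dev. of zero-mean Gaussian center jitter (same units).
    scale_bias:  constant multiplicative factor applied to width and height.
    scale_std:   std. dev. of zero-mean Gaussian noise added to that factor.
    """
    rng = rng or random.Random()
    # Bias shifts every box the same way; variance adds per-box random jitter.
    dx = (center_bias[0] + rng.gauss(0.0, center_std)) * box.w
    dy = (center_bias[1] + rng.gauss(0.0, center_std)) * box.h
    # Clamp the scale factor so noisy boxes never collapse or invert.
    s = max(scale_bias + rng.gauss(0.0, scale_std), 1e-3)
    # Scaling is about the (shifted) center, so scale and center stay decoupled.
    return Box(box.cx + dx, box.cy + dy, box.w * s, box.h * s)


def perturb_dataset(boxes, seed=0, **kwargs):
    """Apply the same perturbation settings to every box, reproducibly."""
    rng = random.Random(seed)
    return [perturb_box(b, rng=rng, **kwargs) for b in boxes]
```

For example, `perturb_box(Box(50.0, 50.0, 20.0, 10.0), center_bias=(0.1, 0.0))` moves the center 2 pixels to the right and leaves the size unchanged. A pure-bias setting like this shifts every annotation consistently, whereas setting `center_std` or `scale_std` makes each box differ unpredictably from the target's true extent. This mirrors the two regimes contrasted in the abstract.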

DOI

10.1117/12.2565897

Year

2020