Salient keypoint detectors and compact feature descriptors for 3D perception
Prakhya, Sai Manoj
Date of Issue: 2017
School of Computer Science and Engineering
A*STAR Institute for Infocomm Research (I2R)
3D depth data acquisition has become extremely easy and affordable with the availability of hand-held depth sensors such as Microsoft Kinect, Intel RealSense Camera and Google Tango. Moreover, with the surge in smartphones equipped with depth sensors, such as the Lenovo Phab2Pro and Asus Zenfone AR, it is essential to develop 3D perception applications that are accurate and run with low memory, computational and bandwidth requirements. The first two steps of various 3D perception applications, such as Simultaneous Localization and Mapping (SLAM), 3D object recognition, retrieval and 3D reconstruction, are:
1. 3D keypoint detection: detect meaningful 3D points of interest that can efficiently represent the input 3D point cloud.
2. 3D feature description: represent the 3D neighbourhood of each detected keypoint with a multi-dimensional vector, so that keypoint correspondences can be determined.
The first part of the thesis focuses on 3D keypoint detection. First, a highly repeatable salient 3D keypoint detection algorithm is proposed. Next, we consider a specific 3D perception application, SLAM with an RGB-D camera, and propose a new 3D keypoint detection module that works best for it. The second part of the thesis focuses on 3D feature description: we first propose a fast, real-valued, low-dimensional 3D descriptor, then the first binary 3D descriptor in the literature, and lastly a set of even lower-bitrate 3D descriptors that are extremely fast to compute and match, yet still offer better performance. Existing 3D keypoint detectors sometimes detect keypoints on non-salient (e.g. planar) regions, or detect noise and glitches as keypoints. Contrary to the existing norm of detecting distinct keypoints, we propose to detect salient and highly repeatable keypoint sets (groups of keypoints).
Towards this, we propose the Histogram of Normal Orientations (HoNO) to detect salient regions, and we effectively remove planar regions by thresholding the kurtosis of the HoNO calculated at every point in the point cloud. The final keypoint sets are then detected by evaluating the properties of the HoNO and of the neighbourhood covariance matrix.
Next, we consider a 3D perception problem, SLAM with an RGB-D camera relying solely on depth data. As a solution, we propose Sparse Depth Odometry (SDO), whose main contribution is a new 3D keypoint detection module. This module combines two existing keypoint detectors, SURE and NARF, and its design is based on extensive theoretical and experimental analysis. The module finds reliable keypoints that work well with nearest neighbour association and represent the scene comprehensively, while running in real time; these are the key requirements of SDO. Powered by this keypoint detection module, SDO estimates the ego-motion of an RGB-D camera solely from its depth data and runs online without a GPU.
As for 3D feature description, existing real-valued 3D descriptors are either high dimensional or demand a large amount of computation time for their extraction and matching. Hence, we propose 3DHoPD, a new low-dimensional 3D feature descriptor that is extremely fast to compute. The novelty lies in compactly encoding the 3D keypoint position by transforming it to a new 3D space in which keypoints arising from similar 3D surface patches lie close to each other. We then propose the Histograms of Point Distributions (HoPD) descriptor to capture the neighbourhood structure, thus forming 3DHoPD (3D + HoPD). We also propose a tailored feature descriptor matching technique, wherein the search space for each keypoint match is first reduced by 90%, and the exact match is then found using the proposed HoPD descriptor.
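The HoNO planarity test can be illustrated with a minimal Python sketch. The abstract does not define the histogram or the kurtosis precisely, so the following choices are assumptions. The histogram bins the angle (0 to 90 degrees) between a point's unit normal and each neighbour's normal. The kurtosis is the Pearson kurtosis of the bin counts. The bin count `bins=8` and the cut-off `threshold=4.0` are hypothetical values. A planar neighbourhood puts every angle in the first bin, which gives a sharply peaked, high-kurtosis histogram, while a curved neighbourhood spreads the counts across bins. The follow-up keypoint-set selection based on the covariance matrix is not shown.

```python
import math
from typing import List, Sequence

def hono(ref_normal: Sequence[float], neighbour_normals: List[Sequence[float]],
         bins: int = 8) -> List[int]:
    """Histogram of angles (0-90 deg) between a point's normal and its
    neighbours' normals. Assumes unit-length normals (hypothetical binning)."""
    width = 90.0 / bins
    hist = [0] * bins
    for n in neighbour_normals:
        # abs(): a normal's sign is ambiguous, so fold angles into [0, 90].
        dot = abs(sum(a * b for a, b in zip(ref_normal, n)))
        angle = math.degrees(math.acos(min(1.0, dot)))
        hist[min(int(angle // width), bins - 1)] += 1
    return hist

def kurtosis(values: Sequence[float]) -> float:
    """Pearson kurtosis m4 / m2^2 of the values; 0.0 if they are all equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0

def is_planar(ref_normal, neighbour_normals, bins: int = 8,
              threshold: float = 4.0) -> bool:
    """Hypothetical rule: a sharply peaked HoNO (high kurtosis) means planar."""
    return kurtosis(hono(ref_normal, neighbour_normals, bins)) > threshold

up = (0.0, 0.0, 1.0)
flat = [up] * 8  # every neighbour shares the same normal
tilted = [(math.sin(math.radians(a)), 0.0, math.cos(math.radians(a)))
          for a in (5, 5, 15, 15, 25, 25, 35, 35)]
print(is_planar(up, flat), is_planar(up, tilted))  # prints: True False
```

In this example, the `flat` neighbourhood yields a kurtosis of 301/49 (about 6.14) and is discarded as planar. The spread-out `tilted` neighbourhood yields 1.0 and survives as a candidate salient point.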
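The abstract states that SDO relies on nearest neighbour association of keypoints and estimates the camera's ego-motion from depth alone. It does not say how the motion is solved. One common approach, assumed here purely for illustration and not taken from the thesis, is an ICP-style loop. First, associate each keypoint of the previous depth frame with its nearest keypoint in the current frame. Then, solve for the rigid motion in closed form with the SVD-based Kabsch method, and repeat. The function names and the iteration count are hypothetical. The sketch uses NumPy and a brute-force nearest-neighbour search, and it only converges when the inter-frame motion is small relative to the keypoint spacing.

```python
import numpy as np

def associate(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """For each row of src (N x 3), index of the nearest row of dst (M x 3)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)  # N x M squared distances
    return d2.argmin(axis=1)

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with dst ~= src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)               # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def estimate_motion(prev_kps: np.ndarray, curr_kps: np.ndarray, iters: int = 10):
    """Hypothetical ICP-style ego-motion estimate between two keypoint sets."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Move the previous keypoints by the current estimate, then re-associate.
        idx = associate(prev_kps @ R.T + t, curr_kps)
        R, t = kabsch(prev_kps, curr_kps[idx])  # re-solve with the new matches
    return R, t
```

In a real pipeline, the keypoints would come from the SURE and NARF based detection module, and a k-d tree would replace the quadratic brute-force search.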
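The two-stage matching idea behind 3DHoPD can be sketched as follows. The abstract gives neither the transform into the new 3D space nor the HoPD construction. This sketch therefore assumes that each keypoint already carries a transformed 3-vector (`pos3d`) and a HoPD vector, and it uses a hypothetical pruning `radius`. Stage one discards model keypoints whose transformed positions are far from the query's position. Stage two runs an exact nearest-neighbour search on HoPD over the surviving candidates only. How much of the search space stage one removes depends on the radius; the abstract reports about 90% for the actual method.

```python
import math
from typing import List, Optional, Sequence

def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_stage_match(q_pos3d: Sequence[float], q_hopd: Sequence[float],
                    model_pos3d: List[Sequence[float]],
                    model_hopd: List[Sequence[float]],
                    radius: float) -> Optional[int]:
    """Index of the best-matching model keypoint, or None if none survive.
    Stage 1: prune by distance in the transformed 3D space (cheap, 3 values).
    Stage 2: exact nearest neighbour on HoPD among the survivors only."""
    candidates = [j for j, p in enumerate(model_pos3d) if dist(q_pos3d, p) <= radius]
    if not candidates:
        return None
    return min(candidates, key=lambda j: dist(q_hopd, model_hopd[j]))

model_pos3d = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (5.0, 5.0, 5.0)]
model_hopd = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(two_stage_match((0.02, 0.0, 0.0), [0.0, 0.9, 0.1, 0.0],
                      model_pos3d, model_hopd, radius=0.1))  # prints: 1
```

The third model keypoint has the same HoPD vector as the winner, but stage one never compares it because its transformed position lies outside the radius. Skipping those comparisons is what makes this approach cheap.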
There are several real-valued 3D descriptors, but there is no binary 3D descriptor for 3D keypoint matching. Binary descriptors are known for their low memory footprint and fast matching via the Hamming distance. Hence, we introduce the first binary 3D descriptor, B-SHOT, by proposing an adaptive binarization technique that converts a real-valued vector to a binary vector. We apply this method to a state-of-the-art 3D feature descriptor, SHOT, to create the new binary 3D descriptor. Compared to SHOT, B-SHOT requires 32 times less memory for its representation and is 6 times faster in feature descriptor matching.
Finally, applications that require online transfer of 3D descriptors over a network need compressed 3D descriptors with an even lower memory footprint (i.e., lower bandwidth) that still remain highly descriptive. Therefore, we propose to employ lattice quantization to efficiently compress 3D feature descriptors. These compressed, low-bitrate 3D descriptors can be matched directly in the compressed domain without any need for decompression, which drastically reduces the memory footprint and computational requirements for matching. We also propose double-stage lattice quantization to achieve even more compression in the case of the SHOT descriptor. We provide a spectrum of possible bitrates and the achievable keypoint matching performance for three state-of-the-art 3D feature descriptors, to help users choose the appropriate one based on their memory, bandwidth and performance requirements.
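An adaptive binarization followed by Hamming-distance matching can be sketched in Python. The abstract does not state the exact B-SHOT rules, so this sketch assumes a simplified rule. The descriptor is split into groups of four values (`group=4`, assumed). In each group, the bits are set for the fewest largest entries whose sum exceeds a fraction of the group's sum (`frac=0.9`, also assumed), and all-zero groups stay zero. Bits are packed into a Python integer, so matching reduces to an XOR plus a population count.

```python
from typing import Sequence

def binarize(desc: Sequence[float], group: int = 4, frac: float = 0.9) -> int:
    """Simplified adaptive binarization (assumed rule, see lead-in).
    Bit i of the returned int corresponds to desc[i]."""
    bits = 0
    for start in range(0, len(desc), group):
        chunk = desc[start:start + group]
        total = sum(chunk)
        if total <= 0:
            continue  # all-zero group: its bits stay 0
        # Visit the group's entries from largest to smallest.
        order = sorted(range(len(chunk)), key=lambda i: chunk[i], reverse=True)
        acc = 0.0
        for i in order:
            bits |= 1 << (start + i)
            acc += chunk[i]
            if acc > frac * total:  # enough of the group's mass is covered
                break
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")

d1 = [0.0, 0.0, 0.0, 0.0, 0.95, 0.05, 0.0, 0.0, 0.5, 0.45, 0.05, 0.0]
d2 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.96, 0.04, 0.0, 0.5, 0.45, 0.05, 0.0]
print(hamming(binarize(d1), binarize(d2)))  # prints: 2
```

The memory arithmetic works as follows. SHOT has 352 dimensions; stored as 32-bit floats, that is 352 × 32 = 11,264 bits (1,408 bytes). One bit per dimension needs only 352 bits (44 bytes). Assuming float storage for SHOT, this accounts for the 32-fold reduction reported above.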
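The abstract does not name the lattice used. For histogram-based descriptors such as SHOT, one well-known option, assumed here for illustration only, is the "type" lattice. A normalized histogram with m bins is rounded to non-negative integers k_i with sum n, so that k / n approximates the histogram. The integer vectors can then be compared directly, which mirrors matching in the compressed domain, and the parameter n controls the bitrate. The function names and the choice of the L1 distance are hypothetical.

```python
import math
from typing import List, Sequence

def quantize_to_type(hist: Sequence[float], n: int) -> List[int]:
    """Round hist / sum(hist) onto the type lattice {k >= 0 integer, sum(k) == n}:
    round each scaled coordinate, then repair the total one unit at a time."""
    total = sum(hist)
    p = [h / total for h in hist]
    k = [int(n * pi + 0.5) for pi in p]          # round each coordinate
    err = [ki - n * pi for ki, pi in zip(k, p)]  # per-coordinate rounding error
    delta = sum(k) - n
    if delta > 0:    # too many counts: undo the largest round-ups
        for i in sorted(range(len(k)), key=lambda i: err[i], reverse=True)[:delta]:
            k[i] -= 1
    elif delta < 0:  # too few counts: undo the largest round-downs
        for i in sorted(range(len(k)), key=lambda i: err[i])[:-delta]:
            k[i] += 1
    return k

def type_l1(ka: Sequence[int], kb: Sequence[int], n: int) -> float:
    """L1 distance between two quantized histograms, computed on the integers."""
    return sum(abs(a - b) for a, b in zip(ka, kb)) / n

def bits_per_histogram(m: int, n: int) -> float:
    """Bits needed to index any type with m bins and denominator n."""
    return math.log2(math.comb(n + m - 1, m - 1))

a = quantize_to_type([0.5, 0.3, 0.2], n=4)
b = quantize_to_type([0.2, 0.6, 0.2], n=4)
print(a, b, type_l1(a, b, 4))  # prints: [2, 1, 1] [1, 2, 1] 0.5
```

Raising `n` refines the quantization at the cost of more bits per histogram. For example, `bits_per_histogram(3, 4)` is log2(15), about 3.91 bits. Varying `n` in this way is one route to a spectrum of bitrates like the one described above.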
DRNTU::Engineering::Computer science and engineering