The importance of spatial information on object recognition
Date of Issue: 2016-04-01
School of Electrical and Electronic Engineering
The “bag-of-words” (BoW) model is a staple of computer vision, with applications in object, scene, and action recognition. The model assumes that the patches it processes are independent and unordered. As a result, BoW discards information about spatial location and arrangement, which leads to several drawbacks. Spatial Pyramid Matching (SPM) is the most popular framework for incorporating spatial information into the BoW model. It redefines an image as a pyramid of several layers, each a copy of the same image. Each layer $l \in \{0, 1, \ldots, L-1\}$ is divided into $2^l \times 2^l$ sub-windows (spatial windows), and a BoW descriptor is extracted from each sub-window. These descriptors are then concatenated to form the SPM image descriptor. SPM therefore offers a simple and efficient way to approximate spatial arrangement within the otherwise unordered collection of codeword histograms. Owing to its simplicity, it has been very successful in many applications and is even used in non-BoW methods. Very few works have questioned the effectiveness of this approach: the efficiency of spatial pyramids as an image descriptor and the appropriateness of the SPM construction are simply taken for granted. This work presents a detailed investigation of the importance of spatial information in object recognition and challenges the traditional SPM arrangement.
This thesis is divided into two parts. The first part argues for the necessity of spatial information by showing how it can significantly improve recognition systems. The Hierarchical Dirichlet Process (HDP) for image recognition is used as the test case. The HDP suffers from a "rich-get-richer" effect caused by the way sampling is carried out. The first part of this thesis shows that spatial information can alleviate this issue, considerably improving the reliability of the HDP.
We show that spatial information, in the form of the cardinality coefficient and approximate shape masks, not only improves overall object recognition accuracy but also mitigates the detrimental effect inherent to the HDP.
Having established that spatial information plays an important role in image recognition, the second part offers a systematic investigation of the architecture of SPM. The study shows that the SPM representation is sub-optimal and presents possible ways to improve it. Two novel paradigms emerge from our investigation of the optimality of traditional SPM. Overlapping spatial windows (OWSPM) and circular spatial windows (CWSPM) offer a new way of constructing spatial pyramids, strengthening the discriminability of SPM representations by adding broader context to each spatial window. While OWSPM and CWSPM come from investigating how the SPM representation is crafted, investigating the arrangement of SPM led to our introduction of optimal spatial window arrangements. These take the form of Optimal Window SPM (OA-SPM) and a linear approximation of it, LA-SPM. Combined, the proposed models were tested on various datasets and compared with several baseline methods such as ScSPM, LLC, Object Bank, and Deep Learning. A consistent and significant increase in performance of up to 4.38%, together with a memory cost lower by nearly 40%, was reported, showing that the traditional spatial window arrangement of SPM is indeed inefficient. The thesis concludes that SPM is sub-optimal on multiple fronts. In terms of structure, the disjoint window arrangement of traditional SPM performs poorly and can be improved by the overlapping window arrangement.
Furthermore, the use of overlapping windows enabled us to explore the optimality of SPM further. Using all spatial windows inside a spatial pyramid proved to be more damaging than beneficial, as it hampers the discriminability of the image representation and adds unnecessary cost to the training and testing process.
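For readers unfamiliar with the construction, the traditional SPM descriptor described above (layer l divided into 2^l × 2^l disjoint sub-windows, one BoW histogram per sub-window, all histograms concatenated) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the thesis: the function name `spm_descriptor`, its parameters, and the use of hard-assigned codeword indices are hypothetical, and the per-level weighting and normalization used in practical SPM pipelines are omitted.

```python
import numpy as np

def spm_descriptor(codes, xs, ys, width, height, vocab_size, levels):
    """Concatenate per-window BoW histograms over a spatial pyramid.

    codes      : codeword index of each local patch, ints in [0, vocab_size)
    xs, ys     : patch centre coordinates in pixels
    width, height : image size in pixels
    levels     : L, so the pyramid layers run l = 0 .. L-1
    (All names here are illustrative assumptions.)
    """
    codes = np.asarray(codes, dtype=int)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    parts = []
    for level in range(levels):
        cells = 2 ** level  # the layer has cells x cells sub-windows
        # Map each patch to its sub-window; clip so points on the
        # right/bottom border stay inside the last cell.
        cx = np.minimum((xs / width * cells).astype(int), cells - 1)
        cy = np.minimum((ys / height * cells).astype(int), cells - 1)
        for row in range(cells):
            for col in range(cells):
                in_window = (cx == col) & (cy == row)
                # Unnormalized codeword histogram for this sub-window.
                hist = np.bincount(codes[in_window], minlength=vocab_size)
                parts.append(hist)
    return np.concatenate(parts)

# Four patches, one near each corner of a 10x10 image, vocabulary of 3 codewords.
descriptor = spm_descriptor(
    codes=[0, 1, 1, 2], xs=[1, 9, 1, 9], ys=[1, 1, 9, 9],
    width=10, height=10, vocab_size=3, levels=2,
)
```

With a vocabulary of K codewords and L layers, the descriptor has K(4^L − 1)/3 entries, since layer l contributes 4^l histograms. This growth in length is one reason the number and arrangement of spatial windows matter for memory cost.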