Feature weighting with augmented visual phrase in visual product recognition
Date of Issue: 2016
School of Electrical and Electronic Engineering
Significant progress in image search has been made in the past decade through the development of local invariant features. Among existing local feature detectors, the Scale Invariant Feature Transform (SIFT) is widely used since it is designed to be invariant to linear illumination changes and certain geometric transformations. In practice, however, recognition performance still depends on the actual imaging conditions, and there remain various cases that SIFT cannot handle, such as non-linear illumination changes and certain affine transformations. Images taken by mobile phones often suffer from such variations. To address this issue, we propose a feature weighting algorithm that determines the stability of interest points under various photometric and geometric transformations. Interest points are assigned different scores according to their repeatability across augmented samples generated under different illumination or geometric transformations, so that the stable interest points can be identified. The Bag of Words (BoW) representation is widely used in visual image search due to its good performance and high computational efficiency. In the traditional BoW representation, all local descriptors in the image are treated with equal importance. This restricts its performance, since in most scenarios the local descriptors on foreground objects should be assigned more weight than those on the background. In view of this, we incorporate saliency information into the proposed feature weighting algorithm to further improve performance. The weighted interest points are then used in the weighted scalable vocabulary tree (WSVT) framework for the image recognition task, and geometric verification (GV) is used to re-rank the retrieved images. Experimental results on a commercial product database show that the proposed feature weighting algorithm outperforms the current SVT recognition without feature weighting by 5%.
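The repeatability-based scoring described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes that keypoints detected in each augmented sample have already been mapped back into the coordinate frame of the original image, and the function name `repeatability_weights` and the pixel tolerance `tol` are hypothetical choices made for this example.

```python
from math import hypot

def repeatability_weights(original_pts, augmented_pts_list, tol=2.0):
    """Score each original keypoint by the fraction of augmented samples
    in which some keypoint is re-detected within `tol` pixels.

    Assumption: augmented keypoints are already expressed in the
    coordinate frame of the original image.
    """
    n_samples = len(augmented_pts_list)
    weights = []
    for x, y in original_pts:
        hits = 0
        for pts in augmented_pts_list:
            # A keypoint counts as repeated if any detection lies close enough.
            if any(hypot(x - u, y - v) <= tol for u, v in pts):
                hits += 1
        weights.append(hits / n_samples if n_samples else 0.0)
    return weights

# Three keypoints in the original image (toy coordinates).
original = [(10.0, 10.0), (50.0, 20.0), (80.0, 80.0)]

# Keypoints detected in three augmented samples (toy data).
augmented = [
    [(10.5, 9.8), (50.2, 20.1)],
    [(10.1, 10.3), (200.0, 5.0)],
    [(9.7, 10.0), (50.0, 19.5), (81.0, 80.5)],
]

weights = repeatability_weights(original, augmented)
```

In this toy data the first keypoint is re-detected in every sample and receives the highest weight, while the third is found in only one sample and receives the lowest. A saliency term, as mentioned in the abstract, could be multiplied into these scores, but how the two are combined is not specified here.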
Moreover, with the popularity of image editing software and social networking services such as Instagram and Facebook, a growing number of users process digital images with various post-processing filters before uploading them. Some SIFT features are not stable given the nonlinearity of these filters, which may degrade recognition performance. We therefore further propose a feature weighting method based on post-processing filters and incorporate it into the previous recognition framework. Experimental results on a commercial product database show that the proposed algorithm outperforms the current SVT recognition without feature weighting by more than 10% in average recognition performance across various post-processing effects.
Query images captured by mobile phones often suffer from illumination, scale, and viewpoint changes, which poses a great challenge for visual recognition. In view of this, we propose a framework that uses Augmented Visual Phrases (AVP) in the Bag-of-Phrase (BoP) model. By checking the consistency between keypoints in the original image and in the augmented images, a pool of augmented features is constructed by merging all the discriminative keypoints and their corresponding descriptors from the original and the transformed images. AVPs are then selected from this pool. The selected visual phrases are meaningful because they incorporate features carrying more diverse information. Database images are indexed with a two-dimensional inverted index of visual phrases. To eliminate spurious matches of visual phrases, we apply geometric verification (GV) to the top-ranked retrieved images. Experimental results show that the proposed AVP method outperforms the current BoP method by 9% in recognition rate.
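The idea of a two-dimensional inverted index over visual phrases can be illustrated with a small sketch. It rests on assumptions that the abstract does not confirm: each visual phrase is modelled as an unordered pair of visual word IDs, the index maps each pair to the set of database images containing it, and candidates are ranked by a simple vote count. The names `build_index` and `query` are hypothetical, and the geometric verification step is omitted.

```python
from collections import Counter, defaultdict

def phrase_key(w1, w2):
    """Normalise a phrase so that (a, b) and (b, a) share one index cell."""
    return (min(w1, w2), max(w1, w2))

def build_index(db_phrases):
    """Map each (word, word) cell to the set of images containing that phrase."""
    index = defaultdict(set)
    for image_id, phrases in db_phrases.items():
        for w1, w2 in phrases:
            index[phrase_key(w1, w2)].add(image_id)
    return index

def query(index, phrases):
    """Vote for every database image that shares a phrase with the query."""
    votes = Counter()
    for w1, w2 in phrases:
        for image_id in index.get(phrase_key(w1, w2), ()):
            votes[image_id] += 1
    return votes.most_common()

# Toy database: image ID -> visual phrases as pairs of visual word IDs.
db = {
    "img_a": [(3, 7), (7, 12), (1, 4)],
    "img_b": [(3, 7), (5, 9)],
    "img_c": [(2, 8)],
}

index = build_index(db)
ranking = query(index, [(7, 3), (12, 7), (4, 1)])
# ranking == [("img_a", 3), ("img_b", 1)]
```

Keying the index on a pair of words is what makes it two-dimensional in this sketch: a lookup touches only the images that contain both words of a phrase together, which is more selective than a lookup on a single visual word.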