Video object search and discovery
Date of Issue: 2016-12-28
School of Electrical and Electronic Engineering
In terms of volume, videos are becoming the largest source of big data. The sheer volume of video data demands powerful analytic tools to organize it and make sense of it. This thesis tackles two fundamental problems in big video analytics, search and discovery, from an object-driven angle. The objects we consider are the fundamental components of a video: they are concise, visually meaningful and informative. The mere presence of certain objects in a video, and their interactions, can provide rich information for video understanding. In addition, objects can help establish a quick impression of a video by showing what is there, and they provide a small footprint for video indexing, browsing and search. For video object search, we aim to search for and locate a specific object spatio-temporally in the video volume. The main challenges are: 1) object appearance variations across video frames caused by pose and scale changes, partial occlusions, etc., 2) false positives introduced by background clutter, and 3) search efficiency. We formulate video object search as the problem of finding spatio-temporal object trajectories, where an object trajectory consists of a sequence of bounding boxes that locate the target object across frames. We also present a Max-Path search solution that effectively reduces the complexity of trajectory search from exponential to linear in the video volume size. Furthermore, we present and evaluate the use of object proposals to speed up matching and trajectory search. Experimental results demonstrate three benefits of the proposed approaches. First, the formulation as trajectory search effectively improves matching accuracy by enforcing spatio-temporal coherency to overcome appearance variations and background clutter. In addition, the resulting trajectories offer an alternative to frames for measuring object occurrences and, consequently, the search performance.
Second, the Max-Path based trajectory search is efficient and compatible with both dense confidence maps and coarsely sampled object proposals. Third, the object proposal based approach can significantly boost search efficiency without compromising accuracy.
For video object discovery, this thesis focuses on the discovery of representative objects from videos. We address this problem by selecting representative object proposals generated from video frames. Although representative selection methods have been applied to video keyframe selection, directly applying them to object-level selection faces two major challenges. First, because the appearance of the same object varies across frames, the key objects are not necessarily located in the densest regions of the feature space; hence, classic density-based representative selection methods may not work well. Second, the irrelevant and noisy proposals in the proposal pool may significantly affect representative selection methods based on sparse reconstruction. To address these challenges, we devise a new formulation of sparse reconstruction based representative selection that incorporates object proposal priors and a locality prior in the feature space when selecting representatives. Consequently, it can better locate key objects and suppress outlier proposals. Although complex constraints are introduced, we show that the optimization can be converted into a proximal gradient problem and solved by the fast iterative shrinkage-thresholding algorithm (FISTA).
The proposed methods are compared against existing state-of-the-art methods for object instance search and representative object discovery on challenging datasets. The results show that our methods can more accurately find relevant videos pertaining to an object of interest and discover key objects that capture the essence of a video.
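The linear-time trajectory search described above can be illustrated with a minimal sketch; it is not the thesis's implementation. The sketch assumes that each frame has already been reduced to a grid of local matching scores (`scores[t][y][x]`, positive where the target object is likely present), that a trajectory may move at most `radius` cells between consecutive frames, and that a trajectory may start and end at any frame. The function name `max_path` and its parameters are hypothetical. A single forward pass visits each cell of the volume once, so the cost grows linearly with the video volume size rather than with the number of possible paths.

```python
def max_path(scores, radius=1):
    """Find the highest-scoring spatio-temporal path in a score volume.

    scores[t][y][x] is an assumed local matching score for frame t at
    grid cell (y, x). Returns (best_score, path), where path is a list
    of (t, y, x) cells on consecutive frames.
    """
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    best_score, best_end = float("-inf"), None
    prev_acc = None   # accumulated path scores of the previous frame
    back = []         # back[t][y][x]: predecessor (y, x) in frame t-1, or None

    for t in range(T):
        acc = [[0.0] * W for _ in range(H)]
        bp = [[None] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                # Best positive accumulated score among spatial neighbours in
                # the previous frame; 0.0 means "start a new path here".
                best_prev, best_pos = 0.0, None
                if prev_acc is not None:
                    for dy in range(-radius, radius + 1):
                        for dx in range(-radius, radius + 1):
                            py, px = y + dy, x + dx
                            if (0 <= py < H and 0 <= px < W
                                    and prev_acc[py][px] > best_prev):
                                best_prev = prev_acc[py][px]
                                best_pos = (py, px)
                acc[y][x] = scores[t][y][x] + best_prev
                bp[y][x] = best_pos
                if acc[y][x] > best_score:
                    best_score, best_end = acc[y][x], (t, y, x)
        back.append(bp)
        prev_acc = acc

    # Trace the best path back to the frame where it started.
    path = []
    t, y, x = best_end
    while True:
        path.append((t, y, x))
        pred = back[t][y][x]
        if pred is None:
            break
        t, (y, x) = t - 1, pred
    path.reverse()
    return best_score, path
```

On a toy three-frame volume in which the only positive scores shift one cell to the right per frame, this sketch links the three positive cells into a single trajectory. In a real system each cell would stand for a candidate bounding box (from a dense confidence map or an object proposal), which is an extension this sketch does not cover.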
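For readers unfamiliar with FISTA, the following sketch shows the general shape of the algorithm on a deliberately simplified problem. The abstract does not state the thesis's objective, so this example substitutes a weighted lasso, minimizing 0.5 * ||Ax - b||^2 + lam * sum_j weights[j] * |x[j]|, as a stand-in. The per-coefficient `weights` are a hypothetical proxy for proposal priors (a larger weight suppresses a candidate more strongly); the function names and the use of the squared Frobenius norm as a step-size bound are likewise assumptions made for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def fista_weighted_lasso(A, b, lam, weights, iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam * sum_j weights[j]*|x[j]| with FISTA.

    A is a list of rows, b a list of targets. This is a simplified stand-in
    for a sparse-reconstruction objective, not the thesis's formulation.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # A^T A, so 1/L is a safe (if conservative) gradient step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n   # current iterate
    y = x[:]        # extrapolated point where the gradient is evaluated
    t = 1.0         # momentum parameter

    for _ in range(iters):
        # Gradient of the smooth term at y: A^T (A y - b).
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Proximal gradient step: gradient descent, then soft-thresholding.
        x_new = [soft_threshold(y[j] - grad[j] / L, lam * weights[j] / L)
                 for j in range(n)]
        # Momentum update that gives FISTA its accelerated convergence rate.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j])
             for j in range(n)]
        x, t = x_new, t_new
    return x
```

With an identity matrix for `A`, the minimizer is simply the soft-thresholded target vector, which makes the sketch easy to check by hand. A representative-selection objective of the kind described above would typically act on rows of a coefficient matrix rather than on single coefficients, so the proximal step would change accordingly.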