On-line human activity recognition in video streams
Date of Issue: 2017-06-06
School of Electrical and Electronic Engineering
Human action and activity recognition plays an important role in computer vision. On-line action and activity recognition, defined as recognizing ongoing actions or activities in untrimmed videos, is a prerequisite for real-time recognition systems used in real-world scenarios such as abnormal action detection in video surveillance and robotics. Despite the progress made in recent years, on-line action and activity recognition at real-time speed remains a challenging problem. In on-line action recognition, decisions are made from partial observations of the actions, so the testing video is represented by a sequence of frame-level descriptors. Because the local features from a small part of the video are often too few to generate reliable frame-level descriptors, these descriptors tend to be noisy. Moreover, frame-level descriptors from one action category exhibit large intra-class variation, since the appearance and motion of an action vary over time and each descriptor comes from only a small part of the action, called an action state. Human activities, such as cooking a dish, are much longer and more complex than actions. On-line activity recognition is more difficult because the positions of the action states of an activity are not provided. Hence, a method for on-line activity recognition must parse the action states and activities in the testing video and recognize them at the same time. Another challenge is that action states shared by multiple activity categories may heavily weaken the recognition confidence and result in incorrect activity classification. Together, these challenges make on-line activity recognition a difficult problem. Motivated by these observations, this thesis presents a systematic study to address these challenging problems.
Firstly, the standard Bag-of-Words (BoW) representation is improved by assigning each local feature vector to multiple nearest-neighbor codes in the codebook to suppress noise. Then we introduce a Discriminative Action States Discovery (DASD) method for on-line action recognition. In the proposed approach, the positive sample set is treated as multiple patterns called action states. To address the large intra-class variation, the DASD method discovers the different distributions of frame-level descriptors within each action category while training classifiers for the action states. The action state models are learned by clustering the positive samples and optimizing the decision boundary of each state simultaneously. Lastly, a Statistical Activity Model (SAM) is proposed for on-line activity recognition and anomaly detection. This model employs the DASD algorithm to parse the positions of action states in the untrimmed testing video and captures temporal dependencies between action states to recognize human activities. Furthermore, alarms are generated when the observed action states are incorrect for the activity. The on-line inference of SAM is solved by on-line dynamic programming. The proposed DASD model is evaluated on the tasks of on-line action recognition and action prediction. Experimental results show that our method outperforms both the baseline methods and state-of-the-art methods for on-line action recognition. The SAM algorithm is evaluated in experiments on on-line action state segmentation, activity recognition, and anomaly detection, where our method achieves the best performance. Last but not least, the proposed methods recognize actions and activities in an on-line manner at real-time speed.
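The multiple-assignment BoW step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `soft_bow`, the parameter `k`, the Euclidean distance, and the equal weighting of the `k` nearest codes are all assumptions made for illustration, since the abstract does not specify how votes are weighted.

```python
import numpy as np

def soft_bow(features, codebook, k=3):
    """Illustrative multiple-assignment BoW histogram (hypothetical helper).

    Each local feature votes for its k nearest codewords instead of only
    the single nearest one. Equal vote weights are an assumption.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    hist = np.zeros(len(codebook))
    if len(features) == 0:
        return hist  # no local features observed: empty descriptor
    k = min(k, len(codebook))
    # Squared Euclidean distances, shape (n_features, n_codes).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest codes for every feature.
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Unbuffered accumulation so repeated code indices are all counted.
    np.add.at(hist, nearest.ravel(), 1.0 / k)
    # L1-normalize so descriptors from different frames are comparable.
    return hist / hist.sum()
```

With `k=1` this reduces to the standard hard-assignment histogram. Larger `k` spreads each feature's vote over neighboring codes, which is one plausible way a descriptor built from only a few local features could become less sensitive to noise.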
DRNTU::Engineering::Electrical and electronic engineering