Robust representation and recognition of facial emotions
Date of Issue: 2014
School of Electrical and Electronic Engineering
Facial emotion detection under natural conditions is an interesting topic with a wide range of potential applications, such as human-computer interaction. Although research in this field has progressed significantly, challenges remain in real-world, unconstrained situations. One essential challenge is to find pose-invariant spatio-temporal volumetric features with which to analyze the video sequence efficiently. Another important issue is how to deal with noisy and imperfect data recorded in uncontrolled environments, which are subject to illumination variations, partial occlusion, and head movements. The focus of this research is to develop a robust system that recognizes facial expression as a dynamic event in natural situations. Two strategies are proposed to address the challenges of uncontrolled environments. Robust representation framework: we propose a novel spatio-temporal descriptor based on Optical Flow (OF) components that is highly distinctive and also pose-invariant. Robust recognition framework: we explore the effectiveness of the sparse representation obtained by supervised learning of a set of basis vectors (a dictionary). Extreme Sparse Learning (ESL) is proposed to jointly learn a dictionary and a nonlinear classification model so as to robustly detect facial expressions in real-world natural situations. The proposed approach combines the discriminative power of the Extreme Learning Machine (ELM) with the reconstruction property of sparse representation to deal with the noisy signals and imperfect data recorded in natural settings. Since facial feature extraction performance is highly dependent on facial pose, we propose a novel spatio-temporal descriptor that is robust to facial pose variations. However, the feature encoding may fail in the presence of extreme head pose variations, where some parts of the face are not visible in the recorded images.
To address this problem, and also to deal with illumination variations and occlusion, we follow the idea of sparse representation, in which noisy data can be reconstructed from the clean data provided by the dictionary. Although sparse representation can enhance noisy data using a dictionary learned from clean data, this alone is not sufficient, because the end goal is to recognize the facial expression correctly. In a sparse-representation-based classification task, the desired dictionary should have both representational ability and discriminative power. Since separating classifier training from dictionary learning may leave the learned dictionary sub-optimal for the classification task, we propose to learn the dictionary and the classification model jointly. In other words, in contrast with most existing schemes, which update the dictionary and the classifier parameters alternately by iteratively solving each sub-problem, we propose to solve for them simultaneously. This joint dictionary learning and classifier training can be expected to yield a dictionary that is both reconstructive and discriminative, as a robust recognition system requires. To the best of our knowledge, this is the only work that attempts to simultaneously learn the sparse representation of the signal and train a nonlinear classifier to be discriminative for the sparse codes. The proposed method jointly learns a single dictionary and an optimal nonlinear classifier. We have performed extensive experiments on both acted and spontaneous emotion databases to evaluate the effectiveness of the proposed feature extraction and classification schemes under different scenarios. Our results demonstrate the robustness of the proposed emotion recognition framework, especially in challenging scenarios that involve illumination changes, occlusion, and head pose variations.
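As a rough illustration of the two ingredients named in the abstract, the sketch below computes sparse codes over a fixed dictionary and then trains an ELM-style classifier on those codes. It is not the ESL algorithm: the two steps are deliberately decoupled here, which is the arrangement the abstract argues is sub-optimal. The function names (`sparse_code`, `train_elm`, `predict_elm`), the ISTA solver, the tanh hidden layer, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def sparse_code(D, X, lam=0.1, n_iter=200):
    """Sparse codes via ISTA: min_A 0.5*||X - D A||_F^2 + lam*||A||_1.

    D: (n_features, n_atoms) dictionary, X: (n_features, n_samples) signals.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = A - (D.T @ (D @ A - X)) / L    # gradient step on the quadratic term
        A = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)  # soft threshold
    return A

def train_elm(A, Y, n_hidden=64, reg=1e-2, rng=None):
    """ELM-style classifier: random fixed hidden layer, ridge-regressed output.

    A: (n_atoms, n_samples) sparse codes, Y: (n_classes, n_samples) one-hot.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    W = rng.standard_normal((n_hidden, A.shape[0]))  # random input weights
    b = rng.standard_normal((n_hidden, 1))           # random biases
    H = np.tanh(W @ A + b)                           # hidden activations
    # Closed-form ridge solution for the output weights beta.
    beta = np.linalg.solve(H @ H.T + reg * np.eye(n_hidden), H @ Y.T).T
    return W, b, beta

def predict_elm(model, A):
    """Predicted class index for each column of A."""
    W, b, beta = model
    return np.argmax(beta @ np.tanh(W @ A + b), axis=0)
```

In use, one would build `D` from clean training features, call `sparse_code` on possibly noisy inputs, and pass the resulting codes to `train_elm` and `predict_elm`. The reconstruction `D @ A` is the denoised estimate that the abstract refers to when it speaks of reconstructing noisy data from the dictionary.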
DRNTU::Engineering::Electrical and electronic engineering::Control and instrumentation