Domain adaptation and generalization for visual recognition
Date of Issue: 2017-03-28
Interdisciplinary Graduate School (IGS)
In many visual recognition tasks, a distribution mismatch between the training set (i.e., the source domain) and the test set (i.e., the target domain) can significantly degrade the performance of a classifier learnt on the training set when it is applied to the test set. Approaches that address this domain distribution mismatch fall into two categories: Domain Adaptation (DA) and Domain Generalization (DG). Specifically, DA uses unlabeled target domain data in the training phase to reduce the domain distribution mismatch, while DG aims to learn a classifier on the source domain that generalizes well to any unseen target domain. This thesis focuses on DA and DG for visual recognition. Most existing DA and DG approaches require well-labeled training data. Since collecting labeled data is often time-consuming and expensive, some recent works use freely available web images and videos for visual recognition. DA and DG methods can therefore be categorized by whether they learn from web data or from well-labeled data. When learning from web data, besides the distribution mismatch between the web training data and the test data, there are further problems, such as the label noise of web data and the extra information associated with web data (i.e., privileged information). Existing DA and DG methods consider only the domain distribution mismatch and ignore the label noise and the privileged information. To this end, we propose a DA framework and a DG method for learning from web data, which form the first and second works in this thesis, respectively. In the first work, we propose a DA framework named Domain Adaptive Multi-Instance Learning using Privileged Information (MIL-PI-DA) for visual recognition by learning from web data, which simultaneously handles the label noise, exploits the privileged information, and reduces the domain distribution mismatch.
In the second work, we propose a DG method named Weakly Supervised Domain Generalization (WSDG) for visual recognition by learning from web data, which simultaneously copes with the label noise, takes advantage of the privileged information, and generalizes well to any unseen target domain. When learning from well-labeled data, existing DA and DG approaches also have some issues. One issue is how to use multiple types of features in the multi-view scenario. Although multi-view DA has been studied, no multi-view approach has been proposed for DG, which motivates the third work in this thesis. In the third work, we propose a framework named Exemplar-based Multi-View Domain Generalization (EMVDG) for visual recognition, based on the consensus principle and the complementary principle; it is the first work to explore the DG problem under the multi-view setting. Another issue when learning from well-labeled data is that, for global feature representations (e.g., Fisher vectors encoded with a Gaussian Mixture Model (GMM)), the codebook (e.g., the GMM) learnt on the source domain may not capture the distribution of the target domain well. No existing DA approach considers this issue, which motivates the fourth work in this thesis. In the fourth work, we propose a Domain Adaptive method based on Fisher Vector (DAFV) for visual recognition, which is specifically designed for Fisher vectors. Our key idea is to reduce the domain distribution mismatch by selecting the domain-invariant components of Fisher vectors. For all of our proposed DA and DG methods, we conduct extensive experiments and comparisons with state-of-the-art methods. The experimental results demonstrate the superior performance of our proposed methods under different scenarios.