Hypothesis test and estimator of high dimensional covariance matrix
Date of Issue: 2015
School of Physical and Mathematical Sciences
This thesis is concerned with statistical inference for the population covariance matrix in the high-dimensional setting. Specifically, we consider two of the most popular current topics: testing the equality of two population covariance matrices and estimating a single covariance matrix. Owing to the growing interest in high-dimensional data in recent years, there is already a large body of work on these two topics. In this thesis, however, we focus on cases where existing results either fail or have been less studied. Our results thus supplement and extend high-dimensional covariance matrix analysis. The first problem we consider is testing the equality of two population covariance matrices when the data dimension p diverges with the sample size n (p/n → c > 0). We propose a weighted test statistic that is powerful against both faint alternatives (many small disturbances) and sparse alternatives (several large disturbances). Its asymptotic null distribution is derived by means of large-dimensional random matrix theory, without assuming the existence of a limiting cumulative distribution function of the population covariance matrix. The simulation results confirm that our statistic is powerful against all alternatives, while other tests given in the literature fail in at least one situation. As for large covariance matrix estimation, we note that most of the literature considers the estimation of a large positive definite population covariance matrix based on matrix norms (e.g. the Frobenius norm). By contrast, the rank is less discussed when the covariance matrix is singular (not of full rank). Under this low-rank assumption, however, both rank accuracy and matrix loss should serve as standards for evaluating a matrix estimator. In this thesis, we address this problem first by considering the estimation of the covariance matrix rank via two information criteria.
Theoretical convergence properties are developed, and five different models are used to investigate the numerical performance of our rank estimators. Then, as an application, we use the rank estimator to estimate large singular covariance matrices, with both rank accuracy and matrix norm loss as evaluation criteria for the matrix estimator. Finally, we explore another potential application of the rank estimator: estimating the number of factors in a nonparametric relationship. In summary, this thesis proposes statistics for high-dimensional covariance matrix testing and estimation problems in settings not covered by the existing literature. Asymptotic theoretical results are developed and simulations are provided to demonstrate the effectiveness of the proposed statistics theoretically and numerically, respectively.