Separation of underdetermined speech mixture based on sparse Bayesian recovery
Date of Issue: 2017-07-18
School of Electrical and Electronic Engineering
Centre for Signal Processing
This thesis addresses the separation of underdetermined speech mixtures using sparse Bayesian recovery techniques. Firstly, it describes a novel algorithm that improves the performance of sparsity-based single-channel speech separation. The conventional approach assumes that the mixing conditions and source signals are stationary. In practical applications of speech source separation, however, the mixing conditions are non-stationary because the sources vary or the speakers move. The proposed algorithm handles this non-stationary situation in single-channel source separation, recovering the speech signals with a sparse Bayesian learning algorithm.
Secondly, an algorithm for the underdetermined instantaneous speech separation problem is described, based on a hierarchical sparse Bayesian technique for efficient data reconstruction. The proposed algorithm consists of three steps. First, the unknown mixing matrix is estimated from the speech mixtures in the transform domain. Then, the permutation problem is solved using the results of the first step to obtain the correct order of the dictionary. Finally, the speech sources are recovered using the hierarchical sparse structure of the mixed speech signals. Numerical experiments, including a comparison with other sparse representation approaches, show that the proposed method reduces interference effectively and improves separation performance.
Thirdly, we address speech source separation from underdetermined convolutive mixtures with channel identification and recovery. The proposed method does not require prior knowledge of the source geometry. Its first step estimates the convolutive channel from the speech mixtures after a clustering procedure that selects single-source time intervals. The next step recovers the speech signals based on a compressed sensing concept in the short-time Fourier transform (STFT) domain. Compared with conventional methods, the separation performance is greatly improved when the mixing channel is known.
Finally, a noise-robust algorithm for separating speech sources from their underdetermined convolutive mixtures is proposed. Unlike previously reported methods, the proposed algorithm can work in a noisy environment. The recovery of the speech signals exploits their sparse structure, with a calibration to the estimated channel. The method operates in a statistical manner in the time-frequency (TF) domain and achieves desirable separation results without selecting regularization parameters. Numerical experiments, including a comparison with other separation approaches for convolutive speech mixtures, show that the algorithm improves separation performance.
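As a rough illustration of the sparse Bayesian recovery idea that runs through the abstract, the sketch below applies a generic EM-style sparse Bayesian learning update to a small underdetermined linear system x = A s. It is not the thesis's algorithm: the function name `sbl_recover`, the fixed noise variance, the real-valued data, the problem sizes, and the assumption that the mixing matrix A is already known are all choices made here for illustration only.

```python
import numpy as np


def sbl_recover(A, x, noise_var=1e-4, n_iter=500, floor=1e-12):
    """Generic EM-style sparse Bayesian learning for x = A @ s + noise.

    Each coefficient s_i gets a zero-mean Gaussian prior with its own
    variance gamma_i; variances of inactive coefficients shrink toward zero.
    noise_var is held fixed here (an assumption of this sketch).
    """
    m, n = A.shape
    gamma = np.ones(n)                      # per-coefficient prior variances
    mu = np.zeros(n)
    for _ in range(n_iter):
        AG = A * gamma                      # A @ diag(gamma), shape (m, n)
        Sx = noise_var * np.eye(m) + AG @ A.T   # marginal covariance of x
        K = np.linalg.solve(Sx, AG)         # Sx^{-1} A diag(gamma)
        mu = K.T @ x                        # posterior mean of s
        post_var = gamma - np.sum(AG * K, axis=0)  # diag of posterior covariance
        gamma = np.maximum(mu**2 + post_var, floor)  # EM update of the variances
    return mu, gamma


# Illustrative problem: 10 observations, 20 unknowns, 2 active coefficients.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
A /= np.linalg.norm(A, axis=0)              # unit-norm columns
s_true = np.zeros(20)
s_true[[3, 11]] = [1.5, -2.0]
x = A @ s_true

s_hat, gamma_hat = sbl_recover(A, x)
print("largest coefficients at indices:", np.argsort(-np.abs(s_hat))[:2])
```

The per-coefficient variances play the role of the sparsity prior, so no regularization weight has to be tuned by hand; only the noise variance is fixed in this sketch. In a speech setting one would apply such a recovery per time-frequency point with complex-valued data and an estimated mixing matrix, which this example does not attempt.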