Visual data processing over structured dictionaries with applications in light field imaging
Date of Issue: 2016
School of Electrical and Electronic Engineering
Centre for Signal Processing
Structure exists in all forms of visual data in nature. In this thesis, we focus on modeling these data structures using sparse representation over redundant dictionaries, which has proven to be an efficient tool in numerous visual signal processing applications. We propose several types of redundant dictionaries with specially designed structures that adapt to their particular application scenarios. Different sparse coding and dictionary training strategies are investigated. Consideration is given to challenging issues such as cross-scale interactions in multi-scale dictionaries, de-correlation of dictionary disparity segments, and acceleration of sparse coding over perspective-shifted dictionaries. All efforts aim to provide a more powerful framework for representing complicated visual data structures.
For sparse signal representation, sparsity across scales is a promising yet under-investigated direction. In this thesis, we design a multi-scale sparse representation scheme to explore this potential. A multi-scale dictionary (MD) structure is designed, and a Cross-scale Matching Pursuit (CMP) algorithm is proposed for multi-scale sparse coding. Two dictionary learning methods, Cross-scale Cooperative Learning (MD/CCL) and Cross-scale Atom Clustering (MD/CAC), are proposed, each focusing on one of the two important attributes of an efficient multi-scale dictionary: the similarity and the uniqueness of corresponding atoms in different scales. We analyze and compare their respective advantages in image denoising under different noise levels, where both methods produce state-of-the-art denoising results.
The light field (LF) is a function that describes the intensities of light rays in all possible propagation directions. The LF contains a large volume of visual information that can provide a comprehensive understanding of the 3D environment of the scene.
In this thesis, a light field dictionary (LFD) based on perspective-shifting is proposed for sparse representation of the highly correlated light field. A two-stage coding algorithm is proposed, which uses the Winner-Take-All (WTA) hashing strategy to narrow down the search range for light field sparse coding. The algorithm increases the coding efficiency by almost three times while keeping the reconstruction quality almost the same as that of the original Orthogonal Matching Pursuit (OMP) coding.
A compressed sensing framework is proposed for the sampling and reconstruction of a high-resolution light field based on a coded aperture camera. Two separate methods, Sub-Aperture Scan (SAS) and Normalized Fluctuation (NF), are proposed to acquire or calculate the scene disparity, which is then used during light field reconstruction with the proposed disparity-aware dictionary. A hardware implementation of the proposed light field acquisition and reconstruction scheme is carried out. Both quantitative and qualitative evaluations show that the proposed methods produce state-of-the-art performance in both reconstruction quality and computational efficiency.
Then, a light field compression framework based on the LFD is proposed for efficient storage and transmission of the bulky LF data. A highly efficient adaptive guided filtering algorithm is also proposed for post-processing of the LF disparity/depth map. Both quantitative and qualitative simulations validate the efficiency of the proposed methods.
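As a rough illustration of the two-stage coding idea mentioned above (WTA hashing to shrink the search range, followed by OMP), the following is a minimal sketch and not the thesis implementation. It assumes a generic dictionary with unit-norm columns; the function names, the choice to hash both the signal and its negation, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def make_permutations(n, m, rng):
    """Draw m random permutations of the n signal coordinates (assumed setup)."""
    return np.stack([rng.permutation(n) for _ in range(m)])

def wta_hash(X, perms, k):
    """WTA codes for the columns of X.

    For each permutation, look at the first k permuted entries of a column
    and record the position (0..k-1) of the largest one.
    Returns an array of shape (num_permutations, num_columns).
    """
    return np.argmax(X[perms[:, :k], :], axis=1)

def omp(D, y, sparsity, tol=1e-12):
    """Plain Orthogonal Matching Pursuit over the columns of D."""
    residual = y.astype(float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(sparsity):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def two_stage_code(D, y, atom_codes, perms, k, num_candidates, sparsity):
    """Stage 1: keep the atoms whose WTA codes best match the signal.
    Stage 2: run OMP on that reduced set only."""
    # WTA codes are sign-sensitive, while OMP is not, so this sketch
    # (an assumption, not from the source) also hashes -y.
    pos = (atom_codes == wta_hash(y[:, None], perms, k)).sum(axis=0)
    neg = (atom_codes == wta_hash(-y[:, None], perms, k)).sum(axis=0)
    matches = np.maximum(pos, neg)
    candidates = np.argsort(-matches)[:num_candidates]
    x = np.zeros(D.shape[1])
    x[candidates] = omp(D[:, candidates], y, sparsity)
    return x
```

In this sketch the atom codes are computed once with `wta_hash(D, perms, k)` and reused for every signal, so the per-signal cost of stage 1 is a code comparison rather than a full correlation with every atom. How much speed-up this gives, and how the candidate count trades off against reconstruction quality, depends on the dictionary and is not something this example measures.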