Perceptual and content based analysis for coding multimedia information.
Date of Issue: 2006
School of Electrical and Electronic Engineering
This report covers the RGM project “Perceptual and Content Based Analysis for Coding Multimedia Information” (22 October 2002 – 30 June 2006). Chapter 1 deals with content-based image retrieval (CBIR) and proposes three new approaches: 1) a framework for knowledge-driven CBIR; 2) a criterion to maximize the retrieval performance of Kernel-based Biased Discriminant Analysis (KBDA); and 3) a novel saliency-weighted region-based image retrieval (SW-RBIR) algorithm. Together these lead to a new image retrieval strategy in which a suitable CBIR algorithm is selected according to the characteristics of an image's foreground objects and background. Chapter 2 extends the work to video analysis. Two multi-resolution video representation schemes are proposed, one based on Kernel-based Principal Component Analysis (KPCA) and one based on Mean Shift Analysis (MSA). Both yield efficient multi-resolution video representations simply by tuning their internal parameters. To keep computational cost low, motion vectors are extracted from the MPEG video stream and then processed. This approach is especially suitable for Skycam-based applications, where image resolution is usually low. Chapter 3 treats model selection for kernel methods. A unified model selection method for both bi-class and multi-class Support Vector Machines (SVMs) is proposed; it is based on gradient descent and is conceptually simple and easy to implement. The criterion is then extended to Kernel-based Linear Discriminant Analysis (KLDA). A generalized radius-margin bound is also developed for multi-class SVMs to perform both model selection and feature selection efficiently.
Chapter 4 discusses video repeat identification and video structure analysis. Effective solutions for identifying short video repeats in video collections or streams are developed for video structure analysis, important event mining, and commercial detection and skipping. Chapter 5 focuses on image segmentation based on the disjoint-set union (union-find) data structure. A new watershed algorithm is proposed to address issues such as over-segmentation and memory overflow in some existing methods. Chapter 6 explores the characteristics of the human visual system (HVS) and proposes an improved scheme for estimating the just-noticeable distortion (JND). In general, distortion below the JND threshold is imperceptible and can be ignored. Applying the JND gauge to image and video compression can improve coding efficiency, because bits can be allocated according to the JND thresholds of different areas. The resulting coded images and videos show improved perceptual quality compared with those produced by traditional coding methods.
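The report's watershed algorithm itself is not described in this abstract. As a rough orientation only, the sketch below shows the generic disjoint-set union (union-find) structure that Chapter 5 builds on, using path halving and union by rank so that each operation runs in near-constant amortized time. It is applied to a hypothetical toy task of merging 4-connected pixels with equal values into regions. The `DisjointSet` class and the `label_regions` helper are illustrative assumptions, not the report's method or code.

```python
class DisjointSet:
    """Generic union-find with path halving and union by rank."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same set
        # Attach the lower-rank tree under the higher-rank one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def label_regions(grid):
    """Hypothetical helper: merge 4-connected pixels with equal values."""
    h, w = len(grid), len(grid[0])
    ds = DisjointSet(h * w)
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w and grid[y][x] == grid[y][x + 1]:
                ds.union(i, i + 1)  # merge with right neighbour
            if y + 1 < h and grid[y][x] == grid[y + 1][x]:
                ds.union(i, i + w)  # merge with lower neighbour
    # Relabel set roots as consecutive region ids in raster order.
    ids = {}
    labels = []
    for y in range(h):
        row = []
        for x in range(w):
            root = ds.find(y * w + x)
            row.append(ids.setdefault(root, len(ids)))
        labels.append(row)
    return labels


grid = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]
print(label_regions(grid))  # prints [[0, 0, 1], [0, 2, 1], [2, 2, 1]]
```

Because each merge only relinks roots, region memberships never have to be copied between arrays, which is one reason union-find is an attractive basis for segmentation methods that repeatedly merge regions.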
DRNTU::Engineering::Computer science and engineering::Data::Coding and information theory