Towards high-quality 3D telepresence with commodity RGBD camera
Date of Issue: 2018-01-08
School of Computer Science and Engineering
3D telepresence aims to give remote participants the perception of being present in the same physical space, which no 2D teleconferencing system can achieve. Successful 3D telepresence would greatly enhance communication and user experience, and could stimulate many applications, including teleconferencing, telesurgery, and remote education. Despite years of study, 3D telepresence research still faces many challenges: high system cost, heavy computation that makes real-time performance hard to achieve on consumer-level hardware, the expense of acquiring depth data, the difficulty of extracting 3D people in real time with high quality, and the difficulty of 3D scene replacement and composition. The emergence of consumer-grade range cameras such as Microsoft Kinect, which provide convenient, low-cost acquisition of 3D depth in real time, has accelerated many multimedia applications. In this thesis, we make several attempts at improving the quality of 3D telepresence with a commodity RGBD camera. First, because the raw depth data from commodity depth cameras is highly noisy and error-prone, we carefully study the error patterns of Kinect and propose a multi-scale direction-aware filtering method to combat Kinect noise. We have also implemented the proposed method in CUDA to achieve real-time performance. Experimental results show that our method outperforms the popular bilateral filter. Second, we consider the problem of extracting a dynamic foreground person from RGBD video in real time, a common task in 3D telepresence. Existing methods struggle to ensure real-time performance, high quality, and temporal coherence at the same time. We propose a foreground extraction framework that integrates many existing techniques, including background subtraction, depth hole filling, and 3D matting. We also take advantage of various CUDA strategies and spatial data structures to improve speed.
Experimental results show that, compared with state-of-the-art methods, our proposed method extracts stable foreground objects with higher visual quality and better temporal coherence, while still achieving real-time performance. Third, we consider another challenging problem in 3D telepresence: given an RGBD video, we want to replace the local 3D background scene with a target 3D scene. This raises many issues, such as the mismatch between the local scene and the target scene, the range of motion in different scenes, and collisions. We propose a novel scene replacement system that consists of multiple processing stages: foreground extraction, scene adjustment, scene analysis, scene suggestion, scene matching, and scene rendering. We implement the system entirely on the GPU, parallelizing most of the computation with CUDA, which allows us to achieve both visually convincing scene replacement and real-time performance.
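For readers unfamiliar with the bilateral filter that the abstract names as the comparison baseline for depth denoising, the following is a minimal pure-Python sketch of that baseline applied to a depth map. It is not the thesis's multi-scale direction-aware method, which the abstract does not specify. The function name, the parameters `radius`, `sigma_s` and `sigma_r`, the millimetre units, and the convention that a depth of 0 marks a missing measurement are all assumptions made for illustration.

```python
import math


def bilateral_filter_depth(depth, radius=2, sigma_s=1.5, sigma_r=30.0):
    """Bilateral filter for a depth map stored as a list of rows.

    Assumptions (illustrative, not from the thesis): depth values are in
    millimetres and a value of 0 marks a missing measurement (a "hole").
    Holes are skipped as neighbours and are left unfilled in the output.
    """
    height, width = len(depth), len(depth[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = depth[y][x]
            if center == 0:
                continue  # leave holes untouched
            total = 0.0
            weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # neighbour falls outside the image
                    d = depth[ny][nx]
                    if d == 0:
                        continue  # ignore missing neighbours
                    # Spatial weight: nearby pixels count more.
                    w_s = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    # Range weight: pixels at a similar depth count more,
                    # which keeps depth discontinuities sharp.
                    w_r = math.exp(-((d - center) ** 2) / (2.0 * sigma_r ** 2))
                    total += w_s * w_r * d
                    weight_sum += w_s * w_r
            # weight_sum >= 1 because the centre pixel always contributes.
            out[y][x] = total / weight_sum
    return out
```

Each output pixel depends only on a fixed neighbourhood of the input, so the two outer loops are independent across pixels. That independence is what makes this kind of filter a natural fit for a one-thread-per-pixel GPU implementation of the sort the abstract alludes to.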