3D audio reproduction using frontal projection headphones
Date of Issue: 2016
School of Electrical and Electronic Engineering
3D audio reproduction over headphones has become one of the most common forms of playback. Headphones provide a private listening space and are extremely convenient owing to their portability. However, headphone playback of 3D audio is marred by several challenges. Head-related transfer functions (HRTFs) contain all the spatial information involved in the propagation of an acoustic wave from the source position to the receiver position. HRTFs encapsulate the spectral, temporal, and timbral effects imposed on the source spectrum by the interaction of the source wave (diffraction, reflection, and scattering) with the human torso, shoulders, head, and pinnae for any given source location. Human ears are highly idiosyncratic; thus, HRTFs are also unique and vary widely among subjects. The use of non-individualized HRTFs degrades 3D audio perception by introducing front-back confusions, up-down reversals, in-head localization (IHL), and timbral coloration. Therefore, individualized HRTFs are necessary for an accurate and immersive perception of 3D sound. However, measuring these individualized HRTFs is extremely tedious: HRTFs have to be measured precisely for every azimuth, elevation, and distance for every individual, which is practically infeasible. For distances in the near field (< 1 m), the measurement of individualized HRTFs is even more difficult owing to the large variation of HRTFs with distance in the near field. Thus, easier techniques to obtain individualized HRTFs at any source location have to be developed.

In the past, there have been several attempts at individualizing HRTFs. The most widely used technique is to acoustically measure the HRTFs at the listener's ears for various source positions around the listener's head.
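To make the role of HRTFs concrete, the rendering step they enable can be sketched as a pair of convolutions: a mono source is convolved with the head-related impulse responses (HRIRs, the time-domain HRTFs) of the left and right ears for the desired direction. The HRIRs below are illustrative placeholders, not measured data.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Place a mono source at the direction for which the
    left/right HRIR pair was measured, by convolution."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])

fs = 48_000
mono = np.random.randn(fs)                 # 1 s of noise as a test source
# Placeholder HRIRs: a direct path on the left, a delayed,
# attenuated path on the right (real HRIRs are measured per subject).
hrir_l = np.zeros(256); hrir_l[0] = 1.0
hrir_r = np.zeros(256); hrir_r[30] = 0.7
out = render_binaural(mono, hrir_l, hrir_r)
print(out.shape)  # (2, 48255)
```

Individualization, the subject of this thesis, amounts to obtaining HRIR pairs that match the listener's own ears rather than a generic dummy head.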
Researchers have also modeled individualized HRTFs by obtaining the listener's anthropometric features with the help of MRI, laser scanning, 3D imagery, or other 2D imaging techniques. Other techniques customize generic HRTFs using perceptual feedback. All of these approaches require highly precise individualized measurements, individual anthropometric features, or long hours of training that can severely fatigue the listener. In this thesis, a novel individualization technique is developed using frontal projection headphone playback that does not require any individualized acoustical measurements. The frontal projection headphones project the sound from the front directly onto the pinnae, unlike conventional side-emitter headphones. Frontal projection of sound during playback captures all the individualized pinna spectral features present in the HRTF. The first part of this thesis explains in detail the role of frontal projection headphone playback in modeling the individualized pinna spectral cues. Headphones inherently distort the input sound spectrum, and thus the effect of the headphones has to be compensated prior to playback. The headphone transfer function (HPTF) can be considered a combination of the headphone transducer response and the headphone-ear coupling. The headphone-ear coupling depends strongly on the human pinnae, making the headphone response highly idiosyncratic. The HPTF also varies considerably at high frequencies when the headphone is repositioned over the ear. Thus, a perfect headphone equalization filter that compensates exactly for the headphone response cannot be designed. For frontal projection headphone playback, a robust equalization filter known as the Type-2 equalization filter is designed. Type-2 equalization preserves the personal pinna cues that are generated during playback and removes only
the distortion created by the transducer and the resonant modes of the earcup. A conventional equalization technique, by contrast, would compensate for the entire headphone response and would therefore also remove the individual pinna cues. It is found that the individualized pinna cues embedded by the frontal projection headphone after Type-2 equalization model the true individualized pinna cues well. Subjective experiments show that the frontal projection headphone improves the perception of 3D audio by reducing localization errors.

The frontal projection headphone playback technique is then used to model the distance-dependent individualized HRTFs in the horizontal plane. To develop this model, the important cues that affect distance perception are first identified through detailed objective and subjective experiments. In particular, the roles of auditory parallax and the interaural cues are investigated. It is found that auditory parallax cues play only a minor role in distance perception when interaural cues are present, whereas the interaural cues are critical to the distance localization process. In the proposed model, the distance-dependent spectral effects of the head and the head-shadow effects are both modeled by a spherical head model, while the highly idiosyncratic pinna spectral cues are modeled by the frontal projection headphones. In this way, the frontal horizontal-plane HRTFs can be modeled without any external individualized measurements. To model the rear HRTFs, however, a rear-directional filter is required; Type-2 equalization of the headphones and a low-frequency compensation filter are additionally required in the modeling process. Objective and subjective analyses showed that the individualized distance-dependent HRTFs modeled using frontal projection headphones match the measured individualized HRTFs.
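As a point of reference for the spherical head model mentioned above, the classical Woodworth formula gives the far-field interaural time difference (ITD) produced by a rigid sphere. This is a textbook approximation, not the thesis's specific near-field model; the head radius below is a typical assumed value.

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Far-field Woodworth ITD for a rigid spherical head.
    azimuth_deg is measured from the median plane (0 = front),
    head_radius in meters, c = speed of sound in m/s."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

# ITD is zero at the front and grows toward the side:
print(round(woodworth_itd(90) * 1e3, 2))  # ~0.66 (ms)
```

Near-field models extend this picture by letting both the ITD and the interaural level difference vary with source distance, which is one reason the interaural cues carry so much distance information.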
The modeling of individualized HRTFs in the horizontal plane is then extended to the sagittal plane. The pinna spectral features that characterize elevation perception are first studied. It is found that the positions and center frequencies of the pinna notches vary monotonically with elevation. The frontal projection headphone responses of the subjects are first captured by a one-time measurement before the modeling process, and the important pinna features that affect elevation are then extracted from the frontal projection response. Elevation perception is simulated by shifting the notch frequency positions in accordance with the elevation. Subjective and objective experiments show that the frontal projection headphones can model sagittal-plane HRTFs close to the individualized HRTFs.

It is observed that the use of frontal projection headphones can naturally enhance the veracity of 3D audio perception. This approach can also be used to render natural sound over headphones, especially for digital media content (movies, games, etc.). A 3D audio headphone, which consists of a unique combination of emitters and their orientations, is developed. The 3D audio headphone holds both a frontal emitter and a conventional side emitter on each side of the earcup. The primary, or directional, cues extracted from the digital media content are played through the frontal emitters, while the ambience, or environment, signal is played through both the frontal and the side emitters. Perceptual experiments showed that the 3D audio headphones can provide a listening experience close to natural listening. This thesis mainly focuses on the individualization of virtual audio using frontal projection headphones.
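The elevation synthesis by notch shifting described above can be sketched with a single tunable notch filter whose center frequency is moved with elevation. The elevation-to-frequency mapping below is purely illustrative (the thesis extracts the actual notch trajectories from each subject's measured frontal projection response).

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def elevation_notch(signal, elevation_deg, fs=48_000):
    """Sketch: impose one pinna-like spectral notch whose center
    frequency rises monotonically with elevation.
    The linear mapping below is a hypothetical placeholder."""
    f0 = 8000.0 + 50.0 * elevation_deg   # e.g. 6.5 kHz at -30 deg, 9.5 kHz at +30 deg
    b, a = iirnotch(f0, Q=10.0, fs=fs)
    return lfilter(b, a, signal)

noise = np.random.randn(48_000)
low = elevation_notch(noise, -30)    # notch near 6.5 kHz
high = elevation_notch(noise, +30)   # notch near 9.5 kHz
```

A full implementation would track several notches and peaks per ear and interpolate their trajectories between measured elevations, but the principle of moving notch center frequencies with elevation is the same.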
With the emergence of smartphones with high processing power and the growing headphone market, personalized 3D audio has the potential to make a great impact, with applications ranging from navigation and communication systems to assistive technology for the visually impaired, related areas of health, and entertainment.
DRNTU::Engineering::Electrical and electronic engineering