Exploiting long context using joint distance and occurrence information for language modeling
Chong, Tze Yuang
Date of Issue: 2018
School of Computer Science and Engineering
This thesis investigates an approach to exploiting long context based on information about the distance and occurrence of words. By modeling the joint event of distance and occurrence, the approach incorporates their inter-dependencies into the model, so that information captured from the long context is used more effectively. This addresses a shortcoming of conventional language modeling approaches, which tend to neglect these inter-dependencies. Based on the proposed approach, a novel language model, referred to as the term-distance term-occurrence (TDTO) model, is formulated. The TDTO model estimates probabilities based on the events of term-distance (TD) and term-occurrence (TO), which correspond to the distances and occurrences of words in the context. By expressing the TDTO model in a log-linear interpolation framework, the influence of the TD and TO components on the final estimate can be tuned. Specifically, because TD events (i.e. positions) within a long context are likely to be rare or unseen, the weight of the TD component can be reduced accordingly to alleviate the data scarcity problem. Through a series of experiments, the TDTO model has been shown to be capable of exploiting the long context to reduce the perplexities of language models. On the BLLIP Wall Street Journal (WSJ) and Switchboard-1 (SWB) corpora, perplexity reductions of up to 11.2% and 6.5% were obtained with context lengths of seven and eight, respectively. In addition, the TDTO model has been shown to outperform other conventional models used to exploit the long context, such as the distant-bigram, trigger and bag-of-words (BOW) models, consistently yielding lower perplexities. The applicability of the TDTO model has been examined on several tasks, including speech recognition, text classification and word prediction. The TDTO model has been shown to improve the baseline performance on all the considered tasks.
Furthermore, this thesis proposes a neural network implementation of the TDTO model. The aim is to provide a better smoothing mechanism for TDTO modeling. The resulting model, referred to as the neural network based TDTO (NN-TDTO) model, has been empirically shown to outperform the baseline TDTO model in both perplexity and speech recognition accuracy. On the WSJ corpus, the NN-TDTO model yielded up to 9.2% lower perplexity than the TDTO model. On the Aurora-4 speech recognition task, the NN-TDTO model obtained a relative word error rate reduction of up to 12.9%.
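The log-linear interpolation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual formulation: the function name `log_linear_interpolate`, the weights `td_weight` and `to_weight`, and the toy distributions are all hypothetical, and the two components are assumed to be already-estimated probability distributions over the same small vocabulary. It only shows how lowering the weight on a sparse component (here standing in for TD) reduces its influence on the combined estimate.

```python
import math


def log_linear_interpolate(td_probs, to_probs, td_weight, to_weight):
    """Combine two component distributions over the same vocabulary.

    Hypothetical helper: each word's score is a weighted sum of the
    component log-probabilities, renormalized so the result sums to one.
    All probabilities are assumed to be strictly positive.
    """
    scores = {
        word: td_weight * math.log(td_probs[word])
        + to_weight * math.log(to_probs[word])
        for word in td_probs
    }
    # Normalize with log-sum-exp for numerical stability.
    max_score = max(scores.values())
    log_z = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores.values())
    )
    return {word: math.exp(s - log_z) for word, s in scores.items()}


# Toy component distributions (made-up numbers, for illustration only).
td_probs = {"stock": 0.7, "market": 0.2, "rain": 0.1}
to_probs = {"stock": 0.4, "market": 0.5, "rain": 0.1}

# Down-weighting the TD component moves the estimate toward the TO component.
combined = log_linear_interpolate(td_probs, to_probs, td_weight=0.3, to_weight=1.0)
```

With `td_weight=0.0` and `to_weight=1.0` the combined distribution reduces to the TO component alone, which is the limiting case of tuning the TD weight down.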
DRNTU::Engineering::Computer science and engineering