Automatic sentiment classification of movie reviews.
Chan, Kok Hong.
Date of Issue: 2008
Wee Kim Wee School of Communication and Information
The increasing number of online reviews of goods and services has led to the development of many approaches to sentiment classification and analysis. This study presents a framework for the sentiment classification of movie reviews. Among the existing approaches, classification using unigrams has been the most successful in most previous studies. However, unigram results can be degraded by negation terms and by terms that require the reader to make inferences. To address this problem, several studies indicate that higher-order n-grams have the potential to produce better classification. Problems that unigrams encounter with negation could be solved by higher-order n-grams such as bigrams, because a phrase like "not good" is extracted as a single term. In addition, higher-order n-grams combined with feature reduction methods, such as χ² (chi-square) feature reduction, are explored to see whether this combination produces better results. Movie reviews are selected because they are considered one of the most difficult domains to classify; good classification results in the movie review domain are therefore expected to carry over to other datasets. The research methods used in this study consist of three parts. First, the results of the simple unigram approach in this study are compared with the results presented by Pang, Lee & Vaithyanathan (2002). Second, the classification results generated by higher-order n-grams and adjectives are compared with those presented by Pang et al. (2002). Lastly, the classification results obtained after applying feature reduction methods such as χ² feature reduction are compared. An application has also been developed so that non-technical users are spared the tedious process of creating a training set and performing sentiment classification.
The application also includes further feature selection options.
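The combination of unigram and bigram features with χ² feature reduction described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the lowercase whitespace tokenizer, binary presence features, the "pos"/"neg" labels, and the function names `ngram_features`, `chi_square_scores` and `select_top_k` are all hypothetical choices made for illustration. The score is the standard 2×2 contingency-table χ² statistic for a term against the positive class, and the classifier that would consume the selected features is not shown.

```python
from collections import Counter


def ngram_features(text, max_n=2):
    """Return the set of 1..max_n-grams in a review (binary presence features).

    Assumption: lowercase + whitespace tokenization; a bigram such as
    "not good" becomes a single feature, capturing the negation.
    """
    tokens = text.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def chi_square_scores(docs, labels, positive="pos"):
    """Score every n-gram with the 2x2 chi-square statistic against `positive`."""
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    df_pos = Counter()  # number of positive documents containing each feature
    df_all = Counter()  # number of documents containing each feature
    for text, y in zip(docs, labels):
        feats = ngram_features(text)
        df_all.update(feats)
        if y == positive:
            df_pos.update(feats)

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]           # positive, contains term
        b = df - a                 # negative, contains term
        c = n_pos - a              # positive, lacks term
        d = n_docs - n_pos - b     # negative, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n_docs * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented purely for illustration.
docs = ["a good film", "not good at all", "truly good acting", "not worth it"]
labels = ["pos", "neg", "pos", "neg"]
scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 1))  # ['not']
```

On this toy set, "not" occurs only in the negative reviews and so receives the highest score, while "good" scores lower because it also appears in a negative review (as part of "not good"). In a real experiment, only the top-k features would be kept as input to the classifier.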
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing