Sentiment analysis in Twitter
Date of Issue: 2016-04-26
School of Computer Engineering
A* STAR, Institute for Infocomm Research
Social media platforms such as Facebook, Twitter and Instagram have gained tremendous popularity. These platforms allow people to post real-time messages that express opinions on a variety of topics, discuss current issues, voice complaints, and share positive feelings. In recent years, companies and research institutions have increasingly sought to analyse the opinions and feelings expressed in such messages. This thesis narrows the scope of analysis to opinion mining on Twitter messages, known as tweets. The specific task is to develop a sentiment analysis system for the three-point-scale (positive, neutral, negative) message polarity subtask of the Twitter sentiment analysis task in the Semantic Evaluation Exercises 2016 (SemEval-2016). A baseline system was developed first and then improved by integrating three further systems into it through classifier fusion. One of the three systems uses a new asymmetric SIMPLS (ASIMPLS) based classifier, while the other two use L2-regularized linear regression. ASIMPLS was shown to identify the minority class well in imbalanced classification problems, and L2-regularized linear regression was shown to be efficient with relatively good performance. In addition to the features used in most existing systems, word embeddings were introduced. For each word, three embedding vectors were derived from the positive, neutral, and negative tweet sets respectively, and these vectors are used as features in the ASIMPLS system. The final fusion system scored 59.63% on the SemEval-2016 test set, measured by the F1-score averaged over the positive and negative classes (that is, excluding the neutral class), and ranked 7th among 34 systems in the competition.
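The evaluation metric above can be made concrete. The following is a minimal sketch, assuming string labels `"positive"`, `"neutral"` and `"negative"`. It computes the per-class F1-score for the positive and negative classes and averages the two. The function names `f1_for_class` and `f1_pn` are hypothetical and are not taken from the thesis or from the official SemEval scorer. Neutral tweets still influence the score: a neutral tweet predicted as positive lowers positive precision, even though the neutral class's own F1 is never averaged in.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1-score of a single class, treating that class as 'positive' and all others as 'rest'."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    predicted = sum(1 for p in pred if p == label)
    actual = sum(1 for g in gold if g == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_pn(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of the positive-class and negative-class F1 (neutral F1 is excluded)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return (f1_for_class(gold, pred, "positive") + f1_for_class(gold, pred, "negative")) / 2


if __name__ == "__main__":
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "neutral", "neutral", "negative"]
    print(f1_pn(gold, pred))
```

In this toy example the positive class has precision 1 and recall 0.5, giving an F1 of 2/3. No negative tweet is predicted correctly, so the negative F1 is 0. The resulting score is therefore 1/3.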
Final Year Project (FYP)
Nanyang Technological University