dc.contributor.author	Chua, Chee Ann
dc.description.abstract	Data is a valuable asset, but only for those with the data mining skills to analyze it and reveal the trends and patterns hidden in otherwise unstructured data. This project aimed to create a tool that helps users gain insights from a large-scale dataset by applying multiple data mining processes to the data and visualizing the results. Among social media sites, Twitter was chosen, and 500 million raw tweets were used as the dataset. Only part of the information in each tweet was extracted for analysis: the geo-coordinates, the timestamp, and the tweet content itself. To cleanse the data, preprocessing was performed to filter out records with missing attributes. The analysis consisted of two data mining techniques: cluster analysis of the geo-coordinates and topic modeling of the tweet content. These two techniques were applied not only independently but also in combination to build further features, such as a tracking system that could reveal a user’s mobility and active places from the big data. With these features, the developed tool was able to turn the raw data into useful and valuable information.	en_US
dc.format.extent	51 p.	en_US
dc.rights	Nanyang Technological University
dc.subject	DRNTU::Engineering::Computer science and engineering	en_US
dc.title	Tools for analysis of large-scale networks (I) algorithms, analytics and visualization	en_US
dc.type	Final Year Project (FYP)	en_US
dc.contributor.supervisor	Hsu Wen Jing	en_US
dc.contributor.school	School of Computer Science and Engineering	en_US
dc.description.degree	Bachelor of Engineering (Computer Science)	en_US
