Tools for analysis of large-scale networks (I) algorithms, analytics and visualization
Chua, Chee Ann
Date of Issue: 2017-04-11
School of Computer Science and Engineering
Data is a valuable asset, but only to those with the data mining skills needed to reveal the trends and patterns hidden in otherwise unstructured data. This project aimed to create a tool that helps the user gain insights from a large-scale dataset by applying multiple data mining processes to the data and visualizing the results. Among social media sites, Twitter was chosen, and 500 million raw tweets were used as the dataset. Only part of the information in each tweet was extracted for analysis: the geo-coordinates, the timestamp, and the tweet content itself. To cleanse the data, preprocessing was performed to filter out records with missing attributes. The analysis consisted of two data mining techniques: cluster analysis of the geo-coordinates and topic modeling of the tweet content. These two techniques were applied not only separately but also in combination to build further features, such as a tracking system that could reveal a user’s mobility and active places from the big data. With these features, the developed tool was able to turn the raw data into useful and valuable information.
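The preprocessing step described above, dropping records that lack any of the three attributes used for analysis, might look roughly like the following minimal sketch. The abstract does not show the project's implementation, so the field names (`coordinates`, `timestamp`, `text`), the helper names, and the sample records are all assumptions made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical field names; the record layout used in the project is not given.
REQUIRED_FIELDS = ("coordinates", "timestamp", "text")


def has_required_fields(record: Dict[str, Any]) -> bool:
    """Return True when every required attribute is present and non-empty."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "" or value == []:
            return False
    return True


def filter_complete(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily yield only the records that pass the completeness check."""
    return (r for r in records if has_required_fields(r))


# Made-up sample records, for illustration only.
sample = [
    {"coordinates": [103.68, 1.35], "timestamp": "2016-01-01T08:00:00Z", "text": "hello"},
    {"coordinates": None, "timestamp": "2016-01-01T08:05:00Z", "text": "no location"},
    {"coordinates": [103.85, 1.29], "timestamp": "2016-01-01T08:10:00Z", "text": ""},
]
kept = list(filter_complete(sample))
```

Because the filter is a generator, records can be streamed from disk one at a time, which matters for a dataset too large to hold in memory.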
Final Year Project (FYP)
Nanyang Technological University