Big data analytics for smart transportation
Tan, Judith Yi Ru
Date of Issue: 2019
School of Computer Science and Engineering
SMRT Corporation Ltd.
In the Singapore urban rail network, interior stations and tracks are highly correlated. If one or several specific stations were disrupted, the whole network would be gravely affected. It is therefore pivotal to recognize disruptions at these critical stations and to commit more human and material resources to them, so that a “failure response strategy” plan can be carried out efficiently and on time. The smart card data provided by the Land Transport Authority (LTA), together with disruption events reported on social media, give us an opportunity to analyse three key features and find the critical stations that are disrupted. The lack of past information on disruptions at certain hours and stations makes a detailed analysis of smart card data challenging, if not nearly impossible. Therefore, in this final year project, an anomaly detection algorithm is implemented to detect disruptions in the smart card data, using two approaches to overcome the problem of anomalies that are yet to be discovered. The first is the in-sample approach, which focuses on finding a series of statistical models to detect disruptions (anomalies) in the transit data flow. The second is the out-of-sample approach, which takes the best model developed by the in-sample approach and uses it as the detection model for stations without past reported disruptions. The out-of-sample approach makes it possible to tell whether disruptions could have affected stations at specific hours for which no disruption has ever been reported. Gaussian methods (Univariate and Multivariate) are adopted in this project because they are computationally efficient and combine statistics with supervised machine learning. Comparing the in-sample and out-of-sample approaches by F1-score, the “duration difference” feature achieves the highest scores of the three key features, 0.56 in-sample and 0.38 out-of-sample. The other two features are “tap-in”, with F1-scores of 0.32 and 0.20, and “tap-out”, with F1-scores of 0.38 and 0.22.
Feature combinations of “tap-in”, “tap-out” and “duration difference”, which can only be built using the Multivariate Gaussian method, were also tested and achieved F1-scores of 0.61 in-sample and 0.02 out-of-sample. Therefore, the features extracted from the smart card data are the preferred indicators for detecting disruptions in smart card data. The poor performance of the out-of-sample approach is probably due to the lack of past disruption samples; however, the two approaches complement each other in detecting disruptions, even at stations without past disruption information. If a longer period of historical data (ours covers three months) were used to build both the in-sample and out-of-sample models, better performance could be achieved. From the two “best” performing indicators, the location of the stations and the date and time of disruptions can be identified, which can in turn enable transit agencies to respond to disruptions in a more timely, accurate and effective manner.
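As an illustration of the kind of Gaussian anomaly detection described above, the following is a minimal sketch only, not the project's actual implementation. It assumes a feature matrix with three columns standing in for “tap-in”, “tap-out” and “duration difference”, fits a multivariate Gaussian to undisrupted samples, and picks the density threshold that maximises the F1-score on labelled samples. All function names, the synthetic data and the threshold search are assumptions made for this example; a univariate model corresponds to passing a single column.

```python
import numpy as np


def fit_gaussian(X):
    """Estimate mean and covariance of X with shape (n_samples, n_features)."""
    mu = X.mean(axis=0)
    # atleast_2d keeps the univariate case (one column) as a 1x1 matrix.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return mu, cov


def log_density(X, mu, cov):
    """Log of the multivariate Gaussian density for each row of X."""
    d = mu.shape[0]
    diff = X - mu
    # Squared Mahalanobis distance of every row from the mean.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def f1_score(y_true, y_pred):
    """F1-score for binary labels, where 1 marks a disruption."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2.0 * precision * recall / (precision + recall))


def select_threshold(logp, y_true, n_steps=200):
    """Scan thresholds; rows with log-density below the threshold are flagged."""
    best_eps, best_f1 = None, -1.0
    for eps in np.linspace(logp.min(), logp.max(), n_steps):
        f1 = f1_score(y_true, (logp < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = float(eps), f1
    return best_eps, best_f1


# Synthetic stand-in data (assumption): columns play the role of
# tap-in, tap-out and duration difference after standardisation.
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 3))
normal_val = rng.normal(0.0, 1.0, size=(200, 3))
disrupted_val = rng.normal(6.0, 1.0, size=(10, 3))
X_val = np.vstack([normal_val, disrupted_val])
y_val = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])

mu, cov = fit_gaussian(normal_train)
best_eps, best_f1 = select_threshold(log_density(X_val, mu, cov), y_val)
print(f"threshold={best_eps:.2f}, F1={best_f1:.2f}")
```

In this sketch the threshold is chosen on samples with known disruption labels, which loosely mirrors the in-sample setting; reusing the fitted parameters and threshold on stations without reported disruptions would correspond to the out-of-sample setting.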
DRNTU::Engineering::Computer science and engineering
Final Year Project (FYP)
Nanyang Technological University