Information extraction and analysis of DBLP data
Neo, Lynette Shi Yun
Date of Issue: 2017-04-27
School of Computer Science and Engineering
Much information can be extracted and analysed from the large collection of DBLP bibliography data. However, it can be difficult to extract useful information from a data set with more than 3 million entries. The purpose of this report is to highlight the work done during the course of this final year project. The project has two main objectives. The first is to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second is to perform data analytics and mining on the data to extract useful information. For the second objective, three data analytics tasks were carried out, all aimed at analysing collaboration between authors in the DBLP community. First, the collaboration network of the authors was analysed to show the trend in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was performed on the titles of the publications, and the resulting topics were used to suggest collaborators for authors based on the topics in which each author had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work.
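The first objective and the collaborator lookup can be illustrated with a minimal sketch. It is not the project's actual implementation: the sample records, the table and column names (publication, authorship, pub_key) and the function xml_to_csv are hypothetical, and SQLite stands in for whichever relational database was used. The real DBLP file also relies on DTD character entities, which this sketch does not handle.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample in the general shape of DBLP records (assumption).
SAMPLE_XML = b"""<dblp>
<article key="journals/x/A1">
<author>Alice A</author>
<author>Bob B</author>
<title>Graph Mining</title>
<year>2015</year>
</article>
<inproceedings key="conf/y/B2">
<author>Alice A</author>
<title>Topic Models</title>
<year>2016</year>
</inproceedings>
</dblp>"""

# Only two record types are handled here, to keep the sketch short.
RECORD_TAGS = {"article", "inproceedings"}


def xml_to_csv(xml_stream, pubs_out, authors_out):
    """Stream records from XML into two CSV files; return the record count."""
    pubs = csv.writer(pubs_out)
    authors = csv.writer(authors_out)
    pubs.writerow(["pub_key", "type", "title", "year"])
    authors.writerow(["pub_key", "author"])
    count = 0
    # iterparse yields each element once it is complete, so the whole
    # file never has to be built as one tree before processing.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        key = elem.get("key")
        title_elem = elem.find("title")
        title = "".join(title_elem.itertext()) if title_elem is not None else ""
        pubs.writerow([key, elem.tag, title, elem.findtext("year", default="")])
        for author in elem.findall("author"):
            authors.writerow([key, author.text])
        elem.clear()  # drop the record's children once written
        count += 1
    return count


# Step 1: XML to CSV (in-memory buffers stand in for files on disk).
pubs_csv, authors_csv = io.StringIO(), io.StringIO()
n_records = xml_to_csv(io.BytesIO(SAMPLE_XML), pubs_csv, authors_csv)

# Step 2: load the CSV rows into relational tables, skipping the header row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publication (pub_key TEXT PRIMARY KEY, type TEXT, title TEXT, year INTEGER)")
conn.execute("CREATE TABLE authorship (pub_key TEXT, author TEXT)")
pub_rows = list(csv.reader(io.StringIO(pubs_csv.getvalue())))[1:]
author_rows = list(csv.reader(io.StringIO(authors_csv.getvalue())))[1:]
conn.executemany("INSERT INTO publication VALUES (?, ?, ?, ?)", pub_rows)
conn.executemany("INSERT INTO authorship VALUES (?, ?)", author_rows)

# Step 3: a self-join on the shared publication key lists collaborator pairs.
coauthor_pairs = conn.execute(
    "SELECT a.author, b.author FROM authorship a "
    "JOIN authorship b ON a.pub_key = b.pub_key AND a.author < b.author "
    "ORDER BY 1, 2"
).fetchall()
```

Keeping authorship in its own table, one row per author per publication, is what makes the collaborator query a plain self-join; the same pairs could serve as edges of a collaboration network.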
DRNTU::Engineering::Computer science and engineering
Final Year Project (FYP)
Nanyang Technological University