      Automatic document summarization

      FYP Report (1.868Mb)
      Author
      Xu, Hengjie
      Date of Issue
      2017-05-12
      School
      School of Electrical and Electronic Engineering
      Abstract
      Text summarization, an important branch of Natural Language Processing (NLP), has attracted an increasing amount of research and engineering interest owing to the ongoing explosion of information. Currently, most summarization applications are devoted to social media and structured reports, with little attention paid to news-article analytics. This project aims to summarize a large number of news articles automatically using a few key sentences. The system is a pipeline consisting of text representation models and clustering algorithms, with cluster centroids serving as key sentences. Eight summarization techniques were evaluated at both the article level and the sentence level. We chose Bag of Words (BoW) with Latent Semantic Analysis (LSA) and Spherical K-Means, as this combination stood out among the eight. In particular, at the article level, the combination produces a score of 0.94, a 17.5% improvement over our baseline from the literature, which suggests that the proposed clustering technique is fairly robust and accurate. The project is consolidated into a single web application. The user interface allows users to retrieve relevant news articles based on their input, such as subject names, date range and sources. For subsequent analysis of these articles, a Named Entity Recognition (NER) algorithm is refined and applied to extract major entities, such as places, persons and organizations, as a preliminary analysis. Finally, the news articles are summarized in key sentences using our best-performing summarization model.
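      The clustering idea in the abstract (sentences as bag-of-words vectors, spherical k-means, and one key sentence per cluster centroid) can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the seeding with the first k sentences, and the fixed iteration count are assumptions made for illustration, and the LSA dimensionality-reduction step is left out to keep the example dependency-free.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def bow_vector(sentence, vocab):
    """Return a unit-length bag-of-words count vector for one sentence."""
    counts = Counter(tokenize(sentence))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity, re-normalizing centroids."""
    # Assumption: the first k vectors seed the clusters (deterministic, not optimal).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to the most similar centroid.
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # Update step: centroid = mean of members, projected back to unit length.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            mean = [sum(col) for col in zip(*members)]
            norm = math.sqrt(sum(x * x for x in mean))
            if norm:
                centroids[c] = [x / norm for x in mean]
    return labels, centroids


def summarize(sentences, k):
    """Pick, for each cluster, the sentence closest to its centroid."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    vectors = [bow_vector(s, vocab) for s in sentences]
    labels, centroids = spherical_kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(max(members, key=lambda i: dot(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    demo = [
        "The central bank raised interest rates",
        "The football team won the match",
        "Interest rates were raised by the bank again",
        "The team celebrated the football match win",
    ]
    for line in summarize(demo, 2):
        print(line)
```

      Because an actual sentence rarely coincides with a centroid, the sketch returns the member sentence with the highest cosine similarity to each centroid; whether the report selects key sentences in exactly this way is not stated in the abstract.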
      Subject
      DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
      Type
      Final Year Project (FYP)
      Rights
      Nanyang Technological University
      Collections
      • EEE Student Reports (FYP/IA/PA/PI)



      NTU Library, Nanyang Avenue, Singapore 639798 © 2011 Nanyang Technological University. All rights reserved.
      DSpace software copyright © 2002-2015  DuraSpace
      Theme by 
      Atmire NV
       

       

