dc.contributor.authorKokil Jaidka
dc.date.accessioned2014-06-05T06:43:07Z
dc.date.available2014-06-05T06:43:07Z
dc.date.copyright2014en_US
dc.date.issued2014
dc.identifier.urihttp://hdl.handle.net/10356/61137
dc.description.abstractThis study is in the area of multi-document summarization of research papers. It addresses the gap between the structure and readability of human-written summaries and those of automatic multi-document summaries, which focus only on selecting the more important information from the set of documents and neglect its readability. In the context of this overall goal, the first part of this study develops a literature review framework which specifies the structural, rhetorical and content characteristics of human-written literature reviews. In the second part of the study, an automatic method is developed which partially implements this framework to generate multi-document summaries of research papers emulating some characteristics of human-written literature reviews. The framework is based on extensive discourse and information analyses of literature reviews in the domain of information science. The corpus for analysis comprised 120 literature review sections published as part of research papers in three top international peer-reviewed information science journals, the Journal of the American Society for Information Science and Technology (JASIST), the Journal of Information Science (JIS) and the Journal of Documentation (JDoc), over the years 2000-2008. The macro-level analysis identifies the document structure within a literature review, which comprises 9 types of discourse elements. The sentence-level analysis identifies 22 rhetorical functions employed in literature reviews and 153 linguistic devices which frame information within sentences. The information analysis identifies significant associations between the source sections of selected sentences and the transformations performed on them. Results show that literature reviews are written in two main styles: integrative literature reviews and descriptive literature reviews.
Integrative literature reviews present information from several studies in a condensed form as a critical summary, possibly complemented with a comparison, evaluation or comment on the research gap. They focus on highlighting relationships amongst concepts or comparing studies against each other. Descriptive reviews present more experimental detail about previous studies, such as their approach, results and evaluation. These findings are incorporated into the multi-level literature review framework, comprising their macro-level structure and their rhetorical functions, as well as the information summarization strategies. Based on this framework, in the second part of the study a multi-document summarization method emulating characteristics of human literature reviews is developed to generate an integrative summary that combines information across the papers and highlights the agreements and disagreements among them. It extracts information concepts from research papers by imitating researchers’ preferences, integrates them across the set of related papers and organizes them as a topic tree; finally, it presents them using sentence templates which realize rhetorical functions. The method presented here focuses only on summarizing and comparing research objective information across papers, and hence it applies only those components of the framework which are appropriate for selecting and synthesizing research objective information. Automatic content evaluation shows no significant difference between the summaries generated by the automatic method and those generated by the baseline sentence extraction system, MEAD.
However, the automatic summaries show a significant improvement in quality characteristics over MEAD summaries: about two-thirds of all assessors (35 PhD students and professors in Library and Information Science) preferred to use them over MEAD summaries, and they are also perceived as significantly more useful for obtaining a research overview or seeing comparisons across studies. The automatic summaries are also considered more readable in the way they relate topics and sentences to each other. However, they still contain grammatical errors and repetitions; to resolve these, it is recommended to include some post-processing steps in the automatic method. Assessors with different levels of research experience are found to hold different expectations of the final summary. Those with less experience look for more details about individual studies; it can be inferred that they prefer a more descriptive literature review. More experienced assessors want to understand the bigger picture and the main themes of the research; evidently, they want a more integrative literature review. These insights can help in customizing the automatic method for its users.en_US
dc.description.abstractThis study is in the area of multi-document summarization of research papers. It addresses the gap identified between the quality of human-written summaries and other automatic multi-document summaries, which only focus on selecting the more important information from the set of documents but neglect to consider its readability. In the context of this overall goal, the first part of this study develops a literature review framework which specifies the structural, rhetorical and content characteristics of human-written literature reviews. The framework is based on extensive discourse and content analysis of literature reviews which identified the macro-level structure, sentence-level rhetorical functions and the authors' selection and transformation strategies which constitute literature reviews. The second part of the study develops an automatic method to partially implement this framework and generate multi-document summaries of research papers emulating some characteristics of human-written literature reviews in selecting, integrating, organizing and framing information. Assessors perceive this automatic summary as significantly more useful and readable than the summaries of the baseline system, MEAD, which employs a sentence extraction method.en_US
dc.format.extent258 p.en_US
dc.language.isoenen_US
dc.rightsNanyang Technological University
dc.subjectDRNTU::Library and information science::Generalen_US
dc.subjectDRNTU::Engineering::Computer science and engineering::Information systems::Information systems applicationsen_US
dc.subjectDRNTU::Humanities::Linguisticsen_US
dc.subjectDRNTU::Humanities::Linguistics::Sociolinguistics::Computational linguisticsen_US
dc.titleA literature review framework for multi-document summarization of research papersen_US
dc.typeThesis
dc.contributor.supervisorJin Cheon Naen_US
dc.contributor.supervisorKhoo Soo Guan, Christopheren_US
dc.contributor.schoolWee Kim Wee School of Communication and Informationen_US
dc.description.degreeDoctor of Philosophy (WKWSCI)en_US

