Supporting information needs of developers through web Q&A discussions
Date of Issue: 2018
School of Computer Science and Engineering
Programming is evolving because of the prevalence of the Web. Developers now routinely search the Web for information to solve the problems they encounter while working on software development tasks. However, existing studies have investigated the information needs of developers on the Web through qualitative analysis and questionnaire surveys, and little is known about developers' micro-level information behaviors and needs on the Web during software development. For example, how often do developers refine existing queries or create new ones, and how many web pages do they open after a search? To fill this gap, we conducted an empirical study of how developers seek and use web resources at the micro level. The study revealed three key insights: first, developers might have an incomplete or even incorrect understanding of their needs; second, there is a gap between the producers and consumers of software documentation; third, many important pieces of information that developers need are not explicitly documented in software documentation. These insights motivated further studies on supporting developers' information needs. More specifically, the contributions of this thesis are:
(1) Understanding information needs of developers: We developed a video scraping tool to automatically extract developers' behavioral data from task videos. We conducted a micro-level quantitative analysis of the developers' information behaviors, including patterns of keyword sources, keyword refinement, web pages visited, context switching, and information flow. The outcomes of this analysis provided three important insights for supporting developers' information needs.
(2) Discovering learning resources: To bridge the information gap identified in the first insight, we developed the LinkLive technique, which recommends more correlated learning resources when developers have limited knowledge of their needs. LinkLive uses multiple features, including hyperlink co-occurrences in web Q&A discussions, the locations (e.g., question, answer, or comment) in which hyperlinks are referenced, and the votes for the posts or comments in which hyperlinks are referenced. A large-scale evaluation shows that the technique recommends correlated web resources with satisfactory precision and recall in an open setting.
(3) Answering programming questions: To bridge the information gap identified in the second insight, we proposed a novel deep-learning-to-answer framework, named QDLinker, for answering programming questions with software documentation. QDLinker leverages the large volume of discussions in Community-based Question Answering (CQA) to bridge the semantic gap between programmers' questions and software documentation. Through extensive experiments, we show that QDLinker significantly outperforms baselines based on traditional retrieval models and Web search services dedicated to software documentation.
(4) Distilling crowdsourced negative caveats: To bridge the information gap identified in the third insight, we proposed DISCA, a novel approach to automatically distilling desirable Application Programming Interface (API) negative caveats from unstructured web Q&A discussions. Quantitative and qualitative evaluations show that DISCA can greatly augment the official API documentation.
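As an illustration of the kind of features LinkLive is described as using (hyperlink co-occurrence, reference location, and votes), the following minimal Python sketch scores pairs of hyperlinks that appear in the same Q&A discussion. It is not the thesis's algorithm: the location weights, the vote damping, the per-discussion aggregation, and the function names `reference_weight`, `build_cooccurrence`, and `recommend` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical location weights (assumption; not taken from the thesis).
LOCATION_WEIGHT = {"question": 1.0, "answer": 0.8, "comment": 0.5}


def reference_weight(location, votes):
    """Weight one hyperlink reference by where it appears and by its votes."""
    # log1p dampens large vote counts; negative votes are clamped to zero.
    return LOCATION_WEIGHT[location] * (1.0 + math.log1p(max(votes, 0)))


def build_cooccurrence(threads):
    """Accumulate weighted co-occurrence scores for pairs of hyperlinks.

    threads: list of discussions, each a list of (url, location, votes).
    """
    scores = defaultdict(float)
    for refs in threads:
        # Keep only the strongest reference to each URL within a discussion.
        best = {}
        for url, location, votes in refs:
            weight = reference_weight(location, votes)
            best[url] = max(best.get(url, 0.0), weight)
        # Every unordered pair of distinct URLs in the discussion co-occurs.
        for a, b in combinations(sorted(best), 2):
            scores[(a, b)] += best[a] * best[b]
    return scores


def recommend(scores, url, k=3):
    """Return up to k URLs most strongly co-referenced with the given URL."""
    related = defaultdict(float)
    for (a, b), score in scores.items():
        if a == url:
            related[b] += score
        elif b == url:
            related[a] += score
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(related, key=lambda u: (-related[u], u))[:k]


# Toy data with placeholder URLs.
threads = [
    [("https://example.com/a", "question", 0),
     ("https://example.com/b", "answer", 0)],
    [("https://example.com/a", "answer", 0),
     ("https://example.com/b", "answer", 0),
     ("https://example.com/c", "comment", 0)],
]
scores = build_cooccurrence(threads)
print(recommend(scores, "https://example.com/a"))
```

In this toy data, the two links that co-occur in both discussions score higher together than either does with the link that appears only once in a comment. A real system would need to learn or tune the weights rather than fix them by hand.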
DRNTU::Engineering::Computer science and engineering