Improving Automated Bug Triaging with Specialized Topic Model



Bug triaging is the process of assigning a bug to the most appropriate developer to fix it. It becomes increasingly difficult as the size of the software and the number of developers grow. In this paper, we propose a new framework for bug triaging, which maps the words in bug reports (i.e., the term space) to their corresponding topics (i.e., the topic space). We propose a specialized topic modeling algorithm named multi-feature topic model (MTM), which extends Latent Dirichlet Allocation (LDA) for bug triaging. MTM considers the product and component information of bug reports when mapping the term space to the topic space. Finally, we propose an incremental learning method named TopicMiner, which uses the topic distribution of a new bug report to assign an appropriate fixer based on the affinity of the fixer to the topics. We pair TopicMiner with MTM (TopicMinerMTM). We have evaluated our solution on five large bug report datasets, GCC, OpenOffice, Mozilla, Netbeans, and Eclipse, containing a total of 227,278 bug reports. We show that TopicMinerMTM achieves top-1 and top-5 prediction accuracies of 0.4831 to 0.6868 and 0.7686 to 0.9084, respectively. We also compare TopicMinerMTM with Bugzie, LDA-KL, SVM-LDA, LDA-Activity, and Yang et al.’s approach. The results show that TopicMinerMTM on average improves the top-1 and top-5 prediction accuracies of Bugzie by 128.48% and 53.22%, LDA-KL by 262.91% and 105.97%, SVM-LDA by 205.89% and 110.48%, LDA-Activity by 377.60% and 176.32%, and Yang et al.’s approach by 59.88% and 13.70%, respectively.
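The top-1 and top-5 accuracies reported above can be read as the fraction of bug reports whose actual fixer appears among the first k recommended developers. The following is a minimal sketch of that measure, assuming one known fixer per report; the function name and the sample data are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, actual_fixers, k):
    """Fraction of reports whose actual fixer is among the first k predictions.

    ranked_predictions: one ranked list of developer names per report.
    actual_fixers: the known fixer of each report (assumed: exactly one).
    """
    if not actual_fixers:
        return 0.0
    hits = sum(
        1
        for ranked, actual in zip(ranked_predictions, actual_fixers)
        if actual in ranked[:k]
    )
    return hits / len(actual_fixers)


# Hypothetical recommendations for four bug reports.
predictions = [
    ["alice", "bob", "carol"],
    ["bob", "alice", "dave"],
    ["carol", "dave", "alice"],
    ["dave", "carol", "bob"],
]
fixers = ["alice", "alice", "carol", "erin"]

top1 = top_k_accuracy(predictions, fixers, k=1)
top5 = top_k_accuracy(predictions, fixers, k=5)
```

In this sample, two of the four fixers are ranked first and three of the four appear anywhere in the lists, so top-5 accuracy is never lower than top-1 accuracy.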



  • To aid in finding appropriate developers, automatic bug triaging approaches have been proposed in the existing literature. Many of these approaches use the vector space model (VSM) to represent a bug report, i.e., a bug report is treated as a vector of terms (words) and their counts. However, developers often use different terms to express the same meaning, and the same term can carry different meanings depending on the context. VSM cannot capture these synonymous and polysemous words.
  • Various topic modeling algorithms have been proposed in the literature, including Latent Semantic Indexing/Analysis (LSA), probabilistic LSA (pLSA), and Latent Dirichlet Allocation (LDA). Among the three, LDA is the most recently proposed, and it addresses the limitations of LSA and pLSA.
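The synonymy problem with VSM described above can be illustrated with a small sketch. It represents each report as a bag of term counts and compares reports with cosine similarity; the report texts are hypothetical, and whitespace tokenization is an assumption made for brevity.

```python
from collections import Counter
import math


def term_vector(text):
    """Represent a report as a bag of lowercase terms with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reports: A and B describe the same problem in different words,
# while A and C share wording but concern different moments of execution.
report_a = term_vector("application crashes on startup")
report_b = term_vector("program fails when launching")
report_c = term_vector("application crashes on exit")

similarity_ab = cosine(report_a, report_b)  # no shared terms
similarity_ac = cosine(report_a, report_c)  # three shared terms
```

Because reports A and B have no terms in common, VSM treats them as unrelated even though they describe the same failure, which is the gap that a topic space is meant to close.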


  • LDA considers a document as a random mixture of latent topics, where a topic is a random mixture of terms.
  • Only one or a few features can be taken into consideration.
  • Accuracy is lower.
  • The approach is more complex.
  • More time is required.
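The view of LDA stated above, a document as a random mixture of latent topics and a topic as a random mixture of terms, can be sketched as a generative process. This is an illustration only: the vocabulary, the two topics, and the `alpha` value are hypothetical, and the sketch generates text rather than performing the inference that a real LDA implementation would run.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw a probability vector from a Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topic_term_probs, vocabulary, alpha, length):
    """Generate one document: pick a topic mixture, then a topic and a term per word."""
    num_topics = len(topic_term_probs)
    theta = sample_dirichlet(rng, [alpha] * num_topics)  # document-topic mixture
    topics, words = [], []
    for _ in range(length):
        z = rng.choices(range(num_topics), weights=theta)[0]
        w = rng.choices(vocabulary, weights=topic_term_probs[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical vocabulary and two topics (each row is a distribution over terms).
vocabulary = ["crash", "startup", "render", "font", "layout"]
topic_term_probs = [
    [0.5, 0.4, 0.05, 0.03, 0.02],  # topic 0: mostly crash-related terms
    [0.02, 0.03, 0.4, 0.25, 0.3],  # topic 1: mostly display-related terms
]

rng = random.Random(7)
theta, topics, words = generate_document(
    rng, topic_term_probs, vocabulary, alpha=0.5, length=12
)
```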


  • We extend LDA and propose a new topic model named multi-feature topic model (MTM) for the bug triaging problem. Since a bug report has multiple features (e.g., the product affected by the bug, the component affected by the bug, etc.), MTM considers the features of a bug report when it converts terms in the textual description of the report (i.e., the text in the summary and description fields) to their corresponding topics in the topic space. Given a bug report with a particular feature combination (i.e., product-component combination), MTM converts a word in the bug report to a topic.
  • We refer to a feature as a categorical field in a bug report that a bug reporter can fill in when submitting the report. These fields include the product, component, reporter, priority, severity, OS, version, and platform fields. We exclude the natural language descriptions in the bug reports, which include the contents of the summary and description fields, from the features since they are not categorical in nature.
  • In this paper, we use the product-component combination as the input feature combination, since product and component are two of the most important features that describe a bug. Given a bug report with a particular feature combination, MTM converts a term in the bug report to a topic by putting special emphasis on the appearances of the word in bug reports with the same feature combination, without ignoring the word appearances in all other bug reports.
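The idea of emphasizing word appearances in reports with the same product-component combination, without ignoring appearances elsewhere, can be sketched with weighted counts. This is a simplified illustration and not the inference procedure of MTM: the class name, the `emphasis` weight, the `beta` smoothing value, and the sample observations are all assumptions.

```python
from collections import defaultdict


class FeatureAwareTopicCounts:
    """Term-to-topic counts, kept overall and per (product, component) pair."""

    def __init__(self, num_topics, emphasis=3.0, beta=0.01):
        self.num_topics = num_topics
        self.emphasis = emphasis  # hypothetical extra weight for same-combination counts
        self.beta = beta          # hypothetical smoothing constant
        self.term_counts = defaultdict(lambda: [0] * num_topics)
        self.combo_counts = defaultdict(lambda: [0] * num_topics)

    def observe(self, product, component, term, topic):
        """Record that `term` was assigned `topic` in a report with this combination."""
        self.term_counts[term][topic] += 1
        self.combo_counts[(product, component, term)][topic] += 1

    def topic_distribution(self, product, component, term):
        """Topic probabilities for `term`, weighting same-combination counts more."""
        zeros = [0] * self.num_topics
        overall = self.term_counts.get(term, zeros)
        same_combo = self.combo_counts.get((product, component, term), zeros)
        scores = [
            (overall[k] - same_combo[k]) + self.emphasis * same_combo[k] + self.beta
            for k in range(self.num_topics)
        ]
        total = sum(scores)
        return [s / total for s in scores]

    def most_likely_topic(self, product, component, term):
        dist = self.topic_distribution(product, component, term)
        return max(range(self.num_topics), key=lambda k: dist[k])


# Hypothetical observations of the term "render" under two combinations.
model = FeatureAwareTopicCounts(num_topics=2)
for _ in range(2):
    model.observe("Firefox", "Graphics", "render", topic=0)
for _ in range(5):
    model.observe("Thunderbird", "Mail", "render", topic=1)

topic_in_graphics = model.most_likely_topic("Firefox", "Graphics", "render")
topic_in_mail = model.most_likely_topic("Thunderbird", "Mail", "render")
topic_elsewhere = model.most_likely_topic("Firefox", "Bookmarks", "render")
```

In this sample, the same term maps to topic 0 for a Firefox/Graphics report even though topic 1 dominates overall, while a report with an unseen combination falls back to the overall counts.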


  • MTM considers each combination of features as a random mixture of latent topics, where a topic is a random mixture of terms.
  • MTM is an extensible topic model, where one or more features can be taken into consideration.
  • We propose a new approach for bug triaging which leverages MTM. We take as input a training set of bug reports (whose fixers are known) and a new bug report whose fixer is to be predicted.
  • Our approach, named TopicMinerMTM, computes the affinity of a developer to a new bug report based on the reports that the developer has fixed before. To do this, we compare the topics that appear in the new bug report with those in the old reports fixed by that developer.
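The comparison between the topics of a new report and the topics of reports a developer fixed earlier can be sketched as follows. The scoring rule here (a dot product between the report's topic distribution and the developer's normalized topic profile) is an assumption chosen for simplicity and is not the affinity formula of the paper; the class, method names, and sample data are hypothetical.

```python
from collections import defaultdict


class DeveloperTopicProfiles:
    """Accumulates, per developer, the topic mass of the reports they fixed."""

    def __init__(self, num_topics):
        self.num_topics = num_topics
        self.totals = defaultdict(lambda: [0.0] * num_topics)

    def update(self, developer, topic_distribution):
        """Incrementally add one fixed report's topic distribution to a profile."""
        profile = self.totals[developer]
        for k, p in enumerate(topic_distribution):
            profile[k] += p

    def affinity(self, developer, topic_distribution):
        """Assumed score: overlap between report topics and the developer's profile."""
        profile = self.totals.get(developer)
        if profile is None:
            return 0.0
        mass = sum(profile)
        if mass == 0:
            return 0.0
        return sum(p * profile[k] / mass for k, p in enumerate(topic_distribution))

    def rank(self, topic_distribution, top_k=5):
        """Return up to top_k developers, highest affinity first (ties by name)."""
        scored = [(self.affinity(d, topic_distribution), d) for d in self.totals]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [d for _, d in scored[:top_k]]


# Hypothetical history: topic distributions of previously fixed reports.
profiles = DeveloperTopicProfiles(num_topics=2)
profiles.update("alice", [0.9, 0.1])
profiles.update("alice", [0.8, 0.2])
profiles.update("bob", [0.1, 0.9])

new_report_topics = [0.7, 0.3]
ranking = profiles.rank(new_report_topics, top_k=5)
alice_score = profiles.affinity("alice", new_report_topics)
bob_score = profiles.affinity("bob", new_report_topics)
```

Because `update` only adds to running totals, a newly fixed report can be folded into a developer's profile without reprocessing earlier reports, which is one simple way to keep the learning incremental.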




  • System : Pentium Dual Core
  • Hard Disk : 120 GB
  • Monitor : 15" LED
  • Input Devices : Keyboard, Mouse
  • RAM : 1 GB


  • Operating System : Windows 7
  • Coding Language : Java/J2EE
  • Tool : Eclipse
  • Database : MySQL


Xin Xia, Member, IEEE, David Lo, Member, IEEE, Ying Ding, Jafar M. Al-Kofahi, Tien N. Nguyen, Member, IEEE, Xinyu Wang, Member, IEEE, “Improving Automated Bug Triaging with Specialized Topic Model”, IEEE Transactions on Software Engineering, 2017.
