On average, ConCodeSe ranks at least one affected file in the top 10 for 77% of bug reports.
Given a bug report (BR) and a source file, our approach computes two kinds of score for the file: a probabilistic score, given by the vector space model (VSM), and a lexical similarity score. Each kind of score is computed with four search types, each using a different set of terms indexed from the BR and the file.
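The VSM part of the scoring can be illustrated with a plain tf-idf cosine similarity, treating the BR as a query and each file as a document. This is a minimal sketch, assuming pre-tokenised term lists and a smoothed idf of log(1 + N/df). ConCodeSe's actual term weighting, its lexical similarity score and its four search types are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(files):
    """Smoothed idf, log(1 + N/df), computed over the file corpus (assumed weighting)."""
    n = len(files)
    df = Counter()
    for terms in files.values():
        df.update(set(terms))
    return {t: math.log(1 + n / d) for t, d in df.items()}


def weigh(terms, idf):
    """tf-idf vector as a dict; terms unseen in the corpus get no weight."""
    tf = Counter(terms)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vsm_scores(br_terms, files):
    """Score every file against the BR terms (hypothetical helper)."""
    idf = idf_table(files)
    query = weigh(br_terms, idf)
    return {name: cosine(query, weigh(terms, idf)) for name, terms in files.items()}


files = {
    "Parser.java": ["parse", "token", "stream"],
    "Cache.java": ["cache", "evict", "entry"],
}
scores = vsm_scores(["parse", "error", "token"], files)
```

Here `Parser.java` shares two terms with the BR and scores about 0.82, while `Cache.java` shares none and scores 0.0.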
This gives 8 scoring combinations (2 kinds of score × 4 search types). For each combination, all files are ranked in descending order of score; each file then keeps the best of its 8 ranks.
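The best-of-ranks step can be sketched as follows, assuming each scoring combination yields a dict mapping file names to scores. How ConCodeSe breaks ties is not stated here, so breaking them by file name is an assumption, as are the function names. Two tables are used in the example for brevity; in ConCodeSe there would be 8.

```python
def rank_files(scores):
    """Rank files by descending score (1 = best); ties broken by file name (assumed)."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: i + 1 for i, f in enumerate(ordered)}


def best_ranks(score_tables):
    """For each file, keep the best (lowest) rank it gets across all score tables."""
    best = {}
    for scores in score_tables:
        for f, r in rank_files(scores).items():
            best[f] = min(r, best.get(f, r))
    return best


def final_ranking(score_tables):
    """Order files by their best rank; equal best ranks ordered by name (assumed)."""
    best = best_ranks(score_tables)
    return sorted(best, key=lambda f: (best[f], f))


tables = [
    {"A.java": 0.9, "B.java": 0.5, "C.java": 0.1},  # e.g. one VSM search type
    {"A.java": 0.2, "B.java": 0.3, "C.java": 0.8},  # e.g. one lexical search type
]
```

`C.java` ranks last in the first table but first in the second, so its best rank is 1. Taking the best rank lets a file surface whenever any one combination of score kind and search type favours it.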
Whilst other localisation algorithms take a “one size fits all” approach, we treat each BR and file individually, using the summary, stack trace, stemming, comments and file names only when available and relevant, i.e. when they improve the ranking.
Quick start guide: pdf