Our tool scores each file against a given defect description and ranks the files in descending order of score. The aim is for at least one of the files affected by the defect to be among the top-ranked ones, so that it can serve as an entry point for navigating the code and finding the other affected files. On average, the tool places an affected file among the top-1, top-5 and top-10 files for 48%, 70% and 77% of bug reports, respectively.
We compared our approach to five other state-of-the-art tools, using their own five metrics on their own six case studies. Out of the 30 performance indicators, we match 2 and improve 28. On average, for 77% of the bug reports we place one or more affected files in the top-10 ranked files.
We also improved, in most cases substantially, the mean reciprocal rank value for all six applications evaluated, thereby reducing the number of files to inspect before finding a relevant file.
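As an illustration of the two kinds of measure mentioned above, the following minimal sketch computes a top-N hit rate and the mean reciprocal rank from the rank of the first affected file per bug report. The function names, bug report identifiers and file names are hypothetical and are not taken from ConCodeSe.

```python
from typing import List, Set


def first_hit_rank(ranked_files: List[str], affected: Set[str]) -> int:
    """Return the 1-based rank of the first affected file, or 0 if none is ranked."""
    for position, name in enumerate(ranked_files, start=1):
        if name in affected:
            return position
    return 0


def top_n_rate(first_ranks: List[int], n: int) -> float:
    """Fraction of bug reports with an affected file among the top n files."""
    hits = sum(1 for rank in first_ranks if 0 < rank <= n)
    return hits / len(first_ranks)


def mean_reciprocal_rank(first_ranks: List[int]) -> float:
    """Average of 1/rank of the first affected file; a miss contributes 0."""
    return sum(1.0 / rank for rank in first_ranks if rank > 0) / len(first_ranks)


# Made-up rankings for three bug reports: (ranked files, affected files).
rankings = {
    "BR-1": (["A.java", "B.java", "C.java"], {"A.java"}),
    "BR-2": (["D.java", "E.java", "F.java"], {"F.java"}),
    "BR-3": (["G.java", "H.java"], {"Z.java"}),
}
first_ranks = [first_hit_rank(files, affected) for files, affected in rankings.values()]

print(top_n_rate(first_ranks, 1))         # share of reports with a hit at rank 1
print(mean_reciprocal_rank(first_ranks))  # higher means fewer files to inspect
```

A higher mean reciprocal rank means that, on average, the first relevant file appears nearer the top of the list.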
The results of our study are available here: results
ConCodeSe utilises state-of-the-art data extraction, persistence and search APIs. The Java code is parsed using a source code mining tool. We also developed a Java module that uses Lucene's StandardAnalyzer to tokenise the text of the bug reports into terms and to remove stop words.
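The tokenisation step can be pictured with the rough sketch below. It is a plain Python approximation, not the Lucene analyser or the ConCodeSe module: the regular expression, the small stop-word set and the function name `tokenise` are assumptions made for illustration only.

```python
import re
from typing import List

# Small illustrative stop-word set (an assumption); Lucene's own list may differ.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "when", "it"}


def tokenise(text: str) -> List[str]:
    """Lower-case the text, split it into alphanumeric terms and drop stop words."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [term for term in terms if term not in STOP_WORDS]


terms = tokenise("NullPointerException is thrown when the editor is closed")
print(terms)
```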
Given a bug report (BR) and a source file, our approach computes two kinds of scores for the file: a probabilistic score, given by the vector space model (VSM) as implemented by Lucene, and a lexical similarity score. Each kind of score is obtained with four search types, each using a different set of terms indexed from the BR and the file.
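To give a feel for VSM-style scoring, here is a minimal sketch using a textbook tf-idf weighting with cosine similarity. It is not Lucene's exact scoring formula and not the lexical similarity score of our approach; the smoothed idf, the helper names and the sample terms are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_vector(terms: List[str], doc_freq: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """Weight each term by its count times a smoothed inverse document frequency."""
    counts = Counter(terms)
    return {t: c * math.log(1 + n_docs / (1 + doc_freq.get(t, 0))) for t, c in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Made-up terms indexed from two files and from one bug report.
files = {
    "Editor.java": ["editor", "close", "window", "editor"],
    "Parser.java": ["parse", "token", "stream"],
}
query = ["editor", "close", "crash"]

# Number of files in which each term occurs.
doc_freq = Counter(term for terms in files.values() for term in set(terms))
query_vector = tf_idf_vector(query, doc_freq, len(files))
scores = {
    name: cosine(query_vector, tf_idf_vector(terms, doc_freq, len(files)))
    for name, terms in files.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)  # descending order of score
print(ranked)
```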
For each of the 8 combinations of score kind and search type (2 × 4), all files are ranked in descending order of score. Then, for each file, we take the best of its 8 ranks.
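The rank combination just described can be sketched as follows. Only two score tables are shown instead of eight, the scores are made up, and both the tie-breaking by file name and the final ordering by best rank are assumptions of this sketch rather than documented behaviour of the tool.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank files by descending score (1 = best); ties are broken by file name here."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def best_ranks(score_tables: List[Dict[str, float]]) -> Dict[str, int]:
    """For each file, keep the best (lowest) rank across all score tables."""
    best: Dict[str, int] = {}
    for table in score_tables:
        for name, rank in ranks_from_scores(table).items():
            best[name] = min(rank, best.get(name, rank))
    return best


# Two hypothetical score tables standing in for the eight combinations.
tables = [
    {"A.java": 0.9, "B.java": 0.4, "C.java": 0.1},
    {"A.java": 0.2, "B.java": 0.3, "C.java": 0.8},
]
best = best_ranks(tables)

# Assumed final ordering: by best rank, then by file name.
final_order = sorted(best, key=lambda name: (best[name], name))
print(best, final_order)
```

Taking the best rank lets a file surface near the top whenever any one combination of scoring and search type suits that particular bug report.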
Whilst other localisation algorithms take a “one size fits all” approach, we treat each BR and file individually, using the summary, stack trace, stemming, comments and file names only when available and relevant, i.e. when they improve the ranking.
Download: Contextual Code Search Engine Readme: pdf
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, where is it located in the source code files? Information retrieval (IR) approaches see a bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code.
However, most state-of-the-art IR approaches rely on project history, in particular previously fixed bugs and previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring is based on heuristics identified through manual inspection of a small set of bug reports.
We compared our approach to five others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 28. For example, on average we find one or more affected files in the top 10 ranked files for 77% of the bug reports. These results show the applicability of our approach to software projects without history.
Improving Information Retrieval Bug Localisation Using Contextual Heuristics. Dilshener, Tezcan (2017). PhD thesis, The Open University
Locating bugs without looking back (journal version). Dilshener, T., Wermelinger, M. & Yu, Y. Autom Softw Eng (2017). https://doi.org/10.1007/s10515-017-0226-1, online pdf
Locating bugs without looking back. T. Dilshener; M. Wermelinger; and Y. Yu (2016), In Proceedings of the 13th International Conference on Mining Software Repositories, Austin, Texas, MSR ’16, pp. 286–290. ACM, New York, NY, USA. presentation pdf
Improving Bug Localisation Using Lexical Information and Call Relations. T. Dilshener; M. Wermelinger; and Y. Yu (2014) presentation poster pdf
Leveraging Domain Vocabulary across Artefacts: a Comparison of Conceptually Related Applications. T. Dilshener; M. Wermelinger; and Y. Yu (2013) presentation poster pdf
Improving information retrieval-based concept location using contextual relationships. T. Dilshener (2012), In 2012 34th International Conference on Software Engineering (ICSE), pp. 1499–1502. presentation poster pdf
Relating developers’ concepts and artefact vocabulary in a financial software module. T. Dilshener and M. Wermelinger (2011), In Software Maintenance (ICSM), 2011 27th IEEE International Conference on, pp. 412–417. presentation pdf
Google Scholar details click here