Simplifying Software Repository Analysis through Collective, Incremental Ontology Matching
Over the course of software development projects, software repositories accumulate a wealth of data that has the potential to provide decision support to practitioners. However, without knowing which relationships in the data are worth monitoring, the available data cannot be used to its full potential. To this end, the software repository mining community studies these repositories and captures the resulting insights in publications and in new or improved analysis tools.
These tools and the underlying knowledge, however, are disconnected: changed or newly discovered metrics and models have to be made available for each analysis tool separately, e.g., by creating new plugins. My work presents an approach to overcome this issue by allowing queries on software repositories to be transferred between different implementations. Due to the heterogeneity and constant change of the employed groupware tools, this is not a trivial task. Differences in data schemas and semantics need to be handled, i.e., ontologies have to be matched. While automatic matching can support this task to a certain degree, a considerable number of matching tasks require manual user interaction.
The presented approach integrates these matching tasks into the process of query translation. Users thus get direct feedback about the correctness of the generated alignment, along with the immediate benefit of obtaining answers to the questions their queries express. We implemented this concept as part of a repository for patterns in groupware activity. This repository distributes the necessary translation effort across its users, as each user contributes alignments only within the scope of the queries they are interested in. Furthermore, existing alignments are chained to further reduce the effort needed to execute queries on as many repository implementations as possible. We evaluate our approach by showing how it simplifies the implementation of realistic use cases in comparison to existing, state-of-the-art analysis tools.
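To illustrate the idea of chaining alignments, the following is a minimal sketch and not the implementation described in this work. It assumes that an alignment can be simplified to a one-to-one mapping between term names; the schemas, term names, and the functions `compose` and `translate` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Simplifying assumption: an alignment is a plain mapping from a term
# in the source schema to the equivalent term in the target schema.
Alignment = Dict[str, str]


def compose(first: Alignment, second: Alignment) -> Alignment:
    """Chain an alignment A->B with an alignment B->C into A->C.

    Only terms that are covered by both alignments survive the chaining.
    """
    return {src: second[mid] for src, mid in first.items() if mid in second}


def translate(query_terms: List[str], alignment: Alignment) -> Dict[str, Optional[str]]:
    """Translate each query term; None marks a term that still needs manual matching."""
    return {term: alignment.get(term) for term in query_terms}


# Hypothetical alignments contributed by two different users.
schema_a_to_b: Alignment = {"revision": "commit", "author": "committer", "log": "message"}
schema_b_to_c: Alignment = {"commit": "changeset", "committer": "user"}

# Chaining yields an A->C alignment that nobody had to create by hand.
schema_a_to_c = compose(schema_a_to_b, schema_b_to_c)

# Terms mapped to None are the remaining matching tasks for the user.
result = translate(["revision", "author", "log"], schema_a_to_c)
```

In this sketch, a query that uses only `revision` and `author` could be translated from schema A to schema C without further effort, while `log` would be presented to the user as an open matching task within the scope of that query.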