Building Causal Inference Models to Predict Fault Understanding and Intervene in Code Inspection Tasks
In this talk I will present the theory and results on two methodological issues that I face in my research: (1) how to build models for counterfactual thinking about the accuracy of fault understanding, and (2) how to evaluate modeling assumptions (through sensitivity analysis, choices of Bayesian priors, and the handling of class imbalance).
Structured abstract
[Context] Software programmers can spend up to 40 percent of their time searching for the causes of software failures. To reduce this effort, programmers use debugging tools and techniques that narrow the search space for software faults (bugs).
[Research Problem] However, these techniques assume “perfect fault understanding”, which means that the programmer will always correctly identify the bug when presented with the source code lines containing it. Because this assumption does not hold in practice, programmers waste time on invalid bug fixes and, hence, lose confidence in the debugging tools.
[Approach] I study how to mitigate this problem by predicting when programmers are accurate in their identification of the software fault.
[Data] I performed two large field experiments with real bugs from popular open source software projects and more than one thousand programmers recruited on the Amazon Mechanical Turk platform. I used the data from one of these experiments to validate assumptions via a third controlled experiment with programmers recruited among industry practitioners and graduate students.
[Results] My results are promising in the sense that I uncovered a set of predictive factors based on task attributes and programmers’ profiles. These factors were combined into distinct models that predict, with different levels of precision and recall, when a fault is correctly identified.
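To make the two measures concrete: precision is the share of predicted-correct identifications that really are correct, and recall is the share of truly correct identifications that the model finds. The sketch below is only an illustration with made-up labels. The function name and the data are hypothetical and do not come from the experiments.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean predictions against boolean ground truth."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)       # predicted correct, is correct
    false_pos = sum(1 for p, a in pairs if p and not a)  # predicted correct, is wrong
    false_neg = sum(1 for p, a in pairs if not p and a)  # predicted wrong, is correct
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels: True means "the programmer identified the fault correctly".
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A model tuned for high precision rarely accepts a wrong fault identification but may discard some correct ones, while a model tuned for high recall does the opposite.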
[Application of Results] I used two of these prediction models to develop a voting and sequencing algorithm that parallelizes the search for the software fault. Simulations showed that the algorithm can coordinate the work of hundreds of programmers to identify and explain different types of real software bugs. The outcomes in terms of speed and costs were competitive when compared with a single professional programme