A Human Study of Patch Maintainability
Identifying and fixing defects is a crucial and expensive part of the software lifecycle. Measuring the quality of bug-fixing patches is a difficult task that affects both functional correctness and the future maintainability of the code base. Recent research interest in automatic patch generation makes a systematic understanding of patch maintainability and understandability even more critical. We present a human study involving over 150 participants, 32 real-world defects, and 40 distinct patches. In the study, humans perform tasks that demonstrate their understanding of the control flow, state, and maintainability aspects of code patches. As a baseline, we ground our results using both human-written patches that were later reverted and patches that have stood the test of time. To address any potential lack of readability in machine-generated patches, we propose a system in which such patches are augmented with synthesized, human-readable documentation that summarizes their effects and context. Our results show that machine-generated patches are slightly less maintainable than human-written ones, but that trend reverses when machine patches are augmented with our synthesized documentation. Finally, we examine the relationship between code features (such as the ratio of variable uses to assignments) and participants' abilities to complete the study tasks, and thus explain a portion of the broad concept of patch quality.
Feel free to try out the actual experiment at: CodeQuality
The data and subject code from our experiments can be found here
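To make the notion of a code feature concrete, the sketch below computes one such feature, the ratio of variable uses to assignments, over a small snippet. It is only an illustrative analogue: the helper name uses_to_assignments_ratio is hypothetical, the sketch relies on Python's ast module, and the study itself measured features over its own patch corpus rather than Python sources.

```python
import ast
from collections import Counter

def uses_to_assignments_ratio(source: str) -> float:
    """Count variable reads (loads) vs. writes (stores) and return their ratio."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                counts["assignments"] += 1   # variable appears as an assignment target
            elif isinstance(node.ctx, ast.Load):
                counts["uses"] += 1          # variable is read
    # Guard against snippets that contain no assignments at all.
    return counts["uses"] / max(counts["assignments"], 1)

if __name__ == "__main__":
    snippet = "total = 0\nfor x in values:\n    total = total + x\nprint(total)\n"
    print(uses_to_assignments_ratio(snippet))  # roughly 1.67 for this snippet
```

Intuitively, a patch whose variables are read many times for each place they are written gives a maintainer more context clues about intended values, which is the kind of relationship the study task data can probe.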
A Human Study of Fault Localization Accuracy
Localizing and repairing defects are critical software engineering activities. Not all programs and not all bugs are equally easy to debug, however. We present formal models, backed by a human study involving 65 participants (from both academia and industry) and 1830 total judgments, relating various software- and defect-related features to human accuracy at locating errors. Our study involves example code from Java textbooks, helping us to control for both readability and complexity. We find that certain types of defects are much harder for humans to locate accurately. For example, we observe experimentally that humans are over five times more accurate at locating "extra statements" than "missing statements". We also find that, independent of the type of defect involved, certain code contexts are harder to debug than others. For example, humans are over three times more accurate at finding defects in code that provides an array abstraction than in code that provides a tree abstraction. We identify and analyze code features that are predictive of human fault localization accuracy. Finally, we present a formal model of debugging accuracy based on those source code features that have a statistically significant correlation with human performance (for the full paper and slides, please see the publications page).
Feel free to try out the actual experiment at: BugFinder
The data and subject code from our experiments can be found here
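As an illustration of the two defect categories named above, the sketch below contrasts an "extra statement" defect, which leaves visible evidence in the code, with a "missing statement" defect, where nothing marks the omission. The study's subjects were Java textbook examples; this Python analogue, with the hypothetical sum_positive functions, is only meant to convey why the two categories can differ in difficulty.

```python
# Reference behavior: sum only the positive values in a list.
def sum_positive(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total

# "Extra statement" defect: the spurious reset is present and visible,
# so a reader inspecting the loop can point directly at the bad line.
def sum_positive_extra(values):
    total = 0
    for v in values:
        total = 0        # defect: extra statement resets the accumulator
        if v > 0:
            total += v
    return total

# "Missing statement" defect: the guard is simply absent, so no line of
# code marks where the omission occurred; the reader must notice what
# is *not* there.
def sum_positive_missing(values):
    total = 0
    for v in values:
        total += v       # defect: the `if v > 0` check is missing
    return total

if __name__ == "__main__":
    data = [3, -1, 4, -1, 5]
    print(sum_positive(data), sum_positive_extra(data), sum_positive_missing(data))
    # prints: 12 5 10
```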
