data_mining_static_code_attributes_to_learn_defect_predictors

The authors make the case that, when it comes to defect prediction, "how the attributes are used to build predictors is much more important than which particular attributes are used". To come to this conclusion, they use 38 attributes and three different learners: OneR, J48, and naïve Bayes. As their dataset, they use the [[http://|NASA MDP]] repository.

The authors start by justifying the need for such a predictor: "These potential defect-prone trouble spots can then be examined in more detail by, say, model checking [...]".

The authors build a baseline using the following data:
  * Independent variables: the three learners OneR, J48, and naïve Bayes;
  * Studied objects: 8 systems whose metric values are available in the NASA MDP dataset;
  * Input data to the independent variables: 38 code metrics available in the NASA MDP dataset;
  * Dependent variable: the binary variable //defective//, i.e., whether or not a module contains a defect;
  * Measures: the probability of detection, //pd//, and of false alarm, //pf//.
| + | |||
| + | The authors explain their choice of the measures by explaining that " | ||
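
To make these measures concrete, here is a minimal Python sketch (not the authors' code) computing //pd//, //pf//, and their balance from the counts of a binary confusion matrix. The balance formula, the normalised distance from the ideal point (//pd// = 1, //pf// = 0), follows the paper's definition; the function and variable names are assumptions.

<code python>
# Minimal sketch (not the authors' code): pd, pf, and their balance from
# binary confusion-matrix counts. "balance" is the normalised distance from
# the ideal point (pd = 1, pf = 0); names are my own.
from math import sqrt

def pd_pf_balance(tp, fn, fp, tn):
    """Return (pd, pf, balance) given confusion-matrix counts."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0   # defective modules correctly flagged
    pf = fp / (fp + tn) if (fp + tn) else 0.0   # defect-free modules wrongly flagged
    balance = 1.0 - sqrt((1.0 - pd) ** 2 + pf ** 2) / sqrt(2.0)
    return pd, pf, balance

# Toy example: 40 defects caught, 10 missed, 25 false alarms, 425 clean modules.
print(pd_pf_balance(tp=40, fn=10, fp=25, tn=425))
</code>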
| + | |||
| + | The authors describe in details the procedure for building the predictors and comparing them with one another: the " | ||
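
As an illustration of that kind of procedure, here is a rough scikit-learn sketch (a reconstruction, not the paper's setup): repeated stratified 10-fold cross-validation of two of the learners, scored on //pd// and //pf//. J48 (a C4.5 implementation) is approximated by scikit-learn's CART decision tree, OneR is left out because scikit-learn has no equivalent, and the NASA MDP data is stood in for by a hypothetical //kc3.csv// file with one row per module, its metric values, and a binary //defective// column.

<code python>
# Rough sketch of the comparison procedure (a reconstruction, not the
# authors' code): repeated stratified 10-fold cross-validation, scoring each
# learner on pd and pf. The file name "kc3.csv" and the "defective" column
# are hypothetical stand-ins for the NASA MDP data.
import numpy as np
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def pd_pf(y_true, y_pred):
    """Probability of detection and of false alarm for binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / max(tp + fn, 1), fp / max(fp + tn, 1)

data = pd.read_csv("kc3.csv")                 # hypothetical layout: one row per module
X = data.drop(columns=["defective"]).values   # the static code metrics
y = data["defective"].values                  # 1 = defective module, 0 = defect-free

learners = {
    "naive Bayes": GaussianNB(),
    "CART (stand-in for J48)": DecisionTreeClassifier(random_state=0),
}

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
for name, learner in learners.items():
    scores = [pd_pf(y[test], learner.fit(X[train], y[train]).predict(X[test]))
              for train, test in cv.split(X, y)]
    mean_pd, mean_pf = np.mean(scores, axis=0)
    print(f"{name}: pd = {mean_pd:.2f}, pf = {mean_pf:.2f}")
</code>

Stratified folds keep the proportion of defective modules roughly constant across train and test sets, which matters because defective modules are typically a small minority of each system.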
| + | |||
| + | In conclusion, the authors show that the naïve Bayes-based predictor was the best, i.e., has the best balance between //pd// and //pf// over all other possible combination of attributes and independent variables. But they also show that the different attributes were better for different object systems: | ||
  * For //pc1//, the best code metrics are call_pairs, μ2, and number_of_lines;
  * For //mw1//, the best code metrics are B, node_count, and μ2;
  * For //kc3//, the best code metrics include loc_executable;
  * For //cm1//, the best code metrics include loc_comments;
  * For //pc2//, the best code metrics include loc_comments;
  * For //kc4//, the best code metrics are call_pairs, edge_count, and node_count;
  * For //pc3//, the best code metrics are loc_blanks, I, and number_of_lines;
  * For //pc4//, the best code metrics include loc_blanks and loc_code_and_comment.
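
To illustrate how "the best code metrics" per system could be identified, here is a simplified sketch of an exhaustive search over small attribute subsets, each scored by the balance between //pd// and //pf//. This is an approximation of the idea under stated assumptions (hypothetical //kc3.csv// file and //defective// column, naïve Bayes as the learner, subsets of at most three metrics), not the paper's exact selection procedure.

<code python>
# Simplified sketch of an exhaustive search over small attribute subsets,
# scored by the balance between pd and pf (distance from the ideal point
# pd = 1, pf = 0). An approximation of the idea, not the paper's exact
# procedure; "kc3.csv" and the "defective" column are hypothetical.
from itertools import combinations
from math import sqrt

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB

def balance(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    prob_d = tp / max(tp + fn, 1)   # pd
    prob_f = fp / max(fp + tn, 1)   # pf
    return 1.0 - sqrt((1.0 - prob_d) ** 2 + prob_f ** 2) / sqrt(2.0)

data = pd.read_csv("kc3.csv")
metrics = [c for c in data.columns if c != "defective"]
y = data["defective"].values

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
best_subset, best_score = None, -1.0
for size in (1, 2, 3):              # small subsets, like the 2-3 metrics reported per system
    for subset in combinations(metrics, size):
        X = data[list(subset)].values
        score = balance(y, cross_val_predict(GaussianNB(), X, y, cv=cv))
        if score > best_score:
            best_subset, best_score = subset, score

print("best subset:", best_subset, "with balance", round(best_score, 2))
</code>

This brute-force search gets slow as the number of metrics grows, but it conveys the point made above: the subset that maximises the balance differs from one system to another.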
| + | |||
| + | The only limitations to the study (in addition to the threats mentioned in the paper) are that one of the authors worked with the NASA on the MDP program, thus there is possibly an experimenter bias. More seriously, the NASA MDP only provide metric values, no source code is available to check the quality of the data, compute different metrics, and apply different analyses! | ||