kc1-binary (Q6033790)
From MaRDI portal
Latest revision as of 12:29, 16 April 2024
OpenML dataset with id 1066
Language | Label | Description | Also known as |
---|---|---|---|
English | kc1-binary | OpenML dataset with id 1066 | |
Statements
1
**Author**:
**Source**: Unknown - Date unknown
**Please cite**:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This is a PROMISE Software Engineering Repository data set made publicly
available in order to encourage repeatable, verifiable, refutable, and/or
improvable predictive models of software engineering.

If you publish material based on PROMISE data sets, please
follow the acknowledgment guidelines posted on the PROMISE repository
web page http://promise.site.uottawa.ca/SERepository .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1. Title: Class-level data for KC1
This one includes a {_TRUE,FALSE} attribute (DL) to indicate defectiveness.

2. Sources
(a) Creator: A. Gunes Koru
(b) Date: February 21, 2005
(c) Contact: gkoru AT umbc DOT edu Phone: +1 (410) 455 8843

3. Donor: A. Gunes Koru

4. Past Usage: This data was used for:

A. Gunes Koru and Hongfang Liu, "An Investigation of the Effect
of Module Size on Defect Prediction Using Static Measures", PROMISE -
Predictive Models in Software Engineering Workshop, ICSE 2005,
May 15th 2005, Saint Louis, Missouri, US.

We used several machine learning algorithms to predict the defective
modules in five NASA products, namely, CM1, JM1, KC1, KC2, and PC1.
A set of static measures was used as predictor variables. While doing
so, we observed that a large portion of the modules were small, as
measured by lines of code (LOC). When we experimented on the data
subsets created by partitioning according to module size, we obtained
higher prediction performance for the subsets that include larger
modules. We also performed defect prediction using class-level data
for KC1 rather than method-level data. In this case, the use of class-level
data resulted in improved prediction performance compared to using
method-level data.
These findings suggest that quality assurance activities
can be guided even better if defect predictions are made by using
data that belong to larger modules.

5. Features:

The descriptions of the features are taken from
http://mdp.ivv.nasa.gov/mdp_glossary.html

Feature Used as the Response Variable:
======================================
DL: Defect level. _TRUE if the class contains one or more defects,
FALSE otherwise.

Features Originally at Class Level
==================================

PERCENT_PUB_DATA: The percentage of a class's data that is public or
protected. In general, lower values indicate greater encapsulation. It is
a measure of encapsulation.

ACCESS_TO_PUB_DATA: The number of times that a class's public and protected
data is accessed. In general, lower values indicate greater encapsulation.
It is a measure of encapsulation.

COUPLING_BETWEEN_OBJECTS: The number of distinct non-inheritance-related
classes on which a class depends. If a class that is heavily dependent on
many classes outside of its hierarchy is introduced into a library, all the
classes upon which it depends need to be introduced as well. This may be
acceptable, especially if the classes which it references are already part
of a class library and are even more fundamental than the specified class.

DEPTH: The level of a class. For instance, if a parent has one child, the
depth for the child is two. Depth indicates at what level a class is located
within its class hierarchy. In general, inheritance increases when depth
increases.

LACK_OF_COHESION_OF_METHODS: For each data field in a class, the percentage
of the methods in the class using that data field is computed; these
percentages are averaged and then subtracted from 100%. The result is a
percentage that indicates low or high cohesion.
If the percentage is low, the class is cohesive.
If it is high, it may indicate that the class could be split into separate
classes that will individually have greater cohesion.

NUM_OF_CHILDREN: The number of classes derived from a specified class.

DEP_ON_CHILD: Whether a class is dependent on a descendant.

FAN_IN: This is a count of calls by higher modules.

RESPONSE_FOR_CLASS: A count of methods implemented within a class plus the
number of methods accessible to an object class due to inheritance. In
general, lower values indicate greater polymorphism.

WEIGHTED_METHODS_PER_CLASS: A count of methods implemented within a class
(rather than all methods accessible within the class hierarchy). In general,
lower values indicate greater polymorphism.

Features Transformed to Class Level (Originally at Method Level)
================================================================

Transformation was achieved by obtaining min, max, sum, and avg values
over all the methods in a class. Therefore, this data set includes four
features for each of the following features that were originally at the
method level but transformed to the class level. For example, LOC_BLANK
has minLOC_BLANK, maxLOC_BLANK, sumLOC_BLANK, and avgLOC_BLANK.

LOC_BLANK: Lines with only white space or no text content.

BRANCH_COUNT: This metric is the number of branches for each module.
Branches are defined as those edges that exit from a decision node.
The greater the number of branches in a program's modules, the more
testing resources are required.

LOC_CODE_AND_COMMENT: Lines that contain both code and comment.

LOC_COMMENTS: The number of comment lines in a module.

CYCLOMATIC_COMPLEXITY: It is a measure of the complexity of a module's
decision structure.
It is the number of linearly independent paths.

DESIGN_COMPLEXITY: Design complexity is a measure of a module's decision
structure as it relates to calls to other modules. This quantifies the
testing effort related to integration.

ESSENTIAL_COMPLEXITY: Essential complexity is a measure of the degree to
which a module contains unstructured constructs.

LOC_EXECUTABLE: Source lines of code that contain only code and white space.

HALSTEAD_CONTENT: Complexity of a given algorithm independent of the
language used to express the algorithm.

HALSTEAD_DIFFICULTY: Level of difficulty in the program.

HALSTEAD_EFFORT: Estimated mental effort required to develop the program.

HALSTEAD_ERROR_EST: Estimated number of errors in the program.

HALSTEAD_LENGTH: A Halstead metric equal to the total number of operator
occurrences plus the total number of operand occurrences.

HALSTEAD_LEVEL: Level at which the program can be understood.

HALSTEAD_PROG_TIME: Estimated amount of time to implement the algorithm.

HALSTEAD_VOLUME: A Halstead metric giving the minimum number of bits
required for coding the program.

NUM_OPERANDS: The number of operands, where operands are variables and
identifiers, constants (numeric literals/strings), and function names when
used during calls.

NUM_UNIQUE_OPERANDS: The number of distinct operands (variables and
identifiers, constants (numeric literals/strings), and function names when
used during calls).

NUM_UNIQUE_OPERATORS: Number of unique operators.

LOC_TOTAL: Total Lines of Code.
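The LACK_OF_COHESION_OF_METHODS definition in the description above can be sketched in Python as follows. This is a minimal illustration and not the tool that produced the KC1 values. The input format is hypothetical: a list of field names plus a `methods_using` mapping from each method to the set of fields it uses. Raising an error when there are no fields or methods is also an assumption, and the original NASA MDP tooling may count field usage differently.

```python
def lack_of_cohesion(fields, methods_using):
    """Percentage-based LCOM following the description in the data set:
    for each data field, take the percentage of the class's methods that
    use it, average those percentages, and subtract the average from 100.

    fields: names of the class's data fields.
    methods_using: hypothetical mapping {method name: set of fields it uses}.
    """
    if not fields or not methods_using:
        # Assumption: the metric is treated as undefined without fields or methods.
        raise ValueError("need at least one field and one method")
    n_methods = len(methods_using)
    percentages = []
    for field in fields:
        # Count how many methods touch this field.
        users = sum(1 for used in methods_using.values() if field in used)
        percentages.append(100.0 * users / n_methods)
    return 100.0 - sum(percentages) / len(percentages)


# Two fields, four methods: x is used by 3 of 4 methods (75%), y by 1 of 4 (25%).
methods = {"m1": {"x", "y"}, "m2": {"x"}, "m3": {"x"}, "m4": set()}
print(lack_of_cohesion(["x", "y"], methods))  # 50.0
```

If every method uses every field, the score is 0, which is fully cohesive. If no method uses any field, the score is 100.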
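The method-to-class transformation described above (min, max, sum, and avg over all methods of a class) can be sketched as below. The nested-dict input is a hypothetical stand-in, since the method-level KC1 records are not part of this data set. "avg" is assumed to be the arithmetic mean.

```python
from statistics import mean

# Aggregations named in the data set description, keyed by the prefix used
# in the class-level feature names (e.g. minLOC_BLANK, sumLOC_BLANK).
AGGREGATES = {"min": min, "max": max, "sum": sum, "avg": mean}


def to_class_level(method_metrics):
    """Turn per-method metric values into per-class features.

    method_metrics: hypothetical input {class name: [ {metric: value}, ... ]},
    with one dict per method of the class.
    Returns {class name: {"minMETRIC": ..., "maxMETRIC": ..., ...}}.
    """
    result = {}
    for class_name, methods in method_metrics.items():
        features = {}
        metric_names = sorted({name for method in methods for name in method})
        for name in metric_names:
            values = [method[name] for method in methods if name in method]
            # Four class-level features per method-level metric.
            for prefix, aggregate in AGGREGATES.items():
                features[prefix + name] = aggregate(values)
        result[class_name] = features
    return result


example = {
    "ClassA": [
        {"LOC_BLANK": 2, "BRANCH_COUNT": 3},
        {"LOC_BLANK": 0, "BRANCH_COUNT": 1},
        {"LOC_BLANK": 4, "BRANCH_COUNT": 5},
    ]
}
features = to_class_level(example)["ClassA"]
print(features["sumLOC_BLANK"])  # 6
```

Each method-level metric expands into four columns. This matches how a single measure such as LOC_BLANK appears in the data set as minLOC_BLANK, maxLOC_BLANK, sumLOC_BLANK, and avgLOC_BLANK.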
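BRANCH_COUNT and CYCLOMATIC_COMPLEXITY, as described above, can both be read off a method's control-flow graph. The sketch below uses McCabe's standard formula V(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The adjacency-list graph is a hypothetical example, because the data set contains only the resulting numbers. The exact counting rules of the original MDP tooling may differ.

```python
def branch_count(cfg):
    """Number of edges that leave a decision node (a node with 2+ successors)."""
    return sum(len(succ) for succ in cfg.values() if len(succ) > 1)


def cyclomatic_complexity(cfg, components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph with P components."""
    # Collect every node, including ones that appear only as successors.
    nodes = set(cfg) | {s for succ in cfg.values() for s in succ}
    edges = sum(len(succ) for succ in cfg.values())
    return edges - len(nodes) + 2 * components


# Hypothetical control-flow graph of one method: an if/else followed by a
# while loop. Each key maps a node to the list of its successors.
cfg = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}
print(branch_count(cfg), cyclomatic_complexity(cfg))  # 4 3
```

With two binary decisions (the `if` and the loop test), V(G) equals the number of decisions plus one. Each decision contributes two outgoing branch edges.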
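The HALSTEAD_* features listed above are derived from four basic counts: distinct operators n1, distinct operands n2, total operators N1 and total operands N2. The sketch below uses the common textbook Halstead formulas. It assumes the Stroud number 18 for programming time and V/3000 for the error estimate, which are widespread conventions. Whether the NASA MDP tooling used exactly these variants is an assumption, and the `HalsteadCounts` type is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HalsteadCounts:
    n1: int  # distinct operators (cf. NUM_UNIQUE_OPERATORS)
    n2: int  # distinct operands (cf. NUM_UNIQUE_OPERANDS)
    N1: int  # total operator occurrences
    N2: int  # total operand occurrences (cf. NUM_OPERANDS)


def halstead(c: HalsteadCounts) -> dict:
    """Textbook Halstead measures derived from the four basic counts."""
    if c.n1 < 1 or c.n2 < 1 or c.N1 < c.n1 or c.N2 < c.n2:
        raise ValueError("inconsistent Halstead counts")
    length = c.N1 + c.N2                     # HALSTEAD_LENGTH
    vocabulary = c.n1 + c.n2
    volume = length * math.log2(vocabulary)  # HALSTEAD_VOLUME, in bits
    difficulty = (c.n1 / 2) * (c.N2 / c.n2)  # HALSTEAD_DIFFICULTY
    effort = difficulty * volume             # HALSTEAD_EFFORT
    return {
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "level": 1 / difficulty,             # HALSTEAD_LEVEL
        "effort": effort,
        "content": volume / difficulty,      # HALSTEAD_CONTENT
        "prog_time": effort / 18,            # HALSTEAD_PROG_TIME, seconds (assumed Stroud number 18)
        "error_est": volume / 3000,          # HALSTEAD_ERROR_EST (assumed V/3000 variant)
    }


m = halstead(HalsteadCounts(n1=4, n2=4, N1=8, N2=8))
print(m["volume"], m["difficulty"], m["effort"])  # 48.0 4.0 192.0
```

Some tools estimate errors as E^(2/3)/3000 instead of V/3000. Values computed this way are therefore not guaranteed to reproduce the numbers stored in the data set.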
2005-02-21
6 October 2014
DL
https://dl.acm.org/doi/abs/10.1145/1083165.1083172
1
2
95
145
0
0
94
1