cocomo_numeric (Q6033774)

Latest revision as of 13:29, 16 April 2024

OpenML dataset with id 1051

Language	Label	Description	Also known as
English	cocomo_numeric	OpenML dataset with id 1051

Statements

dataset version identifier

1

description

**Author**:
**Source**: Unknown - Date unknown
**Please cite**:

%-*- text -*-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This is a PROMISE Software Engineering Repository data set made publicly
available in order to encourage repeatable, verifiable, refutable, and/or
improvable predictive models of software engineering.

If you publish material based on PROMISE data sets, then please
follow the acknowledgment guidelines posted on the PROMISE repository
web page http://promise.site.uottawa.ca/SERepository .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1. Title/Topic: cocomonasa/software cost estimation
2. Sources:

-- Creators: 60 NASA projects from different centers
for projects from the 1980s and 1990s. Collected by
Jairus Hihn, JPL, NASA, Manager SQIP Measurement &
Benchmarking Element
Phone (818) 354-1248 (Jairus.M.Hihn@jpl.nasa.gov)

-- Donor: Tim Menzies (tim@barmag.net)

-- Date: December 2 2004
3. Past Usage
1. "Validation Methods for Calibrating Software Effort
Models", T. Menzies and D. Port and Z. Chen and
J. Hihn and S. Stukes, Proceedings ICSE 2005,
http://menzies.us/pdf/04coconut.pdf
-- Results
-- Given background knowledge on 60 prior projects,
a new cost model can be tuned to local data using
as few as 20 new projects.
-- A very simple calibration method (COCONUT) can
achieve PRED(30)=7% or PRED(20)=50% (after 20 projects).
These are results seen in 30 repeats of an incremental
cross-validation study.
-- Two cost models are compared: one based on just
lines of code and one using over a dozen "effort
multipliers". Just using lines of code loses 10 to 20
PRED(N) points.

3.1 Additional Usage:
2. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy"
Zhihao Chen, Tim Menzies, Dan Port and Barry Boehm
Proceedings PROMISE Workshop 2005,
http://www.etechstyle.com/chen/papers/05fsscocomo.pdf
P02, P03, P04 are used in this paper.
-- Results
-- To the best of our knowledge, this is the first report
of applying feature subset selection (FSS)
to software effort data.

-- FSS can dramatically improve cost estimation.

-- T-tests are applied to the results to demonstrate
that, in all of our data sets, removing
attributes improves performance without increasing the
variance in model behavior.

4. Relevant Information

The COCOMO software cost model measures effort in calendar months
of 152 hours (and includes development and management hours).
COCOMO assumes that the effort grows more than linearly with
software size; i.e. months=a*KSLOC^b*c. Here, "a" and "b" are
domain-specific parameters; "KSLOC" is estimated directly or
computed from a function point analysis; and "c" is the product
of over a dozen "effort multipliers". I.e.

months=a*(KSLOC^b)*(EM1 * EM2 * EM3 * ...)

The effort multipliers are as follows:

increase  | acap | analysts capability
these to  | pcap | programmers capability
decrease  | aexp | application experience
effort    | modp | modern programming practices
          | tool | use of software tools
          | vexp | virtual machine experience
          | lexp | language experience
----------+------+---------------------------
          | sced | schedule constraint
----------+------+---------------------------
decrease  | stor | main memory constraint
these to  | data | data base size
decrease  | time | time constraint for cpu
effort    | turn | turnaround time
          | virt | machine volatility
          | cplx | process complexity
          | rely | required software reliability

In COCOMO I, the exponent on KSLOC was a single value ranging from
1.05 to 1.2. In COCOMO II, the exponent "b" was divided into a
constant, plus the sum of five "scale factors" which modeled
issues such as "have we built this kind of system before?". The
COCOMO II effort multipliers are similar but COCOMO II dropped one
of the effort multiplier parameters; renamed some others; and
added a few more (for "required level of reuse", "multiple-site
development", and "schedule pressure").

The effort multipliers fall into three groups: those that are
positively correlated to more effort; those that are
negatively correlated to more effort; and a third group
containing just schedule information. In COCOMO I, "sced" has a
U-shaped correlation to effort; i.e. giving programmers either
too much or too little time to develop a system can be
detrimental.

The numeric values of the effort multipliers are:

        very                            very    extra   productivity
        low     low     nominal high    high    high    range
---------------------------------------------------------------------
acap    1.46    1.19    1.00    0.86    0.71            2.06
pcap    1.42    1.17    1.00    0.86    0.70            1.67
aexp    1.29    1.13    1.00    0.91    0.82            1.57
modp    1.24    1.10    1.00    0.91    0.82            1.34
tool    1.24    1.10    1.00    0.91    0.83            1.49
vexp    1.21    1.10    1.00    0.90                    1.34
lexp    1.14    1.07    1.00    0.95                    1.20
sced    1.23    1.08    1.00    1.04    1.10            e
stor                    1.00    1.06    1.21    1.56    -1.21
data            0.94    1.00    1.08    1.16            -1.23
time                    1.00    1.11    1.30    1.66    -1.30
turn            0.87    1.00    1.07    1.15            -1.32
virt            0.87    1.00    1.15    1.30            -1.49
rely    0.75    0.88    1.00    1.15    1.40            -1.87
cplx    0.70    0.85    1.00    1.15    1.30    1.65    -2.36

These were learnt by Barry Boehm after a regression analysis of the
projects in the COCOMO I data set.
@Book{boehm81,
Author = "B. Boehm",
Title = "Software Engineering Economics",
Publisher = "Prentice Hall",
Year = 1981}

The last column of the above table shows max(EM)/min(EM) and shows
the overall effect of a single effort multiplier. For example,
increasing "acap" (analyst capability) from very low to very
high will most decrease effort while increasing "rely"
(required reliability) from very low to very high will most
increase effort.

There is much more to COCOMO than the above description. The
COCOMO II text is over 500 pages long and offers
all the details needed to implement data capture and analysis of
COCOMO in an industrial context.
@Book{boehm00b,
Author = "Barry Boehm and Ellis Horowitz and Ray Madachy and
Donald Reifer and Bradford K. Clark and Bert Steece
and A. Winsor Brown and Sunita Chulani and Chris Abts",
Title = "Software Cost Estimation with Cocomo II",
Publisher = "Prentice Hall",
Year = 2000,
isbn = "0130266922"}

Included in that book is not just an effort model but other
models for schedule, risk, use of COTS, etc. However, most
(?all) of the validation work on COCOMO has focused on the effort
model.
@article{chulani99,
author = "S. Chulani and B. Boehm and B. Steece",
title = "Bayesian Analysis of Empirical Software Engineering
Cost Models",
journal = "IEEE Transactions on Software Engineering",
volume = 25,
number = 4,
month = "July/August",
year = "1999"}

The value of an effort predictor can be reported many ways
including MMRE and PRED(N). MMRE and PRED are computed from the
relative error, or RE, which is the relative size of the
difference between the actual and estimated value:

RE.i = (estimate.i - actual.i) / (actual.i)

Given a data set of size "D", a "Train"ing set of size
"(X=|Train|) <= D", and a "Test" set of size "T=D-|Train|", then
the mean magnitude of the relative error, or MMRE, is the
percentage of the absolute values of the relative errors,
averaged over the "T" items in the "Test" set; i.e.

MRE.i = abs(RE.i)
MMRE = 100/T*( MRE.1 + MRE.2 + ... + MRE.T)

PRED(N) reports the percentage of estimates that were
within N% of the actual values:

count=0
for(i=1;i<=T;i++) do if (MRE.i <= N/100) then count++ fi done
PRED(N) = 100/T * count

For example, PRED(30)=50% means that half the estimates are
within 30% of the actual. Shepperd and Schofield comment that
"MMRE is fairly conservative with a bias against overestimates
while Pred(25) will identify those prediction systems that are
generally accurate but occasionally wildly inaccurate".
@article{shepperd97,
author="M. Shepperd and C. Schofield",
title="Estimating Software Project Effort Using Analogies",
journal="IEEE Transactions on Software Engineering",
volume=23,
number=12,
month="November",
year=1997,
note="Available from
\url{http://www.utdallas.edu/~rbanker/SE_XII.pdf}"}

4.1 Further classification of the projects

4.1.1 Classify the projects into different project categories - P02, P03, P04.
(The criteria are unknown and the categories are disjoint.)

Category  sequence  Original  sequence_of_NASA
P01       1         NASA      26
P01       2         NASA      27
P01       3         NASA      28
P01       4         NASA      29
P01       5         NASA      30
P01       6         NASA      31
P01       7         NASA      32
P02       1         NASA      4
P02       2         NASA      5
P02       3         NASA      6
P02       4         NASA      7
P02       5         NASA      8
P02       6         NASA      9
P02       7         NASA      10
P02       8         NASA      11
P02       9         NASA      12
P02       10        NASA      13
P02       11        NASA      14
P02       12        NASA      15
P02       13        NASA      16
P02       14        NASA      17
P02       15        NASA      18
P02       16        NASA      19
P02       17        NASA      20
P02       18        NASA      21
P02       19        NASA      22
P02       20        NASA      23
P02       21        NASA      24
P02       22        NASA      25
P03       1         NASA      34
P03       2         NASA      35
P03       3         NASA      36
P03       4         NASA      37
P03       5         NASA      38
P03       6         NASA      39
P03       7         NASA      40
P03       8         NASA      41
P03       9         NASA      42
P03       10        NASA      43
P03       11        NASA      44
P03       12        NASA      45
P04       1         NASA      47
P04       2         NASA      48
P04       3         NASA      49
P04       4         NASA      50
P04       5         NASA      51
P04       6         NASA      52
P04       7         NASA      53
P04       8         NASA      54
P04       9         NASA      55
P04       10        NASA      56
P04       11        NASA      57
P04       12        NASA      58
P04       13        NASA      59
P04       14        NASA      60

4.1.2 Classify the projects into different task categories - T01, T02, T03.
(The criteria are unknown and the categories are disjoint.)
T01:sequencing T02:avionics T03:missionPlanning

Category  sequence  Original  sequence_of_NASA
T01       1         NASA      43
T01       2         NASA      41
T01       3         NASA      37
T01       4         NASA      34
T01       5         NASA      40
T01       6         NASA      38
T01       7         NASA      39
T01       8         NASA      36
T02       1         NASA      4
T02       2         NASA      6
T02       3         NASA      26
T02       4         NASA      27
T02       5         NASA      33
T02       6         NASA      32
T02       7         NASA      29
T02       8         NASA      30
T02       9         NASA      28
T02       10        NASA      7
T02       11        NASA      9
T02       12        NASA      10
T02       13        NASA      55
T02       14        NASA      31
T03       1         NASA      51
T03       2         NASA      52
T03       3         NASA      16
T03       4         NASA      17
T03       5         NASA      8
T03       6         NASA      50
T03       7         NASA      53
T03       8         NASA      45
T03       9         NASA      48
T03       10        NASA      47

4.1.3 Classify the projects into different Centers - C01, C02, C03.
(The criteria are unknown and the categories are disjoint.)

Category  sequence  Original  sequence_of_NASA
C01       1         NASA      1
C01       2         NASA      2
C01       3         NASA      51
C01       4         NASA      52
C01       5         NASA      50
C01       6         NASA      53
C01       7         NASA      48
C01       8         NASA      47
C01       9         NASA      58
C01       10        NASA      59
C01       11        NASA      60
C01       12        NASA      49
C01       13        NASA      54
C02       1         NASA      45
C02       2         NASA      43
C02       3         NASA      41
C02       4         NASA      35
C02       5         NASA      34
C02       6         NASA      40
C02       7         NASA      38
C02       8         NASA      39
C02       9         NASA      36
C02       10        NASA      37
C02       11        NASA      42
C02       12        NASA      44
C03       1         NASA      4
C03       2         NASA      6
C03       3         NASA      26
C03       4         NASA      27
C03       5         NASA      33
C03       6         NASA      32
C03       7         NASA      29
C03       8         NASA      30
C03       9         NASA      28
C03       10        NASA      7
C03       11        NASA      9
C03       12        NASA      10
C03       13        NASA      31
C03       14        NASA      21
C03       15        NASA      14
C03       16        NASA      22
C03       17        NASA      3
C03       18        NASA      19
C03       19        NASA      16
C03       20        NASA      17
C03       21        NASA      8
C03       22        NASA      23
C03       23        NASA      20
C03       24        NASA      24
C03       25        NASA      12
C03       26        NASA      5
C03       27        NASA      13
C03       28        NASA      25
C03       29        NASA      15
C03       30        NASA      18
C03       31        NASA      11

5. Number of instances: 60
6. Number of attributes: 17 (15 discrete in the range Very_Low to
Extra_High; one lines of code measure, and one goal field
being the actual effort in person months).
7. Attribute information:
8. Missing attributes: none
9. Class distribution: the class value (ACT_EFFORT) is continuous.
After sorting all the instances on ACT_EFFORT, the following
distribution was found:

Instances  Range
---------  --------------
1..10      8.4 .. 42
11..20     48 .. 68
21..30     70 .. 117.6
31..40     120 .. 300
41..50     324 .. 571
51..60     750 .. 3240

Change log:
-----------

2005/04/04 Jelber Sayyad Shirabad (PROMISE Librarian) <promise@site.uottawa.ca>
1) Minor editorial changes, as well as moving the information provided by
Zhihao Chen to the new sections 3.1 and 4.1

2005/03/28 Zhihao Chen, CSE, USC, USA, <zhihaoch@cse.usc.edu>
1) Fix a mistake in the line corresponding to the cplx entry in the table of "The numeric values of the effort multipliers":
"cplx 0.70 0.85 1.00 1.15 1.30 1.65 -1.86" should be
"cplx 0.70 0.85 1.00 1.15 1.30 1.65 -2.36"

2) Additional information about various classifications of the projects is provided.

3) Additional usage information is provided.
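
The effort equation in the description, months = a * KSLOC^b * (EM1 * EM2 * ...), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the rating keys ("very_low", "high", and so on) and the dictionary layout are hypothetical, only three multipliers are copied from the table, and "a" and "b" must be supplied by the caller because the description calls them domain-specific.

```python
from math import prod

# Effort-multiplier values copied from the table in the description.
# Only three attributes are included to keep the sketch short; the
# rating keys are hypothetical names chosen for this example.
EFFORT_MULTIPLIERS = {
    "acap": {"very_low": 1.46, "low": 1.19, "nominal": 1.00,
             "high": 0.86, "very_high": 0.71},
    "rely": {"very_low": 0.75, "low": 0.88, "nominal": 1.00,
             "high": 1.15, "very_high": 1.40},
    "cplx": {"very_low": 0.70, "low": 0.85, "nominal": 1.00,
             "high": 1.15, "very_high": 1.30, "extra_high": 1.65},
}


def cocomo_effort(ksloc, ratings, a, b):
    """Return effort in months as a * KSLOC^b * product of multipliers.

    Attributes missing from `ratings` are treated as nominal (1.00).
    """
    multipliers = [EFFORT_MULTIPLIERS[name][level]
                   for name, level in ratings.items()]
    return a * (ksloc ** b) * prod(multipliers)


def productivity_range(name):
    """Return max(EM)/min(EM) for one attribute (last table column)."""
    values = EFFORT_MULTIPLIERS[name].values()
    return max(values) / min(values)


# a=3.0 and b=1.12 are placeholder values, not calibrated parameters.
estimate = cocomo_effort(10, {"acap": "high", "rely": "very_high"},
                         a=3.0, b=1.12)
print(f"estimated effort: {estimate:.1f} months")
print(f"acap productivity range: {productivity_range('acap'):.2f}")
```

The table reports the ranges of effort-increasing attributes such as "rely" and "cplx" with a minus sign; `productivity_range` returns only the magnitude.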

Jairus Hihn

2004-12-2

upload date

6 October 2014

full work available at URL

https://api.openml.org/data/v1/download/53934/cocomo_numeric.arff

http://promise.site.uottawa.ca/SERepository/

default target attribute

ACT_EFFORT
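
Because ACT_EFFORT is a continuous target, predictions for it are typically scored with the MMRE and PRED(N) measures defined in the description. A minimal sketch follows; the function names and the four (actual, estimate) pairs are made up for illustration and are not taken from the dataset.

```python
def mre(actual, estimate):
    """Magnitude of relative error: abs(estimate - actual) / actual."""
    return abs(estimate - actual) / actual


def mmre(actuals, estimates):
    """Mean MRE over the test set, expressed as a percentage."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return 100.0 * sum(errors) / len(errors)


def pred(actuals, estimates, n):
    """Percentage of estimates whose MRE is at most n percent."""
    hits = sum(1 for a, e in zip(actuals, estimates)
               if mre(a, e) <= n / 100.0)
    return 100.0 * hits / len(actuals)


# Hypothetical effort values in person-months, for illustration only.
actual_effort = [100.0, 200.0, 400.0, 50.0]
estimated_effort = [110.0, 250.0, 400.0, 100.0]

print("MMRE:", mmre(actual_effort, estimated_effort))
print("PRED(30):", pred(actual_effort, estimated_effort, 30))
```

In this made-up sample a single estimate that is off by 100% dominates MMRE, while PRED(30) only records that one estimate in four missed the 30% band.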

https://ieeexplore.ieee.org/abstract/document/1553605

checksum

f26e1680852b8effe5ecbb564b62b6f2

determination method

MD5
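
A downloaded copy of the file can be compared against the MD5 checksum listed here. The sketch below uses Python's standard `hashlib`; the helper names and the local file name in the usage comment are assumptions, not part of the item.

```python
import hashlib

# Checksum value as listed on this item.
EXPECTED_MD5 = "f26e1680852b8effe5ecbb564b62b6f2"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage, assuming the ARFF file was saved locally under this name:
# print(matches_expected("cocomo_numeric.arff"))
```

MD5 is adequate for detecting accidental corruption of a download; it is not a defence against deliberate tampering.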

number of binary features

0

number of classes

0

number of features

17

number of instances

60

number of instances with missing values

0

number of missing values

0

number of numeric features

2

number of symbolic features

15

file format

ARFF
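
ARFF is a plain-text format: `%` starts a comment, `@attribute` lines declare the columns, and the rows follow `@data`. The reader below is a deliberately minimal sketch that handles only dense rows with unquoted, comma-separated values; libraries such as SciPy's `scipy.io.arff.loadarff` cover more of the format. The sample content is hypothetical and does not reproduce rows from the real file.

```python
import io


def read_arff(stream):
    """Parse a dense ARFF stream into (attribute_names, rows).

    Simplifications: no quoted names or values, no sparse rows, and
    every cell is returned as a string.
    """
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append([cell.strip() for cell in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the name
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


# Hypothetical three-attribute sample, NOT rows from the real dataset.
SAMPLE = """\
% comment line
@relation demo
@attribute ACAP {Low,Nominal,High}
@attribute LOC numeric
@attribute ACT_EFFORT numeric
@data
High,10.0,30.0
Nominal,25.5,120.0
"""

names, rows = read_arff(io.StringIO(SAMPLE))
target = [float(row[names.index("ACT_EFFORT")]) for row in rows]
print(names)
print(target)
```

For the real file, passing an opened `cocomo_numeric.arff` to `read_arff` in place of the `StringIO` object should work as long as the file sticks to the dense, unquoted subset assumed here.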

MaRDI profile type

MaRDI dataset profile

Sitelinks

Mathematics(1 entry)

mardi Dataset:6033774

Revision as of 11:35, 15 April 2024: Importer (bot) created a new Item.
Latest revision as of 13:29, 16 April 2024: Import240416010454 added link to MaRDI item.