cocomo_numeric (Q6033774)

From MaRDI portal
Revision as of 12:29, 16 April 2024 by Import240416010454 (Added link to MaRDI item.)
OpenML dataset with id 1051

    Statements

**Author**:
**Source**: Unknown - Date unknown
**Please cite**:

%-*- text -*-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This is a PROMISE Software Engineering Repository data set made publicly
available in order to encourage repeatable, verifiable, refutable, and/or
improvable predictive models of software engineering.

If you publish material based on PROMISE data sets, then please
follow the acknowledgment guidelines posted on the PROMISE repository
web page http://promise.site.uottawa.ca/SERepository .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1. Title/Topic: cocomonasa/software cost estimation
2. Sources:

-- Creators: 60 NASA projects from different centers
for projects from the 1980s and 1990s. Collected by
Jairus Hihn, JPL, NASA, Manager SQIP Measurement &
Benchmarking Element
Phone (818) 354-1248 (Jairus.M.Hihn@jpl.nasa.gov)

-- Donor: Tim Menzies (tim@barmag.net)

-- Date: December 2 2004
3. Past Usage
1. "Validation Methods for Calibrating Software Effort
Models", T. Menzies and D. Port and Z. Chen and
J. Hihn and S. Stukes, Proceedings ICSE 2005,
http://menzies.us/pdf/04coconut.pdf
-- Results
-- Given background knowledge on 60 prior projects,
a new cost model can be tuned to local data using
as little as 20 new projects.
-- A very simple calibration method (COCONUT) can
achieve PRED(30)=7% or PRED(20)=50% (after 20 projects).
These are results seen in 30 repeats of an incremental
cross-validation study.
-- Two cost models are compared; one based on just
lines of code and one using over a dozen "effort
multipliers". Just using lines of code loses 10 to 20
PRED(N) points.

3.1 Additional Usage:
2. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy"
Zhihao Chen, Tim Menzies, Dan Port and Barry Boehm
Proceedings PROMISE Workshop 2005,
http://www.etechstyle.com/chen/papers/05fsscocomo.pdf
P02, P03, P04 are used in this paper.
-- Results
-- To the best of our knowledge, this is the first report
of applying feature subset selection (FSS)
to software effort data.

-- FSS can dramatically improve cost estimation.

-- T-tests are applied to the results to demonstrate
that, in our data sets, removing attributes always
improves performance without increasing the
variance in model behavior.

4. Relevant Information

The COCOMO software cost model measures effort in calendar months
of 152 hours (and includes development and management hours).
COCOMO assumes that the effort grows more than linearly with
software size; i.e. months = a * KSLOC^b * c. Here, "a" and "b" are
domain-specific parameters; "KSLOC" is estimated directly or
computed from a function point analysis; and "c" is the product
of over a dozen "effort multipliers". I.e.

months = a * (KSLOC^b) * (EM1 * EM2 * EM3 * ...)

The effort multipliers are as follows:

increase  | acap | analysts capability
these to  | pcap | programmers capability
decrease  | aexp | application experience
effort    | modp | modern programming practices
          | tool | use of software tools
          | vexp | virtual machine experience
          | lexp | language experience
----------+------+---------------------------
          | sced | schedule constraint
----------+------+---------------------------
decrease  | stor | main memory constraint
these to  | data | data base size
decrease  | time | time constraint for cpu
effort    | turn | turnaround time
          | virt | machine volatility
          | cplx | process complexity
          | rely | required software reliability

In COCOMO I, the exponent on KSLOC was a single value ranging from
1.05 to 1.2. In COCOMO II, the exponent "b" was divided into a
constant, plus the sum of five "scale factors" which modeled
issues such as "have we built this kind of system before?". The
COCOMO II effort multipliers are similar but COCOMO II dropped one
of the effort multiplier parameters; renamed some others; and
added a few more (for "required level of reuse", "multiple-site
development", and "schedule pressure").

The effort multipliers fall into three groups: those that are
positively correlated to more effort; those that are
negatively correlated to more effort; and a third group
containing just schedule information. In COCOMO I, "sced" has a
U-shaped correlation to effort; i.e. giving programmers either
too much or too little time to develop a system can be
detrimental.

The numeric values of the effort multipliers are:

      very                                very     extra    productivity
      low      low      nominal  high     high     high     range
---------------------------------------------------------------------
acap  1.46     1.19     1.00     0.86     0.71              2.06
pcap  1.42     1.17     1.00     0.86     0.70              1.67
aexp  1.29     1.13     1.00     0.91     0.82              1.57
modp  1.24     1.10     1.00     0.91     0.82              1.34
tool  1.24     1.10     1.00     0.91     0.83              1.49
vexp  1.21     1.10     1.00     0.90                       1.34
lexp  1.14     1.07     1.00     0.95                       1.20
sced  1.23     1.08     1.00     1.04     1.10              e
stor                    1.00     1.06     1.21     1.56     -1.21
data           0.94     1.00     1.08     1.16              -1.23
time                    1.00     1.11     1.30     1.66     -1.30
turn           0.87     1.00     1.07     1.15              -1.32
virt           0.87     1.00     1.15     1.30              -1.49
rely  0.75     0.88     1.00     1.15     1.40              -1.87
cplx  0.70     0.85     1.00     1.15     1.30     1.65     -2.36

These were learnt by Barry Boehm after a regression analysis of the
projects in the COCOMO I data set.
@Book{boehm81,
Author = "B. Boehm",
Title = "Software Engineering Economics",
Publisher = "Prentice Hall",
Year = 1981}

The last column of the above table shows max(EM)/min(EM) and shows
the overall effect of a single effort multiplier. For example,
increasing "acap" (analyst capability) from very low to very
high will most decrease effort while increasing "rely"
(required reliability) from very low to very high will most
increase effort.

There is much more to COCOMO than the above description. The
COCOMO II text is over 500 pages long and offers
all the details needed to implement data capture and analysis of
COCOMO in an industrial context.
@Book{boehm00b,
Author = "Barry Boehm and Ellis Horowitz and Ray Madachy and
Donald Reifer and Bradford K. Clark and Bert Steece
and A. Winsor Brown and Sunita Chulani and Chris Abts",
Title = "Software Cost Estimation with Cocomo II",
Publisher = "Prentice Hall",
Year = 2000,
isbn = "0130266922"}

Included in that book is not just an effort model but other
models for schedule, risk, use of COTS, etc. However, most
(?all) of the validation work on COCOMO has focused on the effort
model.
@article{chulani99,
author = "S. Chulani and B. Boehm and B. Steece",
title = "Bayesian Analysis of Empirical Software Engineering
Cost Models",
journal = "IEEE Transactions on Software Engineering",
volume = 25,
number = 4,
month = "July/August",
year = "1999"}

The value of an effort predictor can be reported many ways
including MMRE and PRED(N). MMRE and PRED are computed from the
relative error, or RE, which is the relative size of the
difference between the actual and estimated value:

RE.i = (estimate.i - actual.i) / (actual.i)

Given a data set of size "D", a "Train"ing set of size
"(X=|Train|) <= D", and a "test" set of size "T=D-|Train|", then
the mean magnitude of the relative error, or MMRE, is the
percentage of the absolute values of the relative errors,
averaged over the "T" items in the "Test" set; i.e.

MRE.i = abs(RE.i)
MMRE = 100/T*( MRE.1 + MRE.2 + ... + MRE.T)

PRED(N) reports the average percentage of estimates that were
within N% of the actual values:

count=0
for(i=1;i<=T;i++) do if (MRE.i <= N/100) then count++ fi done
PRED(N) = 100/T * count

For example, PRED(30)=50% means that half the estimates are
within 30% of the actual. Shepperd and Schofield comment that
"MMRE is fairly conservative with a bias against overestimates
while Pred(25) will identify those prediction systems that are
generally accurate but occasionally wildly inaccurate".
@article{shepperd97,
author="M. Shepperd and C. Schofield",
title="Estimating Software Project Effort Using Analogies",
journal="IEEE Transactions on Software Engineering",
volume=23,
number=12,
month="November",
year=1997,
note="Available from
\url{http://www.utdallas.edu/~rbanker/SE_XII.pdf}"}

4.1 Further classification of the projects

4.1.1 Classify the projects into different project categories - P02, P03, P04.
(The criteria are unknown and they are disjoint.)

Category  sequence  Original sequence_of_NASA
P01       1         NASA 26
P01       2         NASA 27
P01       3         NASA 28
P01       4         NASA 29
P01       5         NASA 30
P01       6         NASA 31
P01       7         NASA 32
P02       1         NASA 4
P02       2         NASA 5
P02       3         NASA 6
P02       4         NASA 7
P02       5         NASA 8
P02       6         NASA 9
P02       7         NASA 10
P02       8         NASA 11
P02       9         NASA 12
P02       10        NASA 13
P02       11        NASA 14
P02       12        NASA 15
P02       13        NASA 16
P02       14        NASA 17
P02       15        NASA 18
P02       16        NASA 19
P02       17        NASA 20
P02       18        NASA 21
P02       19        NASA 22
P02       20        NASA 23
P02       21        NASA 24
P02       22        NASA 25
P03       1         NASA 34
P03       2         NASA 35
P03       3         NASA 36
P03       4         NASA 37
P03       5         NASA 38
P03       6         NASA 39
P03       7         NASA 40
P03       8         NASA 41
P03       9         NASA 42
P03       10        NASA 43
P03       11        NASA 44
P03       12        NASA 45
P04       1         NASA 47
P04       2         NASA 48
P04       3         NASA 49
P04       4         NASA 50
P04       5         NASA 51
P04       6         NASA 52
P04       7         NASA 53
P04       8         NASA 54
P04       9         NASA 55
P04       10        NASA 56
P04       11        NASA 57
P04       12        NASA 58
P04       13        NASA 59
P04       14        NASA 60

4.1.2 Classify the projects into different task categories - T01, T02, T03.
(The criteria are unknown and they are disjoint.)
T01:sequencing T02:avionics T03:missionPlanning

Category  sequence  Original sequence_of_NASA
T01       1         NASA 43
T01       2         NASA 41
T01       3         NASA 37
T01       4         NASA 34
T01       5         NASA 40
T01       6         NASA 38
T01       7         NASA 39
T01       8         NASA 36
T02       1         NASA 4
T02       2         NASA 6
T02       3         NASA 26
T02       4         NASA 27
T02       5         NASA 33
T02       6         NASA 32
T02       7         NASA 29
T02       8         NASA 30
T02       9         NASA 28
T02       10        NASA 7
T02       11        NASA 9
T02       12        NASA 10
T02       13        NASA 55
T02       14        NASA 31
T03       1         NASA 51
T03       2         NASA 52
T03       3         NASA 16
T03       4         NASA 17
T03       5         NASA 8
T03       6         NASA 50
T03       7         NASA 53
T03       8         NASA 45
T03       9         NASA 48
T03       10        NASA 47

4.1.3 Classify the projects into different Centers - C01, C02, C03.
(The criteria are unknown and they are disjoint.)

Category  sequence  Original sequence_of_NASA
C01       1         NASA 1
C01       2         NASA 2
C01       3         NASA 51
C01       4         NASA 52
C01       5         NASA 50
C01       6         NASA 53
C01       7         NASA 48
C01       8         NASA 47
C01       9         NASA 58
C01       10        NASA 59
C01       11        NASA 60
C01       12        NASA 49
C01       13        NASA 54
C02       1         NASA 45
C02       2         NASA 43
C02       3         NASA 41
C02       4         NASA 35
C02       5         NASA 34
C02       6         NASA 40
C02       7         NASA 38
C02       8         NASA 39
C02       9         NASA 36
C02       10        NASA 37
C02       11        NASA 42
C02       12        NASA 44
C03       1         NASA 4
C03       2         NASA 6
C03       3         NASA 26
C03       4         NASA 27
C03       5         NASA 33
C03       6         NASA 32
C03       7         NASA 29
C03       8         NASA 30
C03       9         NASA 28
C03       10        NASA 7
C03       11        NASA 9
C03       12        NASA 10
C03       13        NASA 31
C03       14        NASA 21
C03       15        NASA 14
C03       16        NASA 22
C03       17        NASA 3
C03       18        NASA 19
C03       19        NASA 16
C03       20        NASA 17
C03       21        NASA 8
C03       22        NASA 23
C03       23        NASA 20
C03       24        NASA 24
C03       25        NASA 12
C03       26        NASA 5
C03       27        NASA 13
C03       28        NASA 25
C03       29        NASA 15
C03       30        NASA 18
C03       31        NASA 11

5. Number of instances: 60
6. Number of attributes: 17 (15 discrete in the range Very_Low to
Extra_High; one lines-of-code measure, and one goal field
being the actual effort in person months).
7. Attribute information:
8. Missing attributes: none
9. Class distribution: the class value (ACT_EFFORT) is continuous.
After sorting all the instances on ACT_EFFORT, the following
distribution was found:

Instances  Range
---------  --------------
1..10      8.4 .. 42
11..20     48 .. 68
21..30     70 .. 117.6
31..40     120 .. 300
41..50     324 .. 571
51..60     750 .. 3240

Change log:
-----------

2005/04/04 Jelber Sayyad Shirabad (PROMISE Librarian) <promise@site.uottawa.ca>
1) Minor editorial changes, as well as moving the information provided by
Zhihao Chen to the new sections 3.1 and 4.1

2005/03/28 Zhihao Chen, CSE, USC, USA, <zhihaoch@cse.usc.edu>
1) Fix a mistake in the line corresponding to the cplx entry in the table of "The numeric values of the effort multipliers":
"cplx  0.70  0.85  1.00  1.15  1.30  1.65  -1.86" should be
"cplx  0.70  0.85  1.00  1.15  1.30  1.65  -2.36"

2) Additional information about various classifications of the projects is provided.

3) Additional usage information is provided.
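
The effort equation in section 4 (months = a * KSLOC^b * product of the effort multipliers) and the "productivity range" column can be illustrated with a short Python sketch. The multiplier values below are copied from three rows of the table in this description; the coefficients a=3.0 and b=1.12, the rating abbreviations, and the function names are assumptions chosen for illustration, not calibrated values from this data set.

```python
from math import prod

# Three rows copied from the effort-multiplier table in this description.
# Rating keys (vl, l, n, h, vh, xh) are shorthand introduced for this sketch.
EFFORT_MULTIPLIERS = {
    "acap": {"vl": 1.46, "l": 1.19, "n": 1.00, "h": 0.86, "vh": 0.71},
    "rely": {"vl": 0.75, "l": 0.88, "n": 1.00, "h": 1.15, "vh": 1.40},
    "cplx": {"vl": 0.70, "l": 0.85, "n": 1.00, "h": 1.15, "vh": 1.30, "xh": 1.65},
}


def cocomo_effort(ksloc, ratings, a=3.0, b=1.12):
    """months = a * KSLOC^b * (EM1 * EM2 * ...).

    a and b are illustrative assumptions; b lies in the 1.05 to 1.2
    range quoted for COCOMO I. Unrated multipliers count as nominal (1.0).
    """
    em_product = prod(EFFORT_MULTIPLIERS[name][rating]
                      for name, rating in ratings.items())
    return a * ksloc ** b * em_product


def productivity_range(name):
    """max(EM) / min(EM) for one effort multiplier (last table column)."""
    values = EFFORT_MULTIPLIERS[name].values()
    return max(values) / min(values)


# Hypothetical 10 KSLOC project: nominal ratings versus very high analyst capability.
nominal = cocomo_effort(10, {})
strong_analysts = cocomo_effort(10, {"acap": "vh"})
print(round(strong_analysts / nominal, 2))      # 0.71
print(round(productivity_range("acap"), 2))     # 2.06
```

The ratio printed first is simply the "acap" very-high multiplier, which shows that each multiplier scales the estimate independently of size. The table reports the productivity range with a negative sign for multipliers that increase effort; this sketch returns only the magnitude.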
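
The MMRE and PRED(N) definitions in section 4 translate directly into code. The following Python sketch follows those definitions; the function names and the four actual/estimated effort values are hypothetical and are not taken from the data set.

```python
def mre(estimate, actual):
    """Magnitude of the relative error: abs((estimate - actual) / actual)."""
    return abs(estimate - actual) / actual


def mmre(estimates, actuals):
    """Mean MRE over the test set, expressed as a percentage."""
    errors = [mre(e, a) for e, a in zip(estimates, actuals)]
    return 100.0 * sum(errors) / len(errors)


def pred(n, estimates, actuals):
    """Percentage of estimates whose MRE is within n% of the actual value."""
    errors = [mre(e, a) for e, a in zip(estimates, actuals)]
    count = sum(1 for m in errors if m <= n / 100.0)
    return 100.0 * count / len(errors)


# Hypothetical test set of four projects (effort in person months).
actuals = [100, 200, 50, 400]
estimates = [110, 150, 80, 400]

print(round(mmre(estimates, actuals), 2))   # 23.75
print(pred(30, estimates, actuals))         # 75.0
print(pred(20, estimates, actuals))         # 50.0
```

In this example the single poor estimate (80 against an actual of 50) dominates MMRE, while PRED(30) only records that one of four estimates missed the 30% band.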
    Jairus Hihn
    2004-12-2
    6 October 2014
    ACT_EFFORT
    https://ieeexplore.ieee.org/abstract/document/1553605
    f26e1680852b8effe5ecbb564b62b6f2
    0
    0
    17
    60
    0
    15