A Bayesian Software Estimating Model Using a Generalized g-Prior Approach
Abstract
Created to provide a software cost estimation model suited for a rapidly evolving environment, the COCOMO II model is the result of a 1994 research effort to update the 1981 COnstructive COst MOdel and its 1987 Ada version. Boehm et al. [3, 15] provided the initial definition and rationale for this model. The model's inputs include Source Lines of Code and/or Function Points as the sizing parameter, adjusted for both reuse and breakage; a set of 17 multiplicative effort multipliers; and a set of 5 exponential scale factors (see Appendix A). They based their initial calibration of the model on expert judgement.
Soon after the initial publication of this model, the Center for Software Engineering (CSE) began an effort to empirically validate COCOMO II [14]. By January 1997, they had a dataset consisting of 83 completed projects collected from several Commercial, Aerospace, Government and FFRDC organizations. CSE used this dataset to calibrate the COCOMO II.1997 model parameters. Because of uncertainties in the data and/or respondents' misinterpretations of the rating scales, CSE developed a pragmatic calibration procedure for combining sample estimates with expert judgement. Specifically, the COCOMO II.1997 calibration assigned a 10% weight to the regression estimates and a 90% weight to the expert-judgement estimates. This calibration procedure yielded effort predictions within 30% of the actuals 52% of the time.
CSE continued the data collection effort, and the database grew from 83 data points in 1997 to 161 data points in 1998. Using this data and a Bayesian approach that can assign differential weights to the parameters based on the precision of the data, we provide an alternative calibration of COCOMO II. Intuitively, we prefer this approach to the uniform 10% weighted-average approach described above because some of the effort multipliers and scale factors are more clearly understood than others.
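As a rough illustration of the 1997 procedure described above, the following Python sketch blends regression and expert-judgement coefficients with one uniform sample weight and computes the "within 30% of the actuals" accuracy measure. The function names, coefficient values, and effort figures are hypothetical and chosen only for illustration; they are not taken from the COCOMO II calibration data.

```python
def blend_uniform(regression, expert, sample_weight=0.10):
    """Weighted average of regression and expert coefficients.

    Every coefficient gets the same sample weight (10% by default),
    mirroring the uniform weighting described in the text.
    """
    return {
        name: sample_weight * regression[name] + (1.0 - sample_weight) * expert[name]
        for name in expert
    }


def pred(actuals, estimates, level=0.30):
    """Fraction of projects whose estimate is within `level` of the actual effort."""
    hits = sum(
        abs(est - act) / act <= level for act, est in zip(actuals, estimates)
    )
    return hits / len(actuals)


# Hypothetical coefficients for two cost drivers (illustrative values only).
regression = {"RELY": 0.50, "CPLX": 1.20}
expert = {"RELY": 1.00, "CPLX": 1.00}
blended = blend_uniform(regression, expert)

# Hypothetical actual and estimated efforts in person-months.
actuals = [100.0, 200.0, 50.0, 400.0]
estimates = [120.0, 150.0, 70.0, 390.0]
accuracy = pred(actuals, estimates)  # share of estimates within 30% of actuals
```

With a uniform weight, a coefficient that the data determine well is pulled toward the expert value just as strongly as one that the data determine poorly.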
The sample information for well-defined cost drivers receives a higher weight than that given to the less precise cost drivers. This calibration procedure yielded significantly better predictions: our version of COCOMO II gives effort predictions within 30% of the actuals 76% of the time. The reader should note that these predictions are based on out-of-sample data (projects), as described in the 'Cross Validation' section (i.e., Section 5). This paper presents a generalized g-prior approach to calibrating the COCOMO II model. The paper shows that if the weights assigned to sample estimates versus expert judgement are allowed to vary according to precision, a superior predictive model will result.
Section 1 of this paper describes the calibration approach used on the 1997 dataset, followed by four sections on approaches used on the 1998 dataset. Section 2 discusses the ordinary least squares approach on the 1998 dataset. Section 3 gives an overview of the Bayesian framework, and Section 4 gives an overview of the generalized g-prior. Section 5 then discusses the application of the g-prior approach to the 1998 dataset of COCOMO II. Section 6 summarizes the results obtained by using the Bayesian approaches discussed in the earlier sections. We conclude that the Bayesian framework is well suited for calibrating software cost models and that the generalized g-prior approach can be used to develop models with very good prediction accuracies. Although this paper gives a synopsis of the COCOMO II model structure, the reader is urged to read [3] to attain a better understanding of COCOMO II and to ascertain the differences from its predecessors.
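The idea of letting the weight vary with precision can be illustrated with a minimal sketch. It assumes a scalar normal-normal conjugate update for a single coefficient, which is a simplification and not the generalized g-prior developed in the paper (that approach works on the full coefficient vector). All variances and estimates below are hypothetical.

```python
def precision_weighted(sample_est, sample_var, prior_est, prior_var):
    """Combine a sample estimate and an expert (prior) estimate by precision.

    Precision is the inverse of variance; the more precise source
    receives the larger weight. Returns (posterior mean, posterior
    variance, weight given to the sample estimate).
    """
    sample_precision = 1.0 / sample_var
    prior_precision = 1.0 / prior_var
    total_precision = sample_precision + prior_precision
    sample_weight = sample_precision / total_precision
    mean = sample_weight * sample_est + (1.0 - sample_weight) * prior_est
    return mean, 1.0 / total_precision, sample_weight


# Hypothetical well-defined cost driver: the data are precise (small variance),
# so the sample estimate dominates.
precise_mean, precise_var, precise_w = precision_weighted(
    sample_est=1.20, sample_var=0.01, prior_est=1.00, prior_var=0.04
)

# Hypothetical poorly understood cost driver: the data are noisy (large variance),
# so the expert estimate dominates.
noisy_mean, noisy_var, noisy_w = precision_weighted(
    sample_est=1.20, sample_var=0.16, prior_est=1.00, prior_var=0.04
)
```

In this sketch the same sample estimate moves the result most of the way from the expert value when the data are precise, and only a small part of the way when they are noisy, in contrast to a fixed 10% weight applied to every coefficient.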