Estimating Software Development Efforts Using Random Forest-Based Stacked Ensemble Approach
Author(s): S. Suchendra Bharadwaj¹, Raghav Bhatia², Sachin Negi³, Manoj Kumar⁴
Affiliation: ¹,²,³,⁴ Department of Computer Engineering, Delhi Technological University, Delhi, India
Page No: 39-51
Volume issue & Publishing Year: Volume 2 Issue 5, May-2025
Journal: International Journal of Modern Engineering and Management | IJMEM
ISSN NO: 3048-8230
DOI:
Abstract:
Accurate estimation of software development effort is essential for project planning, resource allocation, and cost management, yet it remains difficult because the relationships among project attributes are multifaceted and non-linear. Conventional approaches, such as expert judgment, analogy-based estimation, and parametric models like the Constructive Cost Model (COCOMO), often suffer from subjective bias and limited adaptability, which leads to unreliable predictions. This study introduces a novel Random Forest-based stacked ensemble model to improve the precision of software effort estimation. The proposed framework integrates diverse machine learning algorithms as base learners, including Random Forest, Support Vector Machines, Gradient Boosting Machines, and Decision Trees, to exploit their complementary strengths. A Random Forest meta-learner then aggregates the predictions of these base learners, improving robustness and generalization across varied project contexts. The model was evaluated on seven benchmark datasets (Albrecht, China, Desharnais, Kemerer, Maxwell, Kitchenham, and Cocomo81), where it outperformed both traditional methods and standalone machine learning models. It achieves significantly lower Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), together with higher R² scores, indicating better predictive accuracy and explanatory power. By delivering reliable, data-driven effort estimates, the approach supports project scheduling, budgeting, and resource optimization, and offers a scalable and adaptable way to handle the complexity of modern software development projects.
Keywords:
Software Effort Estimation, Random Forest, Stacked Ensemble, Machine Learning, Project Management, Mean Absolute Error, Root Mean Square Error, R-Squared
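The architecture described in the abstract (several base learners whose predictions feed a Random Forest meta-learner, scored with MAE, RMSE, and R²) can be sketched with scikit-learn's StackingRegressor. This is a minimal illustration and not the authors' implementation. The synthetic data, the hyperparameters, the 5-fold internal cross-validation, the feature scaling in front of the SVR, and the helper names build_stacked_model and evaluate are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_stacked_model(seed=0):
    """Hypothetical helper: four base learners stacked under a Random Forest."""
    base_learners = [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
        # SVR is scale-sensitive, so it is wrapped with a scaler (assumption).
        ("svr", make_pipeline(StandardScaler(), SVR())),
        ("gbm", GradientBoostingRegressor(random_state=seed)),
        ("dt", DecisionTreeRegressor(random_state=seed)),
    ]
    meta_learner = RandomForestRegressor(n_estimators=50, random_state=seed)
    # cv=5: the meta-learner is trained on out-of-fold base predictions.
    return StackingRegressor(
        estimators=base_learners, final_estimator=meta_learner, cv=5
    )


def evaluate(y_true, y_pred):
    """Hypothetical helper: the three metrics named in the abstract."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }


# Synthetic stand-in for a project dataset (NOT one of the seven benchmarks).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = build_stacked_model()
model.fit(X_train, y_train)
scores = evaluate(y_test, model.predict(X_test))
print(scores)
```

To try this on a real dataset such as Desharnais, X and y would be replaced by the project attributes and the recorded effort. The scores printed here come from synthetic data and say nothing about the results reported in the paper.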