A Novel Catboost Regressor for Effort Estimation in Scrum Projects
DOI: https://doi.org/10.22399/ijcesen.2559
Keywords: Scrum, Boosting Models, Effort Estimation, Regression, CatBoost
Abstract
Software effort estimation plays an important role in Scrum project management because it allows teams to allocate resources and plan development cycles. Traditional approaches such as Planning Poker and expert judgment suffer from poor scalability, subjectivity, and inconsistency, which makes their estimates inaccurate and often leads to project overruns. This research proposes a CatBoost regressor to improve effort estimation in Scrum projects. The proposed technique addresses some of the most challenging estimation problems, such as handling categorical features and reducing prediction bias. Unlike conventional machine learning models, CatBoost handles high-dimensional data and optimizes learning from historical Scrum project data. The CatBoost model outperforms traditional regression models in terms of R², MSE, and RMSE, achieving an accuracy of 98.48%, a substantial improvement over those models. This research concludes that the proposed model enhances Scrum effort estimation, making it a robust and efficient solution for agile project management.
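The abstract attributes CatBoost's advantage to its handling of categorical features and reduced prediction bias, and reports results as R², MSE, and RMSE. The following is a minimal, dependency-free Python sketch of both ideas, not the authors' code. `ordered_target_statistics` is a simplified, single-permutation version of the ordered target statistics described in the CatBoost paper by Prokhorenkova et al.; the CatBoost library itself uses several permutations and feature combinations, and applies its encoding internally when columns are passed via `cat_features`. `regression_metrics` computes the standard metric definitions. The function names, the smoothing weight `a`, and the team/effort data are hypothetical.

```python
import math
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode one categorical column with a simplified ordered target statistic.

    Rows are visited in one random permutation. Each row is encoded using
    only the targets of same-category rows that came *before* it, so a row's
    own effort value never leaks into its encoding (the target leakage that
    CatBoost's ordered scheme is designed to avoid). `prior` is blended in
    with weight `a` so that rare categories get a stable value.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums = defaultdict(float)  # running sum of earlier targets per category
    counts = defaultdict(int)  # running count of earlier rows per category
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        encoded[i] = (sums[c] + a * prior) / (counts[c] + a)
        # Update the statistics only after row i has been encoded.
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R^2 for actual vs. predicted effort values.

    Assumes y_true is non-empty and not constant (otherwise R^2 is undefined).
    """
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mse = ss_res / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1.0 - ss_res / ss_tot}


if __name__ == "__main__":
    # Hypothetical data: team name as a categorical feature, effort in story points.
    teams = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]
    effort = [21.0, 34.0, 18.0, 55.0, 30.0, 24.0]
    prior = sum(effort) / len(effort)
    print(ordered_target_statistics(teams, effort, prior))
    # Hypothetical actual vs. predicted effort for three sprints.
    print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

For the three-sprint example the errors are -2, 2, and -3, so MSE = 17/3 ≈ 5.67, RMSE ≈ 2.38, and R² = 1 − 17/200 = 0.915. Encoding each row only from rows that precede it is what keeps the statistic unbiased: a plain per-category mean would include the row's own target and overstate how predictive the category is.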
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.