Regression-based predictive modelling of software size of fintech projects using technical specifications

Authors

  • Iqra Kanwal

    FAST School of Computing, National University of Computer and Emerging Sciences (NUCES), Lahore
  • Ali Afzal Malik

    FAST School of Computing, National University of Computer and Emerging Sciences (NUCES), Lahore

DOI:

https://doi.org/10.22581/muet1982.3289

Keywords:

K-fold cross-validation, Lines of code, Multiple linear regression, Size prediction model, Software size prediction, Technical specifications

Abstract

This research develops a predictive model that estimates the lines of code (LOC) of software projects from their technical requirements specifications. It addresses the recurring problem of inaccurate effort and cost estimation in software development, which often results in budget overruns and delays. The study analyses a dataset of past real-life software projects and focuses on extracting relevant predictors from project requirements written in technical and easily comprehensible natural language. A pilot study is conducted first to assess feasibility. Simple Linear Regression (SLR) is then employed to determine the relative predictive strength of eight potential predictors identified earlier. The number of API calls is found to be the strongest independent predictor of LOC (R² = 0.670). The next phase constructs a software size prediction model using Forward Stepwise Multiple Linear Regression (FSMLR). The adjusted R² of the final model indicates that two factors, the number of API calls and the number of GUI fields, account for more than 80% of the variation in code size (measured in LOC). The model is validated using k-fold cross-validation, and the results are promising. The average mean magnitude of relative error (MMRE) across all folds is 0.203, indicating that, on average, the model's predictions are off by approximately 20% relative to the actual values. The average PRED(25) is 0.708, implying that nearly 71% of predicted size values are within 25% of the actual size values. This model can help project managers make better decisions regarding project management, budgeting, and scheduling.
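The two accuracy measures reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tooling or data. The function names (`mmre`, `pred`, `kfold_evaluate`), the default of five folds, and the use of NumPy's plain least-squares fit are all assumptions made for illustration. In particular, the sketch fits ordinary multiple linear regression on a fixed set of predictors and does not perform the forward stepwise selection described above.

```python
import numpy as np


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean(|actual - predicted| / actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))


def pred(actual, predicted, level=25):
    """Fraction of predictions whose relative error is at most `level` percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mre = np.abs(actual - predicted) / actual
    return float(np.mean(mre <= level / 100.0))


def kfold_evaluate(X, y, k=5, seed=0):
    """Hypothetical helper: average MMRE and PRED(25) over k held-out folds.

    X holds one row per project and one column per predictor (for example,
    API calls and GUI fields); y holds the actual LOC of each project.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shuffle the project indices once, then split them into k folds.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Ordinary least squares with an intercept column, fitted on k-1 folds.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Predict the held-out fold and score it.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        y_hat = A_test @ coef
        scores.append((mmre(y[test_idx], y_hat), pred(y[test_idx], y_hat)))
    avg_mmre, avg_pred = np.mean(scores, axis=0)
    return float(avg_mmre), float(avg_pred)
```

Under these definitions, an average MMRE of 0.203 corresponds to a mean relative error of about 20%, and an average PRED(25) of 0.708 means that about 71% of held-out predictions have a relative error of at most 25%.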


Published

2025-04-09

How to Cite

Regression-based predictive modelling of software size of fintech projects using technical specifications. (2025). Mehran University Research Journal of Engineering and Technology, 44(2), 164-173. https://doi.org/10.22581/muet1982.3289