A NONLINEAR REGRESSION MODEL FOR EARLY LOC ESTIMATION OF OPEN-SOURCE KOTLIN-BASED APPLICATIONS
DOI:
https://doi.org/10.15588/1607-3274-2024-1-8
Keywords:
estimation, lines of code, open-source app, Kotlin, nonlinear regression model, Box-Cox transformation, class, weighted methods per class, depth of inheritance tree
Abstract
Context. Early estimation of lines of code (LOC) is important in software projects because it directly influences the prediction of development effort. This holds across programming languages, and for open-source Kotlin-based applications in particular. The object of the study is the process of early LOC estimation of open-source Kotlin-based apps. The subject of the study is nonlinear regression models for early LOC estimation of open-source Kotlin-based apps.
Objective. The goal of the work is to build a nonlinear regression model with three predictors for early LOC estimation of open-source Kotlin-based apps, based on the Box-Cox four-variate normalizing transformation, in order to increase confidence in early LOC estimation of these apps.
Method. For early LOC estimation of open-source Kotlin-based apps, the nonlinear regression model and its confidence and prediction intervals were constructed using the Box-Cox four-variate normalizing transformation and specialized techniques. These techniques rely on multiple nonlinear regression analysis with multivariate normalizing transformations and account for the dependencies between variables in non-Gaussian data. As a result, the method tends to reduce the mean magnitude of relative error (MMRE) and to narrow the confidence and prediction intervals compared to models that use univariate normalizing transformations.
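As a rough illustration of the method described above, the sketch below shows how a Box-Cox normalizing transformation turns a linear regression on normalized data into a nonlinear model for LOC. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the lambda parameters, and the coefficients in `b` are hypothetical placeholders, and the joint (four-variate) estimation of the lambda parameters is not shown.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse Box-Cox transform, mapping a normalized value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the Box-Cox transform")
    return base ** (1.0 / lam)


def predict_loc(x, lambdas_x, lambda_y, b):
    """Normalize the predictors, apply the linear model, then back-transform.

    x          predictor values (three in the setting of the abstract)
    lambdas_x  one Box-Cox parameter per predictor (hypothetical here)
    lambda_y   Box-Cox parameter for LOC (hypothetical here)
    b          linear coefficients [b0, b1, ...] fitted on normalized data
    """
    z = [box_cox(xi, li) for xi, li in zip(x, lambdas_x)]
    z_y = b[0] + sum(bi * zi for bi, zi in zip(b[1:], z))
    return inv_box_cox(z_y, lambda_y)


# Hypothetical parameters for illustration only (not values from the paper).
estimate = predict_loc(
    x=[40.0, 12.0, 2.0],
    lambdas_x=[0.2, 0.1, 0.0],
    lambda_y=0.1,
    b=[1.0, 0.8, 0.3, 0.1],
)
```

In the univariate variant each lambda would be estimated separately for its own variable, whereas the multivariate variant estimates them jointly so that dependencies between LOC and the predictors are taken into account.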
Results. The constructed model was compared with nonlinear regression models based on the decimal logarithm transformation and on the Box-Cox univariate transformation.
Conclusions. A nonlinear regression model with three predictors for early LOC estimation of open-source Kotlin-based apps has been constructed using the Box-Cox four-variate transformation. Compared to the other nonlinear regression models, this model demonstrates a larger multiple coefficient of determination, a smaller MMRE, and narrower confidence and prediction intervals. Further research may apply other data sets to construct nonlinear regression models for early LOC estimation of open-source Kotlin-based apps under other restrictions on the predictors.
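The two accuracy measures named in the conclusions can be computed as in the following minimal sketch. The function names and the sample values are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: the average of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical LOC values for illustration only.
actual_loc = [100.0, 200.0, 400.0]
predicted_loc = [110.0, 180.0, 400.0]

model_mmre = mmre(actual_loc, predicted_loc)
model_r2 = r_squared(actual_loc, predicted_loc)
```

A smaller MMRE and a larger coefficient of determination both indicate a closer fit, which is the sense in which the abstract compares the models.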
Copyright (c) 2024 S. B. Prykhodko, N. V. Prykhodko, A. V. Koltsov
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.