A MULTIPLE NON-LINEAR REGRESSION MODEL TO ESTIMATE THE AGILE TESTING EFFORTS FOR SMALL WEB PROJECTS

Context. Software testing effort estimation is one of the important problems in software development and the software testing life cycle. The object of the study is the process of estimating the agile testing efforts for small Web projects. The subject of the study is the multiple regression models for estimating the agile testing efforts for small Web projects. Objective. The goal of the work is to create a multiple non-linear regression model for estimating the agile testing efforts for small Web projects on the basis of the Johnson multivariate normalizing transformation. Method. The model, confidence and prediction intervals of multiple non-linear regression for estimating the agile testing efforts for small Web projects are constructed on the basis of the Johnson multivariate normalizing transformation for non-Gaussian data. The construction uses techniques of multiple non-linear regression analysis in which multivariate normalizing transformations are applied to build the models, equations, confidence and prediction intervals. These techniques make it possible to take into account the correlation between random variables when normalizing multivariate non-Gaussian data. In general, this leads to a reduction of the mean magnitude of relative error and of the widths of the confidence and prediction intervals in comparison with the linear models and the non-linear models constructed using univariate normalizing transformations. Results. The constructed model has been compared with the linear model and with the non-linear regression models based on the decimal logarithm and the Johnson univariate transformation. Conclusions. The multiple non-linear regression model to estimate the agile testing efforts for small Web projects is constructed for the first time on the basis of the Johnson multivariate transformation for the $S_B$ family. This model, in comparison with other regression models (both linear and non-linear), has a smaller value of the mean magnitude of relative error and smaller widths of the confidence and prediction intervals. The prospects for further research may include the application of other multivariate normalizing transformations and data sets to construct the multiple non-linear regression model for estimating the agile testing efforts for small Web projects.

ABBREVIATIONS
LB is lower bound; MD is Mahalanobis distance; MRE is a magnitude of relative error; MMRE is a mean magnitude of relative error; PRED is percentage of prediction; UB is upper bound.

k is a number of independent variables (regressors); N is a number of data points; ψ is a vector of multivariate normalizing transformation.

INTRODUCTION
Software testing effort estimation is one of the important problems in software development and the software testing life cycle. Agile testing is a software testing process that follows the principles of agile software development [1][2][3][4][5]. In comparison with waterfall testing, it is a newer software testing approach that leads to a reduction of testing efforts. Agile testing is well suited for small software projects, including small Web projects.
The agile testing life cycle consists of five phases [5], the second of which is agile testing planning, which includes testing effort estimation. Testing effort estimation is a difficult problem, and various mathematical models are applied to solve it.
Today one of the most well-known effort estimation models is COCOMO II (COnstructive COst MOdel) [6]. COCOMO II is a non-linear regression equation with parameters derived from historical data of software projects. This equation is built on the basis of a univariate normalizing transformation in the form of the decimal logarithm. The paper [7] proposed multiple linear and non-linear regression equations for estimating the testing efforts of software projects, including large ones. However, the prediction result of a regression equation is the mean value of the dependent random variable, because a regression equation has no random error term. The prediction result of a regression model is a value of the dependent random variable, because a regression model contains the random error term. Therefore, to predict agile testing effort as a value of a dependent random variable, appropriate non-linear regression models need to be developed.
The object of study is the process of estimating the agile testing efforts for small Web projects.
The subject of study is the multiple non-linear regression models to estimate the agile testing efforts for small Web projects.
The purpose of the work is to construct the multiple non-linear regression model for estimating the agile testing efforts for small Web projects. The agile testing effort prediction results of the constructed model should be better than those of other regression models, both linear and non-linear, primarily in terms of such standard evaluations as the mean magnitude of relative error and the widths of the confidence and prediction intervals.

PROBLEM STATEMENT
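Suppose that there is a bijective multivariate normalizing transformation of the non-Gaussian random vector $P = \{Y, X_1, X_2, \ldots, X_k\}^T$ to the Gaussian random vector $T = \{Z_Y, Z_1, Z_2, \ldots, Z_k\}^T$

$$T = \psi(P) \tag{1}$$

and the inverse transformation for (1)

$$P = \psi^{-1}(T). \tag{2}$$

It is required to build the multiple non-linear regression model in the form $Y = Y(X_1, X_2, \ldots, X_k, \varepsilon)$ on the basis of the transformations (1) and (2).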

REVIEW OF THE LITERATURE
A normalizing transformation is often a good way to build the models, equations, confidence and prediction intervals of multiple non-linear regressions [8][9][10][11][12][13]. According to [9], transformations are made for essentially four purposes, two of which are: firstly, to obtain approximate normality for the distribution of the error term (residuals) or the dependent random variable; secondly, to transform the response and/or the predictor in such a way that the linear relationship between the new (normalized) variables is stronger than the linear relationship between the dependent and independent random variables.
Well-known techniques for building the models, equations, confidence and prediction intervals of multiple non-linear regressions are based on univariate normalizing transformations (such as the decimal logarithm, the natural logarithm and the Box-Cox transformation), which do not take into account the correlation between random variables when normalizing multivariate non-Gaussian data. Applying univariate normalizing transformations to build multiple non-linear regression models does not always lead to good prediction results, primarily in terms of such standard evaluations as the mean magnitude of relative error and the widths of the confidence and prediction intervals [13]. This leads to the need to use multivariate normalizing transformations.
In [13] the techniques to build the models, confidence and prediction intervals of multiple non-linear regressions for multivariate non-Gaussian data on the basis of the bijective multivariate normalizing transformations were proposed. The techniques consist of three steps. In the first step, a set of multivariate non-Gaussian data is normalized using a bijective multivariate normalizing transformation. In the second step, the model, confidence and prediction intervals of linear regression for the normalized data are built. In the third step, the model, confidence and prediction intervals of multiple non-linear regression for multivariate non-Gaussian data are constructed on the basis of the model, confidence and prediction intervals of linear regression for the normalized data and the multivariate normalizing transformation.
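The three steps can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the technique of [13] itself: it uses a single regressor, it takes the univariate decimal logarithm as a stand-in for a bijective multivariate normalizing transformation, and the data and function names (`fit_line`, `build_nonlinear_equation`) are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with one regressor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def build_nonlinear_equation(x_data, y_data):
    # Step 1: normalize the data (stand-in transformation: decimal logarithm).
    zx = [math.log10(x) for x in x_data]
    zy = [math.log10(y) for y in y_data]
    # Step 2: build the linear regression for the normalized data.
    b0, b1 = fit_line(zx, zy)

    # Step 3: map the linear prediction back with the inverse transformation.
    def predict(x):
        return 10 ** (b0 + b1 * math.log10(x))

    return b0, b1, predict


# Hypothetical data generated exactly from y = 2 * x ** 1.5.
x_data = [1.0, 4.0, 9.0, 16.0]
y_data = [2.0 * x ** 1.5 for x in x_data]
b0, b1, predict = build_nonlinear_equation(x_data, y_data)
```

With a multivariate transformation, step 1 would normalize all variables jointly instead of one at a time; the structure of steps 2 and 3 stays the same.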
The non-linear regression prediction results of the models constructed in the papers [13,14] on the basis of the Johnson multivariate normalizing transformation are better than those of other regression models, both linear and non-linear, primarily in terms of such standard evaluations as the mean magnitude of relative error and the widths of the confidence and prediction intervals.
This leads to the need to develop the multiple non-linear regression model for estimating the agile testing efforts for small Web projects on the basis of the multivariate normalizing transformations.

MATERIALS AND METHODS
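After normalizing the non-Gaussian data by the transformation (1), the linear regression model is built for the normalized data. The linear regression model for normalized data will have the form [13]

$$Z_Y = \hat{Z}_Y + \varepsilon = \hat{b}_0 + \hat{b}_1 Z_1 + \ldots + \hat{b}_k Z_k + \varepsilon, \tag{3}$$

where $\hat{Z}_Y$ is the prediction result of the linear regression equation for the normalized data and $\varepsilon$ is the Gaussian random error term.

After that, the multiple non-linear regression model is built on the basis of the linear regression model (3) for the normalized data and the transformations (1) and (2). The non-linear regression model will have the form [13]

$$Y = \psi_Y^{-1}\left[\hat{Z}_Y + \varepsilon\right], \tag{4}$$

where $\psi_Y^{-1}$ is the component of the inverse transformation (2) that corresponds to $Y$.

The technique to build a confidence interval of multiple non-linear regression is based on a confidence interval of linear regression for normalized data and the transformations (1) and (2) [13]: the bounds of the interval built for the normalized data are mapped back by the inverse transformation (2). The technique to build a prediction interval of multiple non-linear regression is based, in the same way, on a prediction interval of linear regression for normalized data and the transformations (1) and (2) [13].

For normalizing the multivariate non-Gaussian data, we use the Johnson translation system. In our case the Johnson normalizing translation is given by [14]

$$Z = \gamma + \eta\, h\left[\lambda^{-1}(X - \varphi)\right] \sim N_m(0_m, \Sigma), \tag{5}$$

where $\gamma$, $\eta$, $\varphi$ and $\lambda$ are the parameters of the Johnson normalizing translation, $\Sigma$ is the covariance matrix, $m = k + 1$, and $h(\cdot)$ is one of the translation functions

$$h(y) = \begin{cases} \ln(y), & \text{for the } S_L \text{ family}; \\ \ln[y/(1 - y)], & \text{for the } S_B \text{ family}; \\ \operatorname{Arsh}(y), & \text{for the } S_U \text{ family}; \\ y, & \text{for the } S_N \text{ family}. \end{cases} \tag{6}$$

Here $y = (X - \varphi)/\lambda$. In our case $X$ equals $Y$, $X_1$, $X_2$ or $X_3$ respectively.

The model, equation, confidence and prediction intervals of multiple non-linear regression to estimate agile testing efforts for small Web projects are constructed on the basis of the Johnson multivariate normalizing transformation for the four-dimensional non-Gaussian data set from Table 1 for 40 small Web projects (rows 1-40). Table 1 also contains the values of the squared Mahalanobis distance (MD) for 41 data rows and for 40 data rows (after outlier cutoff).

To detect the outliers in the data from Table 1 we use the technique based on multivariate normalizing transformations and the squared MD [15]. There is one outlier in the data from Table 1. The data of system 41 is a multivariate outlier, since for this data row the squared MD, which equals 20.43, is greater than the quantile of the chi-square distribution with 4 degrees of freedom, which equals 14.86 for the 0.005 significance level.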
The same result was obtained for the univariate transformation in the decimal logarithm form. In this case the data of system 41 is also a multivariate outlier, since for this data row the squared MD equals 20.26.
The squared MD values for the 40 remaining data rows indicate that there are no outliers left in the data from Table 1.
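The outlier cutoff can be checked with a short Python sketch. It relies on the closed-form survival function of the chi-square distribution with 4 degrees of freedom (the data are four-dimensional), $P(\chi^2_4 > x) = e^{-x/2}(1 + x/2)$, and finds the quantile by bisection. The function names are hypothetical, and only the squared MD values quoted in the text are used.

```python
import math


def chi2_sf_4df(x):
    """P(X > x) for the chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def chi2_quantile_4df(alpha, lo=0.0, hi=100.0):
    """Value x with P(X > x) = alpha, found by bisection on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_4df(mid) > alpha:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


def is_outlier(squared_md, alpha=0.005):
    """Flag a data row whose squared Mahalanobis distance exceeds the quantile."""
    return squared_md > chi2_quantile_4df(alpha)


critical = chi2_quantile_4df(0.005)  # approximately 14.86
flag_multivariate = is_outlier(20.43)  # Johnson multivariate normalization
flag_log10 = is_outlier(20.26)  # decimal logarithm normalization
```

Both squared MD values reported for system 41 exceed the cutoff, which agrees with the decision to remove that row.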
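After normalizing the non-Gaussian data by the multivariate transformation (5) for the $S_B$ family, the linear regression model is built for the normalized data

$$Z_Y = \hat{Z}_Y + \varepsilon = \hat{b}_0 + \hat{b}_1 Z_1 + \hat{b}_2 Z_2 + \hat{b}_3 Z_3 + \varepsilon. \tag{7}$$

The parameters $\hat{b}_0$, $\hat{b}_1$, $\hat{b}_2$ and $\hat{b}_3$ of the linear regression model (7) were estimated by the least squares method. After that, the multiple non-linear regression model (4) is built

$$Y = \hat{\varphi}_Y + \hat{\lambda}_Y \left[1 + e^{-(\hat{Z}_Y + \varepsilon - \hat{\gamma}_Y)/\hat{\eta}_Y}\right]^{-1}, \tag{8}$$

where $\hat{Z}_Y$ is the prediction result of the linear regression equation (7) for $Z_j = \hat{\gamma}_j + \hat{\eta}_j \ln\frac{X_j - \hat{\varphi}_j}{\hat{\varphi}_j + \hat{\lambda}_j - X_j}$, $j = 1, 2, 3$. The model (8) is the multiple non-linear regression model to estimate the agile testing efforts for small Web projects.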
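How a model built on the Johnson $S_B$ family produces a prediction can be illustrated with the sketch below. It assumes the usual four-parameter $S_B$ form $z = \gamma + \eta \ln[(x - \varphi)/(\varphi + \lambda - x)]$ with $\varphi < x < \varphi + \lambda$ and its inverse $x = \varphi + \lambda/[1 + e^{-(z - \gamma)/\eta}]$. All parameter values and function names are hypothetical placeholders, not the estimates obtained in this work.

```python
import math


def sb_forward(x, gamma, eta, phi, lam):
    """Johnson S_B normalization of a single value x."""
    if not phi < x < phi + lam:
        raise ValueError("x must lie strictly between phi and phi + lam")
    return gamma + eta * math.log((x - phi) / (phi + lam - x))


def sb_inverse(z, gamma, eta, phi, lam):
    """Inverse S_B transformation; the result always lies in (phi, phi + lam)."""
    return phi + lam / (1.0 + math.exp(-(z - gamma) / eta))


def predict_effort(xs, b, params_x, params_y, eps=0.0):
    """Normalize regressors, apply the linear model, map back to the effort scale."""
    zs = [sb_forward(x, *p) for x, p in zip(xs, params_x)]
    z_y = b[0] + sum(bj * zj for bj, zj in zip(b[1:], zs)) + eps
    return sb_inverse(z_y, *params_y)


# Hypothetical (gamma, eta, phi, lambda) tuples and regression coefficients.
params_x = [(0.3, 0.9, 0.0, 50.0), (0.1, 1.1, 0.0, 120.0), (-0.2, 0.8, 0.0, 30.0)]
params_y = (0.5, 1.0, 0.0, 200.0)
b = (0.0, 0.4, 0.3, 0.2)
effort = predict_effort([10.0, 40.0, 5.0], b, params_x, params_y)
```

Setting `eps=0.0` gives the regression equation; a non-zero `eps` corresponds to one realization of the regression model. Because the inverse $S_B$ transformation is bounded, any prediction stays inside $(\varphi_Y, \varphi_Y + \lambda_Y)$.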

EXPERIMENTS
To compare the model (8) with other multiple regression models, one linear regression model and two non-linear regression models are built on the basis of the 40 data rows from Table 1. The two non-linear models use univariate normalizing transformations: the decimal logarithm transformation and the Johnson transformation.
The multiple linear regression model has the form

$$Y = \hat{b}_0 + \hat{b}_1 X_1 + \hat{b}_2 X_2 + \hat{b}_3 X_3 + \varepsilon. \tag{9}$$

The multiple non-linear regression model constructed on the basis of the linear regression model (7) for the normalized data and the decimal logarithm transformation has the form

$$Y = 10^{\hat{b}_0 + \varepsilon} X_1^{\hat{b}_1} X_2^{\hat{b}_2} X_3^{\hat{b}_3}, \tag{10}$$

where the estimators for the parameters include the value 0.4500. The third model is constructed on the basis of the linear regression model (7) for the normalized data and the Johnson univariate transformation for the $S_B$ family (6). In this case the model has the form (8) with the parameter estimators obtained for the univariate transformation.

A computer program implementing the constructed models (8), (9) and (10) was developed to conduct the experiments. The program was written in the sci-language for the Scilab system. Scilab (http://www.scilab.org) is free and open source software, an alternative to commercial system modeling and simulation packages such as MATLAB and MATRIXx.

RESULTS
If the Gaussian random variable ε equals zero, the regression models (8), (9) and (10) reduce to the corresponding regression equations. The prediction results by the model (8) and the values of MRE are shown in Table 2 for two cases: the Johnson univariate and multivariate normalizing transformations. Table 2 also contains the prediction results by the linear regression model (9) for the values of the components of the vector X from Table 1.
The confidence and prediction intervals of multiple non-linear regression are defined for the data from Table 1. Table 2 contains the lower (LB) and upper (UB) bounds of the confidence intervals of linear and multiple non-linear regressions on the basis of univariate and multivariate transformations respectively for the 0.05 significance level. The widths of the confidence interval of multiple non-linear regression on the basis of the Johnson multivariate transformation are smaller than for the linear regression (9) for 34 rows of data: 1-25, 27-35. These widths are also smaller, for the majority of data rows, than the widths for the multiple non-linear regressions based on the univariate transformations, both the decimal logarithm and the Johnson one. They are smaller than the widths for the decimal logarithm univariate transformation for 37 rows of data: 1-31, 33, 36-40, and smaller than the widths for the Johnson univariate transformation for 34 rows of data: 2, 3, 5, 6, 8-11, 13-37 and 39.
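The accuracy measures used in this comparison can be computed as in the sketch below. MRE is taken as $|Y - \hat{Y}|/Y$, MMRE as its mean over the data rows, and PRED(l) as the share of rows with MRE not exceeding the level l. The level 0.25 is the conventional choice and is an assumption here, and the effort values are hypothetical, not taken from Table 1 or Table 2.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one data row."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all data rows."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=0.25):
    """Share of data rows whose MRE does not exceed the given level."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


# Hypothetical actual and predicted testing efforts (hours).
actual_effort = [100.0, 200.0, 50.0, 80.0]
predicted_effort = [90.0, 260.0, 55.0, 80.0]

mmre_value = mmre(actual_effort, predicted_effort)  # row MREs: 0.1, 0.3, 0.1, 0.0
pred_value = pred(actual_effort, predicted_effort)  # three of four rows within 0.25
```

A smaller MMRE and a larger PRED indicate better prediction accuracy.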
Approximately the same results are obtained for the prediction intervals of regressions. Table 3 contains the lower (LB) and upper (UB) bounds of the prediction intervals of multiple linear and non-linear regressions on the basis of univariate and multivariate transformations respectively for the 0.05 significance level. Note that the lower bounds of the prediction interval of the linear regression (9) are negative for four rows of data: 1, 4, 5 and 7. All the lower bounds of the prediction intervals of the multiple non-linear regressions are positive. The widths of the prediction interval of multiple non-linear regression on the basis of the Johnson multivariate transformation are smaller than for the linear regression (9) for 35 rows of data: 1-35. These widths are also smaller, for the majority of data rows, than the widths for the multiple non-linear regressions based on the univariate transformations, both the decimal logarithm and the Johnson one. They are smaller than the widths for the decimal logarithm univariate transformation for 38 rows of data: 1-31, 33, 35-40, and smaller than the widths for the Johnson univariate transformation for 26 rows of data: 1, 2, 4-19, 21, 23-26, 28-30.
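For orientation, the interval bounds compared above can be written out in the standard form for linear regression with centered regressors, mapped back to the original scale. This is a sketch under assumptions, not a quotation from [13]: $\psi_Y^{-1}$ denotes the inverse normalizing transformation for $Y$, $\hat{Z}_Y$ the linear prediction for the normalized data, $z_X^+$ the vector of centered normalized regressors for the row in question, $S_Z$ the matrix of sums of squares and cross-products of the centered normalized regressors, $S_{Z_Y}^2$ the residual variance estimate, and $\nu = N - k - 1$ the degrees of freedom (36 for $N = 40$, $k = 3$).

```latex
\begin{align}
\hat{Y}_{CI} &= \psi_Y^{-1}\!\left( \hat{Z}_Y \pm t_{\alpha/2,\nu}\, S_{Z_Y}
  \left[ \frac{1}{N} + (z_X^+)^T S_Z^{-1} z_X^+ \right]^{1/2} \right), \\
\hat{Y}_{PI} &= \psi_Y^{-1}\!\left( \hat{Z}_Y \pm t_{\alpha/2,\nu}\, S_{Z_Y}
  \left[ 1 + \frac{1}{N} + (z_X^+)^T S_Z^{-1} z_X^+ \right]^{1/2} \right), \\
S_{Z_Y}^2 &= \frac{1}{\nu} \sum_{i=1}^{N} \left( Z_{Y_i} - \hat{Z}_{Y_i} \right)^2 .
\end{align}
```

The extra term 1 under the square root makes every prediction interval wider than the corresponding confidence interval. Because the bounds are passed through $\psi_Y^{-1}$, they cannot leave the range of that transformation: for the decimal logarithm the lower bound is always positive, and for the $S_B$ family it stays above $\varphi_Y$. The linear regression has no such restriction, which is consistent with the negative lower bounds noted for it.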
The null hypothesis that the observed frequency distribution of residuals for the linear regression models (7) and (9) is the same as the normal distribution was tested by Pearson's chi-squared test. The null hypothesis can be accepted only for the linear regression model (7) built for the data normalized by the Johnson multivariate transformation, since the chi-squared test statistic value, which equals 5.59, is smaller than the critical value of the chi-square distribution, which equals 7.81 for 3 degrees of freedom and the 0.05 significance level. The chi-squared test statistic values equal 60.61, 12.41 and 17.34 respectively for the model (9) and for the model (7) built for the data normalized by the univariate transformations.

The estimators of multivariate skewness and kurtosis are given by [16]

$$\hat{\beta}_1 = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left\{ (Z_i - \bar{Z})^T S_N^{-1} (Z_j - \bar{Z}) \right\}^3, \tag{11}$$

$$\hat{\beta}_2 = \frac{1}{N} \sum_{i=1}^{N} \left\{ (Z_i - \bar{Z})^T S_N^{-1} (Z_i - \bar{Z}) \right\}^2, \tag{12}$$

where $\bar{Z}$ is the sample mean vector and $S_N$ is the sample covariance matrix. In our case, in the formulas (11) and (12), the vectors $Z$ and $\bar{Z}$ should be replaced by the vectors $P$ and $\bar{P}$ or $T$ and $\bar{T}$, respectively, for the initial (non-Gaussian) or normalized data. It is known that $\beta_1 = 0$ and $\beta_2 = m(m + 2)$ hold under multivariate normality. The estimators were computed for the data from Table 1 and for the normalized data on the basis of the decimal logarithm transformation, the Johnson univariate and multivariate transformations respectively. The values of these estimators indicate that the necessary condition for multivariate normality is practically satisfied for the normalized data on the basis of the decimal logarithm transformation and the Johnson multivariate transformation, and that it does not hold for the other data.
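The multivariate skewness and kurtosis estimators can be computed as in the following sketch. It assumes Mardia's standard definitions, with the sample covariance matrix taken with divisor N, and uses a small Gauss-Jordan inversion to stay within the standard library. The function names and the toy data are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mardia(rows):
    """Return (skewness, kurtosis) estimators for a list of data vectors."""
    n, m = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - mean[j] for j in range(m)] for row in rows]
    # Sample covariance matrix with divisor N.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(m)]
           for a in range(m)]
    s_inv = invert(cov)

    def quad(u, v):
        return sum(u[a] * s_inv[a][b] * v[b] for a in range(m) for b in range(m))

    skewness = sum(quad(u, v) ** 3 for u in centered for v in centered) / n ** 2
    kurtosis = sum(quad(u, u) ** 2 for u in centered) / n
    return skewness, kurtosis


# Toy two-dimensional data, symmetric about the origin.
skewness, kurtosis = mardia([(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)])
```

For multivariate normal data the reference values are 0 for skewness and $m(m + 2)$ for kurtosis, that is 24 for the four-dimensional data considered in this work. The toy data above are only meant to exercise the formulas.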

DISCUSSION
As is evident from Table 3, the lower bounds of the prediction intervals of the linear regression (9) for estimating the agile testing efforts for small Web projects are negative for some data rows. In our opinion, the presence of negative values may be explained by two reasons. Firstly, for the initial data from Table 1, the four basic assumptions that justify the use of a linear regression model, one of which is normality of the error distribution, are not valid. Moreover, the chi-squared test statistic value for the residuals of the linear regression model (9) is more than 10 times larger than that for the residuals of the linear regression model (7) for the data normalized by the Johnson multivariate transformation. Secondly, there is reason to reject the hypothesis that the sample of normalized data comes from a multivariate normal distribution. Note that all the lower bounds of the prediction intervals of the multiple non-linear regressions are positive.
Also note that in our case, for the data from Table 1, the poor normalization of multivariate non-Gaussian data by the Johnson univariate transformation leads to an increase in the widths of the confidence and prediction intervals of multiple non-linear regression for a larger number of data rows compared to the Johnson multivariate transformation.
The widths of the confidence and prediction intervals of multiple non-linear regression on the basis of the Johnson multivariate transformation are smaller, for the majority of data rows, than those for the linear regression and for the multiple non-linear regressions based on the univariate transformations, both the decimal logarithm and the Johnson one. The MMRE value is also smaller for the model (8) with the Johnson multivariate transformation than for all other models, both linear and non-linear, based on univariate transformations. This may be explained by the best multivariate normalization and by the fact that there is no reason to reject the null hypothesis that the distribution of residuals for the linear regression model (7) is the same as the normal distribution only for the data normalized by the Johnson multivariate transformation.

CONCLUSIONS
The important problem of increasing the confidence of agile testing effort estimation for small Web projects is solved.
The scientific novelty of the obtained results is that the multiple non-linear regression model to estimate the agile testing efforts for small Web projects is constructed for the first time on the basis of the Johnson multivariate transformation for the $S_B$ family. This model, in comparison with other regression models (both linear and non-linear), has a smaller value of the mean magnitude of relative error and smaller widths of the confidence and prediction intervals of multiple non-linear regression.
The practical significance of the obtained results is that the software implementing the constructed model is developed in the sci-language for Scilab. The experimental results make it possible to recommend the constructed model for use in practice.
Prospects for further research may include the application of other multivariate normalizing transformations and data sets to construct the multiple non-linear regression model for estimating the agile testing efforts for small Web projects.