Modified Jackknife Ridge Estimator for Beta Regression Model With Application to Chemical Data

Abstract: The linear regression model is not applicable when the response variable takes the form of percentages, proportions, or rates, which are restricted to the interval (0, 1). In this situation, the beta regression model (BRM), which is widely used to model chemical, environmental, and biological data, applies. The model parameters are often estimated by the conventional method of maximum likelihood. However, this estimator is unreliable and inefficient when the explanatory variables are linearly correlated, a condition known as multicollinearity. We therefore developed the Jackknife beta ridge estimator and the modified Jackknife beta ridge estimator for efficient estimation of the regression coefficients under multicollinearity. The properties of the new estimators were derived, and their performance was compared with that of existing estimators theoretically using the mean squared error criterion. Furthermore, we conducted a simulation study and analyzed a chemical dataset to evaluate the new estimators' performance. The theoretical comparison, the simulation, and the real-life application all established the dominance of the proposed methods.


INTRODUCTION
The beta regression model has become common in many areas, primarily economic and medical research, for outcomes such as income shares, unemployment rates in certain nations, the Gini index for each region, graduation rates at major universities, or the percentage of body fat in medical subjects. The beta regression model, like any regression model in the context of generalized linear models (GLMs), is used to examine the effect of certain explanatory variables on a non-normal response variable. However, in beta regression the response is restricted to the interval (0, 1), covering proportions, percentages, and fractions.
Multicollinearity is a well-known issue in econometric modelling, introduced by Frisch [14]. It indicates that there is a strong association between the explanatory variables. It is well established that the covariance matrix of the maximum likelihood (ML) estimator is ill-conditioned in the case of severe multicollinearity. One of the negative consequences of this issue is that the variances of the regression coefficients become inflated; therefore, the significance and the magnitude of the coefficients are affected. Conventional approaches used to address this issue include gathering additional data, re-specifying the model, or removing the correlated variable(s). In recent years, shrinkage methods have become a widely recognized and more effective methodology for solving this issue across regression models. To this end, Hoerl and Kennard [17,18] proposed the ridge estimator. The idea behind the ridge estimator is to add a small positive constant (k) to the diagonal entries of the covariance matrix in order to improve the conditioning of this matrix, reduce the MSE, and obtain stable coefficients. For a review of this method in both linear models and GLMs, see, e.g., Kibria and Lukman [20], Algamal [4], Abonazel [1], Rady et al. [29], Abonazel and Farghali [2], Farghali et al. [12], and Lukman et al. [23].
One of the drawbacks of the ridge estimator is that the estimated parameters are nonlinear functions of the ridge parameter, and the selected value of k might not be large enough to overcome multicollinearity.

BETA RIDGE REGRESSION
The beta regression model was first introduced by Ferrari and Cribari-Neto [13], who related the mean of the response variable to a set of linear predictors through a link function. The model includes a precision parameter whose reciprocal serves as a dispersion measure. Suppose that y is a continuous random variable that follows a beta distribution; its probability density function has the form

f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1} (1-y)^{(1-\mu)\phi-1}, \qquad (1)

for 0 < y < 1, 0 < \mu < 1, and \phi > 0, where \Gamma(\cdot) is the gamma function and \phi is the precision parameter, which can be written as in [6]: \phi = \mu(1-\mu)/\sigma^{2} - 1. The mean and variance of the beta probability distribution are E(y) = \mu and var(y) = \mu(1-\mu)/(1+\phi). The model allows \mu_i to depend on covariates as follows:

g(\mu_i) = x_i^{T}\beta = \eta_i, \qquad i = 1, \dots, n, \qquad (2)

where g(\cdot) is a monotonic, differentiable link function used to relate the systematic component to the random component, \beta = (\beta_1, \dots, \beta_p)^{T} is a p \times 1 vector of unknown parameters, x_i = (x_{i1}, \dots, x_{ip})^{T} is the vector of regressors, and \eta_i is the linear predictor.
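As a quick numerical check of this mean-precision parameterization, the sketch below (Python with SciPy, purely illustrative; the shape parameters a = μφ and b = (1−μ)φ are the ones implied by the density above) verifies the stated mean and variance:

```python
# Illustrative check of the mean-precision parameterization of the beta
# distribution: a = mu*phi, b = (1-mu)*phi.
from scipy.stats import beta

mu, phi = 0.3, 12.0                 # mean and precision (arbitrary values)
a, b = mu * phi, (1 - mu) * phi     # shape parameters of the beta density

mean = beta.mean(a, b)              # equals mu
var = beta.var(a, b)                # equals mu*(1-mu)/(1+phi)
print(mean, var, mu * (1 - mu) / (1 + phi))
```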
Estimation of the beta regression parameters is done using the ML method [10]. The log-likelihood function of the beta regression model is given by

\ell(\beta, \phi) = \sum_{i=1}^{n}\left[\log\Gamma(\phi) - \log\Gamma(\mu_i\phi) - \log\Gamma((1-\mu_i)\phi) + (\mu_i\phi - 1)\log y_i + \{(1-\mu_i)\phi - 1\}\log(1 - y_i)\right]. \qquad (3)

Differentiating the log-likelihood in Eq. (3) with respect to \beta gives the score function for \beta,

U_\beta(\beta, \phi) = \phi X^{T} A (y^{*} - \mu^{*}),

where y_i^{*} = \log\{y_i/(1-y_i)\}, \mu_i^{*} = \psi(\mu_i\phi) - \psi((1-\mu_i)\phi), A = \mathrm{diag}\{1/g'(\mu_1), \dots, 1/g'(\mu_n)\}, and \psi(\cdot) denotes the digamma function. The iteratively reweighted least-squares (IRLS) algorithm, or Fisher scoring algorithm, is used for estimating \beta [8,9]. The form of this algorithm can be written as

\beta^{(m+1)} = \beta^{(m)} + (X^{T} W^{(m)} X)^{-1} X^{T} A^{(m)} (y^{*} - \mu^{*(m)}). \qquad (4)

The initial value of \beta can be obtained by least-squares estimation, while the initial value of the precision parameter is

\phi^{(0)} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat\mu_i(1-\hat\mu_i)}{\hat\sigma_i^{2}} - 1, \qquad (5)

where the \hat\mu_i and \hat\sigma_i^{2} values are obtained from the linear regression. Given m = 0, 1, 2, \dots, the number of iterations performed, convergence occurs when the difference between successive estimates becomes smaller than a given small constant. At the final step, the ML estimator of \beta is obtained as

\hat\beta_{ML} = (X^{T}\hat W X)^{-1} X^{T}\hat W \hat z, \qquad (6)

where X is an n \times p matrix of regressors, \hat z = \hat\eta + \hat W^{-1}\hat A(y^{*} - \hat\mu^{*}), and \hat W = \mathrm{diag}(\hat w_1, \dots, \hat w_n) with w_i = \phi\{\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\}/\{g'(\mu_i)\}^{2}; here, \hat W and \hat A are the matrices W and A, respectively, evaluated at the ML estimator. The ML estimator of \beta is asymptotically normally distributed with mean E(\hat\beta_{ML}) = \beta and asymptotic covariance matrix

Cov(\hat\beta_{ML}) = \frac{1}{\phi}(X^{T}\hat W X)^{-1}. \qquad (7)
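The ML fit described above can be sketched numerically. The following Python snippet is illustrative only: the paper's computations use the R betareg package, and the general-purpose optimizer here stands in for Fisher scoring, which reaches the same maximum. It simulates logit-link beta regression data and maximizes the log-likelihood of Eq. (3) directly:

```python
# A minimal sketch of ML estimation for a logit-link beta regression by
# direct maximization of the log-likelihood (optimizer in place of IRLS).
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(0)
n, p = 400, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true, phi_true = np.array([0.2, 0.8, -0.5]), 20.0
mu = expit(X @ beta_true)
y = rng.beta(mu * phi_true, (1 - mu) * phi_true)

def negloglik(theta):
    b, phi = theta[:-1], np.exp(theta[-1])   # log-parameterize phi > 0
    m = expit(X @ b)
    a1, a2 = m * phi, (1 - m) * phi
    # negative of Eq. (3)
    return -np.sum(gammaln(phi) - gammaln(a1) - gammaln(a2)
                   + (a1 - 1) * np.log(y) + (a2 - 1) * np.log1p(-y))

res = minimize(negloglik, np.zeros(p + 2), method="BFGS")
beta_hat, phi_hat = res.x[:-1], np.exp(res.x[-1])
print(beta_hat, phi_hat)   # close to beta_true and phi_true
```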
Hence, the asymptotic mean squared error (MSE) [11] of the ML estimator, based on the asymptotic covariance matrix, is

MSE(\hat\beta_{ML}) = \frac{1}{\phi}\sum_{j=1}^{p}\frac{1}{\lambda_j}, \qquad (8)

where \lambda_j is the j-th eigenvalue of the X^{T}\hat W X matrix.
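Since the scalar MSE is the trace of the asymptotic covariance matrix, it equals 1/φ times the sum of the reciprocal eigenvalues of X^T Ŵ X. A small numerical check (Python, with arbitrary illustrative weights):

```python
# Check that trace((X'WX)^{-1}) equals the sum of reciprocal eigenvalues,
# which is what the scalar MSE formula above uses.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
W = np.diag(rng.uniform(0.5, 2.0, size=50))   # illustrative positive weights
S = X.T @ W @ X
lam = np.linalg.eigvalsh(S)
lhs = np.trace(np.linalg.inv(S))
rhs = np.sum(1.0 / lam)
print(lhs, rhs)   # the two values agree
```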

Jackknife Beta Ridge Regression
In the context of the linear regression model, the Jackknife procedure was proposed by Singh et al. [31] to alleviate the problem of bias in the generalized ridge estimator. The theory and applications of the jackknife estimator have been studied by several authors [24,15,7,32,33,19,34,35]. We now introduce the Jackknifed ridge estimator for the beta regression model. Let X^{T}\hat W X = Q\Lambda Q^{T}, where \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p) contains the eigenvalues of X^{T}\hat W X and Q is the orthogonal matrix of the corresponding eigenvectors. Based on the \Lambda and Q matrices, the beta regression estimator of Eq. (6) can be rewritten in canonical form as

\hat\alpha_{ML} = \Lambda^{-1} M^{T}\hat W \hat z,

where M = XQ and \alpha = Q^{T}\beta. As a result, the BRR estimator of Eq. (9) is rewritten as

\hat\alpha_{BRR} = (\Lambda + kI)^{-1} M^{T}\hat W \hat z = (\Lambda + kI)^{-1}\Lambda\,\hat\alpha_{ML}. \qquad (12)

Following the idea of the Jackknife approach [16], let \hat\alpha_{BRR(-i)} denote the BRR estimator computed with the i-th observation deleted. Eq. (13) then expresses \hat\alpha_{BRR(-i)} in terms of \hat\alpha_{BRR}, the i-th row m_i^{T} of M, and the matrix (M^{T}\hat W M + kI)^{-1} = (\Lambda + kI)^{-1}. Using the weighted pseudo-values [16], which are calculated from \hat\alpha_{BRR} and \hat\alpha_{BRR(-i)}, and averaging them over the observations, the Jackknifed beta ridge regression estimator (JBRR) is defined as

\hat\alpha_{JBRR} = \left[I - k^{2}(\Lambda + kI)^{-2}\right]\hat\alpha_{ML}. \qquad (14)
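In canonical coordinates, each component of the ML estimate is multiplied by λ_j/(λ_j + k) under the ridge estimator and by 1 − k²/(λ_j + k)² under its jackknifed version; because the second factor is closer to one, jackknifing reduces the bias. A minimal sketch with illustrative eigenvalues and canonical coefficients:

```python
# Component-wise shrinkage factors of the ridge (BRR) and jackknifed ridge
# (JBRR) estimators in canonical form, and the resulting bias terms.
import numpy as np

lam = np.array([10.0, 2.0, 0.5, 0.05])   # illustrative eigenvalues
k = 0.1                                  # illustrative ridge parameter
f_brr = lam / (lam + k)                  # BRR factor: lambda/(lambda+k)
f_jbrr = 1.0 - k**2 / (lam + k) ** 2     # JBRR factor: 1 - k^2/(lambda+k)^2

alpha = np.array([1.0, -2.0, 0.5, 1.5])  # hypothetical canonical coefficients
bias_brr = (f_brr - 1) * alpha
bias_jbrr = (f_jbrr - 1) * alpha
print(np.abs(bias_jbrr) < np.abs(bias_brr))   # jackknifing reduces the bias
```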

Modified Jackknife Beta Ridge Regression
In the linear regression model, the modified Jackknife ridge estimator was proposed by Batah et al. [9] by combining the ideas of the generalized ridge estimator and of the Jackknifed ridge estimator given by Singh et al. [31]. In a similar way, we propose a new estimator based on the JBRR. Following Algamal [4], the proposed estimator, denoted the modified Jackknifed beta ridge estimator (MJBRR), is constructed as in the case of the JBRR by replacing \hat\alpha_{ML} with \hat\alpha_{BRR}. The MJBRR estimator is defined as

\hat\alpha_{MJBRR} = \left[I - k^{2}(\Lambda + kI)^{-2}\right]\hat\alpha_{BRR} = H\,\hat\alpha_{ML}, \qquad (15)

where

H = \left[I - k^{2}(\Lambda + kI)^{-2}\right](\Lambda + kI)^{-1}\Lambda. \qquad (16)

The asymptotic bias and covariance matrix of \hat\alpha_{MJBRR} are, respectively,

Bias(\hat\alpha_{MJBRR}) = (H - I)\alpha, \qquad (17)
Cov(\hat\alpha_{MJBRR}) = \frac{1}{\phi}\, H \Lambda^{-1} H^{T}. \qquad (18)
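The three ridge-type estimators therefore differ only in their component-wise shrinkage multipliers. The sketch below (illustrative values) compares them: the MJBRR multiplier is the product of the BRR and jackknife factors, so it shrinks hardest, trading extra shrinkage (smaller variance) against the bias correction of the jackknife step:

```python
# Shrinkage multipliers of BRR, JBRR, and the proposed MJBRR in canonical
# form; the MJBRR factor is the JBRR factor applied to the BRR factor.
import numpy as np

lam = np.array([10.0, 2.0, 0.5, 0.05])   # illustrative eigenvalues
k = 0.1
f_brr = lam / (lam + k)
f_jbrr = 1.0 - k**2 / (lam + k) ** 2
f_mjbrr = f_jbrr * f_brr                 # factor of Eq. (16) on the diagonal
print(f_mjbrr, f_brr, f_jbrr)            # ordering: f_mjbrr <= f_brr <= f_jbrr
```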

SUPERIORITY OF PROPOSED ESTIMATORS IN TERMS OF MSE
In general, to compare two different estimators, one should be concerned with their performance in terms of the MSE matrix. For two given estimators \hat\beta_1 and \hat\beta_2 of \beta, the estimator \hat\beta_2 is said to be superior to \hat\beta_1 if the difference \Delta_1 = MSE(\hat\beta_1) - MSE(\hat\beta_2) is a positive definite matrix. Writing the difference of the covariance parts as J and the bias contribution in the form F F^{T}, it is clear that J is a positive definite matrix. As a result, \Delta_1 is nonnegative definite if and only if J - F F^{T} is a nonnegative definite matrix. Thus, under the stated condition on the biasing parameter, the proposed estimator is superior, and the proof is completed.
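The matrix MSE comparison can be illustrated numerically for a given configuration. In the sketch below (all values illustrative, in canonical coordinates), the difference MSE(ML) − MSE(BRR) is positive definite for a sufficiently small k, matching the classical ridge-superiority result:

```python
# Numerical illustration of the MSE-matrix comparison: for a small enough
# ridge parameter k, MSE(ML) - MSE(BRR) is positive definite.
import numpy as np

lam = np.diag([10.0, 2.0, 0.5, 0.05])    # illustrative eigenvalue matrix
alpha = np.array([1.0, -2.0, 0.5, 1.5])  # hypothetical canonical coefficients
phi, k = 20.0, 0.01

inv = np.linalg.inv
A = inv(lam + k * np.eye(4))
mse_ml = inv(lam) / phi                  # covariance of the ML estimator
mse_brr = A @ lam @ A / phi + (k**2) * np.outer(A @ alpha, A @ alpha)

diff_eigs = np.linalg.eigvalsh(mse_ml - mse_brr)
print(diff_eigs)   # all positive for this small k
```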

SIMULATION STUDY
In this section, the estimators' performance is investigated through a simulation study. Correlated explanatory variables are generated in line with the work of Kibria and Lukman [20], with the degree of correlation \rho chosen as in [21,12]. The sample sizes n are 30, 50, and 100, while the number of explanatory variables p is taken to be 4 and 8. The simulation study is conducted in the R programming language with the help of the betareg package [28]. Following Qasim et al. [27], the mean squared error (MSE) and the median squared error (MdSE) were employed to evaluate the estimators' performance:

MSE(\hat\beta) = \frac{1}{2000}\sum_{j=1}^{2000}(\hat\beta_j^{*} - \beta)^{T}(\hat\beta_j^{*} - \beta), \qquad (24)

where j indexes the replications, whose number is set to 2000, and \hat\beta_j^{*} is the estimated vector of \beta in the j-th replication.
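The regressor-generation scheme commonly used with the Kibria-Lukman design (an assumption on our part, since the display above is truncated) is x_ij = √(1−ρ²) z_ij + ρ z_{i,p+1} with independent standard normal z's, which induces a pairwise correlation of ρ² between any two regressors. A sketch:

```python
# Generate correlated regressors via the common scheme
# x_ij = sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1}, and confirm that the
# induced pairwise correlation is approximately rho^2.
import numpy as np

rng = np.random.default_rng(2)
n, p, rho = 100_000, 4, 0.95
z = rng.normal(size=(n, p + 1))
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]

corr = np.corrcoef(X, rowvar=False)
print(corr[0, 1])   # close to rho**2 = 0.9025
```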
The initial values for the parameter vectors \beta and \phi are estimated following Ferrari and Cribari-Neto [13], Abonazel and Taha [3], and Qasim et al. [27], as implemented in the betareg package.
The biasing parameter for the beta ridge and Jackknife beta ridge estimators was determined using the biasing parameters k1 and k2. The results are reported in Tables 1 and 2. They show that the Jackknife beta ridge regression with biasing parameter k2 has the smallest mean and median squared error for all the considered sample sizes. The maximum likelihood estimator performs worst due to the presence of multicollinearity, while the ridge regression estimator outperforms it. It is also evident from the results that the median squared error exhibits smaller values than the mean squared error. The results of this study agree with the conclusions reached in the context of the linear regression model: the jackknife version of the ridge estimator produces an estimator with smaller MSE and MdSE. As the values of \rho and p increase, the MSE and MdSE of all the estimators also increase, but they decrease as the sample size increases from 30 to 100. In general, the proposed estimator performs better than the other estimators in all simulated cases, and the Jackknife beta ridge estimator outperforms the ML and BRR estimators under all specifications. Overall, the improvement of the proposed estimator over the beta ridge regression is substantial.

GASOLINE YIELD DATA
This chemical dataset was originally obtained by Prater [26] and later used by Ospina et al. [25] and Karlsson et al. [21]. The dataset contains 32 observations on the response and the independent variables. The variables are described as follows: the dependent variable y is the proportion of crude oil converted to gasoline after distillation and fractionation, while the explanatory variables are crude oil gravity (gravity), vapor pressure of the crude oil (pressure), the temperature at which 10% of the crude oil has vaporized (temp10), and the temperature at which all the gasoline in the crude oil vaporizes (temp). Atkinson [5] analyzed this dataset using the linear regression model and observed some anomalies in the distribution of the errors. Recently, Karlsson et al. [21] showed that the beta regression model is more suitable for these data. We used the condition index to diagnose whether the model is free of multicollinearity. The condition index (CI) is defined as

CI = \sqrt{\lambda_{\max}/\lambda_{\min}}. \qquad (27)

According to the literature, if the CI exceeds 1000, there is severe multicollinearity. The CI for the dataset under study is 11281.4, which signals severe multicollinearity. The beta regression estimates for each estimation technique and their corresponding mean squared errors are shown in Table 3. The results show that the ridge and Jackknife ridge regression estimates possess the same regression coefficient signs, unlike the maximum likelihood estimates. This is traceable to one of the consequences of multicollinearity on the maximum likelihood estimator, as noted by Lukman and Ayinde [22]. The proposed method has the lowest mean squared error. This agrees with the literature: the jackknife ridge estimator produces a reduced bias, which in turn results in a lower mean squared error than the beta ridge estimator. The estimator's performance is also a function of the biasing parameter.
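The condition index of Eq. (27) is straightforward to compute from the eigenvalues of the cross-product matrix. The sketch below uses artificial, nearly collinear columns (the gasoline yield regressors themselves are not reproduced here):

```python
# Compute the condition index CI = sqrt(lambda_max / lambda_min) of X'X
# for an illustrative design with nearly collinear columns.
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=(32, 1))
# three almost-identical columns plus one independent column
X = np.column_stack([z + 1e-4 * rng.normal(size=(32, 3)),
                     rng.normal(size=(32, 1))])
lam = np.linalg.eigvalsh(X.T @ X)
ci = np.sqrt(lam.max() / lam.min())
print(ci)   # a large value signals severe multicollinearity
```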
It is also evident that jackknifing the ridge estimator reduces the mean squared error of the beta ridge estimator.

CONCLUSIONS
Multicollinearity is a threat to parameter estimation and inference in the linear regression model and in generalized regression models such as the beta regression model (BRM). The regression parameters are commonly estimated using the method of maximum likelihood; however, the maximum likelihood estimator suffers setbacks when there is multicollinearity. The beta ridge estimator produces a more reliable estimate, with smaller mean squared error, than the maximum likelihood estimator. The limitation of the beta ridge estimator is its high bias, and an effective means of reducing this bias is the Jackknife procedure. In this study, we proposed the Jackknife beta ridge and modified Jackknife beta ridge estimators. We compared the estimators' performance theoretically, through a simulation study, and in a real-life application. In conclusion, the new methods generally produce more efficient estimates when there is multicollinearity in the BRM.