Chapter 9: Multicollinearity
Regression Analysis | Shalabh, IIT Kanpur

A basic assumption of the multiple linear regression model is that the rank of the matrix of observations on the explanatory variables equals the number of explanatory variables; in other words, the matrix is of full column rank. This implies that the explanatory variables are linearly independent, i.e., there is no linear relationship among them, and the explanatory variables are then said to be orthogonal. In many practical situations the explanatory variables do not remain independent, for various reasons. The situation where the explanatory variables are highly intercorrelated is referred to as multicollinearity.

Consider the multiple regression model
$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I),$
with $k$ explanatory variables $X_1, X_2, \ldots, X_k$ and the usual assumptions, including $\mathrm{rank}(X) = k$. Assume the observations on all the $X_j$'s and on $y$ are centered and scaled to unit length, so that
- $X'X$ is the $k \times k$ matrix of correlation coefficients between the explanatory variables, and
- $X'y$ is the $k \times 1$ vector of correlation coefficients between the explanatory variables and the study variable.

Let $X = [X_1, X_2, \ldots, X_k]$, where $X_j$ is the $j$th column of $X$ containing the $n$ observations on the $j$th explanatory variable. The column vectors $X_1, X_2, \ldots, X_k$ are linearly dependent if there exists a set of constants $\ell_1, \ell_2, \ldots, \ell_k$, not all zero, such that
$\sum_{j=1}^{k} \ell_j X_j = 0.$
If this holds exactly for a subset of $X_1, X_2, \ldots, X_k$, then $\mathrm{rank}(X'X) < k$ and $(X'X)^{-1}$ does not exist. If the relation holds only approximately, i.e., $\sum_j \ell_j X_j \approx 0$, then there is near or high multicollinearity: $X'X$ is nearly singular and, although its inverse exists, it is ill-conditioned.

To see the consequences, consider the case $k = 2$ with standardized variables, so that
$X'X = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad (X'X)^{-1} = \frac{1}{1-r^2}\begin{pmatrix} 1 & -r \\ -r & 1 \end{pmatrix},$
where $r$ is the correlation coefficient between $X_1$ and $X_2$. The ordinary least squares estimator (OLSE) $b = (X'X)^{-1}X'y$ then has
$\mathrm{Var}(b_1) = \mathrm{Var}(b_2) = \frac{\sigma^2}{1-r^2}, \qquad \mathrm{Cov}(b_1, b_2) = -\frac{r\sigma^2}{1-r^2}.$
As $r \to \pm 1$, $\mathrm{Var}(b_1) = \mathrm{Var}(b_2) \to \infty$. So if the variables are perfectly collinear, the variance of the OLSEs becomes unbounded, which indicates highly unreliable estimates — an inadmissible situation.

Consider the following values of $\mathrm{Var}(b_1) = \mathrm{Var}(b_2) = \sigma^2/(1-r^2)$:

    r                      0.99       0.9        0.1       0
    Var(b_1) = Var(b_2)    50.25σ²    5.26σ²     1.01σ²    σ²

The standard errors of $b_1$ and $b_2$ rise sharply as $r \to \pm 1$ and break down at $r = \pm 1$ because $X'X$ becomes singular.
- If $r$ is close to 0, multicollinearity does little harm; this is termed non-harmful multicollinearity.
- If $r$ is close to $+1$ or $-1$, multicollinearity inflates the variance terribly; this is termed harmful multicollinearity.
There is no clear-cut boundary between harmful and non-harmful multicollinearity. Generally, if $r$ is low the multicollinearity is considered non-harmful, and if $r$ is high it is regarded as harmful.

In the case of near or high multicollinearity, the following consequences are encountered.
1. The OLSE remains an unbiased estimator of $\beta$, but its sampling variance becomes very large. The OLSE is still the best linear unbiased estimator, but being best in a class whose minimum variance is itself huge is of little practical value: the estimates are imprecise.
2. Because of the large standard errors, regression coefficients may not appear significant, and consequently essential variables may be dropped. For example, to test $H_0: \beta_j = 0$ we use the $t$-ratio $t_j = b_j / \sqrt{\widehat{\mathrm{Var}}(b_j)}$. Since $\widehat{\mathrm{Var}}(b_j)$ is large, $t_j$ is small and $H_0$ is accepted more often than it should be. Thus harmful multicollinearity tends to delete important variables.
3. Because of the large standard errors, the confidence regions become large. For example, the confidence interval $\big(b_j - t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(b_j)},\; b_j + t_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(b_j)}\big)$ becomes wider when $\widehat{\mathrm{Var}}(b_j)$ is large.
4. The OLSE may be sensitive to small changes in the values of the explanatory variables. If some observations are added or dropped, the OLSE may change considerably in magnitude as well as in sign, so the OLSE loses stability and robustness. Ideally, the OLSE should not change much with the inclusion or deletion of a few observations.
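As a quick numerical check of the $\sigma^2/(1-r^2)$ behaviour described above, the following sketch (illustrative only, not part of the original notes; the simulated data and variable names are hypothetical) generates two regressors with correlation $r$, centers and scales them to unit length as assumed in the notes, and prints the diagonal of $C = (X'X)^{-1}$, which should track $1/(1-r^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

for r in [0.0, 0.1, 0.9, 0.99]:
    # two regressors with (population) correlation r
    x1 = rng.standard_normal(n)
    x2 = r * x1 + np.sqrt(1.0 - r**2) * rng.standard_normal(n)
    X = np.column_stack([x1, x2])
    # center and scale each column to unit length, as assumed in the notes
    X = X - X.mean(axis=0)
    X = X / np.sqrt((X**2).sum(axis=0))
    C = np.linalg.inv(X.T @ X)   # Var(b) = sigma^2 * C
    print(f"r = {r:>4}: diag(C) = {np.round(np.diag(C), 2)}, 1/(1-r^2) = {1/(1 - r**2):.2f}")
```

For $r = 0.99$ the diagonal entries come out near 50, matching the table above up to sampling variation in the empirical correlation.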
When the number of explanatory variables is more than two, say $k$, with variables $X_1, X_2, \ldots, X_k$, the $j$th diagonal element of $C = (X'X)^{-1}$ is
$C_{jj} = \frac{1}{1 - R_j^2},$
where $R_j^2$ is the coefficient of determination (squared multiple correlation coefficient) from the regression of $X_j$ on the remaining $(k-1)$ explanatory variables. If $X_j$ is highly correlated with any subset of the other $(k-1)$ explanatory variables, then $R_j^2$ is high and close to 1, and consequently the variance of the $j$th OLSE,
$\mathrm{Var}(b_j) = \frac{\sigma^2}{1 - R_j^2},$
becomes very high. The covariance between $b_i$ and $b_j$ will also be large in magnitude if $X_i$ and $X_j$ are involved in the linear relationship leading to multicollinearity.

The least-squares estimates $b_j$ also tend to become too large in absolute value in the presence of multicollinearity. For example, consider the squared distance between $b$ and $\beta$,
$L_1^2 = (b - \beta)'(b - \beta).$
Then
$E(L_1^2) = \sum_{j=1}^{k} E(b_j - \beta_j)^2 = \sum_{j=1}^{k} \mathrm{Var}(b_j) = \sigma^2\,\mathrm{tr}\big[(X'X)^{-1}\big].$
The trace of a matrix equals the sum of its eigenvalues. If $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the eigenvalues of $X'X$, then $1/\lambda_1, 1/\lambda_2, \ldots, 1/\lambda_k$ are the eigenvalues of $(X'X)^{-1}$ and hence
$E(L_1^2) = \sigma^2 \sum_{j=1}^{k} \frac{1}{\lambda_j}, \qquad \lambda_j > 0.$
If $X'X$ is ill-conditioned because of multicollinearity, at least one eigenvalue is small, so the expected distance between $b$ and $\beta$ may be substantial. Moreover,
$E(L_1^2) = E\big[(b-\beta)'(b-\beta)\big] = E(b'b) - 2\beta'E(b) + \beta'\beta = E(b'b) - \beta'\beta$
$\Rightarrow \; E(b'b) = \beta'\beta + \sigma^2\,\mathrm{tr}\big[(X'X)^{-1}\big],$
so $b$ is, on average, longer than $\beta$; the OLSE is too large in absolute value. Least squares thus produces poor estimates of the parameters in the presence of multicollinearity. This does not imply that the fitted model provides poor predictions as well: if the predictions are confined to the $x$-space with non-harmful multicollinearity, the predictions remain satisfactory.

Multicollinearity diagnostics
An important question is how to diagnose the presence of multicollinearity in the data on the basis of the given sample information. Several diagnostic measures are available, each based on a particular approach, and it is difficult to say which diagnostic is best or ultimate. Some of the popular and important diagnostics are described below. The detection of multicollinearity involves three aspects:
(i) determining its presence,
(ii) determining its severity, and
(iii) determining its form or location.

1. Determinant of X'X
This measure is based on the fact that $X'X$ becomes ill-conditioned in the presence of multicollinearity: the value of $|X'X|$ declines as the degree of multicollinearity increases. If $\mathrm{rank}(X'X) < k$, then $|X'X| = 0$. As $|X'X| \to 0$, the degree of multicollinearity increases, and it becomes exact or perfect at $|X'X| = 0$. Thus $|X'X|$ serves as a measure of multicollinearity, with $|X'X| = 0$ indicating perfect multicollinearity. A small illustration is sketched after the limitations below.
Limitations:
(i) It is not a bounded measure: $0 \le |X'X| < \infty$.
(ii) It is affected by the dispersion of the explanatory variables. For example, if $k = 2$ and the observations are centered but not scaled,
$|X'X| = \Big(\sum_{i=1}^{n} x_{1i}^2\Big)\Big(\sum_{i=1}^{n} x_{2i}^2\Big)\big(1 - r_{12}^2\big),$
where $r_{12}$ is the correlation coefficient between $x_1$ and $x_2$. So $|X'X|$ depends on the correlation coefficient as well as on the variability of the explanatory variables. If the explanatory variables have very low variability, $|X'X|$ may tend to zero and falsely indicate the presence of multicollinearity.
(iii) It gives no idea about the relative effects on individual coefficients. If multicollinearity is present, $|X'X|$ does not indicate which variables are causing it, and this is hard to determine from the determinant alone.
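A minimal sketch of the determinant diagnostic (not from the notes; the generated data are hypothetical): the third column is made progressively closer to an exact linear combination of the first two, and $|X'X|$ for the unit-length-scaled columns shrinks toward 0.

```python
import numpy as np

def det_xtx(X):
    """Determinant of X'X after centering and unit-length scaling of each column."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
    return np.linalg.det(Xs.T @ Xs)

rng = np.random.default_rng(1)
n = 200
z = rng.standard_normal((n, 2))
for noise in [1.0, 0.1, 0.01]:
    # third column nearly a linear combination of the first two as noise shrinks
    x3 = z[:, 0] + z[:, 1] + noise * rng.standard_normal(n)
    X = np.column_stack([z, x3])
    print(f"noise = {noise:>4}: |X'X| = {det_xtx(X):.4f}")
```

Because the columns are scaled to unit length here, $|X'X|$ equals the determinant of the correlation matrix, so limitation (ii) does not bite in this form.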
2. Inspection of the correlation matrix
The inspection of the off-diagonal elements $r_{ij}$ of $X'X$ gives an idea about the presence of multicollinearity. If $X_i$ and $X_j$ are nearly linearly dependent, then $|r_{ij}|$ will be close to 1. Note that the observations in $X$ are standardized in the sense that each observation has the variable mean subtracted and is divided by the square root of the corrected sum of squares of that variable. When more than two explanatory variables are involved in a near-linear dependency, however, it is not necessary that any of the $r_{ij}$ be large. Generally, a pairwise inspection of correlation coefficients is therefore not sufficient for detecting multicollinearity in the data.

3. Determinant of the correlation matrix
Let $D$ denote the determinant of the correlation matrix; then $0 \le D \le 1$. If $D = 0$, there is an exact linear dependence among the explanatory variables. If $D = 1$, the columns of the $X$ matrix are orthogonal. Thus a value close to 0 indicates a high degree of multicollinearity, and any value of $D$ between 0 and 1 gives an idea of the degree of multicollinearity.
Limitation: it gives no information about the number of linear dependencies among the explanatory variables.
Advantages over $|X'X|$:
(i) It is a bounded measure, $0 \le D \le 1$.
(ii) It is not affected by the dispersion of the explanatory variables. For example, when $k = 2$,
$D = \begin{vmatrix} 1 & r_{12} \\ r_{12} & 1 \end{vmatrix} = 1 - r_{12}^2.$

4. Measure based on partial regression
A measure of multicollinearity can be obtained from coefficients of determination based on partial regressions. Let $R^2$ be the coefficient of determination in the full model, i.e., based on all explanatory variables, let $R_i^2$ be the coefficient of determination of the model when the $i$th explanatory variable is dropped, $i = 1, 2, \ldots, k$, and let $R_L^2 = \max(R_1^2, R_2^2, \ldots, R_k^2)$.
Procedure (a computational sketch is given after the limitations below):
(i) Drop one of the $k$ explanatory variables, say $X_1$.
(ii) Run the regression of $y$ on the remaining $(k-1)$ variables $X_2, X_3, \ldots, X_k$.
(iii) Calculate $R_1^2$.
(iv) Similarly calculate $R_2^2, \ldots, R_k^2$.
(v) Find $R_L^2 = \max(R_1^2, R_2^2, \ldots, R_k^2)$.
(vi) Determine $R^2 - R_L^2$.
The quantity $(R^2 - R_L^2)$ provides a measure of multicollinearity. If multicollinearity is present, $R_L^2$ will be high: the higher the degree of multicollinearity, the higher the value of $R_L^2$. So in the presence of multicollinearity, $(R^2 - R_L^2)$ will be low, and a value of $(R^2 - R_L^2)$ close to 0 indicates a high degree of multicollinearity.
Limitations:
(i) It gives no information about the underlying relations among the explanatory variables, i.e., how many relationships are present or how many explanatory variables are responsible for the multicollinearity.
(ii) A small value of $(R^2 - R_L^2)$ may also occur because of poor specification of the model, in which case multicollinearity may be wrongly inferred.
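The following sketch of the partial-regression measure is illustrative and not part of the original notes; the data and the helper `r_squared` are hypothetical. It computes $R^2$ for the full model, each $R_i^2$ with one regressor dropped, and the difference $R^2 - R_L^2$.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from least squares of y on X (an intercept column is added)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(2)
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(n)   # near-linear dependency
y = X @ np.array([1.0, 2.0, 0.5]) + rng.standard_normal(n)

R2_full = r_squared(X, y)
R2_drop = [r_squared(np.delete(X, i, axis=1), y) for i in range(X.shape[1])]
R2_L = max(R2_drop)
print(f"R^2 = {R2_full:.4f}, R_L^2 = {R2_L:.4f}, difference = {R2_full - R2_L:.4f}")
# a difference close to 0 points to a high degree of multicollinearity
```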
5. Variance inflation factors (VIF)
The matrix $X'X$ becomes ill-conditioned in the presence of multicollinearity, so the diagonal elements of $C = (X'X)^{-1}$ help in its detection. If $R_j^2$ denotes the coefficient of determination obtained when $X_j$ is regressed on the remaining $(k-1)$ variables, then the $j$th diagonal element of $C$ is
$C_{jj} = \frac{1}{1 - R_j^2}.$
If $X_j$ is nearly orthogonal to the remaining explanatory variables, $R_j^2$ is small and $C_{jj}$ is close to 1. If $X_j$ is nearly linearly dependent on a subset of the remaining explanatory variables, $R_j^2$ is close to 1 and $C_{jj}$ is large. Since the variance of the $j$th OLSE is
$\mathrm{Var}(b_j) = \sigma^2 C_{jj},$
$C_{jj}$ is the factor by which the variance of $b_j$ increases when the explanatory variables are near-linearly dependent. Based on this concept, the variance inflation factor for the $j$th explanatory variable is defined as
$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}.$
This is the factor responsible for inflating the sampling variance. The combined effect of the dependencies among the explanatory variables on the variance of a term is measured by the VIF of that term in the model. One or more large VIFs indicate the presence of multicollinearity; in practice, a $\mathrm{VIF}_j > 5$ or $10$ usually indicates that the associated regression coefficient is poorly estimated because of multicollinearity. The covariance matrix of the OLSE is $\sigma^2 (X'X)^{-1}$, and $\mathrm{VIF}_j$ tells how much the $j$th diagonal element of this matrix is inflated relative to the orthogonal case.
Limitations:
(i) It sheds no light on the number of dependencies among the explanatory variables.
(ii) The rule $\mathrm{VIF} > 5$ or $10$ is a rule of thumb which may differ from one situation to another.

Another interpretation of VIF:
The VIFs can also be viewed as follows. The confidence interval for $\beta_j$ based on the OLSE is
$\Big(b_j - \hat\sigma\sqrt{C_{jj}}\, t_{\alpha/2},\; b_j + \hat\sigma\sqrt{C_{jj}}\, t_{\alpha/2}\Big),$
so its length is $L_j = 2\hat\sigma\sqrt{C_{jj}}\, t_{\alpha/2}$. Now consider the situation where $X$ is an orthogonal matrix, i.e., $X'X = I$ so that $C_{jj} = 1$, with the same sample size and the same root mean square $\hat\sigma$. The length of the confidence interval then becomes $L^* = 2\hat\sigma\, t_{\alpha/2}$. The ratio
$\frac{L_j}{L^*} = \sqrt{C_{jj}} = \sqrt{\mathrm{VIF}_j}$
measures the increase in the length of the confidence interval of the $j$th regression coefficient due to the presence of multicollinearity.

6. Condition number and condition index
Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the eigenvalues (characteristic roots) of $X'X$, and let $\lambda_{\max} = \max(\lambda_1, \ldots, \lambda_k)$ and $\lambda_{\min} = \min(\lambda_1, \ldots, \lambda_k)$. The condition number is defined as
$CN = \frac{\lambda_{\max}}{\lambda_{\min}}, \qquad 0 < CN < \infty.$
As a common rule of thumb, $CN < 100$ indicates no serious problem, $100 < CN < 1000$ indicates moderate to strong multicollinearity, and $CN > 1000$ indicates severe (or strong) multicollinearity. The condition number is based on only two eigenvalues, $\lambda_{\max}$ and $\lambda_{\min}$. Related measures are the condition indices, which use the information in the other eigenvalues as well. The condition indices of $X'X$ are defined as
$C_j = \frac{\lambda_{\max}}{\lambda_j}, \qquad j = 1, 2, \ldots, k.$
In fact, the largest $C_j$ is the condition number $CN$. The number of condition indices that are large, say more than 1000, indicates the number of near-linear dependencies in $X'X$. A limitation of $CN$ and of the $C_j$ is that they are unbounded measures: $0 < CN < \infty$ and $0 < C_j < \infty$. A computational sketch of the VIFs and condition indices follows.
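The sketch below (illustrative, not from the notes; the data and the helper `vif_and_condition` are hypothetical) computes the VIFs as the diagonal of $(X'X)^{-1}$ and the condition indices $\lambda_{\max}/\lambda_j$ for centered, unit-length-scaled columns.

```python
import numpy as np

def vif_and_condition(X):
    """VIF_j = jth diagonal element of (X'X)^{-1} and condition indices of X'X,
    computed on columns centered and scaled to unit length."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
    XtX = Xs.T @ Xs
    vif = np.diag(np.linalg.inv(XtX))        # C_jj = 1 / (1 - R_j^2)
    lam = np.linalg.eigvalsh(XtX)            # eigenvalues of X'X
    cond_indices = lam.max() / lam           # C_j = lambda_max / lambda_j
    return vif, cond_indices

rng = np.random.default_rng(3)
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(n)
vif, ci = vif_and_condition(X)
print("VIF:", np.round(vif, 1))                 # values above 5 or 10 flag trouble
print("condition number:", round(ci.max(), 1))  # largest condition index
```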
The variances of the OLSEs can also be decomposed in terms of the eigenvalues and eigenvectors of $X'X$. Let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ and let $V = (v_{ij})$ be the $k \times k$ orthogonal matrix whose columns are the corresponding eigenvectors, so that $X'X = V\Lambda V'$. Then
$\mathrm{Var}(b_i) = \sigma^2 \sum_{j=1}^{k} \frac{v_{ij}^2}{\lambda_j},$
where the $v_{ij}$ are the elements of $V$, and the condition indices are $C_j = \lambda_{\max}/\lambda_j$, $j = 1, 2, \ldots, k$.
Procedure (a computational sketch follows this subsection):
(i) Find the condition indices $C_1, C_2, \ldots, C_k$.
(ii) (a) Identify those $\lambda_j$ for which $C_j$ is greater than the danger level, say 1000. (b) The number of such indices gives the number of near-linear dependencies. (c) Do not consider the $C_j$ that are below the danger level.
(iii) For the $\lambda_j$ with condition index above the danger level, choose one such eigenvalue, say $\lambda_j$.
(iv) Find the proportion of the variance of $b_i$ corresponding to $\lambda_j$,
$p_{ij} = \frac{v_{ij}^2/\lambda_j}{\mathrm{VIF}_i} = \frac{v_{ij}^2/\lambda_j}{\sum_{r=1}^{k} v_{ir}^2/\lambda_r},$
where the terms $v_{ij}^2/\lambda_j$ come from the expression for $\mathrm{Var}(b_i)$ above, i.e., from the $j$th factor.
The proportion of variance $p_{ij}$ provides a measure of multicollinearity. If $p_{ij} > 0.5$, it indicates that $b_i$ is adversely affected by the multicollinearity, i.e., that the estimate of $\beta_i$ is influenced by the corresponding near-linear dependency. This is a good diagnostic tool in the sense that it indicates the presence of harmful multicollinearity as well as the number of linear dependencies responsible for it; in this sense it is better than the other diagnostics.

The condition indices can also be defined through the singular value decomposition of $X$:
$X = UDV',$
where $U$ is $n \times k$, $V$ is $k \times k$, $U'U = I$, $V'V = I$, and $D = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_k)$ with $\mu_1, \mu_2, \ldots, \mu_k$ the singular values of $X$. The columns of $V$ are the eigenvectors of $X'X$, and the columns of $U$ are the eigenvectors of $XX'$ associated with its $k$ nonzero eigenvalues. The condition indices of the $X$ matrix are defined as
$\eta_j = \frac{\mu_{\max}}{\mu_j}, \qquad j = 1, 2, \ldots, k, \qquad \mu_{\max} = \max(\mu_1, \mu_2, \ldots, \mu_k).$
If $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $X'X$, then
$X'X = (UDV')'(UDV') = VD^2V', \qquad \text{so } \lambda_j = \mu_j^2, \quad j = 1, 2, \ldots, k.$
Note that, with $\lambda_j = \mu_j^2$,
$\mathrm{Var}(b_i) = \sigma^2 \sum_{j=1}^{k} \frac{v_{ij}^2}{\mu_j^2}, \qquad \mathrm{VIF}_i = \sum_{j=1}^{k} \frac{v_{ij}^2}{\mu_j^2}, \qquad p_{ij} = \frac{v_{ij}^2/\mu_j^2}{\mathrm{VIF}_i}.$
The ill-conditioning in $X$ is reflected in the size of the singular values: there will be one small singular value for each near-linear dependency, and the extent of ill-conditioning is described by how small $\mu_j$ is relative to $\mu_{\max}$.
It is suggested that the explanatory variables be scaled to unit length but not centered when computing the $p_{ij}$; this helps in diagnosing the role of the intercept term in a near-linear dependence. No unique guidance is available in the literature on the issue of centering the explanatory variables. Centering makes the intercept orthogonal to the explanatory variables, so it may remove ill-conditioning due to the intercept term in the model.
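A sketch of the variance-decomposition proportions (illustrative, not part of the notes; the data and the helper `variance_proportions` are hypothetical), using the eigen-decomposition of $X'X$ for centered, unit-length-scaled columns.

```python
import numpy as np

def variance_proportions(X):
    """Matrix of proportions p_ij = (v_ij^2 / lambda_j) / sum_r (v_ir^2 / lambda_r).
    Rows index coefficients b_i, columns index eigenvalues lambda_j."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
    lam, V = np.linalg.eigh(Xs.T @ Xs)           # X'X = V diag(lam) V'
    phi = V**2 / lam                             # phi_ij = v_ij^2 / lambda_j
    return phi / phi.sum(axis=1, keepdims=True), lam

rng = np.random.default_rng(4)
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(n)
P, lam = variance_proportions(X)
print("eigenvalues:", np.round(lam, 4))
print("variance proportions:\n", np.round(P, 2))
# coefficients with proportion > 0.5 on the same small eigenvalue are
# involved in the corresponding near-linear dependency
```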
Remedies for multicollinearity
Various techniques have been proposed to deal with the problems resulting from the presence of multicollinearity in the data.

1. Obtain more data
Harmful multicollinearity arises essentially because the rank of $X'X$ falls below $k$ and $|X'X|$ is close to zero. Additional data may help in reducing the sampling variance of the estimates, provided the data are collected in a way that helps break up the multicollinearity. It is, however, not always possible to collect additional data, for various reasons:
- The experiment or process may have finished and may no longer be available.
- Economic constraints may not allow collecting additional data.
- The additional data may not match the earlier collected data and may be unusual.
- If the data form a time series, a longer series may force one to use data that lie too far in the past.
- If the multicollinearity is due to an identity or exact relationship, increasing the sample size will not help.
- Sometimes it is not advisable to use the data even if they are available. For example, if data on a consumption pattern are available for the years 1950-2010, one may not wish to use all of them, since the consumption pattern usually does not remain the same over such a long period.

2. Drop some variables that are collinear
If possible, identify the variables which seem to cause multicollinearity. These collinear variables can be dropped so as to meet the condition of full rank of the $X$ matrix. The omission of variables may be carried out on the basis of some ordering of the explanatory variables, e.g., those with the smallest $t$-ratios can be deleted first. In another situation, if the experimenter is not interested in all the parameters, one can obtain estimators of the parameters of interest with smaller mean squared error than the variance of the OLSE by dropping some variables. If some variables are eliminated, this may reduce the predictive power of the model, and there is no assurance that the reduced model will exhibit less multicollinearity.

3. Use some relevant prior information
One may search for relevant prior information about the regression coefficients, which may lead to the specification of estimates of some coefficients. More generally, one may specify exact linear restrictions or stochastic linear restrictions on the coefficients, and procedures such as restricted regression and mixed regression can then be used. The relevance and correctness of the information play an important role in such analysis, but they are challenging to ensure in practice. For example, estimates derived in the U.K. may not be valid in India.

4. Employ a generalized inverse
If $\mathrm{rank}(X'X) < k$, the usual inverse does not exist, but a generalized inverse of $X'X$ can be employed to compute an estimator of $\beta$. Such an estimator is not unique, so this route requires careful interpretation.

5. Use principal components regression
The explanatory variables are transformed into a new set of orthogonal variables called principal components, and the study variable is regressed on them, which removes the multicollinearity from the transformed model. Each principal component $Z_j$ is a linear combination of $X_1, X_2, \ldots, X_k$, constructed so that $\mathrm{Var}(Z_1) \ge \mathrm{Var}(Z_2) \ge \cdots \ge \mathrm{Var}(Z_k)$: the linear combination with the largest variance is the first principal component, the one with the second-largest variance is the second principal component, and so on. The principal components have the property that $\sum_j \mathrm{Var}(Z_j) = \sum_j \mathrm{Var}(X_j)$. Also, while $X_1, X_2, \ldots, X_k$ are correlated, $Z_1, Z_2, \ldots, Z_k$ are orthogonal (uncorrelated), so there is zero multicollinearity among them. The problem of multicollinearity arises because $X_1, X_2, \ldots, X_k$ are not independent; since the principal components based on them are mutually orthogonal, they can be used as explanatory variables, and such a regression combats the multicollinearity.

Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be the eigenvalues of $X'X$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$, and $V$ the $k \times k$ orthogonal matrix whose columns are the eigenvectors associated with $\lambda_1, \ldots, \lambda_k$. Consider the canonical form of the linear model
$y = X\beta + \varepsilon = XVV'\beta + \varepsilon = Z\alpha + \varepsilon,$
where $Z = XV$, $\alpha = V'\beta$, and $Z'Z = V'X'XV = \Lambda$. The columns of $Z = (Z_1, Z_2, \ldots, Z_k)$ define the new set of explanatory variables, the principal components. The OLSE of $\alpha$ is
$\hat\alpha = (Z'Z)^{-1}Z'y = \Lambda^{-1}Z'y,$
with covariance matrix
$V(\hat\alpha) = \sigma^2 (Z'Z)^{-1} = \sigma^2\Lambda^{-1} = \sigma^2\,\mathrm{diag}\Big(\frac{1}{\lambda_1}, \frac{1}{\lambda_2}, \ldots, \frac{1}{\lambda_k}\Big).$
Note that $\lambda_j$ is the variance of the $j$th principal component and $Z'Z = \Lambda$. A small eigenvalue of $X'X$ means that a near-linear relationship among the original explanatory variables exists and that the variance of the corresponding orthogonal regression coefficient is large, which indicates multicollinearity. If one or more of the $\lambda_j$ is small, multicollinearity is indicated.
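The canonical form can be illustrated with a short sketch (not part of the notes; data are simulated and hypothetical): form $Z = XV$ from the eigenvectors of $X'X$ and note that the component tied to the smallest eigenvalue has the largest coefficient variance.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(n)
y = X @ np.array([1.0, 2.0, 0.5]) + rng.standard_normal(n)

# center and scale to unit length, then build the canonical form Z = XV
Xc = X - X.mean(axis=0)
Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
lam, V = np.linalg.eigh(Xs.T @ Xs)        # eigenvalues in ascending order
Z = Xs @ V                                # principal components, Z'Z = diag(lam)
alpha_hat = (Z.T @ (y - y.mean())) / lam  # OLSE in canonical form: Lambda^{-1} Z'y
print("eigenvalues:", np.round(lam, 4))
print("alpha_hat:", np.round(alpha_hat, 2))
# Var(alpha_hat_j) = sigma^2 / lambda_j, so the component with the smallest
# eigenvalue is the one whose coefficient variance blows up
```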
Retainment of principal components
The new variables, i.e., the principal components, are orthogonal and retain the same total variability as the original set. If multicollinearity is severe, there will be at least one small eigenvalue. Eliminating the principal components associated with the smallest eigenvalues reduces the total variance in the model only slightly; moreover, the components responsible for creating the multicollinearity are removed, and the resulting model is appreciably improved.

The principal component matrix $Z = (Z_1, Z_2, \ldots, Z_k)$ contains exactly the same information as the original data in $X$, in the sense that the total variability in $X$ and in $Z$ is the same. The difference is that the original data are rearranged into a set of new variables which are uncorrelated with each other and can be ranked by the magnitude of their eigenvalues. The column $Z_j$ corresponding to the largest $\lambda_j$ accounts for the largest proportion of the variation in the original data. The $Z_j$ are indexed so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$, and $\lambda_j$ is the variance of $Z_j$.

A strategy for eliminating principal components is to begin by discarding the component associated with the smallest eigenvalue, the idea being that this component contributes the least variance and so is the least informative. Principal components are eliminated until the remaining components explain some preselected percentage of the total variance. For example, if at least 90% of the total variance is to be retained and $r$ principal components are eliminated, so that the remaining $(k-r)$ components contribute 90% of the variation, then $r$ is chosen to satisfy
$\frac{\sum_{j=1}^{k-r} \lambda_j}{\sum_{j=1}^{k} \lambda_j} \ge 0.90.$
Various other strategies for choosing the required number of principal components are available in the literature.

Suppose that, after using such a rule, $r$ principal components are eliminated and the remaining $(k-r)$ components are used for the regression. Partition
$Z = (Z_r, Z_{k-r}) = X(V_r, V_{k-r}),$
where the submatrix $Z_r$ is of order $n \times r$ and contains the principal components to be eliminated, and $Z_{k-r}$ is of order $n \times (k-r)$ and contains the principal components to be retained; partition $\alpha = (\alpha_r', \alpha_{k-r}')'$ and $V = (V_r, V_{k-r})$ conformably. The reduced model obtained after eliminating the $r$ principal components is
$y = Z_{k-r}\,\alpha_{k-r} + \varepsilon^*,$
where the random error component is written as $\varepsilon^*$ just to distinguish it from $\varepsilon$. Using least squares on the model with the retained principal components, the OLSE of $\alpha_{k-r}$ is
$\hat\alpha_{k-r} = (Z_{k-r}'Z_{k-r})^{-1}Z_{k-r}'y.$
Transforming back to the original explanatory variables, since $\alpha = V'\beta$ and $\beta = V\alpha$,
$\hat\beta_{pc} = V_{k-r}\,\hat\alpha_{k-r}$
is the principal components regression estimator of $\beta$. This method improves the efficiency of estimation and combats the multicollinearity; a computational sketch follows.
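The sketch below (illustrative, not from the notes; the data and the helper `pcr_estimator` are hypothetical) drops the components with the smallest eigenvalues and transforms back to the original coefficients via $\hat\beta_{pc} = V_{k-r}\hat\alpha_{k-r}$.

```python
import numpy as np

def pcr_estimator(Xs, yc, n_drop):
    """Principal components regression estimate of beta for centered/scaled data,
    dropping the n_drop components with the smallest eigenvalues."""
    lam, V = np.linalg.eigh(Xs.T @ Xs)        # eigenvalues in ascending order
    keep = np.argsort(lam)[n_drop:]           # indices of retained components
    V_keep, lam_keep = V[:, keep], lam[keep]
    Z_keep = Xs @ V_keep
    alpha_keep = (Z_keep.T @ yc) / lam_keep   # OLSE of the retained alpha's
    return V_keep @ alpha_keep                # beta_pc = V_{k-r} alpha_hat_{k-r}

rng = np.random.default_rng(6)
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(n)
y = X @ np.array([1.0, 2.0, 0.5]) + rng.standard_normal(n)
Xc = X - X.mean(axis=0)
Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
yc = y - y.mean()

print("OLSE    :", np.round(np.linalg.lstsq(Xs, yc, rcond=None)[0], 2))
print("PCR(r=1):", np.round(pcr_estimator(Xs, yc, n_drop=1), 2))
```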
6. Ridge regression
The OLSE is the best linear unbiased estimator of the regression coefficients in the sense that it has minimum variance in the class of linear unbiased estimators. However, if the condition of unbiasedness is relaxed, it is possible to find a biased estimator of $\beta$, say $\hat\beta$, that has smaller mean squared error than the unbiased OLSE $b$. The mean squared error (MSE) of $\hat\beta$ is
$\mathrm{MSE}(\hat\beta) = E\big[(\hat\beta - \beta)'(\hat\beta - \beta)\big] = \mathrm{Var}(\hat\beta) + \big[\mathrm{Bias}(\hat\beta)\big]^2,$
so $\mathrm{MSE}(\hat\beta)$ can be made smaller than $\mathrm{Var}(b)$ by introducing a small bias in $\hat\beta$. One approach for doing so is ridge regression. The ridge regression estimator is obtained by modifying the least squares normal equations as
$(X'X + \delta I)\,\hat\beta_{ridge} = X'y \;\Rightarrow\; \hat\beta_{ridge} = (X'X + \delta I)^{-1}X'y,$
where $\delta \ge 0$ is a characterizing scalar termed the biasing parameter.

As $\delta \to 0$, $\hat\beta_{ridge} \to b$ (the OLSE), and as $\delta \to \infty$, $\hat\beta_{ridge} \to 0$. So the larger the value of $\delta$, the greater the shrinkage towards zero. Note that the OLSE is inappropriate to use in the sense that it has a very high variance when multicollinearity is present in the data; on the other hand, a very large $\delta$ shrinks the estimates so much that the null hypothesis $H_0: \beta = 0$ tends to be accepted, suggesting that the corresponding variables are not relevant. The value of the biasing parameter thus controls the amount of shrinkage in the estimates.

Bias of the ridge regression estimator:
$\mathrm{Bias}(\hat\beta_{ridge}) = E(\hat\beta_{ridge}) - \beta = (X'X + \delta I)^{-1}X'E(y) - \beta = \big[(X'X + \delta I)^{-1}X'X - I\big]\beta = (X'X + \delta I)^{-1}\big[X'X - X'X - \delta I\big]\beta = -\delta(X'X + \delta I)^{-1}\beta.$
Thus the ridge regression estimator is a biased estimator of $\beta$.

Covariance matrix: since
$\hat\beta_{ridge} - E(\hat\beta_{ridge}) = (X'X + \delta I)^{-1}X'y - (X'X + \delta I)^{-1}X'X\beta = (X'X + \delta I)^{-1}X'(y - X\beta) = (X'X + \delta I)^{-1}X'\varepsilon,$
we have
$V(\hat\beta_{ridge}) = (X'X + \delta I)^{-1}X'\,V(\varepsilon)\,X(X'X + \delta I)^{-1} = \sigma^2(X'X + \delta I)^{-1}X'X(X'X + \delta I)^{-1}.$

Mean squared error:
$\mathrm{MSE}(\hat\beta_{ridge}) = \mathrm{tr}\big[V(\hat\beta_{ridge})\big] + \big\|\mathrm{Bias}(\hat\beta_{ridge})\big\|^2 = \sigma^2\sum_{j=1}^{k}\frac{\lambda_j}{(\lambda_j + \delta)^2} + \delta^2\,\beta'(X'X + \delta I)^{-2}\beta,$
where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the eigenvalues of $X'X$. Thus, as $\delta$ increases, the bias in $\hat\beta_{ridge}$ increases but its variance decreases; the trade-off between bias and variance hinges on the value of $\delta$. It can be shown that there exists a value of $\delta$ such that $\mathrm{MSE}(\hat\beta_{ridge}) < \mathrm{Var}(b)$, provided $\beta'\beta$ is bounded.

The ridge estimator can also be obtained as a constrained (penalized) least squares solution: minimize the residual sum of squares subject to $\beta'\beta = C$, where $C$ is some constant, i.e., minimize
$S(\beta) = (y - X\beta)'(y - X\beta) + \delta(\beta'\beta - C),$
where $\delta$ is the Lagrangian multiplier. Differentiating $S(\beta)$ with respect to $\beta$ gives the normal equations
$\frac{\partial S(\beta)}{\partial \beta} = -2X'y + 2X'X\beta + 2\delta\beta = 0 \;\Rightarrow\; \hat\beta_{ridge} = (X'X + \delta I)^{-1}X'y.$
Note that if $C$ is very small, it may indicate that most of the regression coefficients are close to zero, whereas a large $C$ may indicate that the regression coefficients are away from zero. So $C$ puts a sort of penalty on the regression coefficients to enable their estimation.
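A short sketch of the ridge estimator (illustrative only, not part of the notes; the data and the helper `ridge_estimator` are hypothetical): $\delta = 0$ recovers the OLSE, and the norm of the estimate shrinks toward zero as $\delta$ grows.

```python
import numpy as np

def ridge_estimator(Xs, yc, delta):
    """Ridge estimator (X'X + delta*I)^{-1} X'y for centered/scaled data."""
    k = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + delta * np.eye(k), Xs.T @ yc)

rng = np.random.default_rng(7)
n = 200
X = rng.standard_normal((n, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(n)
y = X @ np.array([1.0, 2.0, 0.5]) + rng.standard_normal(n)
Xc = X - X.mean(axis=0)
Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
yc = y - y.mean()

for delta in [0.0, 0.01, 0.1, 1.0, 10.0]:
    b = ridge_estimator(Xs, yc, delta)
    print(f"delta = {delta:>5}: beta_ridge = {np.round(b, 2)}, norm = {np.linalg.norm(b):.2f}")
# delta = 0 gives the OLSE; larger delta gives greater shrinkage towards zero
```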
