Chapter 5

Multiple correlation and multiple regression

The previous chapter considered how to determine the relationship between two variables and how to predict one from the other. The general solution was to consider the ratio of the covariance between two variables to the variance of the predictor variable (regression) or the ratio of the covariance to the square root of the product of the variances (correlation). This solution may be generalized to the problem of how to predict a single variable from the weighted linear sum of multiple variables (multiple regression) or to measure the strength of this relationship (multiple correlation). As part of the problem of finding the weights, the concepts of partial covariance and partial correlation will be introduced. To do all of this will require finding the variance of a composite score, and the covariance of this composite with another score, which might itself be a composite. Much of psychometric theory is merely an extension, an elaboration, or a generalization of these concepts. Almost all tests are composites of items or subtests. An understanding of how to decompose test variance into its component parts, and conversely, an understanding of how to analyze tests as composites of items, allows us to analyze the meaning of tests. But tests are not merely composites of items. Tests relate to other tests. A deep appreciation of the basic Pearson correlation coefficient facilitates an understanding of its generalization to multiple and partial correlation, to factor analysis, and to questions of validity.

5.1 The variance of composites


If $x_1$ and $x_2$ are vectors of N observations centered around their mean (that is, deviation scores) their variances are $V_{x_1} = \sum x_{i1}^2/(N-1)$ and $V_{x_2} = \sum x_{i2}^2/(N-1)$, or, in matrix terms, $V_{x_1} = x_1'x_1/(N-1)$ and $V_{x_2} = x_2'x_2/(N-1)$. The variance of the composite made up of the sum of the corresponding scores, $x_1 + x_2$, is just

$$V_{(x_1+x_2)} = \frac{\sum (x_{i1} + x_{i2})^2}{N-1} = \frac{\sum x_{i1}^2 + \sum x_{i2}^2 + 2\sum x_{i1}x_{i2}}{N-1} = \frac{(x_1 + x_2)'(x_1 + x_2)}{N-1}. \tag{5.1}$$

Generalizing 5.1 to the case of n xs, the composite matrix of these is just ${}_N X_n$ with dimensions of N rows and n columns. The matrix of variances and covariances of the individual items of this composite is written as S as it is a sample estimate of the population variance-covariance matrix, $\Sigma$. It is perhaps helpful to view S in terms of its elements, n of which are variances and $n^2 - n = n(n-1)$ are covariances:

$$S = \begin{pmatrix} v_{x_1} & c_{x_1x_2} & \cdots & c_{x_1x_n} \\ c_{x_1x_2} & v_{x_2} & \cdots & c_{x_2x_n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{x_1x_n} & c_{x_2x_n} & \cdots & v_{x_n} \end{pmatrix}$$

The diagonal of S, diag(S), is just the vector of individual variances. The trace of S is the sum of the diagonals and will be used a great deal when considering how to estimate reliability. It is convenient to represent the sum of all of the elements in the matrix, S, as the variance of the composite matrix:

$$V_X = \frac{\sum X'X}{N-1} = \frac{1'(X'X)1}{N-1}.$$
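The identity between the variance of a composite and the sum of the elements of S can be checked numerically. The following is a minimal sketch in base R, assuming simulated data; the object names (`X`, `Xc`, `S`, `one`) and the seed are illustrative choices, not part of the text.

```r
# Sketch: the variance of a composite equals the sum of all elements of S.
set.seed(1)                          # arbitrary seed, for reproducibility only
N <- 200; n <- 4
X <- matrix(rnorm(N * n), N, n)      # hypothetical data: N rows, n columns
X[, 2] <- X[, 2] + 0.5 * X[, 1]      # induce some covariance between columns
Xc <- scale(X, scale = FALSE)        # deviation scores (column means removed)

S   <- crossprod(Xc) / (N - 1)       # S = X'X / (N - 1)
one <- rep(1, n)

V_composite <- var(rowSums(X))             # variance of the summed score
V_from_S    <- sum(S)                      # sum of all elements of S
V_quadratic <- drop(t(one) %*% S %*% one)  # 1' S 1

c(V_composite, V_from_S, V_quadratic)      # three routes to the same value
sum(diag(S))                               # trace of S: sum of the n variances
```

The three values agree because summing the columns of X is the same as post-multiplying by a vector of ones.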

5.2 Multiple regression


The problem of the optimal linear prediction of $\hat{y}$ in terms of x may be generalized to the problem of linearly predicting $\hat{y}$ in terms of a composite variable X where X is made up of individual variables $x_1, x_2, ..., x_n$. Just as $b_{y.x} = cov_{xy}/var_x$ is the optimal slope for predicting y, so it is possible to find a set of weights ($\beta$ weights in the standardized case, b weights in the unstandardized case) for each of the individual $x_i$s.

Consider first the problem of two predictors, $x_1$ and $x_2$. We want to find the weights, $b_i$, that when multiplied by $x_1$ and $x_2$ maximize the covariances with y. That is, we want to solve the two simultaneous equations

$$\begin{aligned} v_{x_1}b_1 + c_{x_1x_2}b_2 &= c_{x_1y} \\ c_{x_1x_2}b_1 + v_{x_2}b_2 &= c_{x_2y} \end{aligned}$$

or, in the standardized case, find the $\beta_i$:

$$\begin{aligned} \beta_1 + r_{x_1x_2}\beta_2 &= r_{x_1y} \\ r_{x_1x_2}\beta_1 + \beta_2 &= r_{x_2y} \end{aligned} \tag{5.2}$$

We can directly solve these two equations by adding and subtracting terms to the two such that we end up with a solution to the first in terms of $\beta_1$ and to the second in terms of $\beta_2$:

$$\begin{aligned} \beta_1 &= r_{x_1y} - r_{x_1x_2}\beta_2 \\ \beta_2 &= r_{x_2y} - r_{x_1x_2}\beta_1 \end{aligned} \tag{5.3}$$

Substituting the second row of (5.3) into the first row, and vice versa, we find

$$\begin{aligned} \beta_1 &= r_{x_1y} - r_{x_1x_2}(r_{x_2y} - r_{x_1x_2}\beta_1) \\ \beta_2 &= r_{x_2y} - r_{x_1x_2}(r_{x_1y} - r_{x_1x_2}\beta_2) \end{aligned}$$

Collecting terms and rearranging:

$$\begin{aligned} \beta_1 - r_{x_1x_2}^2\beta_1 &= r_{x_1y} - r_{x_1x_2}r_{x_2y} \\ \beta_2 - r_{x_1x_2}^2\beta_2 &= r_{x_2y} - r_{x_1x_2}r_{x_1y} \end{aligned}$$

leads to

$$\begin{aligned} \beta_1 &= (r_{x_1y} - r_{x_1x_2}r_{x_2y})/(1 - r_{x_1x_2}^2) \\ \beta_2 &= (r_{x_2y} - r_{x_1x_2}r_{x_1y})/(1 - r_{x_1x_2}^2) \end{aligned} \tag{5.4}$$

Alternatively, these two equations (5.2) may be represented as the product of a vector of unknowns (the $\beta$s) and a matrix of coefficients of the predictors (the $r_{x_ix_j}$s) and a matrix of coefficients for the criterion ($r_{x_iy}$):

$$(\beta_1 \; \beta_2)\begin{pmatrix} r_{x_1x_1} & r_{x_1x_2} \\ r_{x_1x_2} & r_{x_2x_2} \end{pmatrix} = (r_{x_1y} \; r_{x_2y}) \tag{5.5}$$

If we let $\beta = (\beta_1 \; \beta_2)$, $R = \begin{pmatrix} r_{x_1x_1} & r_{x_1x_2} \\ r_{x_1x_2} & r_{x_2x_2} \end{pmatrix}$ and $r_{xy} = (r_{x_1y} \; r_{x_2y})$ then equation 5.5 becomes

$$\beta R = r_{xy} \tag{5.6}$$

and we can solve Equation 5.6 for $\beta$ by multiplying both sides by the inverse of R:

$$\beta = \beta R R^{-1} = r_{xy}R^{-1} \tag{5.7}$$

Similarly, if $c_{xy}$ represents the covariances of the $x_i$ with y, then the b weights may be found by

$$b = c_{xy}S^{-1}$$

and thus, the predicted scores (with X expressed as standard scores, N rows by n columns) are

$$\hat{y} = \beta X' = r_{xy}R^{-1}X'. \tag{5.8}$$
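As a check on the algebra, the closed-form solution for two predictors (Equation 5.4) and the matrix solution (Equation 5.7) can be compared directly. This is a small sketch in base R; the three correlations are hypothetical values chosen only for illustration.

```r
# Hypothetical correlations among two predictors and a criterion.
r12 <- 0.5; r1y <- 0.6; r2y <- 0.4

# Closed form for two predictors (Equation 5.4)
beta1 <- (r1y - r12 * r2y) / (1 - r12^2)
beta2 <- (r2y - r12 * r1y) / (1 - r12^2)

# Matrix form (Equation 5.7): beta = r_xy R^{-1}
R    <- matrix(c(1, r12, r12, 1), 2, 2)
r_xy <- c(r1y, r2y)
beta <- drop(r_xy %*% solve(R))

rbind(closed_form = c(beta1, beta2), matrix_form = beta)

# solve(R, r_xy) solves the same system without forming the inverse explicitly
beta_alt <- solve(R, r_xy)
```

Because R is symmetric, pre- or post-multiplying by its inverse gives the same weights; `solve(R, r_xy)` is usually the numerically preferable form.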

The $\beta_i$ are the direct effects of the $x_i$ on y. The total effects of $x_i$ on y are the correlations; the indirect effects reflect the product of the correlations between the predictor variables and the direct effects of each predictor variable. Estimation of the b or $\beta$ vectors, with many diagnostic statistics of the quality of the regression, may be found using the lm function. When using categorical predictors, the linear model is also known as analysis of variance, which may be done using the anova and aov functions. When the outcome variables are dichotomous, logistic regression may be done using the generalized linear model function glm and a binomial error function. A complete discussion of the power of the generalized linear model is beyond any introductory text, and the interested reader is referred to, e.g., Cohen et al. (2003); Dalgaard (2002); Fox (2008); Judd and McClelland (1989); Venables and Ripley (2002). Diagnostic tests of the regressions, including plots of the residuals versus estimated values, tests of the normality of the residuals, and identification of highly weighted subjects, are available as part of the graphics associated with the lm function.
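A brief sketch of the lm workflow may help. It assumes the `attitude` data set that ships with R; the choice of criterion and predictors is arbitrary and purely illustrative. It shows the b weights from the raw data, the $\beta$ weights from standardized data, and their agreement with the weights found from the correlation matrix.

```r
# attitude: a data set distributed with R (30 departments, 7 survey ratings)
fit <- lm(rating ~ complaints + learning, data = attitude)
summary(fit)            # b weights, standard errors, R2, F
anova(fit)              # sequential analysis-of-variance table

# beta weights: refit after standardizing every variable
z     <- as.data.frame(scale(attitude[, c("rating", "complaints", "learning")]))
fit_z <- lm(rating ~ complaints + learning, data = z)
beta  <- coef(fit_z)[-1]          # drop the (essentially zero) intercept

# The same beta weights from the correlation matrix (Equation 5.7)
R_all  <- cor(attitude[, c("complaints", "learning", "rating")])
beta_R <- solve(R_all[1:2, 1:2], R_all[1:2, 3])

# Diagnostic plots (residuals vs. fitted, normal Q-Q, leverage); uncomment to draw
# op <- par(mfrow = c(2, 2)); plot(fit); par(op)
```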


5.2.1 Direct and indirect effects, suppression and other surprises


If the predictor set $x_i, x_j$ are uncorrelated, then each separate variable makes a unique contribution to the dependent variable, y, and $R^2$, the amount of variance accounted for in y, is the sum of the individual $r^2$. In that case, even though each predictor accounted for only 10% of the variance of y, with just 10 predictors, there would be no unexplained variance. Unfortunately, most predictors are correlated, and the $\beta$s found in 5.5 or 5.7 are less than the original correlations and, since $R^2 = \sum_i \beta_i r_{x_iy} = \beta r_{xy}'$, the $R^2$ will not increase as much as it would if the predictors were less or not correlated.

An interesting case that occurs infrequently, but is important to consider, is the case of suppression. A suppressor may not correlate with the criterion variable, but, because it does correlate with the other predictor variables, removes variance from those other predictor variables (Nickerson, 2008; Paulhus et al., 2004). This has the effect of reducing the denominator in equation 5.4 and thus increasing the $\beta_i$ for the other variables. Consider the case of two predictors of stock broker success: self reported need for achievement and self reported anxiety (Table 5.1). Although Need Achievement has a modest correlation with success, and Anxiety has none at all, adding Anxiety into the multiple regression increases the $R^2$ from .09 to .12. An explanation for this particular effect might be that people who want to be stock brokers are more likely to say that they have high Need Achievement. Some of this variance is probably legitimate, but some might be due to a tendency to fake positive aspects. Low anxious scores could reflect a tendency to fake positive by denying negative aspects. But those who are willing to report being anxious probably are anxious, and are telling the truth. Thus, adding anxiety into the regression removes some misrepresentation from the Need Achievement scores, and increases the multiple R.[1]
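The suppression effect can be reproduced from the three correlations alone. The sketch below uses base R and the correlation values reported in Table 5.1; it solves for the weights with `solve` rather than with the mat.regress function used in the table.

```r
# Correlations from the stockbroker example (Table 5.1)
stock <- matrix(c( 1.0, -0.5, 0.3,
                  -0.5,  1.0, 0.0,
                   0.3,  0.0, 1.0), 3, 3, byrow = TRUE,
                dimnames = list(c("Nach", "Anxiety", "Success"),
                                c("Nach", "Anxiety", "Success")))

# One predictor: R2 is just the squared correlation
R2_nach <- stock["Nach", "Success"]^2

# Two predictors: beta = R^{-1} r_xy and R2 = sum(beta * r_xy)
Rxx  <- stock[1:2, 1:2]
rxy  <- stock[1:2, 3]
beta <- solve(Rxx, rxy)
R2   <- sum(beta * rxy)

round(c(R2_nach = R2_nach, R2_both = R2, R_both = sqrt(R2)), 2)
```

Anxiety receives a non-zero weight (0.2) even though its correlation with Success is zero, and the weight for Need Achievement rises from 0.3 to 0.4.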

[1] Although the correlation values are enhanced to show the effect, this particular example was observed in a high stakes employment testing situation.

Table 5.1 An example of suppression is found when predicting stockbroker success from self report measures of need for achievement and anxiety. By having a suppressor variable, anxiety, the multiple R goes from .3 to .35.

> stock
            Nach Anxiety Success
achievement  1.0    -0.5     0.3
Anxiety     -0.5     1.0     0.0
Success      0.3     0.0     1.0
> mat.regress(stock,c(1,2),3)
$beta
   Nach Anxiety
    0.4     0.2
$R
Success
   0.35
$R2
Success
   0.12

[Figure 5.1: four path diagrams, titled Independent Predictors, Correlated Predictors, Suppression, and Missing variable, relating the predictors X1 and X2 (and, in the last panel, a missing variable z) to the criterion Y.]

Fig. 5.1 There are at least four basic regression cases: the independent predictor case, where the $\beta_i$ are the same as the correlations; the normal, correlated predictor case, where the $\beta_i$ are found as in 5.7; the case of suppression, where although a variable does not correlate with the criterion, because it does correlate with a predictor, it will have a useful $\beta_i$ weight; and the case where the model is misspecified and in fact a missing variable accounts for the correlations.

5.2.2 Interactions and product terms: the need to center the data

In psychometric applications, the main use of regression is in predicting a single criterion variable in terms of the linear sums of a predictor set. Sometimes, however, a more appropriate model is to consider that some of the variables have multiplicative effects (i.e., interact) such that the effect of x on y depends upon a third variable z. This can be examined by using the product terms of x and z. But to do so and to avoid problems of interpretation, it is first necessary to zero center the predictors so that the product terms are not correlated with the additive terms. The default values of the scale function will center as well as standardize the scores. To just center a variable, x, use scale(x,scale=FALSE). This will preserve the units of x. scale returns a matrix but the lm function requires a data.frame as input. Thus, it is necessary to convert the output of scale back into a data.frame.

A detailed discussion of how to analyze and then plot data showing interactions between experimental variables and subject variables (e.g., manipulated positive affect and extraversion) or interactions of subject variables with each other (e.g., neuroticism and extraversion) is beyond the scope of this text and is considered in great detail by Aiken and West (1991) and Cohen et al. (2003), and in less detail in an online appendix to a chapter on experimental approaches to personality (Revelle, 2007), http://personality-project.org/r/simulating-personality.html. In that appendix, simulated data are created to show additive and interactive effects. An example analysis examines the effect of Extraversion and a movie induced mood on positive affect. The regression is done using the lm function on the centered data (Table 5.2). The graphic display shows two regression lines, one for the simulated positive mood induction, the other for a neutral induction.
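The effect of centering on the correlation between additive and product terms can be seen in a small simulation. This is a sketch under stated assumptions: the data are simulated, and the variable names, means, and effect sizes are hypothetical rather than taken from the appendix cited above.

```r
set.seed(17)                                   # arbitrary seed
n <- 500
x <- rnorm(n, mean = 10, sd = 2)               # hypothetical predictors with
z <- rnorm(n, mean = 5,  sd = 1)               # means far from zero
y <- 0.5 * x + 0.3 * z + 0.2 * (x - 10) * (z - 5) + rnorm(n)

raw <- data.frame(y, x, z)
cor_raw <- cor(raw$x, raw$x * raw$z)           # product strongly tied to x

# scale() returns a matrix, so convert back to a data.frame for lm()
centered <- data.frame(y = y, scale(raw[, c("x", "z")], scale = FALSE))
cor_centered <- cor(centered$x, centered$x * centered$z)  # near zero

mod_raw      <- lm(y ~ x * z, data = raw)
mod_centered <- lm(y ~ x * z, data = centered)

# The interaction weight is unchanged; the additive weights now describe
# the effect of each predictor at the mean of the other.
rbind(raw = coef(mod_raw), centered = coef(mod_centered))
```

Centering does not change the fit of the model or the weight of the product term; it changes what the additive weights mean and removes the collinearity between them and the product.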
Table 5.2 Linear model analysis of simulated data showing an interaction between the personality dimension of extraversion and a movie based mood induction. Adapted from Revelle (2007).

> # a great deal of code to simulate the data
> mod1 <- lm(PosAffect ~ extraversion*reward,data = centered.affect.data) #look for interactions
> print(summary(mod1,digits=2))

Call:
lm(formula = PosAffect ~ extraversion * reward, data = centered.affect.data)

Residuals:
   Min     1Q Median     3Q    Max
-2.062 -0.464  0.083  0.445  2.044

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.8401     0.0957    -8.8    6e-14 ***
extraversion          -0.0053     0.0935    -0.1     0.95
reward1                1.6894     0.1354    12.5   <2e-16 ***
extraversion:reward1   0.2529     0.1271     2.0     0.05 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.68 on 96 degrees of freedom
Multiple R-squared: 0.63, Adjusted R-squared: 0.62
F-statistic: 54 on 3 and 96 DF, p-value: <2e-16

5.2.3 Confidence intervals of the regression and regression weights


The multiple correlation finds weights to best fit the particular sample. Unfortunately, it is a biased estimate of the population values. Consequently, the value of $R^2$ is likely to shrink when applied to another sample. Standard estimates for the amount of shrinkage consider the size of the sample as well as the number of variables in the model. For N subjects and k predictors, the estimated (shrunken) $R^2$, $\tilde{R}^2$, is

$$\tilde{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-k-1}.$$

[Figure 5.2: "Simulated interaction". Positive Affect plotted against Extraversion, with separate regression lines for the Positive and Neutral mood inductions.]

Fig. 5.2 The (simulated) effect of extraversion and movie induced mood on positive affect. Adapted from Revelle (2007). Detailed code for plotting interaction graphs is available in the appendix.

The confidence interval of $R^2$ is, of course, a function of the variance of $R^2$ which is (taken from Cohen et al. (2003) and Olkin and Finn (1995))

$$SE_{R^2}^2 = \frac{4R^2(1-R^2)^2(N-k-1)^2}{(N^2-1)(N+3)}.$$

Because multiple R is partitioning the observed variance into modeled and residual variance, testing the hypothesis that the multiple R is zero may be done by analysis of variance and leads to an F ratio with k and (N-k-1) degrees of freedom:

$$F = \frac{R^2(N-k-1)}{(1-R^2)k}.$$

The standard error of each beta weight is

$$SE_{\beta_i} = \sqrt{\frac{1-R^2}{(N-k-1)(1-R_i^2)}}$$

where $R_i^2$ is the squared multiple correlation of $x_i$ on all the other $x_j$ variables. (This term, the squared multiple correlation, is used in estimating communalities in factor analysis, see 6.2.1. It may be found by the smc function.)
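These formulas can be checked against the output of lm. The sketch below assumes the `attitude` data set distributed with R and an arbitrary set of three predictors; the variables are standardized so that the lm weights are $\beta$ weights. The squared multiple correlations $R_i^2$ are found from the inverse of the predictor correlation matrix rather than with smc, and the normal-approximation interval in the last line is an assumption of this sketch, not a recommendation from the text.

```r
z   <- as.data.frame(scale(attitude[, c("rating", "complaints", "learning", "raises")]))
fit <- lm(rating ~ complaints + learning + raises, data = z)
s   <- summary(fit)

N  <- nrow(z); k <- 3
R2 <- s$r.squared

R2_shrunken <- 1 - (1 - R2) * (N - 1) / (N - k - 1)     # shrunken R2
F_ratio     <- R2 * (N - k - 1) / ((1 - R2) * k)        # F with k, N-k-1 df
SE_R2       <- sqrt(4 * R2 * (1 - R2)^2 * (N - k - 1)^2 / ((N^2 - 1) * (N + 3)))

# R2_i: squared multiple correlation of each predictor with the other predictors
Rxx     <- cor(z[, -1])
R2_i    <- 1 - 1 / diag(solve(Rxx))
SE_beta <- sqrt((1 - R2) / ((N - k - 1) * (1 - R2_i)))

cbind(formula = SE_beta, lm = coef(s)[-1, "Std. Error"])
R2 + c(-1.96, 1.96) * SE_R2     # rough interval, assuming approximate normality
```

The shrunken $R^2$ is what lm reports as the adjusted R-squared.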


5.2.4 Multiple regression from the covariance/correlation matrix


Using the raw data allows for error diagnostics and for the inclusion of interaction terms. But since Equation 5.7 is expressed in terms of the correlation matrix, the regression weights can be found from the correlation matrix. This is particularly useful if one does not have access to the raw data (e.g., when reanalyzing a published study), or if the correlation matrix is synthetically constructed. The function mat.regress allows one to extract subsets of variables (predictors and criteria) from a matrix of correlations and find the multiple correlations and beta weights of the x set predicting each member of the y set.
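The idea can be sketched in a few lines of base R. The helper `cor_regress` below is hypothetical, written only for this illustration; it is not the mat.regress function and makes no claim to match its interface or output format. The correlation matrix of the `attitude` data set (distributed with R) stands in for a published matrix.

```r
# cor_regress() is a hypothetical helper written for this sketch;
# it is not the mat.regress function itself.
cor_regress <- function(R, x, y) {
  Rxx  <- R[x, x, drop = FALSE]           # correlations among the predictors
  Rxy  <- R[x, y, drop = FALSE]           # predictor-criterion correlations
  beta <- solve(Rxx, Rxy)                 # one column of beta weights per criterion
  R2   <- colSums(beta * Rxy)             # R2 = sum_i beta_i r_{x_i y}
  list(beta = beta, R = sqrt(R2), R2 = R2)
}

R   <- cor(attitude)                      # stand-in for a published matrix
out <- cor_regress(R, x = c("complaints", "learning"), y = c("rating", "raises"))
out
```

Only the correlations are needed, so the same call works on a matrix typed in from a journal table.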

5.2.5 The robust beauty of linear models


Although the $\beta$ weights of 5.7 are the optimal weights, it has been known since Wilks (1938) that differences from optimal do not change the result very much. This has come to be called "the robust beauty of linear models" (Dawes and Corrigan, 1974; Dawes, 1979) and follows the principle of "it don't make no nevermind" (Wainer, 1976). That is, for standardized variables predicting a criterion with $.25 < \beta < .75$, setting all $\beta_i = .5$ will reduce the accuracy of prediction by no more than 1/96th. Thus the advice to standardize and add. (Clearly this advice does not work for strong negative correlations, but in that case standardize and subtract. In the general case weights of -1, 0, or 1 are the robust alternative.)

A graphic demonstration of how a very small reduction in the $R^2$ value can lead to an infinite set of "fungible weights" that are all equally good in predicting the criterion is given in the paper by Waller (2008) with associated R code. This paper reiterates the skepticism that one should have for the interpretability of any particular pattern of $\beta$ weights. A much fuller treatment of the problem of interpreting differences in beta weights is found in the recent chapter by Azen and Budescu (2009).
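The "standardize and add" advice is easy to try out. The following sketch simulates three positively correlated predictors and compares the multiple R from fitted weights with the correlation obtained from unit weights; the data-generating values are hypothetical and chosen only for illustration.

```r
set.seed(3)                                   # arbitrary seed
n <- 1000
# three positively correlated predictors built from a common factor (hypothetical)
g <- rnorm(n)
X <- sapply(1:3, function(i) 0.6 * g + 0.8 * rnorm(n))
y <- drop(X %*% c(0.5, 0.3, 0.4)) + rnorm(n)

Z  <- scale(X)                                # standardized predictors
zy <- drop(scale(y))                          # standardized criterion

optimal   <- lm(zy ~ Z)
R_optimal <- sqrt(summary(optimal)$r.squared) # multiple R with fitted weights
R_unit    <- cor(rowSums(Z), zy)              # "standardize and add"

c(optimal = R_optimal, unit = R_unit)
```

In the sample used to fit the weights, unit weights can never do better than the fitted ones, but the loss is typically small.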

5.3 Partial and semi-partial correlation


Given three or more variables, an interesting question to ask is what is the relationship between $x_i$ and y when the effect of $x_j$ has been removed? In an experiment it is possible to answer this by forcing $x_i$ and $x_j$ to be independent by design. Then it is possible to decompose the variance of y in terms of effects of $x_i$ and $x_j$ and possibly their interaction. However, in the correlational case, it is likely that $x_i$ and $x_j$ are correlated. A solution is to consider linear regression to predict $x_i$ and y from $x_j$ and to correlate the residuals. That is, we know from linear regression that it is possible to predict $x_i$ and y from $x_j$. Then the correlation of the residuals $x_{i.j} = x_i - \hat{x}_i$ and $y_{.j} = y - \hat{y}_j$ is a measure of the strength of the relationship between $x_i$ and y when the effect of $x_j$ has been removed. This is known as the partial correlation, for it has partialed out the effects on both the $x_i$ and y of the other variables.

In the process of finding the appropriate weights in the multiple regression, the effect of each variable $x_i$ on the criterion y was found with the effect of the other $x_j$ ($j \neq i$) variables removed. This was done explicitly in Equation 5.4 and implicitly in 5.7. The numerator in 5.4 is a covariance with the effect of the second variable removed and the denominator is a variance with the second variable removed. Just as in simple regression where b is a covariance divided by a variance and a correlation is a covariance divided by the square root of the product of two variances, so is the case in multiple correlation where the $\beta_i$ is a partial covariance divided by a partial variance and a partial correlation is a partial covariance divided by the square root of the product of two partial variances. The partial correlation between $x_i$ and y with the effect of $x_j$ removed is

$$r_{(x_i.x_j)(y.x_j)} = \frac{r_{x_iy} - r_{x_ix_j}r_{x_jy}}{\sqrt{(1 - r_{x_ix_j}^2)(1 - r_{yx_j}^2)}} \tag{5.9}$$

Compare this to 5.4 which is the formula for the $\beta$ weight. Given a data matrix, X, and a matrix of covariates, Z (both in standard score form, with observations as rows), with correlations $R_{xz}$ with X, and correlations $R_z$ with each other, the residuals, X*, will be

$$X^* = X - Z R_z^{-1} R_{xz}'$$

To find the matrix of partial correlations, R*, where the effect of a number of the Z variables has been removed, just express equation 5.9 in matrix form. First find the residual covariances, C*, and then divide these by the square roots of the residual variances (the diagonal elements of C*):

$$\begin{aligned} C^* &= R - R_{xz}R_z^{-1}R_{xz}' \\ R^* &= \left(\sqrt{diag(C^*)}\right)^{-1} C^* \left(\sqrt{diag(C^*)}\right)^{-1} \end{aligned} \tag{5.10}$$
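Equation 5.10 translates almost line for line into base R. The sketch below applies it to the five-variable correlation matrix listed in Table 5.3; `partial_from_R` is a hypothetical helper written for this illustration and is not the partial.r function itself.

```r
R.mat <- matrix(c(1.00, 0.56, 0.48, 0.40, 0.32,
                  0.56, 1.00, 0.42, 0.35, 0.28,
                  0.48, 0.42, 1.00, 0.30, 0.24,
                  0.40, 0.35, 0.30, 1.00, 0.20,
                  0.32, 0.28, 0.24, 0.20, 1.00), 5, 5,
                dimnames = list(paste0("V", 1:5), paste0("V", 1:5)))

# partial_from_R() is a hypothetical helper, not the partial.r function itself
partial_from_R <- function(R, x, z) {
  C <- R[x, x] - R[x, z] %*% solve(R[z, z]) %*% R[z, x]   # residual covariances C*
  d <- 1 / sqrt(diag(C))                                  # 1 / residual sd
  C * outer(d, d)                                         # rescale to correlations R*
}

partials <- round(partial_from_R(R.mat, 1:3, 4:5), 2)
partials
```

The rounded result reproduces the partial correlations shown in Table 5.3 (0.46, 0.38, and 0.32).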

Consider the correlation matrix of five variables seen in Table 5.3. The partial correlations of the first three with the effect of the last two removed are found using the partial.r function.

Table 5.3 Using partial.r to find a matrix of partial correlations

> R.mat
     V1   V2   V3   V4   V5
V1 1.00 0.56 0.48 0.40 0.32
V2 0.56 1.00 0.42 0.35 0.28
V3 0.48 0.42 1.00 0.30 0.24
V4 0.40 0.35 0.30 1.00 0.20
V5 0.32 0.28 0.24 0.20 1.00
> partial.r(R.mat,c(1:3),c(4:5)) #specify the matrix for input, and the columns for the X and Z variables
     V1   V2   V3
V1 1.00 0.46 0.38
V2 0.46 1.00 0.32
V3 0.38 0.32 1.00

The semi-partial correlation, also known as the part-correlation, is the correlation between $x_i$ and y removing the effect of the other $x_j$ from the predictor, $x_i$, but not from the criterion, y. It is just

$$r_{(x_i.x_j)(y)} = \frac{r_{x_iy} - r_{x_ix_j}r_{x_jy}}{\sqrt{1 - r_{x_ix_j}^2}} \tag{5.11}$$
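The residual interpretation of Equations 5.9 and 5.11 can be verified with simulated data. In this sketch the variable names and effect sizes are hypothetical; the point is only that correlating residuals gives the same answers as the formulas.

```r
set.seed(11)                         # arbitrary seed
n  <- 300
xj <- rnorm(n)                       # variable to be controlled
xi <- 0.6 * xj + rnorm(n)            # predictor, correlated with xj
y  <- 0.4 * xi + 0.5 * xj + rnorm(n) # criterion

res_xi <- resid(lm(xi ~ xj))         # xi with xj removed
res_y  <- resid(lm(y  ~ xj))         # y with xj removed

partial_resid <- cor(res_xi, res_y)  # partial correlation
semi_resid    <- cor(res_xi, y)      # semi-partial (part) correlation

r_iy <- cor(xi, y); r_ij <- cor(xi, xj); r_jy <- cor(xj, y)
partial_formula <- (r_iy - r_ij * r_jy) / sqrt((1 - r_ij^2) * (1 - r_jy^2))  # Eq. 5.9
semi_formula    <- (r_iy - r_ij * r_jy) / sqrt(1 - r_ij^2)                    # Eq. 5.11

rbind(partial = c(partial_resid, partial_formula),
      semi    = c(semi_resid, semi_formula))
```

The semi-partial correlation is never larger in absolute value than the partial correlation, because its denominator omits the factor $\sqrt{1 - r_{yx_j}^2} \le 1$.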

5.3.1 Alternative interpretations of the partial correlation


Partial correlations are used when arguing that the effect of $x_i$ on y either does or does not remain when other variables, $x_j$, are statistically controlled. That is, in Table 5.3, the correlation between V1 and V2 is very high, even when the effects of V4 and V5 are removed. But this interpretation requires that each variable is measured without error. An alternative model that corrects for error of measurement (unreliability) would show that when the error free parts of V4 and V5 are used as covariates, the partial correlation between V1 and V2 becomes 0. This issue will be discussed in much more detail when considering models of reliability as well as factor analysis and structural equation modeling.

5.4 Alternative regression techniques


That the linear model can be used with categorical predictors has already been discussed. Generalizations of the linear model to outcomes that are not normally distributed fall under the class of the generalized linear model and can be fit using the glm function. One of the most common extensions is to the case of dichotomous outcomes (pass or fail, survive or die) which may be predicted using logistic regression. Another generalization is to non-normally distributed count data or rate data where either Poisson regression or negative binomial regression is used. These models are solved by iterative maximum likelihood procedures rather than ordinary least squares as used in the linear model.

The need for these generalizations is that the normal theory of the linear model is inappropriate for such dependent variables (e.g., what is the meaning of a predicted probability higher than 1 or less than 0?). The various generalizations of the linear model transform the dependent variable in some way so as to make linear changes in the predictors lead to linear changes in the transformed dependent variable. For more complete discussions of when to apply the linear model versus generalizations of these models, consult Cohen et al. (2003) or Gardner et al. (1995).

5.4.1 Logistic regression


Consider, for example, the case of a binary outcome variable. Because the observed values can only be 0 or 1, it is necessary to predict the probability of the score rather than the score itself. But even so, probabilities are bounded (0,1) so regression estimates less than 0 or greater than 1 are meaningless. A solution is to analyze not the data themselves, but rather a monotonic transformation of the probabilities, the logistic function:

$$p(Y|X) = \frac{1}{1 + e^{-(\beta_0 + \beta x)}}.$$

Using deviation scores, if the likelihood, p(y), of observing some binary outcome, y, is a continuous function of a predictor set, X, where each column of X, $x_i$, is related to the outcome probability with a logistic function where $\beta_0$ is the predicted intercept and $\beta_i$ is the effect of $x_i$

$$p(y|x_1 \ldots x_i \ldots x_n) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n)}}$$

and therefore, the likelihood of not observing y, $p(\tilde{y})$, given the same predictor set is

$$p(\tilde{y}|x_1 \ldots x_i \ldots x_n) = 1 - \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n)}} = \frac{e^{-(\beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n)}}{1 + e^{-(\beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n)}}$$

then the odds (the ratio of the probability of observing y to that of not observing y) is

$$\frac{p(y|x_1 \ldots x_i \ldots x_n)}{p(\tilde{y}|x_1 \ldots x_i \ldots x_n)} = \frac{1}{e^{-(\beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n)}} = e^{\beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n}.$$

Thus, the logarithm of the odds (the log odds) is a linear function of the $x_i$:

$$\ln(odds) = \beta_0 + \beta_1x_1 + \ldots \beta_ix_i + \ldots \beta_nx_n = \beta_0 + \beta X \tag{5.12}$$

Consider the probability of being a college graduate given the predictors of age and several measures of ability. The data set sat.act has a measure of education (0 = not yet finished high school, ..., 5 = have a graduate degree). Converting this to a dichotomous score (education > 3) to identify those who have finished college or not, and then predicting this variable by a logistic regression using the glm function shows that age is positively related to the probability of being a college graduate (not an overly surprising result) as is a higher ACT (American College Testing program) score. The results are expressed as changes in the logarithm of the odds for unit changes in the predictors. Expressing these as odds ratios may be done by taking the anti-log (i.e., the exponential) of the parameters. The confidence intervals of the parameters or of the odds ratios may be found by using the confint function (Table 5.4).
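The same steps can be tried without the sat.act data. The sketch below simulates a binary outcome from a known logistic model; the variable names (`grad`, `age`, `act`) and the generating coefficients are hypothetical. It uses `confint.default`, which gives Wald intervals, as a simpler stand-in for the profile-likelihood intervals that `confint` returns for glm objects.

```r
set.seed(7)                                        # arbitrary seed
n    <- 400
age  <- round(runif(n, 18, 60))                    # hypothetical predictors
act  <- round(rnorm(n, 28, 5))
p    <- plogis(-7 + 0.2 * age + 0.05 * act)        # true model on the log-odds scale
grad <- rbinom(n, 1, p)                            # observed 0/1 outcome
sim  <- data.frame(grad, age, act)

fit <- glm(grad ~ age + act, family = binomial, data = sim)
coef(fit)                               # change in log odds per unit change
odds_ratios <- exp(coef(fit))           # the anti-log gives odds ratios
ci_log_odds <- confint.default(fit)     # Wald intervals on the log-odds scale
ci_odds     <- exp(ci_log_odds)         # and on the odds-ratio scale

# The log odds of the fitted probabilities are linear in the predictors (Eq. 5.12)
eta <- predict(fit, type = "link")
all.equal(unname(qlogis(fitted(fit))), unname(eta))
```

An odds ratio of, say, 1.2 for age would mean that each additional year multiplies the odds of the outcome by 1.2, holding the other predictor constant.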

Table 5.4 An example of logistic regression using the glm function. The resulting coefficients are the parameters of the logistic model expressed in the logarithm of the odds. They may be converted to odds ratios by taking the exponential of the parameters. The same may be done with the confidence intervals of the parameters and of the odds ratios.

> data(sat.act)
> college <- (sat.act$education > 3) +0 #convert to a binary variable
> College <- data.frame(college,sat.act)
> logistic.model <- glm(college~age+ACT,family=binomial,data=College)
> summary(logistic.model)

Call:
glm(formula = college ~ age + ACT, family = binomial, data = College)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-3.8501 -0.6105 -0.4584  0.5568  1.7715

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.78855    0.79969  -9.739   <2e-16 ***
age          0.23234    0.01912  12.149   <2e-16 ***
ACT          0.05590    0.02197   2.544   0.0109 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 941.40 on 699 degrees of freedom
Residual deviance: 615.32 on 697 degrees of freedom
AIC: 621.32

Number of Fisher Scoring iterations: 5

> round(exp(coef(logistic.model)),2)
(Intercept)         age         ACT
       0.00        1.26        1.06
> round(exp(confint(logistic.model)),digits=3)
            2.5 % 97.5 %
(Intercept) 0.000  0.002
age         1.217  1.312
ACT         1.014  1.105

5.4.2 Poisson regression, quasi-Poisson regression, and negative-binomial regression

If the underlying process is thought to be binary with a low probability of one of the two alternatives (e.g., scoring a goal in a football tournament, speaking versus not speaking in a classroom, becoming sick or not, missing school for a day, dying from being kicked by a horse, a flying bomb hit in a particular area, a phone trunk line being in use, etc.) sampled over a number of trials and the measure is the discrete counts (e.g., 0, 1, ... n = number of responses) of the less likely alternative, one appropriate distributional model is the Poisson. The Poisson is the limiting case of a binomial over N trials with probability p as N becomes large and p becomes small, with $\lambda = Np$. For a random variable, Y, the probability that it takes on a particular value, y, is

$$p(Y = y) = \frac{e^{-\lambda}\lambda^y}{y!}$$

where both the expectation (mean) and variance of Y are

$$E(Y) = var(Y) = \lambda \tag{5.13}$$

and y factorial is $y! = y(y-1)(y-2)\cdots 2 \cdot 1$.

The sum of independent Poisson variables is itself distributed as a Poisson variable, so it is possible to aggregate data across an independent grouping variable. Poisson regression models the mean for Y by modeling $\lambda$ as an exponential function of the predictor set ($x_i$)

$$E(Y) = \lambda = e^{\alpha + \beta_1x_1 + \cdots + \beta_px_p}$$

and the log of the mean will thus be a linear function of the predictors.

Several example data sets are available in R to demonstrate the advantages of Poisson regression over simple linear regression. epil in MASS reports the number of epileptic seizures before and after administration of an anti-seizure medication or a placebo as a function of age and other covariates; quine (also in MASS) reports the rate of absenteeism in a small town in Australia as a function of culture, age, sex, and learning ability.

In the Poisson model, the mean has the same value as the variance (Equation 5.13). However, overdispersed data have larger variances than expected from the Poisson model. Examples of such data include the number of violent episodes of psychiatric patients (Gardner et al., 1995) or the number of seals on a beach pull out (Ver Hoef and Boveng, 2007). Such data should be modeled using the negative binomial or an overdispersed Poisson model (Gardner et al., 1995). The over-dispersed Poisson model adds an additional parameter, $\phi$, to the Poisson variance model:

$$var(Y) = \phi\lambda.$$

These generalizations of the linear model make use of the glm function with the error family, and its associated link function, appropriate for the particular model. Thus, logistic regression uses the binomial family (Table 5.4) and Poisson regression uses the poisson family with its logarithmic link (Table 5.5). Negative binomial regression may be done using the glm.nb function from the MASS package.
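The consequences of overdispersion can be seen with the quine data mentioned above. This sketch assumes the MASS package is installed (it is distributed with R as a recommended package); the particular additive model is an illustrative choice, not an analysis from the text.

```r
library(MASS)                      # provides the quine data and glm.nb

pois  <- glm(Days ~ Eth + Sex + Age + Lrn, family = poisson, data = quine)
quasi <- glm(Days ~ Eth + Sex + Age + Lrn, family = quasipoisson, data = quine)
negb  <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine)

# Under the Poisson model the dispersion is fixed at 1; the quasi-Poisson
# model estimates it from the data.
dispersion <- summary(quasi)$dispersion
dispersion

# Poisson and quasi-Poisson share coefficients; the standard errors grow
# once overdispersion is allowed for.
cbind(poisson = coef(summary(pois))[, "Std. Error"],
      quasi   = coef(summary(quasi))[, "Std. Error"],
      negbin  = coef(summary(negb))[, "Std. Error"])
AIC(pois, negb)                    # AIC is not defined for the quasi family
```

A dispersion estimate well above 1 is the signal that the plain Poisson standard errors are too small.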

Table 5.5 Using the generalized linear model function glm to do Poisson regression for the effect of an anti-seizure drug on epilepsy attacks. The data are from the epil data set in MASS. Compare this analysis with a simple linear model or with a linear model of the log transformed data. Note that the effect of the drug in the linear model is not statistically different from zero, but is in the Poisson regression.

> data(epil)
> summary(glm(y~trt+base,data=epil,family=poisson))

Call:
glm(formula = y ~ trt + base, family = poisson, data = epil)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-4.6157 -1.5080 -0.4681  0.4374 12.4054

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.278079   0.040709  31.396  < 2e-16 ***
trtprogabide -0.223093   0.046309  -4.817 1.45e-06 ***
base          0.021754   0.000482  45.130  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2517.83 on 235 degrees of freedom
Residual deviance:  987.27 on 233 degrees of freedom
AIC: 1759.2

Number of Fisher Scoring iterations: 5

> summary(lm(y~trt+base,data=epil))

lm(formula = y ~ trt + base, data = epil)

Residuals:
      Min        1Q    Median        3Q       Max
-19.40019  -3.29228   0.02348   2.11521  58.88226

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.27396    0.96814  -2.349   0.0197 *
trtprogabide -0.91233    1.04514  -0.873   0.3836
base          0.35258    0.01958  18.003   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.017 on 233 degrees of freedom
Multiple R-squared: 0.582, Adjusted R-squared: 0.5784
F-statistic: 162.2 on 2 and 233 DF, p-value: < 2.2e-16

5.4.3 Using multiple regression for circular data

Some variables show a cyclical pattern over periods of hours, days, months or years. In psychology perhaps the best example is the diurnal rhythm of mood and energy. Energetic arousal is typically low in the early morning, rises to a peak sometime between 12:00 and 16:00, and then decreases during the evening. Such rhythms can be described in terms of their period and their phase. If the period of a rhythm is about 24 hours, it is said to be circadian. The acrophase of a variable is that time of day when the variable reaches its maximum. A great deal of research has shown that people differ in the time of day at which they achieve their acrophase for variables ranging from cognitive performance (Revelle, 1993; Revelle et al., 1980) to positive affect (Rafaeli et al., 2007; Thayer et al., 1988). (However, for some measures, such as body temperature, the minimum is a more precise measure of phase than is the maximum (Baehr et al., 2000).)

If we know the acrophase, we can use circular statistics to find the mean and correlation of these variables with other circular variables (3.4.1). The acrophase itself can be estimated using linear regression, not of the raw data predicted by time of day, but rather by multiple regression using the sine and cosine of time of day (expressed in radians).

Consider four different emotion variables, Energetic Arousal, Positive Affect, Tense Arousal and Negative Affect. Assume that all four of these variables show a diurnal rhythm, but differ in their phase (Figure 5.3). Consider the example data set created in Table 5.6. Four curves are created (top panel of Figure 5.3) with different phases, but then have error added to them (lower panel of Figure 5.3). The cosinor function estimates the phase angle by fitting each variable with multiple regression where the predictors are $\cos(time \cdot 2\pi/24)$ and $\sin(time \cdot 2\pi/24)$. The resulting $\beta$ weights are then transformed into phase angles (in radians) by

$$\phi = \tan^{-1}\left(\frac{\beta_{sin}}{\beta_{cos}}\right), \quad \text{with} \quad \cos\phi = \frac{\beta_{cos}}{\sqrt{\beta_{cos}^2 + \beta_{sin}^2}}.$$

This result may be transformed back to hours by $phase = 24\phi/(2\pi)$. Other packages that use circular statistics are the circular and CircStats packages.
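The regression on the cosine and sine of time can be sketched with lm alone. The helper `cosinor_lm` below is hypothetical and written only for this illustration; it is not the cosinor function and does not reproduce its output format. The curves are simulated in the same way as in Table 5.6, and `atan2` is used so that the quadrant of the phase angle is resolved from the signs of both weights.

```r
# cosinor_lm() is a hypothetical helper, not the cosinor function itself
cosinor_lm <- function(y, time, period = 24) {
  w   <- 2 * pi / period
  fit <- lm(y ~ cos(w * time) + sin(w * time))
  b   <- coef(fit)                              # intercept, cosine, sine weights
  phi <- atan2(b[3], b[2])                      # phase angle in radians
  unname((phi * period / (2 * pi)) %% period)   # acrophase in hours, 0 to 24
}

time <- 1:24
nt   <- 4
pure <- matrix(time, 24, nt)
pure <- cos((pure + col(pure) * nt) * pi / 12)  # four curves, different phases
colnames(pure) <- c("NegA", "TA", "PA", "EA")

set.seed(42)
noisy <- pure + rnorm(24 * nt) / 2              # the same curves with noise added

phase_pure  <- apply(pure,  2, cosinor_lm, time = time)
phase_noisy <- apply(noisy, 2, cosinor_lm, time = time)
round(rbind(pure = phase_pure, noisy = phase_noisy), 2)
```

For the noise-free curves the recovered acrophases are 20, 16, 12, and 8 hours; the noisy estimates scatter around those values.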

Table 5.6 Many emotional variables show a diurnal rhythm. Here four variables are simulated in their pure form, and then contaminated by noise. Phase is estimated by the cosinor function.

> set.seed(42)
> nt = 4
> time <- seq(1:24)
> pure <- matrix(time,24,nt)
> pure <- cos((pure + col(pure)*nt)*pi/12)
> diurnal <- data.frame(time,pure)
> noisy <- pure + rnorm(24*nt)/2
> circadian <- data.frame(time,noisy)
> colnames(circadian) <- colnames(diurnal) <- c("Time", "NegA","TA","PA","EA")
> p <- cosinor(diurnal)
> n <- cosinor(circadian)
> round(data.frame(phase=p[,1],estimate=n[,1],fit=n[,2]),2)
     phase estimate  fit
NegA    20    20.59 0.61
TA      16    16.38 0.76
PA      12    12.38 0.81
EA       8     8.26 0.84
> matplot(pure,type="l",xlab = "Time of Day",ylab="Intensity", main="Hypothetical emotional curves",xaxp=c(0,24,8))
> matplot(noisy,type="l",xlab = "Time of Day",ylab="Arousal", main="Noisy arousal curves",xaxp=c(0,24,8))

5.4.4 Robust regression using M estimators


Robust techniques estimate relationships trying to correct for unusual data (outliers). A number of packages include functions that apply robust techniques to estimate correlations, covariances, and linear regressions. The MASS, robust, and robustbase packages all include robust estimation procedures. Consider the stackloss data set (distributed with R in the datasets package and analyzed in MASS). A pairs.panels plot of the data suggests that three cases are extreme outliers. The robust linear regression function rlm shows a somewhat different pattern of estimates than does ordinary regression.

An interesting demonstration of the power of the human eye to estimate relationships was presented by Wainer and Thissen (1979), who show that visual displays are an important part of the data analytic enterprise. Students shown figures representing various pure cases of correlation were able to estimate the underlying correlation of contaminated data better than many of the more classic robust estimates. This is an important message: Look at your data! Do not be misled by simple (or even complex) summary statistics. The power of the eye to detect outliers, non-linearity, and just general errors should not be underestimated.
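A short comparison of ordinary and robust regression on the stackloss data is sketched below. It assumes the MASS package is installed and uses rlm with its default settings (a Huber M estimator); other choices of estimator would give somewhat different weights.

```r
library(MASS)                                   # provides rlm

ols    <- lm(stack.loss ~ ., data = stackloss)  # ordinary least squares
robust <- rlm(stack.loss ~ ., data = stackloss) # Huber M estimator (rlm default)

round(cbind(ols = coef(ols), robust = coef(robust)), 3)

# Cases that the M estimator down-weights are candidates for a closer look
w <- robust$w
which(w < 1)
```

The final weights in `robust$w` are a convenient numerical companion to the plots: cases with weights well below 1 are the ones the robust fit treats as unusual.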

[Figure 5.3: two panels, "Hypothetical arousal curves" (top) and "Noisy arousal curves" (bottom), each plotting Arousal against Time of Day.]

Fig. 5.3 Some psychological variables have a diurnal rhythm. The phase of the rhythm may be estimated using the cosinor function using multiple regression of the sine and cosine of the time of day. The top panel shows four diurnal rhythms with acrophases of 8, 12, 16, and 20. The lower panel plots the same data, but with random noise added to the signal. The corresponding phases estimated using cosinor are 8.3, 12.4, 16.4 and 20.6.
