Chapter 5
Multiple Correlation and Multiple Regression
The previous chapter considered how to determine the relationship between two variables and how to predict one from the other. The general solution was to consider the ratio of the covariance between two variables to the variance of the predictor variable (regression) or the ratio of the covariance to the square root of the product of the variances (correlation). This solution may be generalized to the problem of how to predict a single variable from the weighted linear sum of multiple variables (multiple regression) or to measure the strength of this relationship (multiple correlation). As part of the problem of finding the weights, the concepts of partial covariance and partial correlation will be introduced. To do all of this will require finding the variance of a composite score, and the covariance of this composite with another score, which might itself be a composite. Much of psychometric theory is merely an extension, an elaboration, or a generalization of these concepts. Almost all tests are composites of items or subtests. An understanding of how to decompose test variance into its component parts, and conversely, of how to analyze tests as composites of items, allows us to analyze the meaning of tests. But tests are not merely composites of items. Tests relate to other tests. A deep appreciation of the basic Pearson correlation coefficient facilitates an understanding of its generalization to multiple and partial correlation, to factor analysis, and to questions of validity.
(5.1)
Generalizing 5.1 to the case of n xs, the composite matrix of these is just ${}_N X_n$ with dimensions of N rows and n columns. The matrix of variances and covariances of the individual items of this composite is written as S as it is a sample estimate of the population variance-covariance matrix, $\Sigma$. It is perhaps helpful to view S in terms of its elements, n of which are variances and the remaining $n^2 - n$ of which are covariances.
The diagonal of S = diag(S) is just the vector of individual variances. The trace of S is the sum of the diagonals and will be used a great deal when considering how to estimate reliability. It is convenient to represent the sum of all of the elements in the matrix, S, as the variance of the composite:

$$V_X = \mathbf{1}'\frac{X'X}{N-1}\mathbf{1} = \frac{\mathbf{1}'X'X\mathbf{1}}{N-1}.$$
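The sum of all of the elements of S may be verified directly in R. A minimal sketch, using hypothetical simulated deviation scores rather than data from the text:

# the variance of a composite is the sum of all the elements of the
# item variance-covariance matrix, 1' X'X 1 / (N-1)
set.seed(17)                          # hypothetical data for illustration
X <- matrix(rnorm(100 * 4), ncol = 4) # N = 100 observations, n = 4 items
X <- scale(X, scale = FALSE)          # put the items in deviation-score form
S <- cov(X)                           # item variances and covariances
sum(diag(S))                          # tr(S): the sum of the item variances
sum(S)                                # V_X: the variance of the composite
var(rowSums(X))                       # the same quantity, computed directly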
In the standardized, two-predictor case, $\hat{y} = \beta_1 x_1 + \beta_2 x_2$, the least squares weights must satisfy the two normal equations

$$r_{x_1y} = \beta_1 + r_{x_1x_2}\beta_2$$
$$r_{x_2y} = r_{x_1x_2}\beta_1 + \beta_2 \tag{5.2}$$
We can directly solve these two equations by adding and subtracting terms to the two such that we end up with a solution to the first in terms of $\beta_1$ and to the second in terms of $\beta_2$:

$$\beta_1 = r_{x_1y} - r_{x_1x_2}\beta_2$$
$$\beta_2 = r_{x_2y} - r_{x_1x_2}\beta_1 \tag{5.3}$$

Substituting the second row of (5.3) into the first row, and vice versa, we find

$$\beta_1 = r_{x_1y} - r_{x_1x_2}(r_{x_2y} - r_{x_1x_2}\beta_1)$$
$$\beta_2 = r_{x_2y} - r_{x_1x_2}(r_{x_1y} - r_{x_1x_2}\beta_2)$$

Collecting terms and rearranging

$$\beta_1 - r_{x_1x_2}^2\beta_1 = r_{x_1y} - r_{x_1x_2}r_{x_2y}$$
$$\beta_2 - r_{x_1x_2}^2\beta_2 = r_{x_2y} - r_{x_1x_2}r_{x_1y}$$

leads to

$$\beta_1 = \frac{r_{x_1y} - r_{x_1x_2}r_{x_2y}}{1 - r_{x_1x_2}^2}, \qquad \beta_2 = \frac{r_{x_2y} - r_{x_1x_2}r_{x_1y}}{1 - r_{x_1x_2}^2} \tag{5.4}$$

Alternatively, these two equations (5.2) may be represented as the product of a vector of unknowns (the $\beta$s) and a matrix of coefficients of the predictors (the $r_{x_ix_j}$s), set equal to a vector of coefficients for the criterion (the $r_{x_iy}$s):

$$(\beta_1\ \beta_2)\begin{pmatrix} r_{x_1x_1} & r_{x_1x_2} \\ r_{x_1x_2} & r_{x_2x_2} \end{pmatrix} = (r_{x_1y}\ r_{x_2y}) \tag{5.5}$$

If we let $\beta = (\beta_1\ \beta_2)$, $R = \begin{pmatrix} r_{x_1x_1} & r_{x_1x_2} \\ r_{x_1x_2} & r_{x_2x_2} \end{pmatrix}$ and $r_{xy} = (r_{x_1y}\ r_{x_2y})$, then equation 5.5 becomes

$$\beta R = r_{xy} \tag{5.6}$$

and we can solve Equation 5.6 for $\beta$ by multiplying both sides by the inverse of R:

$$\beta = \beta R R^{-1} = r_{xy}R^{-1} \tag{5.7}$$

Similarly, if $c_{xy}$ represents the covariances of the $x_i$ with y, then the b weights may be found by

$$b = c_{xy}S^{-1}$$

and thus the predicted scores are

$$\hat{y} = \beta X = r_{xy}R^{-1}X. \tag{5.8}$$
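Equation 5.7 may be applied with a few lines of matrix algebra. A minimal sketch, using a hypothetical two-predictor correlation matrix (the values are made up for illustration):

# beta = r_xy R^{-1}: solve the normal equations for the standardized weights
R.x  <- matrix(c(1.0, 0.5,
                 0.5, 1.0), 2, 2)   # correlations among the two predictors
r.xy <- c(0.6, 0.4)                 # correlations of the predictors with y
beta <- solve(R.x, r.xy)            # same as r.xy %*% solve(R.x), since R.x is symmetric
beta
R2 <- sum(beta * r.xy)              # squared multiple correlation
R2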
The $\beta_i$ are the direct effects of the $x_i$ on y. The total effects of the $x_i$ on y are the correlations, and the indirect effects reflect the products of the correlations between the predictor variables and the direct effects of each predictor variable. Estimation of the b or $\beta$ vectors, with many diagnostic statistics of the quality of the regression, may be found using the lm function. When using categorical predictors, the linear model is also known as analysis of variance, which may be done using the anova and aov functions. When the outcome variables are dichotomous, logistic regression may be done using the generalized linear model function glm with a binomial error family. A complete discussion of the power of the generalized linear model is beyond any introductory text, and the interested reader is referred to, e.g., Cohen et al. (2003); Dalgaard (2002); Fox (2008); Judd and McClelland (1989); Venables and Ripley (2002). Diagnostic tests of the regression, including plots of the residuals versus estimated values, tests of the normality of the residuals, and identification of highly weighted subjects, are available as part of the graphics associated with the lm function.
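A minimal sketch of such an analysis, using R's built-in attitude data set rather than an example from the text:

# fit a two-predictor linear model, then inspect the standard diagnostics
mod <- lm(rating ~ complaints + learning, data = attitude)
summary(mod)              # b weights, standard errors, R^2, and the F test
op <- par(mfrow = c(2, 2))
plot(mod)                 # residuals vs fitted, Q-Q, scale-location, leverage
par(op)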
5.2.2 Interactions and product terms: the need to center the data
In psychometric applications, the main use of regression is in predicting a single criterion variable in terms of the linear sums of a predictor set. Sometimes, however, a more appropriate model is to consider that some of the variables have multiplicative effects (i.e., interact), such that the effect of x on y depends upon a third variable z. This can be examined by using the product terms of x and z. But to do so, and to avoid problems of interpretation, it is first necessary to zero center the predictors so that the product terms are not correlated with the additive terms. The default values of the scale function will center as well as standardize the scores. To just center a variable, x, use scale(x,scale=FALSE). This will preserve the units of x. scale returns a matrix but the lm function requires a data.frame as input. Thus, it is necessary to convert the output of scale back into a data.frame, as in the sketch below.
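A minimal sketch of centering before forming a product term, using hypothetical simulated predictors x and z (not data from the text):

# center the predictors, rebuild a data.frame, then fit the product-term model
set.seed(1)
df <- data.frame(x = rnorm(100, mean = 5), z = rnorm(100, mean = 10))
df$y <- df$x + df$z + df$x * df$z / 10 + rnorm(100)
centered <- data.frame(scale(df[, c("x", "z")], scale = FALSE), y = df$y)
mod <- lm(y ~ x * z, data = centered)   # x * z expands to x + z + x:z
summary(mod)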
1 Although the correlation values are enhanced to show the effect, this particular example was observed in a high stakes employment testing situation.
Table 5.1 An example of suppression is found when predicting stockbroker success from self report measures of need for achievement and anxiety. By having a suppressor variable, anxiety, the multiple R goes from .3 to .35.

> stock
            Nach Anxiety Success
achievement  1.0    -0.5     0.3
Anxiety     -0.5     1.0     0.0
Success      0.3     0.0     1.0
> mat.regress(stock,c(1,2),3)
$beta
   Nach Anxiety
    0.4     0.2
$R
Success
   0.35
$R2
Success
   0.12
Fig. 5.1 (four path diagrams: Independent Predictors, Correlated Predictors, Suppression, and Missing variable) There are at least four basic regression cases: the independent predictor case, where the $\beta_i$ are the same as the correlations; the normal, correlated predictor case, where the $\beta_i$ are found as in 5.7; the case of suppression, where a variable that does not correlate with the criterion, but does correlate with a predictor, will nevertheless have a useful $\beta_i$ weight; and the case where the model is misspecified and in fact a missing variable accounts for the correlations.
A detailed discussion of how to analyze and then plot data showing interactions between experimental variables and subject variables (e.g., manipulated positive affect and extraversion) or interactions of subject variables with each other (e.g., neuroticism and extraversion) is beyond the scope of this text and is considered in great detail by Aiken and West (1991) and Cohen et al. (2003), and in less detail in an online appendix to a chapter on experimental approaches to personality, Revelle (2007), http://personality-project.org/r/simulating-personality.html. In that appendix, simulated data are created to show additive and interactive effects. An example analysis examines the effect of Extraversion and a movie induced mood on positive affect. The regression is done using the lm function on the centered data (Table 5.2). The graphic display shows two regression lines, one for the simulated positive mood induction, the other for a neutral induction.
Table 5.2 Linear model analysis of simulated data showing an interaction between the personality dimension of extraversion and a movie based mood induction. Adapted from Revelle (2007).

> # a great deal of code to simulate the data
> mod1 <- lm(PosAffect ~ extraversion*reward, data = centered.affect.data) #look for interactions
> print(summary(mod1, digits = 2))

Call:
lm(formula = PosAffect ~ extraversion * reward, data = centered.affect.data)

Residuals:
   Min     1Q Median     3Q    Max
-2.062 -0.464  0.083  0.445  2.044

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.8401     0.0957    -8.8    6e-14 ***
extraversion          -0.0053     0.0935    -0.1     0.95
reward1                1.6894     0.1354    12.5   <2e-16 ***
extraversion:reward1   0.2529     0.1271     2.0     0.05 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.68 on 96 degrees of freedom
Multiple R-squared: 0.63, Adjusted R-squared: 0.62
F-statistic: 54 on 3 and 96 DF, p-value: <2e-16
Fig. 5.2 (plot titled "Simulated interaction": Positive Affect as a function of Extraversion, with separate regression lines for the Positive and Neutral mood inductions) The (simulated) effect of extraversion and movie induced mood on positive affect. Adapted from Revelle (2007). Detailed code for plotting interaction graphs is available in the appendix.
The confidence interval of $R^2$ is, of course, a function of the variance of $R^2$, which is (taken from Cohen et al. (2003) and Olkin and Finn (1995))

$$SE^2_{R^2} = \frac{4R^2(1-R^2)^2(N-k-1)^2}{(N^2-1)(N+3)}.$$

Because multiple R is partitioning the observed variance into modeled and residual variance, testing the hypothesis that the multiple R is zero may be done by analysis of variance and leads to an F ratio with k and (N − k − 1) degrees of freedom:

$$F = \frac{R^2(N-k-1)}{(1-R^2)k}.$$

The standard error of each $\beta_i$ reflects both the residual variance and the extent to which $x_i$ is itself predictable from the other predictors:

$$SE_{\beta_i} = \sqrt{\frac{1-R^2}{(N-k-1)(1-R_i^2)}}$$

where $R_i^2$ is the multiple correlation of $x_i$ on all the other $x_j$ variables. (This term, the squared multiple correlation, is used in estimating communalities in factor analysis, see 6.2.1. It may be found by the smc function.)
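A minimal sketch of these formulas in R, using hypothetical values of R^2 = .25 with k = 3 predictors and N = 100 cases (the numbers are made up for illustration):

# F test for R^2 = 0 and the Olkin & Finn (1995) standard error of R^2
R2 <- 0.25
k  <- 3
N  <- 100
F.ratio <- (R2 * (N - k - 1)) / ((1 - R2) * k)           # F with k and N-k-1 df
p.value <- pf(F.ratio, k, N - k - 1, lower.tail = FALSE)
se.R2   <- sqrt(4 * R2 * (1 - R2)^2 * (N - k - 1)^2 /
                ((N^2 - 1) * (N + 3)))
round(c(F = F.ratio, p = p.value, se.R2 = se.R2), 4)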
A partial variance is a variance with the second variable removed. Just as in simple regression, where b is a covariance divided by a variance and a correlation is a covariance divided by the square root of the product of two variances, so is the case in multiple correlation, where the $\beta_i$ is a partial covariance divided by a partial variance and a partial correlation is a partial covariance divided by the square root of the product of two partial variances. The partial correlation between $x_i$ and y with the effect of $x_j$ removed is

$$r_{(x_i.x_j)(y.x_j)} = \frac{r_{x_iy} - r_{x_ix_j}r_{x_jy}}{\sqrt{(1-r_{x_ix_j}^2)(1-r_{x_jy}^2)}} \tag{5.9}$$
Compare this to 5.4, which is the formula for the $\beta$ weight. Given a data matrix, X, and a matrix of covariates, Z, with correlations $R_{xz}$ with X, and correlations $R_z$ with each other, the residuals, X*, will be

$$X^* = X - R_{xz}R_z^{-1}Z$$

To find the matrix of partial correlations, R*, where the effect of a number of the Z variables has been removed, just express equation 5.9 in matrix form. First find the residual covariances, C*, and then divide these by the square roots of the residual variances (the diagonal elements of C*):

$$C^* = R - R_{xz}R_z^{-1}R_{xz}'$$
$$R^* = \left(\sqrt{diag(C^*)}\right)^{-1} C^* \left(\sqrt{diag(C^*)}\right)^{-1} \tag{5.10}$$
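Equation 5.10 may be carried out directly with a few lines of matrix algebra. A minimal sketch, using a hypothetical four-variable correlation matrix in which the fourth variable is partialled from the first three:

# residual covariances C* and the resulting partial correlations R*
R <- matrix(c(1.0, 0.5, 0.4, 0.3,
              0.5, 1.0, 0.4, 0.3,
              0.4, 0.4, 1.0, 0.3,
              0.3, 0.3, 0.3, 1.0), 4, 4)
x <- 1:3                                 # the X variables
z <- 4                                   # the Z variable to be partialled out
Rxz <- R[x, z, drop = FALSE]             # correlations of X with Z
Rz  <- R[z, z, drop = FALSE]             # correlations of Z with itself
Cstar <- R[x, x] - Rxz %*% solve(Rz) %*% t(Rxz)   # residual covariances, C*
Rstar <- cov2cor(Cstar)                  # divide by sqrt of residual variances
round(Rstar, 2)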
Consider the correlation matrix of five variables seen in Table 5.3. The partial correlations of the first three with the effect of the last two removed are found using the partial.r function.
Table 5.3 Using partial.r to find a matrix of partial correlations

> R.mat
     V1   V2   V3   V4   V5
V1 1.00 0.56 0.48 0.40 0.32
V2 0.56 1.00 0.42 0.35 0.28
V3 0.48 0.42 1.00 0.30 0.24
V4 0.40 0.35 0.30 1.00 0.20
V5 0.32 0.28 0.24 0.20 1.00
> partial.r(R.mat,c(1:3),c(4:5))  #specify the matrix for input, and the columns for the X and Z variables
     V1   V2   V3
V1 1.00 0.46 0.38
V2 0.46 1.00 0.32
V3 0.38 0.32 1.00
The semi-partial correlation, also known as the part correlation, is the correlation between $x_i$ and y removing the effect of the other $x_j$ from the predictor, $x_i$, but not from the criterion, y. It is just

$$r_{(x_i.x_j)(y)} = \frac{r_{x_iy} - r_{x_ix_j}r_{x_jy}}{\sqrt{1-r_{x_ix_j}^2}} \tag{5.11}$$
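A minimal sketch of equations 5.9 and 5.11, using three hypothetical correlations (the values are made up for illustration):

# partial and semi-partial correlation of x with y, removing x_j
r.xy <- 0.50    # predictor with criterion
r.xj <- 0.30    # predictor with the variable to be partialled
r.jy <- 0.40    # partialled variable with criterion
partial      <- (r.xy - r.xj * r.jy) / sqrt((1 - r.xj^2) * (1 - r.jy^2))
semi.partial <- (r.xy - r.xj * r.jy) / sqrt(1 - r.xj^2)
c(partial = partial, semi.partial = semi.partial)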
Using deviation scores, if the likelihood, p(y), of observing some binary outcome, y, is a continuous function of a predictor set, X, where each column of X, $x_i$, is related to the outcome probability with a logistic function, where $\beta_0$ is the predicted intercept and $\beta_i$ is the effect of $x_i$,

$$p(y|x_1 \dots x_i \dots x_n) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_i x_i + \dots + \beta_n x_n)}}$$

and therefore the likelihood of not observing y, $p(\bar{y})$, given the same predictor set, is

$$p(\bar{y}|x_1 \dots x_i \dots x_n) = 1 - \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}} = \frac{e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}},$$

then the odds ratio of observing y to not observing y is

$$\frac{p(y|x_1 \dots x_i \dots x_n)}{p(\bar{y}|x_1 \dots x_i \dots x_n)} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_i x_i + \dots + \beta_n x_n}.$$

Thus, the logarithm of the odds ratio (the log odds) is a linear function of the $x_i$:

$$\ln(odds) = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i + \dots + \beta_n x_n = \beta_0 + \beta X \tag{5.12}$$
Consider the probability of being a college graduate given the predictors of age and several measures of ability. The data set sat.act has a measure of education (0 = not yet finished high school, ..., 5 = have a graduate degree). Converting this to a dichotomous score (education > 3) to identify those who have finished college or not, and then predicting this variable by a logistic regression using the glm function, shows that age is positively related to the probability of being a college graduate (not an overly surprising result), as is a higher ACT (American College Testing program) score. The results are expressed as changes in the logarithm of the odds for unit changes in the predictors. Expressing these as odds ratios may be done by taking the anti-log (i.e., the exponential) of the parameters. The confidence intervals of the parameters or of the odds ratios may be found by using the confint function (Table 5.4).
Table 5.4 An example of logistic regression using the glm function. The resulting coefficients are the parameters of the logistic model expressed in the logarithm of the odds. They may be converted to odds ratios by taking the exponential of the parameters. The same may be done with the confidence intervals of the parameters and of the odds ratios.

> data(sat.act)
> college <- (sat.act$education > 3) + 0 #convert to a binary variable
> College <- data.frame(college, sat.act)
> logistic.model <- glm(college ~ age + ACT, family = binomial, data = College)
> summary(logistic.model)

Call:
glm(formula = college ~ age + ACT, family = binomial, data = College)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-3.8501 -0.6105 -0.4584  0.5568  1.7715

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.78855    0.79969  -9.739   <2e-16 ***
age          0.23234    0.01912  12.149   <2e-16 ***
ACT          0.05590    0.02197   2.544   0.0109 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 941.40 on 699 degrees of freedom
Residual deviance: 615.32 on 697 degrees of freedom
AIC: 621.32

Number of Fisher Scoring iterations: 5

> round(exp(coef(logistic.model)),2)
(Intercept)         age         ACT
       0.00        1.26        1.06
> round(exp(confint(logistic.model)),digits=3)
            2.5 % 97.5 %
(Intercept) 0.000  0.002
age         1.217  1.312
ACT         1.014  1.105
If the outcome, y, is a count of the number of events, it may be modeled as a Poisson variable with rate parameter $\lambda$, for which the mean and the variance are the same:

$$p(Y = y|\lambda) = \frac{\lambda^y e^{-\lambda}}{y!}, \qquad E(Y) = var(Y) = \lambda. \tag{5.13}$$
The sum of independent Poisson variables is itself distributed as a Poisson variable, so it is possible to aggregate data across an independent grouping variable. Poisson regression models the mean of Y by modeling $\lambda$ as an exponential function of the predictor set ($x_i$),

$$E(Y) = \lambda = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}$$

and the log of the mean will thus be a linear function of the predictors. Several example data sets are available in R to demonstrate the advantages of Poisson regression over simple linear regression. epil in MASS reports the number of epileptic seizures
before and after administration of an anti-seizure medication or a placebo as a function of age and other covariates; quine (also in MASS) reports the rate of absenteeism in a small town in Australia as a function of culture, age, sex, and learning ability. In the Poisson model, the mean has the same value as the variance (Equation 5.13). However, overdispersed data have larger variances than expected from the Poisson model. Examples of such data include the number of violent episodes of psychiatric patients (Gardner et al., 1995) or the number of seals hauled out on a beach (Ver Hoef and Boveng, 2007). Such data should be modeled using the negative binomial or an over-dispersed Poisson model (Gardner et al., 1995). The over-dispersed Poisson model adds an additional parameter, $\phi$, to the Poisson variance model: $var(Y) = \phi\lambda$. These generalizations of the linear model make use of the glm function with a link function of the appropriate error family for the particular model. Thus, logistic regression uses the binomial family (Table 5.4) and Poisson regression uses the poisson family with its logarithmic link (Table 5.5). Negative binomial regression may be done using the glm.nb function from the MASS package.
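A minimal sketch of the negative binomial alternative, assuming the epil data set from MASS used in Table 5.5:

# fit the over-dispersed alternative to the Poisson model of Table 5.5
library(MASS)
data(epil)
nb.model <- glm.nb(y ~ trt + base, data = epil)
summary(nb.model)   # compare coefficients and standard errors with Table 5.5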
Table 5.5 Using the generalized linear model glm to do Poisson regression for the effect of an anti-seizure drug on epilepsy attacks. The data are from the epil data set in MASS. Compare this analysis with a simple linear model or with a linear model of the log transformed data. Note that the effect of the drug in the linear model is not statistically different from zero, but it is in the Poisson regression.

> data(epil)
> summary(glm(y~trt+base,data=epil,family=poisson))

Call:
glm(formula = y ~ trt + base, family = poisson, data = epil)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-4.6157 -1.5080 -0.4681  0.4374 12.4054

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.278079   0.040709  31.396  < 2e-16 ***
trtprogabide -0.223093   0.046309  -4.817 1.45e-06 ***
base          0.021754   0.000482  45.130  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2517.83 on 235 degrees of freedom
Residual deviance:  987.27 on 233 degrees of freedom
AIC: 1759.2

Number of Fisher Scoring iterations: 5

> summary(lm(y~trt+base,data=epil))

Call:
lm(formula = y ~ trt + base, data = epil)

Residuals:
      Min        1Q    Median        3Q       Max
-19.40019  -3.29228   0.02348   2.11521  58.88226

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.27396    0.96814  -2.349   0.0197 *
trtprogabide -0.91233    1.04514  -0.873   0.3836
base          0.35258    0.01958  18.003   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.017 on 233 degrees of freedom
Multiple R-squared: 0.582, Adjusted R-squared: 0.5784
F-statistic: 162.2 on 2 and 233 DF, p-value: < 2.2e-16
This result may be transformed back to hours by multiplying by $\frac{24}{2\pi}$. R packages devoted to circular statistics are the circular and CircStats packages.
Table 5.6 Many emotional variables show a diurnal rhythm. Here four variables are simulated in their pure form, and then contaminated by noise. Phase is estimated by the cosinor function.

> set.seed(42)
> nt = 4
> time <- seq(1:24)
> pure <- matrix(time,24,nt)
> pure <- cos((pure + col(pure)*nt)*pi/12)
> diurnal <- data.frame(time,pure)
> noisy <- pure + rnorm(24*nt)/2
> circadian <- data.frame(time,noisy)
> colnames(circadian) <- colnames(diurnal) <- c("Time", "NegA","TA","PA","EA")
> p <- cosinor(diurnal)
> n <- cosinor(circadian)
> round(data.frame(phase=p[,1],estimate=n[,1],fit=n[,2]),2)
     phase estimate  fit
NegA    20    20.59 0.61
TA      16    16.38 0.76
PA      12    12.38 0.81
EA       8     8.26 0.84
> matplot(pure,type="l",xlab = "Time of Day",ylab="Intensity",
    main="Hypothetical emotional curves",xaxp=c(0,24,8))
> matplot(noisy,type="l",xlab = "Time of Day",ylab="Arousal",
    main="Noisy arousal curves",xaxp=c(0,24,8))
Fig. 5.3 (two panels, hypothetical emotional curves and noisy arousal curves, each plotted against Time of Day) Some psychological variables have a diurnal rhythm. The phase of the rhythm may be estimated using the cosinor function, which applies multiple regression of the sine and cosine of the time of day. The top panel shows four diurnal rhythms with acrophases of 8, 12, 16, and 20. The lower panel plots the same data, but with random noise added to the signal. The corresponding phases estimated using cosinor are 8.3, 12.4, 16.4 and 20.6.