5. Regression



The Regression Line


◦ Regression is closely related to correlational analysis.
◦ Regression is used to make predictions about one variable from the score on
another variable.
◦ To do this, the regression line is used (the best-fitting straight line
through a set of points in a scatter diagram).
◦ For each level of X (or point on the X scale), there is a distribution of
scores on Y.
◦ In other words, we can find a mean of Y when we know the value
of X.
◦ The regression line is found by using the principle of least squares, which minimizes the
sum of squared deviations around the regression line. Let us explain.
◦ The mean is the point of least squares for any single variable. This means that the sum of
the squared deviations around the mean is smaller than the sum around any value other
than the mean.
◦ For example, consider the scores 5, 4, 3, 2, and 1. Their mean is 3, and the squared
deviations around 3 sum to 4 + 1 + 0 + 1 + 4 = 10. Around any other value the sum is larger;
around 4, for instance, it is 1 + 0 + 1 + 4 + 9 = 15.
◦ Each individual has an observed score on both X and Y. Take the example of someone
who obtained a score of 4 on X and 6 on Y.
◦ Let's pretend that we do not know this person's Y and we want to predict it.
◦ The regression equation can provide a predicted value for Y, denoted as Y′.
◦ Using the regression equation (Y′ = a + bX), we can calculate Y′ for this person.
◦ First, we need to know that Y is the dependent variable (stress level).
◦ X is the independent variable (number of cigarettes smoked).
◦ b is the slope of the line (the regression coefficient) and a is the Y-intercept.
◦ The intercept in this equation is 2. This means the line intersects the Y-axis at 2.
◦ The regression coefficient for this model is b = .67. It tells how much the predicted stress
level (Y′) increases for each one-unit increase in cigarettes smoked (X).
◦ Y′ = 2 + .67(4) = 4.68
◦ The predicted score is rarely exactly equal to the observed score; it is an approximation.
◦ A regression equation gives a predicted value Y′ for each value of X.
◦ In addition to these predicted values, there are observed values of Y.
◦ The difference between the observed (real) and the predicted values is called
the residual.
◦ Symbolically, the residual is defined as Y − Y′.
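The worked example above can be checked with a short Python sketch. The intercept (2), the slope (.67), and the scores X = 4 and Y = 6 come from the text; the function names `predict` and `residual` are illustrative assumptions, not part of the original slides.

```python
# Regression equation from the text: Y' = a + bX, with a = 2 and b = .67.
A_INTERCEPT = 2.0
B_SLOPE = 0.67


def predict(x, a=A_INTERCEPT, b=B_SLOPE):
    """Return the predicted score Y' for a given X."""
    return a + b * x


def residual(y_observed, y_predicted):
    """Return the residual, defined as Y - Y'."""
    return y_observed - y_predicted


y_pred = predict(4)        # the person who scored 4 on X
res = residual(6, y_pred)  # the same person scored 6 on Y

print(f"Y' = {y_pred:.2f}, residual = {res:.2f}")
# prints: Y' = 4.68, residual = 1.32
```

A positive residual means the observed score lies above the regression line; a negative one means it lies below.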
Try this at home: how many cigarettes would be consumed if they cost $2.00?
How to Interpret a Regression Plot
◦Regression plots are pictures that show the relationship between variables.
◦A common use of correlation is to determine the criterion validity evidence for a test or
the relationship between a test score and some well-defined criterion.
◦The association between a test of job aptitude and the criterion of actual performance on
the job is an example of criterion validity evidence.
◦The problems dealt with in studies of criterion validity evidence require one to predict
some criterion score on the basis of a predictor or test score.
◦Suppose that you want to build a test to predict how enjoyable someone will turn out to be
as a date.
◦You might expect the distribution of enjoyableness of dates to be normal.
Figure 3.7 shows the points on hypothetical scales of
dating desirability and the enjoyableness of dates. The line
through the points is the one that minimizes the squared
distance between the line and the data points. In other
words, the line is the one straight line that summarizes
more about the relationship between dating desirability
and enjoyableness than does any other straight line.
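One way to see what "minimizes the squared distance" means is to compute such a line directly. The sketch below is a minimal illustration, not the data behind Figure 3.7: the five (X, Y) pairs are made up, and `fit_line` and `sum_sq_error` are hypothetical helper names. It uses the standard least-squares formulas b = Σ(X − mean X)(Y − mean Y) / Σ(X − mean X)² and a = mean Y − b · mean X.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_sq_error(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_sq_error(xs, ys, a, b)

# Any other straight line does worse, e.g. after nudging the
# intercept or the slope away from the fitted values.
shifted = sum_sq_error(xs, ys, a + 0.5, b)
tilted = sum_sq_error(xs, ys, a, b + 0.1)

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"squared error: fitted {best:.2f}, shifted {shifted:.2f}, tilted {tilted:.2f}")
```

For these values the fitted line has a smaller sum of squared errors than either altered line, which is the sense in which it summarizes the points better than any other straight line.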
Figure 3.8 shows the hypothetical relationship
between a test score and a criterion. Using this
figure, you should be able to find the predicted
value on the criterion variable by knowing the
score on the test or the predictor.

The dashed line shows the course you should take for finding the
corresponding score. Now read the number on the criterion axis where
your line has stopped.
On the basis of the information you gained by using the test, you would thus
expect a score of 7.4 on the criterion variable.
◦ This chapter began with a discussion of a claim made in a grocery store tabloid
that poor diet causes marital problems. Actually, there was no specific
evidence that diet causes the problems, only that diet and marital difficulties are
associated.
◦ So there may be other factors that influence this relationship.
Multivariate analysis
◦ When three or more variables are considered together, for example when several predictors
are used to predict one dependent variable, multivariate analysis is required.
◦ E.g., the prediction of success in the first year of college using a linear combination of
SAT verbal and quantitative scores.
◦ Suppose we want to predict success in law school from three variables: undergraduate GPA,
rating by former professors, and age.
◦ This type of multivariate analysis is called multiple regression.
◦ The goal of the analysis is to find the linear combination of the three variables that provides the
best prediction of law school success.
◦ We find the correlation between the criterion (law school GPA) and some composite of the
predictors (undergraduate GPA plus professor rating plus age).
◦ Multiple regression is appropriate when the criterion variable is continuous (not nominal).
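The idea of finding the best linear combination of predictors can be sketched in plain Python. Everything specific here is an assumption made for illustration: the six applicants are made up, their law school GPAs are generated exactly from known weights (0.5 + 0.6·GPA + 0.1·rating + 0.01·age) so that the result can be checked, and `solve_linear_system` and `multiple_regression` are hypothetical helper names. The weights are found by solving the least-squares normal equations (XᵀX)w = Xᵀy.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Work on an augmented copy so the inputs are not modified.
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def multiple_regression(rows, ys):
    """Return [a, b1, b2, ...] that minimizes the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical applicants: (undergraduate GPA, professor rating, age).
predictors = [
    (3.0, 4, 22),
    (3.5, 3, 25),
    (2.8, 5, 30),
    (3.9, 2, 23),
    (3.2, 4, 27),
    (3.6, 5, 24),
]
# Criterion generated from known weights, for checking only (not real data).
law_gpa = [0.5 + 0.6 * g + 0.1 * r + 0.01 * age for g, r, age in predictors]

weights = multiple_regression(predictors, law_gpa)
print([round(w, 4) for w in weights])
```

Because the criterion was built without error from the stated weights, the fit should recover them up to rounding. With real scores the recovered weights would only approximate any underlying relationship, and each prediction would carry a residual.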
Factor Analysis
◦ Another multivariate model
◦ Factor analysis is used to study the interrelationships among a set of variables without reference to
a criterion.
◦ You might think of factor analysis as a data-reduction technique. When we have responses to a
large number of items or a large number of tests, we often want to reduce all this information to
more manageable chunks.
◦ In factor analysis, we first create a matrix that shows the correlation between every variable and
every other variable.
◦ Then we find the linear combinations, or principal components, of the variables that describe as
many of the interrelationships among the variables as possible.
◦ Once the linear combinations or principal components have been found, we can find the
correlation between the original items and the factors.
◦ These correlations are called factor loadings.
◦ By examining which variables load highly on each factor, we
can start interpreting the meanings of the factors. Researchers often use
methods that help them get a clearer picture of the meaning
of the components by transforming the variables in a way that
pushes the factor loadings toward the high or the low extreme.
◦ Because these transformational methods involve rotating the
axes in the space created by the factors, they are called
methods of rotation.
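The first step described in this section, building the matrix of correlations between every variable and every other variable, can be sketched as follows. This covers only that first step; extracting the factors and rotating them are not shown. The three items and their responses are made up for illustration, and `pearson_r` and `correlation_matrix` are hypothetical helper names.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(variables):
    """Correlate every variable with every other; returns a nested dict."""
    names = list(variables)
    return {
        row: {col: pearson_r(variables[row], variables[col]) for col in names}
        for row in names
    }


# Made-up responses of five people to three items (illustration only).
items = {
    "item1": [1, 2, 3, 4, 5],
    "item2": [2, 4, 6, 8, 10],
    "item3": [5, 3, 4, 1, 2],
}

matrix = correlation_matrix(items)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

In this toy data item1 and item2 correlate perfectly and both correlate negatively with item3, so the three items carry largely overlapping information, which is the kind of redundancy factor analysis is meant to summarize.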
