
Correlation and Regression Analysis

This document provides an overview of simple linear regression and correlation analysis, explaining key concepts such as dependent and independent variables, regression equations, and the purpose of regression analysis. It details the assumptions, estimation of parameters, interpretation of slope and intercept, and includes examples and exercises for practical application. Additionally, it discusses correlation coefficients and their significance in measuring the strength of relationships between variables.

Uploaded by sudarionejie80
Copyright © All Rights Reserved

UNIT 10.

SIMPLE LINEAR REGRESSION AND CORRELATION

10.1. Things to know

1. Regression analysis is a statistical method that uses the relationship between two
or more quantitative variables so that one variable, called the dependent variable or
response variable, can be predicted from the values of the other variable(s), called the
independent variable(s) or explanatory variable(s).

2. A mathematical equation that allows us to predict values of a dependent variable from
known values of one or more independent variables is called a regression equation.

3. Purposes of Regression Analysis


i. Predicts the value of a dependent variable based on the value of at least one
independent variable.
ii. Explains the effect of the independent variables on the dependent variable.

4. Types of Regression Models

(Figure: scatter plots illustrating a positive linear relationship, a negative linear
relationship, a relationship that is not linear, and no relationship.)

5. This chapter focuses on the problem of estimating or predicting a value of a dependent
variable Y on the basis of a known measurement of an independent variable X.

6. A scatter diagram is a graph of the paired observations, with the independent variable
plotted on the horizontal axis and the dependent variable plotted on the vertical axis. It is
the easiest way to determine whether a relationship exists between the two variables.

7. A linear relationship between two variables is one that can be most accurately
represented by a straight line.

8. In this section, we consider the problem of estimating or predicting the value of a
dependent variable on the basis of a known measurement of an independent variable.

9. Although a graphical solution is sometimes used for prediction, it is much more common to
predict Y from the equation of a straight line. The general form of the equation, known as
the linear regression line equation or simple linear regression equation, is

Y = a + bX

10. For each X, the equation Y = a + bX predicts a value of Y. The estimated regression
line is defined by the equation

$\hat{Y} = a + bX$

where: $\hat{Y}$ = the predicted value of the dependent variable
a = Y intercept (the value of $\hat{Y}$ when X = 0)
b = slope of the line
a and b are the estimates of the regression parameters, calculated from the available
sample points.

Remark: With the estimated regression line equation, we can predict the value of Y that
corresponds to any given value of X.

11. Assumptions on Regression Analysis


i. The values of the independent variable X may be “fixed”, that is, X values may be
selected in advance by the researcher, or they may be obtained without the imposition
of any restriction, in which case, X is a random variable.
ii. The values of X are measured without error.
iii. The variances of the subpopulations of the dependent variable, given different values
of the independent variable, are equal.
iv. Each subpopulation of the dependent variable Y, given a value of the
independent variable X, is normally distributed.
v. The means of the subpopulations of Y all lie on the same straight line (assumption of
linearity).

12. Estimation of Parameters


Given the sample {(x_i, y_i), i = 1, 2, ..., n}, the least squares estimates of the parameters in
the regression line are:

$$ b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad a = \bar{y} - b\,\bar{x} $$

where $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$ and $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ are the means of the sample values.
Here, a is the estimate of the population Y intercept $\beta_0$, and b is the estimate of the
population slope coefficient $\beta_1$.
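The least squares formulas in item 12 can be sketched in Python as below, together with a small simulation showing a and b landing close to the population values $\beta_0$ and $\beta_1$ they estimate. This is a sketch under assumptions: the function name `least_squares`, the population values ($\beta_0 = 2$, $\beta_1 = 0.5$), the error standard deviation, the seed and the sample size are hypothetical choices made for illustration, not values from the text. The errors are drawn from a normal distribution with constant variance, in line with assumptions iii to v.

```python
import random


def least_squares(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the computational formulas of item 12."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * sum_x / n  # a = y_bar - b * x_bar
    return a, b


# Points lying exactly on y = 1 + 2x give back a = 1 and b = 2.
print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)

# Hypothetical population line and error spread (not from the text).
BETA0, BETA1, SIGMA, N = 2.0, 0.5, 1.0, 2000
rng = random.Random(42)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 20) for _ in range(N)]
# Y = beta0 + beta1*X + normal error with constant variance
ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]

a, b = least_squares(xs, ys)
print(f"a = {a:.3f} estimates beta0 = {BETA0}; b = {b:.3f} estimates beta1 = {BETA1}")
```

Working from the raw sums mirrors the hand-calculation layout of the formulas; with a large simulated sample the estimates should sit close to the assumed population values, though never exactly on them.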

13. Interpretation of the Slope and the Intercept

$\beta_0$ is the average value of Y when the value of X is zero.
$\beta_1$ measures the change in the average value of Y as a result of a one-unit change
in X.
a is the estimated average value of Y when the value of X is zero.
b is the estimated change in the average value of Y as a result of a one-unit
change in X.
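As a quick numerical illustration of these interpretations, the sketch below uses coefficients rounded to two decimals from the IQ example that follows (a = 1.24, b = 0.46; the rounding is our choice). The helper `predict` is hypothetical.

```python
# Rounded coefficients from the IQ example (two-decimal rounding is our choice).
a, b = 1.24, 0.46


def predict(x):
    """Point estimate of Y: y_hat = a + b * x."""
    return a + b * x


print(predict(0))                             # 1.24: the intercept a, fitted Y at X = 0
print(round(predict(121) - predict(120), 4))  # 0.46: change in y_hat for a one-unit rise in X
```

In the IQ example, X = 0 lies far outside the observed IQ range (110 to 138), so there a mainly positions the line and should not be read as the expected score of a student with an IQ of zero.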

14. Example: Using the data in Table 10.1.1, find the following:
a. The equation of the regression line.
b. A scatter diagram.
c. The point estimate of Y when X = 113.

Table 10.1.1: IQ Scores and Math 15 Midterm Scores of 12 College Students


Student Number IQ (X) Math 15 Score (Y)
1 110 50
2 112 56
3 118 52
4 119 59
5 122 61
6 125 53
7 127 61
8 130 58
9 132 65
10 134 59
11 136 64
12 138 68

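Solution:

n = 12, $\sum_{i=1}^{12} x_i = 110 + 112 + \dots + 138 = 1{,}503$, $\sum_{i=1}^{12} x_i^2 = 110^2 + 112^2 + \dots + 138^2 = 189{,}187$

$\sum_{i=1}^{12} y_i = 50 + 56 + \dots + 68 = 706$, $\sum_{i=1}^{12} y_i^2 = 50^2 + 56^2 + \dots + 68^2 = 41{,}862$, $\bar{x} = 125.25$, $\bar{y} = 58.833$

$\sum_{i=1}^{12} x_i y_i = 110(50) + 112(56) + \dots + 138(68) = 88{,}857$

$$ b = \frac{12(88{,}857) - (1{,}503)(706)}{12(189{,}187) - (1{,}503)^2} = \frac{5{,}166}{11{,}235} = 0.4598, \qquad a = \bar{y} - b\,\bar{x} = \frac{706}{12} - \frac{5{,}166}{11{,}235}(125.25) = 1.2417 $$

a. $\hat{Y} = 1.2417 + 0.4598X$

b. Scatter plot (figure: Math 15 score on the vertical axis, from 40 to 70, plotted against IQ on the horizontal axis, from 100 to 140)

c. $\hat{Y} = 1.2417 + 0.4598(113) = 53.20$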
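The sums and estimates in this solution can be reproduced with a short Python script. The data are typed in from Table 10.1.1; the variable names are illustrative, and the comments show the printed output.

```python
# Data from Table 10.1.1
iq = [110, 112, 118, 119, 122, 125, 127, 130, 132, 134, 136, 138]
score = [50, 56, 52, 59, 61, 53, 61, 58, 65, 59, 64, 68]

n = len(iq)
sum_x, sum_y = sum(iq), sum(score)
sum_x2 = sum(x * x for x in iq)
sum_y2 = sum(y * y for y in score)
sum_xy = sum(x * y for x, y in zip(iq, score))

# Least squares slope and intercept (item 12)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
y_hat_113 = a + b * 113  # point estimate at X = 113

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)           # 1503 189187 706 41862 88857
print(round(b, 4), round(a, 4), round(y_hat_113, 2))  # 0.4598 1.2417 53.2
```

Keeping b unrounded when computing a avoids the small drift that appears if the rounded slope 0.4598 is substituted.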

15. Inference about the slope ($\beta_1$): t Test

To test whether there is a significant linear dependency of Y on X, we have to show
that $\beta_1 \neq 0$.
To test the null hypothesis $H_0$ that $\beta_1 = 0$ (no linear dependency) against a suitable
alternative, we use the t-distribution with n − 2 degrees of freedom to establish the critical
region and then base our decision on the value of

$$ t = \frac{b - \beta_1}{s_b} = \frac{s_x\sqrt{n-1}\,(b - \beta_1)}{s_{yx}} $$

where

$$ s_b = \frac{s_{yx}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \qquad \text{and} \qquad s_{yx} = \sqrt{\frac{n-1}{n-2}\left(s_y^2 - b^2 s_x^2\right)} $$

and $s_x$, $s_y$ are the sample standard deviations of X and Y.

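A sketch of this test statistic in Python, written to follow the formulas of item 15 term by term, is shown below. The function name `slope_t_statistic` is hypothetical; it returns t, $s_b$ and the degrees of freedom, and leaves the critical value to a t table. It is applied here to the data of Table 10.1.1.

```python
import math

# Data from Table 10.1.1
iq = [110, 112, 118, 119, 122, 125, 127, 130, 132, 134, 136, 138]
score = [50, 56, 52, 59, 61, 53, 61, 58, 65, 59, 64, 68]


def slope_t_statistic(xs, ys, beta1=0.0):
    """Return (t, s_b, df) for testing H0: slope = beta1 (item 15)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # sum of (x_i - x_bar)^2
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                            # least squares slope
    s_x2, s_y2 = sxx / (n - 1), syy / (n - 1)  # sample variances of X and Y
    s_yx = math.sqrt((n - 1) / (n - 2) * (s_y2 - b ** 2 * s_x2))
    s_b = s_yx / math.sqrt(sxx)
    return (b - beta1) / s_b, s_b, n - 2


t, s_b, df = slope_t_statistic(iq, score)
print(round(s_b, 4), round(t, 3), df)  # 0.1168 3.937 10
```

The values agree with the standard error and t Stat reported for the IQ Score row in the PHStat2 output of the next example.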
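16. Is there a linear dependency of Y on X in the example above? Test at the 0.05 level of
significance.
Solution: Step 1. $H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Step 2. $\alpha = 0.05$
Step 3. Appropriate test statistic: t test with df = n − 2 = 10
Step 4. Reject $H_0$ if $t_c \geq t_{0.025,\,10}$ or $t_c \leq -t_{0.025,\,10}$, that is, if $t_c \geq 2.228$ or $t_c \leq -2.228$.
Using the PHStat2 output for the succeeding steps, we have

            Coefficients   Standard Error   t Stat        P-value
Intercept   1.241744548    14.66505476      0.084673707   0.934191994
IQ Score    0.459813084    0.116796188      3.936884361   0.002788923

Step 5. Computation: $t_c = 3.937$ (p-value = 0.0028).
Step 6. Reject $H_0$ since $t_c = 3.937 > 2.228$ (equivalently, the p-value is below 0.05).
Thus, $\beta_1 \neq 0$, which means that there is a significant linear dependency of Math 15
midterm scores on IQ scores.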
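The t Stat column in the output is each coefficient divided by its standard error. The sketch below recomputes it from the printed values and compares |t| with $t_{0.025,10} = 2.228$, assuming the two-sided alternative at $\alpha = 0.05$ with 10 degrees of freedom. The names `output` and `T_CRITICAL` are illustrative; the intercept row corresponds to the test of $H_0: \beta_0 = 0$, which this example does not need.

```python
# Coefficients and standard errors copied from the PHStat2 output above.
output = {
    "Intercept": (1.241744548, 14.66505476),
    "IQ Score": (0.459813084, 0.116796188),
}
T_CRITICAL = 2.228  # t_{0.025, 10}: two-sided test, alpha = 0.05, df = 10

# t = coefficient / standard error for each row
t_stats = {name: coef / se for name, (coef, se) in output.items()}
decisions = {}
for name, t_stat in t_stats.items():
    decisions[name] = "reject H0" if abs(t_stat) >= T_CRITICAL else "do not reject H0"
    print(f"{name}: t = {t_stat:.4f} -> {decisions[name]}")
```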

17. Correlation analysis attempts to measure the strength of the relationship between two random
variables by means of a single number called the correlation coefficient. It is concerned only
with the strength of the relationship; no causal effect is implied.

18. The Pearson correlation coefficient ($\rho$) measures the strength of the linear relationship
between two variables X and Y. The estimated sample correlation coefficient, denoted by
r, is given by

$$ r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} $$

where n is the sample size.

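The computational formula for r translates directly into Python. The helper name `pearson_r` is hypothetical; the toy data below lie exactly on straight lines, so they give r = 1 and r = −1.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r, using the computational formula of item 18."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect negative linear relationship
```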
19. Sample observations for various values of r

(Figure: scatter plots of Y against X for r = −1, r = −0.6, r = 0, r = 0.6 and r = 1.)

20. Features of $\rho$ and r


- unit free
- ranges from -1 to 1
- the closer to -1 the stronger the negative linear relationship
- the closer to 1 the stronger the positive linear relationship
- the closer to 0, the weaker the linear relationship
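The "unit free" feature can be checked numerically: a positive linear change of scale in either variable (a change of units) leaves r unchanged, while reversing the sign of one variable flips the sign of r. The toy data below are invented for illustration, and the hypothetical helper `pearson_r` uses the deviation form $\sum (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$, which is algebraically equivalent to the formula in item 18.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # toy data, invented for illustration
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
r_units = pearson_r([x / 10 for x in xs], [2 * y + 5 for y in ys])  # change of units and origin
r_flipped = pearson_r(xs, [-y for y in ys])                         # reverse the direction of Y

print(round(r, 4), round(r_units, 4), round(r_flipped, 4))  # 0.8 0.8 -0.8
```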

21. The sample coefficient of determination, $r^2$, is the proportion of the total variation
in the values of the variable Y that can be accounted for, or explained, by the linear relationship
with the values of the variable X.

22. Example: For the example above (Table 10.1.1), find the sample correlation coefficient and the
sample coefficient of determination, and interpret the results.
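For this example, r and $r^2$ can be computed from the data in Table 10.1.1 as sketched below. The script also checks the definition in item 21 by comparing $r^2$ with 1 − SSE/SST, where SSE is the sum of squared residuals about the fitted line and SST is the total sum of squares of Y (these abbreviations are ours, not the text's).

```python
import math

# Data from Table 10.1.1
iq = [110, 112, 118, 119, 122, 125, 127, 130, 132, 134, 136, 138]
score = [50, 56, 52, 59, 61, 53, 61, 58, 65, 59, 64, 68]

n = len(iq)
x_bar, y_bar = sum(iq) / n, sum(score) / n
sxx = sum((x - x_bar) ** 2 for x in iq)
syy = sum((y - y_bar) ** 2 for y in score)  # SST: total variation in Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, score))

r = sxy / math.sqrt(sxx * syy)

# Fitted line and residual sum of squares, to check r^2 = 1 - SSE/SST
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(iq, score))

print(round(r, 4), round(r ** 2, 4), round(1 - sse / syy, 4))  # 0.7796 0.6078 0.6078
```

With r ≈ 0.78 the linear relationship between IQ and Math 15 score is positive and fairly strong, and $r^2$ ≈ 0.61 means that about 61% of the variation in Math 15 midterm scores is accounted for by the linear relationship with IQ.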

23. Test for a Linear Relationship

Hypotheses: $H_0: \rho = 0$ (no correlation)
$H_1: \rho \neq 0$ (correlation)

Test statistic, with n − 2 degrees of freedom:

$$ t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

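The test statistic for $H_0: \rho = 0$ is a one-line function; the name `correlation_t` is hypothetical. Because this t formula is meant for testing $\rho = 0$, the sketch hard-codes that null value rather than accepting an arbitrary one.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (item 23)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        raise ValueError("|r| must be below 1 for the statistic to be finite")
    return (r - 0.0) / math.sqrt((1 - r ** 2) / (n - 2))


print(round(correlation_t(0.7796, 12), 2))  # 3.94
```

For simple linear regression this value agrees, up to the rounding of r, with the t statistic for the slope (3.937 in the PHStat2 output), since both test for the absence of a linear relationship.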
24. Using the above example, is there evidence of a linear relationship between the students'
Math 15 midterm scores and their IQ scores at the 0.05 level of significance?

Solution: Step 1. $H_0: \rho = 0$ (no association)
$H_1: \rho \neq 0$ (association)
Step 2. $\alpha = 0.05$
Step 3. Appropriate test statistic: t test
Step 4. Critical region: Reject $H_0$ if $t_c \geq t_{\alpha/2,\,n-2}$ or $t_c \leq -t_{\alpha/2,\,n-2}$, that is,
$t_c \geq t_{0.025,\,10}$ or $t_c \leq -t_{0.025,\,10}$
$t_c \geq 2.228$ or $t_c \leq -2.228$
Step 5. Computation:

$$ t_c = \frac{0.7796 - 0}{\sqrt{\dfrac{1 - 0.7796^2}{12 - 2}}} = 3.94 $$

Step 6. Reject $H_0$ since $t_c = 3.94 > 2.228$.
Step 7. There is sufficient sample evidence of a significant linear relationship
between students' IQ scores and their Math 15 midterm scores.

10.2. Problems/Exercises

I. True/False. Write True if the statement is true and False if otherwise.

1. Once you have computed the linear regression line equation, the intercept is completely
determined.
2. The slope of the regression line can be a negative value.
3. If the Y-intercept is –2.5, the X-score must be 1.00.
4. All things being equal, the higher the correlation, the more accurate the prediction.
5. In the regression equation $\hat{Y} = a + bX$, Y is used to predict the value of X.
6. A direct proportion line indicates a positive correlation.
7. The correlation value ranges from –1 to +1.
8. A correlation of +0.45 will have the same standard deviation as –0.45.
9. When the value of r = 1, it denotes a perfect positive correlation.
10. The sum of all the errors in the regression line will always add to zero.
11. When the correlation coefficient r is squared, it is called the coefficient of
determination.
12. If the correlation of two variables is close to zero, it indicates that no relationship exists
between the two variables.
13. In testing the significance of r using the t-distribution, the test depends only on the
values of the variables X and Y.

II. Solve what is indicated in the problem. Show your solutions legibly.

1. The Kryplium Junior School Board is trying to anticipate building needs on the basis of
past student enrollment. Over previous years, the board has collected and recorded data
on enrollment and community population. The data are presented below:

Year   Enrollment   Community Population
1993   740          20,050
1994   750          20,940
1995   772          22,050
1996   792          23,160
1997   810          24,310
1998   825          25,540
1999   890          26,830

Questions:

a. Derive the regression line equation for:


1. Predicting enrolment from the community population
2. Predicting community population from year.
3. Predicting enrollment from year.

b. Using the prediction equations in (a), answer the following:


1. What is the predicted enrolment in the year 2003?
2. What is the predicted population in the year 2003?

3. Give your best estimate of the enrolment when the community population
was 18,000.
4. Give your best estimate for the year in which the population was 18,000.

c. Solve for the correlation coefficients in (a).


d. Test the significance of the correlation coefficients in (c) and interpret your results.

2. Listed below are the IQ scores (X) and the final grades (Y) of 10 students.
X 90 90 100 100 100 110 110 115 120 120
Y 2.0 3.0 2.5 3.0 3.5 3.0 4.0 3.5 3.5 4.0
Answer the following:
a. Plot a scatter plot.
b. Solve for r and r 2 .
c. Test $H_0: \beta_1 = 0$ using the 0.05 level of significance.
d. Test $H_0: \rho = 0$ using the 0.05 level of significance.
e. Find the linear regression line equation.

3. Listed below are the undergraduate grade point averages, GPA (X), and the first-semester
graduate GPAs (Y) of 10 senior students.
X 90 90 100 100 100 110 110 115 120 120
Y 2.0 3.0 2.5 3.0 3.5 3.0 4.0 3.5 3.5 4.0
Answer the following:
a. Plot a scatter plot.
b. Solve for r and r 2 .
c. Test $H_0: \beta_1 = 0$ using the 0.05 level of significance.
d. Test $H_0: \rho = 0$ using the 0.05 level of significance.
e. What is the estimated graduate GPA of a student whose undergraduate GPA
is 3.5?
