
Correlation and Regression Analysis


• Correlation: The linear relationship between two interval/ratio scale variables is called
correlation.
• Correlation analysis: Correlation analysis is the statistical tool we can use to describe the
degree to which one variable is linearly related to another.
• Types of correlation: (i) Positive correlation and (ii) Negative correlation
• If the relationship between two variables is such that an increase/decrease in one is accompanied by an increase/decrease in the other, then the correlation is called positive correlation.
• If the relationship between two variables is such that an increase/decrease in one is accompanied by a decrease/increase in the other, then the correlation is called negative correlation.
Scatter diagram
• We can tentatively decide whether there is an approximate linear (straight-line) relationship between X and Y by drawing a scatter diagram.
• If the Y values tend to increase in a straight-line fashion as the X values increase (or to decrease as the X values decrease), then it is reasonable to decide that there exists some degree of positive correlation between X and Y.
• If the Y values tend to decrease in a straight-line fashion as the X values increase (or to increase as the X values decrease), then it is reasonable to decide that there exists some degree of negative correlation between X and Y.
• If the Y values do not tend to increase or decrease in a straight-line fashion as the X values change, then it is reasonable to decide that there is no (linear) relationship between X and Y. A minimal plotting sketch is given below.
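For illustration, a minimal Python sketch of a scatter diagram using matplotlib is given here; the x and y lists are the working-memory and reading-comprehension scores from the example later in this document.

import matplotlib.pyplot as plt

# Paired observations: WM scores (x) and RC scores (y) from the example below.
x = [21, 38, 14, 29, 45, 33, 38, 53, 31, 40]
y = [44, 58, 40, 49, 70, 55, 55, 80, 43, 56]

plt.scatter(x, y)                      # one point per (x, y) pair
plt.xlabel("Working memory score (x)")
plt.ylabel("Reading comprehension score (y)")
plt.title("Scatter diagram")
plt.show()                             # an upward-sloping cloud suggests positive correlation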
Scatter diagram: Visual representation of correlation

• From the scatter diagram, we can observe that there exists a positive correlation between X and Y.
Coefficient of Correlation

• Karl Pearson's Coefficient of Correlation: The coefficient of correlation is denoted by r. If the two variables under study are X and Y, then r is defined as

  r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}

• It can be shown that r takes values between −1 and +1. The sign of r gives the direction of the relation, i.e., whether the variables are positively correlated or negatively correlated. The magnitude of r, i.e., |r|, gives the strength of the relationship.

• Hypothesis for testing correlation coefficients: H0: ρ = 0 versus H1: ρ ≠ 0, or ρ > 0, or ρ < 0

• The test statistic for testing the above hypothesis is t = r√(n − 2)/√(1 − r²), which follows a t-distribution with (n − 2) df. A small code sketch of these two formulas follows.
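A minimal Python sketch of the coefficient of correlation and its test statistic; the function names are illustrative, not taken from any particular library.

import math

def pearson_r(x, y):
    # Karl Pearson's coefficient of correlation computed from the raw sums.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def correlation_t_statistic(r, n):
    # Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), t-distribution with n - 2 df.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)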
Coefficient of Correlation (cont…)
• Example: Suppose that a study was conducted for investigating relationship between working
memory (WM) and reading comprehension skill (RC) among healthy African-American adults. The
following scores were obtained from a sample of 10 subjects.
Subject 1 2 3 4 5 6 7 8 9 10
WM scores 21 38 14 29 45 33 38 53 31 40
RC scores 44 58 40 49 70 55 55 80 43 56

• Find the coefficient of correlation. Test whether there is a positive correlation between working memory (WM) and reading comprehension skill (RC) at the 5% level of significance and comment.
• Let us consider WM scores as variable x, and RC scores as variable y.

WM score (x)   RC score (y)   x²        y²        xy
21             44             441       1936      924
38             58             1444      3364      2204
14             40             196       1600      560
29             49             841       2401      1421
45             70             2025      4900      3150
33             55             1089      3025      1815
38             55             1444      3025      2090
53             80             2809      6400      4240
31             43             961       1849      1333
40             56             1600      3136      2240
Σx = 342       Σy = 550       Σx² = 12850    Σy² = 31636    Σxy = 19977
Coefficient of Correlation (cont…)
• We have n = 10, Σx = 342, Σy = 550, Σx² = 12850, Σy² = 31636 and Σxy = 19977. So, x̄ = 34.2 and ȳ = 55.

• Therefore the correlation coefficient is calculated as

  r = [10(19977) − (342)(550)] / √{[10(12850) − (342)²][10(31636) − (550)²]} = 11670 / √(11536 × 13860) ≈ 0.923
Coefficient of Correlation (cont…)
Hypothesis for testing correlation coefficients
• Step 1: H0: ρ = 0 versus H1: ρ > 0
• Step 2: Level of significance α = 0.05
• Step 3: Test statistic t = r√(n − 2)/√(1 − r²), which follows a t-distribution with (n − 2) = 8 df
• Step 4: Here α = 0.05; at 8 df the critical value = 1.86. H0 may be rejected if t > 1.86
• Step 5: We have r ≈ 0.923, so the observed sample yields t = 0.923 × √8 / √(1 − 0.923²) ≈ 6.78
• As t ≈ 6.78 falls in the rejection region, we may reject H0 and conclude that there is a positive correlation between working memory (WM) and reading comprehension skill (RC). A short computational check is given below.
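As a computational check of these steps (scipy.stats.t is used only for the one-sided 5% critical value):

import math
from scipy import stats

wm = [21, 38, 14, 29, 45, 33, 38, 53, 31, 40]   # x
rc = [44, 58, 40, 49, 70, 55, 55, 80, 43, 56]   # y
n = len(wm)

sx, sy = sum(wm), sum(rc)
sxx = sum(v * v for v in wm)
syy = sum(v * v for v in rc)
sxy = sum(a * b for a, b in zip(wm, rc))

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_crit = stats.t.ppf(0.95, df=n - 2)            # one-sided critical value at 5%, 8 df

print(round(r, 3), round(t, 2), round(t_crit, 2))   # approx. 0.923, 6.78, 1.86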
Regression Analysis
• The regression analysis is a technique of studying the dependence of one variable (called dependent
variable), on one or more variables (called explanatory variables), with a view to estimating or
predicting the population mean or average value of the dependent variable in terms of the known or
fixed values of the independent variables.
• Simple Linear Regression: It includes only one independent variable. The relationship is usually represented by a particular straight line drawn through the scatter plot.
Simple Linear Regression Model
• Simple Linear Regression Model: The dependent variable (Y) can be modeled by the value of the independent variable (X) as Y = β0 + β1X + ε, where β0 and β1 are the regression parameters and ε is the random error term such that E(ε) = 0.

• Estimated regression equation of Y on X: ŷ = b0 + b1x

 Estimate of β1: b1 = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]

 Estimate of β0: b0 = ȳ − b1x̄

• ŷ, read "y hat", is the estimated value of the Y variable for a selected x value. b0 is the estimated intercept and b1 is the slope of the line, or the average change in ŷ for each unit change (either increase or decrease) in the independent variable x.

• Hypothesis Test for the Regression Parameter: H0: β1 = 0 versus H1: β1 ≠ 0. A minimal computational sketch of the estimates is given below.
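A minimal sketch of the least-squares estimates in Python (an illustrative helper, not a library routine):

def fit_simple_linear_regression(x, y):
    # Returns (b0, b1) for the estimated line y_hat = b0 + b1 * x.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope estimate
    b0 = sy / n - b1 * sx / n                        # intercept: y_bar - b1 * x_bar
    return b0, b1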


Simple Linear Regression (cont…)
• Example: A study was made by a retail merchant to determine the relation between weekly
advertising expenditures and sales. The following data were recorded from a sample:
Advertising Cost ($) 40 20 25 20 30 50 40 20 50 40 25 50
Sales ($) 385 400 395 365 475 440 490 420 560 525 480 510
• Find which one is the dependent variable and which is the independent variable.
• Estimate the regression equation and interpret the parameters.
• What is the expected sale in a week with advertising cost $45?
• Since sales depend on advertising expenditure, sales is the dependent variable and advertising cost is the independent variable. Let us consider advertising cost as variable x, and sales as variable y.
Advertising Cost (x)   Sales (y)   x²       y²        xy
40     385     1600     148225     15400
20     400     400      160000     8000
25     395     625      156025     9875
20     365     400      133225     7300
30     475     900      225625     14250
50     440     2500     193600     22000
40     490     1600     240100     19600
20     420     400      176400     8400
50     560     2500     313600     28000
40     525     1600     275625     21000
25     480     625      230400     12000
50     510     2500     260100     25500
Σx = 410     Σy = 5445     Σx² = 15650     Σy² = 2512925     Σxy = 191325
Simple Linear Regression (cont…)
• We have n = 12, Σx = 410, Σy = 5445, Σx² = 15650, Σy² = 2512925 and Σxy = 191325. So, x̄ = 34.17 and ȳ = 453.75.

• Estimated regression equation of y on x: ŷ = b0 + b1x

 Estimate of β1: b1 = [12(191325) − (410)(5445)] / [12(15650) − (410)²] = 63450 / 19700 ≈ 3.221

 Estimate of β0: b0 = ȳ − b1x̄ = 453.75 − 3.221 × 34.17 ≈ 343.71

• Estimated regression equation of y on x: ŷ = 343.71 + 3.221x. Interpretation: each additional $1 of weekly advertising expenditure is associated, on average, with an increase of about $3.22 in weekly sales; the intercept 343.71 is the estimated average weekly sales when advertising expenditure is $0.

• The expected sale in a week with advertising cost $45 is ŷ = 343.71 + 3.221 × 45 ≈ $488.65, as the sketch below also verifies.
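These numbers can be reproduced quickly with numpy.polyfit (a degree-1 polynomial fit is ordinary least squares for a straight line):

import numpy as np

cost = [40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50]                  # x
sales = [385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510]     # y

b1, b0 = np.polyfit(cost, sales, 1)     # polyfit returns [slope, intercept] for degree 1
print(round(b0, 2), round(b1, 3))       # approx. 343.71 and 3.221
print(round(b0 + b1 * 45, 2))           # expected sales at x = 45, approx. 488.6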


Multiple Linear Regression Model
• Multiple linear regression extends simple linear regression to include more than one
explanatory variable. In both cases, we still use the term ‘linear’ because we assume that
the response variable is directly related to a linear combination of the explanatory variables.

• Multiple Linear Regression Model: The dependent variable (Y) can be modeled by the values of the independent variables (X1, X2, …, Xk) as Y = β0 + β1X1 + β2X2 + … + βkXk + ε, where β0, β1, …, βk are the regression parameters and ε is the random error term such that E(ε) = 0.

• Estimated regression equation of Y on X1, X2, …, Xk: ŷ = b0 + b1x1 + b2x2 + … + bkxk

• Hypothesis Test for the Regression Parameters: H0: βj = 0 versus H1: βj ≠ 0 (j = 1, 2, …, k). A minimal fitting sketch is given below.
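A minimal multiple-regression sketch using numpy's least-squares solver; the small data set below is hypothetical and constructed so that the fit is exact (b0 = 1, b1 = 2, b2 = 1):

import numpy as np

# Hypothetical data: 6 observations, two explanatory variables x1 and x2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([5.0, 6.0, 11.0, 12.0, 17.0, 18.0])   # here y = 1 + 2*x1 + 1*x2 exactly

# Prepend a column of ones so the first fitted coefficient is the intercept b0.
X1 = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)   # approximately 1.0, 2.0, 1.0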


Assumptions for Multiple Linear Regression Model
• Normality assumption: the error terms are normally distributed (required also for simple linear regression).

• Constant variance assumption (homoscedasticity), i.e., Var(ε) = σ², a constant value (required also for simple linear regression). If this assumption holds, then the error terms have constant variance. The violation of this assumption is called the problem of heteroscedasticity.

• In the multiple linear regression model, it is assumed that there is no correlation between the independent (explanatory) variables. When two of the explanatory variables in a model are highly correlated (and could therefore be used to predict one another), we say that they are collinear. This leads to a problem called multicollinearity; a simple diagnostic sketch is given below.
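One simple collinearity diagnostic is the correlation matrix of the explanatory variables; the three columns below are hypothetical and chosen so that x2 is nearly proportional to x1:

import numpy as np

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 4.2, 5.9, 8.1, 9.8, 12.2]   # roughly 2 * x1, so highly correlated with x1
x3 = [7.0, 3.0, 9.0, 1.0, 5.0, 2.0]

corr = np.corrcoef(np.array([x1, x2, x3]))   # rows are treated as variables
print(corr.round(2))   # off-diagonal entries near +/-1 flag potential multicollinearity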
Multiple Linear Regression – Interpreting the results

• Coefficient bj for quantitative independent variable Xj: If we leave all other variables the same (sometimes called "holding all other variables constant"), then an increase of 1 unit in Xj leads to a bj unit increase/decrease (based on the sign of bj) in the average value of Y. Another way of saying this is: controlling for the other variables, a 1 unit increase in Xj leads to a bj unit increase/decrease (based on the sign of bj) in the average value of Y.
Index of goodness of fit: Coefficient of determination (R²)
• R² measures how good the fit of the regression line is. Note that, as in ANOVA, in regression analysis Total SS = Regression SS + Error SS. The sums of squares can be computed as: Total SS = Σ(y − ȳ)², Regression SS = Σ(ŷ − ȳ)², and Error SS (Residual SS) = Σ(y − ŷ)² = Total SS − Regression SS. Then R² = Regression SS / Total SS.

• Therefore, R² takes on values between 0 and 1. Values of this ratio closer to 1 imply a better-fitting estimated regression line.

• For instance, R² = 0.77 implies that 77% of the total variation of Y (the dependent variable) is explained by the regression line (or by the variation in all independent variables).

• Adjusted R²: The adjusted R² is a modified version of R² that accounts for predictors that are not significant in a regression model. In other words, the adjusted R² shows whether adding additional predictors improves a regression model or not. A small computational sketch follows.
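A small sketch of R² and adjusted R² computed from the sum-of-squares decomposition (n observations, k predictors; the helper names are illustrative):

def r_squared(y, y_hat):
    # R^2 = Regression SS / Total SS = 1 - Error SS / Total SS.
    y_bar = sum(y) / len(y)
    ss_total = sum((v - y_bar) ** 2 for v in y)
    ss_error = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    return 1 - ss_error / ss_total

def adjusted_r_squared(r2, n, k):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)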
Index of goodness of fit (cont…)
Testing significance of R² (H0: β1 = β2 = … = βk = 0 vs H1: at least one βj ≠ 0)

• In the ANOVA table of the regression output, we look mainly at the Sig. column of the F statistic, which tells us the p-value for the statistic. If this p-value is greater than 0.05, then the whole model is not statistically significant and we need to stop our analysis here. A sketch of the underlying calculation is given below.
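A sketch of the overall F statistic built from the ANOVA decomposition, using scipy only for the upper-tail p-value (n observations, k predictors; the helper name is illustrative):

from scipy import stats

def overall_f_test(ss_regression, ss_error, n, k):
    # F = (Regression SS / k) / (Error SS / (n - k - 1)); returns (F, p-value).
    msr = ss_regression / k            # mean square due to regression
    mse = ss_error / (n - k - 1)       # mean square error
    f = msr / mse
    p = stats.f.sf(f, k, n - k - 1)    # P(F with (k, n-k-1) df exceeds f)
    return f, p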
Test of hypothesis in regression models
Hypothesis for testing regression parameters
• Step 1: H0: β1 = 0 versus H1: β1 ≠ 0
• Step 2: Level of significance α = 0.05
• Step 3: Test statistic t = b1 / SE(b1), which follows a t distribution with (n − 2) = 10 df, where SE(b1) = s / √[Σx² − (Σx)²/n] and s = √[Error SS / (n − 2)]
• Step 4: Here α = 0.05; at 10 df the critical values are −2.228 and 2.228. H0 may be rejected if t < −2.228 or t > 2.228
• Step 5: For the advertising example we have b1 ≈ 3.221 and SE(b1) ≈ 1.24, so the observed sample yields t ≈ 2.60. As t ≈ 2.60 falls in the rejection region, we may reject H0 and conclude that changes in advertising costs can change sales. A short computational check follows.
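These steps can be checked for the advertising example (scipy is used only for the two-sided 5% critical value):

import math
from scipy import stats

cost = [40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50]                  # x
sales = [385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510]     # y
n = len(cost)

x_bar, y_bar = sum(cost) / n, sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in cost)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cost, sales))
ss_yy = sum((y - y_bar) ** 2 for y in sales)

b1 = ss_xy / ss_xx                               # slope estimate
ss_error = ss_yy - b1 * ss_xy                    # residual (error) sum of squares
se_b1 = math.sqrt(ss_error / (n - 2)) / math.sqrt(ss_xx)

t = b1 / se_b1
t_crit = stats.t.ppf(0.975, df=n - 2)            # two-sided 5% critical value, 10 df
print(round(t, 2), round(t_crit, 3))             # approx. 2.6 and 2.228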
Thank You
