Regression Analysis

REGRESSION ANALYSIS
Correlation only indicates the degree and direction of relationship between two variables.
It does not, necessarily connote a cause-effect relationship. Even when there are grounds
to believe the causal relationship exits, correlation does not tell us which variable is the
cause and which, the effect. For example, the demand for a commodity and its price will
generally be found to be correlated, but the question whether demand depends on price or
vice-versa; will not be answered by correlation.
The dictionary meaning of the ‘regression’ is the act of the returning or going back. The
term ‘regression’ was first used by Francis Galton in 1877 while studying the
relationship between the heights of fathers and sons.
“Regression is the measure of the average relationship between two or more variables in
terms of the original units of data.”
The line of regression is the line, which gives the best estimate to the values of one
variable for any specific values of other variables.
For two variables on regression analysis, there are two regression lines. One line as the
regression of x on y and other is for regression of y on x.
These two regression line show the average relationship between the two variables. The
regression line of y on x gives the most probable value of y for given value of x and the
regression line of x and y gives the most probable values of x for the given value of y.
For perfect correlation, positive or negative i.e. for r= ±, the two lines coincide i.e. we will
find only one straight line. If r=0, i.e. both the variance are independent then the two lines
will cut each other at a right angle. In this case the two lines will be ║to x and y axis.
The Graph is given below:-

We restrict our discussion to linear relationships only that is the equations to be
considered are
1- y=a+bx
2- x=a+by
In equation first x is called the independent variable and y the dependent variable.
Conditional on the x value, the equations gives the variation of y. In other words ,it
means that corresponding to each value of x ,there is whole conditional probability
distribution of y.
Similar discussion holds for the equation second, where y acts as independent variable
and x as dependent variable.
What purpose does regression line serve?
1- The first object is to estimate the dependent variable from known values of
independent variable. This is possible from regression line.
2- The next objective is to obtain a measure of the error involved in using regression
line for estimation.
3- With the help of regression coefficients we can calculate the correlation

coefficient. The square of correlation coefficient (r), is called coefficient of
determination, measure the degree of association of correlation that exits between
two variables.
What is the difference between correlation and linear regression?
Correlation and linear regression are not the same. Consider these differences:
• Correlation quantifies the degree to which two variables are related. Correlation
does not find a best-fit line (that is regression). You simply are computing a
correlation coefficient (r) that tells you how much one variable tends to change
when the other one does.
• With correlation you don't have to think about cause and effect. You simply
quantify how well two variables relate to each other. With regression, you do
have to think about cause and effect as the regression line is determined as the
best way to predict Y from X.
• With correlation, it doesn't matter which of the two variables you call "X" and
which you call "Y". You'll get the same correlation coefficient if you swap the
two. With linear regression, the decision of which variable you call "X" and
which you call "Y" matters a lot, as you'll get a different best-fit line if you swap
the two. The line that best predicts Y from X is not the same as the line that
predicts X from Y.
• Correlation is almost always used when you measure both variables. It rarely is
appropriate when one variable is something you experimentally manipulate. With
linear regression, the X variable is often something you experimental manipulate
(time, concentration...) and the Y variable is something you measure.
Regression analysis is widely used for prediction (including forecasting of time-

series data). Use of regression analysis for prediction has substantial overlap with the
field of machine learning. Regression analysis is also used to understand which among
the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to
infer causal relationships between the independent and dependent variables.
A large body of techniques for carrying out regression analysis has been developed.
Familiar methods such as linear regression and ordinary least squares regression
are parametric, in that the regression function is defined in terms of a finite number of
unknown parameters that are estimated from the data. Nonparametric regression refers to
techniques that allow the regression function to lie in a specified set of functions, which
may beinfinite-dimensional.
The performance of regression analysis methods in practice depends on the form of the
data-generating process, and how it relates to the regression approach being used. Since
the true form of the data-generating process is not known, regression analysis depends to
some extent on making assumptions about this process. These assumptions are sometimes
(but not always) testable if a large amount of data is available. Regression models for
prediction are often useful even when the assumptions are moderately violated, although
they may not perform optimally. However when carrying out inference using regression
models, especially involving small effects or questions of causality based
on observational data, regression methods must be used cautiously as they can easily give
misleading results.
Underlying assumptions
Classical assumptions for regression analysis include:
 The sample must be representative of the population for the inference prediction.
 The error is assumed to be a random variable with a mean of zero conditional on
the explanatory variables.
 The variables are error-free. If this is not so, modeling may be done using errors-
in-variables model techniques.
 The predictors must be linearly independent, i.e. it must not be possible to express
any predictor as a linear combination of the others. SeeMulticollinearity.
 The errors are uncorrelated, that is, the variance-covariance matrix of the errors
is diagonal and each non-zero element is the variance of the error.
 The variance of the error is constant across observations (homoscedasticity). If
not, weighted least squares or other methods might be used.
These are sufficient (but not all necessary) conditions for the least-squares estimator to
possess desirable properties, in particular, these assumptions imply that the parameter
estimates will be unbiased, consistent, and efficient in the class of linear unbiased
estimators. Many of these assumptions may be relaxed in more advanced treatments.
Basic Formulae of Regression Analysis:-
X=a+by (Regression line x on y)

Y=a+bx (Regression line y on x)
1st – Regression equation of x on y:-

2nd – Regression equation of y on x:-
Regression Coefficient:-
Case 1st - when x on y means regression coefficient is ‘bxy’
Case 2nd – when y on x means regression coefficient is ‘byx’

Least Square Estimation:-
The main object of constructing statistical relationship is to predict or explain the

effects on one dependent variable resulting from changes in one or more explanatory
variables. Under the least square criteria, the line of best fit is said to be that which
minimizes the sum of the squared residuals between the points of the graph and the
points of straight line. The least squares method is the most widely used procedure for
developing estimates of the model parameters.
The graph of the estimated regression equation for simple linear regression is a straight
line approximation to the relationship between y and x.
When regression equations obtained directly that is without taking deviation from actual
or assumed mean then the two Normal equations are to be solved simultaneously as
follows;
For Regression Equation of x on y

i.e. x=a+by
The two Normal Equations are:-
For Regression Equation of y on x

i.e. y=a+bx
The two Normal Equations are:-
Remarks:-
1- It may be noted that both the regression coefficient ( x on y means bxy and y on x
means byx ) cannot exceed 1.
2- Both the regression coefficient shall either be positive + or negative -.

3- Correlation coefficient (r) will have same sign as that of regression coefficient.
Problems related to Regression Analysis:-
Question 1 :- Calculate the following;

• The Regression equations of x on y
• The Regression equation of y on x
• Estimate x when y=20
• Estimate y when x=25
Figures are
x= 10, 12,13,17,18
y= 5, 6, 7,9,13
Solution:-
Question 2:- Using the following data obtain regression equation:-
Total Outputs in units(x) Number of person employed (y)

1 1
3 2
5 3
6 4
5 5
Solution:-
Question 3:-tp12
Question 4:- Find out the likely production corresponding to a rainfall of 40 cms. From
the following data:-
Rail fall (in Cm) Output (in Quintals)
Average 30 50
Standard Deviation 5 10
Correlation Coefficient (r) = 0.8
Solution:-

Regression Analysis

Uploaded by

Copyright:

Available Formats

Regression Analysis

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Regression Analysis

Uploaded by

Copyright:

Available Formats

REGRESSION ANALYSIS

The Graph is given below:-

What purpose does regression line serve?

3- With the help of regression coefficients we can calculate the correlation

What is the difference between correlation and linear regression?

Regression analysis is widely used for prediction (including forecasting of time-

Basic Formulae of Regression Analysis:-

X=a+by (Regression line x on y)

1st – Regression equation of x on y:-

Case 1st - when x on y means regression coefficient is ‘bxy’

Case 2nd – when y on x means regression coefficient is ‘byx’

The main object of constructing statistical relationship is to predict or explain the

For Regression Equation of x on y

The two Normal Equations are:-

For Regression Equation of y on x

The two Normal Equations are:-

2- Both the regression coefficient shall either be positive + or negative -.

Problems related to Regression Analysis:-

Question 1 :- Calculate the following;

Total Outputs in units(x) Number of person employed (y)

Correlation Coefficient (r) = 0.8

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.