Investigating Variables
Relationship between Variables
The most commonly used techniques for investigating
the relationship between two quantitative variables
are correlation and linear regression.
Purpose:
1. Measure the influence of one or more variables on another variable.
2. Predict a variable from one or more other variables.
Linear Regression
Linear regression by the least-squares method is a technique that fits a
straight line to a set of data points consisting of values for a dependent
variable, y, and corresponding values for an independent variable, x.
Simple Linear Regression
Simple linear regression uses only one independent variable to predict the
dependent variable.
Visually, the relationship between the independent and dependent variables is represented in a
scatter plot. The stronger the linear relationship between the dependent and independent variables,
the more closely the data points lie along a straight line.
Constructing the Least-Squares Equation
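The regression line can be described by the following equation:

$$\hat{y} = a + b x$$

where \hat{y} is the predicted value of the dependent variable, a is the intercept (the value of \hat{y} when x = 0), and b is the regression coefficient (the slope of the line).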
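For reference, when the line is written as \hat{y} = a + b x, the least-squares values of a and b are usually given by the closed-form expressions below. The symbols \bar{x}, \bar{y} (sample means) and n (number of data points) are notation introduced here for illustration and do not appear in the original slides.

```latex
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```

These are the values that minimize the sum of squared vertical distances between the observed y_i and the line.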
The sign of the regression coefficient b can be interpreted as follows:
b > 0: there is a positive correlation between x and y (the greater x, the greater y)
b < 0: there is a negative correlation between x and y (the greater x, the smaller y)
b = 0: there is no linear correlation between x and y
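The fitting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `fit_line` and the five data points are made up for the example, and the code assumes the usual closed-form least-squares estimates (slope = covariation of x and y divided by variation of x).

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx          # slope (regression coefficient)
    a = y_mean - b * x_mean  # intercept
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f} * x")  # prints: y_hat = 2.20 + 0.60 * x
```

Here b is positive, so larger values of x go with larger predicted values of y.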
Multiple Linear Regression
The equation for a multiple regression with k independent variables is:

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
The coefficients can be interpreted similarly to those of the simple linear regression equation. If all
independent variables are 0, the resulting value is a. If one independent variable changes by
one unit while the others are held constant, the associated coefficient indicates by how much
the predicted value of the dependent variable changes. So if the independent variable x_i
increases by one unit, the predicted y changes by b_i (an increase if b_i > 0, a decrease if b_i < 0).
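As a rough sketch of how the coefficients a, b_1, ..., b_k could be computed, the following Python solves the normal equations with plain Gauss-Jordan elimination. The function name `fit_multiple` and the data are hypothetical; in practice a numerical library would normally be used, and this sketch assumes the independent variables are not perfectly collinear (otherwise it divides by zero).

```python
def fit_multiple(rows, ys):
    """Least-squares fit of y_hat = a + b1*x1 + ... + bk*xk.

    rows: list of [x1, ..., xk] observations; ys: list of y values.
    Returns [a, b1, ..., bk].
    """
    # Design matrix with a leading 1 in each row for the intercept a.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * y for row, y in zip(X, ys))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, -2, 0, 2]
coeffs = fit_multiple(rows, ys)
print(coeffs)
```

Because the data lie exactly on the plane y = 1 + 2*x1 - 3*x2, the fit should recover approximately a = 1, b1 = 2 and b2 = -3: raising x1 by one unit with x2 fixed raises the prediction by 2.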
Coefficient of determination
Two main measures are used to assess how well the regression model can predict or explain
the dependent variable: the coefficient of determination R² and the standard estimation
error (also called the standard error of the estimate).
The coefficient of determination R², also known as the explained variance, indicates how
large a portion of the variance of the dependent variable can be explained by the
independent variables. For a least-squares fit with an intercept, R² lies between 0 and 1.
The more variance can be explained, the better the regression model fits the data.
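A minimal sketch of the R² calculation, assuming the common definition R² = 1 - (residual sum of squares) / (total sum of squares). The function name `r_squared` and the data are made up for illustration; the predictions use the line y_hat = 2.2 + 0.6 * x, which is the least-squares fit for these five points.

```python
def r_squared(ys, y_hats):
    """Share of the variance in ys explained by the predictions y_hats."""
    y_mean = sum(ys) / len(ys)
    # Unexplained variation: squared differences between observed and predicted.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    # Total variation: squared differences between observed values and their mean.
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data and its least-squares line y_hat = 2.2 + 0.6 * x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]
r2 = r_squared(ys, y_hats)
print(f"R^2 = {r2:.2f}")  # prints: R^2 = 0.60
```

In this example the line explains about 60% of the variance in y.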
Standard estimation error
The standard estimation error is the standard deviation of the estimation errors (the
differences between observed and predicted values). It gives an impression of how much a
prediction typically differs from the observed value. Graphically interpreted,
the standard estimation error is the dispersion of the observed values around the regression
line.
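The calculation can be sketched as follows. This assumes one common convention, dividing the residual sum of squares by n - k - 1 (n observations, k independent variables) before taking the square root; some texts divide by n instead. The function name `standard_error_of_estimate` and the data are hypothetical.

```python
import math


def standard_error_of_estimate(ys, y_hats, n_predictors=1):
    """Standard deviation of the residuals, using n - k - 1 degrees of freedom."""
    n = len(ys)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return math.sqrt(ss_res / (n - n_predictors - 1))


# Hypothetical data and its least-squares line y_hat = 2.2 + 0.6 * x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hats = [2.2 + 0.6 * x for x in xs]
s = standard_error_of_estimate(ys, y_hats)
print(f"standard estimation error = {s:.3f}")  # prints: standard estimation error = 0.894
```

A value of about 0.89 means the observed y values scatter around the line by a little under one unit on average.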
Correlation Analysis
Correlation gauges the strength of association
between measured variables by evaluating their joint
behavior. In other words, it shows the strength of their
tendency to change together.
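The strength of a linear association is summarized by the correlation coefficient r, which
ranges from -1 to +1. As r moves closer to zero, either from -1 or from +1, the data fit a
linear model less well; as a result, predicting the value of one variable from a value of the
other becomes less reliable.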
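A short sketch of computing r, assuming r here means the Pearson correlation coefficient (the slides do not name it explicitly). The function name `pearson_r` and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: a moderately strong positive linear association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # prints: r = 0.775
```

For simple linear regression, squaring this r gives the coefficient of determination of the fitted line (here about 0.60).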