
Investigating the Relationship between Variables
The most commonly used techniques for investigating
the relationship between two quantitative variables
are correlation and linear regression.

Correlation quantifies the strength of the linear relationship between a pair of variables.

Regression expresses the relationship in the form of an equation.
Regression Analysis
We use regression analysis to discover the mathematical equation that relates the independent and dependent variables.

This makes it possible to infer or predict one variable on the basis of one or more other variables.

Purpose:
1. To measure the influence of one or more variables on another variable.
2. To predict a variable from one or more other variables.
Linear Regression
Linear regression by the least-squares method is a technique that fits a straight line to a set of data points consisting of values for a dependent variable, y, and corresponding values for an independent variable, x.
Simple Linear Regression
Uses only one independent variable to predict the dependent variable.

Multiple Linear Regression
Uses several independent variables to predict the dependent variable.
Simple Linear Regression
The greater the linear relationship between the variables, the more accurate the prediction.

Visually, the relationship between the independent and dependent variables is represented in a scatter plot. The greater the linear relationship between the dependent and independent variables, the more closely the data points lie on a straight line.
Constructing the Least-Squares Equation
The regression line can be described by the following equation:

ŷ = a + b·x

a = the point of intersection with the y-axis (the intercept)
b = the gradient of the line (the slope)
ŷ = the estimated y-value; for each x-value the corresponding y-value is estimated.
Constructing the Least-Squares Equation
How to calculate a and b?

b = r · (sy / sx): the correlation times the standard deviation of y, divided by the standard deviation of x
a = ȳ − b·x̄: the mean value of y minus the slope times the mean value of x
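As a rough illustration, the following sketch computes b and a from the correlation, the standard deviations, and the means. The data set and the use of numpy are assumptions for the example, not part of the original material:

```python
import numpy as np

# Hypothetical example data (x = independent variable, y = dependent variable)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

r = np.corrcoef(x, y)[0, 1]             # Pearson correlation between x and y
b = r * y.std(ddof=1) / x.std(ddof=1)   # slope: correlation * (sd of y / sd of x)
a = y.mean() - b * x.mean()             # intercept: mean of y minus slope * mean of x

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
# Predicted value for any x: y_hat = a + b * x
```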
If all points (measured values) lay exactly on one straight line, the estimate would be perfect. However, this is almost never the case, so a straight line must be found that is as close as possible to the individual data points. The aim is to keep the estimation error as small as possible, so that the distance between the estimated value and the true value is as small as possible.

This distance or error is called the "residual", is abbreviated as "e" (error), and can be represented by the Greek letter epsilon (ϵ).
When calculating the regression line, the regression coefficients (a and b) are determined so that the sum of the squared residuals is minimal.

The regression coefficient b can have different signs, which can be interpreted as follows (see the sketch after this list):

b > 0: there is a positive correlation between x and y (the greater x, the greater y)
b < 0: there is a negative correlation between x and y (the greater x, the smaller y)
b = 0: there is no linear correlation between x and y
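A minimal sketch of the residuals and the sum of squared residuals that the least-squares fit minimizes, and of reading off the sign of b. The data are hypothetical, and numpy's polyfit is used here only as one convenient way to obtain the least-squares line:

```python
import numpy as np

# Hypothetical example data; polyfit returns the least-squares slope and intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
b, a = np.polyfit(x, y, deg=1)       # straight-line fit: slope b, intercept a

y_hat = a + b * x                    # estimated values on the regression line
residuals = y - y_hat                # e: distance between true and estimated value
sse = np.sum(residuals ** 2)         # sum of squared residuals, minimized by the fit

print(f"b = {b:.3f} (> 0: positive relationship), SSE = {sse:.3f}")
```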
Multiple Linear Regression
The equation for a multiple regression with k independent variables is:

ŷ = a + b1·x1 + b2·x2 + … + bk·xk

The coefficients can be interpreted similarly to those of the simple linear regression equation. If all independent variables are 0, the resulting value is a. If an independent variable changes by one unit, the associated coefficient indicates by how much the dependent variable changes. So if the independent variable xi increases by one unit, the dependent variable y increases by bi.
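A possible sketch of a multiple regression with two hypothetical independent variables, fitted by ordinary least squares with numpy. The data, variable names, and the use of a design matrix are assumptions made for illustration:

```python
import numpy as np

# Hypothetical data: two independent variables x1, x2 and one dependent variable y
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.2, 3.9, 6.8, 7.1, 9.4])

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least-squares solution: coefficients [a, b1, b2]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coeffs
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
# If x1 increases by one unit (with x2 held constant), y changes by b1
```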
Coefficient of determination
To find out how well the regression model can predict or explain the dependent variable, two main measures are used: the coefficient of determination R² on the one hand and the standard estimation error on the other.

The coefficient of determination R², also known as the variance explanation, indicates how large the portion of the variance is that can be explained by the independent variables.

The more variance can be explained, the better the regression model is.
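A small sketch of how R² can be computed as the share of explained variance, again with hypothetical simple-regression data:

```python
import numpy as np

# Hypothetical data and least-squares regression line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

ss_res = np.sum((y - y_hat) ** 2)        # unexplained (residual) variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation of y around its mean
r_squared = 1 - ss_res / ss_tot          # share of variance explained by the model
print(f"R^2 = {r_squared:.3f}")
```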
Standard estimation error
The standard estimation error is the standard deviation of the estimation error. This gives an
impression of how much the prediction differs from the correct value. Graphically interpreted,
the standard estimation error is the dispersion of the observed values around the regression
line.
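One way to compute the standard estimation error as the standard deviation of the residuals is sketched below. The data are hypothetical, and dividing the residual sum of squares by n − 2 is the usual convention for simple regression, assumed here rather than stated in the original:

```python
import numpy as np

# Hypothetical data and fitted regression line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])
b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)

n = len(x)
std_error = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # dispersion around the regression line
print(f"standard estimation error = {std_error:.3f}")
```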
Correlation Analysis
Correlation gauges the strength of association
between measured variables by evaluating their joint
behavior. In other words, it shows the strength of their
tendency to change together.

The correlation coefficient r ranges from −1 to +1. As r moves closer to zero, either from −1 or from +1, the data fit a linear model less well; as a result, predicting the value of one variable from a value of the other becomes less reliable.
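A brief sketch of computing the correlation coefficient r with hypothetical paired measurements; values near ±1 indicate a strong linear relationship and values near 0 a weak one:

```python
import numpy as np

# Hypothetical paired measurements
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

r = np.corrcoef(x, y)[0, 1]    # Pearson correlation coefficient, between -1 and +1
print(f"r = {r:.3f}")          # close to +1 here: strong positive linear relationship
```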
Any Questions?
