To Find Correlation and Regression of The Following Data
Here, TV ad spend (X) is the independent variable and sales (Y) is the dependent variable. The
data were collected for different quarters and are as follows:
Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). It's a common tool for
describing simple relationships without making a statement about cause and effect.
The sample correlation coefficient, r, quantifies the strength of the relationship. Correlations are
also tested for statistical significance.
Calculation of Correlation:
INTERPRETATION:
Correlation quantifies the direction and strength of the relationship between two numeric
variables: as one variable changes in value, the other tends to change in a specific direction.
The correlation of 0.902663175 shows a strong positive correlation between the TV ad spend
incurred by the firm and the net sales made by the company during the year. When TV ad spend
increases, net sales tend to increase as well.
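The calculation behind a correlation coefficient can be sketched as follows. This is a minimal illustration of the standard sample Pearson formula, not the report's own worksheet. The function name pearson_r and the two data lists are hypothetical, and the numbers are made up for demonstration; they are not the quarterly figures used in this report.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data (NOT the report's quarterly figures).
tv_spend = [10, 20, 30, 40, 50]
sales = [600, 790, 940, 1130, 1280]

r = pearson_r(tv_spend, sales)
print(round(r, 4))
```

A value close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.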
REGRESSION:
Regression Statistics
Our regression output indicates that 81.48% of the variation in sales is explained by TV ad
spend. The remaining 18.52% (100% - 81.48%) of the variation is attributable to factors other
than advertising expenditure.
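As a quick consistency check, in a simple linear regression with one independent variable and an intercept, the share of variation explained (R squared) equals the square of the correlation coefficient. The short sketch below uses only the correlation reported above; the variable names are illustrative.

```python
# Correlation coefficient reported in the correlation section.
r = 0.902663175

# In simple linear regression with an intercept, R squared is r squared.
r_squared = r ** 2

explained_pct = round(100 * r_squared, 2)        # share explained by TV ad spend
unexplained_pct = round(100 - explained_pct, 2)  # share left to other factors

print(explained_pct, unexplained_pct)
```

The squared correlation reproduces the 81.48% and 18.52% figures, so the correlation and regression results agree with each other.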
In regression analysis, your objective is to figure out the relationship between the variables
being analyzed. One way to express the relationship between the variables is in the form of a
mathematical expression. In simple linear regression, we assume that the relationship is linear
or, in other words, is a straight line. The mathematical expression of a straight line is:
Y = a + bX
INTERPRETATION:
In this equation:
Y is the variable we are trying to predict. It is called the dependent variable because we are
assuming that Y is dependent on the X variable (the ‘independent’ variable).
X is called the independent variable because we assume it is not dependent on Y. It is also
called the explanatory variable because it is supposed to “explain” what causes changes in
Y.
b is the slope of the regression line. The slope reflects how large or small the change in Y
will be for a unit change in X.
a is the intercept, or the point at which the regression line intersects the Y-axis.
The values of a and b form the heart of the regression model. The values of a and b are found
as the coefficients in any regression output.
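For readers who want to see where a and b come from, the usual least-squares formulas for a straight line are b = Sxy / Sxx and a = mean(Y) - b * mean(X). The sketch below assumes that ordinary least squares is what the regression package uses, which is the standard choice but is not stated in this report. The function name fit_line and the demo data are hypothetical; the demo points lie exactly on the line Y = 2 + 3X so the result is easy to verify by eye.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical demo data lying exactly on Y = 2 + 3X (NOT the report's data).
x_demo = [1, 2, 3, 4]
y_demo = [5, 8, 11, 14]

a, b = fit_line(x_demo, y_demo)
print(a, b)
```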
X and Y are variables and will take on different values at different points in time. The values of
a and b are substituted in the regression equation to get the relationship between X and Y as
follows:
Y = 437.88 + 16.95*X
This can also be expressed in the context of the example or question making the relationship
more meaningful.
If I spend $1 on advertising, I can expect to have sales of $454.83 (Sales = 437.88 + 16.95*$1)
If I spend $2 on advertising, I can expect to have sales of $471.78 (Sales = 437.88 + 16.95*$2)
If I spend $3 on advertising, I can expect to have sales of $488.73 (Sales = 437.88 + 16.95*$3)
The intercept of 437.88 indicates that sales will be 437.88 if we do not spend any money on
advertising. This is because when advertising spend is zero, it (zero) is multiplied by the slope
or b (here 16.95), resulting in a zero. This is added to your intercept, leaving you only the
intercept value 437.88.
If I spend $0 on advertising, I can expect to have sales of $437.88 (Sales = 437.88 + 16.95*$0)
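The predictions above can be reproduced with a few lines of code. This sketch simply plugs the reported intercept and slope into Y = a + bX; the function name predict_sales is illustrative.

```python
INTERCEPT = 437.88  # a, from the regression output
SLOPE = 16.95       # b, from the regression output


def predict_sales(tv_spend):
    """Predicted sales for a given TV ad spend, rounded to cents."""
    return round(INTERCEPT + SLOPE * tv_spend, 2)


for spend in (0, 1, 2, 3):
    print(spend, predict_sales(spend))
```

Each additional $1 of spend raises the prediction by exactly the slope, 16.95, and a spend of $0 returns the intercept alone.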
The coefficient b (here 16.95) indicates that for every unit increase in the X variable (here TV
spend), the Y variable (here sales) will change by the amount of the coefficient 16.95. It is also
referred to as the slope of the line in a simple linear equation.
The sign of coefficient b is positive (here, it is +16.95). Therefore, for every $1 increase in TV
spend, sales can be expected to increase by $16.95 (the value of the coefficient).
You will notice that the P-value of the TV spend variable in our example is very small: it shows
as zero to four decimal places. This indicates that TV spend is a ‘significant variable’ and that
it is likely to impact sales figures.
Note that the P-value is similar in interpretation to the significance F discussed earlier in this
book. The key difference is that the P-value applies to each corresponding coefficient, and the
significance F applies to the entire model as a whole.
The coefficient of the independent variable is an estimate of the impact this variable has on the
variable being studied. It is estimated from the sample that was analyzed in our regression
analysis. The 95% confidence interval of your coefficient gives you the range within which the
real value of the coefficient you are estimating is likely to fall. The 95% confidence interval is
also shown as Lower 95% & Upper 95% in many packages.
You can be 95% confident that the real, underlying value of the coefficient you are estimating
falls somewhere in that 95% confidence interval. So, if the interval does not contain 0, your
P-value will be less than .05.
We can see that the Lower 95% is 12.31 and the Upper 95% is 21.58 in our example. What this
indicates is that while our best estimate of the coefficient for TV ads is 16.95, we can be 95%
confident that its real value lies between 12.31 and 21.58. Because this range does not
include zero, we have confidence that TV ad spend does impact our sales results.
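The interval can be sanity-checked against the coefficient with a small sketch. It uses only the three figures reported above and assumes the usual symmetric interval (estimate plus or minus a margin), which is what regression packages normally report; the variable names are illustrative.

```python
coef = 16.95                  # estimated coefficient for TV ad spend
lower, upper = 12.31, 21.58   # Lower 95% and Upper 95% from the output

# A symmetric interval is centered on the estimate.
midpoint = (lower + upper) / 2
half_width = (upper - lower) / 2

# The coefficient is significant at the 5% level if the interval excludes zero.
excludes_zero = not (lower <= 0 <= upper)

print(round(midpoint, 3), round(half_width, 3), excludes_zero)
```

The midpoint agrees with the reported coefficient up to rounding, and the interval lies entirely above zero, which matches the very small P-value.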