BA Module 4 Summary
→ We determine a point forecast by entering the desired value of x into the regression equation.
• We must be extremely cautious about using regression to forecast for values outside of the historically
observed range of the independent variable (x-values).
• Instead of predicting a single point, we can construct a prediction interval, an interval around the point
forecast that is likely to contain, for example, the actual selling price of a house of a given size.
→ The width of a prediction interval varies based on the standard deviation of the regression (the standard
error of the regression), the desired level of confidence, and the location of the x-value of interest in
relation to the historical values of the independent variable.
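As a rough illustration of the point forecast and prediction interval described above, here is a minimal Python sketch. The data are hypothetical, and the helper names (`fit_line`, `prediction_interval`, `in_observed_range`) are invented for this example. The interval uses the usual textbook half-width, s·sqrt(1 + 1/n + (x0 − x̄)²/Sxx), but takes a normal quantile as a large-sample stand-in for the t-value, so treat the interval widths as approximate.

```python
import math
from statistics import NormalDist

# Hypothetical data: house size (thousand sq ft) and selling price ($100k).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def prediction_interval(x, y, x0, confidence=0.95):
    """Point forecast at x0 with an approximate prediction interval."""
    n = len(x)
    a, b = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    # Assumption: normal quantile used as a large-sample stand-in for the t-value.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    forecast = a + b * x0  # point forecast: plug x0 into the regression equation
    return forecast, forecast - half_width, forecast + half_width


def in_observed_range(x, x0):
    """True if x0 lies inside the historically observed x-values."""
    return min(x) <= x0 <= max(x)


for x0 in (3.0, 5.0, 8.0):
    forecast, low, high = prediction_interval(sizes, prices, x0)
    note = "" if in_observed_range(sizes, x0) else " (extrapolation: use with caution)"
    print(f"x={x0}: forecast {forecast:.2f}, interval [{low:.2f}, {high:.2f}]{note}")
```

The interval is narrowest at the mean of the observed x-values and widens as x0 moves away from it, which is one reason forecasts outside the observed range deserve extra caution.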
→ It is important to evaluate several metrics together to determine whether a single variable linear regression model is
a good fit for a data set, rather than looking at any one metric in isolation.
→ R² measures the percent of total variation in the dependent variable, y, that is explained by the regression line.
• $R^2 = \dfrac{\text{Variation explained by the regression line}}{\text{Total variation}} = \dfrac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$
• 0 ≤ R² ≤ 1
• For a single variable linear regression, R² is equal to the square of the correlation coefficient.
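The following short Python sketch, using hypothetical data, computes R² as the regression sum of squares divided by the total sum of squares and checks that it matches the squared correlation coefficient. The variable names are chosen for this example only.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares regression line y = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * xi for xi in x]

# Variation explained by the regression line.
regression_ss = sum((p - y_bar) ** 2 for p in predicted)
r_squared = regression_ss / syy

# Correlation coefficient between x and y.
correlation = sxy / math.sqrt(sxx * syy)

print(f"R^2 = {r_squared:.3f}, correlation^2 = {correlation ** 2:.3f}")
```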
→ In addition to analyzing R², we must test whether the relationship between the dependent and independent variable
is significant and whether the linear model is a good fit for the data. We do this by analyzing the p-value (or
confidence interval) associated with the independent variable and the regression’s residual plot.
• The p-value of the independent variable is the result of the hypothesis test that tests whether there is a
significant linear relationship; that is, it tests whether the slope of the regression line is zero, H0: β=0 and Ha:
β≠0.
→ If the coefficient’s p-value is less than 0.05, we reject the null hypothesis and conclude that we have
sufficient evidence to be 95% confident that there is a significant linear relationship between the
dependent and independent variables.
→ Note that the p-value and R² provide different information. A linear relationship can be significant (have a
low p-value) but not explain a large percentage of the variation (not have a high R²).
• A confidence interval associated with an independent variable’s coefficient indicates the likely range for that
coefficient.
→ If the 95% confidence interval does not contain zero, we can be 95% confident that there is a significant
linear relationship between the variables.
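To make the slope test concrete, here is a minimal Python sketch on hypothetical data. Python's standard library has no t-distribution, so rather than computing a p-value the sketch compares the slope's t-statistic with a critical value taken from a standard t-table. For a two-sided test at the 5% level, this comparison is equivalent to checking whether the p-value is below 0.05. The constant `T_CRITICAL` and the variable names are assumptions made for this example.

```python
import math

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Assumption: two-sided 95% critical value of the t-distribution with
# n - 2 = 3 degrees of freedom, read from a standard t-table.
T_CRITICAL = 3.182

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Standard error of the regression, then standard error of the slope.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
slope_se = s / math.sqrt(sxx)

# Test H0: slope = 0 against Ha: slope != 0.
t_stat = slope / slope_se
significant = abs(t_stat) > T_CRITICAL

# 95% confidence interval for the slope.
ci_low = slope - T_CRITICAL * slope_se
ci_high = slope + T_CRITICAL * slope_se

print(f"slope = {slope:.3f}, t = {t_stat:.2f}, significant at 5%: {significant}")
print(f"95% CI for slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

The two checks always agree: the 95% confidence interval excludes zero exactly when the absolute t-statistic exceeds the critical value.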
→ Residual plots can provide important insights into whether a linear model is a good fit.
• Each observation in a data set has a residual equal to the historically observed value minus the regression’s
predicted value, that is, ε = y − ŷ.
• Linear regression models assume that the regression’s residuals follow a normal distribution with a mean of
zero and fixed variance.
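As a small illustration, the Python sketch below computes the residuals for a hypothetical data set and lists them against x, which is the information a residual plot displays. With an intercept in the model, least-squares residuals always sum to zero, so the useful check is whether they show a pattern (for example, a curve or a widening spread) as x changes.

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed value minus the regression's predicted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Tabulate residuals against x; a residual plot shows the same pairs.
for xi, ri in zip(x, residuals):
    print(f"x = {xi:.1f}  residual = {ri:+.2f}")

print(f"mean residual = {sum(residuals) / n:.6f}")
```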
→ We can also perform regression analyses using qualitative, or categorical, variables. To do so, we must convert
data to dummy (0, 1) variables. After that, we can proceed as we would with any other regression analysis.
• A dummy variable is equal to 1 when the variable of interest fits a certain criterion. For example, a dummy
variable for “Female” would equal 1 for all female observations and 0 for male observations.
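The conversion to a dummy variable can be sketched in Python as follows. The category labels and outcome values are hypothetical. With a single dummy variable as the only independent variable, the fitted intercept equals the mean outcome of the 0 group and the slope equals the difference between the two group means, which the sketch verifies.

```python
# Hypothetical categorical data and outcome values, for illustration only.
categories = ["Female", "Male", "Female", "Male", "Female", "Male"]
outcome = [52.0, 48.0, 55.0, 50.0, 58.0, 46.0]

# Dummy variable: 1 when the observation fits the criterion, 0 otherwise.
female = [1 if c == "Female" else 0 for c in categories]

# Ordinary least-squares fit of outcome on the dummy variable.
n = len(female)
x_bar, y_bar = sum(female) / n, sum(outcome) / n
sxx = sum((xi - x_bar) ** 2 for xi in female)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(female, outcome))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Group means, for comparison with the regression coefficients.
mean_1 = sum(y for d, y in zip(female, outcome) if d == 1) / sum(female)
mean_0 = sum(y for d, y in zip(female, outcome) if d == 0) / (n - sum(female))

print(f"dummy values: {female}")
print(f"intercept = {intercept:.2f} (mean of 0 group = {mean_0:.2f})")
print(f"slope = {slope:.2f} (difference in group means = {mean_1 - mean_0:.2f})")
```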
EXCEL SUMMARY
Recall the Excel functions and analyses covered in this course and make sure to familiarize yourself with all of the
necessary steps, syntax, and arguments. We have provided some additional information for the more complex
functions listed below. As usual, the arguments shown in square brackets are optional.
→ Adding the best fit line to a scatter plot using the Insert menu