Midterm Notes MGMT 2050
Data Mining
What if?
o Decision Variables
Uncertainty: imperfect knowledge of what will happen
Risk: associated with consequences and likelihood of what might happen
Linear Regression
If the t-statistic falls in the rejection region (|t| > t-critical), reject the
null hypothesis and conclude that a linear relationship exists.
At this point, you can use the t-stats to remove a variable. If |t| < 1, the
variable should be removed, and adjusted R-squared will increase.
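The two rules of thumb above can be sketched as a small check. This is a minimal illustration, not part of the course material: the function name assess_slope and the numbers passed to it are hypothetical, and t_crit is assumed to be looked up from a t-table rather than computed.

```python
def assess_slope(t_stat, t_crit):
    """Apply the two t-statistic rules from the notes to one slope.

    t_crit is the two-tailed critical value taken from a t-table
    (an input here, not computed by this function).
    """
    significant = abs(t_stat) > t_crit   # t-statistic is in the rejection region
    drop_candidate = abs(t_stat) < 1     # removing it should raise adjusted R-squared
    return significant, drop_candidate


# Hypothetical values, for illustration only (not from the course data).
print(assess_slope(t_stat=2.40, t_crit=1.98))   # (True, False)
print(assess_slope(t_stat=-0.65, t_crit=1.98))  # (False, True)
```

A slope with 1 < |t| < t-critical is neither significant nor an automatic removal candidate, so it needs a judgment call.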
Since this is a dummy variable, the input to Location1 can only be 0 (somewhere
other than York Lanes) or 1 (York Lanes). Thus, the slope can be interpreted as
follows: holding the other variables constant, a person who eats at York Lanes
will spend approximately $1.59 more than one who eats at Seneca (Seneca is the
baseline, where both Location1 and Location2 are equal to 0).
X5 = 1 if TEL, 0 if not
Since this is a dummy variable, the input to Location2 can only be 0 (somewhere
other than TEL) or 1 (TEL). Therefore, holding the other variables constant, one
who eats at TEL will spend approximately $0.57 less than one who eats at Seneca.
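The dummy coding described above can be sketched in a few lines. The slopes 1.59 and -0.57 come from the notes; the function names location_dummies and spend_vs_seneca are hypothetical and only serve the illustration.

```python
# Slopes quoted in the notes; Seneca is the baseline (both dummies are 0).
B_LOCATION1 = 1.59    # York Lanes
B_LOCATION2 = -0.57   # TEL


def location_dummies(location):
    """Return (Location1, Location2) for a location name."""
    return (int(location == "York Lanes"), int(location == "TEL"))


def spend_vs_seneca(location):
    """Predicted spending difference relative to Seneca, other variables held fixed."""
    loc1, loc2 = location_dummies(location)
    return B_LOCATION1 * loc1 + B_LOCATION2 * loc2


for place in ("Seneca", "York Lanes", "TEL"):
    print(place, location_dummies(place), spend_vs_seneca(place))
```

Three locations need only two dummy variables, because the third (Seneca) is identified by both dummies being 0.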
2. Coefficient of Determination
R^2= 0.0952
This indicates that 9.52% of the variation in the amount of money
students spend on a meal is explained by the variation in the
independent variables. This is a weak linear relationship
between the dependent and independent variables. Since the value
of R-squared is quite low, we can infer that this model is not a good
fit for the data.
Adjusted R Square = 0.0619
This indicates that 6.19% of the variation in the
amount of money spent is explained by the independent
variables, after adjusting for the number of independent variables used.
Since the adjusted R-squared is also fairly low, it can be concluded
that, for this model, there is only a weak relationship between some of
the independent variables and the amount of money spent on a
meal.
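The link between R-squared and adjusted R-squared can be sketched with the standard formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). In the sketch below, R^2 = 0.0952 is from the notes and k = 5 matches the five slopes b1 to b5. The sample size n = 142 is an assumption: the notes do not state n, and this value was chosen only because it is consistent with the two reported figures.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R^2 is from the notes; k = 5 matches the slopes b1..b5.
# n = 142 is an ASSUMED sample size (the notes do not state it).
print(round(adjusted_r2(0.0952, n=142, k=5), 4))  # 0.0619
```

The penalty term (n - 1)/(n - k - 1) grows with k, which is why adding a weak variable can lower adjusted R-squared even though plain R-squared never falls.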
3. F-test (ANOVA)
H0: b1 = b2 = b3 = b4 = b5 = 0
HA: at least one bi ≠ 0
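For the overall F-test, the test statistic can be computed from R-squared as F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). The sketch below assumes the same setup as the notes (R^2 = 0.0952, k = 5 slopes). The sample size n = 142 is an assumption, since the notes do not state it, and the function names are hypothetical.

```python
def f_statistic(r2, n, k):
    """Overall F-statistic from R-squared, n observations, and k slopes."""
    explained_per_variable = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_variable / unexplained_per_df


def reject_h0(f_stat, f_crit):
    """Reject H0 (all slopes equal zero) when F exceeds the F-table critical value."""
    return f_stat > f_crit


# n = 142 is an ASSUMED sample size (not stated in the notes).
f_stat = f_statistic(0.0952, n=142, k=5)
print(f_stat)  # compare with the F-table value for k and n - k - 1 degrees of freedom
```

If F exceeds the critical value, reject H0 and conclude that at least one slope is not zero, so the model as a whole has some explanatory power.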
[Figure: Histogram of standard residuals (x-axis: Bin, y-axis: Frequency)]
The histogram indicates that the standard residuals are
approximately normally distributed, allowing us to conclude that the
normality-of-errors assumption holds.
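A rough numeric counterpart to reading the histogram is to check how many standardized residuals fall within 1 and 2 standard deviations of zero. For a normal distribution these shares are roughly 0.68 and 0.95. The sketch below is an informal check, not a formal normality test; the residuals are made up and the function names are hypothetical.

```python
from statistics import mean, stdev


def standardize(residuals):
    """Convert residuals to standardized residuals (mean 0, sample SD 1)."""
    m, s = mean(residuals), stdev(residuals)
    return [(r - m) / s for r in residuals]


def share_within(std_residuals, limit):
    """Fraction of standardized residuals with absolute value <= limit."""
    return sum(abs(z) <= limit for z in std_residuals) / len(std_residuals)


# Made-up residuals, for illustration only (not the course data).
residuals = [-2.1, -1.2, -0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 1.1, 1.9]
z = standardize(residuals)
print(share_within(z, 1), share_within(z, 2))  # compare with roughly 0.68 and 0.95
```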
Independence of Errors:
Since the data is cross-sectional (not collected over time), we can
assume that this assumption holds.
a. Age
[Figure: Residual plot, residuals vs. Age]
1. Linearity:
The residuals appear to be randomly scattered about zero with no
apparent pattern. Thus, there is not enough evidence to conclude that
some other functional form would fit the data better and this
assumption holds.
2. Homoscedasticity: The spread of the residuals seems to be fairly
consistent across Age, which indicates that the variance of the errors is
fairly constant. The residuals are homoscedastic and this assumption holds.
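The visual check of residual spread can be backed by a rough numeric comparison: split the observations at the median of the independent variable and compare the standard deviation of the residuals in each half. This is an informal sketch, not a formal test for heteroscedasticity. The function name spread_ratio and the data are hypothetical.

```python
from statistics import median, stdev


def spread_ratio(x, residuals):
    """Ratio of residual SD in the upper half of x to the lower half (split at the median).

    Each half needs at least two observations.
    """
    cut = median(x)
    low = [r for xi, r in zip(x, residuals) if xi <= cut]
    high = [r for xi, r in zip(x, residuals) if xi > cut]
    return stdev(high) / stdev(low)


# Made-up ages and residuals, for illustration only (not the course data).
ages = [18, 19, 20, 21, 22, 23, 24, 25]
residuals = [-1.0, 0.8, -0.5, 0.7, 0.9, -0.6, -0.8, 0.5]
print(spread_ratio(ages, residuals))
```

A ratio close to 1 is consistent with constant variance, while a ratio far from 1 suggests the spread changes with the variable (heteroscedasticity).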