Price Vs SqFt: Best Fit Trend Line Equation: Price = 70.226 * Square Feet - 10091
[Figure: Price Vs SqFt (x-axis: Square feet, 1200 to 2800; y-axis: Price, 0 to 250000)]
Best Fit Trend Line equation: Price = 70.226 * Square feet - 10091
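As a quick illustration of how the trend line can be read, the sketch below evaluates the fitted equation for one house size. The 2000 square feet input is a hypothetical value chosen for illustration and is not taken from the data set; the slope and intercept are the ones reported on the chart.

```python
def trend_price(square_feet: float) -> float:
    """Evaluate the fitted trend line Price = 70.226 * SqFt - 10091."""
    slope = 70.226        # price change per additional square foot (from the chart)
    intercept = -10091.0  # fitted intercept (from the chart)
    return slope * square_feet + intercept


# Hypothetical house size, used only to show how the equation is applied.
example_sqft = 2000
print(f"Predicted price for {example_sqft} sq ft: {trend_price(example_sqft):.2f}")
```

Under this equation, each additional square foot adds about 70.23 to the predicted price, so a 2000 sq ft house is predicted at roughly 130,361.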
[Figure: Price Vs Bedrooms (x-axis: Bedroom, 1.5 to 5.5; y-axis: Price, 0 to 250000)]
Fitted trend line: f(x) = 5407.21 x² − 14062.77 x + 120689.63
[Figure: Price Vs Bathroom (x-axis: Bathroom, 1.5 to 4.5; y-axis: Price, 0 to 250000)]
Fitted trend line: f(x) = 14445.71 x² − 46238.13 x + 153321.21
[Figure: Price Vs Brick (x-axis: Brick, 0 to 1.2; y-axis: Price, 0 to 250000)]
Fitted trend line: f(x) = 25810.91 x + 121958.14
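The fitted curves on the charts above can be evaluated directly. The sketch below is only an illustration: the coefficients are those printed on the charts, while the evaluation points (3 bedrooms, 2 bathrooms) are hypothetical. It also assumes Brick is coded 0/1, in which case the slope of the Brick line is the average price difference between brick and non-brick houses.

```python
def poly(coeffs, x):
    """Evaluate a polynomial (coefficients from highest to lowest power) by Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# Coefficients as printed on the charts.
BEDROOMS = (5407.21, -14062.77, 120689.63)    # quadratic in number of bedrooms
BATHROOMS = (14445.71, -46238.13, 153321.21)  # quadratic in number of bathrooms
BRICK = (25810.91, 121958.14)                 # linear in the Brick indicator

# Assumption: Brick is a 0/1 dummy, so f(1) - f(0) equals the slope.
brick_premium = poly(BRICK, 1) - poly(BRICK, 0)

print(f"Fitted price at 3 bedrooms:  {poly(BEDROOMS, 3):.2f}")
print(f"Fitted price at 2 bathrooms: {poly(BATHROOMS, 2):.2f}")
print(f"Brick premium:               {brick_premium:.2f}")
```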
Q2) Obtain the intercepts & slope coefficients of the model and interpret them. Interpret the overall
regression results with a sound theoretical knowledge.
To develop an accurate forecasting model, several models were built on a trial-and-error basis to arrive at
the best model. The intercept and slope coefficients of the models are as follows:
Model 1:
Price= C0+C1*SqFt+C2*Bedroom+C3*Bathroom+C4*Offers
Regression Statistics
Multiple R    R Square    Adjusted R Square    Standard Error    Observations
0.835573      0.698182    0.688367             14999.25          128
The intercept and slope coefficients are as shown in the table. Noteworthy points are as follows:
The coefficient of “Offers” is negative, which indicates that prices go down as the number of offers increases.
This also matches our intuition. The coefficients of all other independent variables are positive.
All independent variables are statistically significant, i.e. p value < 0.05. However, the intercept term’s p
value is 0.175, which implies that the intercept term is not statistically significant.
Adjusted R2 for the model is 0.6883.
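The adjusted R2 quoted for Model 1 can be reproduced from the regression statistics. The sketch below uses the standard adjustment formula, with R Square = 0.698182 and 128 observations from the table and 4 predictors (SqFt, Bedroom, Bathroom, Offers) from the Model 1 equation.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Model 1: values taken from the Regression Statistics table.
model_1 = adjusted_r2(r2=0.698182, n_obs=128, n_predictors=4)
print(f"Adjusted R2 for Model 1: {model_1:.6f}")
```

Because the adjustment penalises every extra predictor, adjusted R2 rather than R2 is the appropriate figure for comparing models with different numbers of variables.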
Regression Statistics
Multiple R    R Square    Adjusted R Square    Standard Error    Observations
0.884907      0.783061    0.77417              12768.46          128
The intercept and slope coefficients are as shown in the table. Noteworthy points are as follows:
The coefficient of “Offers” is negative, which indicates that prices go down as the number of offers increases.
This also matches our intuition. The coefficients of all other independent variables are positive.
All independent variables are statistically significant, i.e. p value < 0.05. However, the intercept term’s p
value is 0.1465, which implies that the intercept term is not statistically significant.
Adjusted R2 for the model is 0.77417, which is better than model 1.
Regression Statistics
Multiple R     R Square       Adjusted R Square    Standard Error    Observations
0.931998406    0.868621029    0.852623922          10018.94419       128
2. Test the Multicollinearity problem with a suitable method. Solve the problem of Multicollinearity if so, by any
one of the methods which you think is suitable for your example?
Multicollinearity Test:
VIF values
Independent Variable    VIF
SqFt                    1.862
Bedrooms                1.702
Bathrooms               1.501
Offers                  1.702
Brick                   1.104
West Location           1.732
North Location          1.652
East Location           7.845
Since East Location shows multicollinearity (VIF = 7.845), we will modify our regression model and use
stepwise regression to counter the problem.
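To make the VIF figures easier to read, the sketch below converts each VIF into the R2 of the auxiliary regression of that variable on the other predictors, using the identity VIF = 1 / (1 - R2). The VIF values are those reported above; the cut-off of 5 is only a commonly used rule of thumb assumed here for illustration, not a threshold stated in the report.

```python
# VIF values as reported in the multicollinearity test.
vif = {
    "SqFt": 1.862,
    "Bedrooms": 1.702,
    "Bathrooms": 1.501,
    "Offers": 1.702,
    "Brick": 1.104,
    "West Location": 1.732,
    "North Location": 1.652,
    "East Location": 7.845,
}

VIF_THRESHOLD = 5.0  # assumption: rule-of-thumb cut-off, not from the report


def auxiliary_r2(vif_value: float) -> float:
    """Invert VIF = 1 / (1 - R2) to recover the auxiliary-regression R2."""
    return 1.0 - 1.0 / vif_value


flagged = [name for name, value in vif.items() if value > VIF_THRESHOLD]

for name, value in vif.items():
    print(f"{name:15s} VIF={value:6.3f}  auxiliary R2={auxiliary_r2(value):.3f}")
print("Flagged:", flagged)
```

Read this way, a VIF of 7.845 means that roughly 87% of the variation in East Location is explained by the other predictors.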
Stepwise Regression:
Model Summary (g)
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .714 (a)  .510      .506               18886.376                   .510             131.041   1    126  .000
2      .812 (b)  .659      .654               15814.680                   .149             54.700    1    125  .000
3      .885 (c)  .783      .778               12653.739                   .124             71.251    1    124  .000
4      .918 (d)  .842      .837               10849.280                   .059             45.678    1    123  .000
5      .928 (e)  .861      .855               10226.392                   .019             16.440    1    122  .000
6      .932 (f)  .868      .862               9995.067                    .007             6.712     1    121  .011           1.902
a. Predictors: (Constant), West Location
b. Predictors: (Constant), West Location, SqFt
c. Predictors: (Constant), West Location, SqFt, Brick
d. Predictors: (Constant), West Location, SqFt, Brick, Offers
e. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms
f. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms, Bedrooms
g. Dependent Variable: Price
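The "F Change" column in the stepwise summary tests whether the variable added at each step significantly improves the fit. The sketch below recomputes it from the R Square figures with the usual partial F formula. Because the inputs are the three-decimal values printed in the table, the results only approximate the reported F Change values.

```python
def f_change(r2_change: float, r2_full: float, df1: int, df2: int) -> float:
    """Partial F statistic for the predictors added in one stepwise step."""
    return (r2_change / df1) / ((1.0 - r2_full) / df2)


# (step, R Square Change, R Square, df1, df2, reported F Change) from the table.
steps = [
    (1, 0.510, 0.510, 1, 126, 131.041),
    (2, 0.149, 0.659, 1, 125, 54.700),
    (3, 0.124, 0.783, 1, 124, 71.251),
    (4, 0.059, 0.842, 1, 123, 45.678),
    (5, 0.019, 0.861, 1, 122, 16.440),
    (6, 0.007, 0.868, 1, 121, 6.712),
]

for step, change, r2, df1, df2, reported in steps:
    approx = f_change(change, r2, df1, df2)
    print(f"Step {step}: recomputed F = {approx:8.3f}  (reported {reported})")
```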
Significance and VIF values of independent variables in model 6
                 Unstandardized Coefficients  Standardized Coefficients                 Collinearity Statistics
Model 6          B           Std. Error       Beta                       t       Sig.   Tolerance  VIF
(Constant)       3067.471    8746.712                                    1.351   .072
West Location    21937.572   2482.393         .377                       8.837   .000   .598       1.673
SqFt             52.149      5.572            .411                       9.359   .000   .566       1.767
Brick            17058.771   1942.805         .299                       8.780   .000   .938       1.066
Offers           -8019.003   1013.011         -.319                      -7.916  .000   .670       1.492
Bathrooms        7810.698    2109.060         .150                       3.703   .000   .668       1.497
Bedrooms         4070.005    1570.921         .110                       2.591   .011   .605       1.653
a. Dependent Variable: Price
VIF values are between 1.07 and 1.77, hence the multicollinearity in model 6 is not significant.
Intercept & slope terms are significant, as p value < 0.1 in all cases.
Adjusted R2 value for the model is 0.862, which is better than all previous models.
So, our final model after removing multicollinearity (model 6 of the stepwise regression) is:
Model 4: Price = C0 + C1*West Location + C2*SqFt + C3*Brick + C4*Offers + C5*Bathrooms + C6*Bedrooms
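Plugging the unstandardized coefficients from the model 6 coefficient table into this equation gives a usable forecasting function. The sketch below is illustrative only: the coefficients come from the table, but the example house (2000 sq ft, 3 bedrooms, 2 bathrooms, 2 offers, brick, not in the West location) is hypothetical, and the 0/1 coding of Brick and West Location is an assumption.

```python
# Unstandardized coefficients (B) from the model 6 coefficient table.
COEFFICIENTS = {
    "const": 3067.471,
    "west": 21937.572,
    "sqft": 52.149,
    "brick": 17058.771,
    "offers": -8019.003,
    "bathrooms": 7810.698,
    "bedrooms": 4070.005,
}


def predict_price(sqft, bedrooms, bathrooms, offers, brick, west):
    """Predicted price; brick and west are assumed to be 0/1 indicators."""
    c = COEFFICIENTS
    return (
        c["const"]
        + c["west"] * west
        + c["sqft"] * sqft
        + c["brick"] * brick
        + c["offers"] * offers
        + c["bathrooms"] * bathrooms
        + c["bedrooms"] * bedrooms
    )


# Hypothetical house, not a record from the data set.
example = predict_price(sqft=2000, bedrooms=3, bathrooms=2, offers=2, brick=1, west=0)
print(f"Predicted price: {example:.2f}")
```

Each coefficient is the change in predicted price for a one-unit change in that variable with the others held fixed; for example, one more offer lowers the prediction by about 8019.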
3. Applying a suitable method test the Heteroscedasticity problem. Solve the problem of heteroscedasticity, if
so, by any one of the methods which you think suitable for your example?
ANOVA (a)
Model 1      Sum of Squares             df    Mean Square              F      Sig.
Regression   120623666833319424.000     6     20103944472219904.000    .946   .465 (b)
Residual     2570097401831889400.000    121   21240474395304872.000
Total        2690721068665208800.000    127
a. Dependent Variable: sq_residual
b. Predictors: (Constant), West Location, Brick, SqFt, Offers, Bathrooms, Bedrooms
From this table, it can be noted that the significance of the auxiliary regression (squared residuals regressed
on the predictors) is 0.465.
Null Hypothesis, H0: There is no heteroscedasticity in the residuals of our predicted model 4
Alternate Hypothesis, H1: There is heteroscedasticity in the residuals of our predicted model 4
So, from the Breusch-Pagan test, the p value of the model is 0.465, which is greater than 0.05; hence we fail to
reject our null hypothesis. There is no evidence of heteroscedasticity in the error term of our model 4.
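The test statistics behind this conclusion can be recomputed from the ANOVA table of the auxiliary regression. The sketch below derives the F statistic and, as an alternative form of the Breusch-Pagan test, the LM statistic n * R2. All inputs are taken from the table; the number of observations is taken as total df + 1 = 128. No p-values are computed, since that would need a statistics library.

```python
# Sums of squares and degrees of freedom from the auxiliary-regression ANOVA table.
ss_regression = 120623666833319424.0
ss_residual = 2570097401831889400.0
df_regression = 6
df_residual = 121
n_obs = df_regression + df_residual + 1  # 128 observations

# F statistic: ratio of the regression and residual mean squares.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# R2 of the auxiliary regression and the LM form of the Breusch-Pagan statistic.
aux_r2 = ss_regression / (ss_regression + ss_residual)
lm_stat = n_obs * aux_r2

print(f"F statistic:  {f_stat:.3f}")
print(f"Auxiliary R2: {aux_r2:.4f}")
print(f"LM = n * R2:  {lm_stat:.3f}")
```

The LM statistic of about 5.74 is well below 12.59, the 5% critical value of the chi-square distribution with 6 degrees of freedom, which agrees with the F-test conclusion of no heteroscedasticity.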
4. Tests whether autocorrelation is present or not in your regression? Solve the problem of autocorrelation,
if so, by any one of the methods which you think suitable for your example?
To test for autocorrelation, we have used the Durbin-Watson method on our model. The model summary is as follows:
Model Summary (b)
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .932 (a)  .868      .862               9995.067                    1.902
a. Predictors: (Constant), West Location, Brick, SqFt, Offers, Bathrooms, Bedrooms
b. Dependent Variable: Price
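For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values close to 2 indicate no first-order autocorrelation, so the reported 1.902 suggests that autocorrelation is not a problem here. The sketch below shows the calculation on made-up residuals (the report's residuals are not available) and the approximate first-order autocorrelation implied by 1.902.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2) for residuals in observation order."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy residuals that alternate in sign (negative autocorrelation), for illustration only.
toy_residuals = [1.0, -1.0, 1.0, -1.0]
print("DW for alternating residuals:", durbin_watson(toy_residuals))

# Approximation d = 2 * (1 - rho) gives the implied first-order autocorrelation.
reported_dw = 1.902
implied_rho = 1.0 - reported_dw / 2.0
print(f"Implied first-order autocorrelation: {implied_rho:.3f}")
```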
5. Perform the redundant variable or omitted variable tests to test about the inclusion or exclusion of a
variable into the model.
ANOVA (a)
Model 6      Sum of Squares     df    Mean Square        F         Sig.
Regression   79597148827.421    6     13266191471.237    132.793   .000 (g)
Residual     12088065469.454    121   99901367.516
Total        91685214296.875    127
a. Dependent Variable: Price
g. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms, Bedrooms
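The figures in this ANOVA table tie back to the model summary reported earlier. The sketch below, using only values from the table, recomputes the overall F statistic, R Square and the standard error of the estimate.

```python
import math

# Values from the ANOVA table for model 6.
ss_regression = 79597148827.421
ss_residual = 12088065469.454
df_regression = 6
df_residual = 121

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual                       # overall F test
r_squared = ss_regression / (ss_regression + ss_residual)  # share of variance explained
std_error = math.sqrt(ms_residual)                         # std. error of the estimate

print(f"F = {f_stat:.3f}")
print(f"R Square = {r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error:.3f}")
```

These reproduce the reported F of 132.793, R Square of .868 and standard error of 9995.067.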
Coefficients (a)
                 Unstandardized Coefficients  Standardized Coefficients                 Collinearity Statistics
Model 1          B           Std. Error       Beta                       t       Sig.   Tolerance  VIF
(Constant)       21144.391   15506.040                                   1.364   .098
SqFt             41.612      9.312            .328                       4.469   .000   .201       4.975
Bedrooms         2959.131    1751.931         .080                       1.689   .094   .482       2.073
Bathrooms        5694.238    2582.215         .109                       2.205   .029   .442       2.262
Offers           -6373.571   1543.132         -.254                      -4.130  .000   .287       3.490
Brick            12946.373   3501.441         .227                       3.697   .000   .286       3.491
West Location    16118.918   4812.623         .277                       3.349   .001   .158       6.338
(Pred)^3         4.222E-12   .000             .228                       1.409   .161   .041       24.205
a. Dependent Variable: Price
Note: Pred^2 has been excluded from the regression model as it was showing high multicollinearity.
From the above table, the p value of the Pred^3 coefficient, i.e. C8, is 0.161, which is > 0.10, so we fail to
reject the null hypothesis that the coefficient of the added power term is zero; hence C8 is not significant
(C7, the coefficient of Pred^2, was excluded as noted above).
Therefore, the exclusion of these variables is appropriate, and our model is correctly specified.
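The high multicollinearity that led to dropping Pred^2 is expected, because the squares and cubes of the fitted values move almost in lockstep over a typical price range. The sketch below illustrates this with hypothetical fitted prices (not the report's actual predictions) and a hand-written Pearson correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fitted prices spanning a plausible range; illustration only.
fitted = [80000.0, 100000.0, 120000.0, 140000.0, 160000.0, 180000.0, 200000.0]

pred_squared = [p ** 2 for p in fitted]
pred_cubed = [p ** 3 for p in fitted]

corr = pearson(pred_squared, pred_cubed)
print(f"Correlation between Pred^2 and Pred^3: {corr:.4f}")
```

With a correlation this close to 1, including both power terms would inflate their VIFs sharply, which is consistent with keeping only Pred^3 in the test regression.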