Price Vs SQFT: Best Fit Trend Line Equation: Price 70.226 Square Feet - 10091


Q1. Estimate a multiple linear regression model and interpret the result. Formulate and test hypotheses about the coefficients, applying the 't' and 'F' tests based on your theory, and interpret them.

Price Vs SqFt


[Scatter plot: Price against Square feet, with fitted trend line f(x) = 70.23 x − 10091.13]

Best Fit Trend Line equation: Price = 70.226* Square feet – 10091

Price Vs Bedrooms

[Scatter plot: Price against Bedrooms, with fitted trend line f(x) = 5407.21 x² − 14062.77 x + 120689.63]

Best Fit Trend line equation: Price = 5407.2*(Bedroom)² − 14063*(Bedroom) + 120690

Price Vs Bathroom
[Scatter plot: Price against Bathroom, with fitted trend line f(x) = 14445.71 x² − 46238.13 x + 153321.21]

Best Fit Trend line equation: Price = 14446*(Bathroom)² − 46238*(Bathroom) + 153321

Price Vs Offers

[Scatter plot: Price against Offers, with fitted trend line f(x) = −7880.69 x + 150744.74]

Best Fit Trend line equation: Price = −7880.7*(Offer) + 150745

Price Vs Brick


[Scatter plot: Price against Brick (0 or 1), with fitted trend line f(x) = 25810.91 x + 121958.14]

Best Fit Trend line equation: Price = 25811*(Brick) + 121958
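The trend lines above can be read as simple prediction formulas. The following is a minimal sketch that evaluates two of them, using the rounded coefficients quoted in the text and reading the "2" after (Bedroom) as a square. The function names and the example inputs (2000 square feet, 3 bedrooms) are illustrative assumptions, not values from the data set.

```python
def sqft_trend(square_feet):
    """Linear trend line quoted for Price vs SqFt."""
    return 70.226 * square_feet - 10091


def bedroom_trend(bedrooms):
    """Quadratic trend line quoted for Price vs Bedrooms."""
    return 5407.2 * bedrooms ** 2 - 14063 * bedrooms + 120690


# Hypothetical inputs chosen only to illustrate the arithmetic.
print(f"Trend price at 2000 sq ft: {sqft_trend(2000):.0f}")
print(f"Trend price at 3 bedrooms: {bedroom_trend(3):.0f}")
```

Each single-variable trend line ignores the other predictors, which is why the multiple regression models that follow give different slopes for the same variables.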

Q2) Obtain the intercepts & slope coefficients of the model and interpret them. Interpret the overall
regression results with a sound theoretical knowledge.
To develop an accurate forecasting model, several models were built by trial and error to arrive at the best one.
The intercept and slope coefficients of each model are as follows:
Model 1:
Price= C0+C1*SqFt+C2*Bedroom+C3*Bathroom+C4*Offers

Regression Statistics
Multiple R   R Square   Adjusted R Square   Standard Error   Observations
0.835573     0.698182   0.688367            14999.25         128

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -17347.4       12724.9          -1.632     0.1752     -42535.5    7840.775    -42535.5      7840.775
SqFt        61.83995       8.263774         7.4832     1.2E-11    45.48231    78.19758    45.48231      78.19758
Bedroom     9319.753       2148.754         4.3372     2.97E-5    5066.425    13573.08    5066.425      13573.08
Bathroom    12646.35       3109.662         4.0667     8.45E-5    6490.962    18801.73    6490.962      18801.73
Offers      -13601         1324.819         -10.266    3.09E-18   -16223.4    -10978.6    -16223.4      -10978.61

The intercept and slope coefficients are as shown in the table. Noteworthy points are as follows:
• The coefficient of "Offers" is negative, which indicates that prices go down as the number of offers increases. This
also matches our intuition. All other slope coefficients are positive.
• All independent variables are statistically significant, i.e. p-value < 0.05. However, the intercept term's
p-value is 0.175, which implies that the intercept is not statistically significant.
• Adjusted R² for the model is 0.6884.

Model 2: Price= C0+ C1*Sq.Ft+ C2*Bedroom+ C3*Bathroom+ C4*Offers+ C5*Bricks

Regression Statistics
Multiple R   R Square   Adjusted R Square   Standard Error   Observations
0.884907     0.783061   0.77417             12768.46         128

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -15831.6       10834.59         -1.46121   0.146529   -37279.7    5616.565    -37279.7      5616.565
SqFt        59.20441       7.045067         8.403669   9.35E-14   45.258      73.15082    45.258        73.15082
Bedrooms    9763.172       1830.303         5.334183   4.49E-07   6139.904    13386.44    6139.904      13386.44
Bathrooms   9823.664       2678.515         3.667579   0.000364   4521.276    15126.05    4521.276      15126.05
Offers      -12168.5       1146.684         -10.6119   4.9E-19    -14438.5    -9898.56    -14438.5      -9898.56
Brick       17146.94       2481.854         6.908925   2.38E-10   12233.86    22060.02    12233.86      22060.02
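The Adjusted R Square figures reported for Models 1 and 2 can be reproduced from R Square, the number of observations and the number of predictors with the standard formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below is a check on the reported values; the function name is an illustrative choice.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square and Observations are copied from the regression statistics above.
model_1 = adjusted_r_squared(0.698182, 128, 4)  # SqFt, Bedroom, Bathroom, Offers
model_2 = adjusted_r_squared(0.783061, 128, 5)  # Model 1 predictors plus Brick

print(f"Model 1 adjusted R^2: {model_1:.4f}")
print(f"Model 2 adjusted R^2: {model_2:.4f}")
```

Because the adjustment penalises every extra predictor, the rise from Model 1 to Model 2 indicates that Brick adds explanatory power beyond what one more variable would add by chance.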

The intercept and slope coefficients are as shown in the table. Noteworthy points are as follows:

• The coefficient of "Offers" is negative, which indicates that prices go down as the number of offers increases. This
also matches our intuition. All other slope coefficients are positive.
• All independent variables are statistically significant, i.e. p-value < 0.05. However, the intercept term's
p-value is 0.1465, which implies that the intercept is not statistically significant.
• Adjusted R² for the model is 0.77417, which is better than model 1.

Model 3: Price = C0 + C1*Sq.Ft + C2*Bedroom + C3*Bathroom + C4*Offers + C5*Bricks + C6*East Loc + C7*West Loc + C8*North Loc

Regression Statistics
Multiple R    R Square      Adjusted R Square   Standard Error   Observations
0.931998406   0.868621029   0.852623922         10018.94419      128

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   2159.5         8877.8           0.2      0.81      -15417.9    19736.9     -15417.9      19736.9
SqFt        53.0           5.7              9.2      0.00      41.6        64.3        41.6          64.3
Bedrooms    4246.8         1597.9           2.7      0.01      1083.0      7410.5      1083.0        7410.5
Bathrooms   7883.3         2117.0           3.7      0.00      3691.7      12074.9     3691.7        12074.9
Offers      -8267.5        1084.8           -7.6     0.00      -10415.3    -6119.7     -10415.3      -6119.7
Brick       17297.3        1981.6           8.7      0.00      13373.9     21220.8     13373.9       21220.8
East Loc    -1560.6        2396.8           -0.7     0.52      -6306.0     3184.8      -6306.0       3184.8
West Loc    20681.0        3149.0           6.6      0.00      14446.3     26915.7     14446.3       26915.7
North Loc   0.0            0.0              65535    #NUM!     0.0         0.0         0.0           0.0
The intercept and slope coefficients are as shown in the table. Noteworthy points are as follows:
• The intercept and the slope coefficient of East Location are statistically insignificant (p = 0.81 and p = 0.52).
• North Location shows a multicollinearity problem, as can be seen from the above table: its coefficient and standard
error are zero, so we did not get any p-value for the North Location slope coefficient.
• Adjusted R² for the model is 0.8526, which is better than model 2.
Note: Because of the collinearity problem with North Location, we have calculated VIF values for the independent variables.
Hypothesis testing based on t values
H0 : The intercept and slope coefficients are not significant (each coefficient equals zero)
H1 : The intercept and slope coefficients are significant (each coefficient differs from zero)
From the above table, a coefficient is significant when |t| > 1.658. For the intercept and East Location, |t| is below
this value, so the null hypothesis cannot be rejected. All other independent variables have |t| > 1.658, hence they are significant.
In the next part, we will attempt to address the multicollinearity problem in the present model and will develop an
improved model for prediction of real estate prices.
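The t test described above can be sketched in a few lines: each t value is the coefficient divided by its standard error, and its absolute value is compared with the critical value of 1.658 used in the text. The figures are copied from the Model 3 table (North Location is left out because no valid estimate was produced for it); the dictionary and function names are illustrative. Since the table is rounded to one decimal place, the recomputed t values can differ slightly from the printed ones.

```python
CRITICAL_T = 1.658  # critical value quoted in the text

# (coefficient, standard error) pairs copied from the Model 3 table.
MODEL_3 = {
    "Intercept": (2159.5, 8877.8),
    "SqFt": (53.0, 5.7),
    "Bedrooms": (4246.8, 1597.9),
    "Bathrooms": (7883.3, 2117.0),
    "Offers": (-8267.5, 1084.8),
    "Brick": (17297.3, 1981.6),
    "East Loc": (-1560.6, 2396.8),
    "West Loc": (20681.0, 3149.0),
}


def t_statistic(coefficient, standard_error):
    """t = estimated coefficient / standard error, under H0: coefficient = 0."""
    return coefficient / standard_error


def insignificant_terms(table, critical_t=CRITICAL_T):
    """Names of the terms whose |t| does not exceed the critical value."""
    return [name for name, (coef, se) in table.items()
            if abs(t_statistic(coef, se)) <= critical_t]


for name, (coef, se) in MODEL_3.items():
    print(f"{name:10s} t = {t_statistic(coef, se):7.2f}")
print("Cannot reject H0 for:", insignificant_terms(MODEL_3))
```

Taking the absolute value matters for Offers, whose t value is strongly negative but still far from zero.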

2. Test the Multicollinearity problem with a suitable method. Solve the problem of Multicollinearity if so, by any
one of the methods which you think is suitable for your example?

Multicollinearity Test:

VIF values
Independent Variable   SqFt    Bedrooms   Bathrooms   Offers   Brick   West Location   North Location   East Location
VIF                    1.862   1.702      1.501       1.702    1.104   1.732           1.652            7.845

Since East Location shows multicollinearity (its VIF of 7.845 is far higher than that of any other variable), we will
modify our regression model and use stepwise regression to counter the problem.
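A VIF is defined as 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors, so each reported VIF implies how much of that predictor is explained by the rest. The sketch below inverts the formula for the values in the table. The cutoff of 5 is a common rule of thumb and is an assumption here; the source does not state which threshold was used.

```python
VIF_THRESHOLD = 5.0  # assumed rule-of-thumb cutoff, not taken from the source

# VIF values copied from the table above.
VIF_VALUES = {
    "SqFt": 1.862,
    "Bedrooms": 1.702,
    "Bathrooms": 1.501,
    "Offers": 1.702,
    "Brick": 1.104,
    "West Location": 1.732,
    "North Location": 1.652,
    "East Location": 7.845,
}


def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R_j^2) to recover the auxiliary R_j^2."""
    return 1.0 - 1.0 / vif


def flagged(vifs, threshold=VIF_THRESHOLD):
    """Predictors whose VIF exceeds the chosen threshold."""
    return [name for name, vif in vifs.items() if vif > threshold]


for name, vif in VIF_VALUES.items():
    print(f"{name:15s} VIF = {vif:5.3f}  implied R_j^2 = {implied_r_squared(vif):.3f}")
print("Flagged:", flagged(VIF_VALUES))
```

On these numbers, about 87% of the variation in East Location is explained by the other predictors, against less than half for every other variable.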

Stepwise Regression:
Model Summary (g)
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .714   .510       .506                18886.376                    .510              131.041    1     126   .000
2       .812   .659       .654                15814.680                    .149              54.700     1     125   .000
3       .885   .783       .778                12653.739                    .124              71.251     1     124   .000
4       .918   .842       .837                10849.280                    .059              45.678     1     123   .000
5       .928   .861       .855                10226.392                    .019              16.440     1     122   .000
6       .932   .868       .862                9995.067                     .007              6.712      1     121   .011            1.902
a. Predictors: (Constant), West Location
b. Predictors: (Constant), West Location, SqFt
c. Predictors: (Constant), West Location, SqFt, Brick
d. Predictors: (Constant), West Location, SqFt, Brick, Offers
e. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms
f. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms, Bedrooms
g. Dependent Variable: Price
Significance and VIF values of independent variables in model 6
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)
Model 6         B           Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)      3067.471    8746.712             1.351    .072
West Location   21937.572   2482.393     .377    8.837    .000   .598        1.673
SqFt            52.149      5.572        .411    9.359    .000   .566        1.767
Brick           17058.771   1942.805     .299    8.780    .000   .938        1.066
Offers          -8019.003   1013.011     -.319   -7.916   .000   .670        1.492
Bathrooms       7810.698    2109.060     .150    3.703    .000   .668        1.497
Bedrooms        4070.005    1570.921     .110    2.591    .011   .605        1.653
a. Dependent Variable: Price

• All VIF values are below 2 (they range from 1.066 to 1.767), hence the multicollinearity in model 6 is not significant.
• The intercept and slope terms are significant, as the p-value is < 0.1 in all cases.
• Adjusted R² for the model is 0.862, which is better than all previous models.
So, our final model after removing multicollinearity:
Model 4: Price = C0 + C1*West Location + C2*SqFt + C3*Brick + C4*Offers + C5*Bathrooms + C6*Bedrooms
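To show how Model 4 would be used for forecasting, the sketch below plugs the unstandardized coefficients (column B of the model 6 table) into the equation. The example house is hypothetical and chosen only to illustrate the calculation; West Location and Brick are treated as 0/1 indicators, which is an assumption based on how they appear in the charts and tables.

```python
# Unstandardized coefficients (B) copied from the model 6 coefficients table.
MODEL_4 = {
    "Constant": 3067.471,
    "West Location": 21937.572,
    "SqFt": 52.149,
    "Brick": 17058.771,
    "Offers": -8019.003,
    "Bathrooms": 7810.698,
    "Bedrooms": 4070.005,
}


def predict_price(house, coefficients=MODEL_4):
    """Price = C0 + sum of (coefficient * value) over the six predictors."""
    price = coefficients["Constant"]
    for name, value in house.items():
        price += coefficients[name] * value
    return price


# Hypothetical house: brick, in the West location, 2000 sq ft,
# 2 offers, 2 bathrooms, 3 bedrooms.
example_house = {
    "West Location": 1,
    "SqFt": 2000,
    "Brick": 1,
    "Offers": 2,
    "Bathrooms": 2,
    "Bedrooms": 3,
}
print(f"Predicted price: {predict_price(example_house):.2f}")
```

Each coefficient is the expected change in price for a one-unit change in that variable with the others held fixed; for instance, every additional offer lowers the predicted price by about 8019.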

3. Applying a suitable method test the Heteroscedasticity problem. Solve the problem of heteroscedasticity, if
so, by any one of the methods which you think suitable for your example?

To test for heteroscedasticity, we use the Breusch-Pagan test in SPSS:

Breusch-Pagan model:
(Residual of predicted model)² = C0 + C1*West Location + C2*SqFt + C3*Brick + C4*Offers + C5*Bathrooms + C6*Bedrooms
This regression gives the following results:

Model 1      Sum of Squares             df    Mean Square              F      Sig.
Regression   120623666833319424.000     6     20103944472219904.000    .946   .465 (b)
Residual     2570097401831889400.000    121   21240474395304872.000
Total        2690721068665208800.000    127
a. Dependent Variable: sq_residual
b. Predictors: (Constant), West Location, Brick, SqFt, Offers, Bathrooms, Bedrooms

From this table it can be noted that the significance of the auxiliary regression is 0.465.
Null Hypothesis, H0: There is no heteroscedasticity in the residuals of our predicted model 4
Alternate Hypothesis, H1: There is heteroscedasticity in the residuals of predicted model 4
The p-value of the Breusch-Pagan regression is 0.465, which is greater than 0.05, hence we fail to reject the null
hypothesis. So, there is no evidence of heteroscedasticity in the error term of our model 4.
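The F value in the Breusch-Pagan table can be recomputed from its sums of squares. The sketch below also computes the n·R² form of the test, a common variant in which the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of predictors. The source reports only the F form, so the n·R² calculation and the 5% critical value of 12.592 (standard chi-square table, 6 degrees of freedom) are additions for illustration.

```python
N_OBS = 128
DF_REGRESSION = 6
DF_RESIDUAL = 121

# Sums of squares copied from the auxiliary regression table above.
REGRESSION_SS = 120623666833319424.0
RESIDUAL_SS = 2570097401831889400.0
TOTAL_SS = 2690721068665208800.0

# Standard chi-square table value (5% level, 6 df); not from the source.
CHI2_CRITICAL_5PCT_DF6 = 12.592


def f_statistic(reg_ss, res_ss, df_reg, df_res):
    """F = mean square of regression / mean square of residual."""
    return (reg_ss / df_reg) / (res_ss / df_res)


def lm_statistic(reg_ss, total_ss, n_obs):
    """n * R^2 of the auxiliary regression of squared residuals."""
    return n_obs * reg_ss / total_ss


f_value = f_statistic(REGRESSION_SS, RESIDUAL_SS, DF_REGRESSION, DF_RESIDUAL)
lm_value = lm_statistic(REGRESSION_SS, TOTAL_SS, N_OBS)

print(f"F = {f_value:.3f}")
print(f"n*R^2 = {lm_value:.3f} (critical value {CHI2_CRITICAL_5PCT_DF6})")
print("Reject H0" if lm_value > CHI2_CRITICAL_5PCT_DF6 else "Fail to reject H0")
```

Both forms lead to the same conclusion here: the predictors explain under 5% of the variation in the squared residuals.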
4. Test whether autocorrelation is present or not in your regression. Solve the problem of autocorrelation,
if so, by any one of the methods which you think suitable for your example.

To test for autocorrelation, we applied the Durbin-Watson test to our model. The model summary is as follows:

Model Summary (b)
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .932   .868       .862                9995.067                     1.902
a. Predictors: (Constant), West Location, Brick, SqFt, Offers, Bathrooms, Bedrooms
b. Dependent Variable: Price

Durbin Watson value = 1.902

Critical DW values from the table for n = 128 and k = 6: dL = 1.60 and dU = 1.805

Null Hypothesis, H0: There is no autocorrelation in the error term

Alternate Hypothesis, H1: Autocorrelation exists in the error term
Since the DW value of the model lies between dU and 4 − dU (1.805 < 1.902 < 2.195), we fail to reject the null
hypothesis; there is no evidence of autocorrelation in the error term of model 4.
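The Durbin-Watson decision rule used above can be written out explicitly. The statistic ranges from 0 to 4, values near 2 indicate no first-order autocorrelation, and the bounds dL and dU split that range into five zones. This is a minimal sketch of the standard rule; the function name and the wording of the returned labels are illustrative.

```python
def durbin_watson_decision(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the bounds dL and dU."""
    if dw < d_lower:
        return "positive autocorrelation"
    if dw > 4 - d_lower:
        return "negative autocorrelation"
    if d_upper <= dw <= 4 - d_upper:
        return "no autocorrelation"
    return "inconclusive"


# DW statistic and bounds as reported in the text (n = 128, k = 6).
print(durbin_watson_decision(1.902, d_lower=1.60, d_upper=1.805))
```

A value between dL and dU, or between 4 − dU and 4 − dL, falls in the inconclusive zone, which is why checking only that DW exceeds dU is not sufficient on its own.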

5. Perform the redundant variable or omitted variable tests to test the inclusion or exclusion of a
variable in the model.

To test for redundant or omitted variables, we carried out two procedures:

• Stepwise regression in SPSS
• Ramsey RESET test
Stepwise regression
Model 4 of our regression was derived from stepwise regression, which gave:
Price = C0 + C1*West Location + C2*SqFt + C3*Brick + C4*Offers + C5*Bathrooms + C6*Bedrooms
The test statistics for the model are as follows:
Model Summary (g)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
6       .932 (f)   .868       .862                9995.067                     .007              6.712      1     121   .011            1.902
f. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms, Bedrooms
g. Dependent Variable: Price

Model 6      Sum of Squares     df    Mean Square       F         Sig.
Regression   79597148827.421    6     13266191471.237   132.793   .000 (g)
Residual     12088065469.454    121   99901367.516
Total        91685214296.875    127
a. Dependent Variable: Price
g. Predictors: (Constant), West Location, SqFt, Brick, Offers, Bathrooms, Bedrooms
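The overall F test for the final model can be reproduced from the sums of squares in the table above: F is the regression mean square divided by the residual mean square, and R² is the regression sum of squares as a share of the total. The sketch below is a check on the reported figures; the variable and function names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the model 6 table.
REGRESSION_SS = 79597148827.421
RESIDUAL_SS = 12088065469.454
TOTAL_SS = 91685214296.875
DF_REGRESSION = 6
DF_RESIDUAL = 121


def overall_f(reg_ss, res_ss, df_reg, df_res):
    """F = (SS_regression / df_regression) / (SS_residual / df_residual)."""
    return (reg_ss / df_reg) / (res_ss / df_res)


def r_squared(reg_ss, total_ss):
    """Share of the total variation explained by the regression."""
    return reg_ss / total_ss


f_value = overall_f(REGRESSION_SS, RESIDUAL_SS, DF_REGRESSION, DF_RESIDUAL)
r2 = r_squared(REGRESSION_SS, TOTAL_SS)

print(f"F = {f_value:.3f}")
print(f"R^2 = {r2:.3f}")
```

An F value this large with 6 and 121 degrees of freedom corresponds to the reported significance of .000, so the six predictors are jointly significant.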

Ramsey RESET test:

For the Ramsey RESET test, the model examined was:

Price = C0 + C1*West Location + C2*SqFt + C3*Brick + C4*Offers + C5*Bathrooms + C6*Bedrooms + C7*(Pred value)^2 + C8*(Pred value)^3
Null Hypothesis, H0: C7 = 0 and C8 = 0
Alternate Hypothesis, H1: Either or both of C7 and C8 are not equal to zero

Coefficients (a)
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)
Model 1         B           Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)      21144.391   15506.040            1.364    .098
SqFt            41.612      9.312        .328    4.469    .000   .201        4.975
Bedrooms        2959.131    1751.931     .080    1.689    .094   .482        2.073
Bathrooms       5694.238    2582.215     .109    2.205    .029   .442        2.262
Offers          -6373.571   1543.132     -.254   -4.130   .000   .287        3.490
Brick           12946.373   3501.441     .227    3.697    .000   .286        3.491
West Location   16118.918   4812.623     .277    3.349    .001   .158        6.338
(Pred)^3        4.222E-12   .000         .228    1.409    .161   .041        24.205
a. Dependent Variable: Price
Note: Pred^2 has been excluded from the regression model as it was showing high multicollinearity.

From the above table, the p-value of the Pred^3 coefficient (C8) is 0.161, which is greater than 0.10, so we fail to
reject the null hypothesis; the added power term is not significant (C7 could not be tested because Pred^2 was excluded).
Therefore, there is no evidence of omitted variables, and our model is correctly specified.
