Statistics Assignment 1
Student Name: Amit Mishra
PG ID: 62210088
• The normal model is not a good fit for prices because:
• The normal quantile plot does not follow the straight line; it is concave (shown by the yellow line). The distribution is skewed and has many outliers.
3. Irrespective of your response to Q2, assume that Price ~ N(164K, (68K)²). Given this:
a. Calculate the following probabilities – P(Price > 92.8K), P(Price < 255.5K). Do these numbers
agree with what you see in the data?
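The theoretical probabilities in part (a) can be computed directly from the assumed model. A minimal sketch using the standard-library NormalDist; the helper empirical_share_below is hypothetical and would be applied to the dataset's Price column (not reproduced here) to compare the model with the data.

```python
from statistics import NormalDist

# Model assumed in the question: Price ~ N(164K, (68K)^2), in dollars.
price_model = NormalDist(mu=164_000, sigma=68_000)

p_above_92_8k = 1 - price_model.cdf(92_800)   # P(Price > 92.8K)
p_below_255_5k = price_model.cdf(255_500)     # P(Price < 255.5K)

def empirical_share_below(prices, threshold):
    """Hypothetical helper: fraction of observed prices strictly below threshold."""
    return sum(p < threshold for p in prices) / len(prices)

print(f"P(Price > 92.8K)  = {p_above_92_8k:.4f}")
print(f"P(Price < 255.5K) = {p_below_255_5k:.4f}")
```

Under this model, roughly 85% of houses should be priced above 92.8K and roughly 91% below 255.5K; whether the data agree depends on the observed proportions.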
b. Once again, assuming the above normal distribution, what percentage of houses should have a
value less than 232K? Does that agree with the data?
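Under the model, z = (232K - 164K)/68K = 1, so P(Price < 232K) = Φ(1) ≈ 0.841, i.e. about 84% of houses. This agrees with the data, since it is a somewhat smaller percentage than the one obtained for 255.5K.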
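The standardisation for part (b) can be checked in code. A minimal sketch with the standard-library NormalDist; it reproduces only the theoretical percentage, not the share observed in the data.

```python
from statistics import NormalDist

# Same assumed model: Price ~ N(164K, (68K)^2).
price_model = NormalDist(mu=164_000, sigma=68_000)

# Standardise: z = (x - mu) / sigma; 232K is exactly one standard deviation above the mean.
z = (232_000 - price_model.mean) / price_model.stdev
share_below_232k = price_model.cdf(232_000)

print(f"z = {z:.2f}, P(Price < 232K) = {share_below_232k:.2%}")
```

This prints z = 1.00 and a share of about 84.13%.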
c. Based on the theoretical model, what do you expect should be the price of a house that is
exactly on the 3rd quartile (75th percentile)? How does that compare to the actual?
In a standard normal distribution, the 75th percentile corresponds to a z-score of 0.6745. So the price corresponding to z = 0.6745 is 164,000 + 68,000 × 0.6745 = 209,866.
Therefore, a house exactly at the 3rd quartile (75th percentile) should be priced at about 209,866. In the data given to us, the 75th percentile is about 205,397. Hence the theoretical value is higher than the actual value by about 4,469, so the model overestimates the 3rd quartile and does not match the data exactly.
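The quantile calculation for part (c) can be reproduced with the inverse CDF. A minimal sketch with the standard-library NormalDist; the observed Q3 of 205,397 is the value reported from the data above. Because the code uses the unrounded z-score (0.67449...) rather than 0.6745, the model Q3 comes out at about 209,865 instead of 209,866.

```python
from statistics import NormalDist

# Same assumed model: Price ~ N(164K, (68K)^2).
price_model = NormalDist(mu=164_000, sigma=68_000)

z_75 = NormalDist().inv_cdf(0.75)      # 75th percentile of the standard normal, about 0.6745
q3_model = price_model.inv_cdf(0.75)   # equals 164_000 + 68_000 * z_75
q3_observed = 205_397                  # Q3 reported from the data

print(f"z = {z_75:.4f}, model Q3 = {q3_model:,.0f}, observed Q3 = {q3_observed:,}")
print(f"model minus observed = {q3_model - q3_observed:,.0f}")
```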
4. Create a histogram and boxplot for the Living Area variable. Is the distribution symmetric?
Check the skewness measure to see if it is consistent with your observation.
The distribution is not symmetric: it has a long tail to the right, so it is right-skewed.
The skewness measure from the data is 0.807; being positive, it confirms this observation.
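A positive skewness coefficient indicates a longer right tail, and a value near zero indicates symmetry. The sketch below computes the adjusted Fisher-Pearson coefficient, assuming this is the form the software reports (it is the formula used by Excel's SKEW); other packages may report the unadjusted version, which differs slightly for small samples. The toy samples are hypothetical, not the Living Area data.

```python
import math

def sample_skewness(data):
    """Adjusted Fisher-Pearson skewness G1 (the formula used by Excel's SKEW)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    g1 = m3 / m2 ** 1.5                           # moment coefficient of skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)  # small-sample adjustment

# Hypothetical toy samples: one symmetric, one with a long right tail.
print(sample_skewness([1, 2, 3, 4, 5]))
print(sample_skewness([1, 1, 1, 2, 10]))
```

The symmetric sample gives 0, and the sample with one large value gives a positive result, mirroring the right skew seen in Living Area.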
5. Create a new column in the dataset by taking the logarithm of the Living Area variable. Is the
normal distribution a better fit for this variable or the original (Living Area) variable? Why do you
think this is the case?
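For log(Living Area), the skewness is close to zero, and the p-value of the normality test also indicates a closer fit to normality. So the normal distribution is a better fit for log(Living Area) than for the original Living Area variable. This is because the logarithm compresses large values much more than small ones, which pulls in the long right tail of Living Area and makes the distribution closer to symmetric.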
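The effect of the log transform on skewness can be illustrated on synthetic data. In this sketch the hypothetical "living areas" are exponentials of evenly spaced normal quantiles (centre 7.3, spread 0.4, both chosen arbitrarily), so the raw values are right-skewed while their logarithms are symmetric by construction. This only illustrates the mechanism; it is not the assignment data, and the skewness function is the adjusted Fisher-Pearson form.

```python
import math
from statistics import NormalDist

def sample_skewness(data):
    """Adjusted Fisher-Pearson skewness G1."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

# Hypothetical data: log-areas are evenly spaced normal quantiles, so they are symmetric.
std_normal = NormalDist()
n = 200
log_areas = [7.3 + 0.4 * std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
areas = [math.exp(v) for v in log_areas]  # exponentiating stretches the right tail

skew_raw = sample_skewness(areas)
skew_log = sample_skewness([math.log(a) for a in areas])
print(f"skewness of area: {skew_raw:.3f}, skewness of log(area): {skew_log:.3f}")
```

The raw values show clear positive skewness, while the log values are essentially symmetric, which is the same pattern described for Living Area.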