Fyp Proposal
Although real estate dealing is a flourishing industry in Pakistan, few tools have been designed to
support the decisions involved in buying or selling a property. Our project aims to make this process
less hectic and more reliable through the tool we will design. Our idea is to use various machine
learning algorithms to produce meaningful predictions that will help us build our tool for consumers.
This project is essential because people are usually not fully aware of what they are getting into
when making these important financial decisions, as most rely on brokers to provide them with the
best offers. On completion of the project, we aim to ensure that people have foresight when making
important investment decisions.
1. INTRODUCTION:
Purchasing a house is one of the most consequential decisions in an individual's life and needs
a considerable amount of thought and research beforehand. Making a rational decision matters a
great deal; however, relying on brokerage houses can put one in distress because the factors used
to predict house prices are vague. A way to gain insight into a property's worth through proper
analysis could therefore be of great value to buyers in general. This project will serve as a tool
to predict future house prices so that buyers can avoid costly blunders.
1.1 PROBLEM IDENTIFICATION:
More specifically, our project aims to resolve problems that lie on the buyer's side. Regarding
the tax that buyers pay on houses, insufficient knowledge of a house's actual worth may result in
people being manipulated into paying higher taxes. In addition, our project aims to address the
question of a house's actual worth, taking into account the factors that form the features of a
house, such as the number of bedrooms and bathrooms. Lastly, our project will boost the
confidence of people aiming to buy a property, as they will have an idea of prices with the help
of our tool.
1.2 RATIONALE OF THE PROJECT:
The foremost reason for working on this project idea is that limited work has been done on the
prediction of house prices in Pakistan. Through this project, we also want to apply our existing
background knowledge of programming, statistics, finance, and econometrics. This idea was
proposed by Mr Azam Yahya, who believed that, as Computational Finance students, we should
combine our finance domain knowledge with machine learning. To build a tailored prediction
tool, we are interested in learning more about machine learning algorithms and implementing
them in practice to predict house prices in Karachi only.
1.3 SIGNIFICANCE:
In addition to the lack of prior work on house price prediction in Pakistan, the other most
noteworthy aspect is the tool that we aim to build to make these predictions. A house price
prediction model helps fill an important information gap and improves the efficiency of the real
estate market. We will base our tool on this model and aim to produce accurate predictions. Our
objective is to make this tool user-friendly and trustworthy for people who are looking to buy a
house.
2. OBJECTIVES:
To find out which features are most relevant when predicting property value in Karachi through
different machine learning algorithms.
To make price predictions using these features.
To build a tool to predict house prices based on our analysis.
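As a rough illustration of the first two objectives, the sketch below ranks candidate features by the absolute Pearson correlation with price and then fits a one-variable least-squares line on the top-ranked feature to make a prediction. This is a simplified stand-in, assuming numeric features, and not the project's method: the project itself would compare several machine learning algorithms. The feature names (area_sq_yd, bedrooms, bathrooms), the function names and all numbers are hypothetical toy values, not Karachi data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, prices):
    """Return (name, |r|) pairs sorted from most to least correlated with price."""
    scores = {name: abs(pearson(values, prices)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def fit_simple_regression(xs, ys):
    """Ordinary least squares for price = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Hypothetical toy listings (NOT real Karachi data); prices in millions of PKR.
features = {
    "area_sq_yd": [120, 200, 240, 400, 500],
    "bedrooms": [2, 3, 3, 4, 5],
    "bathrooms": [1, 3, 2, 3, 4],
}
prices = [12.0, 20.0, 24.0, 40.0, 50.0]

ranking = rank_features(features, prices)
best_feature = ranking[0][0]
intercept, slope = fit_simple_regression(features[best_feature], prices)
print(ranking)
print(best_feature, round(intercept + slope * 300, 2))
```

Correlation only captures linear, one-at-a-time relationships; model-based importance measures (for example from tree ensembles) would be needed to capture interactions between features.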
3. LITERATURE REVIEW:
Aaron NG (June 2017) describes the development of a tool very similar to ours, based on London
house price predictions. To make predictions, various models were compared, such as linear
regression, Bayesian linear regression, the Relevance Vector Machine and finally the Gaussian
Process (GP), which was selected as the model due to its flexible and probabilistic approach. The
initial dataset was obtained from the London Datastore and contained more than 2.4 million
records from 1995 to 2013. For easier computation, the dataset was distributed across local
models, and the overall predictions were obtained by combining the predictions of these local
models. By adopting the concept of distributed GPs, they found a way to use as much of the data
as possible in the prediction system. They succeeded in building an app that could make
predictions for users comparable to those of other house price prediction models.
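The step of combining predictions from local models can be sketched as follows. The paper's exact combination rule is not stated here, so this assumes a simple product-of-experts scheme: each local GP returns a Gaussian prediction (mean, variance) for the same house, and experts are weighted by their precision (1 / variance). Other combination rules exist in the distributed GP literature. The function name and the numbers are hypothetical.

```python
def combine_local_predictions(local_preds):
    """Combine Gaussian predictions (mean, variance) from independent local models.

    Product-of-experts rule (an assumption, not necessarily the paper's rule):
    each expert is weighted by its precision, so confident models count more.
    """
    precision = sum(1.0 / var for _, var in local_preds)
    combined_var = 1.0 / precision
    combined_mean = combined_var * sum(mean / var for mean, var in local_preds)
    return combined_mean, combined_var

# Hypothetical predictions for one house from three local GP models: (price, variance).
local_preds = [(450_000.0, 2.5e8), (470_000.0, 1.0e8), (520_000.0, 1.0e9)]
mean, var = combine_local_predictions(local_preds)
print(f"combined price ~ {mean:,.0f} (std {var ** 0.5:,.0f})")
```

Note that the combined variance is always smaller than any single expert's variance under this rule, which can make the combined prediction overconfident; that is one reason alternative rules have been proposed.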
Uma Gajendragadkar (2020), in the paper ‘House Price Prediction With Zillow Economics Dataset
Using XGBoost and Linear Regression’, predicts house prices based on factors such as median
income, number of schools, unemployment rate, number of hospitals, crime rate and their ratings.
The aim of the project is to identify the best counties/areas in the USA to invest in for national
real estate developers, individual buyers, and banks looking for a place to develop or purchase a
new apartment building. Another goal is to predict house prices in a county over the next few
months. The data used are the Zillow economics dataset, income level dataset, crime rate dataset,
school dataset, unemployment dataset, and ZIP and county FIPS dataset. To predict house prices,
the author performed predictive analysis using linear regression and XGBoost, and evaluated the
predictions with the metrics R-squared, MSE, RMSE and RMSLE. A correlation matrix between
the different factors and house prices was also computed. In the results, linear regression was
compared with hyperparameter set 1 and hyperparameter set 2, and the author concluded that the
second set of hyperparameters gave better results.
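The evaluation metrics named above (R-squared, MSE, RMSE and RMSLE) follow standard definitions, and the same metrics are used in the other studies reviewed here. A minimal stdlib-only sketch, assuming non-negative prices (required for RMSLE); the function name is hypothetical, and libraries such as scikit-learn provide equivalents (r2_score, mean_squared_error, mean_squared_log_error).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE and RMSLE for paired true/predicted prices."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)         # total sum of squares
    mse = ss_res / n
    # RMSLE compares log(1 + price), so it penalises relative rather than absolute errors.
    msle = sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "rmsle": math.sqrt(msle),
    }

print(regression_metrics([3.0, 5.0], [2.0, 5.0]))
```

RMSE is in the same units as the price, while RMSLE is scale-free, which makes it more suitable when prices span several orders of magnitude.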
JiaoYang Wu (2017), in the paper 'House Price Prediction Using Support Vector Regression',
works on predicting house prices to ease the difficulties on both the buyers' and sellers' sides. He
uses the King County, USA, house sales dataset, with 20 features and 21,613 sales entries. To
predict house prices accurately, he compared feature selection and feature extraction methods in
combination with Support Vector Regression (SVR). Lasso, RFE, Ridge, and Random Forest
Selector were applied to perform feature selection, while Principal Component Analysis (PCA)
was used for feature extraction. Finally, the regression model was built using SVR. Prediction
accuracy increased from 0.65 to 0.86 through these methods, and the lowest mean squared error
was 0.04. In this research, the methods were applied to extract the most valuable features while
excluding those with no impact, which simplified the predictions.
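Of the feature selection methods listed, Lasso is the easiest to show compactly. The sketch below is not the paper's code; it assumes centred features and target (so no intercept) and minimises a squared-error loss plus an L1 penalty by cyclic coordinate descent. The soft-thresholding step sets the weights of uninformative features to exactly zero, which is what makes Lasso usable for selection. The SVR, RFE, Ridge, Random Forest Selector and PCA steps are not shown, and the data is a hypothetical toy example.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are already centred (no intercept).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j itself.
            rho = sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Hypothetical centred data: feature 0 drives the price, feature 1 carries no signal.
X = [[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]
y = [-3.0, 0.0, 3.0, 0.0]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights, selected)
```

Larger values of alpha remove more features; in practice alpha is chosen by cross-validation, and scikit-learn's Lasso estimator optimises the same objective.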
The research paper “Real Estate Price Prediction Using Machine Learning” was written by Aswin
Sivam Ravikumar and published in 2016. This paper presents the implementation of a price
prediction project for the real estate and housing markets. In this research, he used real estate
housing data with ten attributes (Latitude, Longitude, Housing Median Age, Total Rooms, Total
Bedrooms, Population, Households, Median Income, Median House Value (Price) and Ocean
Proximity). The main aim of the project was to predict accurately the prices of real estate
properties in the United States of America. Experimenting with various machine learning
algorithms, such as random forest, multiple regression, support vector machine, gradient boosted
trees, neural networks, and bagging, he observed that multiple regression had an RMSE of
0.701, support vector machine about 0.636, gradient boosted trees 0.573, neural networks about
0.590, and bagging 0.53, while random forest predicted prices with about 90% accuracy and an
RMSE of 0.012. He concluded that random forest performed better, with a higher accuracy
percentage and lower error values.
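Random forest, which performed best in that comparison, builds on bagging: many trees are trained on bootstrap resamples of the data and their predictions are averaged. The sketch below shows only the bagging part, using one-split regression trees (stumps) as hypothetical base learners; a real random forest grows deeper trees and also samples a random subset of features at each split. All function names and numbers are illustrative assumptions.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) that minimises squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm)
    if best is None:  # all x values identical: predict the sample mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap resamples (rows sampled with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all bootstrap models."""
    return sum(m(x) for m in models) / len(models)

# Hypothetical toy data: plot size (hundreds of sq. yd) vs price (millions of PKR).
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [10, 11, 12, 13, 30, 31, 32, 33]
models = bagging_fit(sizes, prices)
print(round(bagging_predict(models, 2), 1), round(bagging_predict(models, 7), 1))
```

Averaging over bootstrap models mainly reduces variance; random forests add per-split feature sampling to decorrelate the trees further, which usually improves on plain bagging.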